Changes

====Set Up====
*Possible packages for building the CNN: TensorFlow, PyTorch, scikit-learn
*Current training dataset: <code>The File to Rule Them All</code>, which contains information on 160 accelerators (homepage url, found cohort url, etc.).
**We will use the data of the 145 accelerators that have cohort urls found for training and testing our CNN algorithm.
**100 of these 145 accelerators (around 70%) will be used to train the model; the remaining 45 (around 30%) will be used as the test data.
*Types of inputs for the CNN model (see the model sketch below this list):
*#Picture of the web page (generated from the above screenshot tool)
*#Cohort indicator (1 - it is a cohort page, 0 - not a cohort page)
**'''Note:''' The cohort indicator implies that our input dataset is a labeled dataset, which may be helpful when choosing packages for building the CNN model.
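As a rough illustration of how the labeled screenshots could feed such a model, here is a minimal sketch using TensorFlow/Keras (one of the candidate packages above). The image size, file layout, and column names (<code>screenshot_path</code>, <code>is_cohort</code>) are assumptions for illustration, not part of the current pipeline.

<syntaxhighlight lang="python">
# A minimal sketch, not the project's actual model.  Assumptions (not from the
# source): screenshots are PNGs resized to 224x224 RGB, and a labels.csv file
# produced by the preprocessing step lists "screenshot_path" and "is_cohort".
import pandas as pd
import tensorflow as tf

IMG_SIZE = (224, 224)                      # assumed input resolution
labels = pd.read_csv("labels.csv")         # hypothetical output of Data Preprocessing

# 100 of the 145 labeled accelerators (~70%) for training,
# the remaining 45 (~30%) held out as test data.
train_df = labels.iloc[:100]
test_df = labels.iloc[100:145]

def to_dataset(df):
    """Turn (screenshot path, cohort indicator) rows into batched image tensors."""
    def decode(path, label):
        img = tf.io.decode_png(tf.io.read_file(path), channels=3)
        img = tf.image.resize(img, IMG_SIZE) / 255.0
        return img, label
    ds = tf.data.Dataset.from_tensor_slices(
        (df["screenshot_path"].tolist(), df["is_cohort"].astype("float32").values))
    return ds.map(decode).batch(8)

# Small binary-classification CNN: cohort page (1) vs. non-cohort page (0).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(*IMG_SIZE, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(to_dataset(train_df), epochs=5, validation_data=to_dataset(test_df))
</syntaxhighlight>

An equivalent model could be built in PyTorch; the 100/45 split and the handling of the cohort indicator would be the same.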
====Data Preprocessing====
This part aims to create an automated process that combines the results generated by the Site Map Tool and the Screenshot Tool with the cohort indicators. The dataset generated by this process will be fed into our CNN model.
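One possible shape for this automation, assuming the Site Map Tool and the Screenshot Tool each write per-URL CSV files (the file names and column names below are hypothetical), is sketched here:

<syntaxhighlight lang="python">
# Sketch of the preprocessing join.  Assumed inputs (names are hypothetical):
#   sitemap.csv      - "accelerator", "url"            (from the Site Map Tool)
#   screenshots.csv  - "url", "screenshot_path"        (from the Screenshot Tool)
#   cohort_urls.csv  - "url"                           (known cohort urls)
import pandas as pd

sitemap = pd.read_csv("sitemap.csv")
shots = pd.read_csv("screenshots.csv")
cohorts = pd.read_csv("cohort_urls.csv")

# Attach a screenshot to every crawled page, then label it:
# 1 if the url matches a known cohort url, 0 otherwise.
data = sitemap.merge(shots, on="url", how="inner")
data["is_cohort"] = data["url"].isin(cohorts["url"]).astype(int)

# This labeled table (screenshot path + cohort indicator) is what the CNN consumes.
data[["screenshot_path", "is_cohort"]].to_csv("labels.csv", index=False)
</syntaxhighlight>

The resulting <code>labels.csv</code> is the labeled dataset assumed by the model sketch in the Set Up section.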