Changes

====Set Up====
*Possible packages for building the CNN: TensorFlow, PyTorch, scikit-learn
*Current training dataset: <code>The File to Rule Them All</code>, which contains information on 160 accelerators (homepage url, found cohort url, etc.).
**We will use the data of the 145 accelerators that have cohort urls found for training and testing our CNN algorithm.
**100 of these 145 accelerators (around 70%) will be used to train the model; the remaining 45 (around 30%) will be used as the test data.
*Types of inputs for the CNN model (see the model sketch below this list):
*#Picture of the web page (generated from the above screenshot tool)
*#Cohort indicator (1 - it is a cohort page, 0 - not a cohort page)
**'''Note:''' The cohort indicator implies that our input dataset is a labeled dataset, which may be helpful when choosing packages for building the CNN model.
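As a rough illustration of how the labeled screenshots could feed such a model, here is a minimal sketch using TensorFlow/Keras (one of the candidate packages above). The image size, file layout, and column names (<code>screenshot_path</code>, <code>is_cohort</code>) are assumptions for illustration, not part of the current pipeline.

<syntaxhighlight lang="python">
# A minimal sketch, not the project's actual model.  Assumptions (not from the
# source): screenshots are PNGs resized to 224x224 RGB, and a labels.csv file
# produced by the preprocessing step lists "screenshot_path" and "is_cohort".
import pandas as pd
import tensorflow as tf

IMG_SIZE = (224, 224)                      # assumed input resolution
labels = pd.read_csv("labels.csv")         # hypothetical output of Data Preprocessing

# 100 of the 145 labeled accelerators (~70%) for training,
# the remaining 45 (~30%) held out as test data.
train_df = labels.iloc[:100]
test_df = labels.iloc[100:145]

def to_dataset(df):
    """Turn (screenshot path, cohort indicator) rows into batched image tensors."""
    def decode(path, label):
        img = tf.io.decode_png(tf.io.read_file(path), channels=3)
        img = tf.image.resize(img, IMG_SIZE) / 255.0
        return img, label
    ds = tf.data.Dataset.from_tensor_slices(
        (df["screenshot_path"].tolist(), df["is_cohort"].astype("float32").values))
    return ds.map(decode).batch(8)

# Small binary-classification CNN: cohort page (1) vs. non-cohort page (0).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(*IMG_SIZE, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(to_dataset(train_df), epochs=5, validation_data=to_dataset(test_df))
</syntaxhighlight>

An equivalent model could be built in PyTorch; the 100/45 split and the handling of the cohort indicator would be the same.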
====Data Preprocessing====
This part aims to create an automated process that combines the results generated by the Site Map Tool and the Screenshot Tool with the cohort indicators. The dataset generated by this process will be fed into our CNN model.
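One possible shape for this automation, assuming the Site Map Tool and the Screenshot Tool each write per-URL CSV files (the file names and column names below are hypothetical), is sketched here:

<syntaxhighlight lang="python">
# Sketch of the preprocessing join.  Assumed inputs (names are hypothetical):
#   sitemap.csv      - "accelerator", "url"            (from the Site Map Tool)
#   screenshots.csv  - "url", "screenshot_path"        (from the Screenshot Tool)
#   cohort_urls.csv  - "url"                           (known cohort urls)
import pandas as pd

sitemap = pd.read_csv("sitemap.csv")
shots = pd.read_csv("screenshots.csv")
cohorts = pd.read_csv("cohort_urls.csv")

# Attach a screenshot to every crawled page, then label it:
# 1 if the url matches a known cohort url, 0 otherwise.
data = sitemap.merge(shots, on="url", how="inner")
data["is_cohort"] = data["url"].isin(cohorts["url"]).astype(int)

# This labeled table (screenshot path + cohort indicator) is what the CNN consumes.
data[["screenshot_path", "is_cohort"]].to_csv("labels.csv", index=False)
</syntaxhighlight>

The resulting <code>labels.csv</code> is the labeled dataset assumed by the model sketch in the Set Up section.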