Changes

Jump to navigation Jump to search
===Image Processing===
This method would likely rely on a [https://en.wikipedia.org/wiki/Convolutional_neural_network convolutional neural network (CNN)] to classify HTML elements present in web page screenshots. Implementation could be achieved by combining the VGG16 model or ResNet architecture with batch normalization to increase accuracy in this context.
====Set Up====
*Possible packages for building CNN: TensorFlow, PyTorch, scikit
*Current training dataset: <code>The File to Rule Them All</code>, contains information of 160 accelerators (homepage url, cohort url etc.). We will train our model on those 145 accelerators that have cohort urls found
*Type of input for CNN model: picture of the web page(generated from the above screenshot tool) and cohort indicator (1 - it is a cohort page, 0 - not a cohort page)
 
====Data Preprocessing====
227

edits

Navigation menu