** We will use the data of the 121 accelerators for which cohort URLs were found to train and test our CNN model
** 90 of the 121 accelerators (around 75%) will be used to train the model; the remaining 31 accelerators (around 25%) will be used as test data
*Inputs for training the CNN model:
#Screenshot of the web page (image data generated by the screenshot tool described above)
#Cohort indicator (categorical data: 1 = cohort page, 0 = not a cohort page)
'''Note:''' The cohort indicator means that our dataset is labeled, so training the CNN is a supervised classification task. This may help when choosing packages for building the CNN model.
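The inputs and the train/test split described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the `PageExample` record, the helpers `split_accelerators` and `select_examples`, the screenshot file paths, and the fixed random seed are all hypothetical. Each screenshot stays paired with its cohort indicator, and the split is made by accelerator (90 for training, 31 for testing), so every page of a given accelerator ends up on only one side.

```python
import random
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PageExample:
    """One labeled example (hypothetical record layout)."""
    accelerator: str   # accelerator the page belongs to
    screenshot: Path   # image file produced by the screenshot tool
    is_cohort: int     # cohort indicator: 1 = cohort page, 0 = not a cohort page

    def __post_init__(self):
        if self.is_cohort not in (0, 1):
            raise ValueError("is_cohort must be 0 or 1")


def split_accelerators(accelerators, n_train=90, seed=42):
    """Shuffle accelerator names and split them into disjoint train/test lists."""
    names = sorted(set(accelerators))  # sort first so the result ignores input order
    if not 0 < n_train < len(names):
        raise ValueError("n_train must leave accelerators on both sides of the split")
    random.Random(seed).shuffle(names)  # fixed seed makes the split reproducible
    return names[:n_train], names[n_train:]


def select_examples(examples, accelerators):
    """Keep only the examples whose accelerator is in the given list."""
    wanted = set(accelerators)
    return [ex for ex in examples if ex.accelerator in wanted]


# Hypothetical placeholders: 121 accelerators, each with one cohort page
# and one non-cohort page.
names = [f"accelerator_{i:03d}" for i in range(121)]
examples = []
for name in names:
    examples.append(PageExample(name, Path(f"screenshots/{name}/cohort.png"), 1))
    examples.append(PageExample(name, Path(f"screenshots/{name}/home.png"), 0))

train_names, test_names = split_accelerators(names)
train_set = select_examples(examples, train_names)
test_set = select_examples(examples, test_names)
print(len(train_names), len(test_names))  # 90 31
print(len(train_set), len(test_set))      # 180 62
```

Splitting by accelerator rather than by individual page is a design choice meant to keep similar-looking pages from the same website out of both the training and test sets at once. Libraries such as scikit-learn offer group-aware splitters for the same purpose.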
====Data Preprocessing (IN PROGRESS)====