====Data Preprocessing====
'''''Retrieving All Internal Links:''''' The <code>generate_dataset</code> tool reads every homepage URL listed in <code>The File to Rule Them All.csv</code> and feeds each one into the Site Map Generator to retrieve its internal URLs.
*Each URL is written on its own line, followed by a tab and its cohort indicator (see example below)
http://fledge.co/blog/ 0
http://fledge.co/about/ 0
*Results are automatically split into two text files: <code>train.txt</code> and <code>test.txt</code>.
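The tab-separated format and the train/test split described above can be sketched as follows. This is only an illustration, not the project's actual <code>generate_dataset</code> code: the helper names, the 20% test fraction, and the seeded shuffle are all assumptions, since the page does not state how the split is made.

```python
import random


def format_line(url, cohort):
    """Return one dataset line: the URL, a tab, then the cohort indicator."""
    return f"{url}\t{cohort}"


def split_lines(lines, test_fraction=0.2, seed=0):
    """Shuffle the lines reproducibly and return (train, test).

    test_fraction and seed are illustrative assumptions, not project values.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def write_lines(path, lines):
    """Write one line per entry to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
```

A caller would build the lines with <code>format_line</code>, pass them to <code>split_lines</code>, and write the two results to <code>train.txt</code> and <code>test.txt</code> with <code>write_lines</code>.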
Python file saved in
* Images are split into two folders: <code>train</code> and <code>test</code>
* Within each of these folders, images are further separated into two subfolders: <code>cohort</code> and <code>not_cohort</code>
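The folder layout above can be sketched as a small helper that picks the destination for each image. This is a hypothetical illustration: the function names are invented here, and it assumes that a cohort indicator of 1 means <code>cohort</code> and 0 means <code>not_cohort</code>, which the page does not state explicitly.

```python
import shutil
from pathlib import Path


def destination_dir(root, split, cohort):
    """Return root/<train|test>/<cohort|not_cohort> for one image."""
    if split not in ("train", "test"):
        raise ValueError(f"unknown split: {split!r}")
    # Assumption: indicator 1 marks a cohort page, anything else does not.
    subfolder = "cohort" if int(cohort) == 1 else "not_cohort"
    return Path(root) / split / subfolder


def place_image(image_path, root, split, cohort):
    """Copy an image into its destination folder, creating it if needed."""
    dest = destination_dir(root, split, cohort)
    dest.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(image_path, dest))
```

Keeping each class in its own subfolder is a common layout because many image-loading utilities can infer labels from directory names.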
 
====CNN Model====
Python file saved in <code>E:\projects\listing page identifier\cnn.py</code>