E:\projects\listing page identifier\generate_dataset.py
'''''Generate and Label Image Data:''''' Feed train.txt and text.txt, which are generated by the generate_dataset tool, into the Screenshot Tool to get our image data.
*Make sure to create the train and test folders in the same directory as <code>train.txt</code> and <code>text.txt</code>, along with their sub-folders cohort and not_cohort, '''before''' running the Screenshot Tool
*Results are automatically split into two folders, train and test, and separated into the sub-folders cohort and not_cohort
*This process also auto-generates the class label and index in the name of the image file (see example below)
[[File:autoName.png|450px]]
*The leading 0 or 1 indicates whether it is a cohort webpage or not
*The second number after the first '_' represents the index (row number) in train.txt or text.txt
*These two numbers will become helpful during the modeling
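The folder setup and the file-name convention described above can be sketched in Python. This is only an illustration, not part of the Screenshot Tool: the function names <code>make_output_dirs</code> and <code>parse_image_name</code> are hypothetical, the sample file name is made up, and the mapping of a leading 1 to "cohort" is an assumption, since the page only says that the leading digit distinguishes the two classes.

```python
from pathlib import Path

SPLITS = ("train", "test")
CLASSES = ("cohort", "not_cohort")


def make_output_dirs(base_dir):
    """Create train/ and test/ under base_dir, each with cohort/ and not_cohort/."""
    base = Path(base_dir)
    created = []
    for split in SPLITS:
        for cls in CLASSES:
            target = base / split / cls
            # parents=True builds the split folder too; exist_ok=True makes reruns safe
            target.mkdir(parents=True, exist_ok=True)
            created.append(target)
    return created


def parse_image_name(filename):
    """Return (is_cohort, row_index) from the first two '_'-separated fields.

    ASSUMPTION: a leading '1' means cohort and '0' means not cohort.
    """
    parts = Path(filename).stem.split("_")
    if len(parts) < 2 or parts[0] not in ("0", "1"):
        raise ValueError(f"unexpected image name: {filename!r}")
    return parts[0] == "1", int(parts[1])
```

Calling <code>make_output_dirs</code> on the directory that holds train.txt and text.txt before running the Screenshot Tool would prepare the four sub-folders; <code>parse_image_name</code> could later recover the label and row number from each screenshot during modeling.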
====CNN Model====
Python file saved in
E:\projects\listing page identifier\cnn.py