Changes

Listing Page Classifier (view source)

Revision as of 15:27, 17 May 2019

913 bytes added , 15:27, 17 May 2019

no edit summary

Python file saved in

E:\projects\listing page identifier\cnn.py

===Workflow===

This section summarizes the general process of using the tools above to produce suitable input for our CNN model. It also serves as guidance for anyone who wants to build on those tools.

# Feed the raw data (for now, <code>The File to Rule Them All.csv</code>) into <code>generate_dataset.py</code> to get two text files (<code>train.txt</code> and <code>text.txt</code>), each containing a list of all internal URLs with their corresponding indicator (class label).

# Create two folders, <code>train</code> and <code>test</code>, in the same directory as <code>train.txt</code> and <code>text.txt</code>. Inside each of them, create two sub-folders: <code>cohort</code> and <code>not_cohort</code>.

# Pass the paths of <code>train.txt</code> and <code>text.txt</code> to <code>screen_shot_tool.py</code>. The tool automatically sorts the images into the corresponding folders created in step 2.
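
The folder layout from step 2 can also be created with a few lines of Python rather than by hand. The following is only a minimal sketch: the helper name <code>create_label_folders</code> and its parameters are hypothetical and are not part of <code>generate_dataset.py</code> or <code>screen_shot_tool.py</code>. It assumes the folder names given in step 2 and that the base directory is the one holding the two text files.

```python
from pathlib import Path


def create_label_folders(base_dir, splits=("train", "test"),
                         labels=("cohort", "not_cohort")):
    """Create <base_dir>/<split>/<label> for every split and label.

    Hypothetical helper: the names and defaults mirror step 2 of the
    workflow and are assumptions, not code from the project.
    """
    created = []
    for split in splits:
        for label in labels:
            folder = Path(base_dir) / split / label
            # parents=True also creates the split folder itself;
            # exist_ok=True makes repeated runs harmless.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created
```

Calling <code>create_label_folders(path)</code> with the directory that contains the text files yields four folders: <code>train/cohort</code>, <code>train/not_cohort</code>, <code>test/cohort</code> and <code>test/not_cohort</code>. Running it again leaves existing folders and their contents untouched.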

NancyYu

