==Project Introduction==
This project uses Selenium and machine learning to gather candidate web pages and classify each page as either a demo day page containing a list of cohort companies or not, ultimately to gather good candidates to push to Mechanical Turk. The code is written in Python 3 using Selenium, with a bag-of-words approach feeding both scikit-learn's random forest model and a TensorFlow (Keras) model.
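As a rough illustration of the approach (not the project's actual code), bag-of-words word frequencies can feed a scikit-learn random forest like this; the example pages, labels, and words below are made up:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins for crawled pages and their labels
# (1 = demo day page listing cohort companies, 0 = not).
pages = [
    "demo day cohort companies pitch startup accelerator",
    "contact us about our privacy policy",
]
labels = [1, 0]

# Bag of words: each page becomes a vector of word frequencies.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(pages)

# Random forest classifier trained on the word-frequency features.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, labels)

# Classify a new, unseen page.
new_page = vectorizer.transform(["startup demo day cohort"])
prediction = clf.predict(new_page)
```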
==Code Location==
The source code and relevant files for the project can be found here:
E:\McNair\Projects\Accelerator Demo Day\
The current working model using RF is in: E:\McNair\Projects\Accelerator Demo Day\Test Run
==Development Notes==
Both models currently use the bag-of-words approach to preprocess the data, but I will try to use Yang's code from the industry classifier to preprocess using word2vec. I'm not yet familiar with this approach, but I will try to learn it.
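To make the planned change concrete, here is a minimal sketch of the word2vec idea with a toy embedding table: instead of raw keyword counts, a page is represented by the average of its word vectors. The words and every vector value below are made up; the real vectors would come from a trained model.

```python
import numpy as np

# Toy embedding table standing in for trained word2vec vectors.
embeddings = {
    "demo":    np.array([0.9, 0.1, 0.0]),
    "day":     np.array([0.8, 0.2, 0.1]),
    "cohort":  np.array([0.7, 0.3, 0.2]),
    "privacy": np.array([0.0, 0.9, 0.8]),
}

def page_vector(tokens, embeddings, dim=3):
    """Represent a page as the average of its known word vectors."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# "unknown" is skipped; the result averages the "demo" and "day" vectors.
vec = page_vector(["demo", "day", "unknown"], embeddings)
# → [0.85, 0.15, 0.05]
```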
==How to Use this Project==
Running the project is as simple as executing the scripts in the correct order. The files are named in the format "STEPX_name", where X is the order of execution. To be more specific, run the following four commands:
python3 STEP1_crawl.py #crawl Google to get the data for the demo day pages for the accelerators listed in ListOfAccsToCrawl.txt
python3 STEP2_preprocessing_feature_matrix_generator.py #preprocess the data using a bag-of-words approach: each page is characterized by the frequencies of chosen keywords, which are stored in words.txt. This script creates a file called feature_matrix.txt
python3 STEP3_train_rf.py #train the RF model
python3 STEP4_classify_rf.py #run the trained model to classify the crawled HTML pages.
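The bag-of-words step in STEP2 can be sketched as follows. This is a simplified illustration, not the script itself: the real script reads its keywords from words.txt and writes feature_matrix.txt, and the keywords shown here are made up.

```python
def feature_row(page_text, keywords):
    """Count how often each chosen keyword appears in a page's text."""
    tokens = page_text.lower().split()
    return [tokens.count(k) for k in keywords]

# Illustrative keywords; the project's actual list lives in words.txt.
keywords = ["demo", "day", "cohort", "startup"]
row = feature_row("Demo Day for the startup cohort demo", keywords)
# → [2, 1, 1, 1]
```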
 
==The Crawler Functionality==
To be updated
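Until this section is written, here is a hedged sketch of what STEP1_crawl.py presumably does based on the description above: build a Google search query per accelerator and collect result links with Selenium. The query format, CSS selector, and function names are assumptions, not the project's code.

```python
def google_query_url(accelerator):
    """Build a Google search URL for an accelerator's demo day page."""
    return ("https://www.google.com/search?q="
            + accelerator.replace(" ", "+") + "+demo+day")

def crawl(accelerators):
    """Fetch result links for each accelerator (needs a local ChromeDriver)."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        links = []
        for acc in accelerators:
            driver.get(google_query_url(acc))
            for a in driver.find_elements(By.CSS_SELECTOR, "a"):
                href = a.get_attribute("href")
                if href and href.startswith("http"):
                    links.append(href)
        return links
    finally:
        driver.quit()
```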