The current working model, which uses a random forest (RF) classifier, is in:
E:\McNair\Projects\Accelerator Demo Day\Test Run
The RNN model is in:
E:\McNair\Projects\Accelerator Demo Day\Experiment
The RNN is still under heavy development; modifying anything in this folder is not recommended.
 
All other folders are used for experiments; please don't touch them.
==Development Notes==
==How to Use this Project==
Running the project is as simple as executing the scripts in the correct order. The files are named in the format "STEPX_name", where X is the step's position in the execution order. Specifically, run the following four commands:
''# Crawl Google for the demo day pages of the accelerators listed in ListOfAccsToCrawl.txt''
python3 STEP1_crawl.py
''# Preprocess the data using a bag-of-words approach: each page is characterized by the frequencies of chosen keywords, which are stored in words.txt. This script creates a file called feature_matrix.txt''
python3 STEP2_preprocessing_feature_matrix_generator.py
''# Train the RF model''
python3 STEP3_train_rf.py
''# Run the trained model to classify the crawled HTML pages''
python3 STEP4_classify_rf.py
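The bag-of-words step can be pictured with the following sketch, which turns one crawled HTML page into a row of keyword frequencies. It is a minimal, standard-library-only illustration, not the code in STEP2_preprocessing_feature_matrix_generator.py. It assumes that words.txt holds one single-word keyword per line, strips tags with Python's html.parser, and ignores the contents of script and style elements. The helper names html_to_text, load_keywords and feature_row are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the page's visible text as one space-joined string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def load_keywords(path="words.txt"):
    # Assumption: words.txt lists one single-word keyword per line.
    with open(path, encoding="utf-8") as f:
        return [line.strip().lower() for line in f if line.strip()]


def feature_row(html, keywords):
    """Count how often each keyword occurs in the page text, in keyword order."""
    tokens = re.findall(r"[a-z0-9]+", html_to_text(html).lower())
    counts = Counter(tokens)
    return [counts[k] for k in keywords]
```

For example, calling feature_row(html, load_keywords()) on every crawled page and stacking the resulting lists gives a matrix of the kind stored in feature_matrix.txt, although the real file's layout may differ. Multi-word keywords such as "demo day" would need phrase matching, which this sketch does not do.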
==The Crawler Functionality==
To be updated. The crawler functionality is stored in the file:
==The Classifier==
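The RF model trained by STEP3_train_rf.py and applied by STEP4_classify_rf.py is a random forest: an ensemble of decision trees, each fit on a bootstrap sample of the training pages using a random subset of the keyword features, whose majority vote gives the predicted label. The sketch below illustrates that idea with the standard library only. It is a deliberate simplification (every tree is a single-split decision stump) and not the project's implementation, which may instead rely on a library such as scikit-learn. The function names, the 0/1 labels and the toy data are hypothetical.

```python
import random
from collections import Counter


def _gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def _majority(labels):
    return Counter(labels).most_common(1)[0][0]


def _fit_stump(X, y, feature_ids):
    """Return (feature, threshold, left_label, right_label) with the lowest weighted Gini impurity."""
    fallback = _majority(y)
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * _gini(left) + len(right) * _gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        _majority(left) if left else fallback,
                        _majority(right) if right else fallback)
    return best[1:]


def train_forest(X, y, n_trees=25, seed=0):
    """Hypothetical trainer: bagged decision stumps, each using a random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))  # about sqrt(p) features per split, a common RF default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample with replacement
        feats = rng.sample(range(n_features), k)
        forest.append(_fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two keyword counts per page, label 1 = demo day page.
X = [[0, 1], [1, 0], [2, 2], [1, 1], [6, 7], [7, 5], [8, 6], [5, 8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(X, y)
print(predict(forest, [0, 0]), predict(forest, [9, 9]))
```

In this setting, X would be the rows of feature_matrix.txt and y hand-assigned labels for the training pages (the labeling scheme is an assumption). Adding trees makes the vote more stable at the cost of training time; a practical model would also grow deeper trees than these stumps.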