Changes

E:\McNair\Projects\Accelerators\Spring 2018\demo_day_classifier\DemoDayHTMLFull\Demo Day URLs.xlsx
2018-04-09: Wrote the code to put everything together. It runs the Google crawler, builds the feature matrix from the results, and then runs the classifier on it. This can be used to increase the size of the dataset and improve the accuracy of the classifier.
* Steps to train the model: Put all of the HTML files to be used in DemoDayHTMLFull. Then run web_demo_features.py to generate the feature matrix, training_features.txt. Then run demo_day_classifier_randforest.py to generate the model, classifier.pkl. Make sure that USE_CROSS_VALIDATION is set to False in demo_day_classifier_randforest.py so that the model is generated (a minimal training sketch follows this list).
* Steps to run: In crawl_and_classify.py, set the variables to whatever is wanted, then run it with python3. It downloads all of the HTML files into the directory CrawledHTMLPages, generates a feature matrix, CrawledHTMLPages\features.txt, runs the trained model saved in classifier.pkl to predict whether each page is a demo day page, and saves the results to CrawledHTMLPages\predicted.txt (a sketch of this prediction step also appears below).
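The following is a minimal sketch of the training step, not the actual contents of demo_day_classifier_randforest.py. It assumes training_features.txt is a whitespace-delimited numeric matrix with the class label in the last column; the real feature format produced by web_demo_features.py may differ, and the original script may serialize the model differently.

<pre>
# Hypothetical sketch of training a random forest and saving it as classifier.pkl.
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

USE_CROSS_VALIDATION = False  # set to False to fit on all data and save the model

# Assumed format: one row per page, label in the last column.
data = np.loadtxt("training_features.txt")
X, y = data[:, :-1], data[:, -1]

clf = RandomForestClassifier(n_estimators=100, random_state=0)

if USE_CROSS_VALIDATION:
    # Report cross-validated accuracy instead of saving a model.
    print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
else:
    clf.fit(X, y)
    with open("classifier.pkl", "wb") as f:
        pickle.dump(clf, f)  # model later loaded by crawl_and_classify.py
</pre>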
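And this is a sketch of the classification step at the end of crawl_and_classify.py, assuming the crawled pages have already been converted into a numeric feature matrix at CrawledHTMLPages\features.txt with one row per page. The label encoding (1 = demo day page) is an assumption for illustration.

<pre>
# Hypothetical sketch of loading classifier.pkl and predicting on crawled pages.
import pickle
import numpy as np

features = np.atleast_2d(np.loadtxt(r"CrawledHTMLPages\features.txt"))

with open("classifier.pkl", "rb") as f:
    clf = pickle.load(f)  # model produced by the training step above

predictions = clf.predict(features)  # assumed encoding: 1 = demo day page, 0 = not
np.savetxt(r"CrawledHTMLPages\predicted.txt", predictions, fmt="%d")
</pre>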
==Possibly useful programs==