Changes

Accelerator Demo Day (view source)

Revision as of 13:38, 23 July 2018


|Does subsume=Demo Day Page Parser, Demo Day Page Google Classifier

}}

==Project Introduction==

This project uses Selenium and machine learning to find good candidate web pages and to classify whether each page is a demo day page containing a list of cohort companies. The classifier currently uses scikit-learn's random forest model with a bag-of-words approach.

==Code Location==

The source code and relevant files for the project can be found here:

E:\McNair\Projects\Accelerator Demo Day\

==Development Notes==

Right now I am working on two different classifiers: Kyran's old Random Forest (RF) model, which I am optimizing by tweaking parameters and trying different combinations of features, and my RNN text classifier.

The RF model has ~92% accuracy on the training data and ~70% accuracy on the test data. A gap of that size suggests the model may be overfitting the training set.

The RNN currently has ~50% accuracy on both the train and test data, which is rather concerning.

The test : train ratio is 1:3 (25% test, 75% train).
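The 25/75 split and the train-versus-test accuracy comparison can be sketched in plain Python. This is a minimal illustration, not the project's code: `split_train_test` and `accuracy` are hypothetical helper names, the fixed seed is an assumption made for reproducibility, and the `pages` list is made-up placeholder data. scikit-learn's `train_test_split` with `test_size=0.25` produces the same kind of split.

```python
import random


def split_train_test(samples, test_fraction=0.25, seed=0):
    """Shuffle samples and split them into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Placeholder data (assumption): (page_id, is_demo_day_page) pairs.
pages = [(i, i % 2) for i in range(100)]
train, test = split_train_test(pages)
```

Computing `accuracy` separately on the train and test predictions is what exposes a gap like the RF model's 92% versus 70%.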

Both models currently use the bag-of-words approach to preprocess the data, but I will try to use Yang's code from the industry classifier to preprocess with word2vec instead. I am not yet familiar with this approach, but I will learn it.
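One common way to turn word2vec vectors into a fixed-length document feature is to average the vectors of the words in the document. The sketch below assumes that approach. It is not Yang's code, and the tiny `TOY_EMBEDDINGS` table is made up to stand in for a trained word2vec model; `document_vector` is a hypothetical helper name.

```python
import re

# Made-up 3-dimensional embeddings (assumption); a real word2vec model
# would supply learned vectors with many more dimensions.
TOY_EMBEDDINGS = {
    "demo": [1.0, 0.0, 0.0],
    "day": [0.0, 1.0, 0.0],
    "cohort": [0.0, 0.0, 1.0],
}
DIMENSIONS = 3


def document_vector(text, embeddings=TOY_EMBEDDINGS, dimensions=DIMENSIONS):
    """Average the embeddings of all known words in text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * dimensions
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dimensions)]


vector = document_vector("Demo day for the new cohort, demo starts at noon")
# vector == [0.5, 0.25, 0.25]
```

Unlike word counts, the resulting vector has the same length regardless of vocabulary size, and words missing from the embedding table are simply skipped.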

==The Crawler Functionality==

To be updated

==The Classifier==

===Input (Features)===

The input (features) is currently the frequency of each of X_NUMBER words in each document. The words are hand-selected. This is the naive bag-of-words approach.
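The feature extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `VOCABULARY` list is hypothetical (the project's real hand-selected words are not shown here), and `bag_of_words` is an illustrative helper name rather than a function from the project.

```python
import re
from collections import Counter

# Hypothetical hand-selected vocabulary (assumption, for illustration only).
VOCABULARY = ["demo", "day", "cohort", "startup", "accelerator"]


def bag_of_words(text, vocabulary=VOCABULARY):
    """Return the count of each vocabulary word in text, in vocabulary order."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


page_text = "Demo Day 2018: meet the cohort. Each startup in the cohort will pitch."
features = bag_of_words(page_text)
# features == [1, 1, 2, 1, 0]
```

Each document becomes a vector with one entry per vocabulary word, which is the shape of input a random forest classifier expects.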

Leminh.ams

