Demo Day Page Google Classifier

From edegan.com


McNair Project
Project Information
Project Title: Demo Day Page Google Classifier
Owner: Kyran Adams
Start Date: 2/5/2018
Deadline:
Keywords: Accelerator, Demo Day, Google Result, Word2vec, Tensorflow
Primary Billing:
Notes:
Has project status: Active
Is dependent on: Accelerator Seed List (Data), Demo Day Page Parser
Copyright © 2016 edegan.com. All Rights Reserved.


Project

This is a TensorFlow project that classifies webpages as either demo day pages or not, currently using logistic regression. The classifier itself should take the output of Peter's DemoDayHits.py program for a page and output whether that page is a demo day page. It is trained on a file produced by DemoDayHits.py and a hand-classified set of Google results, some of which are demo day pages.
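
As a rough illustration of the logistic regression step, here is a minimal plain-Python sketch; it is not the project's TensorFlow code. The feature columns (keyword counts per Google result), the toy rows, and the function names are all assumptions made for the example, since the actual DemoDayHits.py output format is not shown on this page.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # derivative of the log loss with respect to z
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """Return True if the row is classified as a demo day page."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z) >= threshold


# Hypothetical toy data: each row holds keyword counts for one Google result
# (assumed columns: "demo day", "pitch", "investors"); label 1 = demo day page.
rows = [[3, 2, 1], [4, 1, 2], [2, 3, 3], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic_regression(rows, labels)
predictions = [predict(weights, bias, x) for x in rows]
```

With real data, the hand-classified Google results would supply the labels and the DemoDayHits.py output would supply the rows, and a held-out set would be needed to judge accuracy.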

It may later take other inputs, such as the text of the page itself.
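
If the page text were used as an input, one simple option would be to turn it into keyword counts before classification. The sketch below is only an assumption about how that could look: the keyword list and the `keyword_counts` helper are hypothetical and do not come from the project's code.

```python
import re

# Hypothetical keyword list; the project's actual features are not specified.
KEYWORDS = ["demo day", "accelerator", "pitch", "investors", "cohort"]


def keyword_counts(page_text, keywords=KEYWORDS):
    """Count case-insensitive, whole-word occurrences of each keyword."""
    text = page_text.lower()
    return [
        len(re.findall(r"\b" + re.escape(k) + r"\b", text))
        for k in keywords
    ]


sample = "Join our Demo Day! Investors meet the cohort at demo day."
features = keyword_counts(sample)
```

Each page then becomes a fixed-length numeric vector that any classifier can consume.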

A demo day page is a page advertising a "demo day," a day on which cohorts graduating from accelerators pitch their ideas to investors. The dates of these demo days give us a good idea of when each cohort graduated from its accelerator.

The random forest implementation doesn't work on Windows, so it is located on the Z drive to be run from the Linux box.

Located:

 E:\McNair\Projects\Accelerators\Spring 2018\google_classifier\
 Z:\demoday


Training data:

 E:\McNair\Projects\Accelerators\Fall 2017\Demo Day URLs.xlsx

Possibly useful programs

Google bindings for Python

 E:\McNair\Projects\Accelerators\Spring 2017\Google_SiteSearch

PDF to text converter

 E:\McNair\Projects\Accelerators\Fall 2017\Code+Final_Data\Utilities\PDF_Ripper

HTML to text converter

 E:\McNair\Projects\Accelerators\Fall 2017\Code+Final_Data

Demo Day Page Parser

Resources