==Project==
This is a TensorFlow project that classifies webpages by whether they are demo day pages containing a list of cohort companies. It currently uses scikit-learn's random forest model and a bag-of-words approach. The classifier itself takes:
<strong>Features:</strong> The number of times each word in words.txt occurs in the titles or headers of a webpage, calculated by web_demo_features.py in the same directory. It also takes the number of occurrences of years from 1900 to 2099, of month words grouped into seasons, and of phrases of the form "# startups". It further takes the number of simple links (links of the form www.abc.com or www.abc.org) and how many of those are attached to images, as well as the number of "strong" HTML tags in the body.
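The feature counts described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the contents of web_demo_features.py: the function and class names, the regular expressions, the month-to-season grouping, and the choice to count "strong" tags anywhere in the parsed HTML are all assumptions made for the example.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumed patterns; the real project may define these differently.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")                # years 1900-2099
STARTUPS_RE = re.compile(r"\b\d+\s+startups\b", re.IGNORECASE)  # "# startups"
SIMPLE_LINK_RE = re.compile(
    r"^(?:https?://)?www\.[\w-]+\.(?:com|org)/?$", re.IGNORECASE
)

# Assumed grouping of month words into seasons.
SEASONS = {
    "winter": ("december", "january", "february"),
    "spring": ("march", "april", "may"),
    "summer": ("june", "july", "august"),
    "fall": ("september", "october", "november"),
}


def count_text_features(text, vocabulary):
    """Count vocabulary words, season months, years, and '# startups' phrases."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    features = {f"word_{w}": counts[w] for w in vocabulary}
    for season, months in SEASONS.items():
        features[f"season_{season}"] = sum(counts[m] for m in months)
    features["years"] = len(YEAR_RE.findall(text))
    features["n_startups_phrases"] = len(STARTUPS_RE.findall(text))
    return features


class LinkAndStrongCounter(HTMLParser):
    """Count simple links, simple links wrapping an image, and strong tags."""

    def __init__(self):
        super().__init__()
        self.simple_links = 0
        self.simple_links_with_image = 0
        self.strong_tags = 0
        self._in_simple_link = False
        self._link_has_image = False

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.strong_tags += 1
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            self._in_simple_link = bool(SIMPLE_LINK_RE.match(href))
            self._link_has_image = False
            if self._in_simple_link:
                self.simple_links += 1
        elif tag == "img" and self._in_simple_link and not self._link_has_image:
            # Count each simple link at most once, however many images it wraps.
            self._link_has_image = True
            self.simple_links_with_image += 1

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_simple_link = False


def count_html_features(html):
    """Parse an HTML string and return the link and strong-tag counts."""
    parser = LinkAndStrongCounter()
    parser.feed(html)
    return {
        "simple_links": parser.simple_links,
        "simple_links_with_image": parser.simple_links_with_image,
        "strong_tags": parser.strong_tags,
    }
```

In a sketch like this, the vocabulary would be read from words.txt and the text passed to count_text_features would be the page's titles and headers; the two resulting dictionaries could then be merged into a single feature vector for the random forest.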