Changes

772 bytes added ,  15:23, 9 April 2019
==Background==
We wanted to create a Google web crawler that collects data from web searches specific to individual cities. Each search takes the form "incubator" + "city, state". The crawler was modeled on a previous researcher's web crawler, which collected information on accelerators. We could not simply modify that crawler because it relied on an outdated Python module.
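
The search format described above can be sketched as follows. This is a minimal illustration, not the crawler's actual code: the function names <code>build_query</code> and <code>build_search_url</code>, the example city list, and the use of a plain Google search URL are all assumptions made for the example.

```python
from urllib.parse import urlencode

# Hypothetical sample input; the real crawler's city list is not shown in the source.
CITIES = [("Houston", "TX"), ("Austin", "TX")]


def build_query(keyword, city, state):
    """Combine a keyword with a 'city, state' string into one search query."""
    return f"{keyword} {city}, {state}"


def build_search_url(query, base="https://www.google.com/search"):
    """URL-encode the query as the 'q' parameter (assumed endpoint)."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    for city, state in CITIES:
        query = build_query("incubator", city, state)
        print(query, "->", build_search_url(query))
```

Keeping query construction separate from URL construction would make it easy to swap the keyword (for example, "accelerator") without touching the request logic.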
 
The output from this crawler could be used in several ways:
# The URLs determined to be incubator websites can serve as input to the [[Listing Page Classifier]], which takes an incubator website URL and identifies which page contains the client company listing.
# The title text can be analyzed using n-grams to find keywords that help classify the URL as an incubator website. This strategy is discussed in [[Geocoding Inventor Locations (Tool)]].
# Key elements of a page's HTML can be fed into an adapted version of the [[Demo Day Page Google Classifier]] to identify demo day webpages that contain a list of cohort companies.
# The page can be passed to Amazon's [https://www.mturk.com/ Mechanical Turk] to outsource the task of classifying pages as incubators.
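
The n-gram analysis of title text mentioned in the list above could look roughly like this. It is a sketch under stated assumptions: the tokenization rule (lowercase alphanumeric runs), the helper names <code>tokenize</code>, <code>ngrams</code>, and <code>count_title_ngrams</code>, and the sample titles are hypothetical and not taken from the project.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_title_ngrams(titles, n_values=(1, 2)):
    """Count unigrams and bigrams across a collection of page titles."""
    counts = Counter()
    for title in titles:
        tokens = tokenize(title)
        for n in n_values:
            counts.update(ngrams(tokens, n))
    return counts


if __name__ == "__main__":
    # Hypothetical titles, for illustration only.
    sample_titles = [
        "Houston Startup Incubator",
        "Startup Incubator and Coworking Space",
    ]
    for gram, freq in count_title_ngrams(sample_titles).most_common(5):
        print(freq, gram)
```

Frequent n-grams such as "startup incubator" could then be reviewed by hand and used as candidate keywords for classifying a URL.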
 
 
==Implementation==