
==Finding Company URLs==
RESULTS: I used STEP1_crawl.py and STEP2_findcorrecturl.py to add approximately 1000 more URLs to 'The File to Rule Them All':
E:\McNair\Projects\Accelerators\Summer 2018\The File to Rule Them All.xlx
In this file (sheet: 'Most Recent Merged Data'; note that this is just a copy of 'Cohorts Final' in 'The File to Rule Them All'):
E:\McNair\Projects\Accelerators\Summer 2018\Merged W Crunchbase Data as of July 17.xlx
It seems reasonable to assume that if a company's URL cannot be found within the first 4 valid search results, then that company probably does not have a URL at all. This is the case for many of the 20 URLs that were not found in my test run above.
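
The cutoff rule can be sketched roughly as follows. This is only an illustration, not the code in STEP2_findcorrecturl.py: the function names, the list of excluded domains, and the rule that a result is a match when the company name appears in its domain are all assumptions made for this example. Only the cutoff of 4 valid results comes from the notes above.

```python
import re
from urllib.parse import urlparse

MAX_VALID_RESULTS = 4  # cutoff taken from the notes above

# Hypothetical set of domains that never count as a company's own site.
EXCLUDED_DOMAINS = {"linkedin.com", "crunchbase.com", "facebook.com", "twitter.com"}


def domain_of(url):
    """Return the host part of a URL, lowercased, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def is_valid(url):
    """Assumed validity check: the URL has a host that is not excluded."""
    host = domain_of(url)
    if not host:
        return False
    return not any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)


def looks_like_company_site(company, url):
    """Assumed match rule: the squashed company name appears in the domain."""
    key = re.sub(r"[^a-z0-9]", "", company.lower())
    squashed_domain = re.sub(r"[^a-z0-9]", "", domain_of(url))
    return bool(key) and key in squashed_domain


def find_company_url(company, search_results):
    """Return the first match among the first 4 valid results, else None."""
    valid = [u for u in search_results if is_valid(u)]
    for url in valid[:MAX_VALID_RESULTS]:
        if looks_like_company_site(company, url):
            return url
    return None  # treated as "company probably has no URL"
```

Here `find_company_url` takes a company name and an ordered list of search-result URLs. A return value of `None` corresponds to the case where the company is assumed not to have a URL.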
 
 
The companies we needed to find URLs for are in a file called 'ACTUALNEEDEDCOMPANIES.txt'.