====Results====
I used STEP1_crawl.py and STEP2_findcorrecturl.py to add approximately 1,000 more URLs to 'The File to Rule Them All.xlsx'.

====Testing====
In this file (sheet 'Most Recent Merged Data'; note that this is just a copy of 'Cohorts Final' in 'The File to Rule Them All'):
It seems reasonable to assume that if a company's URL cannot be found within the first 4 valid search results, then that company probably does not have a URL at all. This is the case for many of the 20 URLs that were not found in my test run above.
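The stopping rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the logic actually used in STEP2_findcorrecturl.py: what counts as a "valid" result (here, anything not on a small list of directory/social domains), the rule for deciding that a URL belongs to the company (here, the company name's tokens appearing in the domain), and all function and constant names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Only the first 4 *valid* search results are checked, per the heuristic above.
MAX_VALID_RESULTS = 4

# Hypothetical: directory/social sites treated as "invalid" results because they
# are not a company's own homepage.
EXCLUDED_DOMAINS = {"linkedin.com", "crunchbase.com", "facebook.com",
                    "twitter.com", "bloomberg.com", "wikipedia.org"}

# Hypothetical: words ignored when matching a company name against a domain.
STOPWORDS = {"inc", "llc", "ltd", "corp", "co", "the", "company"}


def domain_of(url):
    """Return the lowercase host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def is_valid_result(url):
    """Valid = has a host that is not an excluded domain or a subdomain of one."""
    host = domain_of(url)
    if not host:
        return False
    return not any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)


def matches_company(url, company_name):
    """Hypothetical rule: the joined name tokens appear in the punctuation-free host."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", company_name.lower())
              if t not in STOPWORDS]
    if not tokens:
        return False
    compact_host = re.sub(r"[^a-z0-9]", "", domain_of(url))
    return "".join(tokens) in compact_host


def find_company_url(company_name, search_results):
    """Return the first match among the first MAX_VALID_RESULTS valid results, else None."""
    valid = [u for u in search_results if is_valid_result(u)][:MAX_VALID_RESULTS]
    for url in valid:
        if matches_company(url, company_name):
            return url
    # Not found in the first few valid results: assume the company has no URL.
    return None
```

In this sketch, excluded results do not count toward the limit of 4, so a LinkedIn page ranked first does not push the company's own site out of the window that gets checked.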
====Actual Run Info====
The companies we needed to find URLs for are in a file called 'ACTUALNEEDEDCOMPANIES.txt'.
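A possible way to load that list before running the search is sketched below. It assumes (this is not stated on this page) that ACTUALNEEDEDCOMPANIES.txt is UTF-8 text with one company name per line; the function names are hypothetical. Blank lines and repeated names are dropped so each company is searched only once.

```python
from pathlib import Path


def parse_company_lines(lines):
    """Strip whitespace, skip blank lines, and drop duplicates while keeping order."""
    seen = set()
    companies = []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            companies.append(name)
    return companies


def load_needed_companies(path="ACTUALNEEDEDCOMPANIES.txt"):
    # Assumption: one company name per line, UTF-8 encoded.
    text = Path(path).read_text(encoding="utf-8")
    return parse_company_lines(text.splitlines())
```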