---------------------------------------------------------------------------------------------
7/23 - I talked with Connor to work out a method for finding the URLs that we're missing. There is code available to crawl Google, and I have modified some code to compute a "match score" between a URL and the company name. For each company, we take the URL with the highest score among the first 5 Google results. Connor and I agreed that if the URL cannot be found within the first 5 results, then that company probably doesn't have a URL at all.
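
The selection step described above can be sketched roughly as follows. This is only an illustration, not the actual scripts: the scoring here uses Python's standard `difflib.SequenceMatcher` as a stand-in for the real match score, and the names `normalize`, `match_score`, `best_url`, and the `min_score` cutoff of 0.5 are all assumptions made for the example. It also assumes the candidate URLs have already been collected from the search results and include a scheme such as `https://`.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse


def normalize(text):
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def match_score(url, company_name):
    """Similarity in [0, 1] between a URL's first host label and a company name.

    Stand-in scoring (assumption): difflib's ratio, not the project's own formula.
    """
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Use only the first label of the host, e.g. "acmewidgets" in acmewidgets.com.
    label = host.split(".")[0]
    return SequenceMatcher(None, normalize(label), normalize(company_name)).ratio()


def best_url(candidates, company_name, top_n=5, min_score=0.5):
    """Return the highest-scoring URL among the first top_n candidates.

    Returns None when nothing reaches min_score (hypothetical cutoff), which
    mirrors the idea that the company probably has no URL.
    """
    best, best_score = None, 0.0
    for url in candidates[:top_n]:
        score = match_score(url, company_name)
        if score > best_score:
            best, best_score = url, score
    return best if best_score >= min_score else None


if __name__ == "__main__":
    # Made-up search results for a made-up company.
    results = [
        "https://en.wikipedia.org/wiki/Acme",
        "https://www.acmewidgets.com",
        "https://www.linkedin.com/company/acme",
    ]
    print(best_url(results, "Acme Widgets"))
```

Scoring only the first host label is a simplification: a company site hosted on a subdomain, or under a very different domain name, would score low and be treated as missing.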
 
7/24 - I ran a test file through my two URL finder scripts and determined that we could get around 50% accuracy. Most of the URLs that I could not find were for invalid or foreign entries. I then cleaned the data down to the actual company names that we need and started running the scripts on it.