Changes

1,610 bytes added, 16:51, 3 August 2018
---------------------------------------------------------------------------------------------
7/23 - I talked with Connor to work out a method for finding the URLs that we're missing. There is existing code to crawl Google, and I modified some code to compute a "match score" between a URL and the company name. We can take the URL with the highest score among the first 5 Google results. Connor and I agreed that if the URL cannot be found within the first 5 results, the company probably doesn't have a URL at all.
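
The entry doesn't spell out how the match score is computed, so the following is only a minimal sketch of the idea, not the actual script. It assumes a string-similarity score (Python's difflib.SequenceMatcher) between a normalized company name and the first label of each result's domain, keeps the best of the first 5 results, and returns nothing below a cutoff. The names match_score and best_url, the suffix list, and the 0.5 cutoff are all hypothetical.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical list of corporate words that rarely appear in domain names.
SUFFIXES = {"inc", "llc", "ltd", "co", "corp", "corporation", "company", "the"}


def normalize_name(name: str) -> str:
    """Lowercase, keep alphanumeric words, drop common suffixes, join words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return "".join(w for w in words if w not in SUFFIXES)


def domain_label(url: str) -> str:
    """First label of the host, e.g. 'acme' for https://www.acme.com/about.
    Simplification: subdomains like blog.acme.com yield 'blog'."""
    host = urlparse(url).netloc.lower().split(":")[0]
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0] if host else ""


def match_score(url: str, company: str) -> float:
    """Similarity in [0, 1] between the company name and the URL's domain."""
    name, label = normalize_name(company), domain_label(url)
    if not name or not label:
        return 0.0
    return SequenceMatcher(None, name, label).ratio()


def best_url(company, results, top_n=5, min_score=0.5):
    """Highest-scoring URL among the first top_n results, or None if
    nothing clears min_score (treated as 'company has no URL')."""
    candidates = results[:top_n]
    if not candidates:
        return None
    best = max(candidates, key=lambda u: match_score(u, company))
    return best if match_score(best, company) >= min_score else None
```

For example, given the made-up results ["https://www.crunchbase.com/organization/acme-robotics", "https://acmerobotics.com/", "https://www.linkedin.com/company/acme-robotics"], best_url("Acme Robotics, Inc.", ...) picks "https://acmerobotics.com/", because its domain matches the normalized name exactly while aggregator sites like Crunchbase and LinkedIn score low.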
 
7/24 - I ran a test file through my two URL finder scripts and determined that we could get around 50% accuracy. The URLs that I could not find were usually invalid or foreign. I then cleaned the list of company names that we actually need and started running the finder on it.
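
The log doesn't say how the ~50% accuracy was measured. One plausible way, sketched here as an assumption rather than the method actually used, is to compare each found URL against a hand-verified URL for the same company by domain only, so that "http://www.acme.com/about" and "https://acme.com" count as the same answer. The function names and the test data are hypothetical.

```python
from urllib.parse import urlparse


def normalize_domain(url) -> str:
    """Reduce a URL to a comparable host, e.g. 'HTTP://www.Acme.com/x' -> 'acme.com'."""
    if not url:
        return ""
    if "://" not in url:
        url = "http://" + url  # urlparse only fills netloc when a scheme is present
    host = urlparse(url).netloc.lower().split(":")[0]
    return host[4:] if host.startswith("www.") else host


def accuracy(predicted: dict, known: dict) -> float:
    """Share of companies in `known` whose predicted URL has the same domain.
    Companies with no prediction count as misses."""
    if not known:
        return 0.0
    hits = 0
    for company, true_url in known.items():
        true_domain = normalize_domain(true_url)
        if true_domain and normalize_domain(predicted.get(company)) == true_domain:
            hits += 1
    return hits / len(known)
```

With a made-up test set of four companies where three predictions match by domain and one company got no prediction, accuracy returns 0.75.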
 
7/25 - While my URL finder was running, I helped Minh fill in Demo Day training data info.
 
7/26 - I helped Minh fill in Demo Day training data info, sat in on a conference call with Hira and Ed, and worked on processing the results from my URL finder.
 
7/27 - Cleaned results from my URL finder and added them to 'The File to Rule Them All'. I also helped Connor find addresses for accelerators, and I updated wiki pages to describe my URL finder work.
 
---------------------------------------------------------------------------------------------
7/30 - Updated the wiki pages for the Google URL Finder (http://mcnair.bakerinstitute.org/wiki/U.S._Seed_Accelerators#Finding_Company_URLs) and started running the Whois Parser.
 
7/31 - Worked with [[Minh Le]] to build the [[Seed DB Parser]]. Helped Connor recode founders' job experience.
 
8/1 - Filtered and organized data from the seed-db crawl, yielding data for 257 more companies. Of these, only 100 provided information we didn't already have. I also helped [[Grace Tan]] fill in data for the minor code mapping.
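
How the crawl results were checked against existing data isn't described. As a minimal sketch under assumed data shapes (each company is a dict of field name to value; field names like "url" and "founded" are hypothetical), a company counts as adding new info only if the crawl fills a field that is empty or missing in what we already have. Names are matched after lowercasing and collapsing whitespace, which is also an assumption.

```python
def _key(name: str) -> str:
    """Normalize a company name for matching: lowercase, collapse whitespace."""
    return " ".join(name.lower().split())


def new_info(existing: dict, crawled: dict) -> dict:
    """For each crawled company, return only the non-empty fields that the
    existing data lacks; companies that add nothing are left out."""
    existing_by_key = {_key(name): fields for name, fields in existing.items()}
    updates = {}
    for company, fields in crawled.items():
        known = existing_by_key.get(_key(company), {})
        added = {k: v for k, v in fields.items() if v and not known.get(k)}
        if added:
            updates[company] = added
    return updates
```

Here len(new_info(existing, crawled)) gives the count of crawled companies that contribute something new, and the returned fields are what would be merged in.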
 
8/2 - Modified my URL finder and reran it on the crawl results to get about 200 more URLs, which I added to 'The File to Rule Them All'.
 
8/3 - Helped Connor manually add timing info for companies whose timing data we could not find in other sources.