Changes

2,217 bytes added ,  16:51, 3 August 2018
no edit summary
7/19 - The results do not improve with more training data. I then tried to use Yang's code with LSTM. The accuracy rate is still too low, and I'm trying to learn more about LSTM to see how to adjust the parameters.
7/20 - I have not yet figured out how to get Yang's code to reach the 60% accuracy that his wiki page reports. If I use fewer labels (5-7 instead of 40) with Christy's code (IndustryClassifierCONDENSED-USETHIS.py), the accuracy rate is around 50%. However, with 40 labels, the accuracy rate is only 25%. With Yang's code, fewer labels also increase the accuracy rate, but not to 60%. I also helped Minh fill out some of his test data for his Demo Day project.
---------------------------------------------------------------------------------------------
7/23 - I talked with Connor to work out a method for finding the URLs that we're missing. There is code available to crawl Google, and I have modified some code to compute a "match score" between a URL and the company name. We can take the URL with the highest score among the first 5 Google results. Connor and I agreed that if the URL cannot be found within the first 5 results, then that company probably doesn't have a URL at all.
7/24 - I ran a test file through my two URL finder scripts and determined that we could get around 50% accuracy. The URLs that I could not find were usually invalid or foreign. I then cleaned the data for the actual company names that we need and started running the finder on it.
7/25 - While my URL finder was running, I helped Minh fill in Demo Day training data info.
7/26 - I helped Minh fill in Demo Day training data info, sat in on a conference call with Hira and Ed, and worked on processing the results from my URL finder.
7/27 - Cleaned results from my URL finder and added them into 'The File to Rule Them All'. I also helped Connor find addresses for accelerators, and I updated wiki pages to describe my URL finder work.
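For reference, the "match score" idea from the 7/23 entry could be sketched roughly as below. This is only an illustration, not the actual URL finder script: the function names, the suffix list, and the choice of difflib's similarity ratio as the score are all assumptions, and the list of search results is expected to come from a separate crawler.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical list of corporate suffixes to drop before comparing names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "company"}


def normalize_name(name):
    """Lowercase the company name, drop punctuation and corporate suffixes."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return "".join(w for w in words if w not in SUFFIXES)


def domain_label(url):
    """Return the first label of the host, ignoring a leading 'www.'.

    Simplification: 'blog.acme.com' yields 'blog', not 'acme'.
    """
    if "://" not in url:
        url = "http://" + url  # urlparse only fills netloc when a scheme is present
    host = urlparse(url).netloc.lower().split(":")[0]
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0]


def match_score(company, url):
    """Similarity in [0, 1] between the company name and the URL's domain label."""
    name, label = normalize_name(company), domain_label(url)
    if not name or not label:
        return 0.0
    return SequenceMatcher(None, name, label).ratio()


def best_url(company, results, top_n=5):
    """Pick the highest-scoring URL among the first top_n search results.

    Returns a (url, score) pair, or None if there are no results.
    """
    candidates = results[:top_n]
    if not candidates:
        return None
    return max(((u, match_score(company, u)) for u in candidates),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    results = [
        "https://en.wikipedia.org/wiki/Acme",
        "https://www.acmerobotics.com/about",
        "https://www.linkedin.com/company/acme-robotics",
    ]
    print(best_url("Acme Robotics, Inc.", results))
```

A minimum score cutoff (not shown) would be one way to treat a company as having no URL when none of the first 5 results looks like a match.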
---------------------------------------------------------------------------------------------
7/30 - Updated wiki pages for Google URL Finder (http://mcnair.bakerinstitute.org/wiki/U.S._Seed_Accelerators#Finding_Company_URLs) and started running Whois Parser.
7/31 - Worked with [[Minh Le]] to build the [[Seed DB Parser]]. Helped Connor recode founders' job experience.
8/1 - Filtered and organized data from the seed-db crawl, resulting in data for 257 more companies. Of these companies, only 100 yielded new info that we didn't already have. I also helped [[Grace Tan]] with filling in data for the minor code mapping.
8/2 - Modified my URL finder and reran it on the crawl results to get about 200 more URLs, which I placed in The File to Rule Them All.
8/3 - Helped Connor manually add timing info for companies for which we could not find timing data from other sources.
