Grace Tan (Work Log)

From edegan.com
Revision as of 17:34, 26 June 2018 by GraceTan (talk | contribs)


McNair Center Staff
Staff Information
Status Active
McNairCenterⓂ




Grace Tan Work Logs (log page)

2018-06-26: Ended up finding all founders manually. Then talked to Ed and figured out how to get the founder data off the Crunchbase API with a link (see project page). Created a Python script that goes through all the API pages for each accelerator and returns a dictionary mapping accelerator UUIDs to founder UUIDs. I found 209 founders through the API but 224 manually, so we'll look at the discrepancy tomorrow.
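
A minimal sketch of the paging loop described in this entry, not the actual script (that is on the project page). The Crunchbase request itself is left out on purpose: fetch_page is a hypothetical callable, assumed to return the founder UUIDs on one page plus a flag saying whether another page follows. The discrepancy helper is likewise an assumption about how the API results and the manual list could be compared, and it only works if both are keyed by the same UUIDs.

```python
def collect_founders(accelerator_uuids, fetch_page):
    """Map each accelerator UUID to the founder UUIDs found for it.

    fetch_page(acc_uuid, page) is a hypothetical callable (not a real
    Crunchbase client function) returning (founder_uuids, has_next_page).
    """
    founders = {}
    for acc in accelerator_uuids:
        found, page, more = [], 1, True
        while more:
            uuids, more = fetch_page(acc, page)
            # Keep first-seen order and skip founders repeated across pages.
            for uuid in uuids:
                if uuid not in found:
                    found.append(uuid)
            page += 1
        founders[acc] = found
    return founders


def discrepancy(api_founders, manual_founders):
    """Return (only_in_manual, only_in_api) as sets of (accelerator, founder) pairs."""
    api = {(a, f) for a, fs in api_founders.items() for f in fs}
    manual = {(a, f) for a, fs in manual_founders.items() for f in fs}
    return manual - api, api - manual
```

Passing the fetch function in as an argument keeps the loop testable without network access; the pairs returned by discrepancy are the ones to check by hand.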

2018-06-25: We took the 157 accelerator UUIDs we found and created a new table, AccAllInfo, that includes all the attributes of each accelerator that we want from organizations.csv. Maxine and I then split off into our respective projects. I tried joining people to the companies they are linked to in order to find the founders of each accelerator. I found about 90 matches, but there are still a lot of gaps, since some accelerators have no founders listed and others have multiple founders. Still unsure of how to fix this.

2018-06-22: Matched Connor's master list of accelerators with organizations.csv, requiring both homepage_url and company_name to match. Found 90 that matched along with 76 blanks. Then tried matching on either homepage_url or company_name and manually found about 30 more that had slight variations in URL or name that we should keep. Using ILIKE we found ~25 more company UUIDs that match accelerators on the list.
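
The "slight variations in url or name" found by hand could in principle be caught by normalizing both fields before comparing. The following is only a sketch of that idea, not what we ran (the matching in this entry was done in SQL). The dictionary keys uuid, company_name, and homepage_url are assumed to mirror the organizations.csv columns, and the helper names are made up for illustration.

```python
import re
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to a bare lowercase host, dropping scheme, 'www.' and path."""
    url = (url or "").strip().lower()
    if not url:
        return ""
    if "//" not in url:
        url = "//" + url  # lets urlsplit find the host when the scheme is missing
    host = urlsplit(url).netloc
    return host[4:] if host.startswith("www.") else host


def normalize_name(name):
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", (name or "").lower())


def match_accelerators(master, organizations):
    """Match master-list rows to organization UUIDs by URL first, then by name."""
    by_url, by_name = {}, {}
    for org in organizations:
        url = normalize_url(org.get("homepage_url"))
        name = normalize_name(org.get("company_name"))
        if url:
            by_url.setdefault(url, org["uuid"])
        if name:
            by_name.setdefault(name, org["uuid"])

    matched, unmatched = {}, []
    for acc in master:
        url = normalize_url(acc.get("homepage_url"))
        name = normalize_name(acc.get("company_name"))
        uuid = by_url.get(url) if url else None
        if uuid is None and name:
            uuid = by_name.get(name)  # name-only match: worth a manual check
        if uuid is None:
            unmatched.append(acc["company_name"])
        else:
            matched[acc["company_name"]] = uuid
    return matched, unmatched
```

Name-only matches still need a manual look, since different companies can share a name.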

2018-06-21: Downloaded all 17 v3.1 csv tables and updated LoadTables.sql to match our data. We did this by manually updating the names and sizes of the fields. To solve yesterday's problem of "" appearing in date fields, we used regular expressions to replace the empty quoted string with nothing (see project page). We then worked with Connor to start extracting the accelerators from the organizations in the Crunchbase data. We found a lot of null matches based on company_name and a few that have the same name but are actually different companies. Maybe try matching with homepage_url tomorrow.
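
The exact regular expression we used is on the project page. As a rough illustration of the idea only, the sketch below blanks out any field that consists of just "" so that the loader sees an empty field rather than an empty string. It assumes comma-delimited files; the pattern and function name are illustrative, and a quoted field that itself contains the sequence ,"", would be altered by mistake.

```python
import re

# A field that is exactly "" : preceded by a comma, a newline, or the start
# of the text, and followed by a comma, a newline, or the end of the text.
EMPTY_FIELD = re.compile(r'(?<![^,\n])""(?![^,\n])')


def blank_empty_fields(text):
    """Replace every empty quoted field in CSV text with an unquoted empty field."""
    return EMPTY_FIELD.sub("", text)


print(blank_empty_fields('abc,"",2018-06-21,""\n'))
```

Doubled quotes inside a longer quoted field, such as "x ""y"" z", are left alone because they are not bounded by field delimiters on both sides.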

2018-06-20: Learned more SQL. Started working on the Crunchbase Data project with Maxine. The old code used 22 csv tables, but the new Crunchbase data only has 17. We will be using the new Crunchbase API v3.1 (not v3), with only those 17 csv tables as data. We then started updating the old SQL tables to align with the 17 tables we have. We ran into a problem where some date-type fields in the data contain "", which SQL rejects when loading. Ed was helping us with this but we have not found a solution yet.

2018-06-19: Set up monitors and continued learning SQL. We were also introduced to our projects. I will be continuing Christy's work on the Google Scholar Crawler as well as working with Maxine to update the Crunchbase data and then use that data to crawl LinkedIn to find data on startup founders who go through accelerators.

2018-06-18: Introduced to the wiki, connected to RDP, and learned SQL.