Connor Rothschild (Work Log)
- Recoded founders' education
- In the process of recoding founders' job experience
- Worked with Minh to test MTurk survey
- Talked through MTurk logistics and strategy with Minh
- Recoded equity and investment variables given new SeedDB data
- Renormalized investment amounts using the midpoints and upper bounds of ranges (at Hira's request)
- Finalized multiple campuses work, refined addresses
- At Hira's request, recoded the dead/alive variable to improve accuracy
- Recoded founders
- Fixed multiple campuses and cohorts
- Fixed the Google Sheet
- Cleaned up and fixed the Google Sheet with timing info.
- Recoded the employee count variable.
- Normalized investment amount
- Created a comprehensive Google Sheet with new timing info; collaborated with other interns to find data. Cleaned up the sheet.
7/24/2018 - Sick day :(
- Helped Minh with Demo Day information.
- Helped Minh with training data for Demo Day Crawler
- Helped Augi with MA cleaning
- Talked to Minh about Demo Day progress
- Worked with Ed to add/merge data from Crunchbase to existing data. This replicated the earlier Excel process, but Ed did it in SQL. New data can be found in
/McNair/Projects/Accelerators/Summer 2018/Merged With Crunchbase Info as of July 17.xlsx
NOTE: Use this data rather than the sheet mentioned in yesterday's entry.
- Merged cohort company data with Crunchbase data by doing a VLOOKUP and then cleaning up the data. I used an =IF(A2="",B2,A2) formula, which keeps the value in A2 unless it is blank and otherwise takes B2, so cells were merged only where blanks were present. This gave us updated data for four columns:
- colocation (removed 6324 blanks)
- codescription (removed 5151 blanks)
- costatus (removed 7342 blanks)
- courl (removed 6670 blanks)
and one new column:
- founded_on date
These new variables can be found in:
/McNair/Projects/Accelerators/Summer 2018/Crunchbase Info Populated Empty Cells.xlsx (OUTDATED: DON'T USE)
Upon Ed's approval, I'll move this sheet to replace Cohort Companies in The File to Rule Them All.
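The blank-filling merge above can be sketched in Python. This is a minimal illustration of the =IF(A2="",B2,A2) logic, not the spreadsheet work itself; the function name `fill_blanks`, the key column `coname`, and the sample rows are assumptions made for the example.

```python
def fill_blanks(rows, lookup, key, columns):
    """Fill blank cells in rows from lookup, mirroring =IF(A2="",B2,A2).

    rows    -- list of dicts (our cohort company records)
    lookup  -- dict mapping a key value to a dict of Crunchbase values
    key     -- name of the column used to find the matching record
    columns -- columns to fill when our own cell is blank
    Returns the merged rows and a per-column count of blanks filled.
    """
    filled = {col: 0 for col in columns}
    merged = []
    for row in rows:
        match = lookup.get(row[key], {})
        new_row = dict(row)
        for col in columns:
            current = (row.get(col) or "").strip()
            candidate = (match.get(col) or "").strip()
            # Only overwrite when our cell is blank and a value exists.
            if current == "" and candidate != "":
                new_row[col] = candidate
                filled[col] += 1
        merged.append(new_row)
    return merged, filled


# Hypothetical sample data, for illustration only.
rows = [
    {"coname": "Acme", "costatus": "", "courl": "acme.com"},
    {"coname": "Beta", "costatus": "operating", "courl": ""},
]
lookup = {"Acme": {"costatus": "closed", "courl": "other.com"}}
merged, filled = fill_blanks(rows, lookup, "coname", ["costatus", "courl"])
```

Existing values are never overwritten, so the per-column counts correspond to the "removed N blanks" figures listed for each column.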
- Using SQL, matched our cohort companies with information from Crunchbase. This gave us a lot of new information, like employee counts, company status, the date founded, and the location of the company. This data can be found here:
/McNair/Projects/Accelerators/Summer 2018/Cohort Companies With Crunchbase Info.xlsx
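For reference, a name-based match of this kind could look roughly like the sketch below. It uses Python's built-in sqlite3 module with an in-memory database so that it is self-contained; the table names, column names, and the case-insensitive name comparison are all assumptions for illustration and are not the actual queries or schema used.

```python
import sqlite3


def match_companies(cohort, crunchbase):
    """Left-join cohort companies to Crunchbase rows on a normalized name.

    cohort     -- list of (coname, accelerator) tuples
    crunchbase -- list of (company_name, status, founded_on) tuples
    Unmatched cohort companies are kept, with None for Crunchbase fields.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cohort (coname TEXT, accelerator TEXT)")
    conn.execute(
        "CREATE TABLE crunchbase (company_name TEXT, status TEXT, founded_on TEXT)"
    )
    conn.executemany("INSERT INTO cohort VALUES (?, ?)", cohort)
    conn.executemany("INSERT INTO crunchbase VALUES (?, ?, ?)", crunchbase)
    rows = conn.execute(
        """
        SELECT c.coname, c.accelerator, cb.status, cb.founded_on
        FROM cohort AS c
        LEFT JOIN crunchbase AS cb
          ON lower(trim(c.coname)) = lower(trim(cb.company_name))
        ORDER BY c.coname
        """
    ).fetchall()
    conn.close()
    return rows
```

A LEFT JOIN keeps every cohort company even when Crunchbase has no match, which makes it easy to see how many cells remain blank afterwards.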
- Created 'The File to Rule Them All' with finalized info on accelerators, cohort companies, and founders.
- Attempted to match our company data to Crunchbase data with SQL to get more info on companies.
- Worked on LinkedIn Founders data. Cleaned up data, removed duplicates, checked for fidelity.
- Worked with Maxine to finish Crunchbase matching.
- Merged Clean Cohort Data (Veeral) and Cohort List (new) in the Accelerator Master Variable List file. Cross-referenced this list with Ed's data sent last week, titled accelerator_data_noflag.txt. We found that there are 4866 more entries in the new merged file, meaning Ed's merging may have dropped valid entries. (This was after filtering the list so we only looked at the accelerators on our list).
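A cross-reference like the one above can be sketched as a set difference. This is only an illustration: the function name, the (accelerator, company) pair representation, and the assumption that entries can be compared by exact string equality are hypothetical, and the actual comparison was done in spreadsheets.

```python
def missing_entries(merged_rows, reference_rows, accelerators):
    """Return entries present in merged_rows but absent from reference_rows.

    Both inputs are iterables of (accelerator, company) pairs. Only
    accelerators on our own list are considered, matching the filtering
    step described in the log entry.
    """
    wanted = set(accelerators)
    merged = {(a, c) for a, c in merged_rows if a in wanted}
    reference = {(a, c) for a, c in reference_rows if a in wanted}
    return sorted(merged - reference)
```

Listing the missing pairs, rather than only counting them, makes it possible to check whether the extra entries are valid companies or leftovers from cleaning.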
- Worked with Maxine to remove duplicates/gather clean data for Crunchbase matching
- Finished manually coding an equity variable in Master Variable List sheet (with the help of Maxine Tao).
- Finished editing terms of joining accelerator.
- Given the above two tasks, there are five new columns in our Master Variable List sheet:
- Terms of joining - terms of joining accelerator and important details about program
- equity (1/0) - cells contain a 1 if the accelerator takes equity, a 0 if the accelerator definitively does not, and are blank if we could not find that information
- equity amount - the % of equity the accelerator will take (can sometimes be a range, e.g. 5-7%)
- investment - the $ the accelerator invests in a company to begin, if relevant (could also be a range or an "up to $######")
- notes - anything to comment on previous 4 columns
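Because the equity amount and investment columns can hold ranges, turning them into numbers takes a small parsing step. The sketch below is one possible approach, not the method actually used in the sheet; the function name `parse_range` is hypothetical, and it assumes values are written with plain digits (no "k" or "M" suffixes). Treating an "up to" entry as a single value for the low bound, high bound, and midpoint is also an assumption.

```python
import re

# Matches numbers such as 5, 6.5, or 20,000 (commas are stripped later).
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_range(text):
    """Return (low, high, midpoint) for a cell, or None if it is blank.

    Handles cells like "5-7%", "$20,000-$50,000", and "up to $120,000".
    A cell with a single number yields that number for all three values.
    """
    if text is None or not text.strip():
        return None
    values = [float(m.replace(",", "")) for m in _NUMBER.findall(text)]
    if not values:
        return None
    low, high = min(values), max(values)
    return low, high, (low + high) / 2
```

For example, `parse_range("5-7%")` gives a low of 5, a high of 7, and a midpoint of 6, which covers both the midpoint and upper-bound versions of a normalized amount.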
- Taught Maxine Tao how to VLookup :D
- Began manually coding an equity variable in Master Variable List sheet.
- Edited terms of joining accelerator.
- Helped Grace with LinkedIn crawler.
- Finished coding duplicates. Final file can be found at:
/bulk/McNair/Projects/Accelerators/Summer 2018/Duplicate Companies.xlsx
- Dylan taught interns Excel skills
- Began coding duplicates in the CohortMainBaseWCounts.txt file that Ed sent. Sorted by company name alphabetically, then used conditional formatting to highlight rows whose accelerator matched the accelerator in the row above. This narrowed down the results to instances in which a company appeared to go through the same accelerator twice. Most of the time, this was due to an error with the normalizer, so I moved those un-normalized company names to their own sheet and deleted them from the file.
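The sort-and-compare check above can be sketched in Python. This is a rough equivalent of the conditional-formatting step, not the actual process; the function name and the (company, accelerator) tuple layout are assumptions, and unlike the spreadsheet rule this version requires both the company and the accelerator to match the row above.

```python
def flag_repeat_rows(rows):
    """Return rows that repeat the (company, accelerator) of the row above.

    rows -- list of (company, accelerator) tuples
    Rows are sorted by company and then accelerator, so repeated pairs
    end up adjacent and each row only needs comparing to its predecessor.
    """
    ordered = sorted(rows)
    flagged = []
    for previous, current in zip(ordered, ordered[1:]):
        if previous == current:
            flagged.append(current)
    return flagged
```

Flagged rows still need a manual look, since a repeat may come from a normalizer error rather than a company genuinely attending the same accelerator twice.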
- Went through and manually fixed discrepancies between our accelerator data and the Crunchbase data, found at
/bulk/McNair/Projects/Accelerators/Summer 2018/Accelerators Matched by Name and Homepage URL.xlsx
- Finalized a sheet listing each accelerator's name as we code it, its name as Crunchbase codes it, and its UUID. I recommend updating the names in our spreadsheet of accelerators to match the Crunchbase list so that we can look up each name directly, without an intermediate mapping. The list can be found in the rightmost columns here:
/bulk/McNair/Projects/Accelerators/Summer 2018/Accelerator Master Variable List - Revised by Ed V2.xlsx
- Finished going through Accelerator Master Variable List to refine industry classification and update addresses/accelerator statuses.
- Began manually editing entries in Accelerator Master Variable List.
- Reached out to Grace and Maxine and sent them the necessary sheets/txt files so they could begin on their Crunchbase project.
- I also made graphics to better represent what our collaborative work would look like and what the final project would include.
- Talked with Ed about project details.
- Began looking through the Accelerator Master List to better understand project description.
- Sent Grace and Maxine the relevant company names listed in the Accelerator Master Spreadsheet so they could begin using their relevant parsers and tools to sort through data.
- Set up work stations on balcony, trained
- Trained, met other interns