Connor Rothschild (Work Log)

From edegan.com

Summer 2018

07/16/2018 -

  • Merged cohort company data with Crunchbase data by doing a VLOOKUP and then cleaning up the data. I used an =IF(A2="",B2,A2) formula so that a cell was filled from Crunchbase only when our own value was blank. This provided updated data for four columns:
    • colocation (removed 6324 blanks)
    • codescription (removed 5151 blanks)
    • costatus (removed 7342 blanks)
    • courl (removed 6670 blanks)

It also added four new columns:

    • address
    • founded_on date
    • employee_count
    • linkedin_url
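As a rough illustration of the VLOOKUP plus =IF(A2="",B2,A2) merge above, here is a minimal Python sketch. It assumes both sheets were exported as rows and that companies are matched on an exact company name; the key column `coname` and the function names are hypothetical, not taken from the actual spreadsheet.

```python
from typing import Dict, List

# Columns the log says were back-filled from Crunchbase.
FILL_COLUMNS = ["colocation", "codescription", "costatus", "courl"]
# Columns that exist only in the Crunchbase data and are copied over as new.
NEW_COLUMNS = ["address", "founded_on", "employee_count", "linkedin_url"]


def coalesce(ours: str, theirs: str) -> str:
    """Python equivalent of =IF(A2="",B2,A2): keep our value unless it is blank."""
    return theirs if ours == "" else ours


def merge_with_crunchbase(cohort: List[Dict[str, str]],
                          crunchbase: Dict[str, Dict[str, str]],
                          key: str = "coname") -> Dict[str, int]:
    """Fill blanks in `cohort` in place; return how many blanks were filled per column.

    `key` is a hypothetical company-name column used for the lookup.
    """
    filled = {col: 0 for col in FILL_COLUMNS}
    for row in cohort:
        # VLOOKUP step: exact-match lookup; no match behaves like an empty record.
        match = crunchbase.get(row[key], {})
        for col in FILL_COLUMNS:
            before = row.get(col, "")
            row[col] = coalesce(before, match.get(col, ""))
            if before == "" and row[col] != "":
                filled[col] += 1
        for col in NEW_COLUMNS:
            row[col] = match.get(col, "")
    return filled
```

Like the Excel formula, only truly empty cells count as blank here; a cell containing a single space would not be filled.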

These new variables can be found in:

/McNair/Projects/Accelerators/Summer 2018/Crunchbase Info Populated Empty Cells.xlsx

Upon Ed's approval, I'll move this sheet to replace Cohort Companies in The File to Rule Them All.

07/13/2018 -

  • Using SQL, matched our cohort companies with information from Crunchbase. This added new fields such as employee count, company status, founding date, and company location. This data can be found here:
/McNair/Projects/Accelerators/Summer 2018/Cohort Companies With Crunchbase Info.xlsx
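The actual SQL used for this matching isn't recorded here. A minimal sketch of one way to do it, assuming a name-based join, is below, run against an in-memory SQLite database; the table names `cohort` and `cb_orgs` and their columns are hypothetical stand-ins for our cohort list and the Crunchbase organizations extract.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: our cohort list and a Crunchbase organizations extract.
SCHEMA = """
CREATE TABLE cohort  (coname TEXT, accelerator TEXT);
CREATE TABLE cb_orgs (uuid TEXT, name TEXT, status TEXT, founded_on TEXT,
                      employee_count TEXT, city TEXT);
"""

# Case- and whitespace-insensitive name match. LEFT JOIN keeps cohort companies
# that have no Crunchbase match (their Crunchbase columns come back as NULL).
MATCH_SQL = """
SELECT c.coname, c.accelerator, o.uuid, o.status, o.founded_on,
       o.employee_count, o.city
FROM cohort AS c
LEFT JOIN cb_orgs AS o
       ON LOWER(TRIM(c.coname)) = LOWER(TRIM(o.name))
ORDER BY c.coname;
"""


def match_cohort(conn: sqlite3.Connection) -> List[Tuple]:
    """Return one row per (cohort company, matching Crunchbase org) pair."""
    return conn.execute(MATCH_SQL).fetchall()
```

Unmatched companies come back with `None` in the Crunchbase columns, so they can be reviewed by hand. If two Crunchbase organizations share a name, this join returns one row per match, and those rows would need to be deduplicated.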

07/12/2018 -

  • Created 'The File to Rule Them All' with finalized information on accelerators, cohort companies, and founders.
  • Attempted to match our company data to Crunchbase data with SQL to get more information on the companies.

07/11/2018 -

  • Worked on the LinkedIn founders data: cleaned up the data, removed duplicates, and checked it for accuracy.
  • Worked with Maxine to finish Crunchbase matching.

07/10/2018 -

  • Merged Clean Cohort Data (Veeral) and Cohort List (new) in the Accelerator Master Variable List file, then cross-referenced the merged list with the data Ed sent last week (accelerator_data_noflag.txt). After filtering to only the accelerators on our list, the new merged file had 4866 more entries than Ed's, suggesting that Ed's merge may have dropped valid entries.
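The cross-referencing step (filter both lists to our accelerators, then compare) can be sketched in Python as follows. It assumes each entry is an (accelerator, company) pair; that key and the function names are hypothetical, not how the spreadsheets are actually laid out.

```python
from collections import Counter
from typing import List, Set, Tuple

Entry = Tuple[str, str]  # hypothetical key: (accelerator, company)


def filter_to_our_accelerators(entries: List[Entry], ours: Set[str]) -> List[Entry]:
    """Keep only entries whose accelerator is on our list."""
    return [e for e in entries if e[0] in ours]


def compare_lists(merged: List[Entry], eds: List[Entry],
                  ours: Set[str]) -> Tuple[int, List[Entry]]:
    """Return (net extra entries in merged, entries in merged but missing from Ed's)."""
    m = filter_to_our_accelerators(merged, ours)
    e = filter_to_our_accelerators(eds, ours)
    # Counter subtraction keeps only positive counts, so repeated entries are
    # compared by multiplicity rather than collapsed as a set would do.
    missing = Counter(m) - Counter(e)
    return len(m) - len(e), sorted(missing.elements())
```

The count difference is a net figure; the list of missing entries is what would need checking by hand to confirm whether they are valid.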

07/09/2018 -

  • Worked with Maxine to remove duplicates and gather clean data for Crunchbase matching.

06/29/2018 -

  • Finished manually coding an equity variable in the Master Variable List sheet (with the help of Maxine Tao).
  • Finished editing the terms of joining each accelerator.
  • As a result of these two tasks, there are five new columns in our Master Variable List sheet:
    • Terms of joining - terms of joining accelerator and important details about program
    • equity (1/0) - contains a 1 if the accelerator takes equity, a 0 if it definitively does not, and is blank if we could not find that information
    • equity amount - the % of equity the accelerator takes (sometimes a range, e.g. 5-7%)
    • investment - the dollar amount the accelerator initially invests in a company, if relevant (may also be a range or "up to $######")
    • notes - comments on the previous four columns
  • Taught Maxine Tao how to use VLOOKUP :D
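Because the equity amount and investment columns hold free text (ranges, "up to" amounts), a small parser like the hypothetical `parse_range` below could turn them into numeric low/high bounds for analysis. This is a sketch assuming entries look like the examples above; the real sheet may contain formats it does not handle.

```python
import re
from typing import Optional, Tuple

# A number with optional thousands separators and decimals, e.g. 5, 7.5, 100,000.
NUM = r"\d[\d,]*(?:\.\d+)?"


def parse_range(text: str) -> Optional[Tuple[Optional[float], float]]:
    """Turn '5-7%', '6%', '$20,000', or 'up to $100,000' into (low, high).

    'up to X' returns low=None because the minimum is unknown.
    Blank cells and placeholders with no digits return None (left uncoded).
    """
    s = text.strip().lower()
    if not s:
        return None
    nums = [float(n.replace(",", "")) for n in re.findall(NUM, s)]
    if not nums:
        return None
    if s.startswith("up to"):
        return (None, nums[-1])
    return (min(nums), max(nums))
```

Returning None for blanks mirrors the coding rule above, where a blank cell means the information could not be found.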

06/28/2018 -

  • Began manually coding an equity variable in the Master Variable List sheet.
  • Edited the terms of joining each accelerator.
  • Helped Grace with LinkedIn crawler.

06/27/2018 -

  • Finished coding duplicates. The final file can be found at:
/bulk/McNair/Projects/Accelerators/Summer 2018/Duplicate Companies.xlsx
  • Dylan taught the interns Excel skills.

06/26/2018 -

  • Began coding duplicates in the CohortMainBaseWCounts.txt file that Ed sent. I sorted the file alphabetically by company name, then used conditional formatting to highlight rows whose accelerator matched the accelerator in the row above. This narrowed the results to cases where a company appeared to go through the same accelerator twice. Most of the time this was caused by an error in the normalizer, so I moved those un-normalized company names to their own sheet and deleted them from the file.
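The conditional-formatting check can be sketched in Python: sort by company, then flag any row whose company and accelerator both match the row directly above. Rows are assumed to be (company, accelerator) pairs, and the case-insensitive comparison is an assumption, not necessarily what the Excel check did.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # assumed layout: (company, accelerator)


def flag_repeat_entries(rows: List[Row]) -> List[Row]:
    """Return rows that repeat the company + accelerator of the row above after sorting."""
    # Sorting by (company, accelerator) puts repeats next to each other,
    # like sorting the sheet before applying conditional formatting.
    ordered = sorted(rows, key=lambda r: (r[0].lower(), r[1].lower()))
    flagged = []
    for prev, cur in zip(ordered, ordered[1:]):
        if prev[0].lower() == cur[0].lower() and prev[1].lower() == cur[1].lower():
            flagged.append(cur)
    return flagged
```

Only the second and later copies are flagged, so the first occurrence of each company/accelerator pair stays in place, matching the "same as the row above" highlighting.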

06/25/2018 -

  • Manually fixed discrepancies between our accelerator data and the Crunchbase data. The results can be found at:
/bulk/McNair/Projects/Accelerators/Summer 2018/Accelerators Matched by Name and Homepage URL.xlsx
  • Finalized a sheet listing each accelerator's name as we code it, its name as Crunchbase codes it, and the appropriate UUID. I recommend renaming the accelerators in our spreadsheet to match the Crunchbase list so that we can look up each name directly, without an intermediate mapping. The list can be found in the rightmost columns here:
/bulk/McNair/Projects/Accelerators/Summer 2018/Accelerator Master Variable List - Revised by Ed V2.xlsx

and here:

https://docs.google.com/spreadsheets/d/1n1sX5DqZrm_0vbUXG9ZaZIagF9sa0Kva9PAno-6H854/edit?usp=sharing
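A minimal sketch of how that lookup sheet could be used to swap our accelerator names for the Crunchbase names and UUIDs. The mapping layout and the function name are hypothetical; unmapped names are returned separately so they can be checked by hand rather than silently dropped.

```python
from typing import Dict, List, Tuple

# Hypothetical layout of the lookup sheet: our name -> (Crunchbase name, UUID).
NameMap = Dict[str, Tuple[str, str]]


def to_crunchbase_names(accelerators: List[str],
                        name_map: NameMap) -> Tuple[List[Dict[str, str]], List[str]]:
    """Attach Crunchbase names and UUIDs; collect names with no mapping."""
    mapped: List[Dict[str, str]] = []
    unmapped: List[str] = []
    for name in accelerators:
        if name in name_map:
            cb_name, uuid = name_map[name]
            mapped.append({"our_name": name, "cb_name": cb_name, "uuid": uuid})
        else:
            unmapped.append(name)
    return mapped, unmapped
```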


06/22/2018 -

  • Finished going through Accelerator Master Variable List to refine industry classification and update addresses/accelerator statuses.

06/21/2018 -

  • Began manually editing entries in Accelerator Master Variable List.
  • Reached out to Grace and Maxine and sent them the necessary sheets/txt files so they could begin their Crunchbase project.
  • Made graphics showing how our collaborative work would fit together and what the final project would include:
https://docs.google.com/document/d/13Mb7lOLydm9r-ENYxSlZJVGgY9wxClATR6Hy8F9YK1Y/edit?usp=sharing

06/20/2018 -

  • Talked with Ed about project details.
  • Began looking through the Accelerator Master List to better understand project description.
  • Sent Grace and Maxine the company names listed in the Accelerator Master Spreadsheet so they could begin sorting through the data with their parsers and tools.

06/19/2018 -

  • Set up workstations on the balcony and completed training.

06/18/2018 -

  • Completed training and met the other interns.