U.S. Seed Accelerators



McNair Project
U.S. Seed Accelerators
Project Information
Project Title U.S. Seed Accelerators
Owner Connor Rothschild
Start Date 06/18/2018
Deadline
Keywords accelerators, data
Primary Billing
Notes Continuation of Accelerator Data
Has project status Active
Is dependent on Industry Classifier, Demo Day Page Parser
Copyright © 2016 edegan.com. All Rights Reserved.


Relevant Former Projects

This page serves as an updated and tidied version of the data and work presented on the Accelerator Seed List (Data) Project, which subsumed Accelerator Data. Both of these projects (and as a corollary, this project) are dependent on the Demo Day Page Parser, Industry Classifier, and the WhoIs Parser.

An Overview

This project aims to determine which accelerators are most effective at producing successful startups, and which characteristics those accelerators share. First, we need to gather as much data as we can about as many accelerators as we can, so that we can examine the factors that differentiate successful from unsuccessful ventures. Next, we need to build a web crawler that gathers information about accelerators across the world by visiting their websites and extracting data. I believe the overall goal of this research project is to gain insight into the methods of successful accelerators and to find out what differentiates very successful accelerators from dead ones.

Helpful Links: http://seedrankings.com/

This project is developing broad, near-population data on accelerators and their cohort companies. The objective is to identify the accelerator and cohort in which each cohort company was trained, obtain details of the accelerators, and obtain details of the cohort companies, including any venture capital investment a cohort company has received and any IPO or acquisition it has experienced.

The primary use of this data is for an academic paper detailed on the Matching Entrepreneurs to Accelerators and VCs (Academic Paper) page.

However, this project can also provide useful data to other academic papers (Urban Start-up Agglomeration, Hubs (Academic Paper), and Hubs Scorecard (Academic Paper)), projects (Houston Entrepreneurship) and blog posts (under the Emerging Ecosystems umbrella project).

The most recent update provided on Accelerator Seed List (Data) was on 05/21/2018. This update included the most recent master file of accelerator data, found at

E:\McNair\Projects\Accelerators\Summer 2018\Accelerator Master Variable List.xlsx

The Google Sheets Master Sheet is found here

https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=0

Remaining To Dos

The last update on Accelerator Seed List (Data) said the following needed to be done:

  • Cross-reference sheet with data from Peter's old accelerator consolidation file ("accelerator_data_noflag" and "accelerator_data" in "All Relevant Files") and fill in missing data
  • Variables that are 100% NOT in these 2 files:
    • Cohort Breakout?
    • Subtype
    • Designed for Students?
    • Campuses
    • Stage
    • Software Tech
    • What stage do they look for?

TODO:

McNair/Projects/Accelerators/Fall 2017/unfound_founders.txt

In this file, a 0 means we don't have founder data for that accelerator. Specs: a tab-delimited text file with the following fields:

Accelerator   First Name   Last Name   LinkedInURL(if possible)

Getting the LinkedInURL will ensure accuracy, but it is optional.
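As a rough illustration of the file spec above, the following Python sketch reads tab-delimited founder rows and treats the LinkedInURL column as optional. The function name, the field keys, and the sample rows are hypothetical, and the sketch assumes the file has no header row.

```python
import csv
import io

# Field keys are illustrative; the spec only names the four columns.
FIELDS = ["accelerator", "first_name", "last_name", "linkedin_url"]

def read_founders(handle):
    """Yield one dict per founder row; linkedin_url is None when absent."""
    for row in csv.reader(handle, delimiter="\t"):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        cells = [cell.strip() for cell in row]
        cells += [""] * (len(FIELDS) - len(cells))  # pad a missing URL column
        record = dict(zip(FIELDS, cells))
        record["linkedin_url"] = record["linkedin_url"] or None
        yield record

# Made-up sample rows: one with a LinkedIn URL, one without.
sample = (
    "Example Accelerator\tJane\tDoe\thttps://www.linkedin.com/in/example\n"
    "Example Accelerator\tJohn\tRoe\n"
)
founders = list(read_founders(io.StringIO(sample)))
```

To read a real file, pass an open file handle instead of the io.StringIO sample, for example open(path, newline="", encoding="utf-8").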

  • Shrey: Find "demo day" keywords, so that we can search AcceleratorName Year Keyword and get back potential demo day pages
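The "AcceleratorName Year Keyword" search pattern could be generated along these lines. This is only a sketch: the function name is hypothetical, and the accelerator name, years, and keywords are placeholders, not the keyword list this task is meant to produce.

```python
from itertools import product

def demo_day_queries(accelerators, years, keywords):
    """Build one 'AcceleratorName Year Keyword' search string per combination."""
    return [
        f"{name} {year} {keyword}"
        for name, year, keyword in product(accelerators, years, keywords)
    ]

# Placeholder inputs for illustration only.
queries = demo_day_queries(
    ["Example Accelerator"],
    [2016, 2017],
    ["demo day", "pitch day"],
)
```

Each resulting string can then be submitted to a search engine to get back candidate demo day pages.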

It is unclear if any of these tasks have been done since the update on 05/21. I will begin by seeing which of these things have been carried out.

Other Listed To Dos

  • We have compiled a very long list of accelerators from many different databases. For the past couple of weeks, everyone in the center has been going through this list, 20 at a time, classifying each one as an accelerator or not an accelerator, and then proceeding to gather data on the accelerator using the process outlined below. This process went very smoothly. We have successfully gone through about 80% of the list. We are still missing information on the last hundred or so names. All of the collected data is located on the RDP, within the "Accelerators" folder under "Data" or on the "Accelerator Master Variable List" Google sheet.
  • For the accelerators that break out their cohorts on their websites, we have listed all of the startups on the "Accelerator Master Variable List" Google sheet. The "Cohort List (new)" sheet contains the following information: accelerator name, year, cohort name, company name, description, founders, category/sector, and location.
  • Next steps include going through the demo day pages that have been downloaded and writing notes on the different types if possible (see Demo Day Page Google Classifier).

06/20/2018 Update

This project will begin by working with Grace Tan and Maxine Tao to connect accelerators to their founders and cohort companies using Crunchbase and LinkedIn crawlers. Grace and Maxine will go through Crunchbase (www.crunchbase.com) and find the UUIDs for companies and their founders. We will then connect them using SQL and feed the founders' names into our LinkedIn crawler (headed by Grace Tan).
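A minimal sketch of the SQL step, assuming two tables keyed on Crunchbase UUIDs. The table names, column names, and rows below are all hypothetical, and an in-memory SQLite database stands in for whatever database the project actually uses.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_uuid TEXT PRIMARY KEY, company_name TEXT);
CREATE TABLE founders (
    person_uuid  TEXT PRIMARY KEY,
    company_uuid TEXT,
    first_name   TEXT,
    last_name    TEXT
);
INSERT INTO companies VALUES ('c-1', 'Example Co');
INSERT INTO founders VALUES
    ('p-1', 'c-1', 'Jane', 'Doe'),
    ('p-2', 'c-1', 'John', 'Roe');
""")

# Join companies to their founders on the company UUID.
rows = conn.execute("""
SELECT c.company_name, f.first_name, f.last_name
FROM companies AS c
JOIN founders AS f ON f.company_uuid = c.company_uuid
ORDER BY f.last_name
""").fetchall()

# Names in this form could then be fed to a LinkedIn crawler.
founder_names = [f"{first} {last}" for _, first, last in rows]
```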

Linking accelerators to cohort companies is slightly more difficult. Here, we will focus only on accelerators that take equity from their cohort companies (found in Ed’s updated spreadsheet). For a given accelerator, we will find its investments and assume (or check, if that is possible) that the accelerator is taking equity in each company it invests in, and that the year of the investment is the cohort year for that company.
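The cohort-year assumption can be sketched as follows. The record layout, the function name, and the sample data are hypothetical. Taking the earliest year when an accelerator invested in the same company more than once is an extra assumption of this sketch, not something the project has decided.

```python
from datetime import date

def infer_cohorts(investments, equity_accelerators):
    """Map (accelerator, company) to an assumed cohort year.

    Only investments made by equity-taking accelerators are considered,
    and the investment year is taken as the cohort year.
    """
    cohorts = {}
    for inv in investments:
        if inv["investor"] not in equity_accelerators:
            continue  # not an equity-taking accelerator
        key = (inv["investor"], inv["company"])
        year = inv["date"].year
        # Assumption: keep the earliest year if there are repeat investments.
        cohorts[key] = min(year, cohorts.get(key, year))
    return cohorts

# Made-up investment records for illustration.
investments = [
    {"investor": "Example Accelerator", "company": "Alpha", "date": date(2015, 6, 1)},
    {"investor": "Example Accelerator", "company": "Alpha", "date": date(2016, 1, 15)},
    {"investor": "No Equity Accelerator", "company": "Beta", "date": date(2017, 3, 1)},
]
cohorts = infer_cohorts(investments, {"Example Accelerator"})
```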

LinkedIn Founders

The list of founders for accelerators can be found at

McNair/Projects/Accelerators/Fall 2017/founders_linkedin.txt

The Unfound Founders file is accurate in that it codes a 0 for every accelerator not listed in the LinkedIn Founders file, and a 1 for those that do have founders listed.
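The 0/1 coding described above could be checked or regenerated with something like the sketch below. The function name and the sample names are hypothetical, and the founder rows are assumed to be dicts carrying an "accelerator" key.

```python
def founder_flags(all_accelerators, founder_rows):
    """Return {accelerator: 1 if it has at least one founder row, else 0}."""
    found = {row["accelerator"] for row in founder_rows}
    return {name: int(name in found) for name in all_accelerators}

# Made-up data for illustration.
flags = founder_flags(
    ["Example Accelerator", "Other Accelerator"],
    [{"accelerator": "Example Accelerator", "first_name": "Jane", "last_name": "Doe"}],
)
```

Comparing these flags against the existing Unfound Founders file would show any accelerator whose coding is out of date.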