Difference between revisions of "Accelerator Seed List (Data)"

From edegan.com
 
(50 intermediate revisions by 9 users not shown)
Line 1: Line 1:
{{McNair Projects
+
{{Project
 +
|Has project output=Data,Tool
 +
|Has sponsor=McNair Center
 
|Has title=Accelerator Seed List (Data)
 
|Has title=Accelerator Seed List (Data)
|Has owner=Shrey Agarwal, Matthew Ringheanu, Veeral Shah,
+
|Has owner=Shrey Agarwal, Matthew Ringheanu, Veeral Shah, Connor Rothschild,
 
|Has start date=Fall 2016
 
|Has start date=Fall 2016
 
|Has keywords=Accelerators,Data
 
|Has keywords=Accelerators,Data
|Has project status=Active
+
|Has project status=Subsume
 
|Is dependent on=Industry Classifier
 
|Is dependent on=Industry Classifier
 
}}
 
}}
 +
=Current Work=
 +
 +
===As of 05/21/2018 the Google Sheet Workbook has been downloaded to the E drive. The now Excel Workbook is saved at E:\McNair\Projects\Accelerators\Summer 2018\Accelerator Master Variable List.xlsx. This is now the master file.===
 +
 +
Google Master Sheet: https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=0
 +
*Cross-reference sheet with data from Peter's old accelerator consolidation file ("accelerator_data_noflag" and "accelerator_data" in "All Relevant Files") and fill in missing data
 +
*Variables that are 100% NOT in these 2 files:
 +
**Cohort Breakout?
 +
**Subtype
 +
**Designed for Students?
 +
**Campuses
 +
**Stage
 +
**Software Tech
 +
**What stage do they look for?
 +
 +
TODO:
 +
McNair/Projects/Accelerators/Fall 2017/unfound_founders.txt
 +
A 0 means we don't have founder data for that accelerator.
 +
Specs: A tab delimited text file with the following fields:
 +
Accelerator  First Name  Last Name  LinkedInURL(if possible)
 +
Getting the LinkedInURL will ensure accuracy, but will work without it.
 +
 +
 +
*Shrey: Find "demo day" keywords, so that we can search AcceleratorName Year Keyword and get back potential demo day pages
 +
 +
 +
==Accelerator Type project==
 +
 +
The file to edit is called "Accelerator type list".  It is located in the folder E:\McNair\Projects\Accelerators\Spring 2018\Grouping project of ListOfAccs.  More systematic information and instructions are in "Instructions for Accelerator type project" in E:\McNair\Projects\Accelerators\Spring 2018\Grouping project of ListOfAccs.
 +
 +
NOTE: until we get through all 270 accelerators, we will just categorize each accelerator into the following three categories as quickly as possible with short notes in the "other info" column for these; once we have this, we will go back through the ones that aren't categorized and add notes to the "other info" column.
 +
 +
 +
Type list:
 +
*Private
 +
*Corporate
 +
*Academic
 +
Note: if DEAD, noted here.
 +
 +
 +
Other info:
 +
*nonprofit? (y/n)
 +
 +
*Subtype abbreviations:
 +
**S: for if a social entrepreneurship initiative
 +
**I: for if an incubator
 +
**A: for an angel group
 +
**F: for foreign
 +
**C: for in coworking space/hub/etc
 +
**V: for if part of venture fund
 +
**G: for if government funded/partnered
 +
**T: for international
 +
 +
 +
Note: subtypes (from individual text files in E:\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data) were only found for 23 of the 270 accelerators.  These accelerators were initially intended to be removed from the master list.  Remaining subtypes are currently being added.
 +
 +
other info:
 +
 +
international offices, founders, industries, org type, program duration, or other interesting, easily accessed variables.  Additional information is especially important for accelerators that have no other subtype abbreviation listed.
 +
 +
 +
===Steps to research an accelerator===
 +
 +
1. Copy/paste URL listed in Accelerator type list file into google.  If website is insufficient, try googling:
 +
the name of the accelerator
 +
the name of the accelerator + "crunchbase"
 +
the name of the accelerator + "nonprofit"
 +
 +
the above steps sometimes lead to other helpful databases/news articles
 +
 +
2. Note whether:
 +
1) Academic/Corporate/Private
 +
2) For Profit/Nonprofit.  Sometimes this isn't directly stated but can be inferred through their description of, say their investment process.  If they don't address this at all it's probably For Profit.
 +
3) subtype (S, I, A, F, C, V, G, T). 
 +
4) Additional, easily-accessed info.  Number 4 is really important if there's no subtype.
 +
 +
All 270 need to be done by the end of the semester.
 +
 +
 +
Type list file saved as
 +
"Accelerator type list" in E:\McNair\Projects\Accelerators\Spring 2017\Grouping project of ListOfAccs.
 +
The list of ListofAccs, from which we drew Accelerator type list, should have no matches with any of the flagged accelerators in E:\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data.  There are 23 matches though.  So all subtypes must be searched and entered manually.  Whether some were a nonprofit was listed in E:\McNair\Projects\Accelerators\Spring 2017\Grouping project of ListOfAccs, called "whether nonprofit...".  Accelerators with no info there on whether nonprofit need to have info entered manually.
 +
 +
=Funded By Accelerators=
 +
 +
Reference the like-named portion in [[Crunchbase Data#Funded by Accelerators|Crunchbase Data]]
 +
 
=End of Semester Report=
 
 
The end of semester report will focus on ranking accelerators and environments based on the variables we have gathered. Our primary form of categorization will be ranking individual accelerators based on their venture capital raise rate. We can probably generate information over time for accelerators and the amount of VC they raised to get a sense of what locations have developed in the past five years from the dates of transactions recorded by SDC. To obtain these rankings, we will identify which cohorts companies were trained in, as well as complete details of the accelerator and the details of cohort companies. We will focus only on accelerators because there are many other entities in each ecosystem. We will also utilize information on IPO or acquisition by companies, obtained through Crunchbase, to gain some sense of how successful startups emerging from a particular accelerator are. To obtain the data over time, we will need to fill out the cohort date information column in our cohort data, which will require the help of either Crunchbase or the Wayback machine for older accelerators. In ranking the accelerators across regions, we can also track industry-specific hotspots for accelerators such as medicine in Memphis or technology in San Francisco.
 
Line 67: Line 156:
 
=End of Semester Notes=
 
  
*We have compiled a very long list of accelerators from many different databases. For the past couple of weeks, everyone in the center has been going through this list, 20 at a time, classifying each one as an accelerator or not an accelerator, and then proceeding to gather data on the accelerator using the process outlined below. This process went very smoothly. We have successfully gone through about 80% of the list. We are still missing information on the last hundred or so names. All of the collected data is located on the RDP, within the "Accelerators" folder under "Data".
+
*We have compiled a very long list of accelerators from many different databases. For the past couple of weeks, everyone in the center has been going through this list, 20 at a time, classifying each one as an accelerator or not an accelerator, and then proceeding to gather data on the accelerator using the process outlined below. This process went very smoothly. We have successfully gone through about 80% of the list. We are still missing information on the last hundred or so names. All of the collected data is located on the RDP, within the "Accelerators" folder under "Data" or on the [https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=1132417337 "Accelerator Master Variable List" Google sheet].
 +
*We have listed all of the startups from the accelerators that have break out cohorts on their website on the [https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=1132417337 "Accelerator Master Variable List" Google sheet]. This contains the following information in the "Cohort List (new)" sheet: accelerator name, year, cohort name, company name, description, founders, category/sector, and location.
 +
*Next steps include going through the demo day pages that have been downloaded and writing notes on the different types if possible (see [[Demo Day Page Google Classifier]]).
  
 
=Data Collection Notes=
 
Line 175: Line 266:
 
==Link to Crunchbase API application==
 
  
https://about.crunchbase.com/forms/research-access-apply/
+
https://about.crunchbase.com/forms/research-access-apply/ (Does not work anymore)
 +
 
 +
https://data.crunchbase.com/v3/docs/using-the-api (Has new instructions for application)
  
 
==Sign-Ups==
 
Line 784: Line 877:
 
*http://www.represent.la/
 
 
*http://www.launch.co/blog/complete-list-of-incubators-and-accelerators-like-y-combinat.html
 
*https://angel.co/accelerator-4
+
*https://angel.co/accelerator-4 (Does not work - seems to be replaced by https://angel.co/companies?company_types[]=Incubator )
  
 
(Obtained from Google search: "Accelerator Database")
 
Line 823: Line 916:
  
  
==Source: http://www.seed-db.com/accelerators/all==
+
==Source: http://www.seed-db.com/accelerators==
 
#Copied "Seed Accelerators" table to TextPad, data sorted itself into lines. Returned 235 results.
 
 
#Clicking on the accelerator name itself links to a page with all of its associated startups, up until 6/2016 cohort
 
Line 852: Line 945:
 
*Overall very extensive data for accelerators that are included on the list, but after cross-referencing from other sources shows that seed-db is lacking many newer accelerators; list is not all-inclusive.
 
 
*Includes regional distributions for accelerator groups as well. For example, rather than just "Techstars", the group is broken into Austin, Berlin, Boston, Boulder, etc.
 
 
  
 
==Source: http://www.seed-db.com/accelerators==
 
Line 976: Line 1,068:
 
*Examples of single accelerators found
 
 
:#TMCx: http://www.tmc.edu/innovation/innovation-programs/tmcx/
 
:#RED labs: http://redlabs.uh.edu/8
+
:#RED labs: http://redlabs.uh.edu/
 
:#SURGE accelerator: https://kirkcoburn.com/
 
 
:#OwlSpark: http://owlspark.com/
 
 
:#NextHIT: http://www.houstonhealthventures.com/nexthit-accelerator-program-application/
 
 +
 
===Los Angeles Accelerators===
 
 
:#Amplify: http://amplify.la/
 
Line 1,360: Line 1,453:
 
===CrunchBase===
 
  
CrunchBase 2013 Snapshot (All Organizations)- E:\McNair\Projects\Accelerators\organizations.xls
+
CrunchBase 2013 Snapshot '''(All Organizations)'''- E:\McNair\Projects\Accelerators\organizations.xls
  
CrunchBase 2013 Snapshot (Potential Accelerators)- E:\McNair\Projects\Accelerators\organizations.accdb under "Potential Accelerators query"  
+
CrunchBase 2013 Snapshot '''(Potential Accelerators)'''- E:\McNair\Projects\Accelerators\organizations.accdb under "Potential Accelerators query"  
  
 
*Obtained using keyword matches in the descriptions of the potential accelerators.
 
  
CrunchBase 2013 Snapshot (New Verified Accelerators) - E:\McNair\Projects\Accelerators\New CrunchBase Accelerators.xls
+
CrunchBase 2013 Snapshot '''(New Verified Accelerators)''' - E:\McNair\Projects\Accelerators\New CrunchBase Accelerators.xls
  
 
We have the Crunchbase 2013 Snapshot which provided lots of new data on accelerators and incubators but we would love to use the Crunchbase API to get a current database snapshot that we could use to cross reference companies and add newly formed accelerator and incubator companies.
 
 +
 +
===AngelList===
  
 
===seed-db===
 
Line 1,376: Line 1,471:
 
===Global Accelerator Network (GAN)===
 
  
GAN Parser- E:\McNair\Projects\Accelerators\Web Scraping for Accelerators\GAN Accelerator Data
+
GAN Parser- E:\McNair\Projects\Accelerators\Web Scraping for Accelerators\scrapeaccel.py
  
 
GAN Data- E:\McNair\Projects\Accelerators\Web Scraping for Accelerators\GAN Accelerator Data
 
Line 1,412: Line 1,507:
 
===Desired Information/Variables===
 
  
Key People (founders, lead entrepreneurs, strategists, etc.)
+
*Key People (founders, lead entrepreneurs, strategists, etc.)
Total number of launched companies
+
*Total number of launched companies
A FAQ for application details, accelerator vision, and
+
*A FAQ for application details, accelerator vision, and
Funds raised per company (average)
+
*Funds raised per company (average)
Features offered by accelerator (perks, space, tools, etc)
+
*Features offered by accelerator (perks, space, tools, etc)
 +
 
 +
==Desired Tools/Information==
 +
 
 +
===Automating the Process of Obtaining Cohorts===
 +
*Automating this process would save a lot of time and really progress the project.
 +
 
 +
===Obtaining More Details on Accelerators===
 +
 
 +
*Having the kind of thorough information on industry, companies, funding, location, exits, mentors, leadership,  that we got for the GAN companies would be fantastic.
 +
 
 +
===List of Alive/Dead Accelerators===
 +
 
 +
This is a dream but would be very helpful

Latest revision as of 13:44, 21 September 2020


Project
Accelerator Seed List (Data)
Project Information
Has title Accelerator Seed List (Data)
Has owner Shrey Agarwal, Matthew Ringheanu, Veeral Shah, Connor Rothschild
Has start date Fall 2016
Has deadline date
Has keywords Accelerators, Data
Has project status Subsume
Is dependent on Industry Classifier
Dependent(s): Accelerator Data, Demo Day Page Google Classifier
Subsumed by: U.S. Seed Accelerators
Has sponsor McNair Center
Has project output Data, Tool
Copyright © 2019 edegan.com. All Rights Reserved.


Current Work

As of 05/21/2018 the Google Sheet Workbook has been downloaded to the E drive. The now Excel Workbook is saved at E:\McNair\Projects\Accelerators\Summer 2018\Accelerator Master Variable List.xlsx. This is now the master file.

Google Master Sheet: https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=0

  • Cross-reference sheet with data from Peter's old accelerator consolidation file ("accelerator_data_noflag" and "accelerator_data" in "All Relevant Files") and fill in missing data
  • Variables that are 100% NOT in these 2 files:
    • Cohort Breakout?
    • Subtype
    • Designed for Students?
    • Campuses
    • Stage
    • Software Tech
    • What stage do they look for?

TODO:

McNair/Projects/Accelerators/Fall 2017/unfound_founders.txt

A 0 means we don't have founder data for that accelerator. Specs: A tab delimited text file with the following fields:

Accelerator   First Name   Last Name   LinkedInURL(if possible)

Including the LinkedInURL will ensure accuracy, but the process will work without it.
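
The file spec above can be sketched in Python as follows. This is a minimal illustration, not the project's actual tooling: the helper names (`write_founders`, `read_founders`), the presence of a header row, and the sample rows are all assumptions made for the example. Only the four field names come from the spec.

```python
import csv
import io

# Field names follow the spec above; writing them as a header row is an assumption.
FIELDS = ["Accelerator", "First Name", "Last Name", "LinkedInURL"]


def write_founders(rows, handle):
    """Write founder rows as tab-delimited text; LinkedInURL may be left blank."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Any field that was not found is written as an empty string.
        writer.writerow({field: row.get(field, "") for field in FIELDS})


def read_founders(handle):
    """Read the tab-delimited text back into a list of dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical sample rows, for illustration only.
sample = [
    {"Accelerator": "Example Accelerator", "First Name": "Jane",
     "Last Name": "Doe", "LinkedInURL": "https://www.linkedin.com/in/janedoe"},
    {"Accelerator": "Example Accelerator", "First Name": "John",
     "Last Name": "Roe"},  # no LinkedInURL found
]

buffer = io.StringIO()
write_founders(sample, buffer)
founders_text = buffer.getvalue()
founders = read_founders(io.StringIO(founders_text))
```

Writing to an in-memory buffer keeps the sketch self-contained; a real run would open a file such as unfound_founders.txt instead.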


  • Shrey: Find "demo day" keywords, so that we can search AcceleratorName Year Keyword and get back potential demo day pages
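
The "AcceleratorName Year Keyword" search pattern could be generated along these lines. This is only a sketch: the function name `demo_day_queries` is hypothetical, and the keywords shown are placeholders, since the actual keyword list is still to be found.

```python
from itertools import product


def demo_day_queries(accelerators, years, keywords):
    """Build 'AcceleratorName Year Keyword' search strings for every combination."""
    return [
        f"{name} {year} {keyword}"
        for name, year, keyword in product(accelerators, years, keywords)
    ]


# Placeholder keywords only; the real keyword list has not been determined yet.
queries = demo_day_queries(["Techstars"], [2015, 2016], ["demo day", "pitch day"])
```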


Accelerator Type project

The file to edit is called "Accelerator type list". It is located in the folder E:\McNair\Projects\Accelerators\Spring 2018\Grouping project of ListOfAccs. More systematic information and instructions are in "Instructions for Accelerator type project" in E:\McNair\Projects\Accelerators\Spring 2018\Grouping project of ListOfAccs.

NOTE: until we get through all 270 accelerators, we will just categorize each accelerator into the following three categories as quickly as possible with short notes in the "other info" column for these; once we have this, we will go back through the ones that aren't categorized and add notes to the "other info" column.


Type list:

  • Private
  • Corporate
  • Academic
Note: if DEAD, noted here.


Other info:

  • nonprofit? (y/n)
  • Subtype abbreviations:
    • S: for if a social entrepreneurship initiative
    • I: for if an incubator
    • A: for an angel group
    • F: for foreign
    • C: for in coworking space/hub/etc
    • V: for if part of venture fund
    • G: for if government funded/partnered
    • T: for international


Note: subtypes (from individual text files in E:\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data) were only found for 23 of the 270 accelerators.  These accelerators were initially intended to be removed from the master list.  Remaining subtypes are currently being added.

other info:

international offices, founders, industries, org type, program duration, or other interesting, easily accessed variables. Additional information is especially important for accelerators that have no other subtype abbreviation listed.


Steps to research an accelerator

1. Copy and paste the URL listed in the "Accelerator type list" file into Google. If the website is insufficient, try googling:

the name of the accelerator
the name of the accelerator + "crunchbase"
the name of the accelerator + "nonprofit"

These searches sometimes lead to other helpful databases or news articles.

2. Note whether:

1) Academic/Corporate/Private 
2) For Profit/Nonprofit.  Sometimes this isn't directly stated but can be inferred from their description of, say, their investment process.  If they don't address this at all, it's probably For Profit.
3) subtype (S, I, A, F, C, V, G, T).  
4) Additional, easily-accessed info.  Number 4 is really important if there's no subtype. 
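
The four items noted in step 2 can be checked mechanically once a row has been filled in. The sketch below is an illustration under assumptions: the function name `check_entry` and its argument layout are hypothetical, and the spreadsheet is assumed to have been read into plain Python values. The type names and subtype letters are the ones listed above.

```python
TYPES = {"Private", "Corporate", "Academic"}

# Subtype letters as listed in the "Other info" section.
SUBTYPES = {
    "S": "social entrepreneurship initiative",
    "I": "incubator",
    "A": "angel group",
    "F": "foreign",
    "C": "coworking space/hub",
    "V": "part of venture fund",
    "G": "government funded/partnered",
    "T": "international",
}


def check_entry(acc_type, nonprofit, subtypes, other_info):
    """Return a list of problems found in one accelerator's entry (empty if none)."""
    problems = []
    if acc_type not in TYPES:
        problems.append(f"unknown type: {acc_type!r}")
    if nonprofit not in {"y", "n"}:
        problems.append("nonprofit must be 'y' or 'n'")
    unknown = [code for code in subtypes if code not in SUBTYPES]
    if unknown:
        problems.append(f"unknown subtype codes: {unknown}")
    # Additional info matters most when no subtype applies.
    if not subtypes and not other_info.strip():
        problems.append("no subtype listed: add other info")
    return problems


ok = check_entry("Private", "n", ["I"], "")
needs_info = check_entry("Academic", "y", [], "")
```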

All 270 need to be done by the end of the semester.


Type list file saved as

"Accelerator type list" in E:\McNair\Projects\Accelerators\Spring 2017\Grouping project of ListOfAccs.

The list of ListofAccs, from which we drew Accelerator type list, should have no matches with any of the flagged accelerators in E:\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data. There are 23 matches though. So all subtypes must be searched and entered manually. Whether some were a nonprofit was listed in E:\McNair\Projects\Accelerators\Spring 2017\Grouping project of ListOfAccs, called "whether nonprofit...". Accelerators with no info there on whether nonprofit need to have info entered manually.

Funded By Accelerators

Reference the like-named portion in Crunchbase Data

End of Semester Report

The end of semester report will focus on ranking accelerators and environments based on the variables we have gathered. Our primary form of categorization will be ranking individual accelerators based on their venture capital raise rate. We can probably generate information over time for accelerators and the amount of VC they raised to get a sense of what locations have developed in the past five years from the dates of transactions recorded by SDC. To obtain these rankings, we will identify which cohorts companies were trained in, as well as complete details of the accelerator and the details of cohort companies. We will focus only on accelerators because there are many other entities in each ecosystem. We will also utilize information on IPO or acquisition by companies, obtained through Crunchbase, to gain some sense of how successful startups emerging from a particular accelerator are. To obtain the data over time, we will need to fill out the cohort date information column in our cohort data, which will require the help of either Crunchbase or the Wayback machine for older accelerators. In ranking the accelerators across regions, we can also track industry-specific hotspots for accelerators such as medicine in Memphis or technology in San Francisco.

To complete the report, we need to fill information in:

  • Industry and focus
  • Location
  • Name, description
  • Matched VC data
  • Founder information (maybe)

Overview

This project is developing broad and near-population data on accelerators and their cohort companies. The objective is to identify which cohorts of which accelerators a cohort company was trained in, obtain details of the accelerators, and obtain details of the cohort companies, including information about any venture capital investment that the cohort company might have received and any IPO or acquisition the company may have experienced.

The primary use of this data is for an academic paper detailed on the Matching Entrepreneurs to Accelerators and VCs (Academic Paper) page.

However, this project can also provide useful data to other academic papers (Urban Start-up Agglomeration, Hubs (Academic Paper), and Hubs Scorecard (Academic Paper)), projects (Houston Entrepreneurship) and blog posts (under the Emerging Ecosystems umbrella project).

This project needs the results of the Industry Classifier, Whois Parser, and other tools.

Current Project Write-Up

Things To Do

  • Obtain all URLs for accelerators in order to run through the Wayback Machine to find out when they started.
  • Match Crunchbase Data with our Accelerator List to see if they have any accelerators that we do not.
  • Obtain an example of an accelerator that started early and has multiple companies but does not separate them into cohorts and figure out a way to determine which companies went through each cohort.
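
For the Wayback Machine item, one possible approach is to ask the Internet Archive's CDX server for the earliest capture of each accelerator URL. The sketch below only builds the query URL and parses a response; it makes no network request. The function names are hypothetical, the sample response is made-up data in the shape of CDX JSON output, and the first capture date is only a rough proxy for when an accelerator started.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def earliest_capture_query(site_url):
    """Build a CDX query URL asking for the first capture of site_url."""
    params = {"url": site_url, "output": "json", "fl": "timestamp", "limit": 1}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def first_capture_year(cdx_json_text):
    """Return the year of the first capture from a CDX JSON response, or None."""
    rows = json.loads(cdx_json_text) if cdx_json_text.strip() else []
    # With output=json the first row is a header; data rows follow it.
    if len(rows) < 2:
        return None
    return int(rows[1][0][:4])


query_url = earliest_capture_query("owlspark.com")

# Sample response text (illustrative values, not real archive data).
sample_year = first_capture_year('[["timestamp"], ["20130415120000"]]')
```

Fetching `query_url` for each accelerator and passing the response body to `first_capture_year` would give a first-seen year per site, which would still need a manual sanity check.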

What Each File in the "Accelerator" Folder on the RDP Contains

  • "Accelerator List Sources" (Folder) - This folder contains most of the sources that we pulled accelerator names from at the very beginning of the project.
  • "Code+Final_Data" (Folder) - This folder contains Peter's code for pulling the data from the text files in the "Data" folder.
  • "Crunchbase Snapshot" (Folder) - This folder contains the data we obtained from Crunchbase. There is a massive amount of data which we will need to sort through to find useful information and hopefully match that data with our current cohort data.
  • "Data" (Folder) - This folder contains all of our data on accelerators including cohort information and the html files of each cohort page. I would estimate that it is about 95% clean currently.
  • "Data - Copy" (Folder) - This is just a copy of our current "Data" folder.
  • "Data_Copy" (Folder) - This is a copy of our original "Data" folder before we did any manual cleaning.
  • "Enclosing_Circle" (Folder) - This folder seems to contain some data on VC but I'm not sure how it pertains to the Accelerator project.
  • "F6S Accelerator HTMLs" (Folder) - This folder contains the HTML pages of all the pages on the F6S website. We used it to add more potential accelerators to our list.
  • "Google_SiteSearch" (Folder) - This folder contains Python code for Google searches.
  • "Industry_Classifier" (Folder) - This folder seems to contain Python code but I'm not sure what for.
  • "Matcher" (Folder) - This folder contains the Matcher.
  • "Python WebCrawler" (Folder) - This folder contains code that is a work in progress for pulling descriptions from accelerator websites. It is Jeemin's project.
  • "Cleaned Cohort Data Copy" (Excel File) - This file contains a copy of our cleaned cohort data.
  • "Cleaned Cohort Data" (Excel File) - This file contains the most current, completely cleaned data on cohort company information.
  • "NormalizeFixedWidth" (PL File) - This is the normalizer.
  • "PortCoNames" (TXT File) - This file contains all of the names of the cohort companies as well as the accelerator they went through.
  • "VC Data" (Excel File) - This file contains all of the names of the companies that have ever received VC funding.
  • "VC_Data" (TXT File) - This file contains the non-normalized data of all of the VC information.
  • "VC_Data_Names" (TXT File) - This file contains all of the names of companies that have received VC funding.
  • "VC_Data_Names_Matched_PortCoNames" (Excel File) - This file contains all of the cohort companies that have also received VC funding. Still needs to be sorted through.

Process

After accumulating a large amount of data on accelerators, their cohorts, and their HTML files, we began cleaning the text files, which are located in the "Data" folder within "Accelerators". After the first round of cleaning, we ran a script over the cohort data that put all of that information into an Excel document called "Cleaned Cohort Data". Unfortunately, there were still some mistakes in the cohort information, which we fixed within the Excel file itself. As a result, some text files within the "Data" folder no longer match the "Cleaned Cohort Data" file: rerunning the cohort script on the "Data" folder would produce output that differs from "Cleaned Cohort Data", which is problematic. The solution (other than manually cleaning the text files again) would be to write a script that uses the "Cleaned Cohort Data" file to correct the text files in the "Data" folder. We have also matched all of the cohort companies with our list of all companies that have received VC funding.

Current To Do

  1. Work on the Crunchbase 2013 Snapshot
  2. Match cohort companies to VC-backed portfolio companies
  3. Refine our data to work out which cohort each cohort company was a member of, cohort start dates and locations, etc.
  4. Make a list of top accelerator lists (e.g., http://tech.co/top-startup-accelerators-ranked-2012-08) and check that we have those accelerators

End of Semester Notes

  • We have compiled a very long list of accelerators from many different databases. For the past couple of weeks, everyone in the center has been going through this list, 20 at a time, classifying each one as an accelerator or not an accelerator, and then proceeding to gather data on the accelerator using the process outlined below. This process went very smoothly. We have successfully gone through about 80% of the list. We are still missing information on the last hundred or so names. All of the collected data is located on the RDP, within the "Accelerators" folder under "Data" or on the "Accelerator Master Variable List" Google sheet.
  • We have listed all of the startups from the accelerators that have break out cohorts on their website on the "Accelerator Master Variable List" Google sheet. This contains the following information in the "Cohort List (new)" sheet: accelerator name, year, cohort name, company name, description, founders, category/sector, and location.
  • Next steps include going through the demo day pages that have been downloaded and writing notes on the different types if possible (see Demo Day Page Google Classifier).

Data Collection Notes

MATCHING

The files we used for matching are located on the E drive. We used the matcher to match our portfolio company names from the cohort file located in E:\McNair\Projects\Accelerators.

  • The files used for matching are located in E:\McNair\Projects\Accelerators\Matcher
  • PortCo contains the names of the companies pulled from the cohort file
  • AccCo includes both the cohort company name and the name of the accelerator itself
  • In the matcher, the inputs are the PortCo names, as well as the VC data from our pull in SDC
  • The outputs include the AccCo_VC data located in E:\McNair\Projects\Accelerators, which give a lot of information on the matches, including:
    • name of the match itself
    • number of investments
    • dates that the company received its investments
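
To illustrate the kind of matching described here, the sketch below joins cohort company names to VC round records on a normalized name. It is not the Matcher tool itself: the normalization rules, the function names, and the sample companies are all assumptions chosen for the example.

```python
import re

# Common corporate suffixes dropped before comparing names (an assumed list).
SUFFIXES = {"inc", "llc", "corp", "co", "ltd"}


def normalize(name):
    """Lowercase, strip punctuation, and drop corporate suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(tok for tok in cleaned.split() if tok not in SUFFIXES)


def match_portcos(portco_names, vc_rounds):
    """Match cohort companies to VC rounds given as (company name, round date) pairs."""
    rounds_by_name = {}
    for company, round_date in vc_rounds:
        rounds_by_name.setdefault(normalize(company), []).append(round_date)

    matches = {}
    for portco in portco_names:
        dates = rounds_by_name.get(normalize(portco))
        if dates:
            matches[portco] = {"investments": len(dates), "dates": sorted(dates)}
    return matches


# Hypothetical sample data, for illustration only.
portcos = ["Acme Robotics, Inc.", "Nowhere Labs"]
vc_rounds = [
    ("ACME ROBOTICS INC", "2014-03-01"),
    ("Acme Robotics", "2015-06-15"),
    ("Other Co", "2016-01-01"),
]
matches = match_portcos(portcos, vc_rounds)
```

Exact matching on normalized names is deliberately conservative; it will miss renamed companies and can conflate different companies that share a name, so matches still need review.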

SDC Pull

We accessed SDC Platinum and pulled round-level funding information for all registered companies with round dates between 1/1/1999 and 3/1/2017.

The receipt is as follows:

Session Details


Request  Hits    Request Description

  0        -     DATABASE: Portfolio Companies (VIPC)
  1     96155    Venture Related Deals: Select All Venture Related Deals
  2     79572    Round Date: 1/1/1999 to 3/1/2017 (Custom) (Calendar)
  3              Custom Report: VC Data (Columnar) - Save As:
                 E:\McNair\Projects\Accelerators\VC Data.txt

Billing Ref # : 2054025
Capture File  : riceuniv.2054025
Session Name  :

The VC data pull includes the following variables:

  • Company Name
  • Date Company
  • Date Company
  • Company
  • Company City
  • Company Street Address, Line 1
  • Company Street Address, Line 2
  • Total Known
  • Company Industry Sub-Group 3
  • Company Industry Major Group
  • Round
  • Company Stage Level 3
  • Round Amt,
  • Round Amt,

3 files

For each accelerator in the list, put files in E:\Projects\Accelerators\Data

  • AcceleratorName.txt - copy and paste the variables below into a (tab-delimited) txt file and complete
  • AcceleratorName.cohort - your cohort text file (see below)
  • AcceleratorName.html (possibly automatically with a folder too) - save a copy of the html of the cohort page

.txt Variables

Name	
Score	
Flag	
CohortURL	
Address	
Duration	
Vintage		
Industry	
Description	
Equity	
NonProfit	 
Notes	


Try to get Name, Score, Flag, Cohort URL and Address for all. ONLY GRAB OTHER VARIABLES IF EASY. Just leave things blank if you can't find them quickly.

If the score is 0, or the flag is S, I, A, or F just stop - don't bother downloading a cohort list, saving an HTML file, etc. If possible, do stick a very brief description of the problem in the notes field.
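
The stop rule above can be written as a small check. This is a sketch under assumptions: the function name `should_stop` is hypothetical, and flags are taken to be a comma-separated string as described in the notes for the Flag field.

```python
# Flags that mean no cohort list or HTML file needs to be collected.
STOP_FLAGS = {"S", "I", "A", "F"}


def should_stop(score, flags):
    """Return True if the score is 0 or any flag is S, I, A, or F."""
    flag_set = {flag.strip() for flag in flags.split(",") if flag.strip()}
    return float(score) == 0 or bool(flag_set & STOP_FLAGS)


stop_examples = [
    should_stop(0, ""),       # score of 0: stop
    should_stop(1, "C,V"),    # coworking + venture fund: keep going
    should_stop(0.8, "I"),    # incubator: stop
]
```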

Notes:

  • Score: is 0-1 where 0 is definitely not an accelerator, 1 is definitely an accelerator
  • Flag: (leave blank if not needed), if multiple then separate by comma
    • S for social entrep
    • I for incubator
    • A for an angel group
    • F is for foreign
    • C for in coworking space/hub/etc
    • V for if part of venture fund
    • D is for Dead
  • Put just the root URL in Cohort URL if there isn't a Cohort page
  • Duration: in wks (months x 4.33 and round)
  • Vintage is year of first cohort if possible
  • Industry is industry focus but only if clear focus
  • Equity is a number (don't put %) or Y/N
  • Notes is only there if need it. Particularly try to use this field to note discards.
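
The Duration conversion and the stop rule described above can be sketched in Python. This is only an illustration under stated assumptions, not the project's actual tooling: the names `months_to_weeks`, `parse_flags`, and `should_stop` are hypothetical, and the sketch assumes that a score of exactly 0, or any of the flags S, I, A, or F, means no cohort list or HTML copy is needed.

```python
# Hypothetical helpers illustrating the Duration and stop rules above.

STOP_FLAGS = {"S", "I", "A", "F"}  # social entrep, incubator, angel group, foreign


def months_to_weeks(months):
    """Convert a program length in months to whole weeks (months x 4.33, rounded)."""
    return round(months * 4.33)


def parse_flags(flag_field):
    """Split a comma-separated Flag field such as 'C,V' into a set of codes."""
    return {f.strip().upper() for f in flag_field.split(",") if f.strip()}


def should_stop(score, flag_field):
    """Return True when no cohort list or HTML copy needs to be collected."""
    return score == 0 or bool(parse_flags(flag_field) & STOP_FLAGS)


if __name__ == "__main__":
    print(months_to_weeks(3))      # a 3-month program expressed in weeks
    print(should_stop(1, "C,V"))   # coworking space + venture fund: keep going
    print(should_stop(0.5, "I"))   # incubator: stop
```

A blank Flag field is treated as "no flags", which matches the instruction to leave the field blank when it is not needed.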

.cohort files

Your .cohort files must:

  • Be tab delimited txt
  • Have a header
  • The first column must be the portfolio company name
  • Grab as many columns as you can easily (and name them)
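
The requirements above lend themselves to a quick automated check before a .cohort file is saved. The following is a minimal sketch, assuming the file content has already been read into a string; the function name `check_cohort_text` is hypothetical. It cannot tell whether the first line really is a header, only that every column in it has a name.

```python
def check_cohort_text(text):
    """Return a list of problems found in tab-delimited .cohort content."""
    problems = []
    # Keep the original line numbers so messages point at the right line.
    rows = [(n, line.split("\t"))
            for n, line in enumerate(text.splitlines(), start=1)
            if line.strip()]
    if not rows:
        return ["file is empty"]

    _, header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header has an unnamed column")

    for n, row in rows[1:]:
        if len(row) != len(header):
            problems.append(
                f"line {n}: expected {len(header)} columns, found {len(row)}")
        if not row[0].strip():
            problems.append(f"line {n}: first column (company name) is empty")
    return problems


if __name__ == "__main__":
    sample = "Company\tCohort Number\nExampleCo\t1\n\t2\n"
    for problem in check_cohort_text(sample):
        print(problem)
```

An empty list means the content passed these basic checks.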

Standardized format for text files

Information Text file

  • 1 tab only after each category
  • No spaces after commas for flags or industry
  • For duration put only a number in weeks but do not write "weeks"
  • Equity is either only a number (no percent sign) or a Y/N


Cohort Text file

  • 1 tab between each column
  • Titles of each column on top
  • Add a new column named "Cohort Number" and number the cohorts 1, 2, 3, 4, etc.
  • Matthew: 1-225 (done) Shrey: 226-550 (done)

Link to Crunchbase API application

https://about.crunchbase.com/forms/research-access-apply/ (Does not work anymore)

https://data.crunchbase.com/v3/docs/using-the-api (Has new instructions for application)

Sign-Ups

Ed - 1-10 (done)
Carlin -  11-20 (done)
Carlin - 21-40 (done)
Christy - 41-60 (done)
Avesh - 61-80 (done)
Eliza - 81-100 (done)
Meghana - 101-120 (done)
Peter - 121-140 (done)
Ramee - 141-160 (done)
Will - 161-180 (done)
Matthew - 181-200 (done)
Julia - 201-220 (done)
Peter - 221-240 (done)
Shrey - 241-260 (done)
Matthew - 261-280 (done)
Eliza - 281-300 (done)
Julia - 301-320 (done)
Shrey - 321-340 (done)
Carlin - 341-361 (done)
Julia - 362-380 (done)
Dylan - 381-393 (done)
Jake - 394-404 (done)
Dylan - 405-410 (done)
Avesh - 411-415 (done)
Dylan - 416-423 (done)
Peter - 424-460 (done)
Carlin - 461-480 (done)
Peter - 481-490 (done)
Julia - 491-510 (done)
Peter - 511-515 (done)
Julia - 516-529 (done)
Ben - 530-540 (done)
Shrey - 541-551 (done)

List of Accelerators

  1. 10Xelerator
  2. 1440
  3. 33entrepreneurs
  4. 500 Startups
  5. 9Mile Labs
  6. AIA Accelerator
  7. ARK Challenge
  8. AT&T Aspire Accelerator
  9. ATDC Community
  10. AZ TechCelerator
  11. AccelFoods
  12. Acceleprise
  13. Accelerate Baltimore
  14. Accelerate Genius
  15. Accelerate Tectoria Accelerator
  16. Accelerator Centre
  17. Advanced Technology Development Center (ATDC)
  18. Airbus BizLab
  19. Alchemist Accelerator
  20. AlphaLab
  21. Amplify.LA
  22. Angel Capital
  23. Angelcube
  24. Angelpad
  25. Annual Business BootCamp
  26. Arizona Center for Innovation
  27. Arizona Furnace
  28. Arrowhead Tech Incubator 2016
  29. Aspire 3 Accelerator 2017
  30. Atlanta Ventures Accelerator
  31. AutoXLR8R
  32. Awesome Inc.
  33. Axel Springer Plug and Play
  34. B 4 Change Impact Accelerator
  35. B2B Acceleration Program
  36. B4C Social Venture Accelerator
  37. BBC Worldwide Labs
  38. BMW Startup Garage
  39. Brandcelerate
  40. Bunker Labs
  41. Bank of Ireland Accelerator Programme
  42. Bantunium Labs Accelerator
  43. Barclays Accelerator
  44. Barclays New York Summer 2015
  45. Berkley Ventures
  46. Bessemer Business Incubation System
  47. Beta-i
  48. Beta.MN
  49. BetaFactory
  50. BetaSpring
  51. Betablox
  52. Betaspring RevUp (DUPLICATE)
  53. Bethnal Green Ventures
  54. BioAccel
  55. BioInspire
  56. Bir 2015
  57. BitAngel Engagement Level
  58. BitAngels Startup Summer Program of 2013
  59. Bizdom
  60. Black Forest Accelerator
  61. Blue Startups
  62. Blueprint Health
  63. Bolt Boston
  64. Bonnier Accelerator
  65. BoomStartup
  66. BoomStartup Winter 2017 (DUPLICATE)
  67. Boomtown Accelerator
  68. Boomtown Health Tech (DUPLICATE)
  69. Boost VC
  70. BootupLabs
  71. Brandery
  72. Brooklyn Beta Summer Camp
  73. Budweiser Dream Brewery
  74. Buildit
  75. BuiltinPGH Companies
  76. Business Innovation Center
  77. Business Opportunity Academy 2017
  78. Business Technology Development Center (BizTech)
  79. CLT Joules Energy Accelerator 2014
  80. CWI Ventures
  81. CWI Ventures Application (DUPLICATE)
  82. CableLabs Technology Tours 2016
  83. Capital Factory
  84. Capital Innovators
  85. Capital Investment Network (Startups)
  86. Caroline Plouff
  87. Catalyst Partners
  88. Cause Collective : Social Innovation Lab
  89. Center for Entrepreneurial Innovation
  90. Chain Reaction Innovations 2017
  91. Chemical Angel Network
  92. Chinaccelerator
  93. Cisco Entrepreneurs in Residence
  94. Citi Accelerator
  95. Citrix Startup Accelerator
  96. Claremont/Upland Makerspace Fablab
  97. Climate Ventures 2.0 Accelerator
  98. Co.Lab accelerator
  99. Code for America Accelerator
  100. Cohab's Traxtion Point
  101. Collision Conference Investors
  102. Common Bond
  103. Communitech Hyperdrive
  104. Conquer Accelerator
  105. Coolhouse Labs
  106. CuriousMinds Incubator / Accelerator
  107. CyberTECH San Diego
  108. DBS Accelerator
  109. DPD Last Mile labs
  110. DV X Labs
  111. Dat Ventures
  112. Decatur-Morgan County Entrepreneurial Center
  113. Deep Space Ventures
  114. Demo Accelerator 2016- 2017
  115. DeveloperTown
  116. Difference Engine
  117. Digital Malaysia Corporate Accelerator Program
  118. Digital Media Zone Incubator/Accelerator
  119. Disney Accelerator
  120. DogFish Accelerator
  121. Domi Station
  122. Dotforge accelerator
  123. Dream Funded
  124. DreamIT Health
  125. DreamStart - Free Mentoring Program
  126. Dreamit Ventures (DUPLICATE)
  127. Ducky Diggy Lloyd
  128. E-Capital Summit
  129. EC Mentor Skills Inventory
  130. EIGERlab
  131. ETRAC
  132. EY Startup Challenge
  133. Eco Holding
  134. Eleven Startup Accelerator
  135. Emerge Xcelerate
  136. EnterpriseWorks Incubation Program
  137. Entrepreneur Development Center
  138. Entrepreneurs Roundtable Accelerator
  139. Environmental Business Cluster
  140. Equity Legal
  141. Excelerate Labs
  142. Execution Labs
  143. Exhilarator
  144. Extreme Startups
  145. Extreme University
  146. FOOD-X
  147. Factory45
  148. Fargo Startup House 2014-2015
  149. FastTrack Propero Healthcare
  150. FbFund
  151. Female Propeller for High Flyers
  152. FinTech Innovation Lab
  153. FinTech Studios 2015
  154. Fintech Founders Club #2
  155. First Growth Venture Network
  156. Fishbowl Labs AOL
  157. Flagship Enterprise Center
  158. FlashStarts
  159. Flashpoint
  160. Flat6 Labs
  161. Fledge9
  162. Flextronics Lab IX
  163. Food Future Scale-up Accelerator 2017
  164. Food System 6 (FS6) Accelerator
  165. FoodForwardX
  166. Fortify Ventures
  167. Founder Institute
  168. FounderFuel
  169. FoundersPad
  170. Fownders Accelerator
  171. French Accelerator 2016
  172. Fund the Food
  173. Fuse Corps Host
  174. GAKKEN Accelerator Program
  175. Gainesville Technology Enterprise Center
  176. Game CoLab Incubator Program 2014
  177. GameFounders
  178. GammaRebels
  179. Gazelle Lab
  180. Gener8tor
  181. German Accelerator Life Sciences
  182. German Accelerator Tech
  183. Global Accelerator Network 2015
  184. Good Works Houston Lab
  185. GoodCompany Ventures
  186. Google Launchpad Accelerator
  187. Grants4Apps Accelerator
  188. GreenStart
  189. Greenlite Labs
  190. GrowLab
  191. Growth Hacking Accelerator 2015
  192. Gulf Coast Center for Innovation and Entrepreneurship
  193. H-Farm Ventures
  194. HACKT Mission for International Founders
  195. HAXLR8R
  196. HCC Entrepreneurship Launchpad
  197. HIGHLINE Academy
  198. HUB
  199. HUBB Accelerator
  200. HUBB GTLA 2016
  201. HackFWD
  202. Hatch
  203. Health Wildcatters
  204. Health accelerator
  205. Healthbox
  206. Hero City Co-Working Space
  207. High Street Startups Accelerator
  208. Highway1
  209. Honda Xcelerator
  210. Houston Technology Center
  211. Hub Ventures
  212. HugeThing
  213. I/O ventures
  214. ICONYC labs
  215. IDC Elevator
  216. INcubes Funnel and Accelerator 2014/2015
  217. INcubes Online Form
  218. INcubes Startup Visa
  219. Illumina Accelerator
  220. Illuminator, New York Accelerator 2015
  221. Imagine K12
  222. Immokalee Business Development Center
  223. Impact Engine
  224. Impact USA - 2017
  225. Incubate Miami
  226. Infuse Accelerator
  227. Ingenuity Partner Program
  228. InnoSpring
  229. Innov&Connect
  230. Innov8 for Health
  231. Innova Memphis
  232. InnovateOC
  233. Innovation Depot
  234. Innovation Pavilion
  235. Innovation Showcase Winter 2017
  236. Insight Accelerator Labs
  237. Intel Education Accelerator
  238. Investment Preparedness Lab
  239. Invoke Collective
  240. Iowa Startup Accelerator
  241. JFDI.Asia
  242. JFE Accelerator SF
  243. JLAB
  244. Jaguar Land Rover Tech Incubator
  245. Jolt
  246. JumpSchool
  247. JumpStart Foundry
  248. Jumpstart! Boulder
  249. JusticeXL
  250. Kairos Boston Spring Program
  251. Kaplan EdTech
  252. Kick
  253. Kick Boise
  254. Kick LA
  255. Kick Victoria
  256. Kicklabs
  257. Kinetiq Labs
  258. L-SPARK Accelerator
  259. LAUNCH incubator
  260. LAUNCHub
  261. LI TechCOMETS
  262. LabFunding Project Accelerator 2014
  263. Labs Venture Accelerator
  264. Launch Chapel Hill
  265. Launch Memphis
  266. LaunchBox Digital
  267. LaunchHouse
  268. LaunchPad PEI
  269. LaunchSpot
  270. Launch_Academy
  271. Launchpad Digital Health, LLC
  272. Launchpad LA
  273. Launchpad Long Island
  274. Le Camping
  275. Leading Entrepreneurial Accelerator Program
  276. Lean Launch Ventures
  277. LearnLaunchX
  278. Lemnos Labs
  279. Life Changing Labs
  280. LiftOff Health Incubator
  281. Lightbank Start
  282. LightningLab
  283. Lowe's Accelerator
  284. MACH37
  285. MACH37 Spring
  286. MIT SA+P venture accelerator
  287. MITA Institute Accelerator
  288. MTGx MediaFactory
  289. Mac6
  290. Madworks Governance Accelerator
  291. Maine Center for Entrepreneurial Development - Top Gun Program
  292. Matter
  293. Maven Ventures Fund & Incubator
  294. Media Camp
  295. Melbourne Accelerator Program
  296. Memphis BioWorks
  297. Merck Accelerator
  298. MergeLane 2017 Accelerator
  299. Mergelane
  300. Metavallon
  301. Microsoft Accelerator
  302. MindTheBridge
  303. Momentum
  304. MuckerLab
  305. Muru-D
  306. My5ive Accelerator 2016
  307. N-Motion (DUPLICATE)
  308. NDRC (LaunchPad / VentureLab)
  309. NEXT Dashboard
  310. NMotion
  311. NY Digital Health Accelerator
  312. NY Fashion Tech Lab 2017
  313. NYC ACRE
  314. NYC SeedStart
  315. Nashville Entrepreneur Center
  316. Nebula Shift
  317. Nephoscale IaaS
  318. Nest New York
  319. New Ventures Group
  320. New York Digital Health Accelerator (DUPLICATE)
  321. NewME Accelerator PopUps
  322. NewMe
  323. Next media accelerator
  324. NextHIT
  325. NextStart
  326. Nike+ Accelerator
  327. Northern Arizona Center for Entrepreneurship and Technology (NACET)
  328. Northern England
  329. Nxtp.labs
  330. OCTANe
  331. Oasis 500
  332. OpenFund
  333. Orange Fab
  334. Orange Works
  335. Orion Startups
  336. Oxygen Accelerator
  337. PIE
  338. Patriot Boot Camp
  339. Pearson Catalyst for Education
  340. Pipeline H2O
  341. Pitney Bowes Inc
  342. Plarium Labs
  343. Plug In South LA
  344. Plug and Play
  345. Plum Alley Investments 2016
  346. Points of Light Accelerator
  347. PowerHaus
  348. Preccelerator® Program 2016
  349. ProSiebenSat.1 Accelerator
  350. Project Entrepreneur 2016/17
  351. Project Healtchare
  352. Project Lift
  353. Project Music
  354. Project Skyway
  355. Propeller Venture Accelerator
  356. Prosper Capital Accelerator
  357. Proton Enterprises
  358. Pushstart Accelerator
  359. Qualcomm Robotics Accelerator
  360. Queen Creek Business Incubator
  361. R/GA Accelerator
  362. RAIN Incubator/Accelerator
  363. RJI Investment Group
  364. Reach
  365. RetailXelerator
  366. Rock Health
  367. Rocket Fuel Labs
  368. Rockstart Accelerator
  369. RunUp Labs
  370. Runway IoT Accelerator 2015
  371. SAP Startup Focus Program
  372. SKTA Innopartners Innovation Accelerator
  373. SPACELAB Tech Accelerator
  374. SPARK
  375. SPH Plug and Play
  376. SURF Incubator
  377. SaltMines Group Start-Up Studio
  378. ScaleTown
  379. Seamless IoT 2016
  380. Searchcamp
  381. Seed Hatchery
  382. SeedSpot
  383. SeedStartup
  384. SeedSumo
  385. Seedcamp
  386. Seedrocket
  387. Seeqnce
  388. Sequoia Apps
  389. Serval Ventures
  390. Shenzhen Valley Ventures Incubator
  391. Shoals Entrepreneurial Center
  392. Shopper Futures Accelerator
  393. Shotput Ventures
  394. Sid Martin Biotechnology Institute
  395. SigmaLabs Accelerator
  396. Silicon Valley Incubator & Accelerator
  397. SixThirty
  398. Sixers Innovation Lab
  399. Skywalker Accelerator
  400. SmartHealth Activator
  401. Smashd Labs
  402. SoCo Nexus Accelerator Spring 2017
  403. Social Enterprise Challenge
  404. Socratic Labs
  405. SparkLabs
  406. Sparkgap
  407. Sports Tank
  408. Springboard
  409. Sprint Accelerator
  410. Sprint Mobile Health Accelerator
  411. SproutBox
  412. SproutCamp
  413. Starburst Aerospace Accelerator
  414. Start Path Europe
  415. Start'inPost
  416. StartEngine
  417. StartFast Venture Accelerator
  418. Starta Accelerator Winter 2017
  419. Startl
  420. Startmate
  421. Startup Accelerator (DUPLICATE)
  422. Startup Front
  423. Startup Next & GAN
  424. Startup Orange County Accelerator
  425. Startup Runway
  426. Startup Wise Guys
  427. Startup Zone PEI
  428. Startup52X Accelerator
  429. StartupCity
  430. StartupHighway
  431. StartupHouse Foundry program
  432. StartupMinds Accelerator
  433. StartupYard
  434. Startupbootcamp
  435. Straight Shot
  436. Summer@Highland
  437. Surge
  438. SynBio axlr8r
  439. TEB Incubation & Acceleration Center
  440. THRIVE Accelerator III
  441. THRIVE Open Innovation (DUPLICATE)
  442. TIM#WCAP Accelerator
  443. TLabs
  444. TMCx Accelerator Digital Health 2017
  445. Tallwave
  446. Tampa Bay Innovation Center
  447. Tampa Bay Wave
  448. Tandem Mobile Accelerator
  449. Tech Nexus
  450. Tech Wildcatters
  451. Tech2020
  452. TechLaunch
  453. TechRanch
  454. TechSquareLabs
  455. Techstars
  456. Techstars Music
  457. Telenet Idealabs
  458. Telluride Venture Accelerator
  459. TenX
  460. The Alchemist Accelerator (DUPLICATE)
  461. The Ark
  462. The Bakery
  463. The Batchery
  464. The Brandery
  465. The Bridge
  466. The Center For Technology Enterprise & Development
  467. The Chaser
  468. The Company Lab (CO.LAB)
  469. The Draper FinTech Connection
  470. The Factory
  471. The Greatest Pitch
  472. The Harbor Accelerator
  473. The Incubator
  474. The Iron Yard
  475. The Mediapreneur Incubator
  476. The Morpheus
  477. The New York Venture Summit
  478. The Next Step: from idea to startup
  479. The Refinery
  480. The Unilever Foundry
  481. The Venture Center's Pre-Accelerator I
  482. The Vine OC
  483. The Vogt Awards
  484. The Yield Lab
  485. The eFactory Accelerator
  486. Think Big Partners Accelerator
  487. TiE Angels
  488. Tigerlabs Digital Health Accelerator
  489. Tolstoy Summer Camp
  490. TopSeedsLab
  491. Travel Startups Incubator
  492. Travelport Labs Accelerator
  493. Travelport Labs Incubator
  494. Triangle Startup Factory
  495. Tumml
  496. Tune Labs
  497. Twin Cities Accelerator 2016
  498. UW-Whitewater Launch Pad Accelerator
  499. Unbank.ventures FinTech Incubator
  500. University Technology Park
  501. Unreasonable Institute
  502. UpTech
  503. Upstart Accelerator
  504. Upstart Labs
  505. Upstart Memphis
  506. Uptima Business Bootcamp
  507. Upwest Labs
  508. VANTEC
  509. VC FinTech Accelerator
  510. Velocity Indiana Accelerator
  511. Venture Catalyst Partners
  512. Venture Hive
  513. Venture I
  514. VentureOut's Enterprise Tech Expedition
  515. Venturegeeks
  516. Vet-Tech Accelerator
  517. VictorySpark
  518. Village88 Techlab
  519. Volkswagen ERL Technology Accelerator
  520. WHLabs
  521. Wasabi Ventures Academy
  522. Wayra
  523. Wellness Accelerator
  524. Wells Fargo Startup Accelerator
  525. Wireless IoT
  526. Women Innovate Mobile
  527. XLerateHealth
  528. XTRATOS
  529. Xlerate Health
  530. Y Combinator
  531. Y&R SparkPlug 2017
  532. YEurope
  533. YLE Media Startup Accelerator Program
  534. Yahoo Ad Tech Program
  535. Yangler (online accelerator)
  536. Year of the Startup
  537. Yetizen Accelerator
  538. You Is Now
  539. Z80 Labs
  540. ZIP Launchpad Admission
  541. ZeroTo510
  542. Zone Startups Calgary
  543. designX 2017
  544. eMerging Ventures
  545. ezone
  546. iStart Jax (DUPLICATE)
  547. iStart Valley
  548. iVentures10
  549. ignite100
  550. innovyz start
  551. tekMountain Accelerator

Project Summary

This project aims to determine which accelerators are the most effective at producing successful startups, and which characteristics those accelerators share. First, we need to gather as much data as we can about as many accelerators as we can, in order to examine the factors that differentiate successful from unsuccessful ventures. Next, we need to create a web crawler that gathers information about accelerators across the world by accessing their websites and extracting information. The overall goal of this research is to gain insight into the methods of successful accelerators and to find out what differentiates very successful accelerators from dead ones.

Helpful Links: http://seedrankings.com/

Sources

Summary: These are sources obtained from List of Accelerators, Crunchbase, and other Google searches. We will evaluate these sources by looking at the number of accelerators they supply (as most of them are lists) and by the type of information they provide about each accelerator. Key data points are cohort-related data, startup-related data, and logistics of the accelerator. Better sources supply more information than the URL alone.

(Obtained from List of Accelerators and various Google searches)

(Obtained from Google search: "Accelerator Database")

Other ways used to find Accelerators (listed below "List of Sources Obtained from Various Google Searches"):

  • Type in generic location + "accelerators" (e.g. Houston Accelerators)
  • Looked at roughly the first 20 results
  • Used three locations as examples of accelerators that pop up
  • Type in a specific state + "accelerator" + "list" (e.g. Texas accelerator list) to search for more relevant lists
  • Once again, looked at roughly the first 20 results
  • Crunchbase has its own webpage with instructions for how we retrieve the data
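
The two search patterns above are regular enough to generate programmatically, which may help when a crawler is built later. This is a minimal sketch: the function name `build_queries` and the example locations are illustrative, and it only builds the query strings without contacting any search engine.

```python
def build_queries(locations, states):
    """Build search strings following the two patterns described above."""
    # Pattern 1: generic location + "accelerators", e.g. "Houston accelerators"
    queries = [f"{location} accelerators" for location in locations]
    # Pattern 2: state + "accelerator" + "list", e.g. "Texas accelerator list"
    queries += [f"{state} accelerator list" for state in states]
    return queries


if __name__ == "__main__":
    for query in build_queries(["Houston", "Los Angeles", "New York"],
                               ["Texas", "California"]):
        print(query)
```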

Source Evaluations

Summary: These evaluations couple with each of the sources above. The evaluations provide instructions for obtaining the information listed, as well as a general review of how useful the data seems. The review serves to determine whether a crawler would be suitable for obtaining information from the source autonomously.

SOURCE: Crunchbase

  • All of the information on the Crunchbase data is located on the Crunchbase 2013 Snapshot page, along with the documentation for how we determined the accelerator information.

Source: http://www.acceleratorinfo.com/see-all.html

  1. Opened source website
  2. Copied Information under "All Accelerator Programs" to TextPad, already sorted. Returned 190 results
  3. Each link on parent list leads to individual home page url of accelerator
  • Used sample size of 20 links, determined 16 to be accelerators, 2 to be incubators, 2 to be inactive or broken links
  • Many accelerators do not include founding date, most recent accelerators from around 2013-2014 (as determined from home page)

Review

  • Reliable source for specific URLs to older accelerators, not very helpful for more specific information.
  • Web crawling seems improbable because information is not readily available from source. Can potentially mine staff information or contact information from associated "about" page in the home url


Source: http://www.seed-db.com/accelerators/all

  1. Copied "Seed Accelerators" table to TextPad, data sorted itself into lines. Returned 235 results.
  2. Clicking on the accelerator name itself links to a page with all of its associated startups, up until 6/2016 cohort
  • Startup table includes:
  1. "state"
  2. "company name"
  3. "website and CrunchBase links"
  4. "cohort date"
  5. "exit value"
  6. "funding".
Many entries for "exit value" are missing, some values for "funding" are missing
On original seed-db webpage, each accelerator has a link to its associated home page url
  • From the table, each listed entry was an accelerator, although 24 accelerators out of 235 were classified as "dead"
  • Along with the home url, each accelerator table includes the following:
  1. Status
  2. Program (name)
  3. Location
  4. Country
  5. Number of companies
  6. Cumulative exit values
  7. Cumulative funding
  8. Average funding for startups
  9. Median funding for startups
Many entries for "median funding" are left empty, as well as entries for all types of funding on the bottom half of the table

Review

  • Reliable source for accelerators, includes list of accelerators both dead and active, as well as their associated start-ups
  • Web crawling potential is promising; startup table is located within the source for each webpage. Can also mine any category from the accelerator table
  • Overall very extensive data for the accelerators that are included on the list, but cross-referencing with other sources shows that seed-db is lacking many newer accelerators; the list is not all-inclusive.
  • Includes regional distributions for accelerator groups as well. For example, rather than just "Techstars", the group is broken into Austin, Berlin, Boston, Boulder, etc.
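
Because the startup table sits directly in the page source, a crawler could read it with Python's standard `html.parser` module. The sketch below is only an assumption-laden illustration: the `SAMPLE` markup is invented for the example and is not seed-db's real HTML, and the names `TableRows` and `extract_rows` are hypothetical.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return all table rows in the given HTML as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented markup for illustration only; the real page layout may differ.
SAMPLE = """
<table>
  <tr><th>Company</th><th>Cohort Date</th><th>Funding</th></tr>
  <tr><td><a href="#">ExampleCo</a></td><td>6/2016</td><td></td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print("\t".join(row))
```

Empty cells are kept as empty strings, so missing "exit value" or "funding" entries stay aligned with their columns.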

Source: http://www.seed-db.com/accelerators

Very similar to "http://www.seed-db.com/accelerators/all", but contains large regional accelerators as groups, rather than individual accelerators. For example, Techstars appears only once.
  1. Copied "Seed Accelerators" table to TextPad, data sorted itself into lines. Returned 239 results.
  2. Clicking on the accelerator name itself links to a page with all of its associated startups, up until 6/2016 cohort
  • Startup table includes same information as previous source, "http://www.seed-db.com/accelerators/all". However, accelerators spanning across multiple regions have their startups located under one category on this webpage.
On original seed-db webpage, each accelerator has a link to its associated home page url
  • From the table, each listed entry was an accelerator, although 24 accelerators/groups out of 239 were classified as "dead"
  • Along with the home url, each accelerator table includes the same information as the "http://www.seed-db.com/accelerators/all" source

Review

  • Reliable source for accelerators, includes list of accelerators both dead and active, as well as their associated start-ups
  • Web crawling potential is promising; startup table is located within the source for each webpage. Can also mine any category from the accelerator table
  • Overall very extensive data for accelerators that are included on the list, includes large groups as well as individual accelerators. It seems that some accelerators missing from "http://www.seed-db.com/accelerators/all" are located here, since there are 239 returns rather than 235.


Source: https://www.f6s.com/programs?type

  1. On the webpage, set "Type" to "Accelerator/Program", set "Location" to "North America", and set "Invest in Country" to "United States" to return results
  2. Highlighted results and scrolled down until all results found; copied results to TextPad
  3. In TextPad, sorted out lines with "by", as well as miscellaneous categories such as dates and dollar signs through Regular Expressions
  4. Using the "More Info" line which held constant through the entire list, assigned a sequential number to the line (in order to determine the number of results)
  • Obtained a grand total of 1467 results from the list
  • Along with the name of the program/accelerator, the data included:
  1. Dollar value per team
  2. Equity
  3. Application Site
  4. Accelerator URL
  • Many entries are not accelerators. A quick glance through the results showed various conferences, 3-5 day events, and written literature pertaining to accelerators as well
  • From a sample size of the first 30 entries, determined 10 to be valid accelerators, 3 incubators, 6 conferences/weekends, and the rest to be miscellaneous entries such as startup events or "studios" (perhaps useful but not relevant to search)
  • As we go down the list, the number of accelerators proportionately decreases. Can comfortably say that overall accelerator turnout from this website is much less than 33%, probably closer to 10-15%.
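
The TextPad clean-up in steps 3 and 4 above can be approximated with Python's `re` module. This is a sketch under assumptions, not the exact expressions that were used: the patterns for "by" lines, dollar amounts, and dates are guesses at what the listing contains, and the name `clean_listing` is hypothetical.

```python
import re

# Assumed noise patterns; the real listing may need different ones.
NOISE = re.compile(
    r"^by\b"                           # lines such as "by Example Org"
    r"|\$"                             # dollar amounts
    r"|\b\d{1,2}/\d{1,2}/\d{2,4}\b",   # dates such as 3/1/2017
    re.IGNORECASE,
)


def clean_listing(lines):
    """Drop noise lines and count entries via the constant "More Info" line."""
    kept = [ln.strip() for ln in lines
            if ln.strip() and not NOISE.search(ln.strip())]
    count = sum(1 for ln in kept if ln == "More Info")
    names = [ln for ln in kept if ln != "More Info"]
    return names, count


if __name__ == "__main__":
    pasted = [
        "Example Accelerator",
        "by Example Org",
        "$20,000 per team",
        "More Info",
    ]
    names, count = clean_listing(pasted)
    print(count, names)
```

Counting the "More Info" lines gives the number of results without relying on the program names being unique.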

Review

  • Potentially useful website if a crawler could remove the clutter and target solely the accelerators; very useful for identifying new accelerators, since the data is automatically sorted by date and location.
  • Large list of sources includes many irrelevant results, such as conferences or weekends which are difficult to identify. The name of the sorting category itself, "Accelerator/Program" suggests that many of the results fall under the "Program" section rather than being valid accelerators.
  • Potential site for identifying accelerators, but limited by in-site sorting; useful for URL and perhaps equity, but not very detailed information relating to the accelerator/program.


Source: http://gust.com/usa-canada-accelerator-report-2015/

  1. Selected region of US and Canada
  2. Scrolled down to the section labeled "Top 20 Active Accelerators" and selected "see the full list" near the bottom of the listed accelerators
  3. Copied resulting entries into TextPad and sorted out the numbers to leave only the name of the accelerator
  • Obtained 100 results for different accelerators
  • Accelerator lists included:
  1. Name and URL
  2. Number of Start-ups funded (2015 only)
  • Accelerator list limited to 2015

Review

  • Website provides its own evaluation of an accelerator's success based on various factors and provides data for larger trends.
  • Usefulness is questionable because website does not provide much except the URL, and all of the entries are based on success in 2015.
  • Other interesting data within website such as "Hot Markets", investment breakdowns by state, etc. All of this data is also limited to 2015.

Source: https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/

  1. Scrolled down to the section labeled "Startup accelerators in Boston"
  2. Copied text beginning from "MassChallenge" (the first paragraph was just a general definition of startups) and continued to copy until "Startup Incubators in Boston"
  3. After pasting in TextPad, I sorted the data to delete any characters after the "-" and added a sequential number at the beginning of each line
  • Returned a total of 17 results for accelerators in Boston
  • Accelerator list included:
  1. Name and URL
  2. Capital requirements
  3. Application periods and requirements
  4. Paragraph describing accelerator and its goals

Review

  • Although the guide is dated, useful for identifying strong accelerator programs in Boston
  • Limitation: only focuses on Boston, but the description is helpful in identifying the role of the accelerator
  • Limited information on accelerator, not very useful by itself without information from the accelerator URL

Source: https://www.corporate-accelerators.net/database/

  1. Copied and pasted table into Microsoft Excel (Data was already sorted into categories so no need for TextPad)
  2. Table returned 72 references (but there was a link at the bottom to a larger database)
  • The table itself includes:
  1. Major Company
  2. Accelerator
  3. Funding
  4. Equity
  5. Website
  6. Details
  • The "Details" link led to a variety of other information including:
  1. Status (Active or Inactive)
  2. Locations
  3. Funding
  4. Equity
  5. Term
  6. Cohort Based? (Regular or Irregular)
  7. Pitch Day
  8. Office Space
  9. Powered by
  10. Support Offered?
  11. Launch year
  12. Focus Areas
  13. General Description
  • Also included a variety of data regarding the host company

Review

  • Solid list for corporate accelerators that also includes a variety of information about the accelerator, the cohorts, etc. Some of the entries are international accelerators, however, so we need to filter them out
  • Only limited to 72 accelerators from major companies

Source: https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json

  1. This source is a .json file from the previous database
  2. After placing into TextPad, replaced each space with a ###, replaced each new line with a tab, and replaced each ### with a new line. Ultimately returned 80 results
  • From the file, the .json includes:
  1. NAICS and NAICS sector
  2. Classification
  3. Sector Description
  4. Term
  5. Goal
  6. Partner
  • Also includes most of the information from the previous source, since they are undoubtedly linked
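
As an alternative to the find-and-replace steps in TextPad, a .json file can be read directly with Python's standard `json` module. The sketch below assumes the file is a JSON array of objects, which has not been verified against the actual Accelerators.json; the `SAMPLE_JSON` records, their field names, and the function names are invented for illustration.

```python
import json

# Invented records for illustration; the real file may be structured differently.
SAMPLE_JSON = """
[
  {"name": "Example Accelerator", "company": "Example Corp", "term": "12 weeks"},
  {"name": "Another Accelerator", "company": "Another Corp"}
]
"""


def summarize(json_text):
    """Return the number of records and the sorted set of field names."""
    records = json.loads(json_text)
    fields = sorted({key for record in records for key in record})
    return len(records), fields


def to_tab_delimited(json_text):
    """Convert the records to tab-delimited text with a header row."""
    records = json.loads(json_text)
    _, fields = summarize(json_text)
    lines = ["\t".join(fields)]
    for record in records:
        # Missing fields are left blank rather than dropped.
        lines.append("\t".join(str(record.get(f, "")) for f in fields))
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize(SAMPLE_JSON))
    print(to_tab_delimited(SAMPLE_JSON))
```

The tab-delimited output follows the same convention as the other text files on this page.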

Review

  • Another solid list for corporate accelerators with some more information, but ultimately very similar to the previous source.

Source: https://www.quora.com/Where-can-I-find-a-comprehensive-list-of-startup-incubators-and-accelerators-in-the-US

  1. Since we already looked at the first listed source (seed-db), I clicked on the second link "(by Robert Shedd) http://blog.shedd.us/321987608/" which took me to a page headed "Help for Startups! – A semi-complete list of startup accelerator programs" created by a blogger, Robert Shedd
  2. List included 102 entries by the blogger, each of which does look like an accelerator
  • Upon immediate overview, noticed many results from previous sources were missing. Immediately noticed lack of "OwlSpark", the accelerator from Rice.
  • Shedd only offers us the accelerator name plus its URL

Review

  • Nice list to cross-reference with other sources but does not offer much new insight compared to more powerful engines such as seed-db

List of Sources Obtained from Various Google Searches

Summary: These accelerators are taken from a specific Google search rather than a list. The idea is to compile a list of Google searches that return relevant results of accelerators. This will aid in the creation of a future web crawler.

From "Location + Accelerator" (only individual results, not lists)

Houston Accelerators

  • Examples of single accelerators found
  1. TMCx: http://www.tmc.edu/innovation/innovation-programs/tmcx/
  2. RED labs: http://redlabs.uh.edu/
  3. SURGE accelerator: https://kirkcoburn.com/
  4. OwlSpark: http://owlspark.com/
  5. NextHIT: http://www.houstonhealthventures.com/nexthit-accelerator-program-application/

Los Angeles Accelerators

  1. Amplify: http://amplify.la/
  2. Y Combinator: https://www.ycombinator.com/
  3. Chicklabs: https://www.chicklabsllc.com/
  4. Disney Accelerator: https://disneyaccelerator.com/
  5. Launchpad: https://launchpad.la/

New York Accelerators

  1. DreamIT Ventures: http://www.dreamit.com/#meaningful-experience
  2. Women Innovate Mobile: http://www.wim.co/
  3. Techstars NYC: http://www.techstars.com/programs/nyc-program/
  4. Entrepreneurs Roundtable: http://eranyc.com/
  5. FirstGrowthVC: http://venturecrush.com/fg/
  6. New York Digital Health Accelerator: http://digitalhealthaccelerator.com/
  7. Grand Central Tech: http://www.grandcentraltech.com/
  8. Accelerator Corp: http://www.acceleratorcorp.com/
  9. New York Startup Lab: http://nystartuplab.com/

Review

  • Some locations return more viable results for a similar sample size. For example, out of the first 20 results, New York returned 9 valid accelerators, whereas Los Angeles and Houston each returned 5, so New York yielded 80% more. Some optimization may come from identifying which locations return more accelerators per search.

From "State+Accelerator+List"

New York Accelerator List

California Accelerator List

Texas Accelerator List

Colorado Accelerator List

Washington Accelerator List

Oregon Accelerator List

Notes:

  • Seed-DB appears for almost all of the search results
  • Acceleratorinfo appears for most of the search results
  • There are multiple cumulative reports of incubators per location, but not for accelerators
  • Most regionalized accelerator lists are either an article or a ranking of a particular number of accelerators in the area
  • Many results returned nationally ranked lists of accelerators, such as the Forbes list of "Top Accelerators" or something along the lines of "Best Accelerators in the US". The connection is that perhaps one accelerator mentioned on the list may be located within the searched state.
  • There are also a few results for actual particle accelerators that must be sorted out (e.g. the Superconducting Super Collider)
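The last note could be handled with a simple keyword filter on the result title and snippet. This is only a sketch: the term list is a guess at what separates physics results from startup programs and would need tuning against real search results.

```python
import re

# Hypothetical terms that suggest a physics result rather than a startup program.
PHYSICS_TERMS = re.compile(
    r"\b(particle|collider|synchrotron|cyclotron|linear accelerator|physics)\b",
    re.IGNORECASE,
)


def is_physics_result(title, snippet=""):
    """True if the search result looks like it is about a particle accelerator."""
    return bool(PHYSICS_TERMS.search(f"{title} {snippet}"))


def drop_physics_results(results):
    """Keep only (title, snippet) pairs that do not match the physics terms."""
    return [r for r in results if not is_physics_result(*r)]
```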

Found through Google searches on accelerators found previously

Found from googling YLE Media Startup Accelerator

Neither of these has had its entries added to the list of accelerators.

Individual Accelerator Evaluations

Summary: The purpose of this section is to create instructions for each accelerator on how to find cohort information from their URLs. Along with specific instructions for obtaining the cohorts for each accelerator chosen, there should be a list of easy-to-obtain and relevant statistics regarding the accelerator, such as information about its team, location, etc. The variable statistics list is cumulative, whereas the cohort directions are unique per the accelerator.

Accelerators Chosen (Format = Name (source))

  1. Blue Startups (http://www.acceleratorinfo.com/see-all.html)
  2. Launchpad LA (http://www.acceleratorinfo.com/see-all.html)
  3. Y Combinator (http://www.seed-db.com/accelerators)
  4. FlashPoint (http://www.seed-db.com/accelerators/all)
  5. Prosper Accelerator (https://www.f6s.com/programs?type)
  6. Axel Springer Plug and Play (http://www.axelspringerplugandplay.com/)
  7. Techstars (http://www.seed-db.com/accelerators)
  8. Startmate (http://www.seed-db.com/accelerators)
  9. Capital Factory (http://blog.shedd.us/321987608/)
  10. OwlSpark (Google search: "Houston + accelerators")

Accelerator: Blue Startups (http://bluestartups.com/)

Finding the cohort:

  1. Navigated to "Track Record" page under the "Home" tab; found total number of graduated cohorts to be 7
  2. Navigated to "Portfolio" tab. Tab includes list of all seven graduated cohorts along with companies emerging from each one. Each cohort is listed under a separate page (ex. "Cohort 1", "Cohort 2", etc) and at the bottom of each cohort page, there is a link to the other 6. Each company has a short description along with its URL.
  3. An "Alumni News" page at the bottom of "Portfolio" includes articles pertinent to graduated startups.
  4. Unfortunately does not include the date and year of each cohort class, but perhaps could cross-reference with other sources.

Accelerator: Launchpad LA (http://launchpad.la/)

Finding the cohort:

  1. Navigated to "Companies" in the top of the homepage
  2. "Companies" returns all companies backed by Launchpad LA based on their class year and number (cohort)
    • Also sorted by active startups vs. inactive startups
  3. At the bottom of the "Companies" tab, there is a statistical layout returning values for the number of companies started by Launchpad during its time as an accelerator (2012-present), as well as the total funding funneled into the accelerator.

Accelerator: Y Combinator (http://www.ycombinator.com)

Finding the cohort:

  1. Scrolled down on the home page and clicked on a link entitled "See all companies".
  2. Navigated to a drop down menu named "All Batches", and clicked on it to expand the list.
  3. List is made up of dates ranging from 2005-2016, and these dates return lists of launched companies including most but not all of their URLs, as well as their launch year.

Accelerator: Flashpoint (http://flashpoint.gatech.edu/)

Finding the cohort:

  1. On upper right corner after animation, there is a tab sign which lets you navigate to a page labeled "Teams"
  2. The "Team" page has each batch of companies emerging from Georgia Tech, although it does not include the dates or cohorts of these companies. For example, "Batch 1" at the top of the page just lists the companies in the batch without URLs or any additional information.
  3. On the "Application" page on the tab near the top, there is information regarding Batch 7, which begins early 2017. Suggests that batch 6 either ended spring 2016 or fall 2016.

Accelerator: Prosper Women Entrepreneurs (http://www.prosperstl.com)

Finding the cohort:

  1. Navigated to "Accelerator" tab and clicked "Companies" when prompted with the drop down menu.
  2. This tab returned all of the launched company logos which then redirected to the company's home page when clicked.
  3. No other relevant form of information such as date launched or cohort was included on this page.

Accelerator: Axel Springer Plug and Play (http://www.axelspringerplugandplay.com/)

Finding the cohort:

  1. Clicked on the "Companies" tab on the home page and was directed to the middle of the page which included a short list of current companies.
  2. Clicked on the "All Companies" link which returned a page filled with startup logos and brief descriptions of those startups. When clicked, each logo serves to redirect to that startup's home page.
  3. Companies were not sorted by cohort or in any other relevant way.

Accelerator: Techstars (http://www.techstars.com)

Finding the cohorts:

  1. Navigated to the "Accelerators" tab and clicked "Companies" in the drop-down menu.
  2. Firstly, this returns a table comprised of a long list of different classes from different areas separated by years.
  3. Upon scrolling down further, each of these classes is broken down by the startups that graduated from them. It also includes information such as how much was invested in each startup, as well as whether or not the startup was acquired, is active, or failed.

Accelerator: Startmate (http://www.startmate.com.au)

Finding the cohorts:

  1. Navigated to the "Startups" tab, which returned a page of all startups that have graduated from Startmate.
  2. Startups are separated by year of graduation, and each company is linked on this page.
  3. It appears as if each year, 1 cohort is taken through the accelerator.

Accelerator: Capital Factory (https://capitalfactory.com/accelerate/)

Finding the cohorts:

  1. Navigated to the startups tab, which returned a long list of companies that were accelerated by Capital Factory.
  2. Each logo for the startups served as a link to their respective websites.
  3. There was no evidence or mention of any cohorts.

Accelerator: OwlSpark (http://entrepreneurship.rice.edu/accelerator/)

Finding the cohorts:

  1. Navigated to the "Startup Teams" tab, which returned a page that included links to 4 "Classes".
  2. Each class link (Class 1, Class 2, Class 3, Class 4) returned links to each startup that graduated from the program.
  3. These classes signify cohorts.

List of Promising Variables

  • Key People (founders, lead entrepreneurs, strategists, etc.)
  • Total number of launched companies
  • A FAQ for application details, accelerator vision, and
  • Funds raised per company (average)
  • Features offered by accelerator (perks, space, tools, etc)
  • General events hosted by the accelerator
  • (Success) stories for graduated start-ups

E-R Diagram (in list form) for Identifying Attributes to Pull from Accelerators

Summary: I will look at different entities within the accelerator page (e.g. accelerators, cohorts, founders) and then find potential attributes that can be codified from those entities. Along with each attribute, I list a potential method for pulling it.

Format:

Entity
  • Attribute - Possible sources/ways to get

Ed: "Be creative with finding new attributes to pull!"

List

Accelerators

  • Accelerator Name - Website, external database
  • Contact Form - General contact section in each website
  • Industry focus - can be pulled from description
  • Description - pulled from website itself
  • Takes equity? - Database or from "about" page
  • Non-profit? - Database
  • URL - Already have way of obtaining
  • DNS Registration Date - Already have way of obtaining
  • Address - Google Maps, maybe the website
  • Founding Date - Google Maps, website, server registration

Accelerators (1) has (n) Features

Features

  • Mentorship? - Description in website
  • Space Offered - Google Maps, Website description
  • Partnerships - Angel list, Same section as mentorship or events
  • Hosted Events - Calendar

Accelerators (1) has (n) Founders

Founders

  • Name - Founders or Team Page
  • Title - Directly underneath or next to name
  • PhD? - Biography, webpage under name
  • Serial - Biography
  • Link back to "Accelerator Name" in Accelerators

Founders (n) has (n) Ventures

Ventures

  • Other Companies - Biography, webpage
  • Previous Companies - Biography
  • Net Worth - Forbes, Biography
  • Link back to "Name" in Founders

Accelerators (1) has (n) Cohorts

Cohorts

  • Date + Accelerator = Cohort ID - Database or Website
  • Number of Startups - Website, count from Startups
  • Cohort Number - Categorization on website, external database
  • Link back to "Accelerator Name"

Cohorts (1) has (n) Startups

Startups

  • Names - Website, external database
  • State of Inc - Angel List
  • URL - Angel List, website
  • Founding Date - Registration database, Angel List
  • Industry - startup description
  • Founding Location - Angel List
  • Current Location - Angel List
  • VC Raised to Date - SDC Platinum
  • Angel Funds Raised to date - Angel List
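One way to hold part of this entity list in code is a set of nested dataclasses. The sketch below covers only Accelerators, Cohorts, and Startups with a few of the attributes above; the class and field names are my own, and the Cohort ID is built as "Date + Accelerator" as described in the list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Startup:
    name: str
    url: str = ""


@dataclass
class Cohort:
    accelerator_name: str  # link back to "Accelerator Name"
    date: str              # free-form, e.g. a year or a season
    startups: List[Startup] = field(default_factory=list)

    @property
    def cohort_id(self):
        # "Date + Accelerator = Cohort ID"
        return f"{self.date} {self.accelerator_name}"

    @property
    def number_of_startups(self):
        # "Number of Startups" counted from the Startups entity
        return len(self.startups)


@dataclass
class Accelerator:
    name: str
    url: str = ""
    cohorts: List[Cohort] = field(default_factory=list)  # (1) has (n) Cohorts
```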

Variables which Distinguish Accelerator Websites

  • The word "Accelerator"
    • This word appears at least one time on the home page of the vast majority of accelerator websites. The word "Accelerator" appears either as a link to another page on the website or in a title on the homepage of the website. Not many other websites contain this word on their homepage, especially not if one Googles something generic such as "Accelerators in the US".
  • Fixed Term
    • Accelerators normally work with their cohorts for 3 months. This is a major factor which differentiates between an accelerator and any other member of a startup ecosystem. If on their website they mention either "3 months" or "12 weeks", it is extremely likely that the website belongs to an accelerator.
  • Cohorts, Portfolio, Class, or Companies
    • This is a potential variable that could link the websites of many different accelerators. The problem with the word "portfolio" is that it is also used by numerous venture capital firms, which could cause complications when attempting to pull only the sites of accelerators from a Google search. The word "cohort", however, would have an extremely high probability of identifying the website as belonging to an accelerator. The words "class" and "companies" are promising but do not offer certainty.
  • Equity, Investment
    • Although by itself, equity does not mean much, when paired with any of these other terms, it could potentially point to an accelerator. Most accelerators take equity in the form of common stock (6-8%), or they will ask for some alternate form of stake in the company.
  • Education and Mentorship
    • Accelerators differ from incubators and angel investors in that they emphasize the education of the potential startup. They offer advice and intense mentorship from more experienced entrepreneurs within their staff, as well as many networking opportunities with the outside world. This variable is more difficult to find on the website of the accelerator, but I believe that if the website includes numerous keywords such as "education", "mentorship", or "networking opportunities", it would be somewhat safe to assume that the website is owned by an accelerator.
  • Demo Day
    • This variable does not have tremendous potential in terms of crawling websites, but I feel that it is worth mentioning. Most accelerators "graduate" their cohorts with a demo day, which is a day when the startups present their company to potential investors. If the website contains the words "demo day", which is fairly uncommon, it could be a good source of accelerator identification.

A website that matches a combination of several of these variables very likely belongs to an accelerator.
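A rough sketch of how these variables could be checked against the text of a page. The regular expressions and the threshold of two matching groups are assumptions, not tested values; "portfolio", "class", and "companies" are left out because the notes above flag them as ambiguous.

```python
import re

# Keyword groups taken from the variables listed above.
KEYWORD_GROUPS = {
    "accelerator": [r"\baccelerator\b"],
    "fixed_term": [r"\b(3|three)[- ]months?\b", r"\b(12|twelve)[- ]weeks?\b"],
    "cohort": [r"\bcohorts?\b"],
    "equity": [r"\bequity\b"],
    "mentorship": [r"\bmentorship\b", r"\bmentors?\b"],
    "demo_day": [r"\bdemo day\b"],
}


def matched_groups(page_text):
    """Return the names of the keyword groups that appear in page_text."""
    text = page_text.lower()
    return {name for name, patterns in KEYWORD_GROUPS.items()
            if any(re.search(p, text) for p in patterns)}


def looks_like_accelerator(page_text, min_groups=2):
    """True if at least min_groups different keyword groups match."""
    return len(matched_groups(page_text)) >= min_groups
```

For example, the Orange Fab homepage line "We're a 3-month accelerator program" matches both the "accelerator" and "fixed_term" groups.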

Comprehensive List of Accelerators

All text files saved in "Accelerators" project on the McNair RPD.

  • Acc.Info: 190
  • SeedDB: 240
  • SARP: 59
  • Corp: 79
  • Total: 568 results

After removing duplicates and locations: 363 results
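The duplicate-removal step could look roughly like the following. This sketch only handles duplicates that differ in case or whitespace; how location entries were removed is not recorded here, so that part is not covered.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def merge_unique(*source_lists):
    """Merge name lists, keeping the first spelling seen for each normalized name."""
    seen = {}
    for names in source_lists:
        for name in names:
            seen.setdefault(normalize(name), name.strip())
    return list(seen.values())
```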

This total does not include F6S, which returned 1170 results, of which only roughly 300 were accelerators. We created a crawler to sift through the webpages and parse the HTML so we could identify the accelerators. Program and HTML saved on the Desktop.

Randomly Chosen Accelerators

  • TLabs
  • BetaSpring
  • The Unilever Foundry
  • AIA Accelerator
  • R/GA Accelerator
  • Zeroto510
  • Hub:raum
  • Orange Fab
  • Furnace
  • Launch Chapel Hill

Determining whether or not these are accelerators

Googled name of Accelerator and clicked on the first link

Looked for Variables which Distinguish Accelerator Websites

  • TLabs: Homepage states: "Leading Indian Tech Accelerator"; TLabs is an accelerator, but it is located in India.
  • Betaspring: Under the "About Betaspring" tab, it states that "Betaspring was among the first ten startup accelerators to launch worldwide".
  • The Unilever Foundry: Does not claim to be an accelerator, nor does it have information on the website about cohorts. This name was pulled from the source Corporate Accelerators.
  • AIA Accelerator: The word "accelerator" is included in the name. Under the "Overview" tab, it states that startups have received mentorship.
  • R/GA Accelerator: Under the "Overview" tab it states that the "R/GA Accelerator is designed for startups and... it is a three month, immersive, mentorship driven program".
  • Zeroto510: Website contains a "Portfolio Companies" tab which divides up the companies into cohorts. This identifies Zeroto510 as an accelerator.
  • Hub:raum: Offers accelerator and incubator programs; however, none are located in North America.
  • Orange Fab: States on the main page that "We're a 3-month accelerator program".
  • Furnace: "About" tab states that Furnace is "an innovative startup accelerator designed to form, incubate, and launch new companies". Concludes with a Demo Day
  • Launch Chapel Hill: Homepage states that they are "a startup accelerator". Also included on the homepage is a line that states "Applications for Cohort 7 are now open".

7/10 are accelerators located in the US.

2/10 are accelerators not located in the US.

1/10 is not an accelerator.

Steps for Extracting Cohort Information

  • TLabs: Clicked on the "Startup" tab and located a drop down menu entitled "Showing Startups from:". This menu separates startups into Batches ranging from 1-9. These batches are cohorts.
  • Betaspring: This website does not have a "Companies" or "Startups" tab. I clicked on their "Who" tab and noticed that within this section were two links called "Our portfolio" and "Our companies" which both linked to the same place. This place contained a list of the startups that Betaspring has funded, as well as links to each of the startup websites. The list was not separated into cohorts.
  • The Unilever Foundry: Does not have a "Startups" or "Companies" link on the website.
  • AIA Accelerator: Clicked on the "Startups" tab which returned a page with 5 companies and a bit of information on each of these companies. Also included the URL to each startup. However, the companies were not separated into cohorts, probably because there are so few of them.
  • R/GA Accelerator: Clicked on the "Alumni" tab and navigated down the webpage. Startups are separated by class, which means cohort in this case. Startup info contains link to demo day presentation as well as the startup url.
  • Zeroto510: Hovered over the "About Us" drop down menu and clicked on the "Portfolio Companies" link. Startups are separated by cohort, one for each year, starting from 2013.
  • Hub:raum: Clicked on the "Portfolio" tab. Directed to a page with many names of startups, as well as a brief description of what their company is about. Also includes a link to each startup's website. Startups are not separated into cohorts, but rather by investment by location, current participants, and alumni.
  • Orange Fab: Clicked on the "Startups" tab and was directed to a different page. Startups are not only separated into cohorts named "Seasons", but they are also separated by industry.
  • Furnace: Clicked on "Portfolio" tab, but unfortunately the website is broken and it returned an error in code.
  • Launch Chapel Hill: Clicked on the "Ventures" tab and was directed to a page in which all startups were separated into cohorts, and a brief description of the startup was provided underneath their logo.

Code

The directory for all data related to this project is located in:

E:\McNair\Projects\Accelerators

F6S Web Crawler

This is a Python script using the Selenium library that retrieves the HTML content of each page in F6S's North American Accelerator search results. The script is located in:

E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs 

The script is titled f6s_crawler_gentle.py

When run, the script visits the F6S search page for North American Accelerators and begins retrieving the HTML of each page in that search list. NOTE: Timing must be spaced out between all interactions with the browser. F6S has Captcha, and the program will fail if the site receives too many hit requests, or has any inkling that it is being probed by a bot.
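The timing note can be illustrated with a small wrapper that pauses a random interval between page loads. This is not the code in f6s_crawler_gentle.py: the fetch argument stands in for whatever retrieves a page (for example a wrapper around a Selenium driver), and the delay bounds are guesses rather than values known to avoid the Captcha.

```python
import random
import time


def crawl_gently(urls, fetch, min_delay=5.0, max_delay=15.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between calls.

    fetch is any callable that returns the page HTML for a URL.
    sleep is injectable so the pacing can be checked without waiting.
    """
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Randomized spacing so requests do not arrive at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        pages[url] = fetch(url)
    return pages
```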

The Accelerator HTML files are stored in:

E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs\Accelerator_HTML_files

The Accelerator HTML files stored as text files are stored in:

E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs\Accelerator_HTML_files_text

F6S Parser

The next step is to take the HTML files retrieved by the crawler and to parse them for necessary information. This parser should also determine whether or not the site is an accelerator site.

The code for the parser is located in

E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs

It is titled f6s_parser.py

To run the code, open the file in Komodo and press play. If running from the command line, change to the correct directory and run the following command:

python f6s_parser.py

The list of accelerators that passed through the parser is in the same directory:

E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs

The tab delimited text file is named AcceleratorList. The file contains the names of the accelerators that had the keywords listed in the file. Also, the file contains the run dates and location of the accelerator if it was listed on the f6s page.
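For reference, a keyword filter that writes a tab-delimited list can be sketched as below. This is not f6s_parser.py itself: the column names, the default keyword, and the assumption that each saved page is a .txt file named after the accelerator are all illustrative, and the run dates and location columns are omitted.

```python
import csv
from pathlib import Path


def write_accelerator_list(html_dir, out_path, keywords=("accelerator",)):
    """Write one row per file in html_dir whose text contains any keyword.

    Returns the number of files that matched.
    """
    rows = []
    for path in sorted(Path(html_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        hits = [k for k in keywords if k in text]
        if hits:
            rows.append([path.stem, ",".join(hits)])
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["name", "keywords_found"])
        writer.writerows(rows)
    return len(rows)
```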


F6S API

F6S has an API, but we have had no success getting a key to the API. The link to get a key to the API is on this page.

I (Peter) have emailed F6S to ask for a key directly at support@f6s.com. As of the end of the Fall 2016 Semester, they have not responded.

FUN FACT (MASS-RENAME FILES USING WINDOWS POWER SHELL):

The following command allowed me to append ".txt" to all files in a folder once in the proper directory:

Get-ChildItem * | Rename-Item -NewName { $_.name + '.txt'}

To change file formats, Microsoft suggests:

Get-ChildItem *.txt | Rename-Item -NewName { $_.name -Replace '\.txt', '.log'}
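The same rename can be done from Python with pathlib, which may be convenient since the rest of the project code is Python. A minimal sketch; unlike the first PowerShell command above, it skips files that already end in the suffix.

```python
from pathlib import Path


def append_suffix(folder, suffix=".txt"):
    """Append suffix to every regular file in folder that does not already end in it.

    Returns the new file names.
    """
    renamed = []
    # Build the full listing first so renames do not disturb the iteration.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix != suffix:
            target = path.with_name(path.name + suffix)
            path.rename(target)
            renamed.append(target.name)
    return renamed
```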

Final Data

The Parser for parsing the text files of accelerator data is located in:

E:\McNair\Projects\Accelerators\Code+Final_Data

The Parser for parsing the cohort files of accelerator data is also located in:

E:\McNair\Projects\Accelerators\Code+Final_Data

This folder contains the Python parsers. The Final_data folder contains the tab-delimited text files of parsed data. final_accelerator_data.txt contains the generalized data saved in .txt files and final_cohort_data.txt contains the cohort data saved in .cohort.txt files.

All the files entitled accelerator_data are subsets of the final_accelerator_data.txt file, but each file contains only the accelerators that matched to the flag specified in the file title.

find_headers.py finds a set of the headers for all the cohort files from the seed list project.
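The idea can be sketched as follows, assuming each cohort file is tab-delimited with its column names on the first line. The actual file layout and the glob pattern are assumptions, and this is not the code in find_headers.py.

```python
from pathlib import Path


def collect_headers(cohort_dir, pattern="*.cohort*"):
    """Return the set of column names found on the first line of each matching file."""
    headers = set()
    for path in Path(cohort_dir).glob(pattern):
        with open(path, encoding="utf-8", errors="ignore") as f:
            first_line = f.readline().rstrip("\r\n")
        headers.update(h.strip() for h in first_line.split("\t") if h.strip())
    return headers
```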

Google SiteSearch

E:\McNair\Projects\Accelerators\Google_SiteSearch

This folder contains code for a google search parser. The script sitesearch.py will search for a queried company and return a likely web address for that company.

Way Back Machine Parser

E:\McNair\Projects\Accelerators\Code+Final_Data\wayback_machine.py

This script takes URLs and returns a timestamp for the oldest documented webpage under that URL courtesy of the Way Back Machine Archive.
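A sketch of one way to get the earliest capture, assuming the Internet Archive's CDX search endpoint with JSON output. Whether wayback_machine.py uses this endpoint is not recorded here. The parsing is kept separate from the network call so it can be checked offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine CDX server.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(site_url):
    """Build a CDX query asking for the single earliest capture of site_url."""
    return CDX_ENDPOINT + "?" + urlencode(
        {"url": site_url, "output": "json", "limit": 1})


def earliest_timestamp(cdx_rows):
    """Return the timestamp (YYYYMMDDhhmmss) of the first capture row, or None.

    With output=json the first row is a header naming the columns.
    """
    if len(cdx_rows) < 2:
        return None
    header, first = cdx_rows[0], cdx_rows[1]
    return first[header.index("timestamp")]


def oldest_snapshot(site_url):
    """Query the CDX server and return the earliest capture timestamp, or None."""
    with urlopen(cdx_query_url(site_url), timeout=30) as resp:
        body = resp.read().decode("utf-8").strip()
    return earliest_timestamp(json.loads(body) if body else [])
```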

Process Locations

E:\McNair\Projects\Accelerators\Code+Final_Data\process_locations.py

This script takes a physical address and converts it into latitude and longitude coordinates. Should be used in conjunction with the Enclosing Circle program to find the concentration of accelerators.

E:\McNair\Software\CodeBase\EnclosingCircle.py
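Once addresses are converted to coordinates, a quick first look at concentration is to count accelerators within a fixed distance of a point. The sketch below uses the standard haversine formula; it is a simple stand-in for illustration and is not the algorithm in EnclosingCircle.py.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def count_within(center, points, radius_km):
    """Count how many (lat, lon) points lie within radius_km of center."""
    return sum(1 for p in points if haversine_km(*center, *p) <= radius_km)
```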

Kauffman Foundation Incubator Proposal Information

Institutions

Summary: F6S, Crunchbase, seed-db

Tools: Matcher - used to match lists of potential accelerators with our current list to identify duplicates/new matches (E:\McNair\Projects\Accelerators)
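To illustrate what this kind of matching involves, here is a small fuzzy-matching sketch using difflib from the Python standard library. It is not the Matcher tool itself, and the 0.85 cutoff is an arbitrary starting point.

```python
import difflib


def split_matches(candidates, current, cutoff=0.85):
    """Split candidates into (duplicates, new) by fuzzy name match against current.

    duplicates maps each matched candidate to the existing name it resembles.
    """
    lowered = {name.lower(): name for name in current}
    duplicates, new = {}, []
    for cand in candidates:
        hit = difflib.get_close_matches(cand.lower(), list(lowered), n=1, cutoff=cutoff)
        if hit:
            duplicates[cand] = lowered[hit[0]]
        else:
            new.append(cand)
    return duplicates, new
```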

F6S

F6S WebCrawler and F6S Parser - E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs

CrunchBase

CrunchBase 2013 Snapshot (All Organizations)- E:\McNair\Projects\Accelerators\organizations.xls

CrunchBase 2013 Snapshot (Potential Accelerators)- E:\McNair\Projects\Accelerators\organizations.accdb under "Potential Accelerators query"

  • Obtained using keyword matches in the descriptions of the potential accelerators.

CrunchBase 2013 Snapshot (New Verified Accelerators) - E:\McNair\Projects\Accelerators\New CrunchBase Accelerators.xls

We have the Crunchbase 2013 Snapshot, which provided lots of new data on accelerators and incubators. We would love to use the Crunchbase API to get a current database snapshot that we could use to cross-reference companies and add newly formed accelerators and incubators.

AngelList

seed-db

Obtained through www.seed-db.com/accelerators

Global Accelerator Network (GAN)

GAN Parser- E:\McNair\Projects\Accelerators\Web Scraping for Accelerators\scrapeaccel.py

GAN Data- E:\McNair\Projects\Accelerators\Web Scraping for Accelerators\GAN Accelerator Data

  • Contains: Company Name, # of Companies Range, % of Companies Funded, Funding Raised by Companies, Employee Range, Exit Funding, Exit Date, Total Company Funding Raised, # of Mentors Range, % Equity, Location, Minimum Seed Capital Investment

Cohorts

  • Cohorts obtained manually
  • All Cohort txt files are saved under "E:\McNair\Projects\Accelerators\Data"
  • cohort file name = (accelerator name).cohort
  • Most updated Accelerator cohort data: E:\McNair\Projects\Accelerators\Cleaned Cohort Data.xls

Automation for obtaining cohorts??

Other Information

Summary: Whois Parser, Geocode, Tools to determine industry, etc

Whois Parser

  • Retrieves and parses Whois information. Specifically, takes a file with a column of domain names and populates the corresponding columns with information from the WhoIs API.
  • Often used to obtain locations.

Geocode

Input: Company Address

Output: Directional Coordinates

  • Used to obtain the locations of different Accelerators and Cohort companies.

SDC Platinum Pull

Used to obtain funding information and match companies that have gotten funding with companies that are Accelerator cohorts.

Desired Information/Variables

  • Key People (founders, lead entrepreneurs, strategists, etc.)
  • Total number of launched companies
  • A FAQ for application details, accelerator vision, and
  • Funds raised per company (average)
  • Features offered by accelerator (perks, space, tools, etc)

Desired Tools/Information

Automating the Process of Obtaining Cohorts

  • Automating this process would save a lot of time and really progress the project.

Obtaining More Details on Accelerators

  • Having the kind of thorough information on industry, companies, funding, location, exits, mentors, leadership, that we got for the GAN companies would be fantastic.

List of Alive/Dead Accelerators

This is a dream but would be very helpful