Latest revision as of 13:44, 21 September 2020
Accelerator Seed List (Data) | |
---|---|
Project Information | |
Has title | Accelerator Seed List (Data) |
Has owner | Shrey Agarwal, Matthew Ringheanu, Veeral Shah, Connor Rothschild |
Has start date | Fall 2016 |
Has deadline date | |
Has keywords | Accelerators, Data |
Has project status | Subsume |
Is dependent on | Industry Classifier |
Dependent(s): | Accelerator Data, Demo Day Page Google Classifier |
Subsumed by: | U.S. Seed Accelerators |
Has sponsor | McNair Center |
Has project output | Data, Tool |
Copyright © 2019 edegan.com. All Rights Reserved. |
Contents
- 1 Current Work
- 2 Funded By Accelerators
- 3 End of Semester Report
- 4 Overview
- 5 Current Project Write-Up
- 6 Current To Do
- 7 End of Semester Notes
- 8 Data Collection Notes
- 9 List of Accelerators
- 10 Project Summary
- 11 Sources
- 12 Source Evaluations
- 12.1 SOURCE: Crunchbase
- 12.2 Source: http://www.acceleratorinfo.com/see-all.html
- 12.3 Source: http://www.seed-db.com/accelerators
- 12.4 Source: http://www.seed-db.com/accelerators
- 12.5 Source: https://www.f6s.com/programs?type
- 12.6 Source: http://gust.com/usa-canada-accelerator-report-2015/
- 12.7 Source: https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/
- 12.8 Source: https://www.corporate-accelerators.net/database/
- 12.9 Source: https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json
- 12.10 Source: https://www.quora.com/Where-can-I-find-a-comprehensive-list-of-startup-incubators-and-accelerators-in-the-US
- 13 List of Sources Obtained from Various Google Searches
- 14 Individual Accelerator Evaluations
- 14.1 Accelerators Chosen (Format = Name (source))
- 14.2 Accelerator: Blue Startups (http://bluestartups.com/)
- 14.3 Accelerator: Launchpad LA (http://launchpad.la/)
- 14.4 Accelerator: Y Combinator (http://www.ycombinator.com)
- 14.5 Accelerator: Flashpoint (http://flashpoint.gatech.edu/)
- 14.6 Accelerator: Prosper Women Entrepreneurs (http://www.prosperstl.com)
- 14.7 Accelerator: Axel Springer Plug and Play(http://www.axelspringerplugandplay.com/)
- 14.8 Accelerator: Techstars (http://www.techstars.com)
- 14.9 Accelerator: Startmate (http://www.startmate.com.au)
- 14.10 Accelerator: Capital Factory (https://capitalfactory.com/accelerate/)
- 14.11 Accelerator: OwlSpark (http://entrepreneurship.rice.edu/accelerator/)
- 14.12 List of Promising Variables
- 15 E-R Diagram (in list form) for Identifying Attributes to Pull from Accelerators
- 16 Code
- 17 Kauffman Foundation Incubator Proposal Information
Current Work
As of 05/21/2018, the Google Sheet workbook has been downloaded to the E drive. The resulting Excel workbook is saved at E:\McNair\Projects\Accelerators\Summer 2018\Accelerator Master Variable List.xlsx and is now the master file.
Google Master Sheet: https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=0
- Cross-reference the sheet with data from Peter's old accelerator consolidation files ("accelerator_data_noflag" and "accelerator_data" in "All Relevant Files") and fill in missing data
- Variables that are definitely NOT in these two files:
- Cohort Breakout?
- Subtype
- Designed for Students?
- Campuses
- Stage
- Software Tech
- What stage do they look for?
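A sketch of the cross-referencing step, assuming the master workbook and Peter's two files have each been exported to tab-delimited text that shares an accelerator-name column. The column name "Accelerator" and the helpers load_rows, normalize_name and fill_missing are hypothetical, not part of the existing workflow. The sketch only fills blank master cells and never overwrites a value already there, so manual edits in the master survive.

```python
import csv
from typing import Dict, List

Row = Dict[str, str]


def load_rows(path: str, delimiter: str = "\t") -> List[Row]:
    """Read an exported sheet (hypothetical export) into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter=delimiter))


def normalize_name(name: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences still match."""
    return " ".join(name.lower().split())


def fill_missing(master: List[Row], old: List[Row], key: str = "Accelerator") -> List[Row]:
    """Return copies of the master rows with blank cells filled from the old file.

    Only columns that already exist in the master are filled, and only when the
    master cell is blank and the old cell is not.
    """
    old_by_name = {normalize_name(r[key]): r for r in old if r.get(key)}
    filled = []
    for row in master:
        new_row = dict(row)
        match = old_by_name.get(normalize_name(row.get(key) or ""))
        if match:
            for column, value in match.items():
                master_blank = not (new_row.get(column) or "").strip()
                if column in new_row and master_blank and (value or "").strip():
                    new_row[column] = value
        filled.append(new_row)
    return filled
```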
TODO: McNair/Projects/Accelerators/Fall 2017/unfound_founders.txt
A 0 means we don't have founder data for that accelerator. Specs: a tab-delimited text file with the following fields:
Accelerator, First Name, Last Name, LinkedInURL (if possible)
Getting the LinkedIn URL will ensure accuracy, but the file will work without it.
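The unfound_founders.txt spec above could be produced and read back with Python's csv module, as in this minimal sketch. The Founder type, the helper names, and the header row are assumptions (the spec does not say whether the file has a header); a missing LinkedIn URL is written as an empty field.

```python
import csv
import io
from typing import Iterable, List, NamedTuple, Optional, TextIO

FIELDS = ["Accelerator", "First Name", "Last Name", "LinkedInURL"]


class Founder(NamedTuple):
    accelerator: str
    first_name: str
    last_name: str
    linkedin_url: Optional[str] = None  # optional, but helps disambiguate founders


def write_founders(founders: Iterable[Founder], out: TextIO, header: bool = True) -> None:
    """Write founders as tab-delimited rows in the field order given in the spec."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    if header:
        writer.writerow(FIELDS)
    for f in founders:
        writer.writerow([f.accelerator, f.first_name, f.last_name, f.linkedin_url or ""])


def to_tsv(founders: Iterable[Founder], header: bool = True) -> str:
    """Convenience wrapper returning the file contents as a string."""
    buf = io.StringIO()
    write_founders(founders, buf, header)
    return buf.getvalue()


def read_founders(src: TextIO, header: bool = True) -> List[Founder]:
    """Parse the tab-delimited file back; a blank LinkedInURL becomes None."""
    reader = csv.reader(src, delimiter="\t")
    if header:
        next(reader, None)
    founders = []
    for row in reader:
        if not row:
            continue
        row = row + [""] * (4 - len(row))  # tolerate a missing trailing URL column
        acc, first, last, url = row[:4]
        founders.append(Founder(acc, first, last, url or None))
    return founders
```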
- Shrey: Find "demo day" keywords so that we can search "AcceleratorName Year Keyword" and get back potential demo day pages.
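Once the keywords are settled, the queries could be generated mechanically. This sketch assumes one plain "AcceleratorName Year Keyword" string per combination; "demo day" is only a placeholder for the keyword list still being compiled, and demo_day_queries is a hypothetical helper.

```python
from itertools import product
from typing import Iterable, List

# Placeholder: the real "demo day" keyword list has not been finalized yet.
PLACEHOLDER_KEYWORDS = ("demo day",)


def demo_day_queries(accelerators: Iterable[str], years: Iterable[int],
                     keywords: Iterable[str] = PLACEHOLDER_KEYWORDS) -> List[str]:
    """Build one "AcceleratorName Year Keyword" query per combination."""
    return [f"{acc} {year} {kw}"
            for acc, year, kw in product(accelerators, years, keywords)]
```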
Accelerator Type project
The file to edit is called "Accelerator type list" and is located in the folder E:\McNair\Projects\Accelerators\Spring 2018\Grouping project of ListOfAccs. More systematic information and instructions are in "Instructions for Accelerator type project" in the same folder.
NOTE: until we get through all 270 accelerators, we will just categorize each accelerator into the following three categories as quickly as possible, with short notes in the "other info" column; once this is done, we will go back through the ones that aren't categorized and add notes to the "other info" column.
Type list:
- Private
- Corporate
- Academic
Note: if an accelerator is DEAD, note that here.
Other info:
- nonprofit? (y/n)
- Subtype abbreviations:
- S: social entrepreneurship initiative
- I: incubator
- A: angel group
- F: foreign
- C: located in a coworking space/hub/etc.
- V: part of a venture fund
- G: government funded/partnered
- T: international
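The type list and subtype codes above could be checked at entry time with a small validator. This is a sketch assuming subtype codes are entered as single letters in one cell, optionally separated by commas, slashes, semicolons or spaces; the names ACCELERATOR_TYPES, SUBTYPES, parse_subtypes and check_type are hypothetical.

```python
from typing import List

ACCELERATOR_TYPES = {"Private", "Corporate", "Academic"}

SUBTYPES = {
    "S": "social entrepreneurship initiative",
    "I": "incubator",
    "A": "angel group",
    "F": "foreign",
    "C": "in a coworking space/hub",
    "V": "part of a venture fund",
    "G": "government funded/partnered",
    "T": "international",
}


def parse_subtypes(cell: str) -> List[str]:
    """Split a cell such as "S, I" or "SI" into validated subtype codes.

    Separators are ignored, unknown letters raise ValueError so typos are
    caught early, and duplicate codes are dropped (first occurrence kept).
    """
    codes: List[str] = []
    for ch in cell.upper():
        if ch.isspace() or ch in ",/;":
            continue
        if ch not in SUBTYPES:
            raise ValueError(f"unknown subtype code {ch!r} in {cell!r}")
        if ch not in codes:
            codes.append(ch)
    return codes


def check_type(value: str) -> str:
    """Normalize and validate the main type column (Private/Corporate/Academic)."""
    normalized = value.strip().capitalize()
    if normalized not in ACCELERATOR_TYPES:
        raise ValueError(f"unknown accelerator type {value!r}")
    return normalized
```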
Note: subtypes (from individual text files in E:\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data) were only found for 23 of the 270 accelerators. These accelerators were initially intended to be removed from the master list. Remaining subtypes are currently being added.
The "other info" column can include international offices, founders, industries, org type, program duration, or other interesting, easily accessed variables. This additional information is especially important for accelerators that have no subtype abbreviation listed.
Steps to research an accelerator
1. Copy and paste the URL listed in the Accelerator type list file into Google. If the website is insufficient, try googling:
- the name of the accelerator
- the name of the accelerator + "crunchbase"
- the name of the accelerator + "nonprofit"
These searches sometimes lead to other helpful databases or news articles.
2. Record the following:
1) Academic/Corporate/Private
2) For Profit/Nonprofit. Sometimes this isn't directly stated but can be inferred from their description of, for example, their investment process. If they don't address this at all, it's probably For Profit.
3) Subtype (S, I, A, F, C, V, G, T)
4) Additional, easily accessed info. Item 4 is especially important if there's no subtype.
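Steps 1 and 2 can be captured as one record per accelerator. In this sketch, research_queries builds the three fallback searches from step 1, and ResearchRecord encodes two rules from step 2: an accelerator that never states its nonprofit status is treated as For Profit, and a record with neither a subtype nor other info is flagged for more research. The field names are illustrative, not the columns of the actual file.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def research_queries(name: str) -> List[str]:
    """Fallback searches from step 1, used when the listed website is insufficient."""
    return [name, f"{name} crunchbase", f"{name} nonprofit"]


@dataclass
class ResearchRecord:
    """One accelerator's entry after steps 1-2 (field names are illustrative)."""
    name: str
    org_type: str                      # Academic / Corporate / Private
    nonprofit_stated: Optional[bool]   # None when the site does not say either way
    subtypes: List[str] = field(default_factory=list)  # e.g. ["S", "I"]
    other_info: str = ""

    @property
    def nonprofit(self) -> bool:
        # Step 2, item 2: if nonprofit status is never addressed, assume For Profit.
        return bool(self.nonprofit_stated)

    @property
    def needs_more_info(self) -> bool:
        # Step 2, item 4: extra notes matter most when no subtype applies.
        return not self.subtypes and not self.other_info.strip()
```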
All 270 need to be done by the end of the semester.
The type list file is saved as "Accelerator type list" in E:\McNair\Projects\Accelerators\Spring 2017\Grouping project of ListOfAccs.
ListofAccs, from which we drew the Accelerator type list, should have no matches with any of the flagged accelerators in E:\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data. However, there are 23 matches, so all subtypes must be searched and entered manually. Nonprofit status for some accelerators was recorded in a file called "whether nonprofit..." in E:\McNair\Projects\Accelerators\Spring 2017\Grouping project of ListOfAccs. Accelerators with no nonprofit information there need to have it entered manually.
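Checking ListofAccs against the flagged accelerators, and finding accelerators still missing nonprofit information, both come down to comparing normalized names. A minimal sketch, assuming both lists are available as plain strings and that case and whitespace are the only differences worth ignoring; the helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def normalize_name(name: str) -> str:
    """Case-fold and collapse whitespace before comparing names."""
    return " ".join(name.casefold().split())


def overlapping_names(list_of_accs: Iterable[str], flagged: Iterable[str]) -> List[str]:
    """Names from ListofAccs that also appear among the flagged accelerators."""
    flagged_keys = {normalize_name(n) for n in flagged}
    return sorted({n for n in list_of_accs if normalize_name(n) in flagged_keys})


def missing_nonprofit(names: Iterable[str],
                      nonprofit_info: Dict[str, Optional[bool]]) -> List[str]:
    """Accelerators with no recorded nonprofit status, to be researched manually."""
    known = {normalize_name(k) for k, v in nonprofit_info.items() if v is not None}
    return [n for n in names if normalize_name(n) not in known]
```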
Funded By Accelerators
Reference the like-named portion in Crunchbase Data
End of Semester Report
The end of semester report will focus on ranking accelerators and environments based on the variables we have gathered. Our primary ranking will be of individual accelerators by their venture capital raise rate. Using the transaction dates recorded by SDC, we can probably also track the amount of VC raised by each accelerator's companies over time, to get a sense of which locations have developed in the past five years. To obtain these rankings, we will identify which cohorts companies were trained in, and complete the details of both the accelerators and their cohort companies. We will focus only on accelerators, because each ecosystem contains many other kinds of entities. We will also use information on IPOs and acquisitions, obtained through Crunchbase, to gauge how successful the startups emerging from a particular accelerator are. To obtain data over time, we will need to fill out the cohort date column in our cohort data, which will require the help of either Crunchbase or the Wayback Machine for older accelerators. In ranking accelerators across regions, we can also track industry-specific hotspots for accelerators, such as medicine in Memphis or technology in San Francisco.
To complete the report, we need to fill in the following information:
- Industry and focus
- Location
- Name, description
- Matched VC data
- Founder information (maybe)
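The primary ranking could be computed as sketched below. The definition of "raise rate" used here (the share of an accelerator's distinct cohort companies matched to at least one VC round in the SDC data) is an assumption; the report may settle on a different measure, such as amounts raised. Company names in vc_backed must already be normalized the same way as the cohort names, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def normalize_name(name: str) -> str:
    """Case-fold and collapse whitespace so cohort and VC names compare equal."""
    return " ".join(name.casefold().split())


def rank_by_raise_rate(cohort_companies: Iterable[Tuple[str, str]],
                       vc_backed: Set[str]) -> List[Tuple[str, float]]:
    """Rank accelerators by the share of their cohort companies that raised VC.

    cohort_companies: (accelerator, company) pairs; a company listed twice for
    the same accelerator is counted once.
    vc_backed: normalized names of companies with at least one VC round.
    """
    companies: Dict[str, Set[str]] = defaultdict(set)
    for accelerator, company in cohort_companies:
        companies[accelerator].add(normalize_name(company))
    rates = {acc: len(names & vc_backed) / len(names) for acc, names in companies.items()}
    # Highest rate first; ties broken alphabetically for a stable ranking.
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))
```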
Overview
This project is developing broad and near-population data on accelerators and their cohort companies. The objective is to identify which cohorts of which accelerators a cohort company was trained in, obtain details of the accelerators, and obtain details of the cohort companies, including information about any venture capital investment that the cohort company might have received and any IPO or acquisition the company may have experienced.
The primary use of this data is for an academic paper detailed on the Matching Entrepreneurs to Accelerators and VCs (Academic Paper) page.
However, this project can also provide useful data to other academic papers (Urban Start-up Agglomeration, Hubs (Academic Paper), and Hubs Scorecard (Academic Paper)), projects (Houston Entrepreneurship) and blog posts (under the Emerging Ecosystems umbrella project).
This project needs the results of the Industry Classifier, Whois Parser, and other tools.
Current Project Write-Up
Things To Do
- Obtain all URLs for accelerators in order to run through the Wayback Machine to find out when they started.
- Match Crunchbase Data with our Accelerator List to see if they have any accelerators that we do not.
- Find an example of an accelerator that started early and has multiple companies but does not separate them into cohorts, and figure out a way to determine which companies went through each cohort.
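For the first item, the Internet Archive's CDX server can return the earliest capture of a URL. This sketch assumes the CDX endpoint and its url, output, fl and limit parameters, and that results for a single URL come back in ascending time order so that limit=1 yields the first snapshot; check the Wayback Machine API documentation before relying on it. The first capture only shows that the site existed by that date, so treat it as an approximation of the start date. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime
from typing import Optional

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def earliest_capture_url(site: str) -> str:
    """CDX query asking only for the timestamp of the first capture of `site`."""
    params = {"url": site, "output": "json", "fl": "timestamp", "limit": 1}
    return f"{CDX_ENDPOINT}?{urllib.parse.urlencode(params)}"


def parse_earliest_capture(payload: str) -> Optional[datetime]:
    """Parse a CDX JSON response: row 0 is a header, row 1 the first capture."""
    rows = json.loads(payload) if payload.strip() else []
    if len(rows) < 2:  # empty response or header only: no captures
        return None
    return datetime.strptime(rows[1][0], "%Y%m%d%H%M%S")


def earliest_capture(site: str, timeout: float = 30.0) -> Optional[datetime]:
    """Fetch the date of the first archived snapshot (makes a network call)."""
    with urllib.request.urlopen(earliest_capture_url(site), timeout=timeout) as resp:
        return parse_earliest_capture(resp.read().decode("utf-8"))
```

Keeping URL building and parsing separate from the network call makes it easy to cache responses when running this over the full accelerator list.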
What Each File in the "Accelerator" Folder on the RDP Contains
- "Accelerator List Sources" (Folder) - This folder contains most of the sources that we pulled accelerator names from at the very beginning of the project.
- "Code+Final_Data" (Folder) - This folder contains Peter's code for pulling the data from the text files in the "Data" folder.
- "Crunchbase Snapshot" (Folder) - This folder contains the data we obtained from Crunchbase. There is a massive amount of data which we will need to sort through to find useful information and hopefully match that data with our current cohort data.
- "Data" (Folder) - This folder contains all of our data on accelerators including cohort information and the html files of each cohort page. I would estimate that it is about 95% clean currently.
- "Data - Copy" (Folder) - This is just a copy of our current "Data" folder.
- "Data_Copy" (Folder) - This is a copy of our original "Data" folder before we did any manual cleaning.
- "Enclosing_Circle" (Folder) - This folder seems to contain some data on VC but I'm not sure how it pertains to the Accelerator project.
- "F6S Accelerator HTMLs" (Folder) - This folder contains the HTML pages of all the pages on the F6S website. We used it to add more potential accelerators to our list.
- "Google_SiteSearch" (Folder) - This folder contains Python code for Google searches.
- "Industry_Classifier" (Folder) - This folder seems to contain Python code but I'm not sure what for.
- "Matcher" (Folder) - This folder contains the Matcher.
- "Python WebCrawler" (Folder) - This folder contains code that is a work in progress for pulling descriptions from accelerator websites. It is Jeemin's project.
- "Cleaned Cohort Data Copy" (Excel File) - This file contains a copy of our cleaned cohort data.
- "Cleaned Cohort Data" (Excel File) - This file contains the most current, completely cleaned data on cohort company information.
- "NormalizeFixedWidth" (PL File) - This is the normalizer.
- "PortCoNames" (TXT File) - This file contains all of the names of the cohort companies as well as the accelerator they went through.
- "VC Data" (Excel File) - This file contains all of the names of the companies that have ever received VC funding.
- "VC_Data" (TXT File) - This file contains the non-normalized VC data.
- "VC_Data_Names" (TXT File) - This file contains all of the names of companies that have received VC funding.
- "VC_Data_Names_Matched_PortCoNames" (Excel File) - This file contains all of the cohort companies that have also received VC funding. Still needs to be sorted through.
Process
After accumulating a massive amount of data on accelerators, their cohorts, and their HTML files, we began cleaning those text files, which are located in the "Data" folder within "Accelerators". After the first round of cleaning, we ran a script over the cohort data that put all of that information into an Excel document called "Cleaned Cohort Data". Unfortunately, there were still some mistakes in the cohort information, which we fixed within the Excel file itself. As a result, some text files in the "Data" folder do not match the "Cleaned Cohort Data" file: if we re-ran the cohort script over the "Data" folder, the output would not match "Cleaned Cohort Data", which is problematic. The solution (other than manually cleaning the text files again) would be to write a script that regenerates the data in the "Data" folder from the "Cleaned Cohort Data" file, so that the text files follow the corrected Excel version. We have also matched all of the cohort companies against our list of all companies that have received VC funding.
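The proposed fix, regenerating the cohort text files from "Cleaned Cohort Data" rather than re-cleaning them by hand, could look roughly like this. It is a sketch under assumptions: the Excel rows are read as dicts (for example after exporting the sheet to tab-delimited text), each .cohort file holds one tab-delimited row per company with the columns passed in, and output goes to a new folder for comparison before anything in "Data" is replaced. The actual .cohort layout should be checked and the columns adjusted to match it; all names here are hypothetical.

```python
import csv
import io
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Sequence


def safe_filename(name: str) -> str:
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", name).strip()


def render_cohort_files(rows: Iterable[Dict[str, str]], columns: Sequence[str],
                        key: str = "Accelerator") -> Dict[str, str]:
    """Map each AcceleratorName.cohort file name to its tab-delimited contents.

    Rows keep their original order within each file; missing cells become blank.
    """
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    files = {}
    for accelerator in sorted(groups):
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
        for row in groups[accelerator]:
            writer.writerow([row.get(col) or "" for col in columns])
        files[f"{safe_filename(accelerator)}.cohort"] = buf.getvalue()
    return files


def write_cohort_files(rows: Iterable[Dict[str, str]], out_dir: str,
                       columns: Sequence[str], key: str = "Accelerator") -> List[Path]:
    """Write the rendered files into out_dir (ideally a fresh folder, not "Data")."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, text in render_cohort_files(rows, columns, key).items():
        path = target / name
        with path.open("w", encoding="utf-8", newline="") as fh:
            fh.write(text)
        written.append(path)
    return written
```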
Current To Do
- Work on the Crunchbase 2013 Snapshot
- Match cohort companies to VC-backed portfolio companies
- Refine our data to work out which cohort each cohort company was a member of, cohort start dates and locations, etc.
- Make a list of top accelerator lists (e.g., http://tech.co/top-startup-accelerators-ranked-2012-08) and check that we have those accelerators
End of Semester Notes
- We have compiled a very long list of accelerators from many different databases. For the past couple of weeks, everyone in the center has been going through this list, 20 at a time, classifying each one as an accelerator or not an accelerator, and then proceeding to gather data on the accelerator using the process outlined below. This process went very smoothly. We have successfully gone through about 80% of the list. We are still missing information on the last hundred or so names. All of the collected data is located on the RDP, within the "Accelerators" folder under "Data" or on the "Accelerator Master Variable List" Google sheet.
- We have listed all of the startups from the accelerators that have break out cohorts on their website on the "Accelerator Master Variable List" Google sheet. This contains the following information in the "Cohort List (new)" sheet: accelerator name, year, cohort name, company name, description, founders, category/sector, and location.
- Next steps include going through the demo day pages that have been downloaded and writing notes on the different types if possible (see Demo Day Page Google Classifier).
Data Collection Notes
MATCHING
The files we used to match are located in the E drive. We used the matcher to match our portfolio company names from the cohort file located in E:\McNair\Projects\Accelerators.
- The files used for matching are located in E:\McNair\Projects\Accelerators\Matcher
- PortCo contains the names of the companies pulled from the cohort file
- AccCo includes both the cohort company name and the name of the accelerator itself
- In the matcher, the inputs are the PortCo names, as well as the VC data from our pull in SDC
- The outputs include the AccCo_VC data located in E:\McNair\Projects\Accelerators, which gives a lot of information on the matches, including:
  - name of the match itself
  - number of investments
  - dates that the company received its investments
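The matcher's internal rules are not described on this page. As a rough illustration only, here is a hypothetical sketch of name matching that assumes exact comparison after normalization (lowercasing, removing punctuation, and stripping trailing legal suffixes such as "Inc." or "LLC"); it is not the matcher's actual algorithm and will miss fuzzy matches such as misspellings.

```python
import re

# Illustrative suffix list; the real matcher's rules are not documented here.
SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd", "company"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def match_names(portcos, vc_names):
    """Map each portfolio company name to the VC-data names whose
    normalized form is identical (exact match after normalization)."""
    index = {}
    for vc in vc_names:
        index.setdefault(normalize(vc), []).append(vc)
    return {p: index.get(normalize(p), []) for p in portcos}
```

For example, "Acme, Inc." and "ACME Inc" both normalize to "acme" and are matched, while a company with no counterpart maps to an empty list.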
SDC Pull
We accessed SDC Platinum and pulled information on the round-based funding that all registered companies received between 1999 and 2017 (round dates 1/1/1999 to 3/1/2017).
The receipt is as follows:
Session Details
- Request 0 (hits: -): DATABASE: Portfolio Companies (VIPC)
- Request 1 (hits: 96155): Venture Related Deals: Select All Venture Related Deals
- Request 2 (hits: 79572): Round Date: 1/1/1999 to 3/1/2017 (Custom) (Calendar)
- Request 3: Custom Report: VC Data (Columnar) - Save As: E:\McNair\Projects\Accelerators\VC Data.txt
Billing Ref #: 2054025
Capture File: riceuniv.2054025
The VC data pull includes the following variables:
Company Name Date Company Date Company Company Company City Company Street Address, Line 1 Company Street Address, Line 2 Total Known Company Industry Sub-Group 3 Company Industry Major Group Round Company Stage Level 3 Round Amt, Round Amt,
3 files
For each accelerator in the list, put files in E:\Projects\Accelerators\Data
- AcceleratorName.txt - copy and paste the variables below into a tab-delimited .txt file and fill them in
- AcceleratorName.cohort - your cohort text file (see below)
- AcceleratorName.html - save a copy of the HTML of the cohort page (the browser may also save an accompanying folder automatically)
.txt Variables
Name Score Flag CohortURL Address Duration Vintage Industry Description Equity NonProfit Notes
Try to get Name, Score, Flag, Cohort URL and Address for all. ONLY GRAB OTHER VARIABLES IF EASY. Just leave things blank if you can't find them quickly.
If the score is 0, or the flag is S, I, A, or F, just stop: don't bother downloading a cohort list, saving an HTML file, etc. If possible, put a very brief description of the problem in the Notes field.
Notes:
- Score: a value from 0 to 1, where 0 is definitely not an accelerator and 1 is definitely an accelerator
- Flag: leave blank if not needed; if there are multiple flags, separate them with commas
  - S for social entrepreneurship
  - I for incubator
  - A for angel group
  - F for foreign
  - C for located in a coworking space/hub/etc.
  - V for part of a venture fund
  - D for dead
- Put just the root URL in Cohort URL if there isn't a cohort page
- Duration: in weeks (months x 4.33, rounded)
- Vintage: year of the first cohort, if possible
- Industry: industry focus, but only if there is a clear focus
- Equity: a number (don't include %) or Y/N
- Notes: use only if needed; in particular, use this field to note discards.
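A minimal sketch of how one line of AcceleratorName.txt could be assembled under the rules above. The field order is taken from the .txt Variables header; the helper names are hypothetical. The stop rule mirrors the instruction to skip cohort collection when the score is 0 or the flag is S, I, A, or F, and flags are joined with commas and no spaces, per the standardized text-file format.

```python
# Column order from the ".txt Variables" header.
FIELDS = ["Name", "Score", "Flag", "CohortURL", "Address", "Duration",
          "Vintage", "Industry", "Description", "Equity", "NonProfit", "Notes"]

# Flags that mean "stop here": no cohort list or HTML file is needed.
STOP_FLAGS = {"S", "I", "A", "F"}


def months_to_weeks(months):
    """Convert a program length in months to whole weeks (months x 4.33, rounded)."""
    return round(months * 4.33)


def needs_cohort_data(score, flags):
    """Stop rule: skip cohort collection if the score is 0 or any stop flag is set."""
    return score != 0 and not (set(flags) & STOP_FLAGS)


def info_line(record):
    """Format one accelerator as a tab-delimited line in FIELDS order.

    List values (flags, industries) are joined with commas and no spaces;
    missing fields are left blank, as the instructions allow.
    """
    values = []
    for field in FIELDS:
        value = record.get(field, "")
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        values.append(str(value))
    return "\t".join(values)
```

For example, a five-month program gives `months_to_weeks(5) == 22`, and the file's header line is simply `"\t".join(FIELDS)`.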
.cohort files
Your .cohort files must:
- Be tab delimited txt
- Have a header
- The first column must be the portfolio company name
- Grab as many columns as you can easily (and name them)
Standardized format for text files
Information Text file
- 1 tab only after each category
- No spaces after commas for flags or industry
- For duration put only a number in weeks but do not write "weeks"
- Equity is either only a number (no percent sign) or a Y/N
Cohort Text file
- 1 tab between each column
- Titles of each column on top
- Add a "Cohort Number" column and number the cohorts 1, 2, 3, 4, etc.
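Before files are handed in, a quick automated check against the .cohort conventions above can catch common slips such as spaces instead of tabs. The following is a hedged sketch with a hypothetical function name; the "fewer than two columns" test is only a heuristic for a file that was not actually tab-delimited, and the check cannot confirm that the first column really holds company names, only that it is not blank.

```python
import csv


def check_cohort_file(path):
    """Return a list of problems found in a .cohort file (empty list = OK).

    Checks: a header row exists, the header splits into at least two
    tab-separated columns (heuristic for "really tab-delimited"), a
    "Cohort Number" column exists, every row has as many fields as the
    header, and the first (company name) column is never blank.
    """
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: treat quote characters in names as ordinary text.
        rows = list(csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    if len(header) < 2:
        problems.append("header has fewer than two tab-separated columns")
    if "Cohort Number" not in header:
        problems.append('missing "Cohort Number" column')
    for lineno, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"line {lineno}: {len(row)} fields, expected {len(header)}")
        elif not row[0].strip():
            problems.append(f"line {lineno}: empty company name")
    return problems
```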
- Matthew: 1-225 (done)
- Shrey: 226-550 (done)
Link to Crunchbase API application
https://about.crunchbase.com/forms/research-access-apply/ (Does not work anymore)
https://data.crunchbase.com/v3/docs/using-the-api (Has new instructions for application)
Sign-Ups
- Ed: 1-10 (done)
- Carlin: 11-20 (done)
- Carlin: 21-40 (done)
- Christy: 41-60 (done)
- Avesh: 61-80 (done)
- Eliza: 81-100 (done)
- Meghana: 101-120 (done)
- Peter: 121-140 (done)
- Ramee: 141-160 (done)
- Will: 161-180 (done)
- Matthew: 181-200 (done)
- Julia: 201-220 (done)
- Peter: 221-240 (done)
- Shrey: 241-260 (done)
- Matthew: 261-280 (done)
- Eliza: 281-300 (done)
- Julia: 301-320 (done)
- Shrey: 321-340 (done)
- Carlin: 341-361 (done)
- Julia: 362-380 (done)
- Dylan: 381-393 (done)
- Jake: 394-404 (done)
- Dylan: 405-410 (done)
- Avesh: 411-415 (done)
- Dylan: 416-423 (done)
- Peter: 424-460 (done)
- Carlin: 461-480 (done)
- Peter: 481-490 (done)
- Julia: 491-510 (done)
- Peter: 511-515 (done)
- Julia: 516-529 (done)
- Ben: 530-540 (done)
- Shrey: 541-551 (done)
List of Accelerators
- 10Xelerator
- 1440
- 33entrepreneurs
- 500 Startups
- 9Mile Labs
- AIA Accelerator
- ARK Challenge
- AT&T Aspire Accelerator
- ATDC Community
- AZ TechCelerator
- AccelFoods
- Acceleprise
- Accelerate Baltimore
- Accelerate Genius
- Accelerate Tectoria Accelerator
- Accelerator Centre
- Advanced Technology Development Center (ATDC)
- Airbus BizLab
- Alchemist Accelerator
- AlphaLab
- Amplify.LA
- Angel Capital
- Angelcube
- Angelpad
- Annual Business BootCamp
- Arizona Center for Innovation
- Arizona Furnace
- Arrowhead Tech Incubator 2016
- Aspire 3 Accelerator 2017
- Atlanta Ventures Accelerator
- AutoXLR8R
- Awesome Inc.
- Axel Springer Plug and Play
- B 4 Change Impact Accelerator
- B2B Acceleration Program
- B4C Social Venture Accelerator
- BBC Worldwide Labs
- BMW Startup Garage
- Brandcelerate
- Bunker Labs
- Bank of Ireland Accelerator Programme
- Bantunium Labs Accelerator
- Barclays Accelerator
- Barclays New York Summer 2015
- Berkley Ventures
- Bessemer Business Incubation System
- Beta-i
- Beta.MN
- BetaFactory
- BetaSpring
- Betablox
- Betaspring RevUp (DUPLICATE)
- Bethnal Green Ventures
- BioAccel
- BioInspire
- Bir 2015
- BitAngel Engagement Level
- BitAngels Startup Summer Program of 2013
- Bizdom
- Black Forest Accelerator
- Blue Startups
- Blueprint Health
- Bolt Boston
- Bonnier Accelerator
- BoomStartup
- BoomStartup Winter 2017 (DUPLICATE)
- Boomtown Accelerator
- Boomtown Health Tech (DUPLICATE)
- Boost VC
- BootupLabs
- Brandery
- Brooklyn Beta Summer Camp
- Budweiser Dream Brewery
- Buildit
- BuiltinPGH Companies
- Business Innovation Center
- Business Opportunity Academy 2017
- Business Technology Development Center (BizTech)
- CLT Joules Energy Accelerator 2014
- CWI Ventures
- CWI Ventures Application (DUPLICATE)
- CableLabs Technology Tours 2016
- Capital Factory
- Capital Innovators
- Capital Investment Network (Startups)
- Caroline Plouff
- Catalyst Partners
- Cause Collective : Social Innovation Lab
- Center for Entrepreneurial Innovation
- Chain Reaction Innovations 2017
- Chemical Angel Network
- Chinaccelerator
- Cisco Entrepreneurs in Residence
- Citi Accelerator
- Citrix Startup Accelerator
- Claremont/Upland Makerspace Fablab
- Climate Ventures 2.0 Accelerator
- Co.Lab accelerator
- Code for America Accelerator
- Cohab's Traxtion Point
- Collision Conference Investors
- Common Bond
- Communitech Hyperdrive
- Conquer Accelerator
- Coolhouse Labs
- CuriousMinds Incubator / Accelerator
- CyberTECH San Diego
- DBS Accelerator
- DPD Last Mile labs
- DV X Labs
- Dat Ventures
- Decatur-Morgan County Entrepreneurial Center
- Deep Space Ventures
- Demo Accelerator 2016- 2017
- DeveloperTown
- Difference Engine
- Digital Malaysia Corporate Accelerator Program
- Digital Media Zone Incubator/Accelerator
- Disney Accelerator
- DogFish Accelerator
- Domi Station
- Dotforge accelerator
- Dream Funded
- DreamIT Health
- DreamStart - Free Mentoring Program
- Dreamit Ventures (DUPLICATE)
- Ducky Diggy Lloyd
- E-Capital Summit
- EC Mentor Skills Inventory
- EIGERlab
- ETRAC
- EY Startup Challenge
- Eco Holding
- Eleven Startup Accelerator
- Emerge Xcelerate
- EnterpriseWorks Incubation Program
- Entrepreneur Development Center
- Entrepreneurs Roundtable Accelerator
- Environmental Business Cluster
- Equity Legal
- Excelerate Labs
- Execution Labs
- Exhilarator
- Extreme Startups
- Extreme University
- FOOD-X
- Factory45
- Fargo Startup House 2014-2015
- FastTrack Propero Healthcare
- FbFund
- Female Propeller for High Flyers
- FinTech Innovation Lab
- FinTech Studios 2015
- Fintech Founders Club #2
- First Growth Venture Network
- Fishbowl Labs AOL
- Flagship Enterprise Center
- FlashStarts
- Flashpoint
- Flat6 Labs
- Fledge9
- Flextronics Lab IX
- Food Future Scale-up Accelerator 2017
- Food System 6 (FS6) Accelerator
- FoodForwardX
- Fortify Ventures
- Founder Institute
- FounderFuel
- FoundersPad
- Fownders Accelerator
- French Accelerator 2016
- Fund the Food
- Fuse Corps Host
- GAKKEN Accelerator Program
- Gainesville Technology Enterprise Center
- Game CoLab Incubator Program 2014
- GameFounders
- GammaRebels
- Gazelle Lab
- Gener8tor
- German Accelerator Life Sciences
- German Accelerator Tech
- Global Accelerator Network 2015
- Good Works Houston Lab
- GoodCompany Ventures
- Google Launchpad Accelerator
- Grants4Apps Accelerator
- GreenStart
- Greenlite Labs
- GrowLab
- Growth Hacking Accelerator 2015
- Gulf Coast Center for Innovation and Entrepreneurship
- H-Farm Ventures
- HACKT Mission for International Founders
- HAXLR8R
- HCC Entrepreneurship Launchpad
- HIGHLINE Academy
- HUB
- HUBB Accelerator
- HUBB GTLA 2016
- HackFWD
- Hatch
- Health Wildcatters
- Health accelerator
- Healthbox
- Hero City Co-Working Space
- High Street Startups Accelerator
- Highway1
- Honda Xcelerator
- Houston Technology Center
- Hub Ventures
- HugeThing
- I/O ventures
- ICONYC labs
- IDC Elevator
- INcubes Funnel and Accelerator 2014/2015
- INcubes Online Form
- INcubes Startup Visa
- Illumina Accelerator
- Illuminator, New York Accelerator 2015
- Imagine K12
- Immokalee Business Development Center
- Impact Engine
- Impact USA - 2017
- Incubate Miami
- Infuse Accelerator
- Ingenuity Partner Program
- InnoSpring
- Innov&Connect
- Innov8 for Health
- Innova Memphis
- InnovateOC
- Innovation Depot
- Innovation Pavilion
- Innovation Showcase Winter 2017
- Insight Accelerator Labs
- Intel Education Accelerator
- Investment Preparedness Lab
- Invoke Collective
- Iowa Startup Accelerator
- JFDI.Asia
- JFE Accelerator SF
- JLAB
- Jaguar Land Rover Tech Incubator
- Jolt
- JumpSchool
- JumpStart Foundry
- Jumpstart! Boulder
- JusticeXL
- Kairos Boston Spring Program
- Kaplan EdTech
- Kick
- Kick Boise
- Kick LA
- Kick Victoria
- Kicklabs
- Kinetiq Labs
- L-SPARK Accelerator
- LAUNCH incubator
- LAUNCHub
- LI TechCOMETS
- LabFunding Project Accelerator 2014
- Labs Venture Accelerator
- Launch Chapel Hill
- Launch Memphis
- LaunchBox Digital
- LaunchHouse
- LaunchPad PEI
- LaunchSpot
- Launch_Academy
- Launchpad Digital Health, LLC
- Launchpad LA
- Launchpad Long Island
- Le Camping
- Leading Entrepreneurial Accelerator Program
- Lean Launch Ventures
- LearnLaunchX
- Lemnos Labs
- Life Changing Labs
- LiftOff Health Incubator
- Lightbank Start
- LightningLab
- Lowe's Accelerator
- MACH37
- MACH37 Spring
- MIT SA+P venture accelerator
- MITA Institute Accelerator
- MTGx MediaFactory
- Mac6
- Madworks Governance Accelerator
- Maine Center for Entrepreneurial Development - Top Gun Program
- Matter
- Maven Ventures Fund & Incubator
- Media Camp
- Melbourne Accelerator Program
- Memphis BioWorks
- Merck Accelerator
- MergeLane 2017 Accelerator
- Mergelane
- Metavallon
- Microsoft Accelerator
- MindTheBridge
- Momentum
- MuckerLab
- Muru-D
- My5ive Accelerator 2016
- N-Motion (DUPLICATE)
- NDRC (LaunchPad / VentureLab)
- NEXT Dashboard
- NMotion
- NY Digital Health Accelerator
- NY Fashion Tech Lab 2017
- NYC ACRE
- NYC SeedStart
- Nashville Entrepreneur Center
- Nebula Shift
- Nephoscale IaaS
- Nest New York
- New Ventures Group
- New York Digital Health Accelerator (DUPLICATE)
- NewME Accelerator PopUps
- NewMe
- Next media accelerator
- NextHIT
- NextStart
- Nike+ Accelerator
- Northern Arizona Center for Entrepreneurship and Technology (NACET)
- Northern England
- Nxtp.labs
- OCTANe
- Oasis 500
- OpenFund
- Orange Fab
- Orange Works
- Orion Startups
- Oxygen Accelerator
- PIE
- Patriot Boot Camp
- Pearson Catalyst for Education
- Pipeline H2O
- Pitney Bowes Inc
- Plarium Labs
- Plug In South LA
- Plug and Play
- Plum Alley Investments 2016
- Points of Light Accelerator
- PowerHaus
- Preccelerator® Program 2016
- ProSiebenSat.1 Accelerator
- Project Entrepreneur 2016/17
- Project Healthcare
- Project Lift
- Project Music
- Project Skyway
- Propeller Venture Accelerator
- Prosper Capital Accelerator
- Proton Enterprises
- Pushstart Accelerator
- Qualcomm Robotics Accelerator
- Queen Creek Business Incubator
- R/GA Accelerator
- RAIN Incubator/Accelerator
- RJI Investment Group
- Reach
- RetailXelerator
- Rock Health
- Rocket Fuel Labs
- Rockstart Accelerator
- RunUp Labs
- Runway IoT Accelerator 2015
- SAP Startup Focus Program
- SKTA Innopartners Innovation Accelerator
- SPACELAB Tech Accelerator
- SPARK
- SPH Plug and Play
- SURF Incubator
- SaltMines Group Start-Up Studio
- ScaleTown
- Seamless IoT 2016
- Searchcamp
- Seed Hatchery
- SeedSpot
- SeedStartup
- SeedSumo
- Seedcamp
- Seedrocket
- Seeqnce
- Sequoia Apps
- Serval Ventures
- Shenzhen Valley Ventures Incubator
- Shoals Entrepreneurial Center
- Shopper Futures Accelerator
- Shotput Ventures
- Sid Martin Biotechnology Institute
- SigmaLabs Accelerator
- Silicon Valley Incubator & Accelerator
- SixThirty
- Sixers Innovation Lab
- Skywalker Accelerator
- SmartHealth Activator
- Smashd Labs
- SoCo Nexus Accelerator Spring 2017
- Social Enterprise Challenge
- Socratic Labs
- SparkLabs
- Sparkgap
- Sports Tank
- Springboard
- Sprint Accelerator
- Sprint Mobile Health Accelerator
- SproutBox
- SproutCamp
- Starburst Aerospace Accelerator
- Start Path Europe
- Start'inPost
- StartEngine
- StartFast Venture Accelerator
- Starta Accelerator Winter 2017
- Startl
- Startmate
- Startup Accelerator (DUPLICATE)
- Startup Front
- Startup Next & GAN
- Startup Orange County Accelerator
- Startup Runway
- Startup Wise Guys
- Startup Zone PEI
- Startup52X Accelerator
- StartupCity
- StartupHighway
- StartupHouse Foundry program
- StartupMinds Accelerator
- StartupYard
- Startupbootcamp
- Straight Shot
- Summer@Highland
- Surge
- SynBio axlr8r
- TEB Incubation & Acceleration Center
- THRIVE Accelerator III
- THRIVE Open Innovation (DUPLICATE)
- TIM#WCAP Accelerator
- TLabs
- TMCx Accelerator Digital Health 2017
- Tallwave
- Tampa Bay Innovation Center
- Tampa Bay Wave
- Tandem Mobile Accelerator
- Tech Nexus
- Tech Wildcatters
- Tech2020
- TechLaunch
- TechRanch
- TechSquareLabs
- Techstars
- Techstars Music
- Telenet Idealabs
- Telluride Venture Accelerator
- TenX
- The Alchemist Accelerator (DUPLICATE)
- The Ark
- The Bakery
- The Batchery
- The Brandery
- The Bridge
- The Center For Technology Enterprise & Development
- The Chaser
- The Company Lab (CO.LAB)
- The Draper FinTech Connection
- The Factory
- The Greatest Pitch
- The Harbor Accelerator
- The Incubator
- The Iron Yard
- The Mediapreneur Incubator
- The Morpheus
- The New York Venture Summit
- The Next Step: from idea to startup
- The Refinery
- The Unilever Foundry
- The Venture Center's Pre-Accelerator I
- The Vine OC
- The Vogt Awards
- The Yield Lab
- The eFactory Accelerator
- Think Big Partners Accelerator
- TiE Angels
- Tigerlabs Digital Health Accelerator
- Tolstoy Summer Camp
- TopSeedsLab
- Travel Startups Incubator
- Travelport Labs Accelerator
- Travelport Labs Incubator
- Triangle Startup Factory
- Tumml
- Tune Labs
- Twin Cities Accelerator 2016
- UW-Whitewater Launch Pad Accelerator
- Unbank.ventures FinTech Incubator
- University Technology Park
- Unreasonable Institute
- UpTech
- Upstart Accelerator
- Upstart Labs
- Upstart Memphis
- Uptima Business Bootcamp
- Upwest Labs
- VANTEC
- VC FinTech Accelerator
- Velocity Indiana Accelerator
- Venture Catalyst Partners
- Venture Hive
- Venture I
- VentureOut's Enterprise Tech Expedition
- Venturegeeks
- Vet-Tech Accelerator
- VictorySpark
- Village88 Techlab
- Volkswagen ERL Technology Accelerator
- WHLabs
- Wasabi Ventures Academy
- Wayra
- Wellness Accelerator
- Wells Fargo Startup Accelerator
- Wireless IoT
- Women Innovate Mobile
- XLerateHealth
- XTRATOS
- Xlerate Health
- Y Combinator
- Y&R SparkPlug 2017
- YEurope
- YLE Media Startup Accelerator Program
- Yahoo Ad Tech Program
- Yangler (online accelerator)
- Year of the Startup
- Yetizen Accelerator
- You Is Now
- Z80 Labs
- ZIP Launchpad Admission
- ZeroTo510
- Zone Startups Calgary
- designX 2017
- eMerging Ventures
- ezone
- iStart Jax (DUPLICATE)
- iStart Valley
- iVentures10
- ignite100
- innovyz start
- tekMountain Accelerator
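Several entries above are already marked (DUPLICATE), and others look like near-duplicates (for example "XLerateHealth" and "Xlerate Health", "Brandery" and "The Brandery", or "Mergelane" and "MergeLane 2017 Accelerator"). A rough sketch for flagging candidates, assuming that names which agree after lowercasing, dropping digits and punctuation, and removing a few generic words are probably the same program. The stopword list and function names are hypothetical, and every group returned still needs a manual check.

```python
import re
from collections import defaultdict

# Generic words ignored when comparing names (an illustrative choice).
STOPWORDS = {"the", "accelerator", "duplicate"}


def name_key(name):
    """Lowercase the name, keep only runs of letters (dropping digits such as
    years, punctuation, and spacing), remove STOPWORDS, and concatenate."""
    words = re.findall(r"[a-z]+", name.lower())
    return "".join(w for w in words if w not in STOPWORDS)


def probable_duplicates(names):
    """Group names that share a non-empty key; return only groups of two or more."""
    groups = defaultdict(list)
    for name in names:
        key = name_key(name)
        if key:  # names made only of digits (e.g. "1440") get no key
            groups[key].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

Distinct programs that share a prefix, such as "Techstars" and "Techstars Music", keep different keys and are not grouped.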
Project Summary
This project aims to determine which accelerators are most effective at producing successful startups, and what characteristics those accelerators share. First, we need to gather as much data as possible about as many accelerators as possible in order to identify factors that differentiate successful from unsuccessful ventures. Next, we need to create a web crawler that gathers information about accelerators around the world by visiting their websites and extracting information. The overall goal of this research project is to gain insight into the methods of successful accelerators and to find out what differentiates very successful accelerators from dead ones.
Helpful Links: http://seedrankings.com/
Sources
Summary: These sources were obtained from List of Accelerators, Crunchbase, and other Google searches. We will evaluate them by looking at the number of accelerators they supply (as most of them are lists) and at the type of information they provide about each accelerator. Key data points are cohort-related data, startup-related data, and the logistics of the accelerator. Better sources supply more information than the URL alone.
(Obtained from List of Accelerators and various Google searches)
- http://seedrankings.com/
- http://www.acceleratorinfo.com/see-all.html
- http://www.seed-db.com/accelerators
- http://gust.com/usa-canada-accelerator-report-2015/?utm_content=35401577&utm_medium=social&utm_source=twitter
- https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/
- http://www.builtinnyc.com/2016/06/03/accelerators-incubators-nyc
- http://www.represent.la/
- http://www.launch.co/blog/complete-list-of-incubators-and-accelerators-like-y-combinat.html
- https://angel.co/accelerator-4 (Does not work - seems to be replaced by https://angel.co/companies?company_types[]=Incubator )
(Obtained from Google search: "Accelerator Database")
- seed-db is the first result that pops up
- https://www.corporate-accelerators.net/database/
- https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json
- By the fifth or sixth search result, the results became much less useful
- http://www.forbes.com/sites/briansolomon/2015/03/17/the-best-startup-accelerators-of-2015-powering-a-tech-boom/#2f52fa7e34e4
- http://www.inc.com/will-yakowicz/the-15-best-startup-accelerators-in-the-us.html
- http://www.forbes.com/sites/briansolomon/2016/03/11/the-best-startup-accelerators-of-2016/#74086a7724f2
- https://techcrunch.com/2015/03/17/these-are-the-top-20-us-accelerators/
- https://www.nexpcb.com/blogs/news/the-hardware-incubators-accelerators-list
Other methods used to find accelerators (results are listed below under "List of Sources Obtained from Various Google Searches"):
- Type in generic location + "accelerators" (e.g. Houston Accelerators)
- Looked at roughly the first 20 results
- Used three locations (Houston, Los Angeles, and New York) as examples of the accelerators that come up
- Type in a specific state + "accelerator" + "list" (e.g. Texas accelerator list) to search for more relevant lists
- Once again, looked at roughly the first 20 results
- Crunchbase has its own webpage with instructions for how we retrieve the data
Source Evaluations
Summary: These evaluations correspond to the sources above. Each one gives instructions for obtaining the listed information, as well as a general review of how useful the data seems. The review is meant to determine whether a crawler could obtain information from the source automatically.
SOURCE: Crunchbase
- All of the information on Crunchbase, along with the documentation of how we identified accelerators in it, is on the Crunchbase 2013 Snapshot page.
Source: http://www.acceleratorinfo.com/see-all.html
- Opened source website
- Copied the information under "All Accelerator Programs" to TextPad (already sorted); this returned 190 results
- Each link on parent list leads to individual home page url of accelerator
- Used sample size of 20 links, determined 16 to be accelerators, 2 to be incubators, 2 to be inactive or broken links
- Many accelerators do not list a founding date; the most recent accelerators date from around 2013-2014 (as determined from their home pages)
Review
- Reliable source for specific URLs to older accelerators, not very helpful for more specific information.
- Web crawling seems impractical because the information is not readily available from the source. We could potentially mine staff or contact information from the "about" page linked from each home URL
Source: http://www.seed-db.com/accelerators/all
- Copied "Seed Accelerators" table to TextPad, data sorted itself into lines. Returned 235 results.
- Clicking on the accelerator name itself links to a page with all of its associated startups, up until 6/2016 cohort
- Startup table includes:
- "state"
- "company name"
- "website and CrunchBase links"
- "cohort date"
- "exit value"
- "funding"
- Many entries for "exit value" are missing, some values for "funding" are missing
- On original seed-db webpage, each accelerator has a link to its associated home page url
- From the table, each listed entry was an accelerator, although 24 accelerators out of 235 were classified as "dead"
- Along with the home url, each accelerator table includes the following:
- Status
- Program (name)
- Location
- Country
- Number of companies
- Cumulative exit values
- Cumulative funding
- Average funding for startups
- Median funding for startups
- Many entries for "median funding" are left empty, as well as entries for all types of funding on the bottom half of the table
Review
- Reliable source for accelerators, includes list of accelerators both dead and active, as well as their associated start-ups
- Web crawling potential is promising; startup table is located within the source for each webpage. Can also mine any category from the accelerator table
- Very extensive data for the accelerators it includes, but cross-referencing with other sources shows that seed-db is missing many newer accelerators; the list is not all-inclusive.
- Includes regional distributions for accelerator groups as well. For example, rather than just "Techstars", the group is broken into Austin, Berlin, Boston, Boulder, etc.
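Since the review notes that the startup table sits in each page's source, a crawler would mainly need to turn HTML table rows into lists of cell text. A minimal sketch using only Python's standard-library `html.parser`, assuming the startup list is an ordinary HTML `<table>`; the actual seed-db markup has not been checked, the class and function names are hypothetical, and the sketch does not fetch pages (site terms of use should be checked before crawling). Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of <td>/<th> cells, one list per <tr>, from every
    table in a page. Tolerates omitted </td> and </tr> end tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse whitespace so "  $1.2M \n" becomes "$1.2M".
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def parse_table_rows(html):
    """Return all table rows in `html` as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The first row returned is usually the header (e.g. "state", "company name", "cohort date"), which can be zipped with later rows to build one record per startup.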
Source: http://www.seed-db.com/accelerators
- Very similar to "http://www.seed-db.com/accelerators/all", but contains large regional accelerators as groups, rather than individual accelerators. For example, Techstars appears only once.
- Copied "Seed Accelerators" table to TextPad, data sorted itself into lines. Returned 239 results.
- Clicking on the accelerator name itself links to a page with all of its associated startups, up until 6/2016 cohort
- Startup table includes same information as previous source, "http://www.seed-db.com/accelerators/all". However, accelerators spanning across multiple regions have their startups located under one category on this webpage.
- On original seed-db webpage, each accelerator has a link to its associated home page url
- From the table, each listed entry was an accelerator, although 24 accelerators/groups out of 239 were classified as "dead"
- Along with the home url, each accelerator table includes the same information as the "http://www.seed-db.com/accelerators/all" source
Review
- Reliable source for accelerators, includes list of accelerators both dead and active, as well as their associated start-ups
- Web crawling potential is promising; startup table is located within the source for each webpage. Can also mine any category from the accelerator table
- Overall very extensive data for accelerators that are included on the list, includes large groups as well as individual accelerators. It seems that some accelerators missing from "http://www.seed-db.com/accelerators/all" are located here, since there are 239 returns rather than 235.
Source: https://www.f6s.com/programs?type
- On the webpage, set "Type" to "Accelerator/Program", set "Location" to "North America", and set "Invest in Country" to "United States" to return results
- Highlighted results and scrolled down until all results found; copied results to TextPad
- In TextPad, filtered out lines containing "by", as well as miscellaneous lines such as dates and dollar amounts, using regular expressions
- Numbered each "More Info" line sequentially (this line appears once per entry throughout the list) to count the results
- Obtained a grand total of 1467 results from the list
- Along with the name of the program/accelerator, the data included:
- Dollar value per team
- Equity
- Application Site
- Accelerator URL
- Many entries are not accelerators: a quick glance through the results showed various conferences, 3-5 day events, and written literature about accelerators
- From a sample size of the first 30 entries, determined 10 to be valid accelerators, 3 incubators, 6 conferences/weekends, and the rest to be miscellaneous entries such as startup events or "studios" (perhaps useful but not relevant to search)
- Further down the list, the proportion of accelerators decreases. Overall, the share of accelerators on this website is well below 33% (10 of the first 30 entries), probably closer to 10-15%.
Review
- Potentially useful website if crawler could remove the clutter and target solely the accelerators; very useful for identifying new accelerators since data automatically sorted by date and location.
- Large list of sources includes many irrelevant results, such as conferences or weekends which are difficult to identify. The name of the sorting category itself, "Accelerator/Program" suggests that many of the results fall under the "Program" section rather than being valid accelerators.
- Potential site for identifying accelerators, but limited by in-site sorting; useful for URL and perhaps equity, but not very detailed information relating to the accelerator/program.
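Instead of deleting lines with regular expressions, one alternative (our suggestion, not the procedure used above) is to split the pasted text into one listing per "More Info" line. This gives the result count directly and keeps each listing's dollar value and equity lines together. The sketch below assumes every listing ends with a line reading exactly "More Info"; the event keyword list and function names are hypothetical, and the keyword check is only a first-pass filter for the conferences and weekends noted above, not a classifier.

```python
import re

# Words that usually signal an event rather than an accelerator (illustrative list).
EVENT_WORDS = re.compile(r"\b(conference|summit|weekend|hackathon|expo|festival)\b", re.IGNORECASE)


def split_listings(lines, marker="More Info"):
    """Group pasted page text into listings, assuming every listing ends with
    a line reading exactly `marker`. Blank lines are skipped, and any text
    after the last marker is ignored."""
    listings, current = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text == marker:
            if current:
                listings.append(current)
            current = []
        else:
            current.append(text)
    return listings


def likely_event(listing):
    """Heuristic: True if the listing's first line (its title) has an event keyword."""
    return bool(listing) and EVENT_WORDS.search(listing[0]) is not None
```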
Source: http://gust.com/usa-canada-accelerator-report-2015/
- Selected region of US and Canada
- Scrolled down to the section labeled "Top 20 Active Accelerators" and selected "see the full list" near the bottom of the listed accelerators
- Copied the resulting entries into TextPad and removed the numbers to leave only the name of each accelerator
- Obtained 100 results for different accelerators
- Accelerator lists included:
- Name and URL
- Number of Start-ups funded (2015 only)
- Accelerator list limited to 2015
Review
- Website provides its own evaluation of an accelerator's success based on various factors and provides data for larger trends.
- Usefulness is questionable because website does not provide much except the URL, and all of the entries are based on success in 2015.
- Other interesting data within website such as "Hot Markets", investment breakdowns by state, etc. All of this data is also limited to 2015.
Source: https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/
- Scrolled down to the section labeled "Startup accelerators in Boston"
- Copied text beginning from "MassChallenge" (the first paragraph was just a general definition of startups) and continued to copy until "Startup Incubators in Boston"
- After pasting into TextPad, I deleted any characters after the "-" on each line and added a sequential number at the beginning of each line
- Returned a total of 17 results for accelerators in Boston
- Accelerator list included:
- Name and URL
- Capital requirements
- Application periods and requirements
- Paragraph describing accelerator and its goals
Review
- Although the guide is dated, useful for identifying strong accelerator programs in Boston
- Limitation: only focuses on Boston, but the description is helpful in identifying the role of the accelerator
- Limited information on accelerator, not very useful by itself without information from the accelerator URL
Source: https://www.corporate-accelerators.net/database/
- Copied and pasted table into Microsoft Excel (Data was already sorted into categories so no need for TextPad)
- Table returned 72 entries (but there was a link at the bottom to a larger database)
- The table itself includes:
- Major Company
- Accelerator
- Funding
- Equity
- Website
- Details
- The "Details" link led to a variety of other information including:
- Status (Active or Inactive)
- Locations
- Funding
- Equity
- Term
- Cohort Based? (Regular or Irregular)
- Pitch Day
- Office Space
- Powered by
- Support Offered?
- Launch year
- Focus Areas
- General Description
- Also included a variety of data about the host company
Review
- Solid list of corporate accelerators that also includes a variety of information about each accelerator, its cohorts, etc. However, some entries are international accelerators, so they need to be filtered out
- Limited to 72 accelerators from major companies
Source: https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json
- This source is a .json file from the previous database
- After placing into TextPad, replaced each space with a ###, replaced each new line with a tab, and replaced each ### with a new line. Ultimately returned 80 results
- From the file, the .json includes:
- NAICS and NAICS sector
- Classification
- Sector Description
- Term
- Goal
- Partner
- Also includes most of the information from the previous source, since they are undoubtedly linked
Review
- Another solid list for corporate accelerators with some more information, but ultimately very similar to the previous source.
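The TextPad find-and-replace steps depend on how the file happens to be laid out; parsing it as JSON is more robust. A minimal sketch, assuming Accelerators.json is a top-level JSON array of flat objects (its exact field names are not recorded here, so the header is built from whatever keys appear). The function name is hypothetical; list values are joined with commas, and nested objects would be written as their Python string form.

```python
import csv
import json


def json_to_tsv(json_path, tsv_path):
    """Flatten a JSON array of objects into a tab-delimited file.

    The header is the union of keys across records, in first-seen order;
    list values are joined with commas and missing keys are left blank.
    Returns the number of records written.
    """
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    header = []
    for rec in records:
        for key in rec:
            if key not in header:
                header.append(key)
    with open(tsv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        for rec in records:
            row = []
            for key in header:
                value = rec.get(key, "")
                if isinstance(value, list):
                    value = ",".join(str(v) for v in value)
                row.append(value)
            writer.writerow(row)
    return len(records)
```

The returned count can be compared with the 80 results obtained through TextPad as a sanity check.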
Source: https://www.quora.com/Where-can-I-find-a-comprehensive-list-of-startup-incubators-and-accelerators-in-the-US
- Since we already looked at the first listed source (seed-db), I clicked on the second link "(by Robert Shedd) http://blog.shedd.us/321987608/" which took me to a page headed "Help for Startups! – A semi-complete list of startup accelerator programs" created by a blogger, Robert Shedd
- The list includes 102 entries from the blogger, each of which appears to be an accelerator
- On a first pass, many results from previous sources were missing; most notably, "OwlSpark", Rice's accelerator, was absent.
- Shedd only offers us the accelerator name plus its URL
Review
- Nice list to cross-reference with other sources, but it does not offer much new insight compared to more powerful sources such as seed-db
List of Sources Obtained from Various Google Searches
Summary: These accelerators are taken from a specific Google search rather than a list. The idea is to compile a list of Google searches that return relevant results of accelerators. This will aid in the creation of a future web crawler.
From "Location + Accelerator" (only individual results, not lists)
Houston Accelerators
- Examples of single accelerators found
- TMCx: http://www.tmc.edu/innovation/innovation-programs/tmcx/
- RED labs: http://redlabs.uh.edu/
- SURGE accelerator: https://kirkcoburn.com/
- OwlSpark: http://owlspark.com/
- NextHIT: http://www.houstonhealthventures.com/nexthit-accelerator-program-application/
Los Angeles Accelerators
- Amplify: http://amplify.la/
- Y Combinator: https://www.ycombinator.com/
- Chicklabs: https://www.chicklabsllc.com/
- Disney Accelerator: https://disneyaccelerator.com/
- Launchpad: https://launchpad.la/
New York Accelerators
- DreamIT Ventures: http://www.dreamit.com/#meaningful-experience
- Women Innovate Mobile: http://www.wim.co/
- Techstars NYC: http://www.techstars.com/programs/nyc-program/
- Entrepreneurs Roundtable: http://eranyc.com/
- FirstGrowthVC: http://venturecrush.com/fg/
- New York Digital Health Accelerator: http://digitalhealthaccelerator.com/
- Grand Central Tech: http://www.grandcentraltech.com/
- Accelerator Corp: http://www.acceleratorcorp.com/
- New York Startup Lab: http://nystartuplab.com/
Review
- Some locations return more viable results for a similar sample size. For example, New York returned 9 valid accelerators, whereas Los Angeles and Houston each returned 5 out of the first 20 results, so New York yielded 80% more (9 vs. 5). Identifying which locations return more accelerators per search could help optimize future searches.
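The comparison above can be made explicit as a per-location yield rate (valid accelerators divided by results inspected). The following is a minimal sketch, assuming that all three searches inspected the first 20 results. The helper name `yield_rate` is hypothetical.

```python
def yield_rate(valid: int, inspected: int) -> float:
    """Fraction of inspected search results that were real accelerators."""
    return valid / inspected


# Valid accelerators among the first 20 results of each "Location + Accelerator" search
counts = {"New York": 9, "Los Angeles": 5, "Houston": 5}
rates = {city: yield_rate(n, 20) for city, n in counts.items()}

# Locations ordered from most to least productive
ranked = sorted(rates, key=rates.get, reverse=True)

# Relative difference between New York and Los Angeles (0.8 means 80% more)
relative = rates["New York"] / rates["Los Angeles"] - 1
print(ranked, round(relative, 2))
```

Ranking locations by yield rate instead of raw counts keeps the comparison fair if later searches inspect different numbers of results.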
From "State+Accelerator+List"
New York Accelerator List
- http://www.ongridventures.com/resources/new-york-silicon-alley-resources/newyorkaccelerators/ (Ranks 14 accelerators)
- http://under30ceo.com/11-new-york-tech-incubators-and-accelerators-for-entrepreneurs/ (Ranks 11 accelerators)
California Accelerator List
- http://www.socaltech.com/the_complete_guide_to_southern_california_accelerators_and_incubators_part_i/s-0040924.html (Lists accelerators in Southern California)
- http://barberacorporatelaw.com/blog/2014/4/8/28-business-incubators-in-the-los-angeles-area (List of 24 accelerators near the LA area)
Texas Accelerator List
- http://www.austinstartuplist.com/incubators (List of accelerators in Austin, <5 results)
- http://www.siliconhillsnews.com/2016/09/02/the-top-texas-healthcare-accelerators-and-incubators/ (Modest list of accelerators aiding in healthcare)
- http://realfoodmba.com/food-startup-accelerators/ (List of food-based accelerators, some of which are in Austin, others of which are international)
Colorado Accelerator List
- http://www.builtincolorado.com/2015/01/14/best-colorado-accelerators-your-startup (8 results)
- https://www.quora.com/What-accelerator-programs-are-located-in-Colorado (Quora inquiry yielding modest results)
Washington Accelerator List
- http://www.geekwire.com/2015/mapping-seattles-incubators-accelerators-and-co-working-spaces/ (Returns 14 results)
Oregon Accelerator List
- http://www.bizjournals.com/portland/subscriber-only/2016/01/15/incubators-and-accelerators.html (Returns list of 5 accelerators and details)
- http://www.oregon4biz.com/Innovate-&-Create/R&D-Business/Incubators/ (Returns list of 26 accelerators and incubators)
Notes:
- Seed-DB appears for almost all of the search results
- Acceleratorinfo appears for most of the search results
- There are multiple cumulative reports of incubators per location, but not for accelerators
- Most regional accelerator lists are either articles or rankings of a set number of accelerators in the area
- Many results were nationally ranked lists of accelerators, such as the Forbes list of "Top Accelerators" or lists along the lines of "Best Accelerators in the US". These perhaps appear because at least one accelerator on the list is located in the searched state.
- There are also a few results for actual particle accelerators that must be filtered out (e.g., the Superconducting Super Collider)
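Since these searches are meant to feed a future web crawler, the two query patterns and the particle-accelerator filtering from the notes can be sketched as follows. This is a minimal illustration only. The template strings, the `PHYSICS_TERMS` word list, and both function names are hypothetical choices, not an existing project tool.

```python
from itertools import product

# Hypothetical query templates based on the two search patterns above
TEMPLATES = ["{place} accelerator", "{place} accelerator list"]

# Hypothetical terms that flag particle-physics results rather than startup accelerators
PHYSICS_TERMS = ("collider", "particle", "synchrotron", "cyclotron")


def build_queries(places):
    """Expand every template for every place (city or state)."""
    return [t.format(place=p) for p, t in product(places, TEMPLATES)]


def drop_particle_accelerators(titles):
    """Remove result titles that mention particle-physics terms."""
    return [t for t in titles if not any(w in t.lower() for w in PHYSICS_TERMS)]


print(build_queries(["Houston", "Oregon"]))
print(drop_particle_accelerators(["Superconducting Super Collider", "Houston startup accelerators"]))
```

A keyword blacklist like this is crude. It would also drop a startup accelerator whose title happened to contain one of the terms, so the filtered-out titles should still be spot-checked.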
Found by Googling previously found accelerators
Found by Googling "YLE Media Startup Accelerator"
- https://www.corporate-accelerators.net/database/index.html (database of corporate accelerators, 71-79 entries)
- http://startupaccelerator.vc/accelerator-corporate-innovation-sig/ (database of accelerators and corporate innovation, 92 entries)
Neither of these sources has had its entries added to the list of accelerators.
Individual Accelerator Evaluations
Summary: The purpose of this section is to create instructions for each accelerator on how to find cohort information from its URL. Along with specific instructions for obtaining the cohorts of each chosen accelerator, there should be a list of easy-to-obtain, relevant statistics about the accelerator, such as information about its team and location. The variable statistics list is cumulative, whereas the cohort directions are unique to each accelerator.
Accelerators Chosen (Format = Name (source))
- Blue Startups (http://www.acceleratorinfo.com/see-all.html)
- Launchpad LA (http://www.acceleratorinfo.com/see-all.html)
- Y Combinator (http://www.seed-db.com/accelerators)
- FlashPoint (http://www.seed-db.com/accelerators/all)
- Prosper Accelerator (https://www.f6s.com/programs?type)
- Axel Springer Plug and Play (http://www.axelspringerplugandplay.com/)
- Techstars (http://www.seed-db.com/accelerators)
- Startmate (http://www.seed-db.com/accelerators)
- Capital Factory (http://blog.shedd.us/321987608/)
- OwlSpark (Google search: "Houston + accelerators")
Accelerator: Blue Startups (http://bluestartups.com/)
Finding the cohort:
- Navigated to the "Track Record" page under the "Home" tab; found that 7 cohorts have graduated
- Navigated to the "Portfolio" tab, which lists all seven graduated cohorts along with the companies from each one. Each cohort has its own page (e.g., "Cohort 1", "Cohort 2"), and the bottom of each cohort page links to the other 6. Each company has a short description and its URL.
- An "Alumni News" page at the bottom of "Portfolio" includes articles about graduated startups.
- Unfortunately, the date of each cohort class is not included; this could perhaps be cross-referenced with other sources.
Accelerator: Launchpad LA (http://launchpad.la/)
Finding the cohort:
- Navigated to "Companies" at the top of the homepage
- "Companies" returns all companies backed by Launchpad LA based on their class year and number (cohort)
- Also sorted by active startups vs. inactive startups
- At the bottom of the "Companies" tab, there is a statistical layout returning values for the number of companies started by Launchpad during its time as an accelerator (2012-present), as well as the total funding funneled into the accelerator.
Accelerator: Y Combinator (http://www.ycombinator.com)
Finding the cohort:
- Scrolled down on the homepage and clicked on a link titled "See all companies".
- Clicked the drop-down menu named "All Batches" to expand the list.
- The list consists of batches dated 2005-2016; each batch returns its launched companies with their launch year and, for most but not all, their URLs.
Accelerator: Flashpoint (http://flashpoint.gatech.edu/)
Finding the cohort:
- In the upper right corner, after the opening animation, a menu tab leads to a page labeled "Teams".
- The "Teams" page lists each batch of companies emerging from Georgia Tech, but without dates or cohort details. For example, "Batch 1" at the top of the page lists only the companies in the batch, without URLs or any additional information.
- The "Application" page, linked from the tab near the top, has information on Batch 7, which begins in early 2017. This suggests that Batch 6 ended in either spring or fall 2016.
Accelerator: Prosper Women Entrepreneurs (http://www.prosperstl.com)
Finding the cohort:
- Navigated to the "Accelerator" tab and clicked "Companies" in the drop-down menu.
- This page shows the logos of all launched companies; each logo links to the company's homepage.
- No other relevant information, such as launch date or cohort, is included on this page.
Accelerator: Axel Springer Plug and Play (http://www.axelspringerplugandplay.com/)
Finding the cohort:
- Clicked on the "Companies" tab on the homepage, which jumps to the middle of the page and shows a short list of current companies.
- Clicked on the "All Companies" link, which returns a page of startup logos with brief descriptions. Each logo links to that startup's homepage.
- Companies were not sorted by cohort or in any other relevant way.
Accelerator: Techstars (http://www.techstars.com)
Finding the cohorts:
- Navigated to the "Accelerators" tab and clicked "Companies" in the drop-down menu.
- This returns a table with a long list of classes from different areas, separated by year.
- Upon scrolling down further, each of these classes is broken down by the startups that graduated from them. It also includes information such as how much was invested in each startup, as well as whether or not the startup was acquired, is active, or failed.
Accelerator: Startmate (http://www.startmate.com.au)
Finding the cohorts:
- Navigated to the "Startups" tab, which returned a page of all startups that have graduated from Startmate.
- Startups are separated by year of graduation, and each company is linked on this page.
- It appears that Startmate runs one cohort per year.
Accelerator: Capital Factory (https://capitalfactory.com/accelerate/)
Finding the cohorts:
- Navigated to the "Startups" tab, which returned a long list of companies accelerated by Capital Factory.
- Each logo for the startups served as a link to their respective websites.
- There was no evidence or mention of any cohorts.
Accelerator: OwlSpark (http://entrepreneurship.rice.edu/accelerator/)
Finding the cohorts:
- Navigated to the "Startup Teams" tab, which returned a page that included links to 4 "Classes".
- Each class link (Class 1, Class 2, Class 3, Class 4) returns links to each startup that graduated from the program in that class.
- These classes correspond to cohorts.
List of Promising Variables
- Key People (founders, lead entrepreneurs, strategists, etc.)
- Total number of launched companies
- A FAQ for application details, accelerator vision, and
- Funds raised per company (average)
- Features offered by accelerator (perks, space, tools, etc)
- General events hosted by the accelerator
- (Success) stories for graduated start-ups
E-R Diagram (in list form) for Identifying Attributes to Pull from Accelerators
Summary: I will look at different entities within the accelerator page (e.g., accelerators, cohorts, founders) and then find potential attributes that can be codified from those entities. Along with each attribute, I list a potential method for pulling it.
Format:
- Entity
- Attribute - Possible sources/ways to get
Ed: "Be creative with finding new attributes to pull!"
List
Accelerators
- Accelerator Name - Website, external database
- Contact Form - General contact section on each website
- Industry focus - can be pulled from description
- Description - pulled from website itself
- Takes equity? - Database or from "about" page
- Non-profit? - Database
- URL - Already have way of obtaining
- DNS Registration Date - Already have way of obtaining
- Address - Google Maps, maybe the website
- Founding Date - Google Maps, website, server registration
Accelerators (1) has (n) Features
Features
- Mentorship? - Description in website
- Space Offered - Google Maps, Website description
- Partnerships - AngelList, same section as mentorship or events
- Hosted Events - Calendar
Accelerators (1) has (n) Founders
Founders
- Name - Founders or Team Page
- Title - Directly underneath or next to name
- PhD? - Biography, webpage under name
- Serial entrepreneur? - Biography
- Link back to "Accelerator Name" in Accelerators
Founders (n) has (n) Ventures
Ventures
- Other Companies - Biography, webpage
- Previous Companies - Biography
- Net Worth - Forbes, Biography
- Link back to "Name" in Founders
Accelerators (1) has (n) Cohorts
Cohorts
- Date + Accelerator = Cohort ID - Database or Website
- Number of Startups - Website, count from Startups
- Cohort Number - Categorization on website, external database
- Link back to "Accelerator Name"
Cohorts (1) has (n) Startups
Startups
- Names - Website, external database
- State of Incorporation - AngelList
- URL - AngelList, website
- Founding Date - Registration database, AngelList
- Industry - Startup description
- Founding Location - AngelList
- Current Location - AngelList
- VC Raised to Date - SDC Platinum
- Angel Funds Raised to Date - AngelList
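The entity list above can be sketched as Python dataclasses to show how the "(1) has (n)" relationships and the "Date + Accelerator = Cohort ID" rule might be codified. This is a minimal sketch that covers only a subset of the attributes. The class names, the field names, and the "|" separator in `cohort_id` are illustrative assumptions. The many-to-many Founders/Ventures relationship is simplified here to a list of venture names.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Startup:
    name: str
    url: Optional[str] = None
    industry: Optional[str] = None


@dataclass
class Cohort:
    accelerator_name: str  # link back to "Accelerator Name"
    start_date: date
    cohort_number: Optional[int] = None
    startups: List[Startup] = field(default_factory=list)  # Cohorts (1) has (n) Startups

    @property
    def cohort_id(self) -> str:
        # "Date + Accelerator = Cohort ID" (the separator is an assumption)
        return f"{self.accelerator_name}|{self.start_date.isoformat()}"

    @property
    def number_of_startups(self) -> int:
        # "count from Startups"
        return len(self.startups)


@dataclass
class Founder:
    name: str
    accelerator_name: str  # link back to "Accelerator Name"
    title: Optional[str] = None
    ventures: List[str] = field(default_factory=list)  # simplified many-to-many


@dataclass
class Accelerator:
    name: str
    url: Optional[str] = None
    takes_equity: Optional[bool] = None
    features: List[str] = field(default_factory=list)   # (1) has (n) Features
    founders: List[Founder] = field(default_factory=list)  # (1) has (n) Founders
    cohorts: List[Cohort] = field(default_factory=list)   # (1) has (n) Cohorts
```

For example, `Cohort("Example Accelerator", date(2016, 1, 15), 1, [Startup("A"), Startup("B")])` would have the ID `"Example Accelerator|2016-01-15"` and a startup count of 2. These values are made up for illustration.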
Variables which Distinguish Accelerator Websites
- The word "Accelerator"
- This word appears at least once on the homepage of the vast majority of accelerator websites, either as a link to another page on the website or in a title on the homepage. Few other websites contain this word on their homepage, especially among the results of a generic Google search such as "Accelerators in the US".
- Fixed Term
- Accelerators normally work with their cohorts for 3 months. This fixed term is a major factor that distinguishes an accelerator from other members of a startup ecosystem. If a website mentions either "3 months" or "12 weeks", it very likely belongs to an accelerator.
- Cohorts, Portfolio, Class, or Companies
- This is a potential variable that could link the websites of many different accelerators. The problem is that the word "portfolio" is also used by numerous venture capital firms, which could complicate attempts to pull only accelerator sites from a Google search. The word "cohort", however, would very likely identify the website as belonging to an accelerator. The words "class" and "companies" are promising but do not offer certainty.
- Equity, Investment
- On its own, "equity" does not mean much, but paired with any of these other terms it could point to an accelerator. Most accelerators take equity in the form of common stock (6-8%) or ask for some alternate form of stake in the company.
- Education and Mentorship
- Accelerators differ from incubators and angel investors in that they emphasize educating the startup. They offer advice and intensive mentorship from more experienced entrepreneurs on their staff, as well as many networking opportunities with the outside world. This variable is harder to detect on an accelerator's website, but if a site includes numerous keywords such as "education", "mentorship", or "networking opportunities", it is somewhat safe to assume that an accelerator owns it.
- Demo Day
- This variable has limited potential for crawling websites, but it is worth mentioning. Most accelerators "graduate" their cohorts with a demo day, on which the startups present their companies to potential investors. Because the phrase "demo day" is fairly uncommon, its presence on a website could be a good signal of an accelerator.
A combination of several of these variables would strongly indicate that a website belongs to an accelerator.
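These variables can be combined into a simple keyword score for a page's visible text. The following is a minimal sketch, not the project's parser. The regular expressions, the signal names, and the threshold of 3 signals are all assumptions that would need tuning against pages already labeled by hand, such as the ten randomly chosen accelerators below.

```python
import re

# Hypothetical patterns, one per distinguishing variable described above
SIGNALS = {
    "accelerator": re.compile(r"\baccelerators?\b", re.I),
    "fixed_term": re.compile(r"\b(3|three)[- ]months?\b|\b12[- ]weeks?\b", re.I),
    "cohort": re.compile(r"\bcohorts?\b", re.I),
    "equity": re.compile(r"\b(equity|investment)\b", re.I),
    "mentorship": re.compile(r"\b(education|mentorship|networking)\b", re.I),
    "demo_day": re.compile(r"\bdemo day\b", re.I),
}


def accelerator_signals(page_text: str) -> set:
    """Return the names of the signals found in a page's text."""
    return {name for name, pattern in SIGNALS.items() if pattern.search(page_text)}


def looks_like_accelerator(page_text: str, threshold: int = 3) -> bool:
    # A single signal (e.g. "equity") is weak on its own, so require several together
    return len(accelerator_signals(page_text)) >= threshold
```

Counting distinct signals, rather than total keyword hits, prevents a VC site that repeats "investment" many times from outscoring a page that mentions a 3-month term, cohorts, and a demo day once each.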
Comprehensive List of Accelerators
All text files are saved in the "Accelerators" project on the McNair RPD.
- Acc.Info: 190
- SeedDB: 240
- SARP: 59
- Corp: 79
- Total: 568 results
After removing duplicates and locations: 363 results
This count does not include F6S, which returns 1170 results, only roughly 300 of which were accelerators. We created a crawler to retrieve these webpages and parse their HTML so that we could identify the accelerators. The program and HTML files are saved on the Desktop.
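The page does not record how duplicates were removed to go from 568 to 363 results. The following is one possible approach, shown as a hypothetical sketch: normalize each name into a comparison key and keep the first spelling seen. The normalization rules (lowercasing, stripping punctuation, and dropping a trailing "accelerator") are assumptions. The last rule could wrongly merge two distinct programs, so merged names are worth reviewing.

```python
import re


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation and a trailing 'accelerator' so near-duplicates collide."""
    key = re.sub(r"[^a-z0-9 ]", "", name.lower())
    key = re.sub(r"\s+", " ", key).strip()
    return re.sub(r" accelerator$", "", key)


def merge_sources(*sources):
    """Combine several name lists, keeping the first spelling seen for each normalized key."""
    seen = {}
    for source in sources:
        for name in source:
            seen.setdefault(normalize_name(name), name)
    return list(seen.values())
```

For example, calling `merge_sources(acc_info, seed_db, sarp, corp)` on the four lists would return a single list in first-seen order. The variable names here are hypothetical.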
Randomly Chosen Accelerators
- TLabs
- BetaSpring
- The Unilever Foundry
- AIA Accelerator
- R/GA Accelerator
- Zeroto510
- Hub:raum
- Orange Fab
- Furnace
- Launch Chapel Hill
Determining whether or not these are accelerators
Googled the name of each accelerator and clicked on the first link
Looked for the Variables which Distinguish Accelerator Websites
- TLabs: Homepage states: "Leading Indian Tech Accelerator"; TLabs is an accelerator, but it is located in India.
- Betaspring: Under the "About Betaspring" tab, it states that "Betaspring was among the first ten startup accelerators to launch worldwide".
- The Unilever Foundry: Does not claim to be an accelerator, nor does it have information on the website about cohorts. This name was pulled from the source Corporate Accelerators.
- AIA Accelerator: The word "accelerator" is included in the name. Under the "Overview" tab, it states that startups have received mentorship.
- R/GA Accelerator: Under the "Overview" tab it states that the "R/GA Accelerator is designed for startups and... it is a three month, immersive, mentorship driven program".
- Zeroto510: Website contains a "Portfolio Companies" tab which divides up the companies into cohorts. This identifies Zeroto510 as an accelerator.
- Hub:raum: Offers accelerator and incubator programs; however, none are located in North America.
- Orange Fab: States on the main page that "We're a 3-month accelerator program".
- Furnace: "About" tab states that Furnace is "an innovative startup accelerator designed to form, incubate, and launch new companies". The program concludes with a Demo Day.
- Launch Chapel Hill: Homepage states that they are "a startup accelerator". Also included on the homepage is a line that states "Applications for Cohort 7 are now open".
7/10 are accelerators located in the US.
2/10 are accelerators not located in the US.
1/10 is not an accelerator.
Steps for Extracting Cohort Information
- TLabs: Clicked on the "Startup" tab and located a drop down menu entitled "Showing Startups from:". This menu separates startups into Batches ranging from 1-9. These batches are cohorts.
- Betaspring: This website does not have a "Companies" or "Startups" tab. I clicked on their "Who" tab and noticed that within this section were two links called "Our portfolio" and "Our companies" which both linked to the same place. This place contained a list of the startups that Betaspring has funded, as well as links to each of the startup websites. The list was not separated into cohorts.
- The Unilever Foundry: Does not have a "Startups" or "Companies" link on the website.
- AIA Accelerator: Clicked on the "Startups" tab which returned a page with 5 companies and a bit of information on each of these companies. Also included the URL to each startup. However, the companies were not separated into cohorts, probably because there are so few of them.
- R/GA Accelerator: Clicked on the "Alumni" tab and navigated down the webpage. Startups are separated by class, which means cohort in this case. Startup info contains link to demo day presentation as well as the startup url.
- Zeroto510: Hovered over the "About Us" drop down menu and clicked on the "Portfolio Companies" link. Startups are separated by cohort, one for each year, starting from 2013.
- Hub:raum: Clicked on the "Portfolio" tab. Directed to a page with many names of startups, as well as a brief description of what their company is about. Also includes a link to each startup's website. Startups are not separated into cohorts, but rather by investment by location, current participants, and alumni.
- Orange Fab: Clicked on the "Startups" tab and was directed to a different page. Startups are not only separated into cohorts named "Seasons", but they are also separated by industry.
- Furnace: Clicked on the "Portfolio" tab, but unfortunately the page is broken and returned an error.
- Launch Chapel Hill: Clicked on the "Ventures" tab and was directed to a page in which all startups were separated into cohorts, and a brief description of the startup was provided underneath their logo.
Code
The directory for all data related to this project is located in:
E:\McNair\Projects\Accelerators
F6S Web Crawler
This is a Python script using the Selenium library that retrieves the HTML content of each page in F6S's North American accelerator search results. The script is located in:
E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs
The script is titled f6s_crawler_gentle.py
When run, the script visits the F6S search page for North American accelerators and retrieves the HTML of each page in that search list. NOTE: All interactions with the browser must be spaced out in time. F6S uses CAPTCHA, and the program will fail if the site receives too many requests or suspects that it is being probed by a bot.
The Accelerator HTML files are stored in:
E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs\Accelerator_HTML_files
The Accelerator HTML files, saved as text files, are stored in:
E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs\Accelerator_HTML_files_text
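The spacing requirement in the note above can be sketched as a crawl loop that pauses for a random interval before every request. This is not f6s_crawler_gentle.py. It is a hypothetical outline in which the fetching function is passed in, so the delay logic can be read (and tested) without a browser. The 5-15 second range and the `page_0000.html` file naming are assumptions. Random delays lower the chance of triggering CAPTCHA but cannot guarantee it.

```python
import random
import time
from pathlib import Path


def crawl_gently(urls, fetch_html, out_dir, min_delay=5.0, max_delay=15.0, sleep=time.sleep):
    """Fetch each URL after a randomized pause and save its HTML to out_dir.

    fetch_html is any callable that maps a URL to that page's HTML string.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls):
        # Space out every browser interaction so the traffic looks less bot-like
        sleep(random.uniform(min_delay, max_delay))
        html = fetch_html(url)
        path = out / f"page_{i:04d}.html"
        path.write_text(html, encoding="utf-8")
        saved.append(path)
    return saved
```

With Selenium, `fetch_html` could be a small wrapper that calls `driver.get(url)` and then returns `driver.page_source`, where `driver` is an already-started WebDriver.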
F6S Parser
The next step is to take the HTML files retrieved by the crawler and to parse them for necessary information. This parser should also determine whether or not the site is an accelerator site.
The code for the parser is located in:
E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs
It is titled f6s_parser.py
To run the code, open the file in Komodo and press play. If running from the command line, change to the correct directory and run the following command:
python f6s_parser.py
The list of accelerators that passed through the parser is in the same directory:
E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs
The tab-delimited text file is named AcceleratorList. It contains the names of the accelerators whose pages had the keywords listed in the file, along with each accelerator's run dates and location when these were listed on its F6S page.
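The two mechanical steps of the parser can be sketched with the standard library: extract the visible text from a saved HTML page, and write the results as a tab-delimited file. This is not f6s_parser.py. It is a minimal sketch in which the column order (name, run dates, location) follows the description above, while the header names and the helper names are hypothetical.

```python
import csv
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def write_accelerator_list(rows, path):
    """Write (name, run_dates, location) rows as a tab-delimited file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["name", "run_dates", "location"])
        writer.writerows(rows)
```

The text from `html_to_text` could then be passed to a keyword check to decide which pages belong in the list.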
F6S API
F6S has an API, but we have not succeeded in obtaining an API key. The link to request a key is on this page.
I (Peter) have emailed F6S to ask for a key directly at support@f6s.com. As of the end of the Fall 2016 Semester, they have not responded.
FUN FACT (MASS-RENAME FILES USING WINDOWS POWERSHELL):
The following command, run from the target directory, appends ".txt" to the names of all files in that folder:
Get-ChildItem * | Rename-Item -NewName { $_.name + '.txt'}
To change file formats, Microsoft suggests:
Get-ChildItem *.txt | Rename-Item -NewName { $_.name -Replace '\.txt$', '.log'}
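For machines without PowerShell, the same two renames can be done with Python's `pathlib`. This is a hypothetical cross-platform equivalent, not a tool the project uses. Note that on Windows `Path.rename` fails if the target name already exists, whereas on Linux/macOS it silently overwrites the existing file.

```python
from pathlib import Path


def append_suffix(folder, suffix=".txt"):
    """Append a suffix to every file name in folder (like the first PowerShell command)."""
    renamed = []
    for p in sorted(Path(folder).iterdir()):  # list first, then rename
        if p.is_file():
            target = p.with_name(p.name + suffix)
            p.rename(target)
            renamed.append(target)
    return renamed


def change_extension(folder, old=".txt", new=".log"):
    """Replace the final extension old with new (like the second PowerShell command)."""
    renamed = []
    for p in sorted(Path(folder).glob("*" + old)):
        target = p.with_suffix(new)  # only the last suffix is replaced
        p.rename(target)
        renamed.append(target)
    return renamed
```

Like the anchored `'\.txt$'` pattern, `with_suffix` changes only the final extension, so a file named `b.html.txt` becomes `b.html.log`.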
Final Data
The Parser for parsing the text files of accelerator data is located in:
E:\McNair\Projects\Accelerators\Code+Final_Data
The Parser for parsing the cohort files of accelerator data is also located in:
E:\McNair\Projects\Accelerators\Code+Final_Data
This folder contains the Python parsers. The Final_data folder contains the tab-delimited text files of parsed data. final_accelerator_data.txt contains the generalized data saved in .txt files and final_cohort_data.txt contains the cohort data saved in .cohort.txt files.
All the files titled accelerator_data are subsets of final_accelerator_data.txt; each contains only the accelerators that matched the flag specified in its file name.
find_headers.py finds the set of headers used across all the cohort files from the seed list project.
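The header survey that find_headers.py performs can be sketched as a set union over the first line of each cohort file. This is not the actual script. It assumes that each cohort file is tab-delimited with a header row. The glob pattern `*.cohort*` is an assumption chosen to match both the `.cohort` and `.cohort.txt` naming conventions mentioned on this page.

```python
from pathlib import Path


def find_headers(folder, pattern="*.cohort*"):
    """Return the set of column names in the first line of each tab-delimited cohort file."""
    headers = set()
    for path in Path(folder).glob(pattern):
        with open(path, encoding="utf-8") as f:
            first_line = f.readline()
        # strip() also removes a trailing "\r" from Windows line endings
        headers.update(h.strip() for h in first_line.split("\t") if h.strip())
    return headers
```

Comparing this set against the columns of final_cohort_data.txt is a quick way to spot inconsistently named columns before merging.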
Google SiteSearch
E:\McNair\Projects\Accelerators\Google_SiteSearch
This folder contains code for a Google search parser. The script sitesearch.py searches for a queried company and returns a likely web address for that company.
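The "likely web address" step can be illustrated with a simple heuristic applied to search-result URLs that have already been retrieved. This sketch does not call Google and is not the logic of sitesearch.py. The rule (return the first result whose domain contains the company's name with punctuation removed) and the function name are hypothetical.

```python
import re
from urllib.parse import urlparse


def likely_web_address(company, result_urls):
    """Return scheme://host of the first result whose domain contains the company name, else None."""
    key = re.sub(r"[^a-z0-9]", "", company.lower())
    for url in result_urls:
        parsed = urlparse(url)
        # Compare against the host with dots and hyphens removed
        host_key = re.sub(r"[^a-z0-9]", "", parsed.netloc.lower())
        if key and key in host_key:
            return f"{parsed.scheme}://{parsed.netloc}"
    return None
```

Profile pages on aggregator sites (LinkedIn, Crunchbase) are skipped automatically because their domains do not contain the company name. However, companies whose domains differ from their names would return None and need a manual lookup.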
Wayback Machine Parser
E:\McNair\Projects\Accelerators\Code+Final_Data\wayback_machine.py
This script takes URLs and returns the timestamp of the oldest archived webpage for each URL, courtesy of the Internet Archive's Wayback Machine.
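One way to obtain the oldest capture is the Internet Archive's public CDX server. As we understand its documentation, the CDX server returns captures in oldest-first order, so `limit=1` asks for the earliest one. The following is a sketch under that assumption, not the project's wayback_machine.py. The endpoint and the JSON layout (a header row followed by capture rows) should be checked against the current CDX documentation before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(site: str) -> str:
    # Captures come back oldest-first, so limit=1 requests the earliest one
    return CDX_ENDPOINT + "?" + urlencode({"url": site, "output": "json", "limit": 1})


def oldest_timestamp(cdx_json: str):
    """Parse a JSON CDX response: row 0 is the header, row 1 the earliest capture."""
    if not cdx_json.strip():
        return None
    rows = json.loads(cdx_json)
    if len(rows) < 2:
        return None  # no captures archived for this URL
    return rows[1][rows[0].index("timestamp")]


def fetch_oldest_timestamp(site: str):
    """Network call; returns a YYYYMMDDhhmmss string or None."""
    with urlopen(cdx_query_url(site), timeout=30) as resp:
        return oldest_timestamp(resp.read().decode("utf-8"))
```

Because the parsing is kept separate from the network call, `oldest_timestamp` can be tested offline on saved responses.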
Process Locations
E:\McNair\Projects\Accelerators\Code+Final_Data\process_locations.py
This script takes a physical address and converts it into latitude and longitude coordinates. Should be used in conjunction with the Enclosing Circle program to find the concentration of accelerators.
E:\McNair\Software\CodeBase\EnclosingCircle.py
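Once addresses have been converted to latitude and longitude, distances between accelerators can be measured with the haversine great-circle formula. This is a generic illustration of working with the coordinates and is not the EnclosingCircle algorithm. The `within_radius` helper is a hypothetical, simpler way to count how many accelerators lie near a chosen point.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, points, radius_km):
    """Return the (lat, lon) points that lie within radius_km of center."""
    return [p for p in points if haversine_km(*center, *p) <= radius_km]
```

One degree of longitude at the equator comes out to roughly 111.2 km, which is a quick sanity check for the formula.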
Kauffman Foundation Incubator Proposal Information
Institutions
Summary: F6S, Crunchbase, seed-db
Tools: Matcher - used to match lists of potential accelerators with our current list to identify duplicates/new matches (E:\McNair\Projects\Accelerators)
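The Matcher's job of separating duplicates from new candidates can be sketched with fuzzy string matching from Python's `difflib`. This is a swapped-in technique for illustration, not the Matcher's actual method. The 0.85 cutoff is an assumption, chosen so that small spelling differences such as "Tech Stars" vs. "Techstars" still match.

```python
import difflib


def match_lists(candidates, current, cutoff=0.85):
    """Split candidate names into (duplicates, new) relative to the current list.

    Names are compared case-insensitively using difflib's similarity ratio.
    """
    lookup = {name.lower(): name for name in current}
    duplicates, new = [], []
    for cand in candidates:
        hit = difflib.get_close_matches(cand.lower(), list(lookup), n=1, cutoff=cutoff)
        if hit:
            duplicates.append((cand, lookup[hit[0]]))  # (candidate, existing name)
        else:
            new.append(cand)
    return duplicates, new
```

Reviewing the `duplicates` pairs by hand is advisable, because a high cutoff can still pair two distinct programs with similar names.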
F6S
F6S WebCrawler and F6S Parser - E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs
CrunchBase
CrunchBase 2013 Snapshot (All Organizations) - E:\McNair\Projects\Accelerators\organizations.xls
CrunchBase 2013 Snapshot (Potential Accelerators) - E:\McNair\Projects\Accelerators\organizations.accdb under "Potential Accelerators query"
- Obtained using keyword matches in the descriptions of the potential accelerators.
CrunchBase 2013 Snapshot (New Verified Accelerators) - E:\McNair\Projects\Accelerators\New CrunchBase Accelerators.xls
We have the Crunchbase 2013 Snapshot, which provided a lot of new data on accelerators and incubators, but we would like to use the Crunchbase API to get a current database snapshot that we could use to cross-reference companies and add newly formed accelerators and incubators.
AngelList
seed-db
Obtained through www.seed-db.com/accelerators
Global Accelerator Network (GAN)
GAN Parser - E:\McNair\Projects\Accelerators\Web Scraping for Accelerators\scrapeaccel.py
GAN Data - E:\McNair\Projects\Accelerators\Web Scraping for Accelerators\GAN Accelerator Data
- Contains: Company Name, # of Companies Range, % of Companies Funded, Funding Raised by Companies, Employee Range, Exit Funding, Exit Date, Total Company Funding Raised, # of Mentors Range, % Equity, Location, Minimum Seed Capital Investment
Cohorts
- Cohorts obtained manually
- All cohort .txt files are saved under "E:\McNair\Projects\Accelerators\Data"
- cohort file name = (accelerator name).cohort
- Most updated Accelerator cohort data: E:\McNair\Projects\Accelerators\Cleaned Cohort Data.xls
Automation for obtaining cohorts??
Other Information
Summary: Whois Parser, Geocode, Tools to determine industry, etc
Whois Parser
- Retrieves and parses WHOIS information. Specifically, it takes a file with a column of domain names and populates the corresponding columns with information from the WhoIs API.
- Often used to obtain locations.
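The parser above uses a WhoIs API service. As a standard-library alternative, the raw WHOIS protocol (RFC 3912: send the domain followed by CRLF on TCP port 43 and read until the server closes the connection) can be sketched as follows. The default server is Verisign's registry server for .com/.net, which typically returns only thin records such as the registrar and creation date. Registrant location fields, when published at all, usually come from the registrar's own WHOIS server. The field list in `parse_whois` is an assumption, since responses vary by registry.

```python
import socket


def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10):
    """Send a raw WHOIS request (RFC 3912) and return the text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text, fields=("Creation Date", "Registrar", "Registrant State/Province", "Registrant Country")):
    """Pull selected 'Key: Value' lines into a dict (the first occurrence wins)."""
    out = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in fields and key not in out:
            out[key] = value.strip()
    return out
```

Each domain in the input column would be passed through `whois_query` and then `parse_whois`, and the resulting dict would fill that row's columns. Pausing between queries is advisable, because WHOIS servers rate-limit clients.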
Geocode
Input: company address; Output: latitude and longitude coordinates
- Used to obtain the locations of different Accelerators and Cohort companies.
SDC Platinum Pull
Used to obtain funding information and match companies that have gotten funding with companies that are Accelerator cohorts.
Desired Information/Variables
- Key People (founders, lead entrepreneurs, strategists, etc.)
- Total number of launched companies
- A FAQ for application details, accelerator vision, and
- Funds raised per company (average)
- Features offered by accelerator (perks, space, tools, etc)
Desired Tools/Information
Automating the Process of Obtaining Cohorts
- Automating this process would save a lot of time and substantially advance the project.
Obtaining More Details on Accelerators
- Having the kind of thorough information on industry, companies, funding, location, exits, mentors, and leadership that we got for the GAN companies would be extremely valuable.
List of Alive/Dead Accelerators
This is ambitious but would be very helpful.