{{Project|Has project output=Data,Tool|Has sponsor=McNair ProjectsCenter
|Has title=Accelerator Seed List (Data)
|Has owner=Shrey Agarwal, Matthew Ringheanu, Veeral Shah, Connor Rothschild
|Has start date=Fall 2016
|Has keywords=Accelerators,Data|Has project status=ActiveSubsume
|Is dependent on=Industry Classifier
}}
=Current Work=
 
===As of 05/21/2018, the Google Sheet workbook has been downloaded to the E drive. It is now an Excel workbook saved at E:\McNair\Projects\Accelerators\Summer 2018\Accelerator Master Variable List.xlsx, and this is now the master file.===
 
Google Master Sheet: https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=0
*Cross-reference sheet with data from Peter's old accelerator consolidation file ("accelerator_data_noflag" and "accelerator_data" in "All Relevant Files") and fill in missing data
*Variables that are 100% NOT in these 2 files:
**Cohort Breakout?
**Subtype
**Designed for Students?
**Campuses
**Stage
**Software Tech
**What stage do they look for?
 
TODO:
McNair/Projects/Accelerators/Fall 2017/unfound_founders.txt
A 0 means we don't have founder data for that accelerator.
Specs: a tab-delimited text file with the following fields:
Accelerator, First Name, Last Name, LinkedInURL (if possible)
Including the LinkedIn URL will ensure accuracy, but the file will work without it.
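A minimal sketch of writing and reading a file in the format specified above, assuming Python's standard csv module. The function names and the sample row are hypothetical, and since the spec does not say whether the file has a header row, none is written here.

```python
import csv
import io

# Field order taken from the spec above.
FIELDS = ["Accelerator", "First Name", "Last Name", "LinkedInURL"]

def write_founders(rows, handle):
    """Write founder rows as tab-delimited text; LinkedInURL may be left out."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row.get(field, "") for field in FIELDS])

def read_founders(handle):
    """Read tab-delimited founder rows back into dictionaries."""
    reader = csv.reader(handle, delimiter="\t")
    return [dict(zip(FIELDS, line)) for line in reader if line]

# Hypothetical example row with no LinkedIn URL.
sample = [{"Accelerator": "Example Accelerator", "First Name": "Jane", "Last Name": "Doe"}]
buffer = io.StringIO()
write_founders(sample, buffer)
buffer.seek(0)
founders = read_founders(buffer)
```

An in-memory buffer stands in for the real text file so the sketch can be run without touching the E drive.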
 
 
*Shrey: Find "demo day" keywords, so that we can search AcceleratorName Year Keyword and get back potential demo day pages
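Once keywords are found, the "AcceleratorName Year Keyword" searches could be generated along these lines. This is only a sketch: the function name is hypothetical and the keyword shown is a placeholder, since the real keyword list has not been compiled yet.

```python
from itertools import product

def demo_day_queries(accelerators, years, keywords):
    """Build one 'AcceleratorName Year Keyword' search string per combination."""
    return [
        f"{name} {year} {keyword}"
        for name, year, keyword in product(accelerators, years, keywords)
    ]

# Placeholder inputs; the real keyword list is still to be determined.
queries = demo_day_queries(["Techstars"], [2016, 2017], ["demo day"])
```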
 
 
==Accelerator Type project==
 
The file to edit is called "Accelerator type list". It is located in the folder E:\McNair\Projects\Accelerators\Spring 2018\Grouping project of ListOfAccs. More systematic information and instructions are in "Instructions for Accelerator type project" in E:\McNair\Projects\Accelerators\Spring 2018\Grouping project of ListOfAccs.
 
NOTE: until we get through all 270 accelerators, we will just categorize each accelerator into one of the following three categories as quickly as possible, with short notes in the "other info" column. Once we have this, we will go back through the ones that aren't categorized and add notes to the "other info" column.
 
 
Type list:
*Private
*Corporate
*Academic
Note: if DEAD, noted here.
 
 
Other info:
*nonprofit? (y/n)
 
*Subtype abbreviations:
**S: for if a social entrepreneurship initiative
**I: for if an incubator
**A: for an angel group
**F: for foreign
**C: for in coworking space/hub/etc
**V: for if part of venture fund
**G: for if government funded/partnered
**T: for international
 
 
Note: subtypes (from individual text files in E:\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data) were only found for 23 of the 270 accelerators. These accelerators were initially intended to be removed from the master list. Remaining subtypes are currently being added.
 
Other info can also include:
 
international offices, founders, industries, org type, program duration, or other interesting, easily accessed variables. Additional information is especially important for accelerators that have no other subtype abbreviation listed.
 
 
===Steps to research an accelerator===
 
1. Copy and paste the URL listed in the Accelerator type list file into Google. If the website is insufficient, try googling:
*the name of the accelerator
*the name of the accelerator + "crunchbase"
*the name of the accelerator + "nonprofit"
 
These searches sometimes lead to other helpful databases or news articles.
 
2. Note whether:
1) Academic/Corporate/Private
2) For Profit/Nonprofit. Sometimes this isn't directly stated but can be inferred from their description of, say, their investment process. If they don't address this at all, it's probably For Profit.
3) subtype (S, I, A, F, C, V, G, T).
4) Additional, easily-accessed info. Number 4 is really important if there's no subtype.
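The four items above could be checked for each row with a small validator like the one below. This is a sketch only: the function name and argument layout are hypothetical and do not reflect how the "Accelerator type list" file is actually structured.

```python
# Types and subtype codes as listed on this page.
TYPES = {"Private", "Corporate", "Academic"}
SUBTYPES = {
    "S": "social entrepreneurship initiative",
    "I": "incubator",
    "A": "angel group",
    "F": "foreign",
    "C": "coworking space/hub",
    "V": "part of venture fund",
    "G": "government funded/partnered",
    "T": "international",
}

def check_entry(acc_type, nonprofit, subtypes, other_info=""):
    """Return a list of problems found in one accelerator's entry (empty if none)."""
    problems = []
    if acc_type not in TYPES:
        problems.append(f"unknown type: {acc_type}")
    if nonprofit not in ("y", "n"):
        problems.append("nonprofit must be y or n")
    unknown = [code for code in subtypes if code not in SUBTYPES]
    if unknown:
        problems.append(f"unknown subtype codes: {unknown}")
    # Additional info matters most when there is no subtype.
    if not subtypes and not other_info.strip():
        problems.append("no subtype, so other info is required")
    return problems
```

For example, `check_entry("Private", "n", [])` flags the missing other info, while `check_entry("Private", "n", ["I", "V"])` returns an empty list.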
 
All 270 need to be done by the end of the semester.
 
 
The type list file is saved as "Accelerator type list" in E:\McNair\Projects\Accelerators\Spring 2017\Grouping project of ListOfAccs.
ListOfAccs, from which we drew the Accelerator type list, should have no matches with any of the flagged accelerators in E:\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data. There are 23 matches, however, so all subtypes must be searched for and entered manually. Nonprofit status for some accelerators is listed in the file called "whether nonprofit..." in E:\McNair\Projects\Accelerators\Spring 2017\Grouping project of ListOfAccs. Accelerators with no nonprofit information there need to have it entered manually.
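A check for overlap between the two lists could look like the sketch below. The function names are hypothetical, and the normalization (lowercasing and collapsing whitespace) is an assumption about which differences between names should be ignored.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not hide a match."""
    return " ".join(name.lower().split())

def find_overlap(list_of_accs, flagged):
    """Return the names in list_of_accs that also appear among the flagged accelerators."""
    flagged_keys = {normalize(name) for name in flagged}
    return [name for name in list_of_accs if normalize(name) in flagged_keys]
```

Running this over ListOfAccs and the flagged names would be expected to return the 23 matches mentioned above, provided the names differ only in case or spacing.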
 
=Funded By Accelerators=
 
See the like-named section in [[Crunchbase Data#Funded by Accelerators|Crunchbase Data]]
=End of Semester Report=
#Work on the [[Crunchbase 2013 Snapshot]]
#Match cohort companies to VC-backed portfolio companies
#Refine our data to work out which cohort each cohort company was a member of, cohort start dates and locations, etc.
#Make a list of top accelerator lists (e.g., http://tech.co/top-startup-accelerators-ranked-2012-08) and check that we have those accelerators
=End of Semester Notes=
*We have compiled a very long list of accelerators from many different databases. For the past couple of weeks, everyone in the center has been going through this list, 20 at a time, classifying each one as an accelerator or not an accelerator, and then proceeding to gather data on the accelerator using the process outlined below. This process went very smoothly. We have successfully gone through about 80% of the list. We are still missing information on the last hundred or so names. All of the collected data is located on the RDP, within the "Accelerators" folder under "Data" or on the [https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=1132417337 "Accelerator Master Variable List" Google sheet].
*We have listed all of the startups from the accelerators that have break out cohorts on their website on the [https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=1132417337 "Accelerator Master Variable List" Google sheet]. This contains the following information in the "Cohort List (new)" sheet: accelerator name, year, cohort name, company name, description, founders, category/sector, and location.
*Next steps include going through the demo day pages that have been downloaded and writing notes on the different types if possible (see [[Demo Day Page Google Classifier]]).
=Data Collection Notes=
 
==MATCHING==
 
The files we used for matching are located on the E drive. We used the matcher to match the portfolio company names from the cohort file located in E:\McNair\Projects\Accelerators.
*The files used for matching are located in E:\McNair\Projects\Accelerators\Matcher
*PortCo is the name of each company pulled from the cohort file
*AccCo includes both the cohort company name and the name of the accelerator itself
*In the matcher, the inputs are the PortCo names and the VC data from our SDC pull
*The outputs include the AccCo_VC data located in E:\McNair\Projects\Accelerators, which gives a lot of information on the matches, including:
:*name of the match itself
:*number of investments
:*dates that the company received its investments
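The inputs and outputs described above can be illustrated with a much simpler stand-in for the matcher. This sketch is not the McNair Matcher: it only matches names that are identical after basic normalization, and the suffix list, function names, and output keys are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "corp", "co", "ltd"}

def name_key(name):
    """Reduce a company name to a comparison key: lowercase, no punctuation, no trailing legal suffix."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)

def match_portcos(portcos, vc_rounds):
    """Match PortCo names against VC rounds given as (company name, round date) pairs."""
    rounds_by_key = defaultdict(list)
    for company, round_date in vc_rounds:
        rounds_by_key[name_key(company)].append((company, round_date))
    matches = {}
    for portco in portcos:
        key = name_key(portco)
        hits = rounds_by_key.get(key, []) if key else []
        if hits:
            matches[portco] = {
                "match_name": hits[0][0],
                "num_investments": len(hits),
                # ISO-formatted date strings sort chronologically.
                "round_dates": sorted(date for _, date in hits),
            }
    return matches
```

Each matched PortCo gets the name of the match, the number of investments, and the investment dates, mirroring the three outputs listed above.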
 
==SDC Pull==
 
We accessed SDC Platinum and pulled round-based funding information for all registered companies between 1999 and 2017.
 
The receipt is as follows:
 
 Session Details
 ---------------
 Request  Hits   Request Description
 0        -      DATABASE: Portfolio Companies (VIPC)
 1        96155  Venture Related Deals: Select All Venture Related Deals
 2        79572  Round Date: 1/1/1999 to 3/1/2017 (Custom) (Calendar)
 3               Custom Report: VC Data (Columnar) - Save As: E:\McNair\Projects\Accelerators\VC Data.txt
 Billing Ref # : 2054025
 Capture File  : riceuniv.2054025
 Session Name  :
 
The VC data pull includes the following variables:
 
Company Name Date Company Date Company Company Company City Company Street Address, Line 1 Company Street Address, Line 2 Total Known Company Industry Sub-Group 3 Company Industry Major Group Round Company Stage Level 3 Round Amt, Round Amt,
==3 files==
==Link to Crunchbase API application==
*https://about.crunchbase.com/forms/research-access-apply/ (does not work anymore)
*https://data.crunchbase.com/v3/docs/using-the-api (has new instructions for the application)
==Sign-Ups==
*http://www.represent.la/
*http://www.launch.co/blog/complete-list-of-incubators-and-accelerators-like-y-combinat.html
*https://angel.co/accelerator-4 (does not work; seems to be replaced by https://angel.co/companies?company_types[]=Incubator )
(Obtained from Google search: "Accelerator Database")
==Source: http://www.seed-db.com/accelerators/all==
#Copied "Seed Accelerators" table to TextPad, data sorted itself into lines. Returned 235 results.
#Clicking on the accelerator name itself links to a page with all of its associated startups, up to the 6/2016 cohort
*Overall, the data is very extensive for the accelerators included on the list, but cross-referencing with other sources shows that seed-db lacks many newer accelerators; the list is not all-inclusive.
*Includes regional distributions for accelerator groups as well. For example, rather than just "Techstars", the group is broken into Austin, Berlin, Boston, Boulder, etc.
 
==Source: http://www.seed-db.com/accelerators==
*Examples of single accelerators found
:#TMCx: http://www.tmc.edu/innovation/innovation-programs/tmcx/
:#RED labs: http://redlabs.uh.edu/8
:#SURGE accelerator: https://kirkcoburn.com/
:#OwlSpark: http://owlspark.com/
:#NextHIT: http://www.houstonhealthventures.com/nexthit-accelerator-program-application/
 
===Los Angeles Accelerators===
:#Amplify: http://amplify.la/
This script takes a physical address and converts it into latitude and longitude coordinates. Should be used in conjunction with the Enclosing Circle program to find the concentration of accelerators.
E:\McNair\Software\CodeBase\EnclosingCircle.py
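Once addresses have been converted to coordinates, a rough measure of concentration can be computed as sketched below. This is not the EnclosingCircle.py script and it does not compute a minimal enclosing circle: it centers a circle on the mean coordinate, which is an assumption that is only reasonable for clusters on a city scale and away from the antimeridian. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def covering_circle(points):
    """Return (center, radius_km) of a circle centered on the mean coordinate that covers all points."""
    center = (
        sum(p[0] for p in points) / len(points),
        sum(p[1] for p in points) / len(points),
    )
    return center, max(haversine_km(center, p) for p in points)
```

A smaller radius for the same number of accelerators indicates a tighter concentration.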
 
=Kauffman Foundation Incubator Proposal Information=
 
==Institutions==
Summary: F6S, Crunchbase, seed-db
 
Tools: Matcher - used to match lists of potential accelerators with our current list to identify duplicates/new matches (E:\McNair\Projects\Accelerators)
 
===F6S===
F6S WebCrawler and F6S Parser - E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs
 
===CrunchBase===
 
CrunchBase 2013 Snapshot '''(All Organizations)'''- E:\McNair\Projects\Accelerators\organizations.xls
 
CrunchBase 2013 Snapshot '''(Potential Accelerators)'''- E:\McNair\Projects\Accelerators\organizations.accdb under "Potential Accelerators query"
 
*Obtained using keyword matches in the descriptions of the potential accelerators.
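The keyword matching was done as an Access query; the same idea can be sketched in Python as below. The keyword list is an assumption, since the keywords actually used in the "Potential Accelerators query" are not recorded here, and the function names are hypothetical.

```python
# Assumed keywords, not the list used in the original query.
KEYWORDS = ("accelerator", "incubator", "seed program")

def is_potential_accelerator(description, keywords=KEYWORDS):
    """True if any keyword appears in the description, ignoring case."""
    text = description.lower()
    return any(keyword in text for keyword in keywords)

def potential_accelerators(organizations):
    """Filter (name, description) pairs down to the names whose description matches a keyword."""
    return [
        name
        for name, description in organizations
        if is_potential_accelerator(description or "")
    ]
```

Matches found this way are only candidates and still need to be verified by hand, as was done for the "New Verified Accelerators" file.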
 
CrunchBase 2013 Snapshot '''(New Verified Accelerators)''' - E:\McNair\Projects\Accelerators\New CrunchBase Accelerators.xls
 
We have the Crunchbase 2013 Snapshot, which provided a lot of new data on accelerators and incubators, but we would like to use the Crunchbase API to get a current database snapshot that we could use to cross-reference companies and add newly formed accelerators and incubators.
 
===AngelList===
 
===seed-db===
 
Obtained through www.seed-db.com/accelerators
 
===Global Accelerator Network (GAN)===
 
GAN Parser- E:\McNair\Projects\Accelerators\Web Scraping for Accelerators\scrapeaccel.py
 
GAN Data- E:\McNair\Projects\Accelerators\Web Scraping for Accelerators\GAN Accelerator Data
*Contains: Company Name, # of Companies Range, % of Companies Funded, Funding Raised by Companies, Employee Range, Exit Funding, Exit Date, Total Company Funding Raised, # of Mentors Range, % Equity, Location, Minimum Seed Capital Investment
 
==Cohorts==
 
*Cohorts obtained manually
*All Cohort txt files are saved under "E:\McNair\Projects\Accelerators\Data"
*cohort file name = (accelerator name).cohort
*Most up-to-date Accelerator cohort data: E:\McNair\Projects\Accelerators\Cleaned Cohort Data.xls
 
Automation for obtaining cohorts??
 
==Other Information==
Summary: Whois Parser, Geocode, Tools to determine industry, etc
 
===Whois Parser===
 
*Retrieves and parses Whois information. Specifically, takes a file with a column of domain names and populates the corresponding columns with information from the WhoIs API.
 
*Often used to obtain locations.
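The column-filling step can be sketched as follows. The lookup is passed in as a function because the actual WhoIs API call is not documented here; the field names, the function name, and the stub lookup are all hypothetical.

```python
# Hypothetical field names; the real parser's columns may differ.
FIELDS = ("registrant_city", "registrant_state", "registrant_country")

def populate_whois(domains, lookup, fields=FIELDS):
    """Return one row per domain: the domain followed by the requested Whois fields."""
    rows = []
    for domain in domains:
        record = lookup(domain) or {}  # a failed lookup leaves the columns blank
        rows.append([domain] + [record.get(field, "") for field in fields])
    return rows

def stub_lookup(domain):
    """Stand-in for the real API call, returning made-up data for one domain."""
    known = {"example.com": {"registrant_city": "Houston", "registrant_country": "US"}}
    return known.get(domain)

rows = populate_whois(["example.com", "missing.example"], stub_lookup)
```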
 
===Geocode===
 
Input: Company Address
Output: Directional Coordinates
 
*Used to obtain the locations of different Accelerators and Cohort companies.
 
===SDC Platinum Pull===
 
Used to obtain funding information and to match companies that have received funding with companies in accelerator cohorts.
 
===Desired Information/Variables===
 
*Key People (founders, lead entrepreneurs, strategists, etc.)
*Total number of launched companies
*A FAQ for application details, accelerator vision, and
*Funds raised per company (average)
*Features offered by accelerator (perks, space, tools, etc)
 
==Desired Tools/Information==
 
===Automating the Process of Obtaining Cohorts===
*Automating this process would save a lot of time and substantially advance the project.
 
===Obtaining More Details on Accelerators===
 
*It would be fantastic to have the kind of thorough information on industry, companies, funding, location, exits, mentors, and leadership that we got for the GAN companies.
 
===List of Alive/Dead Accelerators===
 
This is a dream but would be very helpful
