{{Project|Has project output=Data,Tool|Has sponsor=McNair Center|Has title=Accelerator Seed List (Data)|Has owner=Shrey Agarwal, Matthew Ringheanu, Veeral Shah, Connor Rothschild|Has start date=Fall 2016|Has keywords=Accelerators,Data|Has project status=Subsume|Is dependent on=Industry Classifier
}}
This project will be used to determine which accelerators are the most effective at churning out successful startups, and what characteristics these accelerators exhibit. First, we need to gather as much data as we can about as many accelerators as we can, in order to look at the factors that differentiate successful from unsuccessful ventures. Next, we need to create a web crawling program that gathers information about accelerators across the world by accessing their websites and extracting information. Our overall goal with this research project is to gain insight into the methods of successful accelerators, and to find out what exactly differentiates very successful accelerators from dead ones.
=Current Work=
Helpful Links: http://seedrankings.com/ http://www.forbes.com/sites/briansolomon/2016/03/11/the-best-startup-accelerators-of-2016/#38b2114624f2

As of 05/21/2018, the Google Sheet workbook has been downloaded to the E drive. The Excel version of the workbook is saved at E:\McNair\Projects\Accelerators\Summer 2018\Accelerator Master Variable List.xlsx. This is now the master file.
Google Master Sheet: https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=0
==Pre-existing Data==
*Cross-reference the sheet with data from Peter's old accelerator consolidation file ("accelerator_data_noflag" and "accelerator_data" in "All Relevant Files") and fill in missing data.
*Variables that are definitely NOT in these 2 files:
**Cohort Breakout?
**Subtype
**Designed for Students?
**Campuses
**Stage
**Software Tech
**What stage do they look for?
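Filling in missing data from Peter's old consolidation file is essentially a keyed merge in which values already in the master sheet win. The sketch below is only an illustration under assumptions not confirmed by the files: both sources have been exported to CSV and read into lists of dicts (for example with Python's csv.DictReader), they share a hypothetical "Name" key column, and matching columns carry identical headers.

```python
def fill_missing(master, old, key="Name"):
    """Fill blank cells in `master` from `old`, matching rows on `key`.

    master, old: lists of dicts, one dict per spreadsheet row (assumed layout).
    Names are compared case-insensitively after trimming whitespace.
    Non-blank master values are never overwritten, and only columns that
    already exist in master are filled. If `old` repeats a name, its last
    row wins. Returns the number of cells filled.
    """
    def norm(value):
        return (value or "").strip().casefold()

    # Index the old file by normalized accelerator name.
    old_by_key = {norm(row.get(key)): row for row in old if norm(row.get(key))}
    filled = 0
    for row in master:
        source = old_by_key.get(norm(row.get(key)))
        if source is None:
            continue  # accelerator not present in the old file
        for column, value in source.items():
            if column in row and not norm(row[column]) and norm(value):
                row[column] = value.strip()
                filled += 1
    return filled
```

For example, a master row `{"Name": "Acc A", "Vintage": ""}` and an old row `{"Name": "acc a", "Vintage": "2012"}` (toy data) would leave the master row with `Vintage` set to `"2012"`.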
TODO: McNair/Projects/Accelerators/Fall 2017/unfound_founders.txt. A 0 means we don't have founder data for that accelerator.

Specs: a tab-delimited text file with the following fields: Accelerator, First Name, Last Name, LinkedInURL (if possible). Getting the LinkedIn URL will ensure accuracy, but the file will work without it.
*Shrey: Find "demo day" keywords, so that we can search "AcceleratorName Year Keyword" and get back potential demo day pages.
==Accelerator Type project==
The file to edit is called "Accelerator type list". It is located in the folder E:\McNair\Projects\Accelerators\Spring 2018\Grouping project of ListOfAccs. More systematic information and instructions are in "Instructions for Accelerator type project" in E:\McNair\Projects\Accelerators\Spring 2018\Grouping project of ListOfAccs.

NOTE: until we get through all 270 accelerators, we will just sort each accelerator into one of the following three categories as quickly as possible, with short notes in the "other info" column; once we have this, we will go back through the ones that aren't categorized and add notes to the "other info" column.

Type list:
*Private
*Corporate
*Academic
Note: if an accelerator is DEAD, note it here.

Other info:
*nonprofit? (y/n)
*Subtype abbreviations:
**S: social entrepreneurship initiative
**I: incubator
**A: angel group
**F: foreign
**C: in a coworking space/hub/etc.
**V: part of a venture fund
**G: government funded/partnered
**T: international
Note: subtypes (from individual text files in E:\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data) were only found for 23 of the 270 accelerators. These accelerators were initially intended to be removed from the master list. Remaining subtypes are currently being added.

Other info: international offices, founders, industries, org type, program duration, or other interesting, easily accessed variables. Additional information is especially important for accelerators that have no subtype abbreviation listed.
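Because the subtype column holds short codes, a small lookup helps catch typos before the sheet is used for analysis. This is a minimal sketch assuming the codes are stored as a comma-separated cell (for example "I,C"); the function names are hypothetical.

```python
# Subtype abbreviations from the "Other info" list of the Accelerator Type project.
SUBTYPE_CODES = {
    "S": "social entrepreneurship initiative",
    "I": "incubator",
    "A": "angel group",
    "F": "foreign",
    "C": "in a coworking space/hub/etc.",
    "V": "part of a venture fund",
    "G": "government funded/partnered",
    "T": "international",
}


def parse_subtypes(cell):
    """Split a cell such as "I, c" into upper-case codes, rejecting unknown ones."""
    codes = [part.strip().upper() for part in cell.split(",") if part.strip()]
    unknown = [code for code in codes if code not in SUBTYPE_CODES]
    if unknown:
        raise ValueError(f"unknown subtype code(s): {unknown}")
    return codes


def describe_subtypes(cell):
    """Expand the codes in a cell into readable descriptions."""
    return "; ".join(SUBTYPE_CODES[code] for code in parse_subtypes(cell))
```

An empty cell yields an empty list, which is exactly the case where the "other info" notes matter most.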
===Steps to research an accelerator===
#Copy/paste the URL listed in the "Accelerator type list" file into Google. If the website is insufficient, try googling:
#*the name of the accelerator
#*the name of the accelerator + "crunchbase"
#*the name of the accelerator + "nonprofit"
#:These searches sometimes lead to other helpful databases or news articles.
#Note:
##Academic/Corporate/Private
##For Profit/Nonprofit. Sometimes this isn't directly stated but can be inferred from their description of, say, their investment process. If they don't address this at all, it's probably For Profit.
##Subtype (S, I, A, F, C, V, G, T)
##Additional, easily accessed info. Item 4 is especially important if there's no subtype.

All 270 need to be done by the end of the semester.

The type list file is saved as "Accelerator type list" in E:\McNair\Projects\Accelerators\Spring 2017\Grouping project of ListOfAccs. ListOfAccs, from which we drew the Accelerator type list, should have no matches with any of the flagged accelerators in E:\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data. There are 23 matches, though, so all subtypes must be searched and entered manually. Whether some accelerators are nonprofits was recorded in a file called "whether nonprofit..." in E:\McNair\Projects\Accelerators\Spring 2017\Grouping project of ListOfAccs. Accelerators with no nonprofit information there need to have it entered manually.
=Funded By Accelerators=
Reference the like-named portion in [[Crunchbase Data#Funded by Accelerators|Crunchbase Data]].
=End of Semester Report=
The end of semester report will focus on ranking accelerators and environments based on the variables we have gathered. Our primary form of categorization will be ranking individual accelerators based on their venture capital raise rate.
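The report does not define "venture capital raise rate" precisely. One plausible reading, assumed here, is the share of an accelerator's distinct cohort companies that appear in the VC data. The sketch below computes that share and ranks accelerators by it; it also assumes company names were normalized beforehand so exact comparison is meaningful.

```python
from collections import defaultdict


def vc_raise_rates(cohort_rows, vc_backed_names):
    """Return {accelerator: (n_vc_backed, n_companies, rate)}.

    cohort_rows: iterable of (accelerator, company_name) pairs.
    vc_backed_names: iterable of company names found in the VC data.
    """
    vc = set(vc_backed_names)
    companies = defaultdict(set)
    for accelerator, company in cohort_rows:
        companies[accelerator].add(company)  # count each company once
    rates = {}
    for accelerator, names in companies.items():
        backed = len(names & vc)
        rates[accelerator] = (backed, len(names), backed / len(names))
    return rates


def rank_by_rate(rates, min_companies=1):
    """Sort by rate (highest first), then by cohort size, then by name."""
    eligible = [(acc, r) for acc, r in rates.items() if r[1] >= min_companies]
    return sorted(eligible, key=lambda item: (-item[1][2], -item[1][1], item[0]))
```

The `min_companies` threshold is a design choice rather than part of the plan above: without it, an accelerator with a single VC-backed company would top the ranking at a 100% rate.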
We can probably generate information over time on accelerators and the amount of VC their companies raised, using the dates of transactions recorded by SDC, to get a sense of which locations have developed in the past five years. To obtain these rankings, we will identify which cohorts companies were trained in, as well as complete details of the accelerators and of the cohort companies. We will focus only on accelerators because there are many other entities in each ecosystem. We will also use information on IPOs and acquisitions, obtained through Crunchbase, to gain some sense of how successful the startups emerging from a particular accelerator are. To obtain the data over time, we will need to fill out the cohort date column in our cohort data, which will require the help of either Crunchbase or the Wayback Machine for older accelerators. In ranking accelerators across regions, we can also track industry-specific hotspots for accelerators, such as medicine in Memphis or technology in San Francisco. To complete the report, we need to fill in information on:
*Industry and focus
*Location
*Name, description
*Matched VC data
*Founder information (maybe)
=Overview=
This project is developing broad and near-population data on accelerators and their cohort companies. The objective is to identify which cohorts of which accelerators a cohort company was trained in, obtain details of the accelerators, and obtain details of the cohort companies, including information about any venture capital investment that the cohort company might have received and any IPO or acquisition the company may have experienced. The primary use of this data is for an academic paper detailed on the [[Matching Entrepreneurs to Accelerators and VCs (Academic Paper)]] page.
However, this project can also provide useful data to other academic papers ([[Urban Start-up Agglomeration]], [[Hubs (Academic Paper)]], and [[Hubs Scorecard (Academic Paper)]]), projects ([[Houston Entrepreneurship]]) and blog posts (under the [[Emerging Ecosystems]] umbrella project). This project needs the results of the [[Industry Classifier]], [[Whois Parser]], and other tools.
=Current Project Write-Up=
==Things To Do==
*Obtain all URLs for accelerators in order to run them through the Wayback Machine and find out when they started.
*Match Crunchbase data with our accelerator list to see if Crunchbase has any accelerators that we do not.
*Find an example of an accelerator that started early and has multiple companies but does not separate them into cohorts, and figure out a way to determine which companies went through each cohort.
==What Each File in the "Accelerator" Folder on the RDP Contains==
*"Accelerator List Sources" (Folder) - This folder contains most of the sources that we pulled accelerator names from at the very beginning of the project.
*"Code+Final_Data" (Folder) - This folder contains Peter's code for pulling the data from the text files in the "Data" folder.
*"Crunchbase Snapshot" (Folder) - This folder contains the data we obtained from Crunchbase. There is a massive amount of data, which we will need to sort through to find useful information and hopefully match with our current cohort data.
*"Data" (Folder) - This folder contains all of our data on accelerators, including cohort information and the HTML files of each cohort page. I would estimate that it is about 95% clean currently.
*"Data - Copy" (Folder) - This is just a copy of our current "Data" folder.
*"Data_Copy" (Folder) - This is a copy of our original "Data" folder before we did any manual cleaning.
*"Enclosing_Circle" (Folder) - This folder seems to contain some data on VC, but I'm not sure how it pertains to the Accelerator project.
*"F6S Accelerator HTMLs" (Folder) - This folder contains the HTML pages of all the pages on the F6S website. We used it to add more potential accelerators to our list.
*"Google_SiteSearch" (Folder) - This folder contains Python code for Google searches.
*"Industry_Classifier" (Folder) - This folder seems to contain Python code, but I'm not sure what for.
*"Matcher" (Folder) - This folder contains the Matcher.
*"Python WebCrawler" (Folder) - This folder contains work-in-progress code for pulling descriptions from accelerator websites. It is Jeemin's project.
*"Cleaned Cohort Data Copy" (Excel File) - This file contains a copy of our cleaned cohort data.
*"Cleaned Cohort Data" (Excel File) - This file contains the most current, completely cleaned data on cohort company information.
*"NormalizeFixedWidth" (PL File) - This is the normalizer.
*"PortCoNames" (TXT File) - This file contains all of the names of the cohort companies, as well as the accelerator they went through.
*"VC Data" (Excel File) - This file contains all of the names of the companies that have ever received VC funding.
*"VC_Data" (TXT File) - This file contains the non-normalized data of all of the VC information.
*"VC_Data_Names" (TXT File) - This file contains all of the names of companies that have received VC funding.
*"VC_Data_Names_Matched_PortCoNames" (Excel File) - This file contains all of the cohort companies that have also received VC funding. It still needs to be sorted through.
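The matched file above comes from comparing cohort company names (PortCoNames) with the VC company names. The sketch below illustrates the general idea: normalize names, then match exactly on the normalized form, collecting round dates so that the number of investments and their dates fall out directly. It is not the project's Matcher or the NormalizeFixedWidth normalizer; the suffix list and the (accelerator, company) / (company, round date) input shapes are hypothetical.

```python
import re

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "company"}


def normalize(name):
    """Lowercase, replace anything other than a-z/0-9 with spaces, drop trailing suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def match_portcos(portco_rows, vc_rows):
    """Exact match on normalized names.

    portco_rows: (accelerator, company) pairs.
    vc_rows: (company, round_date) pairs, dates as ISO strings (YYYY-MM-DD).
    Returns {(accelerator, company): sorted list of round dates}; the list
    length is the number of investments found for that company.
    """
    rounds = {}
    for company, round_date in vc_rows:
        key = normalize(company)
        if key:  # skip names that normalize to nothing
            rounds.setdefault(key, []).append(round_date)
    matches = {}
    for accelerator, company in portco_rows:
        dates = rounds.get(normalize(company))
        if dates:
            matches[(accelerator, company)] = sorted(dates)
    return matches
```

Exact matching on normalized names is deliberately conservative: it misses spelling variants, which is why a manual pass over the matched file is still needed.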
==Process==
After accumulating the massive amount of data on accelerators, their cohorts, and their HTML files, we began cleaning those text files, which are located in the "Data" folder within "Accelerators". After the first round of cleaning, we ran code over the cohort data that put all of that information into an Excel document called "Cleaned Cohort Data". Unfortunately, there were still some mistakes in the cohort information, which we fixed within the Excel file itself. Therefore, some text files within the "Data" folder do not match the "Cleaned Cohort Data" file: if we were to run the cohort code over the "Data" folder again, the output would not match "Cleaned Cohort Data", which is problematic. The solution (other than manually cleaning the text files again) would be to write code that regenerates the text files in the "Data" folder from the "Cleaned Cohort Data" file, so that they follow the format and corrections of the Excel file. We have also matched all of the cohort companies with our list of all companies that have received VC funding.
=Current To Do=
#Work on the [[Crunchbase 2013 Snapshot]]
#Match cohort companies to VC-backed portfolio companies
#Refine our data to work out which cohort each cohort company was a member of, cohort start dates and locations, etc.
#Make a list of top accelerator lists (e.g., http://tech.co/top-startup-accelerators-ranked-2012-08) and check that we have those accelerators
=End of Semester Notes=
*We have compiled a very long list of accelerators from many different databases. For the past couple of weeks, everyone in the center has been going through this list, 20 at a time, classifying each one as an accelerator or not an accelerator, and then gathering data on the accelerator using the process outlined below. This process went very smoothly. We have successfully gone through about 80% of the list. We are still missing information on the last hundred or so names. All of the collected data is located on the RDP, within the "Accelerators" folder under "Data", or on the [https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=1132417337 "Accelerator Master Variable List" Google sheet].
*We have listed all of the startups from the accelerators that break out cohorts on their website on the [https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=1132417337 "Accelerator Master Variable List" Google sheet]. The "Cohort List (new)" sheet contains the following information: accelerator name, year, cohort name, company name, description, founders, category/sector, and location.
*Next steps include going through the demo day pages that have been downloaded and writing notes on the different types, if possible (see [[Demo Day Page Google Classifier]]).
=Data Collection Notes=
==MATCHING==
The files we used for matching are located on the E drive. We used the matcher to match the portfolio company names from the cohort file located in E:\McNair\Projects\Accelerators against our VC data.
*The files used for matching are located in E:\McNair\Projects\Accelerators\Matcher
*PortCo is the list of company names pulled from the cohort file
*AccCo includes both the cohort company name and the name of the accelerator itself
*In the matcher, the inputs are the PortCo names and the VC data from our SDC pull
*The outputs include the AccCo_VC data located in E:\McNair\Projects\Accelerators, which gives a lot of information on the matches, including:
**the name of the match itself
**the number of investments
**the dates on which the company received its investments
==SDC Pull==
We accessed SDC Platinum and pulled information on the round-based funding that all registered companies received between 1999 and 2017.
The receipt is as follows:
 Session Details
 ---------------
 Request  Hits   Request Description
 0        -      DATABASE: Portfolio Companies (VIPC)
 1        96155  Venture Related Deals: Select All Venture Related Deals
 2        79572  Round Date: 1/1/1999 to 3/1/2017 (Custom) (Calendar)
 3               Custom Report: VC Data (Columnar) - Save As: E:\McNair\Projects\Accelerators\VC Data.txt
 Billing Ref # : 2054025
 Capture File  : riceuniv.2054025
 Session Name  :
The VC data pull includes the following variables:
 Company Name Date Company Date Company Company Company City Company Street Address, Line 1 Company Street Address, Line 2 Total Known Company Industry Sub-Group 3 Company Industry Major Group Round Company Stage Level 3 Round Amt, Round Amt,
==3 files==
For each accelerator in the list, put the following files in E:\Projects\Accelerators\Data:
*AcceleratorName.txt - copy and paste the variables below into a (tab-delimited) txt file and complete it
*AcceleratorName.cohort - your cohort text file (see below)
*AcceleratorName.html (possibly saved automatically with a folder too) - save a copy of the HTML of the cohort page
==.txt Variables==
 Name Score Flag CohortURL Address Duration Vintage Industry Description Equity NonProfit Notes
Try to get '''Name, Score, Flag, Cohort URL and Address''' for all. ONLY GRAB OTHER VARIABLES IF EASY. Just leave things blank if you can't find them quickly. '''If the score is 0, or the flag is S, I, A, or F, just stop''' - don't bother downloading a cohort list, saving an HTML file, etc. If possible, do put a very brief description of the problem in the Notes field.
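The stop rule and the field conventions for the .txt file are mechanical enough to check in code. The sketch below reads "1 tab only after each category" as one `Field<TAB>value` line per variable, in the listed order; that reading, and all function names, are assumptions. It also applies the Duration convention from the notes on these variables (months x 4.33, rounded) and writes flags with no spaces after commas.

```python
# Field order from the ".txt Variables" section.
FIELDS = ["Name", "Score", "Flag", "CohortURL", "Address", "Duration",
          "Vintage", "Industry", "Description", "Equity", "NonProfit", "Notes"]
STOP_FLAGS = {"S", "I", "A", "F"}


def months_to_weeks(months):
    """Duration convention: months x 4.33, rounded to whole weeks."""
    return round(months * 4.33)


def normalize_flags(flag):
    """Comma-separated, upper-case flags with no spaces after commas."""
    return ",".join(part.strip().upper() for part in flag.split(",") if part.strip())


def should_stop(score, flag):
    """True when we skip the cohort list and HTML: score 0 or an S/I/A/F flag."""
    return score == 0 or bool(set(normalize_flags(flag).split(",")) & STOP_FLAGS)


def info_file_text(record):
    """Render one accelerator as 'Field<TAB>value' lines; missing fields stay blank."""
    lines = []
    for field in FIELDS:
        value = str(record.get(field, ""))
        if field == "Flag":
            value = normalize_flags(value)
        if "\t" in value or "\n" in value:
            raise ValueError(f"{field} must not contain tabs or newlines")
        lines.append(f"{field}\t{value}")
    return "\n".join(lines) + "\n"
```

For example, a 3-month program gives `months_to_weeks(3)`, i.e. round(12.99) = 13 weeks, and a flag entered as "c, v" is written as "C,V".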
Notes:
*Score: 0-1, where 0 is definitely not an accelerator and 1 is definitely an accelerator
*Flag: (leave blank if not needed); if there are multiple, separate them with commas
**S for social entrepreneurship
**I for incubator
**A for an angel group
**F for foreign
**C for in a coworking space/hub/etc.
**V for part of a venture fund
**D for Dead
*Put just the root URL in CohortURL if there isn't a cohort page
*Duration: in weeks (months x 4.33, rounded)
*Vintage is the year of the first cohort, if possible
*Industry is the industry focus, but only if there is a clear focus
*Equity is a number (don't put %) or Y/N
*Notes is only there if needed. In particular, try to use this field to note discards.
==.cohort files==
Your .cohort files must:
*Be tab-delimited txt
*Have a header
*Have the portfolio company name as the first column
*Include as many columns as you can easily grab (and name them)
==Standardized format for text files==
Information text file:
*Only 1 tab after each category
*No spaces after commas for flags or industry
*For duration, put only a number in weeks; do not write "weeks"
*Equity is either only a number (no percent sign) or Y/N
Cohort text file:
*1 tab between each column
*Titles of each column on top
*Make a new category for "Cohort Number" and write "1", "2", "3", "4", etc.
*Matthew: 1-225 (done)
*Shrey: 226-550 (done)
==Link to Crunchbase API application==
*https://about.crunchbase.com/forms/research-access-apply/ (does not work anymore)
*https://data.crunchbase.com/v3/docs/using-the-api (has new instructions for the application)
==Sign-Ups==
*Ed - 1-10 (done)
*Carlin - 11-20 (done)
*Carlin - 21-40 (done)
*Christy - 41-60 (done)
*Avesh - 61-80 (done)
*Eliza - 81-100 (done)
*Meghana - 101-120 (done)
*Peter - 121-140 (done)
*Ramee - 141-160 (done)
*Will - 161-180 (done)
*Matthew - 181-200 (done)
*Julia - 201-220 (done)
*Peter - 221-240 (done)
*Shrey - 241-260 (done)
*Matthew - 261-280 (done)
*Eliza - 281-300 (done)
*Julia - 301-320 (done)
*Shrey - 321-340 (done)
*Carlin - 341-361 (done)
*Julia - 362-380 (done)
*Dylan - 381-393 (done)
*Jake - 394-404 (done)
*Dylan - 405-410 (done)
*Avesh - 411-415 (done)
*Dylan - 416-423 (done)
*Peter - 424-460 (done)
*Carlin - 461-480 (done)
*Peter - 481-490 (done)
*Julia - 491-510 (done)
*Peter - 511-515 (done)
*Julia - 516-529 (done)
*Ben - 530-540 (done)
*Shrey - 541-551 (done)
=List of Accelerators=
#10Xelerator
#1440
#33entrepreneurs
#500 Startups
#9Mile Labs
#AIA Accelerator
#ARK Challenge
#AT&T Aspire Accelerator
#ATDC Community
#AZ TechCelerator
#AccelFoods
#Acceleprise
#Accelerate Baltimore
#Accelerate Genius
#Accelerate Tectoria Accelerator
#Accelerator Centre
#Advanced Technology Development Center (ATDC)
#Airbus BizLab
#Alchemist Accelerator
#AlphaLab
#Amplify.LA
#Angel Capital
#Angelcube
#Angelpad
#Annual Business BootCamp
#Arizona Center for Innovation
#Arizona Furnace
#Arrowhead Tech Incubator 2016
#Aspire 3 Accelerator 2017
#Atlanta Ventures Accelerator
#AutoXLR8R
#Awesome Inc.
#Axel Springer Plug and Play
#B 4 Change Impact Accelerator
#B2B Acceleration Program
#B4C Social Venture Accelerator
#BBC Worldwide Labs
#BMW Startup Garage
#Brandcelerate
#Bunker Labs
#Bank of Ireland Accelerator Programme
#Bantunium Labs Accelerator
#Barclays Accelerator
#Barclays New York Summer 2015
#Berkley Ventures
#Bessemer Business Incubation System
#Beta-i
#Beta.MN
#BetaFactory
#BetaSpring
#Betablox
#Betaspring RevUp (DUPLICATE)
#Bethnal Green Ventures
#BioAccel
#BioInspire
#Bir 2015
#BitAngel Engagement Level
#BitAngels Startup Summer Program of 2013
#Bizdom
#Black Forest Accelerator
#Blue Startups
#Blueprint Health
#Bolt Boston
#Bonnier Accelerator
#BoomStartup
#BoomStartup Winter 2017 (DUPLICATE)
#Boomtown Accelerator
#Boomtown Health Tech (DUPLICATE)
#Boost VC
#BootupLabs
#Brandery
#Brooklyn Beta Summer Camp
#Budweiser Dream Brewery
#Buildit
#BuiltinPGH Companies
#Business Innovation Center
#Business Opportunity Academy 2017
#Business Technology Development Center (BizTech)
#CLT Joules Energy Accelerator 2014
#CWI Ventures
#CWI Ventures Application (DUPLICATE)
#CableLabs Technology Tours 2016
#Capital Factory
#Capital Innovators
#Capital Investment Network (Startups)
#Caroline Plouff
#Catalyst Partners
#Cause Collective : Social Innovation Lab
#Center for Entrepreneurial Innovation
#Chain Reaction Innovations 2017
#Chemical Angel Network
#Chinaccelerator
#Cisco Entrepreneurs in Residence
#Citi Accelerator
#Citrix Startup Accelerator
#Claremont/Upland Makerspace Fablab
#Climate Ventures 2.0 Accelerator
#Co.Lab accelerator
#Code for America Accelerator
#Cohab's Traxtion Point
#Collision Conference Investors
#Common Bond
#Communitech Hyperdrive
#Conquer Accelerator
#Coolhouse Labs
#CuriousMinds Incubator / Accelerator
#CyberTECH San Diego
#DBS Accelerator
#DPD Last Mile labs
#DV X Labs
#Dat Ventures
#Decatur-Morgan County Entrepreneurial Center
#Deep Space Ventures
#Demo Accelerator 2016- 2017
#DeveloperTown
#Difference Engine
#Digital Malaysia Corporate Accelerator Program
#Digital Media Zone Incubator/Accelerator
#Disney Accelerator
#DogFish Accelerator
#Domi Station
#Dotforge accelerator
#Dream Funded
#DreamIT Health
#DreamStart - Free Mentoring Program
#Dreamit Ventures (DUPLICATE)
#Ducky Diggy Lloyd
#E-Capital Summit
#EC Mentor Skills Inventory
#EIGERlab
#ETRAC
#EY Startup Challenge
#Eco Holding
#Eleven Startup Accelerator
#Emerge Xcelerate
#EnterpriseWorks Incubation Program
#Entrepreneur Development Center
#Entrepreneurs Roundtable Accelerator
#Environmental Business Cluster
#Equity Legal
#Excelerate Labs
#Execution Labs
#Exhilarator
#Extreme Startups
#Extreme University
#FOOD-X
#Factory45
#Fargo Startup House 2014-2015
#FastTrack Propero Healthcare
#FbFund
#Female Propeller for High Flyers
#FinTech Innovation Lab
#FinTech Studios 2015
#Fintech Founders Club #2
#First Growth Venture Network
#Fishbowl Labs AOL
#Flagship Enterprise Center
#FlashStarts
#Flashpoint
#Flat6 Labs
#Fledge9
#Flextronics Lab IX
#Food Future Scale-up Accelerator 2017
#Food System 6 (FS6) Accelerator
#FoodForwardX
#Fortify Ventures
#Founder Institute
#FounderFuel
#FoundersPad
#Fownders Accelerator
#French Accelerator 2016
#Fund the Food
#Fuse Corps Host
#GAKKEN Accelerator Program
#Gainesville Technology Enterprise Center
#Game CoLab Incubator Program 2014
#GameFounders
#GammaRebels
#Gazelle Lab
#Gener8tor
#German Accelerator Life Sciences
#German Accelerator Tech
#Global Accelerator Network 2015
#Good Works Houston Lab
#GoodCompany Ventures
#Google Launchpad Accelerator
#Grants4Apps Accelerator
#GreenStart
#Greenlite Labs
#GrowLab
#Growth Hacking Accelerator 2015
#Gulf Coast Center for Innovation and Entrepreneurship
#H-Farm Ventures
#HACKT Mission for International Founders
#HAXLR8R
#HCC Entrepreneurship Launchpad
#HIGHLINE Academy
#HUB
#HUBB Accelerator
#HUBB GTLA 2016
#HackFWD
#Hatch
#Health Wildcatters
#Health accelerator
#Healthbox
#Hero City Co-Working Space
#High Street Startups Accelerator
#Highway1
#Honda Xcelerator
#Houston Technology Center
#Hub Ventures
#HugeThing
#I/O ventures
#ICONYC labs
#IDC Elevator
#INcubes Funnel and Accelerator 2014/2015
#INcubes Online Form
#INcubes Startup Visa
#Illumina Accelerator
#Illuminator, New York Accelerator 2015
#Imagine K12
#Immokalee Business Development Center
#Impact Engine
#Impact USA - 2017
#Incubate Miami
#Infuse Accelerator
#Ingenuity Partner Program
#InnoSpring
#Innov&Connect
#Innov8 for Health
#Innova Memphis
#InnovateOC
#Innovation Depot
#Innovation Pavilion
#Innovation Showcase Winter 2017
#Insight Accelerator Labs
#Intel Education Accelerator
#Investment Preparedness Lab
#Invoke Collective
#Iowa Startup Accelerator
#JFDI.Asia
#JFE Accelerator SF
#JLAB
#Jaguar Land Rover Tech Incubator
#Jolt
#JumpSchool
#JumpStart Foundry
#Jumpstart! Boulder
#JusticeXL
#Kairos Boston Spring Program
#Kaplan EdTech
#Kick
#Kick Boise
#Kick LA
#Kick Victoria
#Kicklabs
#Kinetiq Labs
#L-SPARK Accelerator
#LAUNCH incubator
#LAUNCHub
#LI TechCOMETS
#LabFunding Project Accelerator 2014
#Labs Venture Accelerator
#Launch Chapel Hill
#Launch Memphis
#LaunchBox Digital
#LaunchHouse
#LaunchPad PEI
#LaunchSpot
#Launch_Academy
#Launchpad Digital Health, LLC
#Launchpad LA
#Launchpad Long Island
#Le Camping
#Leading Entrepreneurial Accelerator Program
#Lean Launch Ventures
#LearnLaunchX
#Lemnos Labs
#Life Changing Labs
#LiftOff Health Incubator
#Lightbank Start
#LightningLab
#Lowe's Accelerator
#MACH37
#MACH37 Spring
#MIT SA+P venture accelerator
#MITA Institute Accelerator
#MTGx MediaFactory
#Mac6
#Madworks Governance Accelerator
#Maine Center for Entrepreneurial Development - Top Gun Program
#Matter
#Maven Ventures Fund & Incubator
#Media Camp
#Melbourne Accelerator Program
#Memphis BioWorks
#Merck Accelerator
#MergeLane 2017 Accelerator
#Mergelane
#Metavallon
#Microsoft Accelerator
#MindTheBridge
#Momentum
#MuckerLab
#Muru-D
#My5ive Accelerator 2016
#N-Motion (DUPLICATE)
#NDRC (LaunchPad / VentureLab)
#NEXT Dashboard
#NMotion
#NY Digital Health Accelerator
#NY Fashion Tech Lab 2017
#NYC ACRE
#NYC SeedStart
#Nashville Entrepreneur Center
#Nebula Shift
#Nephoscale IaaS
#Nest New York
#New Ventures Group
#New York Digital Health Accelerator (DUPLICATE)
#NewME Accelerator PopUps
#NewMe
#Next media accelerator
#NextHIT
#NextStart
#Nike+ Accelerator
#Northern Arizona Center for Entrepreneurship and Technology (NACET)
#Northern England
#Nxtp.labs
#OCTANe
#Oasis 500
#OpenFund
#Orange Fab
#Orange Works
#Orion Startups
#Oxygen Accelerator
#PIE
#Patriot Boot Camp
#Pearson Catalyst for Education
#Pipeline H2O
#Pitney Bowes Inc
#Plarium Labs
#Plug In South LA
#Plug and Play
#Plum Alley Investments 2016
#Points of Light Accelerator
#PowerHaus
#Preccelerator® Program 2016
#ProSiebenSat.1 Accelerator
#Project Entrepreneur 2016/17
#Project Healtchare
#Project Lift
#Project Music
#Project Skyway
#Propeller Venture Accelerator
#Prosper Capital Accelerator
#Proton Enterprises
#Pushstart Accelerator
#Qualcomm Robotics Accelerator
#Queen Creek Business Incubator
#R/GA Accelerator
#RAIN Incubator/Accelerator
#RJI Investment Group
#Reach
#RetailXelerator
#Rock Health
#Rocket Fuel Labs
#Rockstart Accelerator
#RunUp Labs
#Runway IoT Accelerator 2015
#SAP Startup Focus Program
#SKTA Innopartners Innovation Accelerator
#SPACELAB Tech Accelerator
#SPARK
#SPH Plug and Play
#SURF Incubator
#SaltMines Group Start-Up Studio
#ScaleTown
#Seamless IoT 2016
#Searchcamp
#Seed Hatchery
#SeedSpot
#SeedStartup
#SeedSumo
#Seedcamp
#Seedrocket
#Seeqnce
#Sequoia Apps
#Serval Ventures
#Shenzhen Valley Ventures Incubator
#Shoals Entrepreneurial Center
#Shopper Futures Accelerator
#Shotput Ventures
#Sid Martin Biotechnology Institute
#SigmaLabs Accelerator
#Silicon Valley Incubator & Accelerator
#SixThirty
#Sixers Innovation Lab
#Skywalker Accelerator
#SmartHealth Activator
#Smashd Labs
#SoCo Nexus Accelerator Spring 2017
#Social Enterprise Challenge
#Socratic Labs
#SparkLabs
#Sparkgap
#Sports Tank
#Springboard
#Sprint Accelerator
#Sprint Mobile Health Accelerator
#SproutBox
#SproutCamp
#Starburst Aerospace Accelerator
#Start Path Europe
#Start'inPost
#StartEngine
#StartFast Venture Accelerator
#Starta Accelerator Winter 2017
#Startl
#Startmate
#Startup Accelerator (DUPLICATE)
#Startup Front
#Startup Next & GAN
#Startup Orange County Accelerator
#Startup Runway
#Startup Wise Guys
#Startup Zone PEI
#Startup52X Accelerator
#StartupCity
#StartupHighway
#StartupHouse Foundry program
#StartupMinds Accelerator
#StartupYard
#Startupbootcamp
#Straight Shot
#Summer@Highland
#Surge
#SynBio axlr8r
#TEB Incubation & Acceleration Center
#THRIVE Accelerator III
#THRIVE Open Innovation (DUPLICATE)
#TIM#WCAP Accelerator
#TLabs
#TMCx Accelerator Digital Health 2017
#Tallwave
#Tampa Bay Innovation Center
#Tampa Bay Wave
#Tandem Mobile Accelerator
#Tech Nexus
#Tech Wildcatters
#Tech2020
#TechLaunch
#TechRanch
#TechSquareLabs
#Techstars
#Techstars Music
#Telenet Idealabs
#Telluride Venture Accelerator
#TenX
#The Alchemist Accelerator (DUPLICATE)
#The Ark
#The Bakery
#The Batchery
#The Brandery
#The Bridge
#The Center For Technology Enterprise & Development
#The Chaser
#The Company Lab (CO.LAB)
#The Draper FinTech Connection
#The Factory
#The Greatest Pitch
#The Harbor Accelerator
#The Incubator
#The Iron Yard
#The Mediapreneur Incubator
#The Morpheus
#The New York Venture Summit
#The Next Step: from idea to startup
#The Refinery
#The Unilever Foundry
#The Venture Center's Pre-Accelerator I
#The Vine OC
#The Vogt Awards
#The Yield Lab
#The eFactory Accelerator
#Think Big Partners Accelerator
#TiE Angels
#Tigerlabs Digital Health Accelerator
#Tolstoy Summer Camp
#TopSeedsLab
#Travel Startups Incubator
#Travelport Labs Accelerator
#Travelport Labs Incubator
#Triangle Startup Factory
#Tumml
#Tune Labs
#Twin Cities Accelerator 2016
#UW-Whitewater Launch Pad Accelerator
#Unbank.ventures FinTech Incubator
#University Technology Park
#Unreasonable Institute
#UpTech
#Upstart Accelerator
#Upstart Labs
#Upstart Memphis
#Uptima Business Bootcamp
#Upwest Labs
#VANTEC
#VC FinTech Accelerator
#Velocity Indiana Accelerator
#Venture Catalyst Partners
#Venture Hive
#Venture I
#VentureOut's Enterprise Tech Expedition
#Venturegeeks
#Vet-Tech Accelerator
#VictorySpark
#Village88 Techlab
#Volkswagen ERL Technology Accelerator
#WHLabs
#Wasabi Ventures Academy
#Wayra
#Wellness Accelerator
#Wells Fargo Startup Accelerator
#Wireless IoT
#Women Innovate Mobile
#XLerateHealth
#XTRATOS
#Xlerate Health
#Y Combinator
#Y&R SparkPlug 2017
#YEurope
#YLE Media Startup Accelerator Program
#Yahoo Ad Tech Program
#Yangler (online accelerator)
#Year of the Startup
#Yetizen Accelerator
#You Is Now
#Z80 Labs
#ZIP Launchpad Admission
#ZeroTo510
#Zone Startups Calgary
#designX 2017
#eMerging Ventures
#ezone
#iStart Jax (DUPLICATE)
#iStart Valley
#iVentures10
#ignite100
#innovyz start
#tekMountain Accelerator
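The first item under "Things To Do" is to run accelerator URLs through the Wayback Machine to find out when they started. The sketch below queries the Internet Archive's public CDX server for the oldest capture of a URL. It assumes the endpoint and its `url`, `output=json`, `fl` and `limit` parameters behave as documented (results oldest-first, with a header row in JSON output); the function names are hypothetical. The first capture is only a rough proxy: a site can predate its first archive date, and a domain can predate the accelerator.

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def earliest_capture_url(site):
    """CDX query asking for the first (oldest) capture of `site`."""
    params = {"url": site, "output": "json", "fl": "timestamp,original", "limit": "1"}
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_earliest(payload):
    """Return the first capture time as a datetime, or None if never archived."""
    rows = json.loads(payload) if payload.strip() else []
    if len(rows) < 2:  # row 0 is the field-name header
        return None
    return datetime.strptime(rows[1][0], "%Y%m%d%H%M%S")


def earliest_capture(site, timeout=30.0):
    """Network call: fetch and parse the oldest capture for one site."""
    with urlopen(earliest_capture_url(site), timeout=timeout) as response:
        return parse_earliest(response.read().decode("utf-8"))
```

Keeping URL construction and parsing separate from the network call makes it easy to cache raw responses while working through a long list of accelerators.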
=Sources=
Summary: These are sources obtained from [[List of Accelerators]], Crunchbase, and other Google searches. We will evaluate these sources by looking at the number of accelerators they supply (as most of them are lists) and at the type of information they provide about each accelerator. Key data points are cohort-related data, startup-related data, and the logistics of the accelerator. Better sources supply more information than the URL alone.
(Obtained from [[List of Accelerators]] and various Google searches)
*http://seedrankings.com/
*http://www.acceleratorinfo.com/see-all.html
*http://www.seed-db.com/accelerators, http://www.seed-db.com/accelerators/all
*https://www.f6s.com/programs?type
*http://gust.com/usa-canada-accelerator-report-2015/?utm_content=35401577&utm_medium=social&utm_source=twitter
*https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/
*http://www.builtinnyc.com/2016/06/03/accelerators-incubators-nyc
*http://www.represent.la/
*http://www.launch.co/blog/complete-list-of-incubators-and-accelerators-like-y-combinat.html
*https://angel.co/accelerator-4 (Does not work - seems to be replaced by https://angel.co/companies?company_types[]=Incubator )
(Obtained from Google search: "Accelerator Database")
*https://www.corporate-accelerators.net/database/
*https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json
*Quora: https://www.quora.com/Where-can-I-find-a-comprehensive-list-of-startup-incubators-and-accelerators-in-the-US
*By the 5th or 6th search result, the utility diminished greatly
*http://www.forbes.com/sites/briansolomon/2015/03/17/the-best-startup-accelerators-of-2015-powering-a-tech-boom/#2f52fa7e34e4
*http://www.inc.com/will-yakowicz/the-15-best-startup-accelerators-in-the-us.html
*http://www.forbes.com/sites/briansolomon/2016/03/11/the-best-startup-accelerators-of-2016/#74086a7724f2
*https://techcrunch.com/2015/03/17/these-are-the-top-20-us-accelerators/
*https://www.nexpcb.com/blogs/news/the-hardware-incubators-accelerators-list
Other ways used to find Accelerators (listed below "List of Sources Obtained from Various Google Searches"):
:*Looked at roughly the first 20 results
:*Used three locations (New York, Los Angeles, and Houston) as examples to see which accelerators come up
*Type in a specific state + "accelerator" + "list" (e.g. Texas accelerator list) to search for more relevant lists
:*Once again, looked at roughly the first 20 results
*Crunchbase has its own webpage with instructions for how we retrieve the data
=Source Evaluations=
Summary: Each evaluation below corresponds to one of the sources above. The evaluations provide instructions for obtaining the information listed, as well as a general review of how useful the data seems. The review serves to determine whether a crawler would be suitable for obtaining information from the source autonomously.
 
==SOURCE: Crunchbase==
*All of the Crunchbase documentation is located on the [[Crunchbase 2013 Snapshot]] page, along with the documentation for how we determined the accelerator information.
==Source: http://www.acceleratorinfo.com/see-all.html==
==Source: http://www.seed-db.com/accelerators/all==
#Copied the "Seed Accelerators" table to TextPad; the data sorted itself into lines. Returned 235 results.
#Clicking on the accelerator name itself links to a page with all of its associated startups, up to the 6/2016 cohort.
*Overall, the data is very extensive for the accelerators included on the list, but cross-referencing with other sources shows that seed-db is missing many newer accelerators; the list is not all-inclusive.
*Includes regional distributions for accelerator groups as well. For example, rather than just "Techstars", the group is broken into Austin, Berlin, Boston, Boulder, etc.
 
==Source: http://www.seed-db.com/accelerators==
*Examples of single accelerators found
:#TMCx: http://www.tmc.edu/innovation/innovation-programs/tmcx/
:#RED labs: http://redlabs.uh.edu/8
:#SURGE accelerator: https://kirkcoburn.com/
:#OwlSpark: http://owlspark.com/
:#NextHIT: http://www.houstonhealthventures.com/nexthit-accelerator-program-application/
 
===Los Angeles Accelerators===
:#Amplify: http://amplify.la/
===Review===
*Some locations return more viable results for a similar sample size. For example, out of the first 20 results, New York returned 9 valid accelerators, whereas Los Angeles and Houston each returned only 5, so New York returned 80% more. Some optimization may come from identifying which locations return more accelerators upon searching.
 
==From "State+Accelerator+List"==
===New York Accelerator List===
*http://www.ongridventures.com/resources/new-york-silicon-alley-resources/newyorkaccelerators/ (Ranks 14 accelerators)
*http://under30ceo.com/11-new-york-tech-incubators-and-accelerators-for-entrepreneurs/ (Ranks 11 accelerators)
===California Accelerator List===
*http://www.socaltech.com/the_complete_guide_to_southern_california_accelerators_and_incubators_part_i/s-0040924.html (Lists accelerators in Southern California)
*http://barberacorporatelaw.com/blog/2014/4/8/28-business-incubators-in-the-los-angeles-area (List of 24 accelerators in the LA area)
===Texas Accelerator List===
*http://www.austinstartuplist.com/incubators (List of accelerators in Austin, <5 results)
*http://www.siliconhillsnews.com/2016/09/02/the-top-texas-healthcare-accelerators-and-incubators/ (Modest list of accelerators aiding in healthcare)
*http://realfoodmba.com/food-startup-accelerators/ (List of food-based accelerators, some of which are in Austin, others of which are international)
===Colorado Accelerator List===
*http://www.builtincolorado.com/2015/01/14/best-colorado-accelerators-your-startup (8 results)
*https://www.quora.com/What-accelerator-programs-are-located-in-Colorado (Quora inquiry yielding modest results)
===Washington Accelerator List===
*http://www.geekwire.com/2015/mapping-seattles-incubators-accelerators-and-co-working-spaces/ (Returns 14 results)
===Oregon Accelerator List===
*http://www.bizjournals.com/portland/subscriber-only/2016/01/15/incubators-and-accelerators.html (Returns list of 5 accelerators and details)
*http://www.oregon4biz.com/Innovate-&-Create/R&D-Business/Incubators/ (Returns list of 26 accelerators and incubators)
 
Notes:
*Seed-DB appears for almost all of the search results
*Acceleratorinfo appears for most of the search results
*There are multiple cumulative reports of incubators per location, but not for accelerators
*Most regionalized accelerator lists deal with either an article or a ranking of a particular amount of accelerators in the area
*Many results returned nationally ranked lists of accelerators, such as the Forbes list of "Top Accelerators" or something along the lines of "Best Accelerators in the US". These likely appear because at least one accelerator on the list is located within the searched state.
*There are also a few results for actual particle accelerators that must be sorted out (e.g. the Superconducting Super Collider)
 
==Found through google searching accelerators found previously==
'''Found from googling YLE Media Startup Accelerator'''
*https://www.corporate-accelerators.net/database/index.html (DB of Corporate Accelerators 71-79 entries)
*http://startupaccelerator.vc/accelerator-corporate-innovation-sig/ (Database of Accelerators and Corporate Innovation 92 entries)
Neither of these sources has had its entries added to the list of accelerators.
=Individual Accelerator Evaluations=
#Prosper Accelerator (https://www.f6s.com/programs?type)
#Axel Springer Plug and Play (http://www.axelspringerplugandplay.com/)
#Techstars (http://www.seed-db.com/accelerators)
#Startmate (http://www.seed-db.com/accelerators)
#Capital Factory (http://blog.shedd.us/321987608/)
#OwlSpark (Google search: "Houston + accelerators")
#Unfortunately does not include the date and year of each cohort class, but perhaps could cross-reference with other sources.
==Accelerator: Launchpad LA (http://launchpad.la/)==
Finding the cohort:
#Navigated to "Companies" at the top of the homepage.
#"Companies" returns all companies backed by Launchpad LA, organized by class year and number (cohort).
#:*Also sorted by active vs. inactive startups.
#At the bottom of the "Companies" tab, there is a statistical summary showing the number of companies started by Launchpad during its time as an accelerator (2012-present), as well as the total funding funneled into the accelerator.

==Accelerator: Y Combinator (http://www.ycombinator.com)==
Finding the cohort:
#Scrolled down on the home page and clicked on a link entitled "See all companies".
#Navigated to a drop-down menu named "All Batches" and clicked on it to expand the list.
#The list is made up of dates ranging from 2005-2016; these dates return lists of launched companies, including most but not all of their URLs, as well as their launch year.

==Accelerator: Flashpoint (http://flashpoint.gatech.edu/)==
Finding the cohort:
#In the upper right corner, after the animation, there is a menu tab that lets you navigate to a page labeled "Teams".
#The "Teams" page lists each batch of companies emerging from Georgia Tech, although it does not include the dates or cohorts of these companies. For example, "Batch 1" at the top of the page just lists the companies in the batch without URLs or any additional information.
#The "Application" page, on the tab near the top, has information regarding Batch 7, which begins in early 2017. This suggests that Batch 6 ended in either spring 2016 or fall 2016.

==Accelerator: Prosper Women Entrepreneurs (http://www.prosperstl.com)==
Finding the cohort:
#Navigated to "Accelerator" tab and clicked "Companies" when prompted with the drop down menu.
#This tab returned all of the launched company logos which then redirected to the company's home page when clicked.
#No other relevant form of information such as date launched or cohort was included on this page.
 
==Accelerator: Axel Springer Plug and Play (http://www.axelspringerplugandplay.com/)==
Finding the cohort:
#Clicked on the "Companies" tab on the home page and was directed to the middle of the page which included a short list of current companies.
#Clicked on the "All Companies" link which returned a page filled with startup logos and brief descriptions of those startups. When clicked, each logo serves to redirect to that startup's home page.
#Companies were not sorted by cohort or in any other relevant way.
 
==Accelerator: Techstars (http://www.techstars.com)==
Finding the cohorts:
#Navigated to the "Accelerators" tab and clicked "Companies" on the drop-down menu.
#This first returns a table made up of a long list of classes from different areas, separated by year.
#Upon scrolling down further, each of these classes is broken down by the startups that graduated from them. It also includes information such as how much was invested in each startup, as well as whether or not the startup was acquired, is active, or failed.
 
==Accelerator: Startmate (http://www.startmate.com.au)==
Finding the cohorts:
#Navigated to the "Startups" tab, which returned a page of all startups that have graduated from Startmate.
#Startups are separated by year of graduation, and each company is linked on this page.
#It appears that one cohort goes through the accelerator each year.
 
==Accelerator: Capital Factory (https://capitalfactory.com/accelerate/)==
Finding the cohorts:
#Navigated to the startups tab, which returned a long list of companies that were accelerated by Capital Factory.
#Each logo for the startups served as a link to their respective websites.
#There was no evidence or mention of any cohorts.
 
==Accelerator: OwlSpark (http://entrepreneurship.rice.edu/accelerator/)==
Finding the cohorts:
#Navigated to the "Startup Teams" tab, which returned a page that included links to 4 "Classes".
#Each class link (i.e., Class 1, Class 2, Class 3, Class 4) returned links to each startup that graduated from the program.
#These classes signify cohorts.
==List of Promising Variables==
*Key People (founders, lead entrepreneurs, strategists, etc.)
*Total number of launched companies
*A FAQ for application details, accelerator vision, and
*Funds raised per company (average)
*Features offered by accelerator (perks, space, tools, etc)
*General events hosted by the accelerator
*(Success) stories for graduated start-ups
 
=E-R Diagram (in list form) for Identifying Attributes to Pull from Accelerators=
Summary: I will look at different entities within the accelerator page (e.g. accelerators, cohorts, founders) and then find potential attributes that can be codified from those entities. Along with each attribute, I list a potential method for pulling that particular attribute.
 
Format:
:<u>Entity</u>
:*Attribute - Possible sources/ways to get
 
Ed: "Be creative with finding new attributes to pull!"
 
==List==
<u>Accelerators</u>
*Accelerator Name - Website, external database
*Contact Form - General contact section in each website
*Industry focus - can be pulled from description
*Description - pulled from website itself
*Takes equity? - Database or from "about" page
*Non-profit? - Database
*URL - Already have way of obtaining
*DNS Registration Date - Already have way of obtaining
*Address - Google Maps, maybe the website
*Founding Date - Google Maps, website, server registration
 
<u>Accelerators</u> (1) has (n) <u>Features</u>
 
<u>Features</u>
*Mentorship? - Description in website
*Space Offered - Google Maps, Website description
*Partnerships - Angel list, Same section as mentorship or events
*Hosted Events - Calendar
 
<u>Accelerators</u> (1) has (n) <u>Founders</u>
 
<u>Founders</u>
*Name - Founders or Team Page
*Title - Directly underneath or next to name
*PhD? - Biography, webpage under name
*Serial - Biography
*Link back to "Accelerator Name" in <u>Accelerators</u>
 
<u>Founders</u> (n) has (n) <u>Ventures</u>
 
<u>Ventures</u>
*Other Companies - Biography, webpage
*Previous Companies - Biography
*Net Worth - Forbes, Biography
*Link back to "Name" in <u>Founders</u>
 
<u>Accelerators</u> (1) has (n) <u>Cohorts</u>
 
<u>Cohorts</u>
*Date + Accelerator = Cohort ID - Database or Website
*Number of Startups - Website, count from <u>Startups</u>
*Cohort Number - Categorization on website, external database
*Link back to "Accelerator Name"
 
<u>Cohorts</u> (1) has (n) <u>Startups</u>
 
<u>Startups</u>
*Names - Website, external database
*State of Inc - Angel List
*URL - Angel List, website
*Founding Date - Registration database, Angel List
*Industry - startup description
*Founding Location - Angel List
*Current Location - Angel List
*VC Raised to Date - SDC Platinum
*Angel Funds Raised to date - Angel List
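
As a rough illustration, the entities and one-to-many links above could be modeled as Python dataclasses before loading them into a database. This is a minimal sketch, not the project's schema: the class and field names are hypothetical, only a few attributes per entity are shown, and the Features entity and the many-to-many Founders/Ventures link are omitted for brevity.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Hypothetical classes mirroring the entity lists above; field names are illustrative.

@dataclass
class Startup:
    name: str
    url: Optional[str] = None
    founding_date: Optional[date] = None
    industry: Optional[str] = None

@dataclass
class Cohort:
    accelerator_name: str  # link back to "Accelerator Name"
    start_date: date
    cohort_number: Optional[int] = None
    startups: List[Startup] = field(default_factory=list)  # Cohorts (1) has (n) Startups

    @property
    def cohort_id(self) -> str:
        # Date + Accelerator = Cohort ID
        return f"{self.accelerator_name}|{self.start_date.isoformat()}"

    @property
    def number_of_startups(self) -> int:
        # "Number of Startups" counted from the Startups entity
        return len(self.startups)

@dataclass
class Founder:
    name: str
    accelerator_name: str  # link back to "Accelerator Name"
    title: Optional[str] = None
    has_phd: Optional[bool] = None

@dataclass
class Accelerator:
    name: str
    url: Optional[str] = None
    takes_equity: Optional[bool] = None
    founders: List[Founder] = field(default_factory=list)  # Accelerators (1) has (n) Founders
    cohorts: List[Cohort] = field(default_factory=list)    # Accelerators (1) has (n) Cohorts
```

Each one-to-many link becomes a list field on the parent plus a back-reference field (accelerator_name) on the child, matching the "Link back to" attributes above.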
 
==Variables which Distinguish Accelerator Websites==
*The word "Accelerator"
**This word appears at least one time on the home page of the vast majority of accelerator websites. The word "Accelerator" appears either as a link to another page on the website or in a title on the homepage of the website. Not many other websites contain this word on their homepage, especially not if one Googles something generic such as "Accelerators in the US".
 
*Fixed Term
**Accelerators normally work with their cohorts for 3 months. This is a major factor which differentiates between an accelerator and any other member of a startup ecosystem. If on their website they mention either "3 months" or "12 weeks", it is extremely likely that the website belongs to an accelerator.
 
*Cohorts, Portfolio, Class, or Companies
**This is a potential variable that could link the websites of many different accelerators. The problem is that the word "portfolio" is also used by numerous venture capital firms, which could cause complications when attempting to pull only the sites of accelerators from a Google search. The word "cohort", however, would have an extremely high probability of identifying the website as belonging to an accelerator. The words "class" and "companies" are promising but do not offer certainty.
 
*Equity, Investment
**Although equity does not mean much by itself, when paired with any of these other terms it could point to an accelerator. Most accelerators take equity in the form of common stock (6-8%), or they ask for some alternate form of stake in the company.
 
*Education and Mentorship
**Accelerators differ from incubators and angel investors in that they emphasize the education of the potential startup. They offer advice and intense mentorship from more experienced entrepreneurs within their staff, as well as many networking opportunities with the outside world. This variable is more difficult to find on the website of the accelerator, but I believe that if the website includes numerous keywords such as "education", "mentorship", or "networking opportunities", it would be somewhat safe to assume that the website is owned by an accelerator.
 
*Demo Day
**This variable does not have tremendous potential in terms of crawling websites, but I feel that it is worth mentioning. Most accelerators "graduate" their cohorts with a demo day, which is a day when the startups present their company to potential investors. If the website contains the words "demo day", which is fairly uncommon, it could be a good source of accelerator identification.
 
A combination of several of these variables would strongly indicate that the current website belongs to an accelerator.
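
One way a crawler could use these variables is a weighted keyword score over a page's visible text. The sketch below is an assumption, not the project's parser: the regular expressions, the weights, and the threshold of 4 are illustrative, with "cohort" and "demo day" weighted higher because the notes above describe them as the strongest signals.

```python
import re

# Hypothetical weights; the notes above rank the signals but give no numbers.
SIGNALS = {
    r"\baccelerators?\b": 2,
    r"\b(3|three)[- ]months?\b|\b12[- ]weeks?\b": 2,  # fixed term
    r"\bcohorts?\b": 3,
    r"\bdemo day\b": 3,
    r"\bequity\b|\binvestment\b": 1,
    r"\b(education|mentorship|mentors?)\b": 1,
    r"\b(portfolio|class|companies)\b": 1,  # weak: also used by VC firms
}

def accelerator_score(text: str) -> int:
    """Sum the weight of every signal pattern that appears at least once."""
    lowered = text.lower()
    return sum(weight for pattern, weight in SIGNALS.items() if re.search(pattern, lowered))

def looks_like_accelerator(text: str, threshold: int = 4) -> bool:
    # The threshold is an assumption: it requires more than one signal to fire.
    return accelerator_score(text) >= threshold
```

With these weights, a page stating "We're a 3-month accelerator program" (Orange Fab's wording) scores 4 and passes, while a venture capital page mentioning only "portfolio companies" and "investment" scores 2 and does not.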
 
==Comprehensive List of Accelerators==
 
All text files saved in "Accelerators" project on the McNair RPD.
 
*Acc.Info: 190
*SeedDB: 240
*SARP: 59
*Corp: 79
*Total: 568 results
 
After removing duplicates and locations: 363 results
 
This total does not count F6S, which returns 1,170 results, of which only roughly 300 were accelerators. We created a crawler to sift through the webpages and parse the HTML so we could identify the accelerators. The program and HTML are saved on the Desktop.
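
The merge step that reduced 568 combined entries to 363 could be approximated as below. This is a hedged sketch: the page does not say how the deduplication was actually performed, and the normalization rules and the LOCATION_SUFFIXES set are hypothetical, chosen to mirror how seed-db splits groups such as Techstars into regional programs.

```python
import re
from typing import Iterable, List

# Hypothetical regional suffixes (seed-db lists Techstars Austin, Berlin, Boston, Boulder, etc.).
LOCATION_SUFFIXES = {"austin", "berlin", "boston", "boulder", "new york", "nyc", "london"}

def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace, strip a trailing location."""
    key = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    key = re.sub(r"\s+", " ", key).strip()
    for loc in sorted(LOCATION_SUFFIXES, key=len, reverse=True):
        if key.endswith(" " + loc):
            key = key[: -len(loc) - 1]
            break
    return key

def merge_sources(*sources: Iterable[str]) -> List[str]:
    """Combine several name lists, keeping the first spelling seen for each normalized name."""
    seen = {}
    for source in sources:
        for name in source:
            key = normalize(name)
            if key and key not in seen:
                seen[key] = name
    return list(seen.values())
```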
 
==Randomly Chosen Accelerators==
*TLabs
*BetaSpring
*The Unilever Foundry
*AIA Accelerator
*R/GA Accelerator
*Zeroto510
*Hub:raum
*Orange Fab
*Furnace
*Launch Chapel Hill
 
===Determining whether or not these are accelerators===
Googled name of Accelerator and clicked on the first link
 
Looked for Variables which Distinguish Accelerator Websites
*TLabs: Homepage states: "Leading Indian Tech Accelerator"; TLabs is an accelerator, but it is located in India.
*Betaspring: Under the "About Betaspring" tab, it states that "Betaspring was among the first ten startup accelerators to launch worldwide".
*The Unilever Foundry: Does not claim to be an accelerator, nor does it have information on the website about cohorts. This name was pulled from the source Corporate Accelerators.
*AIA Accelerator: The word "accelerator" is included in the name. Under the "Overview" tab, it states that startups have received mentorship.
*R/GA Accelerator: Under the "Overview" tab it states that the "R/GA Accelerator is designed for startups and... it is a three month, immersive, mentorship driven program".
*Zeroto510: Website contains a "Portfolio Companies" tab which divides up the companies into cohorts. This identifies Zeroto510 as an accelerator.
*Hub:raum: Offers accelerator and incubator programs; however, none are located in North America.
*Orange Fab: States on the main page that "We're a 3-month accelerator program".
*Furnace: "About" tab states that Furnace is "an innovative startup accelerator designed to form, incubate, and launch new companies". The program concludes with a Demo Day.
*Launch Chapel Hill: Homepage states that they are "a startup accelerator". Also included on the homepage is a line that states "Applications for Cohort 7 are now open".
 
7/10 are accelerators located in the US.
 
2/10 are accelerators not located in the US.
 
1/10 is not an accelerator.
 
===Steps for Extracting Cohort Information===
*TLabs: Clicked on the "Startup" tab and located a drop down menu entitled "Showing Startups from:". This menu separates startups into Batches ranging from 1-9. These batches are cohorts.
*Betaspring: This website does not have a "Companies" or "Startups" tab. I clicked on their "Who" tab and noticed that within this section were two links called "Our portfolio" and "Our companies" which both linked to the same place. This place contained a list of the startups that Betaspring has funded, as well as links to each of the startup websites. The list was not separated into cohorts.
*The Unilever Foundry: Does not have a "Startups" or "Companies" link on the website.
*AIA Accelerator: Clicked on the "Startups" tab which returned a page with 5 companies and a bit of information on each of these companies. Also included the URL to each startup. However, the companies were not separated into cohorts, probably because there are so few of them.
*R/GA Accelerator: Clicked on the "Alumni" tab and navigated down the webpage. Startups are separated by class, which means cohort in this case. Startup info contains link to demo day presentation as well as the startup url.
*Zeroto510: Hovered over the "About Us" drop down menu and clicked on the "Portfolio Companies" link. Startups are separated by cohort, one for each year, starting from 2013.
*Hub:raum: Clicked on the "Portfolio" tab. Directed to a page with many names of startups, as well as a brief description of what their company is about. Also includes a link to each startup's website. Startups are not separated into cohorts, but rather by investment by location, current participants, and alumni.
*Orange Fab: Clicked on the "Startups" tab and was directed to a different page. Startups are not only separated into cohorts named "Seasons", but they are also separated by industry.
*Furnace: Clicked on the "Portfolio" tab, but unfortunately the website is broken and returned an error.
*Launch Chapel Hill: Clicked on the "Ventures" tab and was directed to a page in which all startups were separated into cohorts, and a brief description of the startup was provided underneath their logo.
 
=Code=
 
The directory for all data related to this project is located in:
E:\McNair\Projects\Accelerators
 
==F6S Web Crawler==
 
This is a Python script using the Selenium library that retrieves the HTML content of each page in F6S's North American accelerator search results. The script is located in:
 
E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs
 
The script is titled f6s_crawler_gentle.py
 
When run, the script visits the F6S search page for North American accelerators and begins retrieving the HTML of each page in that search list.
NOTE: Interactions with the browser must be spaced out in time. F6S uses a CAPTCHA, and the program will fail if the site receives too many requests or has any inkling that it is being probed by a bot.
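
A minimal sketch of the spacing the NOTE describes, assuming a randomized 5-15 second pause between page loads (the bounds are guesses, not values from f6s_crawler_gentle.py). The fetch step is passed in as a function so the loop can be tested without a browser; the page_0000.html naming is hypothetical.

```python
import random
import time
from pathlib import Path
from typing import Callable, Iterable

def polite_pause(min_s: float = 5.0, max_s: float = 15.0,
                 sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for a random interval so requests do not arrive at a fixed rhythm."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay

def crawl(urls: Iterable[str], fetch_html: Callable[[str], str], out_dir: Path,
          min_s: float = 5.0, max_s: float = 15.0,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Fetch each URL, save it as a numbered .html file, and pause between requests."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = 0
    for i, url in enumerate(urls):
        if i:  # no pause before the very first request
            polite_pause(min_s, max_s, sleep)
        html = fetch_html(url)
        (out_dir / f"page_{i:04d}.html").write_text(html, encoding="utf-8")
        saved += 1
    return saved
```

With Selenium, fetch_html would call driver.get(url) and then return driver.page_source.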
 
The Accelerator HTML files are stored in:
E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs\Accelerator_HTML_files
 
The Accelerator HTML files stored as text files are stored in:
E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs\Accelerator_HTML_files_text
 
==F6S Parser==
The next step is to take the HTML files retrieved by the crawler and to parse them for necessary information. This parser should also determine whether or not the site is an accelerator site.
 
The code for the parser is located in
E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs
 
It is titled f6s_parser.py
 
To run the code, open the file in Komodo and press play.
If running from the command line, change to the correct directory and run the following command:
python f6s_parser.py
 
The list of accelerators that passed through the parser is in the same directory:
E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs
 
The tab-delimited text file is named AcceleratorList.
The file contains the names of the accelerators that matched the keywords listed in the file. Where they were listed on the F6S page, it also contains each accelerator's run dates and location.
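
The parsing step could look roughly like the following, using only the standard library's html.parser. This is an illustrative sketch rather than f6s_parser.py: the KEYWORDS tuple is hypothetical, the page title stands in for the accelerator name, and run dates and locations are not extracted because the F6S page markup is not documented here.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path
from typing import Iterable, List

KEYWORDS = ("accelerator", "cohort", "demo day")  # hypothetical keyword list

class TextExtractor(HTMLParser):
    """Collect the <title> and the visible text of a page, skipping <script> and <style>."""
    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks: List[str] = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            self.chunks.append(data)

def parse_page(html: str) -> dict:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return {"name": parser.title.strip(),
            "keywords": ",".join(k for k in KEYWORDS if k in text)}

def write_accelerator_list(pages: Iterable[str], out_path: Path) -> int:
    """Write one tab-delimited row per page that matched at least one keyword."""
    rows = [r for r in map(parse_page, pages) if r["keywords"]]
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["name", "keywords"])
        for r in rows:
            writer.writerow([r["name"], r["keywords"]])
    return len(rows)
```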
 
 
==F6S API==
F6S has an API, but we have had no success getting a key to the API. The link to get a key to the API is on [https://www.f6s.com/developers/apis/deal-feed this page].
 
I (Peter) have emailed F6S to ask for a key directly at support@f6s.com. As of the end of the Fall 2016 Semester, they have not responded.
 
FUN FACT (MASS-RENAME FILES USING WINDOWS POWERSHELL):
 
The following command allowed me to append ".txt" to all files in a folder once in the proper directory:
Get-ChildItem * | Rename-Item -NewName { $_.name + '.txt'}
 
To change file formats, Microsoft suggests:
Get-ChildItem *.txt | Rename-Item -NewName { $_.name -Replace '\.txt', '.log'}
 
==Final Data==
The Parser for parsing the text files of accelerator data is located in:
E:\McNair\Projects\Accelerators\Code+Final_Data
 
The Parser for parsing the cohort files of accelerator data is also located in:
E:\McNair\Projects\Accelerators\Code+Final_Data
 
This folder contains the Python parsers. The Final_data folder contains the tab-delimited text files of parsed data. final_accelerator_data.txt contains the generalized data saved in .txt files and final_cohort_data.txt contains the cohort data saved in .cohort.txt files.
 
All the files entitled accelerator_data are subsets of the final_accelerator_data.txt file, but each file contains only the accelerators that matched the flag specified in the file title.
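
Producing one of the flag-specific subsets could be sketched as a filter over the tab-delimited master file. The column name "flag" and the output name pattern accelerator_data_<flag>.txt are assumptions; the actual layout of final_accelerator_data.txt may differ.

```python
import csv
from pathlib import Path

def write_flag_subset(final_path: Path, flag: str, flag_column: str = "flag") -> Path:
    """Copy the rows whose (hypothetical) flag column equals `flag` into their own file."""
    out_path = final_path.with_name(f"accelerator_data_{flag}.txt")
    with final_path.open(newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src, delimiter="\t")
        rows = [row for row in reader
                if (row.get(flag_column) or "").strip().lower() == flag.lower()]
        fieldnames = reader.fieldnames or []
    with out_path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```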
 
find_headers.py finds a set of the headers for all the cohort files from the seed list project.
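
A sketch of what collecting the header set might involve, assuming each cohort file is tab-delimited with its header on the first line and named with a .cohort.txt suffix (as described for final_cohort_data.txt above). This is not the contents of find_headers.py.

```python
from pathlib import Path
from typing import Set

def find_headers(cohort_dir: Path, pattern: str = "*.cohort.txt") -> Set[str]:
    """Return the union of header fields across all tab-delimited cohort files."""
    headers: Set[str] = set()
    for path in sorted(cohort_dir.glob(pattern)):
        with path.open(encoding="utf-8") as f:
            first = f.readline().rstrip("\r\n")
        if first:
            headers.update(h.strip() for h in first.split("\t") if h.strip())
    return headers
```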
 
==Google SiteSearch==
E:\McNair\Projects\Accelerators\Google_SiteSearch
This folder contains code for a Google search parser. The script sitesearch.py searches for a queried company and returns a likely web address for that company.
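
How sitesearch.py picks the "likely" address is not documented here. One hedged approach is to rank the returned result URLs by how closely the domain name resembles the company name, skipping social and directory sites. The SKIP_DOMAINS list, the 0.6 cutoff, and the function name are hypothetical, and obtaining the search results themselves is not shown.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional
from urllib.parse import urlparse

# Assumed list of sites that are rarely a company's own homepage.
SKIP_DOMAINS = ("linkedin.com", "facebook.com", "twitter.com", "crunchbase.com", "wikipedia.org")

def likely_homepage(company: str, result_urls: Iterable[str], min_ratio: float = 0.6) -> Optional[str]:
    """Pick the result whose leading domain label most resembles the company name."""
    target = re.sub(r"[^a-z0-9]", "", company.lower())
    best_url, best_ratio = None, 0.0
    for url in result_urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if not host or any(host == d or host.endswith("." + d) for d in SKIP_DOMAINS):
            continue
        label = re.sub(r"[^a-z0-9]", "", host.split(".")[0])
        ratio = SequenceMatcher(None, target, label).ratio()
        if ratio > best_ratio:
            best_url, best_ratio = url, ratio
    return best_url if best_ratio >= min_ratio else None
```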
 
==Way Back Machine Parser==
E:\McNair\Projects\Accelerators\Code+Final_Data\wayback_machine.py
This script takes URLs and returns a timestamp for the oldest documented webpage under each URL, courtesy of the Wayback Machine archive.
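
For reference, the Internet Archive's public CDX server can answer the same question: captures for a URL are returned sorted by timestamp, so asking for a single row returns the earliest one. The sketch below uses that API with the standard library; it is not necessarily how wayback_machine.py works, and the function names are hypothetical. Network access is only needed for oldest_snapshot.

```python
import json
import urllib.request
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(site: str) -> str:
    """Build a CDX query asking only for the timestamp of the earliest capture."""
    params = {"url": site, "output": "json", "fl": "timestamp", "limit": "1"}
    return CDX_ENDPOINT + "?" + urlencode(params)

def parse_oldest(cdx_json: str) -> Optional[datetime]:
    """CDX JSON is a header row followed by data rows; convert the first data row."""
    rows = json.loads(cdx_json) if cdx_json.strip() else []
    if len(rows) < 2:
        return None  # no captures for this URL
    return datetime.strptime(rows[1][0], "%Y%m%d%H%M%S")

def oldest_snapshot(site: str, timeout: float = 30.0) -> Optional[datetime]:
    with urllib.request.urlopen(cdx_query_url(site), timeout=timeout) as resp:
        return parse_oldest(resp.read().decode("utf-8"))
```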
 
==Process Locations==
E:\McNair\Projects\Accelerators\Code+Final_Data\process_locations.py
This script takes a physical address and converts it into latitude and longitude coordinates. Should be used in conjunction with the Enclosing Circle program to find the concentration of accelerators.
E:\McNair\Software\CodeBase\EnclosingCircle.py
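
Once addresses have been converted to coordinates, a quick way to gauge concentration before running EnclosingCircle.py is to count how many accelerators fall within a fixed radius of a point. The sketch below uses the haversine great-circle formula; it is a simple radius count, not the enclosing-circle algorithm, and the function names and sample radius are hypothetical.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

Point = Tuple[float, float]  # (latitude, longitude) in degrees

def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (latitude, longitude) points, in kilometers."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def count_within(center: Point, points: Iterable[Point], radius_km: float) -> int:
    """Count the points that lie within radius_km of center."""
    return sum(1 for p in points if haversine_km(center, p) <= radius_km)
```

For example, with a 50 km radius centered on downtown Houston, a point a few kilometers away is counted while one in Austin (more than 200 km away) is not.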
 
=Kauffman Foundation Incubator Proposal Information=
 
==Institutions==
Summary: F6S, Crunchbase, seed-db
 
Tools: Matcher - used to match lists of potential accelerators with our current list to identify duplicates/new matches (E:\McNair\Projects\Accelerators)
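
The matching step could be approximated with the standard library's difflib: normalize names, then accept the closest existing name above a similarity cutoff. This is a sketch, not the Matcher tool itself; the 0.85 cutoff and the normalization rules are assumptions to tune against known duplicates.

```python
import re
from difflib import get_close_matches
from typing import Dict, Iterable, List, Tuple

def _key(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split())

def match_lists(candidates: Iterable[str], current: Iterable[str],
                cutoff: float = 0.85) -> Tuple[Dict[str, str], List[str]]:
    """Split candidates into (duplicates mapped to the current list, new names)."""
    current_by_key = {_key(n): n for n in current}
    keys = list(current_by_key)
    matched: Dict[str, str] = {}
    new: List[str] = []
    for name in candidates:
        hit = get_close_matches(_key(name), keys, n=1, cutoff=cutoff)
        if hit:
            matched[name] = current_by_key[hit[0]]
        else:
            new.append(name)
    return matched, new
```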
 
===F6S===
F6S WebCrawler and F6S Parser - E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs
 
===CrunchBase===
 
CrunchBase 2013 Snapshot '''(All Organizations)'''- E:\McNair\Projects\Accelerators\organizations.xls
 
CrunchBase 2013 Snapshot '''(Potential Accelerators)'''- E:\McNair\Projects\Accelerators\organizations.accdb under "Potential Accelerators query"
 
*Obtained using keyword matches in the descriptions of the potential accelerators.
 
CrunchBase 2013 Snapshot '''(New Verified Accelerators)''' - E:\McNair\Projects\Accelerators\New CrunchBase Accelerators.xls
 
We have the Crunchbase 2013 Snapshot, which provided lots of new data on accelerators and incubators, but we would love to use the Crunchbase API to get a current database snapshot that we could use to cross-reference companies and add newly formed accelerator and incubator companies.
 
===AngelList===
 
===seed-db===
 
Obtained through www.seed-db.com/accelerators
 
===Global Accelerator Network (GAN)===
 
GAN Parser- E:\McNair\Projects\Accelerators\Web Scraping for Accelerators\scrapeaccel.py
 
GAN Data- E:\McNair\Projects\Accelerators\Web Scraping for Accelerators\GAN Accelerator Data
*Contains: Company Name, # of Companies Range, % of Companies Funded, Funding Raised by Companies, Employee Range, Exit Funding, Exit Date, Total Company Funding Raised, # of Mentors Range, % Equity, Location, Minimum Seed Capital Investment
 
==Cohorts==
 
*Cohorts obtained manually
*All Cohort txt files are saved under "E:\McNair\Projects\Accelerators\Data"
*cohort file name = (accelerator name).cohort
*Most updated Accelerator cohort data: E:\McNair\Projects\Accelerators\Cleaned Cohort Data.xls
 
Automation for obtaining cohorts??
 
==Other Information==
Summary: Whois Parser, Geocode, Tools to determine industry, etc
 
===Whois Parser===
 
*Retrieves and parses Whois information. Specifically, takes a file with a column of domain names and populates the corresponding columns with information from the WhoIs API.
 
*Often used to obtain locations.
 
===Geocode===
 
Input: Company Address
Output: Latitude and longitude coordinates
 
*Used to obtain the locations of different Accelerators and Cohort companies.
 
===SDC Platinum Pull===
 
Used to obtain funding information and match companies that have gotten funding with companies that are Accelerator cohorts.
 
===Desired Information/Variables===
 
*Key People (founders, lead entrepreneurs, strategists, etc.)
*Total number of launched companies
*A FAQ for application details, accelerator vision, and
*Funds raised per company (average)
*Features offered by accelerator (perks, space, tools, etc)
 
==Desired Tools/Information==
 
===Automating the Process of Obtaining Cohorts===
*Automating this process would save a lot of time and really progress the project.
 
===Obtaining More Details on Accelerators===
 
*Having the kind of thorough information on industry, companies, funding, location, exits, mentors, leadership, that we got for the GAN companies would be fantastic.
 
===List of Alive/Dead Accelerators===
 
This is a dream but would be very helpful
