Latest revision as of 13:41, 21 September 2020


Project
Incubator Seed Data Coverage
Project Information
Has title Incubator Seed Data Coverage
Has owner Ed Egan
Has start date
Has deadline date
Has project status Active
Subsumed by: Incubator Seed Data, Incubators in Five Ecosystems
Has sponsor Kauffman Incubator Project
Has project output Data
Copyright © 2019 edegan.com. All Rights Reserved.


Overview

The purpose of this project is to test the coverage and accuracy of the Incubator Seed Data using the hand-collected data on Incubators in Five Ecosystems as a benchmark.

Specifically, this project fulfills point 6 of the Expected Outcomes by June 2019 of the Kauffman Incubator Project:

6. The seed data will have at least a 70% baseline accuracy and coverage of incubators compared to results from hand collected data on 5 ecosystems, as measured by the data analysis.

Data

The incubators in the five ecosystems are:

City        | State | Incubator Name
Washington  | DC    | Inclusive Innovation Incubator (In3)
Washington  | DC    | AU Entrepreneurship Incubator
Washington  | DC    | Global Development Incubator
Washington  | DC    | Halcyon Incubator
Washington  | DC    | The Hatchery
Burlington  | VT    | Vermont Center for Emerging Technologies (VCET)
Austin      | TX    | Austin Technology Incubator
Austin      | TX    | IncubatorCTX
Austin      | TX    | Economic Growth Business Incubator
Austin      | TX    | ACC Bioscience Incubator
Austin      | TX    | Bunker Labs
Austin      | TX    | Galvanize
St. Paul    | MN    | University Enterprise Laboratories
Minneapolis | MN    | Discovery Launchpad at UMN
St. Paul    | MN    | Lunar Startups

The datasets to test against, stored as tables in the incubators database and also available as tab-delimited text files, are:

  • Incubators -- 2137 records, combining the records in CIAIncubators and USIncubators
  • CIAIncubators -- 1603 records, combining incubators identified in Crunchbase, INBIA, and AngelList
  • USIncubators -- 707 records, combining state and regional incubator lists found as a part of the US Incubators project
  • Data from the Google Crawler run against the five ecosystems

Process

Load FiveEcosystemIncubators.txt into the incubators database, then run the matcher:

perl Matcher.pl -mode=2 -file1="FiveEcosystemIncubators.txt" -file2="Incubators.txt"

Note that Incubators.txt contains substantial name variation for the same incubator, so standard name-based matching doesn't work. For example:

Inclusive Innovation Incubator	Inclusive Innovation Incubator	DC	in3dc.com	Inclusive Innovation Incubator (In3) - D.C's first co-working, training, & incubator space intentional about diversity & inclusion.	Washington	2301-D Georgia Ave, NW		Crunchbase
Inclusive Innovation Incubator (In3)	Inclusive Innovation Incubator (In3)	DC	www.in3d.com	Inclusive Innovation Incubator (In3) is the District's first community space focused on inclusion innovation and incubation. The incubator is committed to creating a collaborative environment where under-resourced members have access to the space and services needed to build or grow a successful business.	Washington DC			AngelList,USIncubators

Of the 15 incubator names, 11 were in our incubators table (irrespective of location), and of these y had name variation(s).
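To illustrate why exact name matching fails on records like the two above, the sketch below normalizes names before comparing them. The normalize_name helper and its rules (drop parenthetical acronyms, punctuation, and case) are assumptions made for illustration; they are not taken from Matcher.pl.

```python
import re

def normalize_name(name: str) -> str:
    """Hypothetical normalizer: drop parentheticals, punctuation, and case."""
    name = re.sub(r"\([^)]*\)", " ", name)            # drop "(In3)", "(VCET)", ...
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())   # keep letters, digits, spaces
    return " ".join(name.split())                     # collapse repeated whitespace

# The two name variants from the example records above.
a = "Inclusive Innovation Incubator"
b = "Inclusive Innovation Incubator (In3)"

print(a == b)                                   # exact comparison
print(normalize_name(a) == normalize_name(b))   # comparison after normalization
```

A rule set this simple would still miss abbreviations and reworded names, which is one reason a manual review remains practical at this scale.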

Fortunately the count is small, so we can conduct a manual review. For the Google Crawler, we only count a hit if the website of the incubator itself is included in the results, rather than a news article or other information that references the incubator.
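The hit rule for the Google Crawler can be sketched as a host comparison. Everything here is an assumption for illustration: the host_of and is_crawler_hit helpers, the example result URLs, and the choice to compare hosts after stripping a leading "www.". The review in this project was done by hand.

```python
from urllib.parse import urlparse

def host_of(url: str) -> str:
    """Return the lowercase host of a URL, without a leading 'www.'."""
    if "//" not in url:
        url = "//" + url                      # let urlparse treat bare domains as hosts
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def is_crawler_hit(incubator_site: str, result_urls: list[str]) -> bool:
    """Count a hit only if some result is on the incubator's own website."""
    target = host_of(incubator_site)
    return bool(target) and any(host_of(u) == target for u in result_urls)

# Hypothetical crawler results: the incubator's own page versus a news article.
print(is_crawler_hit("in3dc.com", ["https://www.in3dc.com/about"]))
print(is_crawler_hit("in3dc.com", ["https://news.example.com/in3-opens"]))
```

Under this rule a news article that mentions the incubator does not count, because its host differs from the incubator's own. Subdomains such as a blog host would need an extra rule.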

Results

In the table below, 1 indicates the incubator was present in the source, and 0 indicates it was absent. The last column, labelled Any, records whether the incubator was in our complete seed data, which comprises the incubators table and the Google Crawler results.

Name                                            | Location        | Crunchbase | INBIA  | AngelList | US Incubators | Google Crawler | Any
Inclusive Innovation Incubator (In3)            | Washington, DC  | 1          | 0      | 1         | 1             | 1              | 1
AU Entrepreneurship Incubator                   | Washington, DC  | 0          | 0      | 0         | 0             | 0              | 0
Global Development Incubator                    | Washington, DC  | 0          | 0      | 0         | 1             | 0              | 1
Halcyon Incubator                               | Washington, DC  | 0          | 0      | 0         | 1             | 1              | 1
The Hatchery                                    | Washington, DC  | 0          | 0      | 0         | 1             | 0              | 1
Vermont Center for Emerging Technologies (VCET) | Burlington, VT  | 1          | 1      | 0         | 0             | 1              | 1
Austin Technology Incubator                     | Austin, TX      | 1          | 0      | 1         | 0             | 1              | 1
IncubatorCTX                                    | Austin, TX      | 0          | 0      | 0         | 0             | 0              | 0
Economic Growth Business Incubator              | Austin, TX      | 0          | 0      | 0         | 0             | 1              | 1
ACC Bioscience Incubator                        | Austin, TX      | 0          | 0      | 1         | 0             | 1              | 1
Bunker Labs                                     | Austin, TX      | 0          | 0      | 0         | 0             | 1              | 1
Galvanize                                       | Austin, TX      | 0          | 0      | 0         | 0             | 0              | 0
University Enterprise Laboratories              | St. Paul, MN    | 0          | 0      | 0         | 1             | 1              | 1
Discovery Launchpad at UMN                      | Minneapolis, MN | 0          | 0      | 0         | 0             | 0              | 0
Lunar Startups                                  | St. Paul, MN    | 0          | 0      | 0         | 1             | 0              | 1
Total                                           |                 | 3 (20%)    | 1 (7%) | 3 (20%)   | 6 (40%)       | 8 (53%)        | 11 (73%)

Notes:

  • For The Hatchery, the NY location is also in AngelList.
  • For Bunker Labs, the Austin location is missing but the Seattle location is in US Incubators, the Chicago location is in AngelList, and the Google Crawler found an Arlington, VA location too.
  • For Galvanize, the Seattle location is in US Incubators and the San Francisco and New York City locations are both in AngelList.

Overall, 73% of the incubators in the hand-collected data are present in the seed data.
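The totals row and the headline figure can be recomputed from the 0/1 flags. The following is a minimal sketch, not part of the project's tooling: it copies the flags from the results table and compares the Any rate with the 70% target quoted in the Overview. The variable names are ours.

```python
SOURCES = ["Crunchbase", "INBIA", "AngelList", "US Incubators", "Google Crawler"]
TARGET = 0.70  # baseline coverage from the Expected Outcomes

# Presence flags per incubator, in SOURCES order, copied from the results table.
RESULTS = {
    "Inclusive Innovation Incubator (In3)": (1, 0, 1, 1, 1),
    "AU Entrepreneurship Incubator": (0, 0, 0, 0, 0),
    "Global Development Incubator": (0, 0, 0, 1, 0),
    "Halcyon Incubator": (0, 0, 0, 1, 1),
    "The Hatchery": (0, 0, 0, 1, 0),
    "Vermont Center for Emerging Technologies (VCET)": (1, 1, 0, 0, 1),
    "Austin Technology Incubator": (1, 0, 1, 0, 1),
    "IncubatorCTX": (0, 0, 0, 0, 0),
    "Economic Growth Business Incubator": (0, 0, 0, 0, 1),
    "ACC Bioscience Incubator": (0, 0, 1, 0, 1),
    "Bunker Labs": (0, 0, 0, 0, 1),
    "Galvanize": (0, 0, 0, 0, 0),
    "University Enterprise Laboratories": (0, 0, 0, 1, 1),
    "Discovery Launchpad at UMN": (0, 0, 0, 0, 0),
    "Lunar Startups": (0, 0, 0, 1, 0),
}

n = len(RESULTS)

# Column totals: how many of the benchmark incubators each source captured.
totals = {s: sum(flags[i] for flags in RESULTS.values()) for i, s in enumerate(SOURCES)}

# "Any": incubators captured by at least one source.
any_hits = sum(1 for flags in RESULTS.values() if any(flags))
coverage = any_hits / n

for source in SOURCES:
    print(f"{source}: {totals[source]} ({totals[source] / n:.0%})")
print(f"Any: {any_hits} ({coverage:.0%}), target met: {coverage >= TARGET}")
```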

Three of the four absent incubators have a university affiliation: AU Entrepreneurship Incubator is based at American University in DC, IncubatorCTX is based at Concordia University in Northwest Austin, and Discovery Launchpad at UMN is at the University of Minnesota. This suggests that we would need another source of data to capture academic incubators.

The fourth missing incubator, Galvanize, is the branch office of a chain. This chain does not describe itself as an incubator but appears to meet the criteria for one. We could attempt to put together data on incubator chains and their offices separately.

The data also suggest that the different sources capture different incubators. There is little apparent correlation between the sources, and the Google Crawler is the only one to capture more than half of the incubators.
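One way to examine how much the sources overlap is to treat each source as the set of benchmark incubators it captured. The sketch below is an illustration under that framing, not an analysis performed in the project. The short labels (ATI, GDI, EGBI, UEL, and so on) are our shorthand for the incubator names in the results table.

```python
from itertools import combinations

# Benchmark incubators captured by each source, read off the results table.
found = {
    "Crunchbase": {"In3", "VCET", "ATI"},
    "INBIA": {"VCET"},
    "AngelList": {"In3", "ATI", "ACC"},
    "US Incubators": {"In3", "GDI", "Halcyon", "Hatchery", "UEL", "Lunar"},
    "Google Crawler": {"In3", "Halcyon", "VCET", "ATI", "EGBI", "ACC", "Bunker", "UEL"},
}

# Pairwise overlap: number of incubators two sources have in common.
overlap = {(a, b): len(found[a] & found[b]) for a, b in combinations(found, 2)}

# Unique contribution: incubators that only one source captured.
only_in = {
    s: found[s] - set().union(*(found[t] for t in found if t != s))
    for s in found
}

for (a, b), shared in overlap.items():
    print(f"{a} & {b}: {shared} shared")
for source, names in only_in.items():
    print(f"only in {source}: {sorted(names)}")
```

Incubators that appear in only one source are the ones that would be lost if that source were dropped, which is a direct way to read the claim that the sources capture different incubators.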