Objective

Project
US Incubators
Project Information
Has title	US Incubators
Has owner	Yi Ma
Has start date
Has deadline date
Has project status	Active
	Copyright © 2019 edegan.com. All Rights Reserved.

The objective of this project is to assemble a near-population dataset on U.S. incubators. This project uses the Incubator Seed Data.

File Location

E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\YiMaResearch\US Incubators

Notes:

Highlighted rows need to be deleted.
The zip code field is formatted as text.
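
Storing zip codes as text matters because ZIP codes in several northeastern states begin with 0, and a numeric column would drop that digit. The following is a minimal sketch of restoring such values, assuming a code may already have been read as a number. The helper name normalize_zip is hypothetical and is not part of the project's scripts.

```python
def normalize_zip(value):
    """Return a ZIP code as text, restoring leading zeros lost to numeric formatting."""
    text = str(value).strip()
    # Split off an optional ZIP+4 suffix so only the 5-digit part is padded.
    base, sep, suffix = text.partition("-")
    if base.isdigit() and len(base) < 5:
        base = base.zfill(5)
    return base + sep + suffix


if __name__ == "__main__":
    for raw in (2139, "77005", "02139-4307"):
        print(repr(raw), "->", normalize_zip(raw))
```

Values that are already five digits, or that are not purely numeric, pass through unchanged.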

Progress

Extract incubator data from data on national resources

National Data

Source	Progress	How many?	Data	Method
Whartoneclub Incubators	Done	21	url company name city state	regular expression
InterNational Business Incubation Association or see our INBIA page	Done	415	Company name, address, city, state, zip code, country, url, and contact person	regular expression
Clustermapping	Done	292	Company name, description, address 1, address 2, city, state, zip code	regular expression
The MBA Is Dead	Link doesn't work	186 Results	City and Country low equity, high offer, high value high equity, low offer, low value	regular expression
Gaebler	Done	360 Results	incubator name url	regular expression

The Gaebler incubator list is in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\Gaebler\Results.txt. The script that retrieves the results, Gaebler.py, is in the same directory.
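
Gaebler.py itself is not reproduced on this page. As a rough illustration of the "regular expression" method listed in the table, the sketch below pulls name/URL pairs out of anchor tags. The sample markup, the function name extract_links, and the pattern are all assumptions made for illustration; the real source pages may be structured differently.

```python
import re

# Hypothetical sample markup, not copied from any of the listed sources.
SAMPLE_HTML = """
<li><a href="http://www.example.com/alpha">Alpha Incubator</a></li>
<li><a href="http://www.example.org/beta">Beta  Business
Incubator</a></li>
"""

# Capture the href value and the link text of each anchor tag.
LINK_PATTERN = re.compile(
    r'<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def extract_links(html):
    """Return a list of (name, url) tuples found in anchor tags."""
    results = []
    for url, name in LINK_PATTERN.findall(html):
        # Collapse internal whitespace runs (including line breaks) and trim.
        clean_name = re.sub(r"\s+", " ", name).strip()
        results.append((clean_name, url))
    return results


if __name__ == "__main__":
    # Print tab-delimited rows, matching the load format used elsewhere on this page.
    for name, url in extract_links(SAMPLE_HTML):
        print(name + "\t" + url)
```

Regular expressions work for simple, regular listings like this one; an HTML parser is usually more robust for pages with nested or inconsistent markup.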

Extract incubator data from data on regional resources

Source	Progress	How many?	Region	Data	Method
Alabama Business Incubation Network	Done	12	Alabama	Incubator Name, URL, and Brief Description	regular expression
IdeaGist - Alaska	Done	1	Alaska	Company name, URL, City, State	Manual Collection
Florida Business Incubation Association	Done	72	Florida	incubator name, address, city, state, phone number and url	regular expression
Louisiana Business Incubation Association	Done	25	Louisiana	incubator name contact name address and phone number link to website	regular expression
Maryland Business Incubation Association	Done	35	Maryland	Incubator name, short description, and link to another page within the main site, which contains a link to the incubator home page	regular expression
Massachusetts Association of Business Incubators	Done	21	Massachusetts	incubator name, short description, and link to incubator home page	regular expression
Boston Startup Guide	Done	10	Boston	Company Name and URL Capital Provided & equity taken Application Process	regular expression
Michigan Business Innovation Association	Done	15	Michigan	company name, url, address, city, state, zip code	regular expression
NH Tech Alliance	Done	10	New Hampshire	company name, city, url, brief description	regular expression
NC Business Incubation Association	Done	33	North Carolina	Incubator name, address, contact, title, phone number, url and email	Manual Data Collection
Oklahoma Business Incubator Association	Done	34	Oklahoma	Incubator name and link to it	regular expression
Incubators/Accelerators In DC	Done	55*	DC	Incubator name and link to it and brief description	regular expression
High Tech News and Information for South California	Done	34	California	Url, company name, description, city, state	regular expression
Legal Counsel to Entrepreneurs and Emerging Growth Companies	Done	25	Los Angeles	Url, company name, city, state, description	regular expression
IdeaGist - Colorado	Done	8	Colorado	Company name, url, location	Manual collection
IdeaGist - Connecticut	Done	7	Connecticut	Company name, url, location	Manual collection
Delaware Business Times	Done	11	Delaware	URL, company name, address, city, state code, phone number, email, description	Regular expression
Washington State Department of Commerce	Done	25	WA	Url, company name, address, city, state, zipcode	manual collection
Seattle Incubators	Done	10	Seattle	Company name, url, description	regular expression
Digital NYC	Done	25	NYC	Company name, description	regular expression
Idaho Commerce	Done	14	Idaho	URL, company name, city	regular expression
Business Oregon	Done	25	Oregon	Company name, address, city, state, zip code, service area, description	regular expression
Tech.co	Done	16	Arizona	URL, Company name, description	regular expression
Arkansas Inc	Done	3	Arkansas	URL, Company name, description	Regular Expression

Notes:

DC includes both incubators and accelerators
Oregon includes both incubators and accelerators
Arizona includes both incubators and accelerators
Clustermapping contains non-US data; those rows have been highlighted in the spreadsheet

Retrieving Incubators from Crunchbase Database

We pull the relevant fields from the Crunchbase database using incubator uuids chosen by Yi and Libby, following this process:

1) Create a file of uuids of incubators

CrunchbaseShortOrgDescChosenByYi.txt (275)
CrunchbaseShortOrgDescChosenByLibby.txt (301)

File path: Z:\crunchbase3

2) Load the file into the database

-- uuids chosen by Yi
DROP TABLE ChosenShortOrgUUIDs;
CREATE TABLE ChosenShortOrgUUIDs (
 uuid varchar(100)
);
\COPY ChosenShortOrgUUIDs FROM 'CrunchbaseShortOrgDescChosenByYi.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--275

-- uuids chosen by Libby
DROP TABLE ChosenLongOrgUUIDs;
CREATE TABLE ChosenLongOrgUUIDs (
 uuid varchar(100)
);
\COPY ChosenLongOrgUUIDs FROM 'CrunchbaseShortOrgDescChosenByLibby.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--301

3) Run a query that joins uuids with related fields

Fields we are interested in:

company_name, domain, homepage_url, country_code, state_code, region, city, address, status, short_description, category_list, category_group_list, funding_rounds, funding_total_usd, founded_on, employee_count, A.uuid, primary_role, type

4) Resulting files are in:

Z:\crunchbase3

File names: ChosenLongOrgResults.txt, ChosenShortOrgResults.txt
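
The query used in step 3 is not recorded on this page. The sketch below shows the general shape of such a join under stated assumptions: the Crunchbase table name organizations is hypothetical, only a handful of the listed fields are selected, and an in-memory SQLite database stands in for PostgreSQL so that the example runs on its own. The alias A follows the A.uuid entry in the field list.

```python
import sqlite3

# Hypothetical: the source does not name the Crunchbase table being joined.
JOIN_QUERY = """
SELECT A.company_name, A.homepage_url, A.state_code, A.city, A.uuid
FROM organizations AS A
JOIN ChosenShortOrgUUIDs AS B ON A.uuid = B.uuid
ORDER BY A.company_name
"""


def run_demo():
    """Build toy tables, run the join, and return the matching rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE organizations ("
        "uuid TEXT, company_name TEXT, homepage_url TEXT, state_code TEXT, city TEXT)"
    )
    conn.execute("CREATE TABLE ChosenShortOrgUUIDs (uuid TEXT)")
    # Made-up rows for illustration only.
    conn.executemany(
        "INSERT INTO organizations VALUES (?, ?, ?, ?, ?)",
        [
            ("u1", "Alpha Incubator", "http://www.example.com", "TX", "Houston"),
            ("u2", "Not An Incubator", "http://www.example.net", "CA", "Irvine"),
            ("u3", "Beta Labs", "http://www.example.org", "MA", "Boston"),
        ],
    )
    conn.executemany("INSERT INTO ChosenShortOrgUUIDs VALUES (?)", [("u1",), ("u3",)])
    rows = conn.execute(JOIN_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_demo():
        print("\t".join(row))
```

Only organizations whose uuid appears in the chosen list survive the inner join, which is what restricts the output to the hand-picked incubators.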

Useful Regular Expressions

1. Replace "\s+$" with [leave blank] to remove all the empty lines

2. Replace "\s+$" with [leave blank] to remove trailing whitespace

3. "<.*>" finds everything that starts with < and ends with > (the match is greedy, so it runs from the first < to the last > on a line)

4. Replace "href="" with "\n" to start a new line for each url

5. Replace "\s\s+" with [leave blank] to remove runs of two or more whitespace characters

6. Replace "(?-s)^(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R" with "\1\2\3\4\5\6\r\n" to merge every six lines

7. Replace "[ ]{2,}" with [leave blank] to remove runs of two or more spaces between two words

8. Ctrl+Q, B turns on the block select mode

9. Replace " .*" with [leave blank] to remove noncharacters
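
The recipes above are written for a text editor's find-and-replace dialog. For scripted cleanup, several of them can be approximated with Python's re module, as in the sketch below. The function names are hypothetical, input is assumed to use "\n" line endings, and two choices differ deliberately from the editor recipes: tags are matched one at a time with "<[^>]*>" instead of the greedy "<.*>", and runs of spaces are collapsed to a single space instead of being deleted, so adjacent words stay separated.

```python
import re


def remove_empty_lines(text):
    # Drop lines that are empty or contain only spaces and tabs.
    return re.sub(r"^[ \t]*\r?\n", "", text, flags=re.MULTILINE)


def strip_trailing_whitespace(text):
    # Remove spaces and tabs at the end of every line.
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


def strip_tags(text):
    # Remove each <...> tag individually (non-greedy by construction).
    return re.sub(r"<[^>]*>", "", text)


def split_hrefs(text):
    # Start a new line at every href=" so each url begins its own line.
    return text.replace('href="', "\n")


def collapse_spaces(text):
    # Replace runs of two or more spaces with a single space.
    return re.sub(r"[ ]{2,}", " ", text)


def merge_lines(text, n=6):
    # Join every n consecutive non-empty lines into one line.
    pattern = "^" + r"\n".join([r"(.+)"] * n) + r"\n"
    replacement = "".join("\\%d" % i for i in range(1, n + 1)) + "\n"
    return re.sub(pattern, replacement, text, flags=re.MULTILINE)


if __name__ == "__main__":
    sample = '<li><a href="http://www.example.com">Alpha   Incubator</a></li>  \n\n'
    cleaned = remove_empty_lines(strip_trailing_whitespace(sample))
    print(collapse_spaces(strip_tags(cleaned)))
```

Python's re module does not support the \R line-break escape used in recipe 6, which is why merge_lines matches "\n" explicitly.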

Useful PostgreSQL Script

Loading/Unloading Data

Always load and unload data with the PostgreSQL-specific \COPY command shown below. Always load tab-delimited, UTF-8 encoded data with PC or UNIX line endings and a header row. NEVER DEVIATE FROM THIS (unless there is a VERY good reason, such as source data that is huge and comes preformatted differently).

Load using: \COPY tablename FROM 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV

Unload (copy to txt file) using: \COPY tablename TO 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
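
When a scraping script produces the data, it helps to write the file in exactly this shape from the start. The sketch below uses Python's csv module to produce tab-delimited, UTF-8 text with a header row, with missing values written as empty fields so that NULL AS '' picks them up. The function names and the sample rows are hypothetical.

```python
import csv
import io


def to_copy_text(header, rows):
    """Return tab-delimited text with a header row; None becomes an empty field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def write_copy_file(path, header, rows):
    """Write the text as UTF-8; newline="" keeps the line endings as generated."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(to_copy_text(header, rows))


if __name__ == "__main__":
    # Made-up rows for illustration only.
    header = ["company_name", "zip_code"]
    rows = [["Alpha Incubator", "02139"], ["Beta Labs", None]]
    print(to_copy_text(header, rows), end="")
```

The csv writer quotes any field that contains a tab, quote, or line break, which matches what the CSV option of \COPY expects on load.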

Creating Tables

DROP TABLE tablename;
 
CREATE TABLE tablename (
 field1 varchar(100),
 field2 int,
 field3 date,
 field4 real
);
