US Incubators
US Incubators | |
---|---|
Project Information | |
Has title | US Incubators |
Has owner | Yi Ma |
Has start date | |
Has deadline date | |
Has project status | Active |
Copyright © 2019 edegan.com. All Rights Reserved. |
Objective
The objective of this project is to assemble a near-population dataset on U.S. incubators. This project uses the Incubator Seed Data.
File Location
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\YiMaResearch\US Incubators
Notes:
- Highlighted rows need to be deleted
- The zip code field is formatted as text
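The note about the zip code field can be illustrated with a short sketch. It assumes the reason for keeping the field as text is that some U.S. zip codes begin with a zero, which a numeric type drops; the sample rows and column names below are hypothetical and not taken from the project spreadsheet.

```python
import csv
import io

# Hypothetical sample rows; names and zip codes are illustrative only.
sample = "name\tzip\nExample Incubator A\t02139\nExample Incubator B\t94105\n"

def read_rows(text):
    # csv.DictReader returns every field as a string, so nothing is converted.
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

rows = read_rows(sample)

# Kept as text, the leading zero survives.
zips_as_text = [row["zip"] for row in rows]

# Converted to integers, the leading zero is lost.
zips_as_int = [int(row["zip"]) for row in rows]

print(zips_as_text)
print(zips_as_int)
```

Comparing the two printed lists shows the first zip code changing from a five-character string to a four-digit number once it is treated as an integer.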
Progress
Extract incubator data from data on national resources
- National Data
Source | Progress | How many? | Data | Method |
---|---|---|---|---|
Whartoneclub Incubators | Done | 21 | | regular expression |
InterNational Business Incubation Association or see our INBIA page | Done | 415 | | regular expression |
Clustermapping | Done | 292 | | regular expression |
The MBA Is Dead | Link doesn't work | 186 Results | | regular expression |
Gaebler | Done | 360 Results | | regular expression |
- The Gaebler incubator list is in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\Gaebler\Results.txt. The script that retrieves the results is in the same directory and is called Gaebler.py.
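Most of the sources above were processed with regular expressions. The following is a minimal sketch of that kind of extraction, not the contents of Gaebler.py: the HTML snippet, the `LINK` pattern, and the `extract_links` function are all hypothetical, and real listing pages will need patterns adapted to their own markup.

```python
import re

# Hypothetical HTML standing in for a downloaded listing page.
html = '''
<ul>
  <li><a href="http://example.org/alpha">Alpha Incubator</a></li>
  <li><a href="http://example.org/beta">Beta   Business Incubator</a></li>
</ul>
'''

# Capture the URL and the link text of each anchor tag.
LINK = re.compile(r'<a\s+href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)

def extract_links(page):
    """Return (name, url) pairs for every anchor found in the page."""
    results = []
    for url, name in LINK.findall(page):
        # Drop any nested tags, then collapse runs of whitespace to one space.
        clean = re.sub(r"\s+", " ", re.sub(r"<[^>]*>", "", name)).strip()
        results.append((clean, url))
    return results

incubators = extract_links(html)
for name, url in incubators:
    print(name + "\t" + url)
```

Printing name and URL separated by a tab keeps the output in the tab-delimited layout used elsewhere on this page.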
Extract incubator data from data on regional resources
Source | Progress | How many? | Region | Data | Method |
---|---|---|---|---|---|
Alabama Business Incubation Network | Done | 12 | Alabama | Incubator Name, URL, and Brief Description | regular expression |
IdeaGist - Alaska | Done | 1 | Alaska | Company name, URL, City, State | Manual Collection |
Florida Business Incubation Association | Done | 72 | Florida | incubator name, address, city, state, phone number and url | regular expression |
Louisiana Business Incubation Association | Done | 25 | Louisiana | | regular expression |
Maryland Business Incubation Association | Done | 35 | Maryland | Incubator name, short description, and link to another page within the main site which contains a link to the incubator home page | regular expression |
Massachusetts Association of Business Incubators | Done | 21 | Massachusetts | incubator name, short description, and link to incubator home page | regular expression |
Boston Startup Guide | Done | 10 | Boston | | regular expression |
Michigan Business Innovation Association | Done | 15 | Michigan | company name, address, url, city, state, zip code | regular expression |
NH Tech Alliance | Done | 10 | New Hampshire | company name, city, url, brief description | regular expression |
NC Business Incubation Association | Done | 33 | North Carolina | Incubator name, address, contact, title, phone number, url and email | Manual Data Collection |
Oklahoma Business Incubator Association | Done | 34 | Oklahoma | Incubator name and link to it | regular expression |
Incubators/Accelerators In DC | Done | 55* | DC | Incubator name and link to it and brief description | regular expression |
High Tech News and Information for South California | Done | 34 | California | Url, company name, description, city, state | regular expression |
Legal Counsel to Entrepreneurs and Emerging Growth Companies | Done | 25 | Los Angeles | Url, company name, city, state, description | regular expression |
IdeaGist - Colorado | Done | 8 | Colorado | Company name, url, location | Manual collection |
IdeaGist - Connecticut | Done | 7 | Connecticut | Company name, url, location | Manual collection |
Delaware Business Times | Done | 11 | Delaware | URL, company name, address, city, state code, phone number, email, description | Regular expression |
Washington State Department of Commerce | Done | 25 | WA | Url, company name, address, city, state, zipcode | manual collection |
Seattle Incubators | Done | 10 | Seattle | Company name, url, description | regular expression |
Digital NYC | Done | 25 | NYC | Company name, description | regular expression |
Idaho Commerce | Done | 14 | Idaho | URL, company name, city | regular expression |
Business Oregon | Done | 25 | Oregon | Company name, address, city, state, zip code, service area, description | regular expression |
Tech.co | Done | 16 | Arizona | URL, Company name, description | regular expression |
Arkansas Inc | Done | 3 | Arkansas | URL, Company name, description | Regular Expression |
Notes:
- DC includes both incubators and accelerators
- Oregon includes both incubators and accelerators
- Arizona includes both incubators and accelerators
- Clustermapping contains non-US data. The non-US rows have been highlighted in the spreadsheet
Retrieving Incubators from Crunchbase Database
We pull the relevant fields from the Crunchbase database using the incubator uuids chosen by Yi and Libby, following this process:
1) Create a file of uuids of incubators
- CrunchbaseShortOrgDescChosenByYi.txt (275)
- CrunchbaseShortOrgDescChosenByLibby.txt (301)
File path: Z:\crunchbase3
2) Load the file into the database
DROP TABLE ChosenShortOrgUUIDs;
CREATE TABLE ChosenShortOrgUUIDs ( uuid varchar(100) );
\COPY ChosenShortOrgUUIDs FROM 'CrunchbaseShortOrgDescChosenByYi.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--275
DROP TABLE ChosenLongOrgUUIDs;
CREATE TABLE ChosenLongOrgUUIDs ( uuid varchar(100) );
\COPY ChosenLongOrgUUIDs FROM 'CrunchbaseShortOrgDescChosenByLibby.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--301
3) Run a query that joins uuids with related fields
Fields we are interested in:
company_name, domain, homepage_url, country_code, state_code, region, city, address, status, short_description, category_list, category_group_list, funding_rounds, funding_total_usd, founded_on, employee_count, A.uuid, primary_role, type
4) Resulting files are in:
Z:\crunchbase3
File names: ChosenLongOrgResults.txt, ChosenShortOrgResults.txt
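The query for step 3 is not recorded on this page. The sketch below only illustrates the shape of such a join, under several assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, the `organizations` table name is hypothetical, only a few of the fields of interest are included, and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical stand-in for the Crunchbase table holding organization fields.
cur.execute(
    "CREATE TABLE organizations "
    "(uuid TEXT, company_name TEXT, state_code TEXT, primary_role TEXT)"
)
# Same single-column layout as the uuid table created in step 2.
cur.execute("CREATE TABLE ChosenShortOrgUUIDs (uuid TEXT)")

# Made-up rows for illustration only.
cur.executemany(
    "INSERT INTO organizations VALUES (?, ?, ?, ?)",
    [
        ("u1", "Example Incubator", "TX", "company"),
        ("u2", "Unrelated Startup", "CA", "company"),
        ("u3", "Sample Hub", "NY", "investor"),
    ],
)
cur.executemany("INSERT INTO ChosenShortOrgUUIDs VALUES (?)", [("u1",), ("u3",)])

# Keep only organizations whose uuid appears in the chosen list.
query = """
SELECT A.uuid, B.company_name, B.state_code, B.primary_role
FROM ChosenShortOrgUUIDs AS A
JOIN organizations AS B ON A.uuid = B.uuid
ORDER BY A.uuid
"""
results = cur.execute(query).fetchall()
conn.close()

for row in results:
    print("\t".join(row))
```

The alias `A` for the uuid table mirrors the `A.uuid` entry in the field list above; the alias `B` is an arbitrary choice.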
Useful Regular Expressions
1. Replace "\s+$" with [leave blank] to remove all the empty lines
2. Replace "\s+$" with [leave blank] to remove all trailing whitespace
3. <.*> finds everything that starts with < and ends with >
4. Replace href=" with "\n" to start a new line for each url
5. Replace "\s\s+" with [leave blank] to remove runs of two or more whitespace characters
6. Replace "(?-s)^(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R" with "\1\2\3\4\5\6\r\n" to merge every six lines
7. Replace "[ ]{2,}" with [leave blank] to remove runs of two or more spaces between words
8. Ctrl+Q, B turns on the block select mode
9. Replace " .*" with [leave blank] to remove noncharacters
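The patterns above are written for a text editor's find-and-replace dialog. As a rough sketch, some of them can be reproduced with Python's re module; the function names here are hypothetical. Python's re does not support the \R line-break escape used in item 6, so that item is rewritten with line slicing instead of a regular expression.

```python
import re

def remove_trailing_whitespace(text):
    # Items 1 and 2: delete whitespace that runs up to a line end.
    # Because \s also matches newlines, empty lines disappear as well.
    return re.sub(r"\s+$", "", text, flags=re.MULTILINE)

def remove_space_runs(text, replacement=""):
    # Item 7: delete runs of two or more spaces.
    # Pass replacement=" " to keep neighbouring words separated.
    return re.sub(r"[ ]{2,}", replacement, text)

def split_on_href(text):
    # Item 4: start a new line at each href=" so every URL begins a line.
    return text.replace('href="', "\n")

def merge_every_n_lines(text, n=6):
    # Item 6: join each full group of n lines into one line.
    # An incomplete trailing group is left unmerged.
    lines = text.splitlines()
    merged = []
    for i in range(0, len(lines), n):
        group = lines[i:i + n]
        if len(group) == n:
            merged.append("".join(group))
        else:
            merged.extend(group)
    return "\n".join(merged)

print(repr(remove_trailing_whitespace("a  \n\n\nb \n")))
print(repr(remove_space_runs("a   b c")))
print(repr(merge_every_n_lines("1\n2\n3\n4\n5\n6\n7")))
```

Note that replacing a run of spaces with nothing, as items 5 and 7 describe, joins the surrounding words together; the optional `replacement` argument is an addition for cases where a single space should remain.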
Useful PostgreSQL Script
Loading/Unloading Data
Always load/unload data using the PostgreSQL-specific copy function below. Always load tab-delimited data that is UTF-8 encoded, with PC or UNIX line endings, and that has a header row. NEVER DEVIATE FROM THIS (unless there is a VERY good reason, such as the source data being huge and coming preformatted differently).
Load using: \COPY tablename FROM 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
Unload (copy to txt file) using: \COPY tablename TO 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
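Files destined for the load command above need to match the stated format. The following sketch shows one way to write and read such a file with Python's csv module; the function names, file name, and sample rows are hypothetical. Empty fields are written as empty strings, which corresponds to the NULL AS '' setting in the copy commands.

```python
import csv
import os
import tempfile

def write_tab_delimited(path, header, rows):
    # UTF-8, tab-delimited, header row first, UNIX line endings.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)

def read_tab_delimited(path):
    # Return the header row and the data rows separately.
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        return header, list(reader)

# Hypothetical file name and rows, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "incubators.txt")
write_tab_delimited(
    path,
    ["name", "city", "zip"],
    [["Example Incubator", "Boston", "02139"], ["Sample Hub", "", "94105"]],
)
header, rows = read_tab_delimited(path)
print(header)
print(rows)
```

Opening the file with newline="" leaves line-ending handling to the csv module, so the writer's lineterminator setting is what ends up in the file.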
Creating Tables
DROP TABLE tablename; CREATE TABLE tablename ( field1 varchar(100), field2 int, field3 date, field4 real );