Latest revision as of 13:43, 21 September 2020
US Incubators | |
---|---|
Project Information | |
Has title | US Incubators |
Has owner | Yi Ma |
Has start date | |
Has deadline date | |
Has project status | Active |
Has sponsor | Kauffman Incubator Project |
Has project output | Data, How-to |
Copyright © 2019 edegan.com. All Rights Reserved. |
Objective
The objective of this project is to assemble a near-population dataset on U.S. incubators! This project uses the Incubator Seed Data.
File Location
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\YiMaResearch\US Incubators
Notes:
- Highlighted rows in the spreadsheet need to be deleted
- The zip code field is formatted as text
Progress
Extract incubator data from national resources
- National Data
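Source | Progress | How many? | Data | Method |
---|---|---|---|---|
Whartoneclub Incubators (https://www.whartoneclub.com/resources/entrepreneurship/incubators/) | Done | 21 | URL, company name, city, state | regular expression |
International Business Incubation Association (http://exchange.inbia.org/network/findacompany/); see also our INBIA page | Done | 415 | Company name, address, city, state, zip code, country, URL, contact person | regular expression |
Clustermapping (https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations) | Done | 292 | Company name, description, address 1, address 2, city, state, zip code | regular expression |
The MBA Is Dead (https://thembaisdead.com/list-of-startup-accelerators-and-incubators/) | Link doesn't work | 186 results | | regular expression |
Gaebler (http://www.gaebler.com/Business-Incubator-Lists-By-State.htm) | Done | 360 results | Incubator name, URL | regular expression |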
- The Gaebler incubator list is in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\Gaebler\Results.txt; the script that retrieves the results, Gaebler.py, is in the same directory.
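Gaebler.py itself is not reproduced on this page. As a rough illustration of the "regular expression" method used for most of the sources above, the sketch below pulls incubator names and URLs out of anchor tags and writes them as tab-delimited lines. It assumes each listing entry is an <a href="...">Name</a> link with an absolute URL; the sample HTML is invented, and every real page needs its own pattern.

```python
import re

# Assumed page structure: each incubator is an <a href="URL">Name</a> link.
# The non-greedy .*? keeps each match inside a single <a>...</a> pair.
LINK_RE = re.compile(r'<a\s+href="(?P<url>https?://[^"]+)"[^>]*>(?P<name>.*?)</a>',
                     re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")

def extract_links(html):
    """Return (name, url) pairs for every absolute link in the HTML."""
    results = []
    for m in LINK_RE.finditer(html):
        # Drop nested tags such as <b> and collapse whitespace in the link text
        name = " ".join(TAG_RE.sub("", m.group("name")).split())
        if name:
            results.append((name, m.group("url")))
    return results

def to_tsv(pairs):
    """Format (name, url) pairs as tab-delimited lines with a header row."""
    lines = ["orgname\turl"]
    lines += [f"{name}\t{url}" for name, url in pairs]
    return "\n".join(lines) + "\n"

sample = """
<li><a href="http://example.org/alpha">Alpha <b>Incubator</b></a> - Boston</li>
<li><a href="https://example.org/beta" target="_blank">Beta Labs</a></li>
<li><a href="/about">About us</a></li>
"""
print(to_tsv(extract_links(sample)))
```

Relative links such as /about are skipped by design, because the pattern only accepts URLs that start with http:// or https://.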
Extract incubator data from regional resources
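Source | Progress | How many? | Region | Data | Method |
---|---|---|---|---|---|
Alabama Business Incubation Network | Done | 12 | Alabama | Incubator name, URL, and brief description | regular expression |
IdeaGist - Alaska (https://ideagist.com/list-of-startup-accelerators-and-incubators-in-alaska-usa/) | Done | 1 | Alaska | Company name, URL, city, state | manual collection |
Florida Business Incubation Association (http://www.fbiaonline.org/Incubators/incubators.htm) | Done | 72 | Florida | Incubator name, address, city, state, phone number, URL | regular expression |
Boston Startup Guide (https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/) | Done | 10 | Boston | Company name and URL; capital provided and equity taken; application process | regular expression |
NC Business Incubation Association (http://www.ncincubation.org/NCIncubators.aspx) | Done | 33 | North Carolina | Incubator name, address, contact, title, phone number, URL, email | manual collection |
Oklahoma Business Incubator Association (https://www.okbia.org/our-members) | Done | 34 | Oklahoma | Incubator name and link | regular expression |
High Tech News and Information for South California (https://www.socaltech.com/incubate.php) | Done | 34 | California | URL, company name, description, city, state | regular expression |
Legal Counsel to Entrepreneurs and Emerging Growth Companies (http://barberacorporatelaw.com/blog/2014/4/8/28-business-incubators-in-the-los-angeles-area) | Done | 25 | Los Angeles | URL, company name, city, state, description | regular expression |
IdeaGist - Colorado (https://ideagist.com/list-of-startup-accelerators-and-incubators-in-colorado-usa/) | Done | 8 | Colorado | Company name, URL, location | manual collection |
IdeaGist - Connecticut (https://ideagist.com/list-of-startup-accelerators-and-incubators-in-connecticut-usa/) | Done | 7 | Connecticut | Company name, URL, location | manual collection |
Delaware Business Times (https://www.delawarebusinesstimes.com/coworking-incubators/) | Done | 11 | DE | URL, company name, address, city, state code, phone number, email, description | regular expression |
Incubators/Accelerators in DC (https://dmped.dc.gov/page/incubators-accelerators-and-co-working-spaces) | Done | 55* | DC | Incubator name, link, and brief description | regular expression |
Florida Business Incubation Association (http://www.fbiaonline.org/Incubators/incubators.htm) | Done | 72 | FL | Company name, address, city, state code, zip code, phone number, URL | regular expression |
Georgia Department of Economic Development (https://www.georgia.org/business-incubators) | Done | 12 | GA | URL, company name, address, city, state code, zip code, phone number, contact person | regular expression |
IdeaGist - Hawaii (https://ideagist.com/list-of-startup-accelerators-and-incubators-in-hawaii-usa/); Business Utility Zone Gateway (http://www.buzgate.org/8.0/hi/fh_incubators.html?cb=none) | Done | 3 | HI | Company name, URL, city, state code | manual collection |
Idaho Commerce (https://commerce.idaho.gov/business-climate/entrepreneurial-culture/) | Done | 14 | ID | URL, company name, city | regular expression |
IdeaGist - Illinois (https://ideagist.com/list-of-startup-accelerators-and-incubators-in-illinois-usa/) | Done | 18 | IL | Company name, URL, city, state code | manual collection |
IdeaGist - Indiana (https://ideagist.com/list-of-startup-accelerators-and-incubators-in-indiana-usa/) | Done | 3 | IN | Company name, URL, city, state code | manual collection |
IA SourceLink (https://www.iasourcelink.com/resources/business-incubators-accelerators-and-coworks); IdeaGist - Iowa (https://ideagist.com/list-of-startup-accelerators-and-incubators-in-iowa-usa/) | Done | 10 | IA | Company name, city, state code, URL, description, email, contact person, phone number | manual collection |
IdeaGist - Kansas (https://ideagist.com/list-of-startup-accelerators-and-incubators-in-kansas-usa/); InnovateKansas (https://innovatekansas.org/innovation-resources/) | Done | 3 | KS | Company name, URL, city, state code | manual collection |
ThinkKentucky (https://www.thinkkentucky.com/Entrepreneurship/Accelerators_Incubators.aspx) | Done | 10 | KY | Company name, address, city, state code, zip code, URL, description | regular expression |
Louisiana Business Incubation Association (https://www.louisianaincubation.org/current-members) | Done | 25 | LA | Company name, contact person, title, address, city, state code, zip code, phone number, email, URL | regular expression |
Innovators Guide - Maine (http://www.innovatorsguide.org/incubators/maine_business_incubators.htm) | Done | 4 | ME | URL, company name, city, state code | regular expression |
Maryland Business Incubation Association (http://incubatemaryland.org/incubators/) | Done | 35 | MD | URL, company name, state code, description | regular expression |
Massachusetts Association of Business Incubators (https://www.massincubators.org/) | Done | 21 | MA | Company name, URL, state code, description | regular expression |
Michigan Business Innovation Association (https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA) | Done | 15 | MI | Company name, URL, address, city, state code, zip code | regular expression |
MinneInno (https://www.americaninno.com/minne/guides-minne/everything-you-need-to-know-about-minnesotas-startup-incubators-and-accelerators/); IdeaGist - Minnesota (https://ideagist.com/list-of-accelerators-and-incubators-in-minnesota/) | Done | 7 | MN | URL, company name, city, state code, description | regular expression |
Mississippi Development Authority (https://www.mississippi.org/home-page/business-services/entrepreneurs-small-business/mississippi-business-incubators/) | Done | 26 | MS | Company name, address, city, state code, zip code, phone number, contact person, title, email, URL | regular expression |
SourceLink - MO (https://www.mosourcelink.com/guides/innovation-led/wet-labs-and-technology-incubators); IdeaGist - Missouri (https://ideagist.com/list-of-startup-accelerators-and-incubators-in-missouri-usa/) | Done | 23 | MO | Company name, URL, city, state code, description | regular expression |
IdeaGist - Montana (https://ideagist.com/list-of-startup-accelerators-and-incubators-in-montana-usa/) | Done | 2 | MT | Company name, URL, city, state code | manual collection |
Omaha World Herald (https://www.omaha.com/special_sections/outlook/accelerators-incubators-help-boost-startups/article_97005ca0-1949-57e6-87bc-21fc4df9adba.html) | Done | 9 | NE | Company name, address, city, state code, phone number, URL, description | regular expression |
Department of Business and Industry - Nevada (http://business.nv.gov/Resource_Center/Coworking/_Incubator/_Accelerator_Spaces/) | Done | 25 | NV | URL, company name, state code, description | regular expression |
NH Tech Alliance (https://livefreeandstart.com/resources/incubators-makerspaces/) | Done | 10 | NH | Company name, city, state code, URL, description | regular expression |
Rutgers University Libraries (https://libguides.rutgers.edu/c.php?g=336746&p=2267151) | Done | 15 | NJ | URL, company name, city, state code | regular expression |
New Mexico Economic Development (https://gonm.biz/business-development/start/small-business-incubators-accelerators) | Done | 8 | NM | Company name, state code, URL, description | regular expression |
FuzeHub (https://fuzehub.com/fuzehub-blog/fuzehub-report-new-york-state-incubators-and-innovation-hot-spots/) | Done | 33 | NY | URL, company name, type, manufacturing sector, services, description | regular expression |
North Carolina Business Incubation Association (http://www.ncincubation.org/NCIncubators.aspx) | Done | 33 | NC | Company name, address, city, state code, contact person, title, phone number, URL, email | manual collection |
Economic Development and Finance of North Dakota (https://www.business.nd.gov/aviation/TechnologyParksandIncubators/) | Done | 2 | ND | Company name, URL, address, city, state code, description | manual collection |
IdeaGist - Ohio (https://ideagist.com/list-of-startup-accelerators-and-incubators-in-ohio-usa/) | Done | 7 | OH | Company name, URL, city, state code | manual collection |
Washington State Department of Commerce (http://startup.choosewashingtonstate.com/resources/work-spaces/) | Done | 25 | WA | URL, company name, address, city, state, zip code | manual collection |
Seattle Incubators (https://www.newtechnorthwest.com/resource-guide/incubators/) | Done | 10 | Seattle | Company name, URL, description | regular expression |
Digital NYC (https://www.digital.nyc/incubators/search?keywords=&field_incubator_type_value=0) | Done | 25 | NYC | Company name, description | regular expression |
Business Oregon (https://www.oregon4biz.com/Innovate-&-Create/R&D-Business/Incubators/) | Done | 25 | Oregon | Company name, address, city, state, zip code, service area, description | regular expression |
Tech.co (https://tech.co/news/arizona-incubators-accelerator-listed-2017-08) | Done | 16 | Arizona | URL, company name, description | regular expression |
Arkansas Inc (https://www.arkansasedc.com/business-resources/entrepreneurial-resources) | Done | 3 | Arkansas | URL, company name, description | regular expression |
CreativePortland (https://www.creativeportland.com/resources/incubators-business-counseling) | Done | 11 | Portland | Company name, URL, description | regular expression |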
Notes:
- DC includes both incubators and accelerators
- Oregon includes both incubators and accelerators
- Arizona includes both incubators and accelerators
- Nevada includes incubators, accelerators and coworking spaces
- Clustermapping contains non-US records; these have been highlighted in the spreadsheet
DC, Oregon, Arizona and Nevada were all reprocessed in US Incubators - Reviewed By Ed.xlsx, and files named statename-incubators.txt were created for import. For these four states, organizations were excluded by default and included only if they self-identified as an incubator in some fashion (name, description, etc.). Note that the US incubators data contains social impact incubators (though it does not appear to contain virtual ones, as it is location based).
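The exclude-by-default rule for these four states can be pictured as a keyword filter. This is a minimal sketch, not the procedure used in US Incubators - Reviewed By Ed.xlsx (that review may have applied judgment a keyword match cannot capture). It assumes records are dictionaries with orgname and description keys, borrowed from the USIncubators field names, and treats any case-insensitive match on the stem "incubat" as self-identification. The records are invented.

```python
import re

# Assumed self-identification rule: the stem "incubat" covers
# incubator, incubators, incubate and incubation, in any letter case.
SELF_ID_RE = re.compile(r"incubat", re.IGNORECASE)

def self_identifies(record, fields=("orgname", "description")):
    """True if any of the given text fields mentions incubation."""
    return any(SELF_ID_RE.search(record.get(f) or "") for f in fields)

def filter_incubators(records):
    """Exclude by default; keep only records that self-identify as incubators."""
    return [r for r in records if self_identifies(r)]

records = [
    {"orgname": "Desert Tech Incubator", "description": None},
    {"orgname": "Downtown Coworking", "description": "Shared desks and meeting rooms"},
    {"orgname": "Launch Hub", "description": "A business incubation program for food startups"},
    {"orgname": "Seed Accelerator", "description": "Three-month cohort program"},
]
kept = filter_incubators(records)
print([r["orgname"] for r in kept])
```

Missing descriptions (None) are treated as empty text, so a record can still qualify on its name alone.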
The SQL script LoadTables.sql, in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\YiMaResearch\US Incubators, loads all of the tables in the incubators database and does the basic manipulation of the data. The same code is also in Incubators.sql in E:\projects\Kauffman Incubator Project; that file holds the most recently updated version.
Assembling the state data (which also covers cities such as Los Angeles, Boston, and NYC) produces the USIncubators table and the corresponding USIncubators.txt text file. It has 707 records; the number of records with a value in each field is:
- orgname: 707
- statecode: 707
- url: 609
- description: 343
- city: 524
- address: 252
- zip: 235
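The coverage figures above are counts of non-empty values per field. A sketch of how such counts could be reproduced from USIncubators.txt, assuming the file follows the tab-delimited, header-row convention described under Loading/Unloading Data. The short inline sample stands in for the real file and is invented.

```python
import csv
import io

def field_coverage(stream, fields):
    """Count total rows and rows with a non-empty value for each field."""
    reader = csv.DictReader(stream, delimiter="\t")
    counts = {f: 0 for f in fields}
    total = 0
    for row in reader:
        total += 1
        for f in fields:
            # Empty or whitespace-only cells do not count as coverage
            if (row.get(f) or "").strip():
                counts[f] += 1
    return total, counts

sample = io.StringIO(
    "orgname\tstatecode\turl\tcity\n"
    "Alpha Incubator\tMA\thttp://example.org/alpha\tBoston\n"
    "Beta Labs\tNV\t\t\n"
)
total, counts = field_coverage(sample, ["orgname", "statecode", "url", "city"])
print(total, counts)
```

For the real file, pass open("USIncubators.txt", encoding="utf-8", newline="") instead of the sample. Inside PostgreSQL the same numbers come from COUNT(column), which skips NULLs; because the load uses NULL AS '', empty fields arrive as NULL.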
Retrieving Incubators from the Crunchbase Database
We pull the relevant fields from the Crunchbase database for the incubator UUIDs chosen by Yi and Libby, using the following process:
1) Create a file of incubator UUIDs:
- CrunchbaseShortOrgDescChosenByYi.txt (275)
- CrunchbaseShortOrgDescChosenByLibby.txt (301)
File path: Z:\crunchbase3
2) Load each file into the database:
DROP TABLE IF EXISTS ChosenShortOrgUUIDs;
CREATE TABLE ChosenShortOrgUUIDs (
    uuid varchar(100)
);
\COPY ChosenShortOrgUUIDs FROM 'CrunchbaseShortOrgDescChosenByYi.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
-- 275 rows

DROP TABLE IF EXISTS ChosenLongOrgUUIDs;
CREATE TABLE ChosenLongOrgUUIDs (
    uuid varchar(100)
);
\COPY ChosenLongOrgUUIDs FROM 'CrunchbaseShortOrgDescChosenByLibby.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
-- 301 rows
3) Run a query that joins uuids with related fields
Fields we are interested in:
company_name, domain, homepage_url, country_code, state_code, region, city, address, status, short_description, category_list, category_group_list, funding_rounds, funding_total_usd, founded_on, employee_count, A.uuid, primary_role, type
4) Resulting files are in:
Z:\crunchbase3
File names: ChosenLongOrgResults.txt, ChosenShortOrgResults.txt
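Step 3 does not spell out the query. As a sketch only, it can be pictured as an inner join between a loaded UUID table and an organizations table on uuid. The table name organizations and the subset of columns are assumptions (the real Crunchbase schema may spread these fields over several tables). The example uses Python's built-in sqlite3 so it runs without a PostgreSQL server, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical stand-in for the Crunchbase organizations table (subset of columns)
cur.execute("""CREATE TABLE organizations (
    uuid varchar(100), company_name text, homepage_url text,
    country_code text, state_code text, city text, short_description text)""")
cur.execute("CREATE TABLE ChosenShortOrgUUIDs (uuid varchar(100))")

cur.executemany("INSERT INTO organizations VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("u1", "Alpha Incubator", "http://example.org/alpha", "USA", "MA", "Boston", "Startup incubator"),
    ("u2", "Gamma Ventures", "http://example.org/gamma", "USA", "CA", "San Jose", "Venture fund"),
    ("u3", "Beta Labs", "http://example.org/beta", "USA", "NV", "Reno", "Incubator space"),
])
cur.executemany("INSERT INTO ChosenShortOrgUUIDs VALUES (?)", [("u1",), ("u3",)])

# Inner join: only organizations whose uuid was chosen survive.
# A is the UUID table, matching the A.uuid in the field list above.
rows = cur.execute("""
    SELECT B.company_name, B.homepage_url, B.state_code, B.city, A.uuid
    FROM ChosenShortOrgUUIDs AS A
    JOIN organizations AS B ON B.uuid = A.uuid
    ORDER BY B.company_name
""").fetchall()
for row in rows:
    print(row)
conn.close()
```

The same SELECT should run unchanged in PostgreSQL against the loaded tables, after which the results can be unloaded with \COPY (query) TO 'file.txt' using the options from the Loading/Unloading Data section.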
Useful Regular Exes
1. Replace "\s+$" with [leave blank] to remove empty lines and trailing whitespace
2. Replace "\s+$" with [leave blank] to remove all trailing whitespace
3. <.*> finds everything that starts with < and ends with >. Note that .* is greedy: on a line with several tags it matches from the first < to the last >; use <[^>]*> to match one tag at a time
4. Replace 'href="' with "\n" to start a new line for each URL
5. Replace "\s\s+" with [leave blank] to remove runs of two or more whitespace characters
6. Replace "(?-s)^(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R" with "\1\2\3\4\5\6\r\n" to merge every six lines into one
7. Replace "[ ]{2,}" with [leave blank] to remove runs of two or more spaces between words
8. Ctrl+Q, B turns on block select mode
9. Replace " .*" with [leave blank] to delete everything from the first space to the end of each line
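Several of these replacements can be tried in Python's re module before running them in the editor. This sketch contrasts the greedy <.*> from item 3 with the non-greedy <.*?>, and re-creates the merge-every-six-lines replacement from item 6. Python's re has no \R, so the sketch assumes Windows (CRLF) or Unix (LF) line endings and matches each line as [^\r\n]+ followed by \r?\n. The sample text is invented.

```python
import re

html_line = '<td><a href="http://example.org">Alpha</a></td>'

# Item 3: the greedy .* runs from the first "<" to the last ">" on the line
greedy = re.findall(r"<.*>", html_line)
# A non-greedy .*? stops at the first ">", giving one match per tag
lazy = re.findall(r"<.*?>", html_line)

# Item 6: merge every six lines into one. [^\r\n]+ keeps a stray \r out of the
# captured text, and \r?\n accepts both CRLF and LF line breaks.
six_lines = re.compile(r"(?m)^" + r"([^\r\n]+)\r?\n" * 6)
text = "a\nb\nc\nd\ne\nf\ng\nh\ni\nj\nk\nl\n"
merged = six_lines.sub(r"\1\2\3\4\5\6\r\n", text)

print(greedy)
print(lazy)
print(repr(merged))
```

Any lines left over at the end of the file (fewer than six) do not match and are left as they are.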
Useful PostgreSQL Script
Loading/Unloading Data
Always load and unload data using the psql \COPY command shown below (it reads and writes files on the client machine). Always load tab-delimited, UTF-8 encoded data with PC (Windows) or UNIX line endings and a header row. NEVER DEVIATE FROM THIS (unless there is a VERY good reason, like the source data is huge and comes preformatted differently).
Load using: \COPY tablename FROM 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
Unload (copy to txt file) using: \COPY tablename TO 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
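To produce files that follow this convention from Python, the standard csv module can be configured to match: tab delimiter, header row, and empty fields for missing values, which NULL AS '' loads as NULL. A minimal sketch; the column names and rows are made up.

```python
import csv
import io

def write_copy_file(stream, header, rows):
    """Write rows in the layout the \\COPY command above expects:
    tab-delimited, a header row first, and empty fields for missing values."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # csv.writer writes None as an empty field, which NULL AS '' reads back as NULL.
        # Fields containing tabs, quotes or newlines are wrapped in double quotes,
        # which PostgreSQL's CSV mode understands.
        writer.writerow(row)

buf = io.StringIO()
write_copy_file(buf, ["orgname", "statecode", "zip"],
                [["Example Incubator", "MA", "02139"], ["Another Hub", "NV", None]])
print(buf.getvalue())
```

For a real file, pass open("filename.txt", "w", encoding="utf-8", newline="") as the stream; newline="" is what the csv documentation recommends so the module controls line endings itself. Writing the zip code as a string keeps its leading zero.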
Creating Tables
DROP TABLE IF EXISTS tablename;
CREATE TABLE tablename (
    field1 varchar(100),
    field2 int,
    field3 date,
    field4 real
);