Difference between revisions of "US Incubators"

From edegan.com
 
(111 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
{{Project
 
{{Project
 +
|Has project output=Data,How-to
 +
|Has sponsor=Kauffman Incubator Project
 
|Has title=US Incubators
 
|Has title=US Incubators
 
|Has owner=Yi Ma,
 
|Has owner=Yi Ma,
Line 8: Line 10:
  
 
The objective of this project is to assemble a near-population dataset on U.S. incubators! This project uses the [[Incubator Seed Data]].
 
The objective of this project is to assemble a near-population dataset on U.S. incubators! This project uses the [[Incubator Seed Data]].
 +
 +
=File Location=
 +
 +
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\YiMaResearch\US Incubators
 +
 +
Notes:
 +
* Highlighted rows need to be deleted
 +
* The format of zip code field is text
  
 
=Progress=
 
=Progress=
  
==1.Extract incubator data from the following sources==
+
==Extract incubator data from data on national resources==
  
 
*National Data
 
*National Data
Line 20: Line 30:
 
! How many?
 
! How many?
 
! Data
 
! Data
 +
! Method
 
|-
 
|-
 
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]
 
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]
Line 25: Line 36:
 
| 21
 
| 21
 
|  
 
|  
* URL, Company Name, City, State
+
* url
 +
* company name
 +
* city
 +
* state
 +
| regular expression
 
|-
 
|-
 
| [http://exchange.inbia.org/network/findacompany/ InterNational Business Incubation Association] or see our [[INBIA]] page
 
| [http://exchange.inbia.org/network/findacompany/ InterNational Business Incubation Association] or see our [[INBIA]] page
Line 31: Line 46:
 
| 415
 
| 415
 
|  
 
|  
* Company Name and address
+
* Company Name, address, city, state, zip code, country, url and contact person
 +
| regular expression
 
|-
 
|-
 
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]
 
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]
| Not Done
+
| Done
 
| 292
 
| 292
 
|  
 
|  
* Company name with link to a separate page within cluster mapping
+
* Company name, description, address 1, address 2, city, state, zip code
* on that page there is a link to the incubator's website
+
| regular expression
 
|-
 
|-
 
| [https://thembaisdead.com/list-of-startup-accelerators-and-incubators/ The MBA Is Dead ]
 
| [https://thembaisdead.com/list-of-startup-accelerators-and-incubators/ The MBA Is Dead ]
| Not Done
+
| Link doesn't work
 
| 186 Results
 
| 186 Results
 
|
 
|
Line 47: Line 63:
 
* low equity, high offer, high value
 
* low equity, high offer, high value
 
* high equity, low offer, low value
 
* high equity, low offer, low value
 +
| regular expression
 +
|-
 +
| [http://www.gaebler.com/Business-Incubator-Lists-By-State.htm Gaebler]
 +
| Done
 +
| 360 Results
 +
|
 +
* incubator name
 +
* url
 +
| regular expression
 
|}
 
|}
  
*Regional Data
+
:* The Gaebler incubator list is in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\Gaebler\Results.txt. The script that retrieves the results, Gaebler.py, is in the same directory.
 +
 
 +
==Extract incubator data from data on regional resources==
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
 
! Source
 
! Source
! Directions
+
! Progress
 
! How many?
 
! How many?
 
! Region
 
! Region
 
! Data
 
! Data
! Benefits
+
! Method
! Limitations
 
  
 
|-
 
|-
 
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]
 
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]
| Opened source link and counted incubators listed on the home page
+
| Done
 
| 12
 
| 12
 
| Alabama
 
| Alabama
| Incubator Name, Brief Description, and a link to the home page
+
| Incubator Name, URL, and Brief Description
| Reliable links that are filtered to include only incubators
+
| regular expression
| only contains information on incubators in Alabama that are associated with NBIA
+
|-
 +
| [https://ideagist.com/list-of-startup-accelerators-and-incubators-in-alaska-usa/ IdeaGist - Alaska]
 +
| Done
 +
| 1
 +
| Alaska
 +
| Company name, URL, City, State
 +
| Manual Collection
 
|-
 
|-
 
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]
 
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]
| Opened source link and then opened links for each of the four regions in Florida
+
| Done
| 66
+
| 72
 
| Florida
 
| Florida
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page
+
| incubator name, address, city, state, phone number and url
| Provides reliable links. Filtered to include only information on incubators
+
| regular expression
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.
 
|-
 
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]
 
| Opened source link. Copied the data on incubators into a text editor and search for how many times the word "E-Mail" appeared
 
| 28
 
| Louisiana
 
|
 
* incubator name
 
* contact name
 
* address and phone number
 
* link to website
 
| data is filtered to include only incubators, links are reliable
 
| only incubators in state of Louisiana, limited data set
 
|-
 
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]
 
| Opened source link and counted number of incubators listed on the page
 
| 35
 
| Maryland
 
| Main site contains incubator name, short description, and link to another page within main site with contains a link to the incubator home page
 
| Reliable links, filtered to include only incubators
 
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.
 
|-
 
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]
 
| Open source link and count number of incubators listed on the page
 
| 20
 
| Massachusetts
 
| incubator name, short description, and link to incubator home page
 
| reliable links, only data on incubators
 
| limited dataset
 
 
|-
 
|-
 
|[https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/ Boston Startup Guide]
 
|[https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/ Boston Startup Guide]
| Scrolled down to the section labeled "Startup incubators in Boston"
+
| Done
 
| 10
 
| 10
 
| Boston
 
| Boston
Line 113: Line 116:
 
* Capital Provided & equity taken
 
* Capital Provided & equity taken
 
* Application Process
 
* Application Process
| reliable links
+
| regular expression
| relatively unformatted data that would be challenging to use. Limited in scope
 
|-
 
| [https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA Michigan Business Innovation Association]
 
| Open source link and count number of incubators listed in the column next to the map
 
| 15
 
| Michigan
 
| incubator name, address, link to location on map, and link to incubator home page
 
| reliable links, only data on incubators
 
| limited dataset
 
 
|-  
 
|-  
| [https://livefreeandstart.com/resources/incubators-makerspaces/ NH Tech Alliance ]
 
| Open source link and count organizations listed under "NHBIN Member Locations"
 
| 8
 
| New Hampshire
 
| incubator name, town within NH, brief description, and link to home page
 
| reliable links only data on incubators
 
| limited dataset, not very structured organization on website
 
|-
 
 
| [http://www.ncincubation.org/NCIncubators.aspx NC Business Incubation Association]
 
| [http://www.ncincubation.org/NCIncubators.aspx NC Business Incubation Association]
| Open source link, click on each county and count the number of business incubators
+
| Done
| 32
+
| 33
 
| North Carolina
 
| North Carolina
| Incubator name, address, program directors, and link
+
| Incubator name, address, contact, title, phone number, url and email
| only data on incubators
+
| Manual Data Collection
| limited dataset, hard to navigate site with web crawler, some of the incubators do not have links
 
 
|-
 
|-
 
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]
 
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]
| Open source link and count the number of incubators
+
| Done
| 29
+
| 34
 
| Oklahoma
 
| Oklahoma
 
| Incubator name and link to it
 
| Incubator name and link to it
| reliable links, only data on incubators
+
| regular expression
| limited dataset
+
|-
 +
| [https://www.socaltech.com/incubate.php High Tech News and Information for Southern California]
 +
| Done
 +
| 34
 +
| California
 +
| Url, company name, description, city, state
 +
| regular expression
 +
|-
 +
| [http://barberacorporatelaw.com/blog/2014/4/8/28-business-incubators-in-the-los-angeles-area Legal Counsel to Entrepreneurs and Emerging Growth Companies]
 +
| Done
 +
| 25
 +
| Los Angeles
 +
| Url, company name, city, state, description
 +
| regular expression
 +
|-
 +
| [https://ideagist.com/list-of-startup-accelerators-and-incubators-in-colorado-usa/ IdeaGist - Colorado]
 +
| Done
 +
| 8
 +
| Colorado
 +
| Company name, url, location
 +
| Manual collection
 +
|-
 +
| [https://ideagist.com/list-of-startup-accelerators-and-incubators-in-connecticut-usa/ IdeaGist - Connecticut]
 +
| Done
 +
| 7
 +
| Connecticut
 +
| Company name, url, location
 +
| Manual collection
 +
|-
 +
| [https://www.delawarebusinesstimes.com/coworking-incubators/ Delaware Business Times]
 +
| Done
 +
| 11
 +
| DE
 +
| URL, company name, address, city, state code, phone number, email, description
 +
| Regular expression
 
|-
 
|-
 
| [https://dmped.dc.gov/page/incubators-accelerators-and-co-working-spaces Incubators/Accelerators In DC]
 
| [https://dmped.dc.gov/page/incubators-accelerators-and-co-working-spaces Incubators/Accelerators In DC]
| Open source link and count the number of incubators, I did not include co-working spaces
+
| Done
| 15
+
| 55*
 
| DC
 
| DC
 
| Incubator name and link to it and brief description
 
| Incubator name and link to it and brief description
| reliable links, helpful description
+
| regular expression
| limited dataset, mix of incubators and other organizations
+
|-
 +
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]
 +
| Done
 +
| 72
 +
| FL
 +
| Company name, address, city, state code, zip code, phone number, URL
 +
| regular expression
 +
|-
 +
| [https://www.georgia.org/business-incubators Georgia Department of Economic Development]
 +
| Done
 +
| 12
 +
| GA
 +
| URL, company name, address, city, state code, zip code, phone number, contact person
 +
| regular expression
 +
|-
 +
| [https://ideagist.com/list-of-startup-accelerators-and-incubators-in-hawaii-usa/ IdeaGist - Hawaii] [http://www.buzgate.org/8.0/hi/fh_incubators.html?cb=none Business Utility Zone Gateway]
 +
| Done
 +
| 3
 +
| HI
 +
| Company name, url, city, state code
 +
| manual collection
 +
|-
 +
| [https://commerce.idaho.gov/business-climate/entrepreneurial-culture/ Idaho Commerce]
 +
| Done
 +
| 14
 +
| ID
 +
| URL, company name, city
 +
| regular expression
 +
|-
 +
| [https://ideagist.com/list-of-startup-accelerators-and-incubators-in-illinois-usa/ IdeaGist - Illinois]
 +
| Done
 +
| 18
 +
| IL
 +
| Company name, url, city, state code
 +
| manual collection
 +
|-
 +
| [https://ideagist.com/list-of-startup-accelerators-and-incubators-in-indiana-usa/ IdeaGist - Indiana]
 +
| Done
 +
| 3
 +
| IN
 +
| Company name, url, city, state code
 +
| manual collection
 +
|-
 +
| [https://www.iasourcelink.com/resources/business-incubators-accelerators-and-coworks IA SourceLink] [https://ideagist.com/list-of-startup-accelerators-and-incubators-in-iowa-usa/ IdeaGist - Iowa]
 +
| Done
 +
| 10
 +
| IA
 +
| Company name, city, state code, url, description, email, contact person, phone number
 +
| manual collection
 +
|-
 +
| [https://ideagist.com/list-of-startup-accelerators-and-incubators-in-kansas-usa/ IdeaGist - Kansas] [https://innovatekansas.org/innovation-resources/ InnovateKansas]
 +
| Done
 +
| 3
 +
| KS
 +
| Company name, url, city, state code
 +
| manual collection
 +
|-
 +
| [https://www.thinkkentucky.com/Entrepreneurship/Accelerators_Incubators.aspx ThinkKentucky]
 +
| Done
 +
| 10
 +
| KY
 +
| Company name, address, city, state code, zip code, url, description
 +
| regular expression
 +
|-
 +
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]
 +
| Done
 +
| 25
 +
| LA
 +
| Company name, contact person, title, address, city, state code, zip code, phone number, email, url
 +
| regular expression
 +
|-
 +
| [http://www.innovatorsguide.org/incubators/maine_business_incubators.htm Innovators Guide - Maine]
 +
| Done
 +
| 4
 +
| ME
 +
| URL, company name, city, state code
 +
| regular expression
 +
|-
 +
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]
 +
| Done
 +
| 35
 +
| MD
 +
| URL, company name, state code, description
 +
| regular expression
 +
|-
 +
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]
 +
| Done
 +
| 21
 +
| MA
 +
| Company Name, URL, State Code, Description
 +
| regular expression
 +
|-
 +
| [https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA Michigan Business Innovation Association]
 +
| Done
 +
| 15
 +
| MI
 +
| Company Name, URL, Address, City, State Code, Zip Code
 +
| regular expression
 +
|-
 +
| [https://www.americaninno.com/minne/guides-minne/everything-you-need-to-know-about-minnesotas-startup-incubators-and-accelerators/ MinneInno] [https://ideagist.com/list-of-accelerators-and-incubators-in-minnesota/ IdeaGist - Minnesota]
 +
| Done
 +
| 7
 +
| MN
 +
| URL, Company Name, City, State Code, Description
 +
| Regular Exes
 +
|-
 +
| [https://www.mississippi.org/home-page/business-services/entrepreneurs-small-business/mississippi-business-incubators/ Mississippi Development Authority]
 +
| Done
 +
| 26
 +
| MS
 +
| Company Name, Address, City, State Code, Zip Code, Phone Number, Contact Person, Title, Email, URL
 +
| Regular Exes
 +
|-
 +
|-
 +
| [https://www.mosourcelink.com/guides/innovation-led/wet-labs-and-technology-incubators SourceLink - MO] [https://ideagist.com/list-of-startup-accelerators-and-incubators-in-missouri-usa/ IdeaGist - Missouri]
 +
| Done
 +
| 23
 +
| MO
 +
| Company Name, URL, City, State Code, Description
 +
| Regular Exes
 +
|-
 +
| [https://ideagist.com/list-of-startup-accelerators-and-incubators-in-montana-usa/ IdeaGist - Montana]
 +
| Done
 +
| 2
 +
| MT
 +
| Company Name, URL, City, State Code
 +
| Manual
 +
|-
 +
| [https://www.omaha.com/special_sections/outlook/accelerators-incubators-help-boost-startups/article_97005ca0-1949-57e6-87bc-21fc4df9adba.html Omaha World Herald]
 +
| Done
 +
| 9
 +
| NE
 +
| Company Name, Address, City, State Code, Phone Number, URL, Description
 +
| Regular Exes
 +
|-
 +
| [http://business.nv.gov/Resource_Center/Coworking/_Incubator/_Accelerator_Spaces/ Department of Business and Industry - Nevada]
 +
| Done
 +
| 25
 +
| NV
 +
| URL, Company Name, State Code, Description
 +
| Regular Exes
 +
|-
 +
| [https://livefreeandstart.com/resources/incubators-makerspaces/ NH Tech Alliance ]
 +
| Done
 +
| 10
 +
| NH
 +
| Company Name, City, State Code, URL, Description
 +
| Regular Exes
 +
|-
 +
| [https://libguides.rutgers.edu/c.php?g=336746&p=2267151 Rutgers University Libraries]
 +
| Done
 +
| 15
 +
| NJ
 +
| URL, Company Name, City, State Code
 +
| Regular Exes
 +
|-
 +
| [https://gonm.biz/business-development/start/small-business-incubators-accelerators New Mexico Economic Development]
 +
| Done
 +
| 8
 +
| NM
 +
| Company Name, State Code, URL, Description
 +
| Regular Exes
 +
|-
 +
| [https://fuzehub.com/fuzehub-blog/fuzehub-report-new-york-state-incubators-and-innovation-hot-spots/ FuzeHub]
 +
| Done
 +
| 33
 +
| NY
 +
| URL, Company Name, Type, Manufacturing Sector, Services, Description
 +
| Regular Exes
 +
|-
 +
| [http://www.ncincubation.org/NCIncubators.aspx North Carolina Business Incubation Association]
 +
| Done
 +
| 33
 +
| NC
 +
| Company Name, Address, City, State Code, Contact Person, Title, Phone Number, URL, email
 +
| Manual
 +
|-
 +
| [https://www.business.nd.gov/aviation/TechnologyParksandIncubators/ Economic Development and Finance of North Dakota]
 +
| Done
 +
| 2
 +
| ND
 +
| Company Name, URL, Address, City, State Code, Description
 +
| Manual
 +
|-
 +
| [https://ideagist.com/list-of-startup-accelerators-and-incubators-in-ohio-usa/ IdeaGist - Ohio]
 +
| Done
 +
| 7
 +
| OH
 +
| Company Name, URL, City, State Code
 +
| Manual
 +
|-
 +
| [http://startup.choosewashingtonstate.com/resources/work-spaces/ Washington State Department of Commerce]
 +
| Done
 +
| 25
 +
| WA
 +
| Url, company name, address, city, state, zipcode
 +
| manual collection
 +
|-
 +
| [https://www.newtechnorthwest.com/resource-guide/incubators/ Seattle Incubators]
 +
| Done
 +
| 10
 +
| Seattle
 +
| Company name, url, description
 +
| regular expression
 +
|-
 +
| [https://www.digital.nyc/incubators/search?keywords=&field_incubator_type_value=0 Digital NYC]
 +
| Done
 +
| 25
 +
| NYC
 +
| Company name, description
 +
| regular expression
 +
|-
 +
| [https://www.oregon4biz.com/Innovate-&-Create/R&D-Business/Incubators/ Business Oregon]
 +
| Done
 +
| 25
 +
| Oregon
 +
| Company name, address, city, state, zip code, service area, description
 +
| regular expression
 +
|-
 +
| [https://tech.co/news/arizona-incubators-accelerator-listed-2017-08 Tech.co]
 +
| Done
 +
| 16
 +
| Arizona
 +
| URL, Company name, description
 +
| regular expression
 +
|-
 +
| [https://www.arkansasedc.com/business-resources/entrepreneurial-resources Arkansas Inc]
 +
| Done
 +
| 3
 +
| Arkansas
 +
| URL, Company name, description
 +
| regular expression
 +
|-
 +
| [https://www.creativeportland.com/resources/incubators-business-counseling CreativePortland]
 +
| Done
 +
| 11
 +
| Portland
 +
| Company name, url, description
 +
| regular expression
 +
|}
 +
 
 +
Notes:
 +
*DC includes both incubators and accelerators
 +
*Oregon includes both incubators and accelerators
 +
*Arizona includes both incubators and accelerators
 +
*Nevada includes incubators, accelerators and coworking space
 +
*Clustermapping contains non-US data. They have been highlighted in the spreadsheet
 +
 
 +
DC, Oregon, Arizona and Nevada were all reprocessed in '''US Incubators - Reviewed By Ed.xlsx''' and files named statename-incubators.txt were created for import. For these 4 states, organizations were excluded by default and only included if they self-identified as an incubator in some fashion (name, description, etc.). Note that the US incubators data contains social impact incubators (though it does not appear to contain virtual ones as it is location based).
 +
 
 +
The SQL script LoadTables.sql, in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\YiMaResearch\US Incubators, loads all of the tables in the '''incubators''' dbase and does the basic manipulation of the data. This code is also in Incubators.sql in E:\projects\Kauffman Incubator Project -- and this file's version was the last one updated.
 +
 
 +
The resulting assembly of the state data, which also includes cities like Los Angeles, Boston, NYC, and others, results in the '''USIncubators''' table and corresponding '''USIncubators.txt''' text file, which has 707 records and the following fields with the following coverage:
 +
*orgname --707
 +
*statecode --707
 +
*url --609
 +
*description --343
 +
*city --524
 +
*address --252
 +
*zip --235
 +
 
 +
==Retrieving Incubators from Crunchbase Database==
 +
 
 +
We are pulling out relevant fields from crunchbase database using incubator uuids chosen by Yi and Libby following the process:
 +
 
 +
1) Create a file of uuids of incubators
 +
 
 +
*CrunchbaseShortOrgDescChosenByYi.txt (275)
 +
*CrunchbaseShortOrgDescChosenByLibby.txt (301)
 +
 
 +
File path: Z:\crunchbase3
 +
 
 +
2) Load the file into the database
 +
DROP TABLE ChosenShortOrgUUIDs;
 +
CREATE TABLE ChosenShortOrgUUIDs (
 +
  uuid varchar(100)
 +
);
 +
\COPY ChosenShortOrgUUIDs FROM 'CrunchbaseShortOrgDescChosenByYi.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
 +
--275
 +
 
 +
DROP TABLE ChosenLongOrgUUIDs;
 +
CREATE TABLE ChosenLongOrgUUIDs (
 +
  uuid varchar(100)
 +
);
 +
 \COPY ChosenLongOrgUUIDs FROM 'CrunchbaseShortOrgDescChosenByLibby.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
 +
--301
 +
 
 +
3) Run a query that joins uuids with related fields
 +
 
 +
Fields we are interested in:
 +
company_name, domain, homepage_url, country_code, state_code, region, city, address, status, short_description, category_list, category_group_list, funding_rounds, funding_total_usd, founded_on, employee_count, A.uuid, primary_role, type
 +
 
 +
4) Resulting files are in:
 +
Z:\crunchbase3
 +
File names: ChosenLongOrgResults.txt, ChosenShortOrgResults.txt
 +
 
 +
=Useful Regular Exes=
 +
 
 +
1. Replace “\s+$” with [leave blank] to remove all the empty lines
 +
 
 +
2. Replace "\s+$" with [leave blank] to remove all trailing whitespace
 +
 
 +
3. <.*> finds everything that starts with < and ends with >
 +
 
 +
4. Replace href=" with "\n" to start a new line for each url
 +
 
 +
5. Replace "\s\s+" with [leave blank] to remove more than one white spaces
 +
 
 +
6. Replace "(?-s)^(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R" with "\1\2\3\4\5\6\r\n" to merge every six lines
 +
 
 +
7. Replace "[ ]{2,}" with [leave blank] removes more than one spaces between two words
 +
 
 +
8. Ctrl+Q, B turns on the block select mode
 +
 
 +
9. Replace " .*" with [leave blank] to remove noncharacters
 +
 
 +
=Useful PostgreSQL Script=
 +
 
 +
==Loading/Unloading Data==
 +
 
 +
Always load/unload data using the PostgreSQL specific copy function below. Always load tab-delimited data that is UTF-8 encoded, with PC or UNIX line endings, and that has a header row. NEVER DEVIATE FROM THIS (unless there is a VERY good reason, like the source data is huge and comes preformatted differently).
  
|}
+
Load using:
 +
<nowiki>\COPY tablename FROM 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV </nowiki>
 +
 
 +
Unload (copy to txt file) using:
 +
<nowiki>\COPY tablename TO 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV</nowiki>
 +
 
 +
==Creating Tables==
 +
 
 +
DROP TABLE tablename;
 +
 
 +
CREATE TABLE tablename (
 +
  field1 varchar(100),
 +
  field2 int,
 +
  field3 date,
 +
  field4 real
 +
);

Latest revision as of 12:43, 21 September 2020


Project
US Incubators
Project Information
Has title US Incubators
Has owner Yi Ma
Has start date
Has deadline date
Has project status Active
Has sponsor Kauffman Incubator Project
Has project output Data, How-to
Copyright © 2019 edegan.com. All Rights Reserved.


Objective

The objective of this project is to assemble a near-population dataset on U.S. incubators. This project uses the Incubator Seed Data.

File Location

E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\YiMaResearch\US Incubators

Notes:

  • Highlighted rows need to be deleted
  • The zip code field is formatted as text

Progress

Extract incubator data from data on national resources

  • National Data
{| class="wikitable"
! Source !! Progress !! How many? !! Data !! Method
|-
| Whartoneclub Incubators || Done || 21 || url, company name, city, state || regular expression
|-
| InterNational Business Incubation Association (or see our INBIA page) || Done || 415 || Company name, address, city, state, zip code, country, url and contact person || regular expression
|-
| Clustermapping || Done || 292 || Company name, description, address 1, address 2, city, state, zip code || regular expression
|-
| The MBA Is Dead || Link doesn't work || 186 Results || City and Country; low equity, high offer, high value; high equity, low offer, low value || regular expression
|-
| Gaebler || Done || 360 Results || incubator name, url || regular expression
|}
  • The Gaebler incubator list is in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\Gaebler\Results.txt. The script that retrieves the results, Gaebler.py, is in the same directory.

Extract incubator data from data on regional resources

{| class="wikitable"
! Source !! Progress !! How many? !! Region !! Data !! Method
|-
| Alabama Business Incubation Network || Done || 12 || Alabama || Incubator name, URL, and brief description || regular expression
|-
| IdeaGist - Alaska || Done || 1 || Alaska || Company name, URL, city, state || manual collection
|-
| Florida Business Incubation Association || Done || 72 || Florida || Incubator name, address, city, state, phone number and url || regular expression
|-
| Boston Startup Guide || Done || 10 || Boston || Company Name and URL; Capital Provided & equity taken; Application Process || regular expression
|-
| NC Business Incubation Association || Done || 33 || North Carolina || Incubator name, address, contact, title, phone number, url and email || manual collection
|-
| Oklahoma Business Incubator Association || Done || 34 || Oklahoma || Incubator name and link to it || regular expression
|-
| High Tech News and Information for Southern California || Done || 34 || California || Url, company name, description, city, state || regular expression
|-
| Legal Counsel to Entrepreneurs and Emerging Growth Companies || Done || 25 || Los Angeles || Url, company name, city, state, description || regular expression
|-
| IdeaGist - Colorado || Done || 8 || Colorado || Company name, url, location || manual collection
|-
| IdeaGist - Connecticut || Done || 7 || Connecticut || Company name, url, location || manual collection
|-
| Delaware Business Times || Done || 11 || DE || URL, company name, address, city, state code, phone number, email, description || regular expression
|-
| Incubators/Accelerators In DC || Done || 55* || DC || Incubator name and link to it and brief description || regular expression
|-
| Florida Business Incubation Association || Done || 72 || FL || Company name, address, city, state code, zip code, phone number, URL || regular expression
|-
| Georgia Department of Economic Development || Done || 12 || GA || URL, company name, address, city, state code, zip code, phone number, contact person || regular expression
|-
| IdeaGist - Hawaii; Business Utility Zone Gateway || Done || 3 || HI || Company name, url, city, state code || manual collection
|-
| Idaho Commerce || Done || 14 || ID || URL, company name, city || regular expression
|-
| IdeaGist - Illinois || Done || 18 || IL || Company name, url, city, state code || manual collection
|-
| IdeaGist - Indiana || Done || 3 || IN || Company name, url, city, state code || manual collection
|-
| IA SourceLink; IdeaGist - Iowa || Done || 10 || IA || Company name, city, state code, url, description, email, contact person, phone number || manual collection
|-
| IdeaGist - Kansas; InnovateKansas || Done || 3 || KS || Company name, url, city, state code || manual collection
|-
| ThinkKentucky || Done || 10 || KY || Company name, address, city, state code, zip code, url, description || regular expression
|-
| Louisiana Business Incubation Association || Done || 25 || LA || Company name, contact person, title, address, city, state code, zip code, phone number, email, url || regular expression
|-
| Innovators Guide - Maine || Done || 4 || ME || URL, company name, city, state code || regular expression
|-
| Maryland Business Incubation Association || Done || 35 || MD || URL, company name, state code, description || regular expression
|-
| Massachusetts Association of Business Incubators || Done || 21 || MA || Company Name, URL, State Code, Description || regular expression
|-
| Michigan Business Innovation Association || Done || 15 || MI || Company Name, URL, Address, City, State Code, Zip Code || regular expression
|-
| MinneInno; IdeaGist - Minnesota || Done || 7 || MN || URL, Company Name, City, State Code, Description || regular expression
|-
| Mississippi Development Authority || Done || 26 || MS || Company Name, Address, City, State Code, Zip Code, Phone Number, Contact Person, Title, Email, URL || regular expression
|-
| SourceLink - MO; IdeaGist - Missouri || Done || 23 || MO || Company Name, URL, City, State Code, Description || regular expression
|-
| IdeaGist - Montana || Done || 2 || MT || Company Name, URL, City, State Code || manual collection
|-
| Omaha World Herald || Done || 9 || NE || Company Name, Address, City, State Code, Phone Number, URL, Description || regular expression
|-
| Department of Business and Industry - Nevada || Done || 25 || NV || URL, Company Name, State Code, Description || regular expression
|-
| NH Tech Alliance || Done || 10 || NH || Company Name, City, State Code, URL, Description || regular expression
|-
| Rutgers University Libraries || Done || 15 || NJ || URL, Company Name, City, State Code || regular expression
|-
| New Mexico Economic Development || Done || 8 || NM || Company Name, State Code, URL, Description || regular expression
|-
| FuzeHub || Done || 33 || NY || URL, Company Name, Type, Manufacturing Sector, Services, Description || regular expression
|-
| North Carolina Business Incubation Association || Done || 33 || NC || Company Name, Address, City, State Code, Contact Person, Title, Phone Number, URL, email || manual collection
|-
| Economic Development and Finance of North Dakota || Done || 2 || ND || Company Name, URL, Address, City, State Code, Description || manual collection
|-
| IdeaGist - Ohio || Done || 7 || OH || Company Name, URL, City, State Code || manual collection
|-
| Washington State Department of Commerce || Done || 25 || WA || Url, company name, address, city, state, zip code || manual collection
|-
| Seattle Incubators || Done || 10 || Seattle || Company name, url, description || regular expression
|-
| Digital NYC || Done || 25 || NYC || Company name, description || regular expression
|-
| Business Oregon || Done || 25 || Oregon || Company name, address, city, state, zip code, service area, description || regular expression
|-
| Tech.co || Done || 16 || Arizona || URL, Company name, description || regular expression
|-
| Arkansas Inc || Done || 3 || Arkansas || URL, Company name, description || regular expression
|-
| CreativePortland || Done || 11 || Portland || Company name, url, description || regular expression
|}

Notes:

  • DC includes both incubators and accelerators
  • Oregon includes both incubators and accelerators
  • Arizona includes both incubators and accelerators
  • Nevada includes incubators, accelerators and coworking spaces
  • Clustermapping contains non-US data. These records have been highlighted in the spreadsheet

DC, Oregon, Arizona and Nevada were all reprocessed in US Incubators - Reviewed By Ed.xlsx, and files named statename-incubators.txt were created for import. For these 4 states, organizations were excluded by default and only included if they self-identified as an incubator in some fashion (name, description, etc.). Note that the US incubators data contains social impact incubators (though it does not appear to contain virtual ones, as it is location based).
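
The exclude-by-default rule for these 4 states could be sketched in Python as below. This is only an illustration, not the procedure used in the spreadsheet review: the keyword test (the stem "incubat" in the name or description), the function names, and the sample rows are all assumptions.

```python
import re

# Assumption: an organization "self-identifies" as an incubator when the
# stem "incubat" appears in its name or description (case-insensitive).
INCUBATOR_PATTERN = re.compile(r"incubat", re.IGNORECASE)


def self_identifies_as_incubator(name, description=""):
    """Return True if the name or description mentions an incubator."""
    text = f"{name or ''} {description or ''}"
    return bool(INCUBATOR_PATTERN.search(text))


def filter_incubators(records):
    """Exclude every record by default; keep only those that self-identify."""
    return [
        r for r in records
        if self_identifies_as_incubator(r.get("orgname"), r.get("description"))
    ]


# Hypothetical sample rows, not taken from the project data.
sample = [
    {"orgname": "Example Tech Incubator", "description": ""},
    {"orgname": "Example Cowork", "description": "Shared desks and meeting rooms"},
    {"orgname": "Example Labs", "description": "A business incubation program"},
]
kept = filter_incubators(sample)
kept_names = [r["orgname"] for r in kept]
```

A keyword test like this only approximates a manual review, so borderline organizations would still need to be checked by hand.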

The SQL script LoadTables.sql, in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\YiMaResearch\US Incubators, loads all of the tables in the incubators database and does the basic manipulation of the data. This code is also in Incubators.sql in E:\projects\Kauffman Incubator Project, and that file's version was the last one updated.

Assembling the state data, which also includes cities like Los Angeles, Boston, NYC, and others, produces the USIncubators table and the corresponding USIncubators.txt text file, which has 707 records. The fields, with the number of records populated for each, are:

  • orgname --707
  • statecode --707
  • url --609
  • description --343
  • city --524
  • address --252
  • zip --235
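
Coverage counts like these can be recomputed from the tab-delimited export. The sketch below assumes a file with a header row, as the loading convention on this page requires; the function name and the sample data are hypothetical.

```python
import csv
import io


def field_coverage(tsv_file):
    """Count non-empty values per column in a tab-delimited file with a header row."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    fieldnames = reader.fieldnames or []
    counts = {field: 0 for field in fieldnames}
    total = 0
    for row in reader:
        total += 1
        for field in fieldnames:
            # Short rows yield None for missing fields; treat them as empty.
            if (row[field] or "").strip():
                counts[field] += 1
    return total, counts


# Hypothetical two-record sample standing in for USIncubators.txt.
sample = io.StringIO(
    "orgname\tstatecode\turl\n"
    "Example Incubator A\tTX\thttp://a.example\n"
    "Example Incubator B\tCA\t\n"
)
total, counts = field_coverage(sample)
```

To run it on the real file, open it with `open("USIncubators.txt", encoding="utf-8", newline="")` and pass the file object to `field_coverage`.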

Retrieving Incubators from Crunchbase Database

We pull the relevant fields from the Crunchbase database using incubator UUIDs chosen by Yi and Libby, following this process:

1) Create a file of uuids of incubators

  • CrunchbaseShortOrgDescChosenByYi.txt (275)
  • CrunchbaseShortOrgDescChosenByLibby.txt (301)
File path: Z:\crunchbase3

2) Load the file into the database

DROP TABLE ChosenShortOrgUUIDs;
CREATE TABLE ChosenShortOrgUUIDs (
 uuid varchar(100)
);
\COPY ChosenShortOrgUUIDs FROM 'CrunchbaseShortOrgDescChosenByYi.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--275
DROP TABLE ChosenLongOrgUUIDs;
CREATE TABLE ChosenLongOrgUUIDs (
 uuid varchar(100)
);
\COPY ChosenLongOrgUUIDs FROM 'CrunchbaseShortOrgDescChosenByLibby.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--301

3) Run a query that joins uuids with related fields

Fields we are interested in:

company_name, domain, homepage_url, country_code, state_code, region, city, address, status, short_description, category_list, category_group_list, funding_rounds, funding_total_usd, founded_on, employee_count, A.uuid, primary_role, type

4) Resulting files are in:

Z:\crunchbase3

File names: ChosenLongOrgResults.txt, ChosenShortOrgResults.txt
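
Step 3 above does not show the query itself. The sketch below illustrates the kind of join it describes, using Python's built-in sqlite3 module as a stand-in for the PostgreSQL database so that it runs anywhere. The table name `organizations`, its columns, and the sample rows are assumptions; only the `ChosenShortOrgUUIDs` table and the `A` alias (from `A.uuid` in the field list) come from this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the Crunchbase organizations table.
    CREATE TABLE organizations (
        uuid TEXT, company_name TEXT, homepage_url TEXT, state_code TEXT
    );
    CREATE TABLE ChosenShortOrgUUIDs (uuid TEXT);
""")
conn.executemany(
    "INSERT INTO organizations VALUES (?, ?, ?, ?)",
    [
        ("u1", "Example Incubator", "http://a.example", "TX"),
        ("u2", "Example Startup", "http://b.example", "CA"),
    ],
)
conn.execute("INSERT INTO ChosenShortOrgUUIDs VALUES ('u1')")

# Keep only the organizations whose uuid was chosen, with the fields of interest.
rows = conn.execute("""
    SELECT A.uuid, A.company_name, A.homepage_url, A.state_code
    FROM organizations AS A
    JOIN ChosenShortOrgUUIDs AS B ON A.uuid = B.uuid
    ORDER BY A.uuid
""").fetchall()
conn.close()
```

The same SELECT, extended with the remaining fields, should work in PostgreSQL once the table and column names are replaced with the real ones.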

Useful Regular Exes

1. Replace "\s+$" with [leave blank] to remove all the empty lines

2. Replace "\s+$" with [leave blank] to remove all trailing whitespace

3. <.*> finds everything that starts with < and ends with > (the match is greedy, so it runs from the first < to the last >)

4. Replace href=" with "\n" to start a new line for each url

5. Replace "\s\s+" with [leave blank] to remove runs of two or more whitespace characters

6. Replace "(?-s)^(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R" with "\1\2\3\4\5\6\r\n" to merge every six lines

7. Replace "[ ]{2,}" with [leave blank] to remove runs of two or more spaces between words

8. Ctrl+Q, B turns on the block select mode

9. Replace " .*" with [leave blank] to remove noncharacters
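
For scripted use, some of the editor replacements above can be approximated with Python's `re` module. This is a sketch under assumptions rather than the procedure used for the tables above: the function names and sample HTML are hypothetical, `strip_tags` uses `<[^>]*>` instead of the greedy `<.*>`, `collapse_spaces` substitutes a single space so that words do not run together, and `extract_urls` captures each url directly instead of inserting line breaks. Item 6 is omitted because `re` does not support `\R`.

```python
import re


def strip_tags(text):
    # Variant of item 3: [^>]* stops at the first ">", so each tag is
    # removed separately instead of everything between the first < and last >.
    return re.sub(r"<[^>]*>", "", text)


def collapse_spaces(text):
    # Items 5 and 7, but replacing each run of spaces with one space.
    return re.sub(r"[ ]{2,}", " ", text)


def strip_trailing_whitespace(text):
    # Items 1 and 2: \s also matches newlines, so blank lines vanish too.
    return re.sub(r"\s+$", "", text, flags=re.MULTILINE)


def extract_urls(html):
    # Variant of item 4: capture the value of every href attribute.
    return re.findall(r'href="([^"]*)"', html)


# Hypothetical scraped fragment.
html = (
    '<a href="http://a.example">Example   Incubator A</a>  \n'
    "\n"
    '<a href="http://b.example">Example Incubator B</a>\n'
)
urls = extract_urls(html)
cleaned = strip_trailing_whitespace(collapse_spaces(strip_tags(html)))
```

Here `cleaned` holds one incubator name per line and `urls` holds the matching links in the same order.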

Useful PostgreSQL Script

Loading/Unloading Data

Always load/unload data using the PostgreSQL-specific \COPY command below. Always load tab-delimited data that is UTF-8 encoded, with PC or UNIX line endings, and that has a header row. NEVER DEVIATE FROM THIS (unless there is a VERY good reason, like the source data is huge and comes preformatted differently).

Load using: \COPY tablename FROM 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV

Unload (copy to txt file) using: \COPY tablename TO 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
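
A file that satisfies this convention can be produced with Python's `csv` module. The sketch below is illustrative only; the function name and sample row are assumptions. Missing values are written as empty fields, which the load command above reads as NULL.

```python
import csv
import io


def write_tab_delimited(rows, fieldnames, out):
    """Write rows as tab-delimited text with a header row and UNIX line endings."""
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # None or missing values become empty fields (NULL after loading).
        writer.writerow(
            {f: "" if row.get(f) is None else row[f] for f in fieldnames}
        )


# Hypothetical record; an in-memory buffer stands in for the output file.
buffer = io.StringIO()
write_tab_delimited(
    [{"orgname": "Example Incubator", "statecode": "TX", "url": None}],
    ["orgname", "statecode", "url"],
    buffer,
)
output = buffer.getvalue()
```

When writing to disk, open the target with `open("filename.txt", "w", encoding="utf-8", newline="")` so that the file is UTF-8 encoded and the line endings are not translated.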

Creating Tables

DROP TABLE tablename;
 
CREATE TABLE tablename (
 field1 varchar(100),
 field2 int,
 field3 date,
 field4 real
);