| * Opened source link. * Copied results from "U.S. Based Incubators" into excel spreadsheet. Done
| 21
|
* Name, City, Stateurl* Url to home page of incubatorcompany name| Links to the home page of incubator* city* state| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (eg. Y Combinator)regular expression
|-
| [http://exchange.inbia.org/network/findacompany/ InterNational Business Incubation Association] or see our [[INBIA]] page
| * Opened source link* Entered "United States" for country and clicked "Find Companies"Done
| 415
|
* Company Name and , address* Link to another page within inbia on that page there is a link to the incubator's homepage| The database contains information on a lot of economic development institutions , ,city, state, zip code, country, url and would provide a mass quantity of datacontact person| Challenging for web crawler as link connects to another page within inbia and then link on that page connects to company's homepage. Not all of institutions listed are incubators. Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programsregular expression
* Company name with link to a separate page within cluster mapping* on that page there is a link to the incubator's website| Provides a long list of entrepreneurship organizations| Often data is missing off of the separate page for the company, including the URL to the company's website. The description is often not detailed enough to determine the category for the economic organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. Using the first 10 links, three were acceleratorsaddress 1, address 2, city, six were missing links (two were self-proclaimed incubators in description)state, and one was another type of support organization.zip code| regular expression
|-
| [https://thembaisdead.com/list-of-startup-accelerators-and-incubators/ The MBA Is Dead ]
| * Opened source link. * Selected "Region" >> "US & Canada"Link doesn't work
| 186 Results
| * Click on each accelerator/incubator to get data
* City and Country
* low equity, high offer, high value
* high equity, low offer, low value
* link to company homepage| regular expression|-* categories of companies it accelerates| [http://www.gaebler.com/incubatesBusiness-Incubator-Lists-By-State.htm Gaebler]| Done| 360 Results| Can search by region or by category of companies* incubator name* url| Seems to be a lot of data on accelerators and fewer incubators includedregular expression
|}
:* The Gaebler incubator list is in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\Gaebler\Results.txt, and the script that retrieves the results is in the same directory and is called Gaebler.py.
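The "regular expression" method listed for most sources can be sketched as follows. This is only an illustration, not the contents of Gaebler.py: the sample HTML, the pattern <code>ANCHOR_RE</code>, and the function <code>extract_links</code> are all assumptions, and the real pages will need their own patterns.

```python
import re

# Hypothetical sample of listing-page HTML; the real source markup may differ.
SAMPLE_HTML = """
<li><a href="http://www.example.org/incubator-a">Example Incubator A</a></li>
<li><a href="http://www.example.org/incubator-b">Example
    Incubator B</a></li>
"""

# Capture the href value and the link text of each anchor tag.
ANCHOR_RE = re.compile(
    r'<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def extract_links(html):
    """Return a list of (name, url) tuples for every anchor in the HTML."""
    results = []
    for url, name in ANCHOR_RE.findall(html):
        # Collapse line breaks and repeated spaces inside the link text.
        clean_name = re.sub(r"\s+", " ", name).strip()
        results.append((clean_name, url))
    return results


if __name__ == "__main__":
    for name, url in extract_links(SAMPLE_HTML):
        # Tab-delimited output, matching the load format used on this page.
        print(f"{name}\t{url}")
```

A pattern like this is brittle against markup changes, so it is worth spot-checking the record count against the "How many?" column after each run.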
==Extract incubator data from data on regional resources==
{| class="wikitable"
|-
! Source
! Progress
! How many?
! Region
! Data
! Method
|-
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]
| Done
| 34
| Oklahoma
| Incubator name and link to it
| regular expression
|-
| [https://www.socaltech.com/incubate.php High Tech News and Information for South California]
| Done
| 34
| California
| Url, company name, description, city, state
| regular expression
|-
| [http://barberacorporatelaw.com/blog/2014/4/8/28-business-incubators-in-the-los-angeles-area Legal Counsel to Entrepreneurs and Emerging Growth Companies]
| [https://www.delawarebusinesstimes.com/coworking-incubators/ Delaware Business Times]
| Done
| 11
| DE
| URL, company name, address, city, state code, phone number, email, description
| Regular expression
|-
| [https://dmped.dc.gov/page/incubators-accelerators-and-co-working-spaces Incubators/Accelerators In DC]
| Done
| 55*
| DC
| Incubator name and link to it and brief description
| regular expression
|-
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]
| Done
| 72
| FL
| Company name, address, city, state code, zip code, phone number, URL
| regular expression
|-
| [https://www.georgia.org/business-incubators Georgia Department of Economic Development]
| Done
| 12
| GA
| URL, company name, address, city, state code, zip code, phone number, contact person
| regular expression
|-
| [https://ideagist.com/list-of-startup-accelerators-and-incubators-in-hawaii-usa/ IdeaGist - Hawaii] [http://www.buzgate.org/8.0/hi/fh_incubators.html?cb=none Business Utility Zone Gateway]
| [https://www.omaha.com/special_sections/outlook/accelerators-incubators-help-boost-startups/article_97005ca0-1949-57e6-87bc-21fc4df9adba.html Omaha World Herald]
| Done
| 9
| NE
| Company Name, Address, City, State Code, Phone Number, URL, Description
| Regular expression
|-
| [http://business.nv.gov/Resource_Center/Coworking/_Incubator/_Accelerator_Spaces/ Department of Business and Industry - Nevada]
*Arizona includes both incubators and accelerators
*Nevada includes incubators, accelerators and coworking space
*Clustermapping contains non-US data. These records have been highlighted in the spreadsheet
DC, Oregon, Arizona and Nevada were all reprocessed in '''US Incubators - Reviewed By Ed.xlsx''', and files named statename-incubators.txt were created for import. For these 4 states, organizations were excluded by default and only included if they self-identified as an incubator in some fashion (name, description, etc.). Note that the US incubators data contains social impact incubators (though it does not appear to contain virtual ones, as it is location based).
The SQL script LoadTables.sql, in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\YiMaResearch\US Incubators, loads all of the tables in the '''incubators''' database and does the basic manipulation of the data. This code is also in Incubators.sql in E:\projects\Kauffman Incubator Project, and that file's version was the last one updated.
Assembling the state data, which also covers cities such as Los Angeles, Boston, and NYC, produces the '''USIncubators''' table and the corresponding '''USIncubators.txt''' text file. It has 707 records and the following fields, each shown with the number of records in which it is populated:
*orgname --707
*statecode --707
*url --609
*description --343
*city --524
*address --252
*zip --235
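Coverage counts like those above can be recomputed from a tab-delimited export. The sketch below is a minimal illustration that assumes a UTF-8 file with a header row; the function name <code>field_coverage</code> and the sample rows are hypothetical and are not taken from USIncubators.txt.

```python
import csv
import io


def field_coverage(tsv_text):
    """Count, for each column, how many rows have a non-empty value."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = reader.fieldnames or []
    counts = {name: 0 for name in fieldnames}
    for row in reader:
        for name in fieldnames:
            # Short rows yield None for missing columns; treat those as empty.
            if (row[name] or "").strip():
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample rows for illustration only.
    sample = (
        "orgname\tstatecode\turl\n"
        "Example Incubator A\tTX\thttp://a.example\n"
        "Example Incubator B\tCA\t\n"
    )
    for name, count in field_coverage(sample).items():
        print(f"{name} --{count}")
```

To run it against a real export, read the file with <code>open(path, encoding="utf-8", newline="").read()</code> and pass the text in.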
==Retrieving Incubators from Crunchbase Database==
We pull the relevant fields from the Crunchbase database using incubator uuids chosen by Yi and Libby, following this process:
1) Create a file of uuids of incubators
*CrunchbaseShortOrgDescChosenByYi.txt (275)
*CrunchbaseShortOrgDescChosenByLibby.txt (301)
File path: Z:\crunchbase3
2) Load the file into the database
-- uuids chosen by Yi
DROP TABLE IF EXISTS ChosenShortOrgUUIDs;
CREATE TABLE ChosenShortOrgUUIDs (
uuid varchar(100)
);
\COPY ChosenShortOrgUUIDs FROM 'CrunchbaseShortOrgDescChosenByYi.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--275
-- uuids chosen by Libby (loaded into the table created just above, not into ChosenShortOrgUUIDs)
DROP TABLE IF EXISTS ChosenLongOrgUUIDs;
CREATE TABLE ChosenLongOrgUUIDs (
uuid varchar(100)
);
\COPY ChosenLongOrgUUIDs FROM 'CrunchbaseShortOrgDescChosenByLibby.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--301
3) Run a query that joins uuids with related fields
Fields we are interested in:
company_name, domain, homepage_url, country_code, state_code, region, city, address, status, short_description, category_list, category_group_list, funding_rounds, funding_total_usd, founded_on, employee_count, A.uuid, primary_role, type
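The join in step 3 can be sketched as below. This is a stand-in, not the project's actual query: it uses Python's built-in <code>sqlite3</code> rather than PostgreSQL, the table name <code>organizations</code> and the sample rows are hypothetical, and only a few of the fields listed above are included. The alias <code>A</code> for the uuid table follows the <code>A.uuid</code> notation in the field list.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ChosenShortOrgUUIDs (uuid TEXT)")
    # 'organizations' is an assumed name for the Crunchbase organizations table.
    conn.execute(
        "CREATE TABLE organizations ("
        "uuid TEXT, company_name TEXT, homepage_url TEXT, state_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO organizations VALUES (?, ?, ?, ?)",
        [
            ("u1", "Example Incubator A", "http://a.example", "TX"),
            ("u2", "Example Startup", "http://s.example", "CA"),
            ("u3", "Example Incubator B", "http://b.example", "NY"),
        ],
    )
    conn.executemany(
        "INSERT INTO ChosenShortOrgUUIDs VALUES (?)", [("u1",), ("u3",)]
    )
    return conn


def fetch_chosen(conn):
    """Join the chosen uuids (A) to the organization fields (B)."""
    query = """
        SELECT A.uuid, B.company_name, B.homepage_url, B.state_code
        FROM ChosenShortOrgUUIDs AS A
        JOIN organizations AS B ON A.uuid = B.uuid
        ORDER BY B.company_name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in fetch_chosen(build_demo_db()):
        print("\t".join(row))
```

An inner join drops any chosen uuid that has no matching organization row, so comparing the result count with the 275 and 301 loaded uuids is a quick sanity check.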
1. Replace "\s+$" with [leave blank] to remove all the empty lines
2. Replace "s+$" with [leave blank] to remove all the whitespace
3. Search for "<.*>" to find everything that starts with < and ends with >
4. Replace href=" with "\n" to start a new line for each url
5. Replace "\s\s+" with [leave blank] to remove runs of two or more whitespace characters
6. Replace "(?-s)^(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R" with "\1\2\3\4\5\6\r\n" to merge every six lines into one
7. Replace "[ ]{2,}" with [leave blank] to remove runs of two or more spaces between two words
8. Press Ctrl+Q, B to turn on block select mode
9. Replace " .*" with [leave blank] to remove noncharacters
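The whitespace cleanup and six-line merge from the editor steps above can also be scripted. The sketch below is an approximation under stated assumptions, not a replacement for the steps: the function names are hypothetical, runs of spaces are collapsed to a single space rather than removed entirely, and a trailing group of fewer than six lines is still merged into one record (the editor pattern leaves such a group untouched).

```python
import re


def clean_lines(text):
    """Strip trailing whitespace, collapse space runs, and drop empty lines."""
    # Remove trailing spaces and tabs on every line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # Collapse runs of two or more spaces into one space (an assumption;
    # the editor step replaces them with nothing).
    text = re.sub(r" {2,}", " ", text)
    # Keep only lines that still have content.
    return [line for line in text.splitlines() if line]


def merge_every(lines, n=6):
    """Concatenate each consecutive group of n lines into one record."""
    return ["".join(lines[i:i + n]) for i in range(0, len(lines), n)]


if __name__ == "__main__":
    raw = "name  one  \n\n   \naddr\ncity\nstate\nzip\nurl\n"
    print(merge_every(clean_lines(raw)))
```

Joining with a tab instead of an empty string (<code>"\t".join(...)</code>) would give output that loads directly as tab-delimited data.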
=Useful PostgreSQL Script=
==Loading/Unloading Data==
Always load/unload data using the PostgreSQL-specific copy function below. Always load tab-delimited data that is UTF-8 encoded, with PC or UNIX line endings, and that has a header row. NEVER DEVIATE FROM THIS (unless there is a VERY good reason, such as the source data being huge and coming preformatted differently).
Load using:
<nowiki>\COPY tablename FROM 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV </nowiki>
Unload (copy to txt file) using:
<nowiki>\COPY tablename TO 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV</nowiki>
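When a file is produced by a script rather than unloaded from the database, it helps to write it in the same shape: tab-delimited, UTF-8, with a header row, and with empty fields for missing values so that <code>NULL AS <nowiki>''</nowiki></code> loads them as NULL. The following is a minimal sketch; the function names and sample rows are hypothetical.

```python
import csv
import io


def to_tsv_text(header, rows):
    """Render a header and rows as tab-delimited text with UNIX line endings."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # Missing values become empty fields, which \COPY ... NULL AS ''
        # reads back as NULL.
        writer.writerow(["" if value is None else value for value in row])
    return buffer.getvalue()


def write_tsv(path, header, rows):
    """Write the tab-delimited text to a UTF-8 encoded file."""
    # newline="" stops Python from translating the line endings on Windows.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(to_tsv_text(header, rows))


if __name__ == "__main__":
    # Hypothetical sample rows for illustration only.
    print(to_tsv_text(
        ["orgname", "url"],
        [["Example Incubator A", None], ["Example Incubator B", "http://b.example"]],
    ))
```

The <code>csv</code> module quotes any field that contains a tab, quote, or line break, which matches the CSV option used in the \COPY commands above.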