http://www.edegan.com/mediawiki/api.php?action=feedcontributions&user=AnneFreeman&feedformat=atomedegan.com - User contributions [en]2024-03-19T04:53:09ZUser contributionsMediaWiki 1.34.2http://www.edegan.com/mediawiki/index.php?title=Google_Crawler&diff=25799Google Crawler2019-05-29T21:34:07Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=Google Crawler<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Ecosystem Organization Classifier, Incubator Seed Data<br />
}}<br />
==Background==<br />
We wanted to create a web crawler that could collect data from Google searches specific to individual cities. The searches take the form "incubator" + "city, state". It was modeled on a previous researcher's web crawler, which collected information on accelerators. We could not simply modify that crawler because it used an outdated Python module. <br />
<br />
The output from this crawler could be used in several ways:<br />
# The URLs determined to be incubator websites can be input for the [[Listing Page Classifier]] that takes an incubator website URL and identifies which page contains the client company listing.<br />
# The title text can be analyzed using n-grams to look for keywords in order to classify the URL as an incubator. This strategy is discussed in [[Geocoding Inventor Locations (Tool)]].<br />
# Key elements of a page's HTML can be fed into an adapted version of the [[Demo Day Page Google Classifier]] to identify demo day webpages that contain a list of cohort companies.<br />
# The page can be passed over to Amazon's [https://www.mturk.com/ Mechanical Turk] to outsource the task of classifying pages as being incubators.<br />
<br />
==Selenium Implementation==<br />
The Selenium implementation of the crawler requires a downloaded ChromeDriver. The crawler opens the text file containing a list of locations in the format "city, state", with each entry separated by a newline. It prepends the Google search query prefix "https://www.google.com/search?q=" to the key term "incubator" and attaches the city and state name, using URL escape characters for commas and spaces. Then the crawler uses the ChromeDriver browser to access the URL and parse the results for each location. Its default is to parse 10 pages of results, meaning that approximately 100 lines of data are collected for each location.<br />
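<br />
The query construction described above can be sketched as follows. This is a minimal illustration, not the project's script: the function names are hypothetical, and the use of <code>urllib.parse.quote_plus</code> for escaping commas and spaces is an assumption about how the encoding could be done.<br />

```python
from urllib.parse import quote_plus

# Search prefix quoted in the text above.
GOOGLE_SEARCH_PREFIX = "https://www.google.com/search?q="


def build_search_url(location, key_term="incubator"):
    """Return the search URL for one "city, state" entry (hypothetical helper)."""
    query = f"{key_term} {location.strip()}"
    # quote_plus turns spaces into '+' and commas into '%2C'.
    return GOOGLE_SEARCH_PREFIX + quote_plus(query)


def build_search_urls(lines):
    """Build one URL per non-empty line of the locations file."""
    return [build_search_url(line) for line in lines if line.strip()]
```

Each generated URL would then be handed to the browser driver, one location at a time.<br />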
<br />
Relevant files, including the Python script and text files, are located in<br />
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\SeleniumScraper<br />
<br />
==Beautiful Soup Implementation==<br />
When we created the web crawler, our first implementation used Beautiful Soup to directly "request" the URL. The crawler took the same input file (city, state on each line, separated by newlines) and formatted queries in the same manner. Then, using Beautiful Soup, the script opened each of the generated URLs and parsed the resulting page to collect the titles and URLs of the results. The data collected is stored in a tab-separated text file with each row containing city, state, title of result, and URL.<br />
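<br />
As a rough illustration of the output format only, the sketch below writes rows of (city, state, title, url) as tab-separated text using the standard library. The function name and the sample row are hypothetical; the original script may write its file differently.<br />

```python
import csv
import io


def write_results(rows, handle):
    """Write (city, state, title, url) rows as tab-separated lines."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for city, state, title, url in rows:
        writer.writerow([city, state, title, url])


# Example with an in-memory buffer; a real run would open a .txt file instead.
buffer = io.StringIO()
write_results([("Houston", "TX", "Example Incubator", "http://example.com")], buffer)
```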
<br />
Relevant files, including the Python script and text files, are located in<br />
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\GoogleCrawler <br />
<br />
This crawler was frequently blocked because it performed queries directly against Google and parsed the results with Beautiful Soup. Additionally, this implementation would only collect eight results for each location. To prevent the crawler from being blocked and to collect more results, we decided to switch to Selenium.<br />
<br />
== Things to note/What needs work ==<br />
The scraper coded using Beautiful Soup does not work because it is frequently blocked by Google. The scraper coded using Selenium loads the search URL directly rather than typing in the search term and hitting enter. The Selenium script also does not appear to collect results from multiple pages; I believe it collects results only from the first page at the moment. <br />
<br />
== How to Run == <br />
The scripts incubator_scrape_data.py and incubator_selenium_scrape.py were coded on a Mac in a virtualenv using Python 3.6.5. <br />
The following packages were loaded into the environment for the Selenium script:<br />
* numpy 1.16.2 <br />
* pandas 0.24.2 <br />
* pip 19.1.1 <br />
* python-dateutil 2.8.0 <br />
* pytz 2019.1 <br />
* selenium 3.141.0<br />
* setuptools 41.0.0 <br />
* six 1.12.0 <br />
* urllib3 1.24.1 <br />
* wheel 0.33.1</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=INBIA&diff=25798INBIA2019-05-29T21:15:40Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=INBIA<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Incubator Seed Data<br />
}}<br />
<br />
<br />
==Initial Review of INBIA==<br />
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] that contains information for 415 incubators within the United States. It provides reliable links to a secondary page within the INBIA domain. This page contains information including the incubator's name, address, a link to the home page of their website, and information for key contacts. The secondary pages have the same HTML structure and are reliable in the data they contain, making INBIA an ideal candidate for web crawling methods to collect data from the internal pages. <br />
<br />
See [http://www.edegan.com/wiki/Incubator_Seed_Data#Evaluation_of_Sources_from_Specific_Google_Searches Wiki Page Table] for more details on source evaluations.<br />
<br />
==Retrieve URLS from INBIA Directory==<br />
<br />
We retrieved the INBIA data as follows:<br />
#Go to http://exchange.inbia.org/network/findacompany/ and search US<br />
#Change to 100 results per page<br />
#Save HTML page of 0-100 <br />
#Choose next page, Save HTML page of 100-200<br />
#Sort Z-A<br />
#Save HTML page 418-318<br />
#Choose next page, Save HTML page of 318-218<br />
#Note that we are missing some that start with L and M<br />
#Search US L, Choose page with L as first letter, Save HTML of L<br />
#Search US M, Choose page with M as first letter, Save HTML of M<br />
<br />
Then process each of those html files with regular expressions in textpad<br />
*Search .*biobubblekey Replace #<br />
*Search ^[^#].*\n Replace NOTHING<br />
*Search .*href=\" Replace NOTHING<br />
*Search <\/a> Replace NOTHING<br />
*Search \"> Replace \t<br />
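<br />
The same search-and-replace sequence could be reproduced in Python roughly as below. This is a sketch under assumptions: the function name is hypothetical, the steps are applied line by line rather than across the whole file, and the exact shape of the saved HTML lines is not shown in this write-up.<br />

```python
import re


def extract_links(html_text):
    """Apply the five search/replace steps to each line of a saved HTML page."""
    rows = []
    for line in html_text.splitlines():
        # Step 1: replace everything up to "biobubblekey" with a '#' marker.
        line = re.sub(r".*biobubblekey", "#", line)
        # Step 2: discard every line that does not start with the marker.
        if not line.startswith("#"):
            continue
        # Step 3: remove everything up to and including href=".
        line = re.sub(r'.*href="', "", line)
        # Step 4: remove the closing anchor tag.
        line = line.replace("</a>", "")
        # Step 5: turn the end of the opening tag into a tab separator.
        line = line.replace('">', "\t")
        rows.append(line.strip())
    return rows
```

Each returned row has the form URL extension, tab, company name; combining files, removing duplicates, and reordering columns would follow as described next.<br />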
<br />
Then combine files, throw out duplicates, move columns, sort. This results in a file without headers where the lines are like:<br />
1863 Ventures/Project 500 /?c=companyprofile&amp;UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e <br />
4th Sector Innovations /?c=companyprofile&amp;UserKey=cc47b04e-1c2a-4019-88b3-05d1163a0d6a <br />
712 Innovations /?c=companyprofile&amp;UserKey=531ad600-e11a-4c74-9f37-bace816b9325 <br />
AccelerateHER /?c=companyprofile&amp;UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b <br />
ACTION Innovation Network /?c=companyprofile&amp;UserKey=5ac08dd0-364d-47b2-8de0-a7536a3b4802 <br />
<br />
We can now build a crawler that requests http://exchange.inbia.org/network/findacompany/ followed by the URL extension (either encoded or with <nowiki>&amp;</nowiki> replaced with just &). For example, http://exchange.inbia.org/network/findacompany/?c=companyprofile&UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa gets the company page for Cambridge Innovation Center. <br />
<br />
We can then rip out the contact information, including URL, and the people, using either beautiful soup or regular expressions.<br />
<br />
<br />
==Retrieve Data from URLs Generated==<br />
We wrote a web crawler that <br />
# reads in the csv file containing the URLs to scrape into a pandas dataframe<br />
# changes the urls by replacing ''?c=companyprofile&amp;'' with ''companyprofile?'' and prepending the domain http://exchange.inbia.org/network/findacompany to each url<br />
# opens each url and extracts information using element tree parser<br />
# collects information from each url and stores it in a txt file<br />
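<br />
The URL rewriting in step 2 can be sketched as follows. The function name is hypothetical, and handling both the ''&amp;amp;'' and plain ''&amp;'' forms is an assumption made so the sketch works on either version of the input file.<br />

```python
# Domain quoted in the steps above.
INBIA_DOMAIN = "http://exchange.inbia.org/network/findacompany"


def to_profile_url(url_extension):
    """Turn a saved URL extension into a full company profile URL."""
    path = url_extension.strip()
    # Handle the HTML-escaped form first, then the plain '&' form.
    for old in ("?c=companyprofile&amp;", "?c=companyprofile&"):
        path = path.replace(old, "companyprofile?")
    return INBIA_DOMAIN + path
```

For example, the extension ''/?c=companyprofile&amp;amp;UserKey=...'' becomes ''http://exchange.inbia.org/network/findacompany/companyprofile?UserKey=...''.<br />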
<br />
<br />
The crawler generates a tab separated text file called INBIA_data.txt containing [company_name, street_address, city, state, zipcode, country, website] and is populated by information from the 415 entries from the database. <br />
<br />
The txt file and the python script (inbia_scrape.py) are located in <br />
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\INBIA <br />
<br />
== How to Run ==<br />
The script inbia_scrape.py was coded in a virtualenv on a Mac, using Python 3.6.5. <br />
The following packages were loaded in that virtualenv:<br />
* beautifulsoup4 4.7.1 <br />
* certifi 2019.3.9<br />
* chardet 3.0.4 <br />
* idna 2.8 <br />
* numpy 1.16.2 <br />
* pandas 0.24.2 <br />
* pip 19.1.1 <br />
* python-dateutil 2.8.0 <br />
* pytz 2018.9 <br />
* requests 2.21.0 <br />
* setuptools 40.8.0 <br />
* six 1.12.0 <br />
* soupsieve 1.9 <br />
* urllib3 1.24.1 <br />
* wheel 0.33.1</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=AngelList_Database&diff=25795AngelList Database2019-05-29T20:23:23Z<p>AnneFreeman: /* Summary of Python Files */</p>
<hr />
<div>{{Project<br />
|Has title=AngelList Database<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
}}<br />
The purpose of this project is to build a database of incubators, and perhaps other ecosystem organizations, from AngelList.<br />
<br />
==Crawler Specification==<br />
<br />
===There are incubators here===<br />
<br />
Process from before:<br />
*Opened source link (http://www.angel.co)<br />
*Typed "incubator" in the search box<br />
*Clicked on "Search for 'incubator'"<br />
<br />
===500 Results===<br />
<br />
Revised process:<br />
*Visit https://angel.co/search?q=incubator<br />
*Click More (a lot)<br />
*Save the HTML page as E:\projects\AngelList\AngelList.html<br />
*That gets you 500 (out of 1,447 claimed results)<br />
*Process the HTML using Regular Expressions to produce AngelListPages.txt, which is in the format:<br />
**URL\tConame<br />
*Note that restricting to "Companies" reduces it to 1,339 results.<br />
<br />
===Failed workarounds===<br />
<br />
Tried a workaround using paged URLs:<br />
*https://angel.co/search?page=13&q=incubator&type=companies<br />
*https://angel.co/search?page=14&q=incubator&type=companies<br />
<br />
However, with 40 results per page, page 13 ends with "No Results Yet" after clicking More, and page 14 opens with it, so the search is still capped at 500 results.<br />
<br />
It appears from the format of results that AngelList has a type "incubator", though some likely incubators have other types (e.g., BMW iVentures Incubator is a "VC Firm" and Austin Technology Incubator is a "Company"). I can't see a way to restrict search by type.<br />
<br />
Signed up for an account as Ed Egan (ed@edegan.com). Then the link More -> Incubators takes you to https://angel.co/accelerators/apply. But there doesn't seem to be an advanced search. The count of incubator results increased while on the site!<br />
<br />
===400 Results===<br />
<br />
The page https://angel.co/incubators shows 6,054 companies. It stopped adding to the list after 20 More clicks, which turned out to be 400 results. Saved page as E:\projects\AngelList\Incubator - CompanyTypes - AngelList.html<br />
<br />
Given the page title, this is likely just the '''"Incubator" company type''' organizations. However, there is some useful information that could be extracted from just that page. The incubator type also '''clearly includes accelerators''' and other things.<br />
<br />
===Possible Processes===<br />
<br />
In either of the cases below, we'd need a Selenium web driver to click More (a lot). For the latter case, we'd also need to get the URL encodings (probably by hand) for the state names we'd like to search.<br />
<br />
====Restricted Search====<br />
Tried searching "incubator TX", but it looks like only the name and text descriptions are searched. Tried searching "incubator a", "incubator b", "incubator c" and each had fewer than 500 results, so that ''might'' work.<br />
<br />
====Company Search====<br />
<br />
https://angel.co/companies has a search function. You can select type as incubator and location as US: https://angel.co/companies?company_types[]=Incubator&locations[]=1688-United+States This gives 993 companies...<br />
<br />
It might be possible to go state by state. California has 385, Massachusetts has 36, New York has 141, etc. But again, this is '''limited to the incubator type'''.<br />
<br />
==Crawler==<br />
We decided to build a web crawler using Selenium to search for incubators using the AngelList companies URL ''https://angel.co/companies?'' with the ''locations[]='' option appended to the end to specify a state (the 50 states and the District of Columbia). The crawler loaded the page as specified and then clicked the load more button while there were still more results to load. No state exceeded 500 results. Then the crawler collected information for all of the companies listed, including state, name of company, a brief description, and the url for the company within AngelList. This information was stored in a tab-delimited text file. <br />
===Crawler By Company Type===<br />
This crawler appended ''company_types[]=Incubator'' to the url so that the companies appearing in the search results were only those with the listed company type of incubator. It yielded 1068 results. The script (angelList_companyTypeIncubator.py) and the data it generated (AngelList_companyTypeIncubator.txt) are on the RDP in the folder AngelList.<br />
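<br />
The URL construction for the two crawlers can be sketched as below. The function name is hypothetical, and the location endings (such as ''1688-United+States'') are assumed to come from the input text file of state endings; only the URL pattern itself is taken from this page.<br />

```python
# Base URL quoted in the Crawler section.
COMPANIES_URL = "https://angel.co/companies?"


def build_companies_url(location_ending, company_type=None):
    """Build a companies search URL for one location, optionally filtered by type."""
    parts = []
    if company_type:
        # Used only by the company-type crawler.
        parts.append("company_types[]=" + company_type)
    parts.append("locations[]=" + location_ending)
    return COMPANIES_URL + "&".join(parts)
```

Calling it without a company type gives the URL form that the keyword crawler would load before typing "incubator" into the search bar.<br />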
<br />
===Crawler By Keyword===<br />
This crawler clicked on the search bar and entered the keyword "incubator" so that the companies appearing in the results contained the keyword incubator somewhere on their company page. It yielded 840 results. The script (angelList_keywordIncubator.py) and the data it generated (AngelList_keywordIncubator.txt) are on the RDP in the folder AngelList.<br />
<br />
== Master File of Results ==<br />
We performed a diff of the two files to create a master file with only unique results. The master file containing the unique results from the two crawlers contains 1512 results. We decided to drop the state when determining if the results were unique because occasionally the same company would be listed in different states, leading to repetitive results.<br />
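<br />
One way to express the de-duplication rule described above is sketched here. It is not masterFile.py itself: the row layout (state, name, description, url) and the choice to keep the first occurrence of each company are assumptions.<br />

```python
def merge_unique(*result_sets):
    """Combine rows of (state, name, description, url), ignoring state for uniqueness."""
    seen = set()
    merged = []
    for rows in result_sets:
        for row in rows:
            key = tuple(row[1:])  # drop the state column before comparing
            if key not in seen:
                seen.add(key)
                merged.append(tuple(row))
    return merged
```

A company listed under two states therefore appears once in the merged output, under the state where it was first seen.<br />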
<br />
== Saving AngelList Pages ==<br />
===Failed Attempts===<br />
The AngelList website was excellent at detecting bot activity and blocking our IP address. We attempted several different ways of downloading the pages from the master list, all of which were blocked by AngelList:<br />
* urllib from Python<br />
* a web crawler (Scrapy)<br />
* accessing them directly with curl or wget commands<br />
Because all three methods were blocked by the AngelList site, we decided to use Selenium.<br />
=== Selenium Script ===<br />
The selenium script to download the pages opens the URL and then saves it in a data folder. It also checks for a recaptcha and pauses the script so that the recaptcha can be manually solved. Even using selenium and manually solving recaptchas, angelList would occasionally block our IP address, making it necessary to perform the script in small batches, only collecting ~600 webpages before changing wifi networks. The selenium code save_angelList_pages.py is in the RDP folder angelList.<br />
<br />
== Parsing Saved AngelList Pages ==<br />
We used Beautiful Soup to iterate through the static html files that were saved from the AngelList website. We created three tab-separated text files. The first was populated via parse_company_info.py and contains basic information about the company, including the company name, a short description, the location, the company size, a URL to the company website, and the business tags on AngelList. The second was populated via parse_portfolio.py and contains the company name and the name of a portfolio company. The third was populated via parse_employees.py and contains the company name and the name of the employee/founder at the company. The three Python files and the data files they generated are in the RDP folder angelList.<br />
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\angelList<br />
<br />
<br />
== Things to note/What needs work ==<br />
The Selenium script to download the HTML files from AngelList cannot be run on the complete masterFile in one pass. The masterFile needs to be split into smaller files, which are then run on devices connected to different wifi networks to avoid being blocked. <br />
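<br />
Splitting the master list into batches could look like the sketch below. The function name is hypothetical, and the default batch size of 600 is taken from the approximate figure given in the Selenium Script section rather than from the code.<br />

```python
def split_into_batches(rows, batch_size=600):
    """Split the master list into consecutive batches of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
```

With the 1512 unique results in the master file, this would produce three batches, each of which could be written to its own file and run from a different network.<br />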
<br />
The script parse_employees.py does not collect all the necessary information on the employees from the downloaded HTML files; there is a bug in the Beautiful Soup code.<br />
<br />
== How to Run ==<br />
The following scripts were coded in a virtualenv on a Mac, using Python 3.6.5<br />
* angelList_companyTypeIncubator.py<br />
* angelList_keywordIncubator.py<br />
* masterFile.py<br />
* save_angelList_pages.py<br />
* parse_company_info.py<br />
* parse_portfolio.py<br />
* parse_employees.py<br />
<br />
The following packages were loaded in that virtualenv:<br />
* beautifulsoup4 4.7.1 <br />
* bs4 0.0.1 <br />
* certifi 2019.3.9<br />
* chardet 3.0.4 <br />
* idna 2.8 <br />
* numpy 1.16.2 <br />
* pandas 0.24.2 <br />
* pip 19.1.1 <br />
* python-dateutil 2.8.0 <br />
* pytz 2019.1 <br />
* requests 2.21.0 <br />
* selenium 3.141.0 <br />
* setuptools 41.0.0 <br />
* six 1.12.0 <br />
* soupsieve 1.9.1 <br />
* urllib3 1.24.1 <br />
* wheel 0.33.1 <br />
<br />
== Summary of Python Files == <br />
===angelList_companyTypeIncubator.py ===<br />
* input: text file with URL endings for states<br />
* output: tab separated text file (AngelList_companyTypeIncubator.txt)<br />
* description: Uses Selenium to search AngelList for companies with the type incubator, using a list with the proper endings for the states (and Washington DC) to create the AngelList URL. It clicks the more button at the bottom of the screen when necessary. It stores the results (state, company name, short description, and url to the site within AngelList) in a tab separated text file.<br />
===angelList_keywordIncubator.py ===<br />
* input: text file with URL endings for states<br />
* output: tab separated text file (AngelList_keywordIncubator.txt)<br />
* description: Uses Selenium to search AngelList for companies that appear when searching the keyword "incubator", using a list with the proper endings for the states (and Washington DC) to create the AngelList URL. It clicks the more button at the bottom of the screen when necessary. It stores the results (state, company name, short description, and url to the site within AngelList) in a tab separated text file.<br />
=== masterFile.py ===<br />
* inputs: two tab separated files (AngelList_companyTypeIncubator.txt, AngelList_keywordIncubator.txt)<br />
* outputs: one tab separated file (angelList_masterFile.txt)<br />
* description: masterFile.py performs a diff on the two tab separated files with angelListData and creates a master file containing unique entries for use in save_angelList_pages.py<br />
=== save_angelList_pages.py ===<br />
* input: one tab separated file (angelList_masterFile.txt)<br />
* output: data folder containing html files<br />
* description: Uses Selenium to open the url to the site for the incubator within AngelList, then saves the webpage as an html file in a specified folder.<br />
=== parse_company_info.py ===<br />
* input: path to data folder containing html files<br />
* output: tab separated file containing company info (angelList_companyInfo.txt)<br />
* description: Iterates through the saved AngelList files and collects information such as the company name, a short description, the location, company size, URL of the company website, and business tags. It saves the information in a tab separated text file.<br />
=== parse_portfolio.py ===<br />
* input: path to data folder containing html files<br />
* output: tab separated file containing portfolio info (angelList_portfolio.txt)<br />
* description: Iterates through the saved angelList files and collects information on the portfolio of the company, saving the company name and the company portfolio name as a tab separated text file.<br />
=== parse_employees.py ===<br />
* input: path to data folder containing html files<br />
* output: tab separated file containing employee/founder info (angelList_employees.txt)<br />
* description: Iterates through the saved angelList files and collects information on people that work at the company, saving the company name and the founder/employee name as a tab separated text file.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=AngelList_Database&diff=25794AngelList Database2019-05-29T20:22:53Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=AngelList Database<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
}}<br />
The purpose of this project is to build a database of incubators, and perhaps other ecosystem organizations, from AngelList.<br />
<br />
==Crawler Specification==<br />
<br />
===There are incubators here===<br />
<br />
Process from before:<br />
*Opened source link (http://www.angel.co)<br />
*Typed "incubator" in the search box<br />
*Clicked on "Search for 'incubator'"<br />
<br />
===500 Results===<br />
<br />
Revised process:<br />
*Visit https://angel.co/search?q=incubator<br />
*Click More (a lot)<br />
*Save the HTML page as E:\projects\AngelList\AngelList.html<br />
*That gets you 500 (out of 1,447 claimed results)<br />
*Process the HTML using Regular Expressions to produce AngelListPages.txt, which is in the format:<br />
**URL\tConame<br />
*Note that restricting to "Companies" reduces it to 1,339 results.<br />
<br />
===Failed workarounds===<br />
<br />
Tried a workaround using paged URLs:<br />
*https://angel.co/search?page=13&q=incubator&type=companies<br />
*https://angel.co/search?page=14&q=incubator&type=companies<br />
<br />
However, with 40 results per page, page 13 ends with "No Results Yet" after clicking More, and page 14 opens with it, so the search is still capped at 500 results.<br />
<br />
It appears from the format of results that AngelList has a type "incubator", though some likely incubators have other types (e.g., BMW iVentures Incubator is a "VC Firm" and Austin Technology Incubator is a "Company"). I can't see a way to restrict search by type.<br />
<br />
Signed up for an account as Ed Egan (ed@edegan.com). Then the link More -> Incubators takes you to https://angel.co/accelerators/apply. But there doesn't seem to be an advanced search. The count of incubator results increased while on the site!<br />
<br />
===400 Results===<br />
<br />
The page https://angel.co/incubators shows 6,054 companies. It stopped adding to the list after 20 More clicks, which turned out to be 400 results. Saved page as E:\projects\AngelList\Incubator - CompanyTypes - AngelList.html<br />
<br />
Given the page title, this is likely just the '''"Incubator" company type''' organizations. However, there is some useful information that could be extracted from just that page. The incubator type also '''clearly includes accelerators''' and other things.<br />
<br />
===Possible Processes===<br />
<br />
In either of the cases below, we'd need a Selenium web driver to click More (a lot). For the latter case, we'd also need to get the URL encodings (probably by hand) for the State names we'd like to search.<br />
<br />
====Restricted Search====<br />
Tried searching incubator TX but it looks like only the name and text description are searched. Tried searching "incubator a", "incubator b", "incubator c" and each had fewer than 500 results, so that ''might'' work.<br />
<br />
====Company Search====<br />
<br />
https://angel.co/companies has a search function. You can select type as incubator and location as US: https://angel.co/companies?company_types[]=Incubator&locations[]=1688-United+States This gives 993 companies...<br />
<br />
It might be possible to go state by state. California has 385, Massachusetts has 36, New York has 141, etc. But again, this is '''limited to the incubator type'''.<br />
<br />
==Crawler==<br />
We decided to build a web crawler using Selenium to search for incubators using the domain for AngelList companies ''https://angel.co/companies?'' with the ''locations[]='' option appended to the end to specify a state (the 50 states and the District of Columbia). The crawler loaded the page as specified and then clicked the load more button while there were still more results to load. No state exceeded 500 results. Then the crawler collected information for all of the companies listed, including the state, the name of the company, a brief description, and the URL for the company within AngelList. This information was stored in a tab-delimited text file. <br />
===Crawler By Company Type===<br />
This crawler appended ''company_types[]=Incubator'' to the url so that the companies appearing in the search results were only those with the listed company type of incubator. It yielded 1068 results. The script (angelList_companyTypeIncubator.py) and the data it generated (AngelList_companyTypeIncubator.txt) are on the RDP in the folder AngelList.<br />
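<br />
As an illustration of how these search URLs are put together, here is a minimal sketch. The helper name build_search_url is hypothetical, and the only location ending used is ''1688-United+States'', which is the one recorded on this page; the per-state endings came from a separate input file and are not reproduced here.<br />

```python
BASE_URL = "https://angel.co/companies?"


def build_search_url(location_ending, company_type=None):
    """Build a company-search URL for one location.

    location_ending is the value for locations[] (for example
    "1688-United+States"); company_type, when given, is the value for
    company_types[] (for example "Incubator").
    """
    parts = []
    if company_type is not None:
        parts.append("company_types[]=" + company_type)
    parts.append("locations[]=" + location_ending)
    return BASE_URL + "&".join(parts)


# Company-type search restricted to one location.
typed_url = build_search_url("1688-United+States", company_type="Incubator")
# Location-only URL, as a starting point for a keyword search.
plain_url = build_search_url("1688-United+States")
print(typed_url)
print(plain_url)
```

The first URL matches the company search link quoted under Company Search above.<br />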
<br />
===Crawler By Keyword===<br />
This crawler clicked on the search bar and entered the keyword "incubator" so that the companies appearing in the results contained the keyword incubator somewhere on their company page. It yielded 840 results. The script (angelList_keywordIncubator.py) and the data it generated (AngelList_keywordIncubator.txt) are on the RDP in the folder AngelList.<br />
<br />
== Master File of Results ==<br />
We performed a diff of the two files and combined them into a master file with only unique results. The master file contains 1512 results. We ignored the state when determining whether results were unique because the same company was occasionally listed in different states, which would otherwise produce duplicate entries.<br />
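<br />
The de-duplication idea can be sketched as follows. This is not the code in masterFile.py: it assumes each row has the columns state, company name, description, and AngelList URL in that order, and it assumes the AngelList URL is a suitable uniqueness key once the state is dropped. The sample rows are made up.<br />

```python
import csv
import io


def merge_unique(*tsv_texts):
    """Merge tab separated rows (state, name, description, url).

    The state column is dropped and the first row seen for each URL is kept.
    Using the URL as the key is an assumption made for this sketch.
    """
    seen = set()
    merged = []
    for text in tsv_texts:
        reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 4:
                continue  # skip blank or malformed lines
            _state, name, description, url = row[:4]
            if url not in seen:
                seen.add(url)
                merged.append((name, description, url))
    return merged


# Made-up sample data standing in for the two crawler outputs.
type_rows = (
    "CA\tAlpha Labs\tAn incubator\thttps://angel.co/company/alpha-labs\n"
    "NY\tBeta Hub\tCoworking space\thttps://angel.co/company/beta-hub\n"
)
keyword_rows = (
    "TX\tAlpha Labs\tAn incubator\thttps://angel.co/company/alpha-labs\n"
    "TX\tGamma Works\tStartup studio\thttps://angel.co/company/gamma-works\n"
)
master = merge_unique(type_rows, keyword_rows)
print(len(master))
```

Here Alpha Labs appears under two states but is kept only once.<br />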
<br />
== Saving AngelList Pages ==<br />
===Failed Attempts===<br />
The AngelList website was very effective at detecting bot activity and blocking our IP address. We tried several ways of downloading the pages from the master list, all of which were blocked by AngelList: <br />
* urllib from Python<br />
* using a crawler (Scrapy) <br />
* accessing them directly with a curl/wget command<br />
Because all three methods were blocked by the AngelList site, we decided to use Selenium.<br />
=== Selenium Script ===<br />
The Selenium script to download the pages opens each URL and then saves the page in a data folder. It also checks for a reCAPTCHA and pauses so that the reCAPTCHA can be solved manually. Even with Selenium and manually solved reCAPTCHAs, AngelList would occasionally block our IP address, making it necessary to run the script in small batches, collecting only ~600 webpages before changing wifi networks. The Selenium code save_angelList_pages.py is in the RDP folder angelList.<br />
<br />
== Parsing Saved AngelList Pages ==<br />
We used Beautiful Soup to iterate through the static HTML files that were saved from the AngelList website. We created three tab separated text files. The first was populated via parse_company_info.py and contains basic information about the company, including the company name, a short description, the location, the company size, a URL to the company website, and the business tags on AngelList. The second was populated via parse_portfolio.py and contains the company name and the name of a portfolio company. The third was populated via parse_employees.py and contains the company name and the name of an employee/founder at the company. The three Python files and the data files they generated are in the RDP folder angelList.<br />
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\angelList<br />
<br />
<br />
== Things to note/What needs work ==<br />
The Selenium script that downloads the HTML files from AngelList cannot process the entire masterFile in one run. The masterFile needs to be split into smaller files, which are then run on devices connected to different wifi networks to avoid being blocked. <br />
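<br />
Splitting the masterFile could look something like the sketch below. The batch size of 600 follows the approximate figure given under Selenium Script; the function names and the part-file naming scheme are hypothetical.<br />

```python
def split_into_batches(rows, batch_size=600):
    """Split a list of master-file rows into consecutive batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def batch_filename(index):
    """Hypothetical naming scheme for the smaller files (1-based part number)."""
    return "angelList_masterFile_part{:02d}.txt".format(index + 1)


# Stand-in rows; in practice these would be the lines of angelList_masterFile.txt.
rows = ["row {}".format(n) for n in range(1512)]
batches = split_into_batches(rows)
sizes = [len(batch) for batch in batches]
names = [batch_filename(i) for i in range(len(batches))]
print(sizes)
print(names)
```

With 1512 rows this gives three files, each of which can be processed on a different network.<br />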
<br />
The script parse_employees.py does not collect all the necessary information on the employees from the downloaded HTML files; there is a bug in the Beautiful Soup code.<br />
<br />
== How to Run ==<br />
The following scripts were written in a virtualenv on a Mac, using Python 3.6.5:<br />
* angelList_companyTypeIncubator.py<br />
* angelList_keywordIncubator.py<br />
* masterFile.py<br />
* save_angelList_pages.py<br />
* parse_company_info.py<br />
* parse_portfolio.py<br />
* parse_employees.py<br />
<br />
The following packages were loaded in that virtualenv:<br />
* beautifulsoup4 4.7.1 <br />
* bs4 0.0.1 <br />
* certifi 2019.3.9<br />
* chardet 3.0.4 <br />
* idna 2.8 <br />
* numpy 1.16.2 <br />
* pandas 0.24.2 <br />
* pip 19.1.1 <br />
* python-dateutil 2.8.0 <br />
* pytz 2019.1 <br />
* requests 2.21.0 <br />
* selenium 3.141.0 <br />
* setuptools 41.0.0 <br />
* six 1.12.0 <br />
* soupsieve 1.9.1 <br />
* urllib3 1.24.1 <br />
* wheel 0.33.1 <br />
<br />
== Summary of Python Files == <br />
===angelList_companyTypeIncubator.py ===<br />
* input: text file with URL endings for states<br />
* output: tab separated text file (AngelList_companyTypeIncubator.txt)<br />
* description: Uses Selenium to search AngelList for companies with the type incubator, using a list of the proper URL endings for the states (and Washington DC) to create the AngelList URL. It clicks the more button at the bottom of the screen when necessary. It stores the results (state, company name, short description, and URL of the company's page within AngelList) in a tab separated text file.<br />
<br />
===angelList_keywordIncubator.py ===<br />
* input: text file with URL endings for states<br />
* output: tab separated text file (AngelList_keywordIncubator.txt)<br />
* description: Uses Selenium to search AngelList for companies that appear under the keyword "incubator", using a list of the proper URL endings for the states (and Washington DC) to create the AngelList URL. It clicks the more button at the bottom of the screen when necessary. It stores the results (state, company name, short description, and URL of the company's page within AngelList) in a tab separated text file.<br />
<br />
=== masterFile.py ===<br />
* inputs: two tab separated files (AngelList_companyTypeIncubator.txt, AngelList_keywordIncubator.txt)<br />
* outputs: one tab separated file (angelList_masterFile.txt)<br />
* description: masterFile.py performs a diff on the two tab separated files with angelListData and creates a master file containing unique entries for use in save_angelList_pages.py<br />
<br />
<br />
=== save_angelList_pages.py ===<br />
* input: one tab separated file (angelList_masterFile.txt)<br />
* output: data folder containing html files<br />
* description: Uses Selenium to open the URL of the incubator's page within AngelList, then saves the webpage as an HTML file in a specified folder.<br />
<br />
=== parse_company_info.py ===<br />
* input: path to data folder containing html files<br />
* output: tab separated file containing company info (angelList_companyInfo.txt)<br />
* description: Iterates through the saved AngelList files and collects information such as the company name, a short description, the location, the company size, the URL of the company website, and business tags. It saves the information in a tab separated text file.<br />
<br />
<br />
<br />
=== parse_portfolio.py ===<br />
* input: path to data folder containing html files<br />
* output: tab separated file containing portfolio info (angelList_portfolio.txt)<br />
* description: Iterates through the saved angelList files and collects information on the portfolio of the company, saving the company name and the company portfolio name as a tab separated text file.<br />
<br />
<br />
=== parse_employees.py ===<br />
* input: path to data folder containing html files<br />
* output: tab separated file containing employee/founder info (angelList_employees.txt)<br />
* description: Iterates through the saved angelList files and collects information on people that work at the company, saving the company name and the founder/employee name as a tab separated text file.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Anne_Freeman&diff=25793Anne Freeman2019-05-29T20:12:04Z<p>AnneFreeman: </p>
<hr />
<div>{{Team Member<br />
|Has name=Anne Freeman<br />
|Has headshot=Headshot_small_2018.png<br />
|Has team position=Tech Team<br />
|Has team status=Active<br />
|Has or doing degree=Bachelor<br />
|Has academic major=CS<br />
|Has skills=SQL, Some ML<br />
|Has email=aof7@georgetown.edu<br />
}}<br />
I worked on various aspects of the Kauffman Incubator Project during the Spring of 2019.<br />
<br />
== What is an incubator? ==<br />
One of my initial goals was to define what an incubator is. I began by researching how other experts in the field defined incubators and how this research group had approached the issue in the past. We developed a working definition, which can be found on our other wiki page [[Defining Incubators]]. We began by finding ways to characterize an incubator, such as its purpose within the larger entrepreneurship system, the duration of time companies spend at an incubator, and the application process companies undergo. We also sought to differentiate between an incubator and an accelerator and to define high-growth technology incubators.<br />
<br />
== What attributes define an incubator? ==<br />
From the definition of incubators, I began exploring the baseline attributes that should be collected on all entrepreneurship ecosystem organizations in order to properly classify incubators, specifically high-growth technology incubators. There is a more in depth description of this work on the [[Formulate baseline attributes]] page. We created a working list of attributes focused on characterizing the timeline, affiliation, services provided, type of company incubated, and key demographics. <br />
<br />
== Evaluating Potential Data Sources ==<br />
We evaluated potential data sources to select databases that would provide sufficient information on incubators. There is more information on this project on the page [[Incubator Seed Data]]. We started by going through the [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that the previous research group used. Then we performed google searches for other viable sources and evaluated them in a systematic manner. We decided that crunchbase, INBIA, a google crawler, and AngelList were the best sources as the data could be collected in an automated manner, and they were the databases with the most information on incubators. <br />
<br />
== INBIA Data ==<br />
Using a list of URLS from the INBIA website, I created a web crawler that used beautiful soup to directly request the urls, parse the information, and store it in a tab separated text file. There is more information about this process at the [[INBIA]] page.<br />
<br />
== Google Crawler ==<br />
Using a list of locations, I wrote a Google crawler using Beautiful Soup that created URLs for Google and directly requested and parsed the results. This crawler was often blocked by Google, so I switched to using Selenium. The Selenium crawler searched Google for the city and the keyword "incubator". It retrieved 10 pages of results and stored the city, title, and URL in a tab separated text file. There is more information about this process at the [[Google Crawler]] page. <br />
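<br />
A minimal sketch of how such a search URL can be built from a "city, state" entry is shown below. The function name is hypothetical and the standard library's quote_plus is used for the escaping, which may differ from how the actual crawler encoded commas and spaces.<br />

```python
from urllib.parse import quote_plus

GOOGLE_SEARCH = "https://www.google.com/search?q="


def build_google_url(location, keyword="incubator"):
    """Return a Google search URL for the keyword plus a 'city, state' string.

    quote_plus turns spaces into '+' and commas into '%2C'.
    """
    return GOOGLE_SEARCH + quote_plus(keyword + " " + location)


url = build_google_url("Houston, TX")
print(url)
```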
<br />
== AngelList Data ==<br />
Using Selenium, I created a crawler to search the AngelList database using the keyword "incubator" and the state. I also created a crawler to search the AngelList database for companies with the type "incubator" and the state. The crawlers would click the "more" button at the bottom of the page to view all of the results and then save them in a tab separated text file. I performed a diff on the results to create a masterFile containing only unique entries. Then I used Selenium to open the URL for the incubator within the AngelList website and download it to a local folder. Then, using Beautiful Soup, I parsed the static HTML files for information on the company, the employees, and the portfolio. There is more information on this process on the [[AngelList Database]] page.<br />
<br />
== Things that still need work ==<br />
The Selenium Google crawler loads the search URLs directly rather than typing the query into Google and hitting enter. It also collects the same page 10 times rather than selecting the next page. <br />
The AngelList Data script to parse the employees is not collecting information from all the incubators. The script needs to be adjusted</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Anne_Freeman&diff=25783Anne Freeman2019-05-29T15:13:32Z<p>AnneFreeman: </p>
<hr />
<div>{{Team Member<br />
|Has name=Anne Freeman<br />
|Has headshot=Headshot_small_2018.png<br />
|Has team position=Tech Team<br />
|Has team status=Active<br />
|Has or doing degree=Bachelor<br />
|Has academic major=CS<br />
|Has skills=SQL, Some ML<br />
|Has email=aof7@georgetown.edu<br />
}}<br />
I worked on various aspects of the Kauffman Incubator Project during the Spring of 2019.<br />
<br />
== What is an incubator? ==<br />
One of my initial goals was to define what an incubator is. I began by researching how other experts in the field defined incubators and how this research group had approached the issue in the past. We developed a working definition, which can be found on our other wiki page [[Defining Incubators]]. We began by finding ways to characterize an incubator, such as its purpose within the larger entrepreneurship system, the duration of time companies spend at an incubator, and the application process companies undergo. We also sought to differentiate between an incubator and an accelerator and to define high-growth technology incubators.<br />
<br />
== What attributes define an incubator? ==<br />
From the definition of incubators, I began exploring the baseline attributes that should be collected on all entrepreneurship ecosystem organizations in order to properly classify incubators, specifically high-growth technology incubators. There is a more in depth description of this work on the [[Formulate baseline attributes]] page. We created a working list of attributes focused on characterizing the timeline, affiliation, services provided, type of company incubated, and key demographics. <br />
<br />
== Evaluating Potential Data Sources ==<br />
We evaluated potential data sources to select databases that would provide sufficient information on incubators. There is more information on this project on the page [[Incubator Seed Data]]. We started by going through the [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that the previous research group used. Then we performed google searches for other viable sources and evaluated them in a systematic manner. We decided that crunchbase, INBIA, a google crawler, and AngelList were the best sources as the data could be collected in an automated manner, and they were the databases with the most information on incubators. <br />
<br />
== INBIA Data ==<br />
Using a list of URLS from the INBIA website, I created a web crawler that used beautiful soup to directly request the urls, parse the information, and store it in a tab separated text file. There is more information about this process at the [[INBIA]] page.<br />
<br />
== Google Crawler ==<br />
Using a list of locations, I wrote a Google crawler using Beautiful Soup that created URLs for Google and directly requested and parsed the results. This crawler was often blocked by Google, so I switched to using Selenium. The Selenium crawler searched Google for the city and the keyword "incubator". It retrieved 10 pages of results and stored the city, title, and URL in a tab separated text file. There is more information about this process at the [[Google Crawler]] page. <br />
<br />
== AngelList Data ==<br />
Using Selenium, I created a crawler to search the AngelList database using the keyword "incubator" and the state. I also created a crawler to search the AngelList database for companies with the type "incubator" and the state. The crawlers would click the "more" button at the bottom of the page to view all of the results and then save them in a tab separated text file. I performed a diff on the results to create a masterFile containing only unique entries. Then I used Selenium to open the URL for the incubator within the AngelList website and download it to a local folder. Then, using Beautiful Soup, I parsed the static HTML files for information on the company, the employees, and the portfolio. There is more information on this process on the [[AngelList Database]] page.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=AngelList_Database&diff=25782AngelList Database2019-05-29T14:47:40Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=AngelList Database<br />
|Has project status=Active<br />
}}<br />
<br />
The purpose of this project is to build a database of incubators, and possibly other ecosystem organizations, from AngelList.<br />
<br />
==Crawler Specification==<br />
<br />
===There are incubators here===<br />
<br />
Process from before:<br />
*Opened source link (http://www.angel.co)<br />
*Typed "incubator" in the search box<br />
*Clicked on "Search for 'incubator'"<br />
<br />
===500 Results===<br />
<br />
Revised process:<br />
*Visit https://angel.co/search?q=incubator<br />
*Click More (a lot)<br />
*Save the HTML page as E:\projects\AngelList\AngelList.html<br />
*That gets you 500 (out of 1,447 claimed results)<br />
*Process the HTML using Regular Expressions to produce AngelListPages.txt, which is in the format:<br />
**URL\tConame<br />
*Note that restricting to "Companies" reduces it to 1,339 results.<br />
<br />
===Failed workarounds===<br />
<br />
Tried a workaround using page numbers in the URL:<br />
*https://angel.co/search?page=13&q=incubator&type=companies<br />
*https://angel.co/search?page=14&q=incubator&type=companies<br />
<br />
But there are 40 results per page: page 13 ends with "No Results Yet" after clicking More, and page 14 opens with it. So the search is still capped at 500 results.<br />
<br />
It appears from the format of the results that AngelList has a type "incubator", though some likely incubators have other types (e.g., BMW iVentures Incubator is a "VC Firm" and Austin Technology Incubator is a "Company"). And I can't see a way to restrict the search by type.<br />
<br />
Signed up for an account as Ed Egan, ed@edegan.com, littleAmount. Then the link More -> Incubators takes you to https://angel.co/accelerators/apply. But there doesn't seem to be an advanced search. Count of incubator results increased while on the site!<br />
<br />
===400 Results===<br />
<br />
The page https://angel.co/incubators shows 6,054 companies. It stopped adding to the list after 20 More clicks, which turned out to be 400 results. Saved page as E:\projects\AngelList\Incubator - CompanyTypes - AngelList.html<br />
<br />
Given the page title, this is likely just the '''"Incubator" company type''' organizations. However, there is some useful information that could be extracted from just that page. The incubator type also '''clearly includes accelerators''' and other things.<br />
<br />
===Possible Processes===<br />
<br />
In either of the cases below, we'd need a Selenium web driver to click More (a lot). For the latter case, we'd also need to get the URL encodings (probably by hand) for the State names we'd like to search.<br />
<br />
====Restricted Search====<br />
Tried searching incubator TX but it looks like only the name and text description are searched. Tried searching "incubator a", "incubator b", "incubator c" and each had fewer than 500 results, so that ''might'' work.<br />
<br />
====Company Search====<br />
<br />
https://angel.co/companies has a search function. You can select type as incubator and location as US: https://angel.co/companies?company_types[]=Incubator&locations[]=1688-United+States This gives 993 companies...<br />
<br />
It might be possible to go state by state. California has 385, Massachusetts has 36, New York has 141, etc. But again, this is '''limited to the incubator type'''.<br />
<br />
==Crawler==<br />
We decided to build a web crawler using Selenium to search for incubators using the domain for AngelList companies ''https://angel.co/companies?'' with the ''locations[]='' option appended to the end to specify a state (the 50 states and the District of Columbia). The crawler loaded the page as specified and then clicked the load more button while there were still more results to load. No state exceeded 500 results. Then the crawler collected information for all of the companies listed, including the state, the name of the company, a brief description, and the URL for the company within AngelList. This information was stored in a tab-delimited text file. <br />
===Crawler By Company Type===<br />
This crawler appended ''company_types[]=Incubator'' to the url so that the companies appearing in the search results were only those with the listed company type of incubator. It yielded 1068 results. The script (angelList_companyTypeIncubator.py) and the data it generated (AngelList_companyTypeIncubator.txt) are on the RDP in the folder AngelList.<br />
<br />
===Crawler By Keyword===<br />
This crawler clicked on the search bar and entered the keyword "incubator" so that the companies appearing in the results contained the keyword incubator somewhere on their company page. It yielded 840 results. The script (angelList_keywordIncubator.py) and the data it generated (AngelList_keywordIncubator.txt) are on the RDP in the folder AngelList.<br />
<br />
== Master File of Results ==<br />
We performed a diff of the two files and combined them into a master file with only unique results. The master file contains 1512 results. We ignored the state when determining whether results were unique because the same company was occasionally listed in different states, which would otherwise produce duplicate entries.<br />
<br />
== Saving AngelList Pages ==<br />
===Failed Attempts===<br />
The AngelList website was very effective at detecting bot activity and blocking our IP address. We tried several ways of downloading the pages from the master list, all of which were blocked by AngelList: <br />
* urllib from Python<br />
* using a crawler (Scrapy) <br />
* accessing them directly with a curl/wget command<br />
Because all three methods were blocked by the AngelList site, we decided to use Selenium.<br />
=== Selenium Script ===<br />
The Selenium script to download the pages opens each URL and then saves the page in a data folder. It also checks for a reCAPTCHA and pauses so that the reCAPTCHA can be solved manually. Even with Selenium and manually solved reCAPTCHAs, AngelList would occasionally block our IP address, making it necessary to run the script in small batches, collecting only ~600 webpages before changing wifi networks. The Selenium code save_angelList_pages.py is in the RDP folder angelList.<br />
<br />
== Parsing Saved AngelList Pages ==<br />
We used Beautiful Soup to iterate through the static HTML files that were saved from the AngelList website. We created three tab separated text files. The first was populated via parse_company_info.py and contains basic information about the company, including the company name, a short description, the location, the company size, a URL to the company website, and the business tags on AngelList. The second was populated via parse_portfolio.py and contains the company name and the name of a portfolio company. The third was populated via parse_employees.py and contains the company name and the name of an employee/founder at the company. The three Python files and the data files they generated are in the RDP folder angelList.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=INBIA&diff=25781INBIA2019-05-29T13:24:25Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=INBIA<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Incubator Seed Data<br />
}}<br />
<br />
<br />
==Initial Review of INBIA==<br />
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] that contains information for 415 incubators within the United States. It provides reliable links to a secondary page within the INBIA domain. This page contains information including the incubator's name, address, a link to the home page of their website, and information for key contacts. The secondary pages have the same HTML structure and are reliable in the data they contain, making INBIA an ideal candidate for web crawling methods to collect data from the internal pages. <br />
<br />
See [http://www.edegan.com/wiki/Incubator_Seed_Data#Evaluation_of_Sources_from_Specific_Google_Searches Wiki Page Table] for more details on source evaluations.<br />
<br />
==Retrieve URLS from INBIA Directory==<br />
<br />
We retrieved the INBIA data as follows:<br />
#Go to http://exchange.inbia.org/network/findacompany/ and search US<br />
#Change to 100 results per page<br />
#Save HTML page of 0-100 <br />
#Choose next page, Save HTML page of 100-200<br />
#Sort Z-A<br />
#Save HTML page 418-318<br />
#Choose next page, Save HTML page of 318-218<br />
#Note that we are missing some that start with L and M<br />
#Search US L, Choose page with L as first letter, Save HTML of L<br />
#Search US M, Choose page with M as first letter, Save HTML of M<br />
<br />
Then process each of those HTML files with regular expressions in TextPad:<br />
*Search .*biobubblekey Replace #<br />
*Search ^[^#].*\n Replace NOTHING<br />
*Search .*href=\" Replace NOTHING<br />
*Search <\/a> Replace NOTHING<br />
*Search \"> Replace \t<br />
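<br />
The five search-and-replace steps above can be approximated in Python as sketched below. This is a rough translation rather than the procedure that was actually run: the sample markup is hypothetical (only the marker text "biobubblekey" and the link format come from this page), and step 2 is adjusted slightly so that empty lines are also dropped, because Python's regular expressions treat newlines differently from TextPad.<br />

```python
import re

# Hypothetical sample: only "biobubblekey" and the href format are taken
# from the notes; the surrounding tags are assumptions.
SAMPLE_HTML = (
    '<div class="header">Results</div>\n'
    '<li id="biobubblekey"><a href="/?c=companyprofile&amp;'
    'UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e">1863 Ventures/Project 500</a>\n'
    '<p>footer</p>\n'
)


def extract_links(html):
    """Apply the five steps in order; input lines must end with a newline."""
    text = re.sub(r".*biobubblekey", "#", html)                # 1: mark result lines
    text = re.sub(r"^(?:[^#\n].*)?\n", "", text, flags=re.M)   # 2: drop unmarked lines
    text = re.sub(r'.*href="', "", text)                       # 3: strip up to the link
    text = re.sub(r"</a>", "", text)                           # 4: remove closing tag
    text = re.sub(r'">', "\t", text)                           # 5: tab between URL and name
    return text


result = extract_links(SAMPLE_HTML)
print(result, end="")
```

The output has the URL extension first and the name second; the later "move columns" step swaps them.<br />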
<br />
Then combine files, throw out duplicates, move columns, sort. This results in a file without headers where the lines are like:<br />
1863 Ventures/Project 500 /?c=companyprofile&amp;UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e <br />
4th Sector Innovations /?c=companyprofile&amp;UserKey=cc47b04e-1c2a-4019-88b3-05d1163a0d6a <br />
712 Innovations /?c=companyprofile&amp;UserKey=531ad600-e11a-4c74-9f37-bace816b9325 <br />
AccelerateHER /?c=companyprofile&amp;UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b <br />
ACTION Innovation Network /?c=companyprofile&amp;UserKey=5ac08dd0-364d-47b2-8de0-a7536a3b4802 <br />
<br />
We can now build a crawler to call http://exchange.inbia.org/network/findacompany/ followed by the URL extension (either encoded or with <nowiki>&amp;</nowiki> replaced with just &). For example, http://exchange.inbia.org/network/findacompany/?c=companyprofile&UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa gets the company page for Cambridge Innovation Center. <br />
<br />
We can then rip out the contact information, including URL, and the people, using either beautiful soup or regular expressions.<br />
<br />
<br />
==Retrieve Data from URLS Generated==<br />
We wrote a web crawler that <br />
# reads in the csv file containing the URLs to scrape into a pandas dataframe<br />
# changes the urls by replacing ''?c=companyprofile&amp;'' with ''companyprofile?'' and appending the domain http://exchange.inbia.org/network/findacompany to each url<br />
# opens each URL and extracts information using the ElementTree parser<br />
# collects information from each url and stores it in a txt file<br />
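<br />
The URL rewriting in step 2 can be sketched as follows. The function name is hypothetical, and the example extension is one of the sample lines shown earlier on this page; this is an illustration, not an excerpt from inbia_scrape.py.<br />

```python
DOMAIN = "http://exchange.inbia.org/network/findacompany"


def build_profile_url(extension):
    """Turn a saved URL extension into a full company profile URL.

    '?c=companyprofile&amp;' is replaced with 'companyprofile?' and the
    result is appended to the INBIA directory domain.
    """
    return DOMAIN + extension.replace("?c=companyprofile&amp;", "companyprofile?")


extension = "/?c=companyprofile&amp;UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e"
profile_url = build_profile_url(extension)
print(profile_url)
```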
<br />
<br />
The crawler generates a tab separated text file called INBIA_data.txt containing [company_name, street_address, city, state, zipcode, country, website] and is populated by information from the 415 entries from the database. <br />
<br />
The txt file and the python script (inbia_scrape.py) are located in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\INBIA</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=AngelList_Database&diff=25462AngelList Database2019-05-01T19:20:35Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=AngelList Database<br />
|Has project status=Active<br />
}}<br />
<br />
The purpose of this project is to build a database of incubators, and possibly other ecosystem organizations, from AngelList.<br />
<br />
==Crawler Specification==<br />
<br />
===There are incubators here===<br />
<br />
Process from before:<br />
*Opened source link (http://www.angel.co)<br />
*Typed "incubator" in the search box<br />
*Clicked on "Search for 'incubator'"<br />
<br />
===500 Results===<br />
<br />
Revised process:<br />
*Visit https://angel.co/search?q=incubator<br />
*Click More (a lot)<br />
*Save the HTML page as E:\projects\AngelList\AngelList.html<br />
*That gets you 500 (out of 1,447 claimed results)<br />
*Process the HTML using Regular Expressions to produce AngelListPages.txt, which is in the format:<br />
**URL\tConame<br />
*Note that restricting to "Companies" reduces it to 1,339 results.<br />
<br />
===Failed workarounds===<br />
<br />
Tried a workaround using page numbers in the URL:<br />
*https://angel.co/search?page=13&q=incubator&type=companies<br />
*https://angel.co/search?page=14&q=incubator&type=companies<br />
<br />
But there are 40 results per page: page 13 ends with "No Results Yet" after clicking More, and page 14 opens with it. So the search is still capped at 500 results.<br />
<br />
It appears from the format of the results that AngelList has a type "incubator", though some likely incubators have other types (e.g., BMW iVentures Incubator is a "VC Firm" and Austin Technology Incubator is a "Company"). And I can't see a way to restrict the search by type.<br />
<br />
Signed up for an account as Ed Egan, ed@edegan.com, littleAmount. Then the link More -> Incubators takes you to https://angel.co/accelerators/apply. But there doesn't seem to be an advanced search. Count of incubator results increased while on the site!<br />
<br />
===400 Results===<br />
<br />
The page https://angel.co/incubators shows 6,054 companies. It stopped adding to the list after 20 More clicks, which turned out to be 400 results. Saved page as E:\projects\AngelList\Incubator - CompanyTypes - AngelList.html<br />
<br />
Given the page title, this is likely just the '''"Incubator" company type''' organizations. However, there is some useful information that could be extracted from just that page. The incubator type also '''clearly includes accelerators''' and other things.<br />
<br />
===Possible Processes===<br />
<br />
In either of the cases below, we'd need a Selenium web driver to click More (a lot). For the latter case, we'd also need to get the URL encodings (probably by hand) for the state names we'd like to search.<br />
<br />
====Restricted Search====<br />
Tried searching "incubator TX", but it looks like only the name and text descriptions are searched. Tried searching "incubator a", "incubator b", "incubator c" and each had fewer than 500 results, so that ''might'' work.<br />
<br />
====Company Search====<br />
<br />
https://angel.co/companies has a search function. You can select type as incubator and location as US: https://angel.co/companies?company_types[]=Incubator&locations[]=1688-United+States This gives 993 companies...<br />
<br />
It might be possible to go state by state. California has 385, Massachusetts has 36, New York has 141, etc. But again, this is '''limited to the incubator type'''.<br />
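<br />
A minimal sketch of how such search URLs could be assembled is shown below. Only the "1688-United States" location value comes from the URL above; the per-state location values are not known here and would have to be looked up by hand. The function name company_search_url is hypothetical.<br />

```python
from urllib.parse import urlencode

BASE_URL = "https://angel.co/companies"


def company_search_url(location, company_type=None):
    """Build a company-search URL for one location.

    location is the value AngelList expects for locations[]; the numeric
    prefix for each state is an unknown that must be collected separately.
    """
    params = []
    if company_type:
        params.append(("company_types[]", company_type))
    params.append(("locations[]", location))
    # Keep the square brackets literal, as in the URL copied from the site.
    return BASE_URL + "?" + urlencode(params, safe="[]")
```

Calling company_search_url("1688-United States", "Incubator") reproduces the United States URL quoted above; leaving out company_type gives the unrestricted search for the same location.<br />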
<br />
==Crawler==<br />
We decided to build a web crawler using Selenium to search for incubators, using the AngelList companies URL ''https://angel.co/companies?'' with the ''locations[]='' option appended to specify a state (the 50 states and the District of Columbia). The crawler loaded the specified page and then clicked the load-more button while there were still more results to load. No state exceeded 500 results. The crawler then collected information for every company listed, including the state, the company name, a brief description, and the company's URL within AngelList. This information was stored in a tab-delimited text file. <br />
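The "click while more results remain" loop can be sketched independently of the browser driver, as below. This is an illustration rather than the code in the project scripts: click_until_exhausted and find_more_button are hypothetical names, and with Selenium the callable would wrap an element lookup that returns the button or None.<br />

```python
import time


def click_until_exhausted(find_more_button, max_clicks=200, pause_seconds=0.0):
    """Click the load-more button until it is gone or max_clicks is reached.

    find_more_button is a caller-supplied function (an assumption of this
    sketch) that returns an object with a click() method, or None when no
    button is left on the page.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_more_button()
        if button is None:
            break
        button.click()
        clicks += 1
        # Give the page time to append the next batch of results.
        time.sleep(pause_seconds)
    return clicks
```

The max_clicks bound is a safety net so that a button that never disappears cannot stall the crawl on one state.<br />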
===Crawler By Company Type===<br />
This crawler appended ''company_types[]=Incubator'' to the url so that the companies appearing in the search results were only those with the listed company type of incubator. It yielded 1068 results. The script (angelList_companyTypeIncubator.py) and the data it generated (AngelList_companyTypeIncubator.txt) are on the RDP in the folder AngelList.<br />
<br />
===Crawler By Keyword===<br />
This crawler clicked on the search bar and entered the keyword "incubator" so that the companies appearing in the results contained the keyword "incubator" somewhere on their company page. It yielded 840 results. The script (angelList_keywordIncubator.py) and the data it generated (AngelList_keywordIncubator.txt) are on the RDP in the folder AngelList.<br />
<br />
== Master File of Results ==<br />
We performed a diff of the two files to create a master file containing only unique results. The master file of unique results from the two crawlers contains 1512 entries. We ignored the state when determining whether results were unique because the same company was occasionally listed in different states, which would otherwise produce duplicate results.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=AngelList_Database&diff=25461AngelList Database2019-05-01T19:14:29Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=AngelList Database<br />
|Has project status=Active<br />
}}<br />
<br />
The purpose of this project is to build a database of incubators, perhaps as well as other ecosystem organizations, from AngelList.<br />
<br />
==Crawler Specification==<br />
<br />
===There are incubators here===<br />
<br />
Process from before:<br />
*Opened source link (http://www.angel.co)<br />
*Typed "incubator" in the search box<br />
*Clicked on "Search for 'incubator'"<br />
<br />
===500 Results===<br />
<br />
Revised process:<br />
*Visit https://angel.co/search?q=incubator<br />
*Click More (a lot)<br />
*Save the HTML page as E:\projects\AngelList\AngelList.html<br />
*That gets you 500 (out of 1,447 claimed results)<br />
*Process the HTML using Regular Expressions to produce AngelListPages.txt, which is in the format:<br />
**URL\tConame<br />
*Note that restricting to "Companies" reduces it to 1,339 results.<br />
<br />
===Failed workarounds===<br />
<br />
Tried work around with pages:<br />
*https://angel.co/search?page=13&q=incubator&type=companies<br />
*https://angel.co/search?page=14&q=incubator&type=companies<br />
<br />
But 40 results per page, page 13 ends with No Results Yet after More, and page 14 opens with it. So still capped at 500 results.<br />
<br />
It appears from the format of results that AngelList has a type "incubator", though some likely incubators have other types (e.g., BMW iVentures Incubator is a "VC Firm" and Austin Technology Incubator is a "Company"). And I can't see a way to restrict search by type.<br />
<br />
Signed up for an account as Ed Egan, ed@edegan.com, littleAmount. Then the link More -> Incubators takes you to https://angel.co/accelerators/apply. But there doesn't seem to be an advanced search. Count of incubator results increased while on the site!<br />
<br />
===400 Results===<br />
<br />
The page https://angel.co/incubators shows 6,054 companies. It stopped adding to the list after 20 More clicks, which turned out to be 400 results. Saved page as E:\projects\AngelList\Incubator - CompanyTypes - AngelList.html<br />
<br />
Given the page title, this is likely just the '''"Incubator" company type''' organizations. However, there is some useful information that could be extracted from just that page. The incubator type also '''clearly includes accelerators''' and other things.<br />
<br />
===Possible Processes===<br />
<br />
In either of the cases below, we'd need a Selenium web driver to click More (a lot). For the latter case, we'd also need to get the URL encodings (probably by hand) for the state names we'd like to search.<br />
<br />
====Restricted Search====<br />
Tried searching "incubator TX", but it looks like only the name and text descriptions are searched. Tried searching "incubator a", "incubator b", "incubator c" and each had fewer than 500 results, so that ''might'' work.<br />
<br />
====Company Search====<br />
<br />
https://angel.co/companies has a search function. You can select type as incubator and location as US: https://angel.co/companies?company_types[]=Incubator&locations[]=1688-United+States This gives 993 companies...<br />
<br />
It might be possible to go state by state. California has 385, Massachusetts has 36, New York has 141, etc. But again, this is '''limited to the incubator type'''.<br />
<br />
==Crawler==<br />
We decided to build a web crawler using Selenium to search for incubators, using the AngelList companies URL ''https://angel.co/companies?'' with the ''locations[]='' option appended to specify a state (the 50 states and the District of Columbia). The crawler loaded the specified page and then clicked the load-more button while there were still more results to load. No state exceeded 500 results. The crawler then collected information for every company listed, including the state, the company name, a brief description, and the company's URL within AngelList. This information was stored in a tab-delimited text file. <br />
===Crawler By Company Type===<br />
This crawler appended ''company_types[]=Incubator'' to the url so that the companies appearing in the search results were only those with the listed company type of incubator. It yielded 1068 results. The script (angelList_companyTypeIncubator.py) and the data it generated (AngelList_companyTypeIncubator.txt) are on the RDP in the folder AngelList.<br />
<br />
===Crawler By Keyword===<br />
This crawler clicked on the search bar and entered the keyword "incubator" so that the companies appearing in the results contained the keyword "incubator" somewhere on their company page. It yielded 840 results. The script (angelList_keywordIncubator.py) and the data it generated (AngelList_keywordIncubator.txt) are on the RDP in the folder AngelList.<br />
<br />
== Master File of Results ==<br />
We performed a diff of the two files to create a master file with only unique results. The master file containing the unique results from the two crawlers contains 1512 results. We decided to use the urls to determine if the results were unique because occasionally the same company would be listed in different states, leading to repetitive results.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=AngelList_Database&diff=25460AngelList Database2019-05-01T19:10:06Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=AngelList Database<br />
|Has project status=Active<br />
}}<br />
<br />
The purpose of this project is to build a database of incubators, perhaps as well as other ecosystem organizations, from AngelList.<br />
<br />
==Crawler Specification==<br />
<br />
===There are incubators here===<br />
<br />
Process from before:<br />
*Opened source link (http://www.angel.co)<br />
*Typed "incubator" in the search box<br />
*Clicked on "Search for 'incubator'"<br />
<br />
===500 Results===<br />
<br />
Revised process:<br />
*Visit https://angel.co/search?q=incubator<br />
*Click More (a lot)<br />
*Save the HTML page as E:\projects\AngelList\AngelList.html<br />
*That gets you 500 (out of 1,447 claimed results)<br />
*Process the HTML using Regular Expressions to produce AngelListPages.txt, which is in the format:<br />
**URL\tConame<br />
*Note that restricting to "Companies" reduces it to 1,339 results.<br />
<br />
===Failed workarounds===<br />
<br />
Tried work around with pages:<br />
*https://angel.co/search?page=13&q=incubator&type=companies<br />
*https://angel.co/search?page=14&q=incubator&type=companies<br />
<br />
But 40 results per page, page 13 ends with No Results Yet after More, and page 14 opens with it. So still capped at 500 results.<br />
<br />
It appears from the format of results that AngelList has a type "incubator", though some likely incubators have other types (e.g., BMW iVentures Incubator is a "VC Firm" and Austin Technology Incubator is a "Company"). And I can't see a way to restrict search by type.<br />
<br />
Signed up for an account as Ed Egan, ed@edegan.com, littleAmount. Then the link More -> Incubators takes you to https://angel.co/accelerators/apply. But there doesn't seem to be an advanced search. Count of incubator results increased while on the site!<br />
<br />
===400 Results===<br />
<br />
The page https://angel.co/incubators shows 6,054 companies. It stopped adding to the list after 20 More clicks, which turned out to be 400 results. Saved page as E:\projects\AngelList\Incubator - CompanyTypes - AngelList.html<br />
<br />
Given the page title, this is likely just the '''"Incubator" company type''' organizations. However, there is some useful information that could be extracted from just that page. The incubator type also '''clearly includes accelerators''' and other things.<br />
<br />
===Possible Processes===<br />
<br />
In either of the cases below, we'd need a Selenium web driver to click More (a lot). For the latter case, we'd also need to get the URL encodings (probably by hand) for the state names we'd like to search.<br />
<br />
====Restricted Search====<br />
Tried searching "incubator TX", but it looks like only the name and text descriptions are searched. Tried searching "incubator a", "incubator b", "incubator c" and each had fewer than 500 results, so that ''might'' work.<br />
<br />
====Company Search====<br />
<br />
https://angel.co/companies has a search function. You can select type as incubator and location as US: https://angel.co/companies?company_types[]=Incubator&locations[]=1688-United+States This gives 993 companies...<br />
<br />
It might be possible to go state by state. California has 385, Massachusetts has 36, New York has 141, etc. But again, this is '''limited to the incubator type'''.<br />
<br />
==Crawler==<br />
We decided to build a web crawler using Selenium to search for incubators, using the AngelList companies URL ''https://angel.co/companies?'' with the ''locations[]='' option appended to specify a state (the 50 states and the District of Columbia). The crawler loaded the specified page and then clicked the load-more button while there were still more results to load. No state exceeded 500 results. The crawler then collected information for every company listed, including the state, the company name, a brief description, and the company's URL within AngelList. This information was stored in a tab-delimited text file. The master file containing the unique results from the two crawlers contains 1512 results.<br />
<br />
===Crawler By Company Type===<br />
This crawler appended ''company_types[]=Incubator'' to the url so that the companies appearing in the search results were only those with the listed company type of incubator. It yielded 1068 results. The script (angelList_companyTypeIncubator.py) and the data it generated (AngelList_companyTypeIncubator.txt) are on the RDP in the folder AngelList.<br />
<br />
===Crawler By Keyword===<br />
This crawler clicked on the search bar and entered the keyword "incubator" so that the companies appearing in the results contained the keyword "incubator" somewhere on their company page. It yielded 840 results. The script (angelList_keywordIncubator.py) and the data it generated (AngelList_keywordIncubator.txt) are on the RDP in the folder AngelList.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=AngelList_Database&diff=25459AngelList Database2019-05-01T18:37:22Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=AngelList Database<br />
|Has project status=Active<br />
}}<br />
<br />
The purpose of this project is to build a database of incubators, perhaps as well as other ecosystem organizations, from AngelList.<br />
<br />
==Crawler Specification==<br />
<br />
===There are incubators here===<br />
<br />
Process from before:<br />
*Opened source link (http://www.angel.co)<br />
*Typed "incubator" in the search box<br />
*Clicked on "Search for 'incubator'"<br />
<br />
===500 Results===<br />
<br />
Revised process:<br />
*Visit https://angel.co/search?q=incubator<br />
*Click More (a lot)<br />
*Save the HTML page as E:\projects\AngelList\AngelList.html<br />
*That gets you 500 (out of 1,447 claimed results)<br />
*Process the HTML using Regular Expressions to produce AngelListPages.txt, which is in the format:<br />
**URL\tConame<br />
*Note that restricting to "Companies" reduces it to 1,339 results.<br />
<br />
===Failed workarounds===<br />
<br />
Tried work around with pages:<br />
*https://angel.co/search?page=13&q=incubator&type=companies<br />
*https://angel.co/search?page=14&q=incubator&type=companies<br />
<br />
But 40 results per page, page 13 ends with No Results Yet after More, and page 14 opens with it. So still capped at 500 results.<br />
<br />
It appears from the format of results that AngelList has a type "incubator", though some likely incubators have other types (e.g., BMW iVentures Incubator is a "VC Firm" and Austin Technology Incubator is a "Company"). And I can't see a way to restrict search by type.<br />
<br />
Signed up for an account as Ed Egan, ed@edegan.com, littleAmount. Then the link More -> Incubators takes you to https://angel.co/accelerators/apply. But there doesn't seem to be an advanced search. Count of incubator results increased while on the site!<br />
<br />
===400 Results===<br />
<br />
The page https://angel.co/incubators shows 6,054 companies. It stopped adding to the list after 20 More clicks, which turned out to be 400 results. Saved page as E:\projects\AngelList\Incubator - CompanyTypes - AngelList.html<br />
<br />
Given the page title, this is likely just the '''"Incubator" company type''' organizations. However, there is some useful information that could be extracted from just that page. The incubator type also '''clearly includes accelerators''' and other things.<br />
<br />
===Possible Processes===<br />
<br />
In either of the cases below, we'd need a Selenium web driver to click More (a lot). For the latter case, we'd also need to get the URL encodings (probably by hand) for the state names we'd like to search.<br />
<br />
====Restricted Search====<br />
Tried searching "incubator TX", but it looks like only the name and text descriptions are searched. Tried searching "incubator a", "incubator b", "incubator c" and each had fewer than 500 results, so that ''might'' work.<br />
<br />
====Company Search====<br />
<br />
https://angel.co/companies has a search function. You can select type as incubator and location as US: https://angel.co/companies?company_types[]=Incubator&locations[]=1688-United+States This gives 993 companies...<br />
<br />
It might be possible to go state by state. California has 385, Massachusetts has 36, New York has 141, etc. But again, this is '''limited to the incubator type'''.<br />
<br />
==Crawler==<br />
We decided to build a web crawler using Selenium to search for incubators, using the AngelList companies URL ''https://angel.co/companies?'' with the ''locations[]='' option appended to specify a state (the 50 states and the District of Columbia). The crawler loaded the specified page and then clicked the load-more button while there were still more results to load. No state exceeded 500 results. The crawler then collected information for every company listed, including the state, the company name, a brief description, and the company's URL within AngelList. This information was stored in a tab-delimited text file. The master file containing the unique results from the two crawlers contains 1696 results.<br />
<br />
===Crawler By Company Type===<br />
This crawler appended ''company_types[]=Incubator'' to the url so that the companies appearing in the search results were only those with the listed company type of incubator. It yielded 1068 results. The script (angelList_companyTypeIncubator.py) and the data it generated (AngelList_companyTypeIncubator.txt) are on the RDP in the folder AngelList.<br />
<br />
===Crawler By Keyword===<br />
This crawler clicked on the search bar and entered the keyword "incubator" so that the companies appearing in the results contained the keyword "incubator" somewhere on their company page. It yielded 840 results. The script (angelList_keywordIncubator.py) and the data it generated (AngelList_keywordIncubator.txt) are on the RDP in the folder AngelList.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=AngelList_Database&diff=25456AngelList Database2019-05-01T18:16:01Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=AngelList Database<br />
|Has project status=Active<br />
}}<br />
<br />
The purpose of this project is to build a database of incubators, perhaps as well as other ecosystem organizations, from AngelList.<br />
<br />
==Crawler Specification==<br />
<br />
===There are incubators here===<br />
<br />
Process from before:<br />
*Opened source link (http://www.angel.co)<br />
*Typed "incubator" in the search box<br />
*Clicked on "Search for 'incubator'"<br />
<br />
===500 Results===<br />
<br />
Revised process:<br />
*Visit https://angel.co/search?q=incubator<br />
*Click More (a lot)<br />
*Save the HTML page as E:\projects\AngelList\AngelList.html<br />
*That gets you 500 (out of 1,447 claimed results)<br />
*Process the HTML using Regular Expressions to produce AngelListPages.txt, which is in the format:<br />
**URL\tConame<br />
*Note that restricting to "Companies" reduces it to 1,339 results.<br />
<br />
===Failed workarounds===<br />
<br />
Tried work around with pages:<br />
*https://angel.co/search?page=13&q=incubator&type=companies<br />
*https://angel.co/search?page=14&q=incubator&type=companies<br />
<br />
But 40 results per page, page 13 ends with No Results Yet after More, and page 14 opens with it. So still capped at 500 results.<br />
<br />
It appears from the format of results that AngelList has a type "incubator", though some likely incubators have other types (e.g., BMW iVentures Incubator is a "VC Firm" and Austin Technology Incubator is a "Company"). And I can't see a way to restrict search by type.<br />
<br />
Signed up for an account as Ed Egan, ed@edegan.com, littleAmount. Then the link More -> Incubators takes you to https://angel.co/accelerators/apply. But there doesn't seem to be an advanced search. Count of incubator results increased while on the site!<br />
<br />
===400 Results===<br />
<br />
The page https://angel.co/incubators shows 6,054 companies. It stopped adding to the list after 20 More clicks, which turned out to be 400 results. Saved page as E:\projects\AngelList\Incubator - CompanyTypes - AngelList.html<br />
<br />
Given the page title, this is likely just the '''"Incubator" company type''' organizations. However, there is some useful information that could be extracted from just that page. The incubator type also '''clearly includes accelerators''' and other things.<br />
<br />
===Possible Processes===<br />
<br />
In either of the cases below, we'd need a Selenium web driver to click More (a lot). For the latter case, we'd also need to get the URL encodings (probably by hand) for the state names we'd like to search.<br />
<br />
====Restricted Search====<br />
Tried searching "incubator TX", but it looks like only the name and text descriptions are searched. Tried searching "incubator a", "incubator b", "incubator c" and each had fewer than 500 results, so that ''might'' work.<br />
<br />
====Company Search====<br />
<br />
https://angel.co/companies has a search function. You can select type as incubator and location as US: https://angel.co/companies?company_types[]=Incubator&locations[]=1688-United+States This gives 993 companies...<br />
<br />
It might be possible to go state by state. California has 385, Massachusetts has 36, New York has 141, etc. But again, this is '''limited to the incubator type'''.<br />
<br />
==Crawler==<br />
We decided to build a web crawler using Selenium to search for incubators, using the AngelList companies URL ''https://angel.co/companies?'' with the ''locations[]='' option appended to specify a state (the 50 states and the District of Columbia). The crawler loaded the specified page and then clicked the load-more button while there were still more results to load. No state exceeded 500 results. The crawler then collected information for every company listed, including the state, the company name, a brief description, and the company's URL within AngelList. This information was stored in a tab-delimited text file. <br />
<br />
===Crawler By Company Type===<br />
This crawler appended ''company_types[]=Incubator'' to the url so that the companies appearing in the search results were only those with the listed company type of incubator. It yielded 1068 results. The script (angelList_companyTypeIncubator.py) and the data it generated (AngelList_companyTypeIncubator.txt) are on the RDP in the folder AngelList.<br />
<br />
===Crawler By Keyword===<br />
This crawler clicked on the search bar and entered the keyword "incubator" so that the companies appearing in the results contained the keyword "incubator" somewhere on their company page. It yielded 840 results. The script (angelList_keywordIncubator.py) and the data it generated (AngelList_keywordIncubator.txt) are on the RDP in the folder AngelList.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Google_Crawler&diff=25266Google Crawler2019-04-15T18:13:52Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=Google Crawler<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Ecosystem Organization Classifier, Incubator Seed Data<br />
}}<br />
==Background==<br />
We wanted to create a web crawler that could collect data from Google searches specific to individual cities. The searches could be in the format of "incubator" + "city, state". It was modeled on a previous researcher's web crawler, which collected information on accelerators. We could not simply modify their web crawler because it used an outdated Python module. <br />
<br />
The output from this crawler could be used in several ways:<br />
# The URLs determined to be incubator websites can be input for the [[Listing Page Classifier]] that takes an incubator website URL and identifies which page contains the client company listing.<br />
# The title text can be analyzed using n-grams to look for keywords in order to classify the URL as an incubator. This strategy is discussed in [[Geocoding Inventor Locations (Tool)]].<br />
# Key elements of a page's HTML can be fed into an adapted version of the [[Demo Day Page Google Classifier]] to identify demo day webpages that contain a list of cohort companies.<br />
# The page can be passed over to Amazon's [https://www.mturk.com/ Mechanical Turk] to outsource the task of classifying pages as being incubators.<br />
<br />
==Selenium Implementation==<br />
The Selenium implementation of the crawler requires a downloaded ChromeDriver executable. The crawler opens the text file containing a list of locations in the format "city, state", with each entry separated by a newline. It prepends the Google search query prefix "https://www.google.com/search?q=" to the key term "incubator" and then attaches the city and state name, URL-escaping commas and spaces. The crawler then uses the ChromeDriver-controlled browser to access the URL and parse the results for each location. Its default is to parse 10 pages of results, meaning that approximately 100 lines of data are collected for each location.<br />
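<br />
The query construction described above can be sketched as follows. This is an illustration under assumptions, not the project script: the function names read_locations and build_query_url are hypothetical, and the standard-library quote_plus is used here for the escaping.<br />

```python
from urllib.parse import quote_plus

SEARCH_PREFIX = "https://www.google.com/search?q="


def read_locations(lines):
    """Return the non-empty "city, state" entries from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def build_query_url(location, keyword="incubator"):
    """Build the search URL for one "city, state" location."""
    # quote_plus turns spaces into "+" and commas into "%2C".
    return SEARCH_PREFIX + quote_plus(f"{keyword} {location.strip()}")
```

For the location "Houston, TX" this yields https://www.google.com/search?q=incubator+Houston%2C+TX.<br />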
<br />
Relevant files, including the Python script and text files, are located in<br />
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\SeleniumCrawler <br />
<br />
==Beautiful Soup Implementation==<br />
When we created the web crawler, our first implementation used Beautiful Soup to directly "request" the URL. The crawler took the same input file (city, state on each line, separated by newlines) and formatted queries in the same manner. Then, using Beautiful Soup, the script opened each of the generated URLs and parsed the resulting page to collect the titles and URLs of the results. The data collected was stored in a tab-separated text file with each row containing city, state, title of result, and URL.<br />
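<br />
The tab-separated output described above could be written with the standard csv module, as in this minimal sketch. The function name write_results is hypothetical, and collapsing whitespace in titles is an added assumption so that stray tabs or newlines cannot break the column layout.<br />

```python
import csv


def write_results(rows, out_file):
    """Write (city, state, title, url) tuples as tab-separated lines."""
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    for city, state, title, url in rows:
        # Collapse internal whitespace so each result stays on one line.
        clean_title = " ".join(title.split())
        writer.writerow([city, state, clean_title, url])
```

Passing a file opened with open(path, "w", newline="", encoding="utf-8") as out_file keeps the line endings under the writer's control.<br />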
<br />
Relevant files, including the Python script and text files, are located in<br />
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\GoogleCrawler <br />
<br />
This crawler was frequently blocked because it sent queries directly to Google and parsed the results with Beautiful Soup. Additionally, this implementation collected only eight results for each location. To avoid being blocked and to collect more results, we switched to Selenium.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Google_Crawler&diff=25265Google Crawler2019-04-15T18:12:21Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=Web Crawler<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Ecosystem Organization Classifier, Incubator Seed Data<br />
}}<br />
==Background==<br />
We wanted to create a web crawler that could collect data from Google searches specific to individual cities. The searches could be in the format of "incubator" + "city, state". It was modeled on a previous researcher's web crawler, which collected information on accelerators. We could not simply modify their web crawler because it used an outdated Python module. <br />
<br />
The output from this crawler could be used in several ways:<br />
# The URLs determined to be incubator websites can be input for the [[Listing Page Classifier]] that takes an incubator website URL and identifies which page contains the client company listing.<br />
# The title text can be analyzed using n-grams to look for keywords in order to classify the URL as an incubator. This strategy is discussed in [[Geocoding Inventor Locations (Tool)]].<br />
# Key elements of a page's HTML can be fed into an adapted version of the [[Demo Day Page Google Classifier]] to identify demo day webpages that contain a list of cohort companies.<br />
# The page can be passed over to Amazon's [https://www.mturk.com/ Mechanical Turk] to outsource the task of classifying pages as being incubators.<br />
<br />
==Selenium Implementation==<br />
The Selenium implementation of the crawler requires a downloaded ChromeDriver executable. The crawler opens the text file containing a list of locations in the format "city, state", with each entry separated by a newline. It prepends the Google search query prefix "https://www.google.com/search?q=" to the key term "incubator" and then attaches the city and state name, URL-escaping commas and spaces. The crawler then uses the ChromeDriver-controlled browser to access the URL and parse the results for each location. Its default is to parse 10 pages of results, meaning that approximately 100 lines of data are collected for each location.<br />
<br />
Relevant files, including python script, text files are located in<br />
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\SeleniumCrawler <br />
<br />
==Beautiful Soup Implementation==<br />
When we created the web crawler, our first implementation used beautiful soup to directly "request" the url. The crawler took the same input file (city, state on each line, separated by newlines) and formatted queries in the same manner. Then, using beautifulsoup, the script opens each of the generated urls and parses the resulting page to collect the titles and urls of the results. The data collected is stored in a tab separated text file with each row containing city, state, title of result, url<br />
<br />
Relevant files, including python script, text files are located in<br />
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\GoogleCrawler <br />
<br />
This crawler was frequently blocked because it sent queries directly to Google and parsed the results with Beautiful Soup. Additionally, this implementation collected only eight results for each location. To avoid being blocked and to collect more results, we switched to Selenium.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Google_Crawler&diff=25169Google Crawler2019-04-08T19:20:58Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=Google Crawler<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Ecosystem Organization Classifier, Incubator Seed Data<br />
}}<br />
==Background==<br />
We wanted to create a Google web crawler that could collect data from web searches specific to individual cities. The searches could be in the format of "incubator" + "city, state". It was modeled on a previous researcher's web crawler, which collected information on accelerators. We could not simply modify their web crawler because it used an outdated Python module. <br />
<br />
==Implementation==<br />
The crawler opens the text file containing a list of locations in the format "city, state", with each entry separated by a newline. It appends the google search query domain "https://www.google.com/search?q=" to the front of the key term "incubator" and appropriately attaches the city and state name, using google escape characters for commas and spaces. Then, using beautifulsoup, the script opens each of the generated urls and parses the resulting page to collect the titles and urls of the results. <br />
The titles and urls are stored in a csv file in the following format<br />
* first row: city, state<br />
* second row: titles of results<br />
* third row: urls of results<br />
* fourth row: blank<br />
This pattern repeats for each city, state query.<br />
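The query construction and the four-row csv layout above can be sketched as follows. This is a hedged illustration, not the project's script: the helper names build_query_url and write_location_block are hypothetical, joining the key term and location with a space is an assumption, and urllib.parse.quote_plus stands in for whatever escaping the original crawler used.

```python
import csv
import io
from urllib.parse import quote_plus

# Search prefix quoted in the description above.
SEARCH_PREFIX = "https://www.google.com/search?q="


def build_query_url(location, key_term="incubator"):
    """Return the search url for one "city, state" line."""
    # quote_plus encodes spaces as '+' and commas as '%2C'.
    return SEARCH_PREFIX + quote_plus(f"{key_term} {location.strip()}")


def write_location_block(writer, location, results):
    """Write the four-row block for one location."""
    city, state = [part.strip() for part in location.split(",", 1)]
    writer.writerow([city, state])                     # first row: city, state
    writer.writerow([title for title, _ in results])   # second row: titles of results
    writer.writerow([url for _, url in results])       # third row: urls of results
    writer.writerow([])                                # fourth row: blank
```

For example, build_query_url("Houston, TX") produces https://www.google.com/search?q=incubator+Houston%2C+TX. The writer argument is a csv.writer, and results is assumed to be a list of (title, url) pairs.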
<br />
Relevant files, including python script, text files and csv files are located in<br />
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\GoogleCrawler</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Google_Crawler&diff=25168Google Crawler2019-04-08T19:04:44Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=Google Crawler<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Ecosystem Organization Classifier, Incubator Seed Data<br />
}}<br />
<br />
The crawler performs searches using google and collects the title and url for results. Searches are in the format of "incubator" + "city, state".<br />
<br />
The crawler opens the text file containing a list of locations in the format "city, state", with each entry separated by a newline. It appends the google search query domain "https://www.google.com/search?q=" to the front of the key term "incubator" and appropriately attaches the city and state name, using google escape characters for commas and spaces. Then, using beautifulsoup, the script opens each of the generated urls and parses the resulting page to collect the titles and urls of the results. <br />
The titles and urls are stored in a csv file in the following format<br />
* first row: city, state<br />
* second row: titles of results<br />
* third row: urls of results<br />
* fourth row: blank<br />
This pattern repeats for each city, state query.<br />
<br />
Relevant files, including python script, text files and csv files are located in<br />
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\GoogleCrawler</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=25167Incubator Seed Data2019-04-08T18:52:23Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database, INBIA, Google Crawler<br />
}}<br />
<br />
Requirement: Determine at least 4 primary data sources, or secure licenses to extract ‘seed data’ from these sources, as measured by program records.<br />
<br />
Status: We have identified [[Crunchbase Database|Crunchbase]] as one structured source for incubators, and we have a license for Crunchbase Pro. We are currently evaluating other sources, as described on this page. Given the paucity of strong sources, we will likely use a custom Google crawler (searching "incubator cityname" and similar) as a source. We will also be creating a new [[VentureXpert Database]] using data drawn from SDC Platinum, so that we have a source of information on venture capital backed startup firms.<br />
<br />
==Goal==<br />
<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified. A data source that provides links to a large number of incubators, or in-depth descriptions of them, could therefore still be viable.<br />
<br />
==Chosen Sources==<br />
<br />
*[[Crunchbase Database|Crunchbase]]<br />
*[[INBIA]]<br />
<br />
==Evaluation of Sources from Specific Google Searches==<br />
<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g. Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within inbia; on that page there is a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for a web crawler, as each link connects to another page within inbia and a link on that page then connects to the company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Often data is missing off of the separate page for the company, including the URL to the company's website. The description is often not detailed enough to determine the category for the economic organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|-<br />
| [https://thembaisdead.com/list-of-startup-accelerators-and-incubators/ The MBA Is Dead ]<br />
| <br />
* Opened source link. <br />
* Selected "Region" >> "US & Canada"<br />
| 186 Results<br />
| <br />
* Click on each accelerator/incubator to get data<br />
* City and Country<br />
* low equity, high offer, high value<br />
* high equity, low offer, low value<br />
* link to company homepage<br />
* categories of companies it accelerates/incubates<br />
| Can search by region or by category of companies<br />
| Seems to be a lot of data on accelerators and fewer incubators included<br />
<br />
Out of the first 10 unique company links -- 1 was a broken link, 7 were accelerators, and 2 could possibly be incubators<br />
|}<br />
<br />
== [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable ==<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [http://www.acceleratorinfo.com/see-all.html Accelerator Info]<br />
| <br />
* Opened source link<br />
* Copied first column (“All Startup Support Programs”) into excel (215)<br />
* Copied second column (“All University Programs”) into excel (249)<br />
| 464<br />
| Each link on parent list leads to individual '''home page url''' of organization<br />
| Lots of programs<br />
| Mixed information on incubators and accelerators. Some of the university supported programs may not be considered either an incubator or an accelerator<br />
<br />
Out of the first 10 links, 3 bad links, 3 potential incubators, and 4 accelerators<br />
|-<br />
| [https://www.galidata.org/accelerators/directory/?keyword=&region=north_america Galidata]<br />
| Filter by Region: North America<br />
| 164<br />
| <br />
* Company Name<br />
* Link to homepage<br />
* Location <br />
* Short Description<br />
| reliable links directly to homepage of companies, can search within regions<br />
| Mix of incubators and accelerators. Can only filter region to North America.<br />
Out of the first 10 organizations in the US -- 6 were accelerators and 4 could potentially be incubators.<br />
|-<br />
| [[:Crunchbase Database]]<br />
| See the [[Crunchbase Database]] project page for more information.<br />
| <br />
|<br />
|<br />
| <br />
|}<br />
<br />
<br />
<br />
==Region Specific Incubator Sources==<br />
Many state and local governments publish information on incubators and accelerators that operate within their jurisdiction. These sources do not cover all incubators within the US but could be helpful for cross-referencing against a larger database. <br />
<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database as some of the incubators listed on the state specific websites are not in the main NBIA database. They could also be helpful in cross-referencing data pulled from other main databases as these sites have reliable links, are filtered to include only incubators, and have a relatively consistent format. <br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Region<br />
! Data<br />
! Benefits<br />
! Limitations<br />
<br />
|-<br />
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]<br />
| Opened source link and counted incubators listed on the home page<br />
| 12<br />
| Alabama<br />
| Incubator Name, Brief Description, and a link to the home page<br />
| Reliable links that are filtered to include only incubators<br />
| only contains information on incubators in Alabama that are associated with NBIA<br />
|-<br />
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]<br />
| Opened source link and then opened links for each of the four regions in Florida<br />
| 66<br />
| Florida<br />
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page<br />
| Provides reliable links. Filtered to include only information on incubators<br />
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.<br />
|-<br />
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]<br />
| Opened source link. Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
| 28<br />
| Louisiana<br />
| <br />
* incubator name<br />
* contact name<br />
* address and phone number<br />
* link to website<br />
| data is filtered to include only incubators, links are reliable<br />
| only incubators in state of Louisiana, limited data set<br />
|-<br />
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]<br />
| Opened source link and counted number of incubators listed on the page<br />
| 35<br />
| Maryland<br />
| Main site contains incubator name, short description, and link to another page within main site which contains a link to the incubator home page<br />
| Reliable links, filtered to include only incubators<br />
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.<br />
|-<br />
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]<br />
| Open source link and count number of incubators listed on the page<br />
| 20<br />
| Massachusetts<br />
| incubator name, short description, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
|[https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/ Boston Startup Guide]<br />
| Scrolled down to the section labeled "Startup incubators in Boston"<br />
| 10<br />
| Boston<br />
|<br />
*Company Name and URL<br />
* Capital Provided & equity taken<br />
* Application Process<br />
| reliable links<br />
| relatively unformatted data that would be challenging to use. Limited in scope<br />
|-<br />
| [https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA Michigan Business Innovation Association]<br />
| Open source link and count number of incubators listed in the column next to the map<br />
| 15<br />
| Michigan<br />
| incubator name, address, link to location on map, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|- <br />
| [https://livefreeandstart.com/resources/incubators-makerspaces/ NH Tech Alliance ]<br />
| Open source link and count organizations listed under "NHBIN Member Locations"<br />
| 8<br />
| New Hampshire<br />
| incubator name, town within NH, brief description, and link to home page <br />
| reliable links, only data on incubators<br />
| limited dataset, not very structured organization on website<br />
|-<br />
| [http://www.ncincubation.org/NCIncubators.aspx NC Business Incubation Association]<br />
| Open source link, click on each county and count the number of business incubators<br />
| 32<br />
| North Carolina<br />
| Incubator name, address, program directors, and link<br />
| only data on incubators<br />
| limited dataset, hard to navigate site with web crawler, some of the incubators do not have links<br />
|-<br />
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]<br />
| Open source link and count the number of incubators<br />
| 29<br />
| Oklahoma<br />
| Incubator name and link to it<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
| [https://dmped.dc.gov/page/incubators-accelerators-and-co-working-spaces Incubators/Accelerators In DC]<br />
| Open source link and count the number of incubators (co-working spaces were not included)<br />
| 15<br />
| DC<br />
| Incubator name and link to it and brief description<br />
| reliable links, helpful description<br />
| limited dataset, mix of incubators and other organizations<br />
<br />
|}<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
==[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable==<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered/messy, does not provide links to incubator websites and doesn't include enough information for evaluation without incubator url<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, the link will not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
*'''Source:''' https://www.gan.co/engage/accelerators/<br />
:* ''Reason'': does not include information on incubators<br />
:*''Learn More'': https://www.brookings.edu/research/accelerating-growth-startup-accelerator-programs-in-the-united-states/<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Projects&diff=25166Projects2019-04-08T18:51:41Z<p>AnneFreeman: </p>
<hr />
<div>==Main Project Tree==<br />
<br />
The list below shows the main project tree with the current active project assignments.<br />
<br />
#[[Ecosystem Organization Classifier]] -- {{#show: Ecosystem Organization Classifier | ?Has owner}} ({{#show: Ecosystem Organization Classifier | ?Has project status}})<br />
##[[Defining Incubators]] -- {{#show: Defining Incubators | ?Has owner}} ({{#show: Defining Incubators | ?Has project status}})<br />
###[[Formulate baseline attributes]] -- {{#show: Formulate baseline attributes | ?Has owner}} ({{#show: Formulate baseline attributes | ?Has project status}})<br />
##[[Incubator Seed Data]] -- {{#show: Incubator Seed Data | ?Has owner}} ({{#show: Incubator Seed Data | ?Has project status}})<br />
###[[Crunchbase Database]] -- {{#show: Crunchbase Database | ?Has owner}} ({{#show: Crunchbase Database | ?Has project status}})<br />
###[[INBIA]] -- {{#show: INBIA | ?Has owner}} ({{#show: INBIA | ?Has project status}})<br />
###[[Google Crawler]] -- {{#show: Google Crawler | ?Has owner}} ({{#show: Google Crawler | ?Has project status}})<br />
##[[Incubators in Five Ecosystems]] -- {{#show: Incubators in Five Ecosystems | ?Has owner}} ({{#show: Incubators in Five Ecosystems | ?Has project status}})<br />
##[[US Incubators]] -- {{#show: US Incubators | ?Has owner}} ({{#show: US Incubators | ?Has project status}})<br />
###[[Ecosystem: Austin or Houston]] -- {{#show: Ecosystem: Austin or Houston | ?Has owner}} ({{#show: Ecosystem: Austin or Houston | ?Has project status}})<br />
###[[Ecosystem: Burlington VT]] -- {{#show: Ecosystem: Burlington VT | ?Has owner}} ({{#show: Ecosystem: Burlington VT | ?Has project status}})<br />
###[[Ecosystem: Denver CO]] -- {{#show: Ecosystem: Denver CO | ?Has owner}} ({{#show: Ecosystem: Denver CO | ?Has project status}})<br />
###[[Ecosystem: Washington DC]] -- {{#show: Ecosystem: Washington DC | ?Has owner}} ({{#show: Ecosystem: Washington DC | ?Has project status}})<br />
###[[Ecosystem: Twin Cities MN]] -- {{#show: Ecosystem: Twin Cities MN | ?Has owner}} ({{#show: Ecosystem: Twin Cities MN | ?Has project status}})<br />
#[[Listing Page Classifier]] -- {{#show: Listing Page Classifier | ?Has owner}} ({{#show: Listing Page Classifier | ?Has project status}})<br />
#[[Listing Page Extractor]]<br />
##[[Domain Specific Language Research]] -- {{#show: Domain Specific Language Research | ?Has owner}} ({{#show: Domain Specific Language Research | ?Has project status}})<br />
##[[Listing Page Plugin Spec]] -- {{#show: Listing Page Plugin Spec | ?Has owner}} ({{#show: Listing Page Plugin Spec | ?Has project status}})<br />
##[[LP Extractor Protocol]] -- {{#show: LP Extractor Protocol | ?Has owner}} ({{#show: LP Extractor Protocol | ?Has project status}})<br />
<br />
==List of All Projects==<br />
<br />
[[category:Internal]]<br />
Information on each project may be found in the table below. All projects are included in [[:Category:Project]] if they use [[Template:Project]]. To create or edit a project, please use [[Form: Project]]<br />
<br />
{{#ask: <br />
[[Category:Project]] <br />
[[Has project status::Active]] <br />
| format=count<br />
| intro=<strong>Data summary: There are </strong><br />
| outro=<strong> active projects found.</strong><br />
}}<br />
<br />
{{#ask:<br />
[[Category:Project]]<br />
[[Has project status::Active]] <br />
|mainlabel=Project<br />
|?Has owner=Owner<br />
|?Does subsume=Subsumes<br />
|format=table<br />
}}<br />
<br />
Researchers may also wish to review related precursor projects done at the [[McNair Projects|McNair Center]].</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Google_Crawler&diff=25165Google Crawler2019-04-08T18:50:27Z<p>AnneFreeman: Created page with "{{McNair Projects |Has title=Google Crawler |Has owner=Anne Freeman, |Has project status=Active |Depends upon it=Ecosystem Organization Classifier, Incubator Seed Data }} Goog..."</p>
<hr />
<div>{{McNair Projects<br />
|Has title=Google Crawler<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Ecosystem Organization Classifier, Incubator Seed Data<br />
}}<br />
Google Crawler to collect information from web searches of the format "incubator" + "city name"</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=25061Incubator Seed Data2019-04-03T17:00:49Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
<br />
Requirement: Determine at least 4 primary data sources, or secure licenses to extract ‘seed data’ from these sources, as measured by program records.<br />
<br />
Status: We have identified [[Crunchbase Database|Crunchbase]] as one structured source for incubators, and we have a license for Crunchbase Pro. We are currently evaluating other sources, as described on this page. Given the paucity of strong sources, we will likely use a custom Google crawler (searching "incubator cityname" and similar) as a source. We will also be creating a new [[VentureXpert Database]] using data drawn from SDC Platinum, so that we have a source of information on venture capital backed startup firms.<br />
<br />
==Goal==<br />
<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified. A data source that provides links to a large number of incubators, or in-depth descriptions of them, could therefore still be viable.<br />
<br />
<br />
==Evaluation of Sources from Specific Google Searches==<br />
<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g. Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within inbia; on that page there is a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for a web crawler, as each link connects to another page within inbia and a link on that page then connects to the company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Often data is missing off of the separate page for the company, including the URL to the company's website. The description is often not detailed enough to determine the category for the economic organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|-<br />
| [https://thembaisdead.com/list-of-startup-accelerators-and-incubators/ The MBA Is Dead ]<br />
| <br />
* Opened source link. <br />
* Selected "Region" >> "US & Canada"<br />
| 186 Results<br />
| <br />
* Click on each accelerator/incubator to get data<br />
* City and Country<br />
* low equity, high offer, high value<br />
* high equity, low offer, low value<br />
* link to company homepage<br />
* categories of companies it accelerates/incubates<br />
| Can search by region or by category of companies<br />
| Seems to be a lot of data on accelerators and fewer incubators included<br />
<br />
Out of the first 10 unique company links -- 1 was a broken link, 7 were accelerators, and 2 could possibly be incubators<br />
|}<br />
<br />
== [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable ==<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [http://www.acceleratorinfo.com/see-all.html Accelerator Info]<br />
| <br />
* Opened source link<br />
* Copied first column (“All Startup Support Programs”) into excel (215)<br />
* Copied second column (“All University Programs”) into excel (249)<br />
| 464<br />
| Each link on parent list leads to individual '''home page url''' of organization<br />
| Lots of programs<br />
| Mixed information on incubators and accelerators. Some of the university supported programs may not be considered either an incubator or an accelerator<br />
<br />
Out of the first 10 links, 3 bad links, 3 potential incubators, and 4 accelerators<br />
|-<br />
| [https://www.galidata.org/accelerators/directory/?keyword=&region=north_america Galidata]<br />
| Filter by Region: North America<br />
| 164<br />
| <br />
* Company Name<br />
* Link to homepage<br />
* Location <br />
* Short Description<br />
| reliable links directly to homepage of companies, can search within regions<br />
| Mix of incubators and accelerators. Can only filter region to North America.<br />
Out of the first 10 organizations in the US -- 6 were accelerators and 4 could potentially be incubators.<br />
|-<br />
| [[:Crunchbase Database]]<br />
| See the [[Crunchbase Database]] project page for more information.<br />
| <br />
|<br />
|<br />
| <br />
|}<br />
<br />
<br />
<br />
==Region Specific Incubator Sources==<br />
Many state and local governments publish information on incubators and accelerators that operate within their jurisdiction. These sources do not cover all incubators within the US but could be helpful for cross-referencing against a larger database. <br />
<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database as some of the incubators listed on the state specific websites are not in the main NBIA database. They could also be helpful in cross-referencing data pulled from other main databases as these sites have reliable links, are filtered to include only incubators, and have a relatively consistent format. <br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Region<br />
! Data<br />
! Benefits<br />
! Limitations<br />
<br />
|-<br />
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]<br />
| Opened source link and counted incubators listed on the home page<br />
| 12<br />
| Alabama<br />
| Incubator Name, Brief Description, and a link to the home page<br />
| Reliable links that are filtered to include only incubators<br />
| only contains information on incubators in Alabama that are associated with NBIA<br />
|-<br />
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]<br />
| Opened source link and then opened links for each of the four regions in Florida<br />
| 66<br />
| Florida<br />
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page<br />
| Provides reliable links. Filtered to include only information on incubators<br />
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.<br />
|-<br />
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]<br />
| Opened source link. Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
| 28<br />
| Louisiana<br />
| <br />
* incubator name<br />
* contact name<br />
* address and phone number<br />
* link to website<br />
| data is filtered to include only incubators, links are reliable<br />
| only incubators in state of Louisiana, limited data set<br />
|-<br />
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]<br />
| Opened source link and counted number of incubators listed on the page<br />
| 35<br />
| Maryland<br />
| Main site contains incubator name, short description, and link to another page within main site which contains a link to the incubator home page<br />
| Reliable links, filtered to include only incubators<br />
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.<br />
|-<br />
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]<br />
| Open source link and count number of incubators listed on the page<br />
| 20<br />
| Massachusetts<br />
| incubator name, short description, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
|[https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/ Boston Startup Guide]<br />
| Scrolled down to the section labeled "Startup incubators in Boston"<br />
| 10<br />
| Boston<br />
|<br />
*Company Name and URL<br />
* Capital Provided & equity taken<br />
* Application Process<br />
| reliable links<br />
| relatively unformatted data that would be challenging to use. Limited in scope<br />
|-<br />
| [https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA Michigan Business Innovation Association]<br />
| Open source link and count number of incubators listed in the column next to the map<br />
| 15<br />
| Michigan<br />
| incubator name, address, link to location on map, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|- <br />
| [https://livefreeandstart.com/resources/incubators-makerspaces/ NH Tech Alliance ]<br />
| Open source link and count organizations listed under "NHBIN Member Locations"<br />
| 8<br />
| New Hampshire<br />
| incubator name, town within NH, brief description, and link to home page <br />
| reliable links, only data on incubators<br />
| limited dataset, not very structured organization on website<br />
|-<br />
| [http://www.ncincubation.org/NCIncubators.aspx NC Business Incubation Association]<br />
| Open source link, click on each county and count the number of business incubators<br />
| 32<br />
| North Carolina<br />
| Incubator name, address, program directors, and link<br />
| only data on incubators<br />
| limited dataset, hard to navigate site with web crawler, some of the incubators do not have links<br />
|-<br />
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]<br />
| Open source link and count the number of incubators<br />
| 29<br />
| Oklahoma<br />
| Incubator name and link to it<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
| [https://dmped.dc.gov/page/incubators-accelerators-and-co-working-spaces Incubators/Accelerators In DC]<br />
| Open source link and count the number of incubators (co-working spaces were not included)<br />
| 15<br />
| DC<br />
| Incubator name and link to it and brief description<br />
| reliable links, helpful description<br />
| limited dataset, mix of incubators and other organizations<br />
<br />
|}<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
==[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable==<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered/messy, does not provide links to incubator websites and doesn't include enough information for evaluation without incubator url<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, the link will not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
*'''Source:''' https://www.gan.co/engage/accelerators/<br />
:* ''Reason'': does not include information on incubators<br />
:*''Learn More'': https://www.brookings.edu/research/accelerating-growth-startup-accelerator-programs-in-the-united-states/<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=25060Incubator Seed Data2019-04-03T16:31:58Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
<br />
Requirement: Determine at least 4 primary data sources, or secure licenses to extract ‘seed data’ from these sources, as measured by program records.<br />
<br />
Status: We have identified [[Crunchbase Database|Crunchbase]] as one structured source for incubators, and we have a license for Crunchbase Pro. We are currently evaluating other sources, as described on this page. Given the paucity of strong sources, we will likely use a custom Google crawler (searching "incubator cityname" and similar) as a source. We will also be creating a new [[VentureXpert Database]] using data drawn from SDC Platinum, so that we have a source of information on venture capital backed startup firms.<br />
<br />
==Goal==<br />
<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified. A data source that provides links to a large quantity of incubators, or in-depth descriptions of them, could therefore still be viable.<br />
<br />
<br />
==Evaluation of Sources from Specific Google Searches==<br />
<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g. Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within INBIA; on that page there is a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for a web crawler, as each link connects to another page within INBIA, and a link on that page then connects to the company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Data is often missing from the separate page for the company, including the URL to the company's website. The description is often not detailed enough to determine the category for the economic organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|-<br />
| [https://thembaisdead.com/list-of-startup-accelerators-and-incubators/ The MBA Is Dead ]<br />
| <br />
* Opened source link. <br />
* Selected "Region" >> "US & Canada"<br />
| 186 Results<br />
| <br />
* Click on each accelerator/incubator to get data<br />
* City and Country<br />
* low equity, high offer, high value<br />
* high equity, low offer, low value<br />
* link to company homepage<br />
* categories of companies it accelerates/incubates<br />
| Can search by region or by category of companies<br />
| Seems to be a lot of data on accelerators and fewer incubators included<br />
|}<br />
<br />
== [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable ==<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [http://www.acceleratorinfo.com/see-all.html Accelerator Info]<br />
| <br />
* Opened source link<br />
* Copied first column (“All Startup Support Programs”) into excel (215)<br />
* Copied second column (“All University Programs”) into excel (249)<br />
| 464<br />
| Each link on parent list leads to individual '''home page url''' of organization<br />
| Lots of programs<br />
| Mixed information on incubators and accelerators. Some of the university supported programs may not be considered either an incubator or an accelerator<br />
<br />
Out of the first 10 links, 3 bad links, 3 potential incubators, and 4 accelerators<br />
|-<br />
| [https://www.galidata.org/accelerators/directory/?keyword=&region=north_america Galidata]<br />
| Filter by Region: North America<br />
| 164<br />
| <br />
* Company Name<br />
* Link to homepage<br />
* Location <br />
* Short Description<br />
| reliable links directly to homepage of companies, can search within regions<br />
| Mix of incubators and accelerators. Can only filter region to North America.<br />
Out of the first 10 organizations in the US -- 6 were accelerators and 4 could potentially be incubators.<br />
|-<br />
| [[:Crunchbase Database]]<br />
| See the [[Crunchbase Database]] project page for more information.<br />
| <br />
|<br />
|<br />
| <br />
|}<br />
<br />
<br />
<br />
==Region Specific Incubator Sources==<br />
Many state and local governments publish information on incubators and accelerators that operate within their jurisdiction. These sources do not cover all incubators within the US but could be helpful to cross-reference with a larger database. <br />
<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database as some of the incubators listed on the state specific websites are not in the main NBIA database. They could also be helpful in cross-referencing data pulled from other main databases as these sites have reliable links, are filtered to include only incubators, and have a relatively consistent format. <br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Region<br />
! Data<br />
! Benefits<br />
! Limitations<br />
<br />
|-<br />
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]<br />
| Opened source link and counted incubators listed on the home page<br />
| 12<br />
| Alabama<br />
| Incubator Name, Brief Description, and a link to the home page<br />
| Reliable links that are filtered to include only incubators<br />
| only contains information on incubators in Alabama that are associated with NBIA<br />
|-<br />
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]<br />
| Opened source link and then opened links for each of the four regions in Florida<br />
| 66<br />
| Florida<br />
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page<br />
| Provides reliable links. Filtered to include only information on incubators<br />
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.<br />
|-<br />
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]<br />
| Opened source link. Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
| 28<br />
| Louisiana<br />
| <br />
* incubator name<br />
* contact name<br />
* address and phone number<br />
* link to website<br />
| data is filtered to include only incubators, links are reliable<br />
| only incubators in state of Louisiana, limited data set<br />
|-<br />
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]<br />
| Opened source link and counted number of incubators listed on the page<br />
| 35<br />
| Maryland<br />
| Main site contains incubator name, short description, and link to another page within main site which contains a link to the incubator home page<br />
| Reliable links, filtered to include only incubators<br />
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.<br />
|-<br />
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]<br />
| Open source link and count number of incubators listed on the page<br />
| 20<br />
| Massachusetts<br />
| incubator name, short description, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
|[https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/ Boston Startup Guide]<br />
| Scrolled down to the section labeled "Startup incubators in Boston"<br />
| 10<br />
| Boston<br />
|<br />
*Company Name and URL<br />
* Capital Provided & equity taken<br />
* Application Process<br />
| reliable links<br />
| relatively unformatted data that would be challenging to use. Limited in scope<br />
|-<br />
| [https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA Michigan Business Innovation Association]<br />
| Open source link and count number of incubators listed in the column next to the map<br />
| 15<br />
| Michigan<br />
| incubator name, address, link to location on map, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|- <br />
| [https://livefreeandstart.com/resources/incubators-makerspaces/ NH Tech Alliance ]<br />
| Open source link and count organizations listed under "NHBIN Member Locations"<br />
| 8<br />
| New Hampshire<br />
| incubator name, town within NH, brief description, and link to home page <br />
| reliable links, only data on incubators<br />
| limited dataset, not very structured organization on website<br />
|-<br />
| [http://www.ncincubation.org/NCIncubators.aspx NC Business Incubation Association]<br />
| Open source link, click on each county and count the number of business incubators<br />
| 32<br />
| North Carolina<br />
| Incubator name, address, program directors, and link<br />
| only data on incubators<br />
| limited dataset, hard to navigate site with web crawler, some of the incubators do not have links<br />
|-<br />
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]<br />
| Open source link and count the number of incubators<br />
| 29<br />
| Oklahoma<br />
| Incubator name and link to it<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
| [https://dmped.dc.gov/page/incubators-accelerators-and-co-working-spaces Incubators/Accelerators In DC]<br />
| Open source link and count the number of incubators (co-working spaces were not included)<br />
| 15<br />
| DC<br />
| Incubator name and link to it and brief description<br />
| reliable links, helpful description<br />
| limited dataset, mix of incubators and other organizations<br />
<br />
|}<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
==[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable==<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered/messy, does not provide links to incubator websites and doesn't include enough information for evaluation without incubator url<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, the link will not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
*'''Source:''' https://www.gan.co/engage/accelerators/<br />
:* ''Reason'': does not include information on incubators<br />
:*''Learn More'': https://www.brookings.edu/research/accelerating-growth-startup-accelerator-programs-in-the-united-states/<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=25059Incubator Seed Data2019-04-03T16:25:23Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
<br />
Requirement: Determine at least 4 primary data sources, or secure licenses to extract ‘seed data’ from these sources, as measured by program records.<br />
<br />
Status: We have identified [[Crunchbase Database|Crunchbase]] as one structured source for incubators, and we have a license for Crunchbase Pro. We are currently evaluating other sources, as described on this page. Given the paucity of strong sources, we will likely use a custom Google crawler (searching "incubator cityname" and similar) as a source. We will also be creating a new [[VentureXpert Database]] using data drawn from SDC Platinum, so that we have a source of information on venture capital backed startup firms.<br />
<br />
==Goal==<br />
<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified. A data source that provides links to a large quantity of incubators, or in-depth descriptions of them, could therefore still be viable.<br />
<br />
<br />
==Evaluation of Sources from Specific Google Searches==<br />
<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g. Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within INBIA; on that page there is a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for a web crawler, as each link connects to another page within INBIA, and a link on that page then connects to the company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Data is often missing from the separate page for the company, including the URL to the company's website. The description is often not detailed enough to determine the category for the economic organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|-<br />
| [https://thembaisdead.com/list-of-startup-accelerators-and-incubators/ The MBA Is Dead ]<br />
| <br />
* Opened source link. <br />
* Selected "Region" >> "US & Canada"<br />
| 186 Results<br />
| <br />
* Click on each accelerator/incubator to get data<br />
* City and Country<br />
* low equity, high offer, high value<br />
* high equity, low offer, low value<br />
* link to company homepage<br />
* categories of companies it accelerates/incubates<br />
| Can search by region or by category of companies<br />
| Seems to be a lot of data on accelerators and fewer incubators included<br />
|}<br />
<br />
== [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable ==<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [http://www.acceleratorinfo.com/see-all.html Accelerator Info]<br />
| <br />
* Opened source link<br />
* Copied first column (“All Startup Support Programs”) into excel (215)<br />
* Copied second column (“All University Programs”) into excel (249)<br />
| 464<br />
| Each link on parent list leads to individual '''home page url''' of organization<br />
| Lots of programs<br />
| Mixed information on incubators and accelerators. Some of the university supported programs may not be considered either an incubator or an accelerator<br />
<br />
Out of the first 10 links, 3 bad links, 3 potential incubators, and 4 accelerators<br />
|-<br />
| [https://www.galidata.org/accelerators/directory/?keyword=&region=north_america Galidata]<br />
| Filter by Region: North America<br />
| 164<br />
| <br />
* Company Name<br />
* Link to homepage<br />
* Location <br />
* Short Description<br />
| reliable links directly to homepage of companies, can search within regions<br />
| Mix of incubators and accelerators. Can only filter region to North America<br />
|-<br />
| [[:Crunchbase Database]]<br />
| See the [[Crunchbase Database]] project page for more information.<br />
| <br />
|<br />
|<br />
| <br />
|}<br />
<br />
<br />
<br />
==Region Specific Incubator Sources==<br />
Many state and local governments publish information on incubators and accelerators that operate within their jurisdiction. These sources do not cover all incubators within the US but could be helpful to cross-reference with a larger database. <br />
<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database as some of the incubators listed on the state specific websites are not in the main NBIA database. They could also be helpful in cross-referencing data pulled from other main databases as these sites have reliable links, are filtered to include only incubators, and have a relatively consistent format. <br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Region<br />
! Data<br />
! Benefits<br />
! Limitations<br />
<br />
|-<br />
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]<br />
| Opened source link and counted incubators listed on the home page<br />
| 12<br />
| Alabama<br />
| Incubator Name, Brief Description, and a link to the home page<br />
| Reliable links that are filtered to include only incubators<br />
| only contains information on incubators in Alabama that are associated with NBIA<br />
|-<br />
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]<br />
| Opened source link and then opened links for each of the four regions in Florida<br />
| 66<br />
| Florida<br />
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page<br />
| Provides reliable links. Filtered to include only information on incubators<br />
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.<br />
|-<br />
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]<br />
| Opened source link. Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
| 28<br />
| Louisiana<br />
| <br />
* incubator name<br />
* contact name<br />
* address and phone number<br />
* link to website<br />
| data is filtered to include only incubators, links are reliable<br />
| only incubators in state of Louisiana, limited data set<br />
|-<br />
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]<br />
| Opened source link and counted number of incubators listed on the page<br />
| 35<br />
| Maryland<br />
| Main site contains incubator name, short description, and link to another page within main site which contains a link to the incubator home page<br />
| Reliable links, filtered to include only incubators<br />
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.<br />
|-<br />
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]<br />
| Open source link and count number of incubators listed on the page<br />
| 20<br />
| Massachusetts<br />
| incubator name, short description, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
|[https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/ Boston Startup Guide]<br />
| Scrolled down to the section labeled "Startup incubators in Boston"<br />
| 10<br />
| Boston<br />
|<br />
*Company Name and URL<br />
* Capital Provided & equity taken<br />
* Application Process<br />
| reliable links<br />
| relatively unformatted data that would be challenging to use. Limited in scope<br />
|-<br />
| [https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA Michigan Business Innovation Association]<br />
| Open source link and count number of incubators listed in the column next to the map<br />
| 15<br />
| Michigan<br />
| incubator name, address, link to location on map, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|- <br />
| [https://livefreeandstart.com/resources/incubators-makerspaces/ NH Tech Alliance ]<br />
| Open source link and count organizations listed under "NHBIN Member Locations"<br />
| 8<br />
| New Hampshire<br />
| incubator name, town within NH, brief description, and link to home page <br />
| reliable links, only data on incubators<br />
| limited dataset, not very structured organization on website<br />
|-<br />
| [http://www.ncincubation.org/NCIncubators.aspx NC Business Incubation Association]<br />
| Open source link, click on each county and count the number of business incubators<br />
| 32<br />
| North Carolina<br />
| Incubator name, address, program directors, and link<br />
| only data on incubators<br />
| limited dataset, hard to navigate site with web crawler, some of the incubators do not have links<br />
|-<br />
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]<br />
| Open source link and count the number of incubators<br />
| 29<br />
| Oklahoma<br />
| Incubator name and link to it<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
| [https://dmped.dc.gov/page/incubators-accelerators-and-co-working-spaces Incubators/Accelerators In DC]<br />
| Open source link and count the number of incubators (co-working spaces were not included)<br />
| 15<br />
| DC<br />
| Incubator name and link to it and brief description<br />
| reliable links, helpful description<br />
| limited dataset, mix of incubators and other organizations<br />
<br />
|}<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
==[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable==<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered/messy, does not provide links to incubator websites and doesn't include enough information for evaluation without incubator url<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, so the link will not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
*'''Source:''' https://www.gan.co/engage/accelerators/<br />
:*''Reason'': does not include information on incubators<br />
:*''Learn More'': https://www.brookings.edu/research/accelerating-growth-startup-accelerator-programs-in-the-united-states/<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=INBIA&diff=25058INBIA2019-04-03T16:02:02Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=INBIA<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Incubator Seed Data<br />
}}<br />
<br />
<br />
==Initial Review of INBIA==<br />
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] that contains information for 415 incubators within the United States. Each directory entry links to a profile page within the INBIA domain that lists the incubator's name, address, a link to the home page of its website, and information for key contacts. The profile pages share the same HTML structure and consistently contain these fields, making INBIA an ideal candidate for collecting data from the internal pages with a web crawler.<br />
<br />
See [http://www.edegan.com/wiki/Incubator_Seed_Data#Evaluation_of_Sources_from_Specific_Google_Searches Wiki Page Table] for more details on source evaluations.<br />
<br />
==Retrieve URLS from INBIA Directory==<br />
<br />
We retrieved the INBIA data as follows:<br />
#Go to http://exchange.inbia.org/network/findacompany/ and search US<br />
#Change to 100 results per page<br />
#Save HTML page of 0-100 <br />
#Choose next page, Save HTML page of 100-200<br />
#Sort Z-A<br />
#Save HTML page 418-318<br />
#Choose next page, Save HTML page of 318-218<br />
#Note that we are missing some that start with L and M<br />
#Search US L, Choose page with L as first letter, Save HTML of L<br />
#Search US M, Choose page with M as first letter, Save HTML of M<br />
<br />
Then process each of those HTML files with regular expressions in TextPad:<br />
*Search .*biobubblekey Replace #<br />
*Search ^[^#].*\n Replace NOTHING<br />
*Search .*href=\" Replace NOTHING<br />
*Search <\/a> Replace NOTHING<br />
*Search \"> Replace \t<br />
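<br />
The TextPad steps above could also be scripted. The following is a minimal Python sketch, not the procedure actually used: the function name extract_links and the sample markup are hypothetical, and it assumes each directory entry sits on a single line that contains the marker biobubblekey followed by an href attribute.<br />

```python
import re


def extract_links(html):
    """Return (name, url_extension) pairs, mirroring the five search/replace steps."""
    rows = []
    for line in html.splitlines():
        # Steps 1-2: keep only lines containing the marker, dropping the text before it.
        if "biobubblekey" not in line:
            continue
        line = re.sub(r".*biobubblekey", "#", line)
        # Step 3: drop everything up to and including the last href=" on the line.
        # Assumption: marker lines without an href are skipped.
        if 'href="' not in line:
            continue
        line = re.sub(r'.*href="', "", line)
        # Step 4: remove the closing anchor tag.
        line = line.replace("</a>", "")
        # Step 5: the "> that ends the href attribute becomes a tab separator.
        line = line.replace('">', "\t")
        url, _, name = line.partition("\t")
        # Reorder so the name comes first, as in the combined file.
        rows.append((name.strip(), url.strip()))
    return rows


# Hypothetical markup, for illustration only.
sample = (
    "<html>\n"
    '<div class="biobubblekey"><a href="/?c=companyprofile&amp;UserKey=abc">712 Innovations</a>\n'
    "</html>"
)
print(extract_links(sample))
```

Returning the name first mirrors the "move columns" step that follows; combining files, removing duplicates, and sorting would still be done afterwards.<br />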
<br />
Then combine the files, remove duplicates, reorder the columns, and sort. The result is a headerless file whose lines look like:<br />
1863 Ventures/Project 500 /?c=companyprofile&amp;UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e <br />
4th Sector Innovations /?c=companyprofile&amp;UserKey=cc47b04e-1c2a-4019-88b3-05d1163a0d6a <br />
712 Innovations /?c=companyprofile&amp;UserKey=531ad600-e11a-4c74-9f37-bace816b9325 <br />
AccelerateHER /?c=companyprofile&amp;UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b <br />
ACTION Innovation Network /?c=companyprofile&amp;UserKey=5ac08dd0-364d-47b2-8de0-a7536a3b4802 <br />
<br />
We can now build a crawler that requests http://exchange.inbia.org/network/findacompany/ followed by the URL extension (either encoded or with <nowiki>&amp;</nowiki> replaced with just &). For example, http://exchange.inbia.org/network/findacompany/?c=companyprofile&UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa returns the company page for Cambridge Innovation Center.<br />
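<br />
As an illustration of the URL construction described above, here is a small Python sketch. The names BASE_URL and profile_url are hypothetical; the sketch takes the "replace the entity with a plain &" route by decoding HTML entities with the standard library.<br />

```python
from html import unescape

# Directory root taken from the text above.
BASE_URL = "http://exchange.inbia.org/network/findacompany"


def profile_url(extension):
    """Join the directory root with a decoded URL extension."""
    # unescape turns the HTML entity &amp; back into a plain &.
    return BASE_URL + unescape(extension.strip())


print(profile_url("/?c=companyprofile&amp;UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa"))
```

Stripping whitespace first guards against the trailing spaces that can survive the column reordering.<br />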
<br />
We can then extract the contact information, including the URL and the key contacts, using either Beautiful Soup or regular expressions.<br />
<br />
<br />
==Retrieve Data from URLS Generated==<br />
We wrote a web crawler that:<br />
# reads the CSV file into a pandas DataFrame<br />
# rewrites each URL by replacing ''?c=companyprofile&amp;'' with ''companyprofile?'' and prepending the domain http://exchange.inbia.org/network/findacompany<br />
# opens each URL and extracts information using an element tree parser<br />
# writes the information for each URL to a CSV file<br />
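<br />
The steps above might be sketched as follows. This is not the contents of inbia_scrape.py: it covers only the URL rewrite and the CSV handling, uses the standard-library csv module in place of pandas to stay dependency-free, and leaves out the page fetch and element-tree parsing because the page layout is not described here. The function names and the tab-separated, two-column input are assumptions.<br />

```python
import csv
import io

DOMAIN = "http://exchange.inbia.org/network/findacompany"


def rewrite_url(extension):
    """Step 2: swap the query prefix and prepend the domain."""
    path = extension.strip().replace("?c=companyprofile&amp;", "companyprofile?")
    return DOMAIN + path


def read_directory(tsv_text):
    """Step 1: read (name, url_extension) rows from tab-separated text."""
    rows = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        name, extension = row
        rows.append({"company_name": name.strip(), "url": rewrite_url(extension)})
    return rows


def to_csv(rows):
    """Step 4: serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["company_name", "url"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One line in the format produced by the directory step.
sample = "712 Innovations\t/?c=companyprofile&amp;UserKey=531ad600-e11a-4c74-9f37-bace816b9325\n"
print(to_csv(read_directory(sample)))
```

A real run would insert the fetch-and-parse step between read_directory and to_csv and extend the field names to the full column list.<br />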
<br />
<br />
The crawler generates a CSV file called INBIA_data.csv with the columns [company_name, street_address, city, state, zipcode, country, website, contact_person], populated with information from the 415 entries in the directory.<br />
<br />
The csv file and the python script (inbia_scrape.py) are located in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\INBIA</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=INBIA&diff=25057INBIA2019-04-03T15:59:15Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=INBIA<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Incubator Seed Data<br />
}}<br />
==Initial Review of INBIA==<br />
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] that contains information for 415 incubators within the United States. It provides reliable links to a secondary page within the INBIA domain. This page contains information including the incubator's name, address, a link to the home page of their website, and information for key contacts. The secondary pages have the same HTML structure and are reliable in the data they contain, making INBIA an ideal candidate for web crawling methods to collect data from the internal pages. <br />
<br />
See [http://www.edegan.com/wiki/Incubator_Seed_Data#Evaluation_of_Sources_from_Specific_Google_Searches Wiki Page Table] for more details on source evaluations.<br />
<br />
==Retrieve URLS from INBIA Directory==<br />
<br />
We retrieved the INBIA data as follows:<br />
#Go to http://exchange.inbia.org/network/findacompany/ and search US<br />
#Change to 100 results per page<br />
#Save HTML page of 0-100 <br />
#Choose next page, Save HTML page of 100-200<br />
#Sort Z-A<br />
#Save HTML page 418-318<br />
#Choose next page, Save HTML page of 318-218<br />
#Note that we are missing some that start with L and M<br />
#Search US L, Choose page with L as first letter, Save HTML of L<br />
#Search US M, Choose page with M as first letter, Save HTML of M<br />
<br />
Then process each of those html files with regular expressions in textpad<br />
*Search .*biobubblekey Replace #<br />
*Search ^[^#].*\n Replace NOTHING<br />
*Search .*href=\" Replace NOTHING<br />
*Search <\/a> Replace NOTHING<br />
*Search \"> Replace \t<br />
<br />
Then combine files, throw out duplicates, move columns, sort. This results in a file without headers where the lines are like:<br />
1863 Ventures/Project 500 /?c=companyprofile&amp;UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e <br />
4th Sector Innovations /?c=companyprofile&amp;UserKey=cc47b04e-1c2a-4019-88b3-05d1163a0d6a <br />
712 Innovations /?c=companyprofile&amp;UserKey=531ad600-e11a-4c74-9f37-bace816b9325 <br />
AccelerateHER /?c=companyprofile&amp;UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b <br />
ACTION Innovation Network /?c=companyprofile&amp;UserKey=5ac08dd0-364d-47b2-8de0-a7536a3b4802 <br />
<br />
We can now build a crawler that requests http://exchange.inbia.org/network/findacompany/ followed by the URL extension (either encoded or with <nowiki>&amp;</nowiki> replaced with just &). For example, http://exchange.inbia.org/network/findacompany/?c=companyprofile&UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa returns the company page for Cambridge Innovation Center.<br />
<br />
We can then extract the contact information, including the URL and the key contacts, using either Beautiful Soup or regular expressions.<br />
<br />
<br />
==Retrieve Data from URLS Generated==<br />
We wrote a web crawler that:<br />
# reads the CSV file into a pandas DataFrame<br />
# rewrites each URL by replacing ''?c=companyprofile&amp;'' with ''companyprofile?'' and prepending the domain http://exchange.inbia.org/network/findacompany<br />
# opens each URL and extracts information using an element tree parser<br />
# writes the information for each URL to a CSV file<br />
<br />
<br />
The crawler generates a CSV file called INBIA_data.csv with the columns [company_name, street_address, city, state, zipcode, country, website, contact_person], populated with information from the 415 entries in the directory.<br />
<br />
The csv file and the python script (inbia_scrape.py) are located in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\INBIA</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=INBIA&diff=25056INBIA2019-04-03T15:53:29Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=INBIA<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Incubator Seed Data<br />
}}<br />
==Initial Review of INBIA==<br />
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] that contains information for 415 incubators within the United States. It provides reliable links to a secondary page within the INBIA domain. This page contains information including the incubator's name, address, a link to the home page of their website, and information for key contacts. The secondary pages have the same HTML structure and are reliable in the data they contain, making INBIA an ideal candidate for web crawling methods to collect data from the internal pages. <br />
<br />
See [http://www.edegan.com/wiki/Incubator_Seed_Data#Evaluation_of_Sources_from_Specific_Google_Searches Wiki Page Table] for more details on source evaluations.<br />
<br />
==Retrieve URLS from INBIA Directory==<br />
<br />
We retrieved the INBIA data as follows:<br />
#Go to http://exchange.inbia.org/network/findacompany/ and search US<br />
#Change to 100 results per page<br />
#Save HTML page of 0-100 <br />
#Choose next page, Save HTML page of 100-200<br />
#Sort Z-A<br />
#Save HTML page 418-318<br />
#Choose next page, Save HTML page of 318-218<br />
#Note that we are missing some that start with L and M<br />
#Search US L, Choose page with L as first letter, Save HTML of L<br />
#Search US M, Choose page with M as first letter, Save HTML of M<br />
<br />
Then process each of those html files with regular expressions in textpad<br />
*Search .*biobubblekey Replace #<br />
*Search ^[^#].*\n Replace NOTHING<br />
*Search .*href=\" Replace NOTHING<br />
*Search <\/a> Replace NOTHING<br />
*Search \"> Replace \t<br />
<br />
Then combine files, throw out duplicates, move columns, sort. This results in a file without headers where the lines are like:<br />
1863 Ventures/Project 500 /?c=companyprofile&amp;UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e <br />
4th Sector Innovations /?c=companyprofile&amp;UserKey=cc47b04e-1c2a-4019-88b3-05d1163a0d6a <br />
712 Innovations /?c=companyprofile&amp;UserKey=531ad600-e11a-4c74-9f37-bace816b9325 <br />
AccelerateHER /?c=companyprofile&amp;UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b <br />
ACTION Innovation Network /?c=companyprofile&amp;UserKey=5ac08dd0-364d-47b2-8de0-a7536a3b4802 <br />
<br />
We can now build a crawler that requests http://exchange.inbia.org/network/findacompany/ followed by the URL extension (either encoded or with <nowiki>&amp;</nowiki> replaced with just &). For example, http://exchange.inbia.org/network/findacompany/?c=companyprofile&UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa returns the company page for Cambridge Innovation Center.<br />
<br />
We can then extract the contact information, including the URL and the key contacts, using either Beautiful Soup or regular expressions.<br />
<br />
<br />
==Retrieve Data from URLS Generated==<br />
We wrote a web crawler that:<br />
# reads the CSV file into a pandas DataFrame<br />
# rewrites each URL by replacing ''?c=companyprofile&amp;'' with ''companyprofile?'' and prepending the domain http://exchange.inbia.org/network/findacompany<br />
# opens each URL and extracts information using an element tree parser<br />
# writes the information for each URL to a CSV file<br />
<br />
<br />
The crawler generates a CSV file called INBIA_data.csv with the columns [company_name, street_address, city, state, zipcode, country, website, contact_person], populated with information from the 415 entries in the directory.<br />
<br />
The csv file and the python script (inbia_scrape.py) are located in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\INBIA</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=INBIA&diff=25055INBIA2019-04-03T15:52:47Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=INBIA<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Incubator Seed Data<br />
}}<br />
==Initial Review of INBIA==<br />
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] that contains information for 415 incubators within the United States. It provides reliable links to a secondary page within the INBIA domain. This page contains information including the incubator's name, address, a link to the home page of their website, and information for key contacts. The secondary pages have the same HTML structure and are reliable in the data they contain, making INBIA an ideal candidate for web crawling methods to collect data from the internal pages. <br />
<br />
See [http://www.edegan.com/wiki/Incubator_Seed_Data#Evaluation_of_Sources_from_Specific_Google_Searches Wiki Page Table] for more details on source evaluations.<br />
<br />
==Retrieve URLS from INBIA Directory==<br />
<br />
We retrieved the INBIA data as follows:<br />
#Go to http://exchange.inbia.org/network/findacompany/ and search US<br />
#Change to 100 results per page<br />
#Save HTML page of 0-100 <br />
#Choose next page, Save HTML page of 100-200<br />
#Sort Z-A<br />
#Save HTML page 418-318<br />
#Choose next page, Save HTML page of 318-218<br />
#Note that we are missing some that start with L and M<br />
#Search US L, Choose page with L as first letter, Save HTML of L<br />
#Search US M, Choose page with M as first letter, Save HTML of M<br />
<br />
Then process each of those html files with regular expressions in textpad<br />
*Search .*biobubblekey Replace #<br />
*Search ^[^#].*\n Replace NOTHING<br />
*Search .*href=\" Replace NOTHING<br />
*Search <\/a> Replace NOTHING<br />
*Search \"> Replace \t<br />
<br />
Then combine files, throw out duplicates, move columns, sort. This results in a file without headers where the lines are like:<br />
1863 Ventures/Project 500 /?c=companyprofile&amp;UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e <br />
4th Sector Innovations /?c=companyprofile&amp;UserKey=cc47b04e-1c2a-4019-88b3-05d1163a0d6a <br />
712 Innovations /?c=companyprofile&amp;UserKey=531ad600-e11a-4c74-9f37-bace816b9325 <br />
AccelerateHER /?c=companyprofile&amp;UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b <br />
ACTION Innovation Network /?c=companyprofile&amp;UserKey=5ac08dd0-364d-47b2-8de0-a7536a3b4802 <br />
<br />
We can now build a crawler that requests http://exchange.inbia.org/network/findacompany/ followed by the URL extension (either encoded or with <nowiki>&amp;</nowiki> replaced with just &). For example, http://exchange.inbia.org/network/findacompany/?c=companyprofile&UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa returns the company page for Cambridge Innovation Center.<br />
<br />
We can then extract the contact information, including the URL and the key contacts, using either Beautiful Soup or regular expressions.<br />
<br />
<br />
==Retrieve Data from URLS Generated==<br />
We wrote a web crawler that:<br />
# reads the CSV file into a pandas DataFrame<br />
# rewrites each URL by replacing ''?c=companyprofile&amp;'' with ''companyprofile?'' and prepending the domain http://exchange.inbia.org/network/findacompany<br />
# opens each URL and extracts information using an element tree parser<br />
# writes the information for each URL to a CSV file<br />
<br />
<br />
The crawler generates a csv file called INBIA_data.csv containing [company_name, street_address, city, state, zipcode, country, website, contact_person] and is populated by information from the 415 entries from the database. The crawler is called inbia_scrape.py. Both the csv file and the python script are located in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\INBIA</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=INBIA&diff=25054INBIA2019-04-03T15:50:31Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=INBIA<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Incubator Seed Data<br />
}}<br />
==Initial Review of INBIA==<br />
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] that contains information for 415 incubators within the United States. It provides reliable links to a secondary page within the INBIA domain. This page contains information including the incubator's name, address, a link to the home page of their website, and information for key contacts. The secondary pages have the same HTML structure and are reliable in the data they contain, making INBIA an ideal candidate for web crawling methods to collect data from the internal pages. <br />
<br />
See [http://www.edegan.com/wiki/Incubator_Seed_Data#Evaluation_of_Sources_from_Specific_Google_Searches Wiki Page Table] for more details on source evaluations.<br />
<br />
==Retrieve URLS from INBIA Directory==<br />
<br />
We retrieved the INBIA data as follows:<br />
#Go to http://exchange.inbia.org/network/findacompany/ and search US<br />
#Change to 100 results per page<br />
#Save HTML page of 0-100 <br />
#Choose next page, Save HTML page of 100-200<br />
#Sort Z-A<br />
#Save HTML page 418-318<br />
#Choose next page, Save HTML page of 318-218<br />
#Note that we are missing some that start with L and M<br />
#Search US L, Choose page with L as first letter, Save HTML of L<br />
#Search US M, Choose page with M as first letter, Save HTML of M<br />
<br />
Then process each of those html files with regular expressions in textpad<br />
*Search .*biobubblekey Replace #<br />
*Search ^[^#].*\n Replace NOTHING<br />
*Search .*href=\" Replace NOTHING<br />
*Search <\/a> Replace NOTHING<br />
*Search \"> Replace \t<br />
<br />
Then combine files, throw out duplicates, move columns, sort. This results in a file without headers where the lines are like:<br />
1863 Ventures/Project 500 /?c=companyprofile&amp;UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e <br />
4th Sector Innovations /?c=companyprofile&amp;UserKey=cc47b04e-1c2a-4019-88b3-05d1163a0d6a <br />
712 Innovations /?c=companyprofile&amp;UserKey=531ad600-e11a-4c74-9f37-bace816b9325 <br />
AccelerateHER /?c=companyprofile&amp;UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b <br />
ACTION Innovation Network /?c=companyprofile&amp;UserKey=5ac08dd0-364d-47b2-8de0-a7536a3b4802 <br />
<br />
We can now build a crawler that requests http://exchange.inbia.org/network/findacompany/ followed by the URL extension (either encoded or with <nowiki>&amp;</nowiki> replaced with just &). For example, http://exchange.inbia.org/network/findacompany/?c=companyprofile&UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa returns the company page for Cambridge Innovation Center.<br />
<br />
We can then extract the contact information, including the URL and the key contacts, using either Beautiful Soup or regular expressions.<br />
<br />
<br />
==Retrieve Data from URLS Generated==<br />
We wrote a web crawler that:<br />
# reads the CSV file into a pandas DataFrame<br />
# rewrites each URL by replacing ''?c=companyprofile&amp;'' with ''companyprofile?'' and prepending the domain http://exchange.inbia.org/network/findacompany<br />
# opens each URL and extracts information using an element tree parser<br />
# writes the information for each URL to a CSV file<br />
<br />
This crawler is called inbia_scrape.py and it is located in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\INBIA</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=INBIA&diff=25053INBIA2019-04-03T15:40:15Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=INBIA<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Incubator Seed Data<br />
}}<br />
==Initial Review of INBIA==<br />
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] that contains information for 415 incubators within the United States. It provides reliable links to a secondary page within the INBIA domain. This page contains information including the incubator's name, address, a link to the home page of their website, and information for key contacts. The secondary pages have the same HTML structure and are reliable in the data they contain, making INBIA an ideal candidate for web crawling methods to collect data from the internal pages. <br />
<br />
See [http://www.edegan.com/wiki/Incubator_Seed_Data#Evaluation_of_Sources_from_Specific_Google_Searches Wiki Page Table] for more details on source evaluations.<br />
<br />
==Retrieve URLS from INBIA Directory==<br />
<br />
We retrieved the INBIA data as follows:<br />
#Go to http://exchange.inbia.org/network/findacompany/ and search US<br />
#Change to 100 results per page<br />
#Save HTML page of 0-100 <br />
#Choose next page, Save HTML page of 100-200<br />
#Sort Z-A<br />
#Save HTML page 418-318<br />
#Choose next page, Save HTML page of 318-218<br />
#Note that we are missing some that start with L and M<br />
#Search US L, Choose page with L as first letter, Save HTML of L<br />
#Search US M, Choose page with M as first letter, Save HTML of M<br />
<br />
Then process each of those html files with regular expressions in textpad<br />
*Search .*biobubblekey Replace #<br />
*Search ^[^#].*\n Replace NOTHING<br />
*Search .*href=\" Replace NOTHING<br />
*Search <\/a> Replace NOTHING<br />
*Search \"> Replace \t<br />
<br />
Then combine files, throw out duplicates, move columns, sort. This results in a file without headers where the lines are like:<br />
1863 Ventures/Project 500 /?c=companyprofile&amp;UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e <br />
4th Sector Innovations /?c=companyprofile&amp;UserKey=cc47b04e-1c2a-4019-88b3-05d1163a0d6a <br />
712 Innovations /?c=companyprofile&amp;UserKey=531ad600-e11a-4c74-9f37-bace816b9325 <br />
AccelerateHER /?c=companyprofile&amp;UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b <br />
ACTION Innovation Network /?c=companyprofile&amp;UserKey=5ac08dd0-364d-47b2-8de0-a7536a3b4802 <br />
<br />
We can now build a crawler that requests http://exchange.inbia.org/network/findacompany/ followed by the URL extension (either encoded or with <nowiki>&amp;</nowiki> replaced with just &). For example, http://exchange.inbia.org/network/findacompany/?c=companyprofile&UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa returns the company page for Cambridge Innovation Center.<br />
<br />
We can then extract the contact information, including the URL and the key contacts, using either Beautiful Soup or regular expressions.<br />
<br />
<br />
==Retrieve Data from URLS Generated==</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=INBIA&diff=25052INBIA2019-04-03T15:38:01Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=INBIA<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Incubator Seed Data<br />
|Does subsume=Incubator Seed Data, Ecosystem Organization Classifier,<br />
}}<br />
==Initial Review of INBIA==<br />
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] that contains information for 415 incubators within the United States. It provides reliable links to a secondary page within the INBIA domain. This page contains information including the incubator's name, address, a link to the home page of their website, and information for key contacts. The secondary pages have the same HTML structure and are reliable in the data they contain, making INBIA an ideal candidate for web crawling methods to collect data from the internal pages. <br />
<br />
See [http://www.edegan.com/wiki/Incubator_Seed_Data#Evaluation_of_Sources_from_Specific_Google_Searches Wiki Page Table] for more details on source evaluations.<br />
<br />
==Retrieve URLS from INBIA Directory==<br />
<br />
We retrieved the INBIA data as follows:<br />
#Go to http://exchange.inbia.org/network/findacompany/ and search US<br />
#Change to 100 results per page<br />
#Save HTML page of 0-100 <br />
#Choose next page, Save HTML page of 100-200<br />
#Sort Z-A<br />
#Save HTML page 418-318<br />
#Choose next page, Save HTML page of 318-218<br />
#Note that we are missing some that start with L and M<br />
#Search US L, Choose page with L as first letter, Save HTML of L<br />
#Search US M, Choose page with M as first letter, Save HTML of M<br />
<br />
Then process each of those html files with regular expressions in textpad<br />
*Search .*biobubblekey Replace #<br />
*Search ^[^#].*\n Replace NOTHING<br />
*Search .*href=\" Replace NOTHING<br />
*Search <\/a> Replace NOTHING<br />
*Search \"> Replace \t<br />
<br />
Then combine files, throw out duplicates, move columns, sort. This results in a file without headers where the lines are like:<br />
1863 Ventures/Project 500 /?c=companyprofile&amp;UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e <br />
4th Sector Innovations /?c=companyprofile&amp;UserKey=cc47b04e-1c2a-4019-88b3-05d1163a0d6a <br />
712 Innovations /?c=companyprofile&amp;UserKey=531ad600-e11a-4c74-9f37-bace816b9325 <br />
AccelerateHER /?c=companyprofile&amp;UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b <br />
ACTION Innovation Network /?c=companyprofile&amp;UserKey=5ac08dd0-364d-47b2-8de0-a7536a3b4802 <br />
<br />
We can now build a crawler that requests http://exchange.inbia.org/network/findacompany/ followed by the URL extension (either encoded or with <nowiki>&amp;</nowiki> replaced with just &). For example, http://exchange.inbia.org/network/findacompany/?c=companyprofile&UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa returns the company page for Cambridge Innovation Center.<br />
<br />
We can then extract the contact information, including the URL and the key contacts, using either Beautiful Soup or regular expressions.<br />
<br />
<br />
==Retrieve Data from URLS Generated==</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=INBIA&diff=25051INBIA2019-04-03T15:36:54Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=INBIA<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Incubator Seed Data<br />
|Does subsume=Incubator Seed Data, Ecosystem Organization Classifier,<br />
}}<br />
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] containing information on 415 incubators in the United States.<br />
<br />
<br />
==Initial Review of INBIA==<br />
The INBIA directory contains information for 415 incubators within the United States. It provides reliable links to a secondary page within the INBIA domain. This page contains information including the incubator's name, address, a link to the home page of their website, and information for key contacts. The secondary pages have the same HTML structure and are reliable in the data they contain, making INBIA an ideal candidate for web crawling methods to collect data from the internal pages. <br />
<br />
See [http://www.edegan.com/wiki/Incubator_Seed_Data#Evaluation_of_Sources_from_Specific_Google_Searches Wiki Page Table] for more details on source evaluations.<br />
<br />
==Retrieve URLS from INBIA Directory==<br />
<br />
We retrieved the INBIA data as follows:<br />
#Go to http://exchange.inbia.org/network/findacompany/ and search US<br />
#Change to 100 results per page<br />
#Save HTML page of 0-100 <br />
#Choose next page, Save HTML page of 100-200<br />
#Sort Z-A<br />
#Save HTML page 418-318<br />
#Choose next page, Save HTML page of 318-218<br />
#Note that we are missing some that start with L and M<br />
#Search US L, Choose page with L as first letter, Save HTML of L<br />
#Search US M, Choose page with M as first letter, Save HTML of M<br />
<br />
Then process each of those html files with regular expressions in textpad<br />
*Search .*biobubblekey Replace #<br />
*Search ^[^#].*\n Replace NOTHING<br />
*Search .*href=\" Replace NOTHING<br />
*Search <\/a> Replace NOTHING<br />
*Search \"> Replace \t<br />
<br />
Then combine files, throw out duplicates, move columns, sort. This results in a file without headers where the lines are like:<br />
1863 Ventures/Project 500 /?c=companyprofile&amp;UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e <br />
4th Sector Innovations /?c=companyprofile&amp;UserKey=cc47b04e-1c2a-4019-88b3-05d1163a0d6a <br />
712 Innovations /?c=companyprofile&amp;UserKey=531ad600-e11a-4c74-9f37-bace816b9325 <br />
AccelerateHER /?c=companyprofile&amp;UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b <br />
ACTION Innovation Network /?c=companyprofile&amp;UserKey=5ac08dd0-364d-47b2-8de0-a7536a3b4802 <br />
<br />
We can now build a crawler that requests http://exchange.inbia.org/network/findacompany/ followed by the URL extension (either encoded or with <nowiki>&amp;</nowiki> replaced with just &). For example, http://exchange.inbia.org/network/findacompany/?c=companyprofile&UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa returns the company page for Cambridge Innovation Center.<br />
<br />
We can then extract the contact information, including the URL and the key contacts, using either Beautiful Soup or regular expressions.<br />
<br />
<br />
==Retrieve Data from URLS Generated==</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=INBIA&diff=25050INBIA2019-04-03T15:31:39Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=INBIA<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Incubator Seed Data<br />
|Does subsume=Incubator Seed Data, Ecosystem Organization Classifier,<br />
}}<br />
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] containing information on 415 incubators in the United States.<br />
<br />
<br />
===INBIA===<br />
<br />
We retrieved the INBIA data as follows:<br />
#Go to http://exchange.inbia.org/network/findacompany/ and search US<br />
#Change to 100 results per page<br />
#Save HTML page of 0-100 <br />
#Choose next page, Save HTML page of 100-200<br />
#Sort Z-A<br />
#Save HTML page 418-318<br />
#Choose next page, Save HTML page of 318-218<br />
#Note that we are missing some that start with L and M<br />
#Search US L, Choose page with L as first letter, Save HTML of L<br />
#Search US M, Choose page with M as first letter, Save HTML of M<br />
<br />
Then process each of those HTML files with regular expressions in TextPad:<br />
*Search .*biobubblekey Replace #<br />
*Search ^[^#].*\n Replace NOTHING<br />
*Search .*href=\" Replace NOTHING<br />
*Search <\/a> Replace NOTHING<br />
*Search \"> Replace \t<br />
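The TextPad steps above could also be scripted. The following is a minimal Python sketch, not the procedure we actually ran: the function name extract_rows and the sample HTML line are hypothetical, and the real page markup may differ.<br />

```python
import re


def extract_rows(html_text):
    """Approximate the five TextPad search/replace steps, line by line."""
    rows = []
    for line in html_text.splitlines():
        # Step 1: replace everything up to "biobubblekey" with a '#' marker.
        line = re.sub(r".*biobubblekey", "#", line)
        # Step 2: drop every line that does not start with the marker.
        if not line.startswith("#"):
            continue
        # Step 3: strip everything up to and including href=".
        line = re.sub(r'.*href="', "", line)
        # Step 4: remove the closing anchor tag.
        line = line.replace("</a>", "")
        # Step 5: turn the end of the href attribute into a tab separator.
        line = line.replace('">', "\t")
        rows.append(line.strip())
    return rows


# Hypothetical input line; the actual INBIA markup was not recorded here.
sample = (
    "<p>unrelated line</p>\n"
    '<div class="biobubblekey"><a href="/?c=companyprofile&amp;UserKey=abc">'
    "Example Incubator</a>\n"
)
print(extract_rows(sample))
```

Each resulting row holds the URL extension, a tab, and then the name, which is why the columns still need to be reordered afterwards.<br />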
<br />
Then combine the files, remove duplicates, reorder the columns, and sort. This results in a file without headers, with lines like:<br />
1863 Ventures/Project 500 /?c=companyprofile&amp;UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e <br />
4th Sector Innovations /?c=companyprofile&amp;UserKey=cc47b04e-1c2a-4019-88b3-05d1163a0d6a <br />
712 Innovations /?c=companyprofile&amp;UserKey=531ad600-e11a-4c74-9f37-bace816b9325 <br />
AccelerateHER /?c=companyprofile&amp;UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b <br />
ACTION Innovation Network /?c=companyprofile&amp;UserKey=5ac08dd0-364d-47b2-8de0-a7536a3b4802 <br />
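The combine, de-duplicate, reorder, and sort step could be sketched as follows. This is an illustration only: the function name combine_rows is hypothetical, and it assumes each input row has the form "URL extension, tab, name" as produced by the regular expression steps. A case-insensitive sort is assumed because it reproduces the ordering in the sample lines (AccelerateHER before ACTION Innovation Network).<br />

```python
def combine_rows(row_lists):
    """Merge several lists of 'url<TAB>name' rows into sorted 'name<TAB>url' rows."""
    unique = set()
    for rows in row_lists:
        for row in rows:
            url, name = row.split("\t", 1)
            # Storing (name, url) pairs in a set drops exact duplicates
            # and puts the name in the first column.
            unique.add((name.strip(), url.strip()))
    # Sort by name, ignoring case; the URL breaks ties deterministically.
    ordered = sorted(unique, key=lambda pair: (pair[0].lower(), pair[1]))
    return [f"{name}\t{url}" for name, url in ordered]


# Hypothetical rows from two saved pages that overlap by one entry.
page_a = ["/?k=2\tBeta", "/?k=1\talpha"]
page_b = ["/?k=1\talpha"]
print(combine_rows([page_a, page_b]))
```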
<br />
We can now build a crawler that requests http://exchange.inbia.org/network/findacompany/ followed by the URL extension (either encoded or with <nowiki>&amp;</nowiki> replaced with just &). For example, http://exchange.inbia.org/network/findacompany/?c=companyprofile&UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa retrieves the company page for Cambridge Innovation Center. <br />
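Building the full profile URL from a saved extension might look like the sketch below. The names BASE_URL and profile_url are hypothetical. Plain string concatenation is used deliberately: urllib.parse.urljoin would treat an extension beginning with "/" as relative to the site root and discard the /network/findacompany/ path.<br />

```python
import html

BASE_URL = "http://exchange.inbia.org/network/findacompany/"


def profile_url(extension):
    """Turn a saved extension such as '/?c=companyprofile&amp;UserKey=...' into a full URL."""
    # The saved HTML encodes '&' as '&amp;'; decode it back to a plain '&'.
    decoded = html.unescape(extension.strip())
    # Drop the leading slash so the extension attaches to the directory path.
    return BASE_URL + decoded.lstrip("/")


extension = "/?c=companyprofile&amp;UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa"
print(profile_url(extension))
```

Fetching and parsing the resulting page is left out here, since the layout of the company profile pages is not described on this page.<br />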
<br />
We can then extract the contact information (including the URL) and the people listed, using either Beautiful Soup or regular expressions.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=25049Incubator Seed Data2019-04-03T15:31:31Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
<br />
Requirement: Determine at least 4 primary data sources, or secure licenses to extract ‘seed data’ from these sources, as measured by program records.<br />
<br />
Status: We have identified [[Crunchbase Database|Crunchbase]] as one structured source for incubators, and we have a license for Crunchbase Pro. We are currently evaluating other sources, as described on this page. Given the paucity of strong sources, we will likely use a custom Google crawler (searching "incubator cityname" and similar) as a source. We will also be creating a new [[VentureXpert Database]] using data drawn from SDC Platinum, so that we have a source of information on venture capital backed startup firms.<br />
<br />
==Goal==<br />
<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified. A data source that provides links to a large number of incubators or in-depth descriptions could therefore still be viable.<br />
<br />
<br />
==Evaluation of Sources from Specific Google Searches==<br />
<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g., Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within INBIA; on that page there is a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for a web crawler, as each link leads to another page within INBIA, and a link on that page leads to the company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Data is often missing from the company's separate page, including the URL to the company's website. The description is often not detailed enough to determine the category for the economic organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|-<br />
| [https://thembaisdead.com/list-of-startup-accelerators-and-incubators/ The MBA Is Dead ]<br />
| <br />
* Opened source link. <br />
* Selected "Region" >> "US & Canada"<br />
| 186 Results<br />
| <br />
* Click on each accelerator/incubator to get data<br />
* City and Country<br />
* low equity, high offer, high value<br />
* high equity, low offer, low value<br />
* link to company homepage<br />
* categories of companies it accelerates/incubates<br />
| Can search by region or by category of companies<br />
| Seems to be a lot of data on accelerators and fewer incubators included<br />
|}<br />
<br />
== [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable ==<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [http://www.acceleratorinfo.com/see-all.html Accelerator Info]<br />
| <br />
* Opened source link<br />
* Copied first column (“All Startup Support Programs”) into excel (215)<br />
* Copied second column (“All University Programs”) into excel (249)<br />
| 464<br />
| Each link on parent list leads to individual '''home page url''' of organization<br />
| Reliable links, includes university supported programs<br />
| Mixed information on incubators and accelerators. Some of the university supported programs may not be considered either an incubator or an accelerator<br />
|-<br />
| [https://www.galidata.org/accelerators/directory/?keyword=&region=north_america Galidata]<br />
| Filter by Region: North America<br />
| 164<br />
| <br />
* Company Name<br />
* Link to homepage<br />
* Location <br />
* Short Description<br />
| reliable links directly to homepage of companies, can search within regions<br />
| Mix of incubators and accelerators. Can only filter region to North America<br />
|-<br />
| [[:Crunchbase Database]]<br />
| See the [[Crunchbase Database]] project page for more information.<br />
| <br />
|<br />
|<br />
| <br />
|}<br />
<br />
<br />
<br />
==Region Specific Incubator Sources==<br />
Many state and local governments publish information on incubators and accelerators that operate within their jurisdiction. They do not provide comprehensive sources on all incubators within the US but could be helpful as sources to cross-reference with a larger database. <br />
<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database as some of the incubators listed on the state specific websites are not in the main NBIA database. They could also be helpful in cross-referencing data pulled from other main databases as these sites have reliable links, are filtered to include only incubators, and have a relatively consistent format. <br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Region<br />
! Data<br />
! Benefits<br />
! Limitations<br />
<br />
|-<br />
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]<br />
| Opened source link and counted incubators listed on the home page<br />
| 12<br />
| Alabama<br />
| Incubator Name, Brief Description, and a link to the home page<br />
| Reliable links that are filtered to include only incubators<br />
| only contains information on incubators in Alabama that are associated with NBIA<br />
|-<br />
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]<br />
| Opened source link and then opened links for each of the four regions in Florida<br />
| 66<br />
| Florida<br />
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page<br />
| Provides reliable links. Filtered to include only information on incubators<br />
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.<br />
|-<br />
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]<br />
| Opened source link. Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
| 28<br />
| Louisiana<br />
| <br />
* incubator name<br />
* contact name<br />
* address and phone number<br />
* link to website<br />
| data is filtered to include only incubators, links are reliable<br />
| only incubators in state of Louisiana, limited data set<br />
|-<br />
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]<br />
| Opened source link and counted number of incubators listed on the page<br />
| 35<br />
| Maryland<br />
| Main site contains incubator name, short description, and link to another page within main site which contains a link to the incubator home page<br />
| Reliable links, filtered to include only incubators<br />
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.<br />
|-<br />
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]<br />
| Open source link and count number of incubators listed on the page<br />
| 20<br />
| Massachusetts<br />
| incubator name, short description, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
|[https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/ Boston Startup Guide]<br />
| Scrolled down to the section labeled "Startup incubators in Boston"<br />
| 10<br />
| Boston<br />
|<br />
*Company Name and URL<br />
* Capital Provided & equity taken<br />
* Application Process<br />
| reliable links<br />
| relatively unformatted data that would be challenging to use. Limited in scope<br />
|-<br />
| [https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA Michigan Business Innovation Association]<br />
| Open source link and count number of incubators listed in the column next to the map<br />
| 15<br />
| Michigan<br />
| incubator name, address, link to location on map, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|- <br />
| [https://livefreeandstart.com/resources/incubators-makerspaces/ NH Tech Alliance ]<br />
| Open source link and count organizations listed under "NHBIN Member Locations"<br />
| 8<br />
| New Hampshire<br />
| incubator name, town within NH, brief description, and link to home page <br />
| reliable links, only data on incubators<br />
| limited dataset, not very structured organization on website<br />
|-<br />
| [http://www.ncincubation.org/NCIncubators.aspx NC Business Incubation Association]<br />
| Open source link, click on each county and count the number of business incubators<br />
| 32<br />
| North Carolina<br />
| Incubator name, address, program directors, and link<br />
| only data on incubators<br />
| limited dataset, hard to navigate site with web crawler, some of the incubators do not have links<br />
|-<br />
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]<br />
| Open source link and count the number of incubators<br />
| 29<br />
| Oklahoma<br />
| Incubator name and link to it<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
| [https://dmped.dc.gov/page/incubators-accelerators-and-co-working-spaces Incubators/Accelerators In DC]<br />
| Open source link and count the number of incubators (co-working spaces were not included)<br />
| 15<br />
| DC<br />
| Incubator name and link to it and brief description<br />
| reliable links, helpful description<br />
| limited dataset, mix of incubators and other organizations<br />
<br />
|}<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
==[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable==<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered/messy, does not provide links to incubator websites and doesn't include enough information for evaluation without incubator url<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, the link will not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
*'''Source:''' https://www.gan.co/engage/accelerators/<br />
:* ''Reason'': does not include information on incubators<br />
:*''Learn More'': https://www.brookings.edu/research/accelerating-growth-startup-accelerator-programs-in-the-united-states/<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Projects&diff=25048Projects2019-04-03T15:30:52Z<p>AnneFreeman: </p>
<hr />
<div>==Main Project Tree==<br />
<br />
The list below shows the main project tree with the current active project assignments.<br />
<br />
#[[Ecosystem Organization Classifier]]<br />
##[[Defining Incubators]] -- {{#show: Defining Incubators | ?Has owner}} ({{#show: Defining Incubators | ?Has project status}})<br />
###[[Formulate baseline attributes]] -- {{#show: Formulate baseline attributes | ?Has owner}} ({{#show: Formulate baseline attributes | ?Has project status}})<br />
##[[Incubator Seed Data]] -- {{#show: Incubator Seed Data | ?Has owner}} ({{#show: Incubator Seed Data | ?Has project status}})<br />
###[[Crunchbase Database]] -- {{#show: Crunchbase Database | ?Has owner}} ({{#show: Crunchbase Database | ?Has project status}})<br />
###[[INBIA]] -- {{#show: INBIA | ?Has owner}} ({{#show: INBIA | ?Has project status}})<br />
##[[Incubators in Five Ecosystems]] -- {{#show: Incubators in Five Ecosystems | ?Has owner}} ({{#show: Incubators in Five Ecosystems | ?Has project status}})<br />
###[[Ecosystem: Austin or Houston]] -- {{#show: Ecosystem: Austin or Houston | ?Has owner}} ({{#show: Ecosystem: Austin or Houston | ?Has project status}})<br />
###[[Ecosystem: Burlington VT]] -- {{#show: Ecosystem: Burlington VT | ?Has owner}} ({{#show: Ecosystem: Burlington VT | ?Has project status}})<br />
###[[Ecosystem: Denver CO]] -- {{#show: Ecosystem: Denver CO | ?Has owner}} ({{#show: Ecosystem: Denver CO | ?Has project status}})<br />
###[[Ecosystem: Washington DC]] -- {{#show: Ecosystem: Washington DC | ?Has owner}} ({{#show: Ecosystem: Washington DC | ?Has project status}})<br />
###[[Ecosystem: Twin Cities MN]] -- {{#show: Ecosystem: Twin Cities MN | ?Has owner}} ({{#show: Ecosystem: Twin Cities MN | ?Has project status}})<br />
#[[Listing Page Classifier]] -- {{#show: Listing Page Classifier | ?Has owner}} ({{#show: Listing Page Classifier | ?Has project status}})<br />
#[[Listing Page Extractor]]<br />
##[[Domain Specific Language Research]] -- {{#show: Domain Specific Language Research | ?Has owner}} ({{#show: Domain Specific Language Research | ?Has project status}})<br />
##[[Listing Page Plugin Spec]] -- {{#show: Listing Page Plugin Spec | ?Has owner}} ({{#show: Listing Page Plugin Spec | ?Has project status}})<br />
##[[LP Extractor Protocol]] -- {{#show: LP Extractor Protocol | ?Has owner}} ({{#show: LP Extractor Protocol | ?Has project status}})<br />
<br />
==List of All Projects==<br />
<br />
[[category:Internal]]<br />
Information on each project may be found in the table below. All projects are included in [[:Category:Project]] if they use [[Template:Project]]. To create or edit a project, please use [[Form: Project]]<br />
<br />
{{#ask: <br />
[[Category:Project]] <br />
[[Has project status::Active]] <br />
| format=count<br />
| intro=<strong>Data summary: There are </strong><br />
| outro=<strong> active projects found.</strong><br />
}}<br />
<br />
{{#ask:<br />
[[Category:Project]]<br />
[[Has project status::Active]] <br />
|mainlabel=Project<br />
|?Has owner=Owner<br />
|?Does subsume=Subsumes<br />
|format=table<br />
}}<br />
<br />
Researchers may also wish to review related precursor projects done at the [[McNair Projects|McNair Center]].</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=INBIA&diff=25047INBIA2019-04-03T15:30:41Z<p>AnneFreeman: </p>
<hr />
<div>{{McNair Projects<br />
|Has title=INBIA<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Depends upon it=Incubator Seed Data<br />
|Does subsume=Incubator Seed Data, Ecosystem Organization Classifier,<br />
}}<br />
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] containing information on 415 incubators in the United States.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=INBIA&diff=25046INBIA2019-04-03T15:29:27Z<p>AnneFreeman: Created page with "{{McNair Projects |Has title=INBIA |Has owner=Anne Freeman, |Depends upon it=Incubator Seed Data |Does subsume=Incubator Seed Data, Ecosystem Organization Classifier, }} The [..."</p>
<hr />
<div>{{McNair Projects<br />
|Has title=INBIA<br />
|Has owner=Anne Freeman,<br />
|Depends upon it=Incubator Seed Data<br />
|Does subsume=Incubator Seed Data, Ecosystem Organization Classifier,<br />
}}<br />
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] containing information on 415 incubators in the United States.</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=25024Incubator Seed Data2019-04-01T16:42:41Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
<br />
Requirement: Determine at least 4 primary data sources, or secure licenses to extract ‘seed data’ from these sources, as measured by program records.<br />
<br />
Status: We have identified [[Crunchbase Database|Crunchbase]] as one structured source for incubators, and we have a license for Crunchbase Pro. We are currently evaluating other sources, as described on this page. Given the paucity of strong sources, we will likely use a custom Google crawler (searching "incubator cityname" and similar) as a source. We will also be creating a new [[VentureXpert Database]] using data drawn from SDC Platinum, so that we have a source of information on venture capital backed startup firms.<br />
<br />
==Goal==<br />
<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified. A data source that provides links to a large number of incubators or in-depth descriptions could therefore still be viable.<br />
<br />
<br />
==Evaluation of Sources from Specific Google Searches==<br />
<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g., Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within INBIA; on that page there is a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for a web crawler, as each link leads to another page within INBIA, and a link on that page leads to the company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Data is often missing from the company's separate page, including the URL to the company's website. The description is often not detailed enough to determine the category for the economic organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|-<br />
| [https://thembaisdead.com/list-of-startup-accelerators-and-incubators/ The MBA Is Dead ]<br />
| <br />
* Opened source link. <br />
* Selected "Region" >> "US & Canada"<br />
| 186 Results<br />
| <br />
* Click on each accelerator/incubator to get data<br />
* City and Country<br />
* low equity, high offer, high value<br />
* high equity, low offer, low value<br />
* link to company homepage<br />
* categories of companies it accelerates/incubates<br />
| Can search by region or by category of companies<br />
| Seems to be a lot of data on accelerators and fewer incubators included<br />
|}<br />
<br />
<br />
== [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable ==<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [http://www.acceleratorinfo.com/see-all.html Accelerator Info]<br />
| <br />
* Opened source link<br />
* Copied first column (“All Startup Support Programs”) into excel (215)<br />
* Copied second column (“All University Programs”) into excel (249)<br />
| 464<br />
| Each link on parent list leads to individual '''home page url''' of organization<br />
| Reliable links, includes university supported programs<br />
| Mixed information on incubators and accelerators. Some of the university supported programs may not be considered either an incubator or an accelerator<br />
|-<br />
| [https://www.galidata.org/accelerators/directory/?keyword=&region=north_america Galidata]<br />
| Filter by Region: North America<br />
| 164<br />
| <br />
* Company Name<br />
* Link to homepage<br />
* Location <br />
* Short Description<br />
| reliable links directly to homepage of companies, can search within regions<br />
| Mix of incubators and accelerators. Can only filter region to North America<br />
|-<br />
| [[:Crunchbase Database]]<br />
| See the [[Crunchbase Database]] project page for more information.<br />
| <br />
|<br />
|<br />
| <br />
|}<br />
<br />
<br />
<br />
==Region Specific Incubator Sources==<br />
Many state and local governments publish information on incubators and accelerators that operate within their jurisdiction. They do not provide comprehensive sources on all incubators within the US but could be helpful as sources to cross-reference with a larger database. <br />
<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database as some of the incubators listed on the state specific websites are not in the main NBIA database. They could also be helpful in cross-referencing data pulled from other main databases as these sites have reliable links, are filtered to include only incubators, and have a relatively consistent format. <br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Region<br />
! Data<br />
! Benefits<br />
! Limitations<br />
<br />
|-<br />
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]<br />
| Opened source link and counted incubators listed on the home page<br />
| 12<br />
| Alabama<br />
| Incubator Name, Brief Description, and a link to the home page<br />
| Reliable links that are filtered to include only incubators<br />
| only contains information on incubators in Alabama that are associated with NBIA<br />
|-<br />
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]<br />
| Opened source link and then opened links for each of the four regions in Florida<br />
| 66<br />
| Florida<br />
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page<br />
| Provides reliable links. Filtered to include only information on incubators<br />
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.<br />
|-<br />
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]<br />
| Opened source link. Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
| 28<br />
| Louisiana<br />
| <br />
* incubator name<br />
* contact name<br />
* address and phone number<br />
* link to website<br />
| data is filtered to include only incubators, links are reliable<br />
| only incubators in state of Louisiana, limited data set<br />
|-<br />
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]<br />
| Opened source link and counted number of incubators listed on the page<br />
| 35<br />
| Maryland<br />
| Main site contains incubator name, short description, and link to another page within main site which contains a link to the incubator home page<br />
| Reliable links, filtered to include only incubators<br />
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.<br />
|-<br />
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]<br />
| Open source link and count number of incubators listed on the page<br />
| 20<br />
| Massachusetts<br />
| incubator name, short description, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
|[https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/ Boston Startup Guide]<br />
| Scrolled down to the section labeled "Startup incubators in Boston"<br />
| 10<br />
| Boston<br />
|<br />
*Company Name and URL<br />
* Capital Provided & equity taken<br />
* Application Process<br />
| reliable links<br />
| relatively unformatted data that would be challenging to use. Limited in scope<br />
|-<br />
| [https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA Michigan Business Innovation Association]<br />
| Open source link and count number of incubators listed in the column next to the map<br />
| 15<br />
| Michigan<br />
| incubator name, address, link to location on map, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|- <br />
| [https://livefreeandstart.com/resources/incubators-makerspaces/ NH Tech Alliance ]<br />
| Open source link and count organizations listed under "NHBIN Member Locations"<br />
| 8<br />
| New Hampshire<br />
| incubator name, town within NH, brief description, and link to home page <br />
| reliable links, only data on incubators<br />
| limited dataset, not very structured organization on website<br />
|-<br />
| [http://www.ncincubation.org/NCIncubators.aspx NC Business Incubation Association]<br />
| Open source link, click on each county and count the number of business incubators<br />
| 32<br />
| North Carolina<br />
| Incubator name, address, program directors, and link<br />
| only data on incubators<br />
| limited dataset, hard to navigate site with web crawler, some of the incubators do not have links<br />
|-<br />
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]<br />
| Open source link and count the number of incubators<br />
| 29<br />
| Oklahoma<br />
| Incubator name and link to it<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
| [https://dmped.dc.gov/page/incubators-accelerators-and-co-working-spaces Incubators/Accelerators In DC]<br />
| Open source link and count the number of incubators; co-working spaces were not included<br />
| 15<br />
| DC<br />
| Incubator name and link to it and brief description<br />
| reliable links, helpful description<br />
| limited dataset, mix of incubators and other organizations<br />
<br />
|}<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
==[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable==<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered/messy, does not provide links to incubator websites and doesn't include enough information for evaluation without incubator url<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, the link will not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
*'''Source:''' https://www.gan.co/engage/accelerators/<br />
:* ''Reason'': does not include information on incubators<br />
:*''Learn More'': https://www.brookings.edu/research/accelerating-growth-startup-accelerator-programs-in-the-united-states/<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=25023Incubator Seed Data2019-04-01T16:40:35Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
<br />
Requirement: Determine at least 4 primary data sources, or secure licenses to extract ‘seed data’ from these sources, as measured by program records.<br />
<br />
Status: We have identified [[Crunchbase Database|Crunchbase]] as one structured source for incubators, and we have a license for Crunchbase Pro. We are currently evaluating other sources, as described on this page. Given the paucity of strong sources, we will likely use a custom Google crawler (searching "incubator cityname" and similar) as a source. We will also be creating a new [[VentureXpert Database]] using data drawn from SDC Platinum, so that we have a source of information on venture capital backed startup firms.<br />
<br />
==Goal==<br />
<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified. A data source that provides links to a large number of incubators, or in-depth descriptions of them, could therefore still be viable.<br />
<br />
<br />
==Evaluation of Sources from Specific Google Searches==<br />
<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g., Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within inbia; on that page there is a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for web crawler as link connects to another page within inbia and then link on that page connects to company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Data is often missing from the separate page for the company, including the URL to the company's website. The description is often not detailed enough to determine the category for the economic organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|-<br />
| [https://thembaisdead.com/list-of-startup-accelerators-and-incubators/ The MBA Is Dead ]<br />
| <br />
* Opened source link. <br />
* Selected "Region" >> "US & Canada"<br />
| 186 Results<br />
| <br />
* Click on each accelerator/incubator to get data<br />
* City and Country<br />
* low equity, high offer, high value<br />
* high equity, low offer, low value<br />
* link to company homepage<br />
* categories of companies it accelerates/incubates<br />
| Can search by region or by category of companies<br />
| Seems to be a lot of data on accelerators and fewer incubators included<br />
|}<br />
<br />
==Region Specific Incubator Sources==<br />
<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database as some of the incubators listed on the state specific websites are not in the main NBIA database. They could also be helpful in cross-referencing data pulled from other main databases as these sites have reliable links, are filtered to include only incubators, and have a relatively consistent format. However, the sites are limited in scope to a specific state/region and cannot be used as a sole data source. Additionally, many state/local governments have lists of incubators on their websites that could be useful for identifying local organizations within the region.<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Region<br />
! Data<br />
! Benefits<br />
! Limitations<br />
<br />
|-<br />
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]<br />
| Opened source link and counted incubators listed on the home page<br />
| 12<br />
| Alabama<br />
| Incubator Name, Brief Description, and a link to the home page<br />
| Reliable links that are filtered to include only incubators<br />
| only contains information on incubators in Alabama that are associated with NBIA<br />
|-<br />
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]<br />
| Opened source link and then opened links for each of the four regions in Florida<br />
| 66<br />
| Florida<br />
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page<br />
| Provides reliable links. Filtered to include only information on incubators<br />
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.<br />
|-<br />
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]<br />
| Opened source link. Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
| 28<br />
| Louisiana<br />
| <br />
* incubator name<br />
* contact name<br />
* address and phone number<br />
* link to website<br />
| data is filtered to include only incubators, links are reliable<br />
| only incubators in state of Louisiana, limited data set<br />
|-<br />
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]<br />
| Opened source link and counted number of incubators listed on the page<br />
| 35<br />
| Maryland<br />
| Main site contains incubator name, short description, and link to another page within main site which contains a link to the incubator home page<br />
| Reliable links, filtered to include only incubators<br />
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.<br />
|-<br />
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]<br />
| Open source link and count number of incubators listed on the page<br />
| 20<br />
| Massachusetts<br />
| incubator name, short description, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
|[https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/ Boston Startup Guide]<br />
| Scrolled down to the section labeled "Startup incubators in Boston"<br />
| 10<br />
| Boston<br />
|<br />
*Company Name and URL<br />
* Capital Provided & equity taken<br />
* Application Process<br />
| reliable links<br />
| relatively unformatted data that would be challenging to use. Limited in scope<br />
|-<br />
| [https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA Michigan Business Innovation Association]<br />
| Open source link and count number of incubators listed in the column next to the map<br />
| 15<br />
| Michigan<br />
| incubator name, address, link to location on map, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|- <br />
| [https://livefreeandstart.com/resources/incubators-makerspaces/ NH Tech Alliance ]<br />
| Open source link and count organizations listed under "NHBIN Member Locations"<br />
| 8<br />
| New Hampshire<br />
| incubator name, town within NH, brief description, and link to home page <br />
| reliable links, only data on incubators<br />
| limited dataset, not very structured organization on website<br />
|-<br />
| [http://www.ncincubation.org/NCIncubators.aspx NC Business Incubation Association]<br />
| Open source link, click on each county and count the number of business incubators<br />
| 32<br />
| North Carolina<br />
| Incubator name, address, program directors, and link<br />
| only data on incubators<br />
| limited dataset, hard to navigate site with web crawler, some of the incubators do not have links<br />
|-<br />
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]<br />
| Open source link and count the number of incubators<br />
| 29<br />
| Oklahoma<br />
| Incubator name and link to it<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
| [https://dmped.dc.gov/page/incubators-accelerators-and-co-working-spaces Incubators/Accelerators In DC]<br />
| Open source link and count the number of incubators; co-working spaces were not included<br />
| 15<br />
| DC<br />
| Incubator name and link to it and brief description<br />
| reliable links, helpful description<br />
| limited dataset, mix of incubators and other organizations<br />
<br />
|}<br />
<br />
<br />
== [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable ==<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [http://www.acceleratorinfo.com/see-all.html Accelerator Info]<br />
| <br />
* Opened source link<br />
* Copied first column (“All Startup Support Programs”) into excel (215)<br />
* Copied second column (“All University Programs”) into excel (249)<br />
| 464<br />
| Each link on parent list leads to individual '''home page url''' of organization<br />
| Reliable links, includes university supported programs<br />
| Mixed information on incubators and accelerators. Some of the university supported programs may not be considered either an incubator or an accelerator<br />
|-<br />
| [https://www.galidata.org/accelerators/directory/?keyword=&region=north_america Galidata]<br />
| Filter by Region: North America<br />
| 164<br />
| <br />
* Company Name<br />
* Link to homepage<br />
* Location <br />
* Short Description<br />
| reliable links directly to homepage of companies, can search within regions<br />
| Mix of incubators and accelerators. Can only filter region to North America<br />
|-<br />
| [[:Crunchbase Database]]<br />
| See the [[Crunchbase Database]] project page for more information.<br />
| <br />
|<br />
|<br />
| <br />
|}<br />
<br />
<br />
<br />
<br />
==[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable==<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered/messy, does not provide links to incubator websites and doesn't include enough information for evaluation without incubator url<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, the link will not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
*'''Source:''' https://www.gan.co/engage/accelerators/<br />
:* ''Reason'': does not include information on incubators<br />
:*''Learn More'': https://www.brookings.edu/research/accelerating-growth-startup-accelerator-programs-in-the-united-states/<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=25022Incubator Seed Data2019-04-01T16:37:37Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
<br />
Requirement: Determine at least 4 primary data sources, or secure licenses to extract ‘seed data’ from these sources, as measured by program records.<br />
<br />
Status: We have identified [[Crunchbase Database|Crunchbase]] as one structured source for incubators, and we have a license for Crunchbase Pro. We are currently evaluating other sources, as described on this page. Given the paucity of strong sources, we will likely use a custom Google crawler (searching "incubator cityname" and similar) as a source. We will also be creating a new [[VentureXpert Database]] using data drawn from SDC Platinum, so that we have a source of information on venture capital backed startup firms.<br />
<br />
==Goal==<br />
<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified. A data source that provides links to a large number of incubators, or in-depth descriptions of them, could therefore still be viable.<br />
<br />
<br />
==Evaluation of Sources from Specific Google Searches==<br />
<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g., Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within inbia; on that page there is a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for web crawler as link connects to another page within inbia and then link on that page connects to company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Data is often missing from the separate page for the company, including the URL to the company's website. The description is often not detailed enough to determine the category for the economic organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|-<br />
| [https://thembaisdead.com/list-of-startup-accelerators-and-incubators/ The MBA Is Dead ]<br />
| <br />
* Opened source link. <br />
* Selected "Region" >> "US & Canada"<br />
| 186 Results<br />
| <br />
* Click on each accelerator/incubator to get data<br />
* City and Country<br />
* low equity, high offer, high value<br />
* high equity, low offer, low value<br />
* link to company homepage<br />
* categories of companies it accelerates/incubates<br />
| Can search by region or by category of companies<br />
| Seems to be a lot of data on accelerators and fewer incubators included<br />
|}<br />
<br />
==Region Specific Incubator Sources==<br />
<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database as some of the incubators listed on the state specific websites are not in the main NBIA database. They could also be helpful in cross-referencing data pulled from other main databases as these sites have reliable links, are filtered to include only incubators, and have a relatively consistent format. However, the sites are limited in scope to a specific state/region and cannot be used as a sole data source. Additionally, many state/local governments have lists of incubators on their websites that could be useful for identifying local organizations within the region.<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Region<br />
! Data<br />
! Benefits<br />
! Limitations<br />
<br />
|-<br />
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]<br />
| Opened source link and counted incubators listed on the home page<br />
| 12<br />
| Alabama<br />
| Incubator Name, Brief Description, and a link to the home page<br />
| Reliable links that are filtered to include only incubators<br />
| only contains information on incubators in Alabama that are associated with NBIA<br />
|-<br />
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]<br />
| Opened source link and then opened links for each of the four regions in Florida<br />
| 66<br />
| Florida<br />
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page<br />
| Provides reliable links. Filtered to include only information on incubators<br />
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.<br />
|-<br />
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]<br />
| Opened source link. Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
| 28<br />
| Louisiana<br />
| <br />
* incubator name<br />
* contact name<br />
* address and phone number<br />
* link to website<br />
| data is filtered to include only incubators, links are reliable<br />
| only incubators in state of Louisiana, limited data set<br />
|-<br />
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]<br />
| Opened source link and counted number of incubators listed on the page<br />
| 35<br />
| Maryland<br />
| Main site contains incubator name, short description, and link to another page within main site which contains a link to the incubator home page<br />
| Reliable links, filtered to include only incubators<br />
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.<br />
|-<br />
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]<br />
| Open source link and count number of incubators listed on the page<br />
| 20<br />
| Massachusetts<br />
| incubator name, short description, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
|[https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/ Boston Startup Guide]<br />
| Scrolled down to the section labeled "Startup incubators in Boston"<br />
| 10<br />
| Boston<br />
|<br />
*Company Name and URL<br />
* Capital Provided & equity taken<br />
* Application Process<br />
| reliable links<br />
| relatively unformatted data that would be challenging to use. Limited in scope<br />
|-<br />
| [https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA Michigan Business Innovation Association]<br />
| Open source link and count number of incubators listed in the column next to the map<br />
| 15<br />
| Michigan<br />
| incubator name, address, link to location on map, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|- <br />
| [https://livefreeandstart.com/resources/incubators-makerspaces/ NH Tech Alliance ]<br />
| Open source link and count organizations listed under "NHBIN Member Locations"<br />
| 8<br />
| New Hampshire<br />
| incubator name, town within NH, brief description, and link to home page <br />
| reliable links, only data on incubators<br />
| limited dataset, not very structured organization on website<br />
|-<br />
| [http://www.ncincubation.org/NCIncubators.aspx NC Business Incubation Association]<br />
| Open source link, click on each county and count the number of business incubators<br />
| 32<br />
| North Carolina<br />
| Incubator name, address, program directors, and link<br />
| only data on incubators<br />
| limited dataset, hard to navigate site with web crawler, some of the incubators do not have links<br />
|-<br />
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]<br />
| Open source link and count the number of incubators<br />
| 29<br />
| Oklahoma<br />
| Incubator name and link to it<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
| [https://dmped.dc.gov/page/incubators-accelerators-and-co-working-spaces Incubators/Accelerators In DC]<br />
| Open source link and count the number of incubators; co-working spaces were not included<br />
| 15<br />
| DC<br />
| Incubator name and link to it and brief description<br />
| reliable links, helpful description<br />
| limited dataset, mix of incubators and other organizations<br />
<br />
|}<br />
<br />
<br />
== [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable ==<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [http://www.acceleratorinfo.com/see-all.html Accelerator Info]<br />
| <br />
* Opened source link<br />
* Copied first column (“All Startup Support Programs”) into excel (215)<br />
* Copied second column (“All University Programs”) into excel (249)<br />
| 464<br />
| Each link on parent list leads to individual '''home page url''' of organization<br />
| Reliable links, includes university supported programs<br />
| Mixed information on incubators and accelerators. Some of the university supported programs may not be considered either an incubator or an accelerator<br />
|-<br />
| [https://www.galidata.org/accelerators/directory/?keyword=dc&region=north_america Galidata]<br />
| Filter by Region: North America<br />
| how many<br />
| <br />
|<br />
|<br />
|-<br />
| [[:Crunchbase Database]]<br />
| See the [[Crunchbase Database]] project page for more information.<br />
| <br />
|<br />
|<br />
|<br />
|}<br />
<br />
<br />
<br />
<br />
==[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable==<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered/messy, does not provide links to incubator websites and doesn't include enough information for evaluation without incubator url<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, the link will not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
*'''Source:''' https://www.gan.co/engage/accelerators/<br />
:* ''Reason'': does not include information on incubators<br />
:*''Learn More'': https://www.brookings.edu/research/accelerating-growth-startup-accelerator-programs-in-the-united-states/<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=25021Incubator Seed Data2019-04-01T16:36:58Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
<br />
Requirement: Determine at least 4 primary data sources, or secure licenses to extract ‘seed data’ from these sources, as measured by program records.<br />
<br />
Status: We have identified [[Crunchbase Database|Crunchbase]] as one structured source for incubators, and we have a license for Crunchbase Pro. We are currently evaluating other sources, as described on this page. Given the paucity of strong sources, we will likely use a custom Google crawler (searching "incubator cityname" and similar) as a source. We will also be creating a new [[VentureXpert Database]] using data drawn from SDC Platinum, so that we have a source of information on venture capital backed startup firms.<br />
<br />
==Goal==<br />
<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified; a data source that provides links to a large number of incubators, or in-depth descriptions of them, could therefore still be viable.<br />
<br />
<br />
==Evaluation of Sources from Specific Google Searches==<br />
<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g., Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within InBIA; that page contains a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for a web crawler: each link leads to another page within InBIA, and a link on that page leads to the company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Data is often missing from the company's separate page, including the URL of the company's website. The description is often not detailed enough to determine the category of the organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|-<br />
| [https://thembaisdead.com/list-of-startup-accelerators-and-incubators/ The MBA Is Dead ]<br />
| <br />
* Opened source link. <br />
* Selected "Region" >> "US & Canada"<br />
| 186 Results<br />
| <br />
* Click on each accelerator/incubator to get data<br />
* City and Country<br />
* low equity, high offer, high value<br />
* high equity, low offer, low value<br />
* link to company homepage<br />
* categories of companies it accelerates/incubates<br />
| Can search by region or by category of companies<br />
| Appears to cover mostly accelerators, with fewer incubators included<br />
|}<br />
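<br />
Several of the directories above (the NBIA database and Clustermapping) do not link straight to an incubator's homepage: the listing links to an internal profile page, and only that page carries the external link. The following is a minimal sketch of that two-hop lookup, not the project's crawler. The names <code>extract_links</code>, <code>resolve_homepages</code> and the <code>fetch</code> callable are hypothetical, and the sketch assumes that every same-host link on the listing page is a profile page, which a real directory with navigation links would violate.<br />

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, base_url):
    """Return all links in `html` as absolute URLs."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, href) for href in collector.hrefs]


def resolve_homepages(directory_url, fetch):
    """Follow directory -> internal profile page -> external homepage.

    `fetch` is any callable mapping a URL to its HTML text (an assumption
    of this sketch, e.g. a wrapper around urllib.request.urlopen).
    Returns {profile_url: homepage_url or None}.
    """
    directory_host = urlparse(directory_url).netloc
    results = {}
    for profile_url in extract_links(fetch(directory_url), directory_url):
        if urlparse(profile_url).netloc != directory_host:
            continue  # hop 1 only follows links that stay inside the directory
        external = [
            link
            for link in extract_links(fetch(profile_url), profile_url)
            if urlparse(link).netloc not in ("", directory_host)
        ]
        # A profile page with no outbound link is recorded as missing.
        results[profile_url] = external[0] if external else None
    return results
```

Recording <code>None</code> for profiles without an outbound link keeps the "missing link" cases noted in the Limitations column visible instead of silently dropping them.<br />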
<br />
==Region Specific Incubator Sources==<br />
<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database as some of the incubators listed on the state specific websites are not in the main NBIA database. They could also be helpful in cross-referencing data pulled from other main databases as these sites have reliable links, are filtered to include only incubators, and have a relatively consistent format. However, the sites are limited in scope to a specific state/region and cannot be used as a sole data source. Additionally, many state/local governments have lists of incubators on their websites that could be useful for identifying local organizations within the region.<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Region<br />
! Data<br />
! Benefits<br />
! Limitations<br />
<br />
|-<br />
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]<br />
| Opened source link and counted incubators listed on the home page<br />
| 12<br />
| Alabama<br />
| Incubator Name, Brief Description, and a link to the home page<br />
| Reliable links that are filtered to include only incubators<br />
| only contains information on incubators in Alabama that are associated with NBIA<br />
|-<br />
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]<br />
| Opened source link and then opened links for each of the four regions in Florida<br />
| 66<br />
| Florida<br />
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page<br />
| Provides reliable links. Filtered to include only information on incubators<br />
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.<br />
|-<br />
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]<br />
| Opened source link. Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
| 28<br />
| Louisiana<br />
| <br />
* incubator name<br />
* contact name<br />
* address and phone number<br />
* link to website<br />
| data is filtered to include only incubators, links are reliable<br />
| only incubators in state of Louisiana, limited data set<br />
|-<br />
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]<br />
| Opened source link and counted number of incubators listed on the page<br />
| 35<br />
| Maryland<br />
| Main site contains incubator name, short description, and a link to another page within the main site, which contains a link to the incubator home page<br />
| Reliable links, filtered to include only incubators<br />
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.<br />
|-<br />
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]<br />
| Open source link and count number of incubators listed on the page<br />
| 20<br />
| Massachusetts<br />
| incubator name, short description, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
|[https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/ Boston Startup Guide]<br />
| Scrolled down to the section labeled "Startup incubators in Boston"<br />
| 10<br />
| Boston<br />
|<br />
*Company Name and URL<br />
* Capital Provided & equity taken<br />
* Application Process<br />
| reliable links<br />
| relatively unformatted data that would be challenging to use. Limited in scope<br />
|-<br />
| [https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA Michigan Business Innovation Association]<br />
| Open source link and count number of incubators listed in the column next to the map<br />
| 15<br />
| Michigan<br />
| incubator name, address, link to location on map, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|- <br />
| [https://livefreeandstart.com/resources/incubators-makerspaces/ NH Tech Alliance ]<br />
| Open source link and count organizations listed under "NHBIN Member Locations"<br />
| 8<br />
| New Hampshire<br />
| incubator name, town within NH, brief description, and link to home page <br />
| reliable links, only data on incubators<br />
| limited dataset, not very structured organization on website<br />
|-<br />
| [http://www.ncincubation.org/NCIncubators.aspx NC Business Incubation Association]<br />
| Open source link, click on each county and count the number of business incubators<br />
| 32<br />
| North Carolina<br />
| Incubator name, address, program directors, and link<br />
| only data on incubators<br />
| limited dataset, hard to navigate site with web crawler, some of the incubators do not have links<br />
|-<br />
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]<br />
| Open source link and count the number of incubators<br />
| 29<br />
| Oklahoma<br />
| Incubator name and link to it<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
| [https://dmped.dc.gov/page/incubators-accelerators-and-co-working-spaces Incubators/Accelerators In DC]<br />
| Open source link and count the number of incubators; I did not include co-working spaces<br />
| 15<br />
| DC<br />
| Incubator name and link to it and brief description<br />
| reliable links, helpful description<br />
| limited dataset, mix of incubators and other organizations<br />
<br />
|}<br />
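<br />
The cross-referencing described at the top of this section (finding incubators that appear on a state association's list but not in the main NBIA database) could be sketched as a comparison of website hosts. This is an illustration under stated assumptions, not an implemented step of the project: <code>normalize_host</code> and <code>missing_from_main</code> are hypothetical names, and matching on host alone would conflate incubators that share a domain, such as programs hosted as pages of one university site.<br />

```python
from urllib.parse import urlparse


def normalize_host(url):
    """Reduce a URL to a lowercase host without a leading 'www.'."""
    # Prepend '//' when the scheme is missing so urlparse still finds the host.
    parsed = urlparse(url if "://" in url else "//" + url)
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def missing_from_main(state_urls, main_urls):
    """Return state-list URLs whose host does not appear in the main list."""
    main_hosts = {normalize_host(u) for u in main_urls}
    return [u for u in state_urls if normalize_host(u) not in main_hosts]
```

The URLs returned are the candidates a state list would add beyond the main database; entries without any link (as on the North Carolina site) would need to be matched by name instead.<br />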
<br />
<br />
== [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable ==<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [http://www.acceleratorinfo.com/see-all.html Accelerator Info]<br />
| <br />
* Opened source link<br />
* Copied first column (“All Startup Support Programs”) into excel (215)<br />
* Copied second column (“All University Programs”) into excel (249)<br />
| 464<br />
| Each link on parent list leads to individual '''home page url''' of organization<br />
| Reliable links, includes university supported programs<br />
| Mixed information on incubators and accelerators. Some of the university supported programs may not be considered either an incubator or an accelerator<br />
|-<br />
| [https://www.galidata.org/accelerators/directory/?keyword=dc&region=north_america Galidata]<br />
| Filter by Region: North America<br />
| how many<br />
| <br />
|<br />
|<br />
|-<br />
| [[:Crunchbase Database]]<br />
| See the [[Crunchbase Database]] project page for more information.<br />
| <br />
|<br />
|<br />
|<br />
|}<br />
<br />
<br />
<br />
<br />
==[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable==<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered; it does not link to incubator websites, and without an incubator URL there is too little information to evaluate an entry<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, so the link does not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
*'''Source:''' https://www.gan.co/engage/accelerators/<br />
:* ''Reason'': does not include information on incubators<br />
:*''Learn More'': https://www.brookings.edu/research/accelerating-growth-startup-accelerator-programs-in-the-united-states/<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=25020Incubator Seed Data2019-04-01T16:05:30Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
<br />
Requirement: Determine at least 4 primary data sources, or secure licenses to extract ‘seed data’ from these sources, as measured by program records.<br />
<br />
Status: We have identified [[Crunchbase Database|Crunchbase]] as one structured source for incubators, and we have a license for Crunchbase Pro. We are currently evaluating other sources, as described on this page. Given the paucity of strong sources, we will likely use a custom Google crawler (searching "incubator cityname" and similar) as a source. We will also be creating a new [[VentureXpert Database]] using data drawn from SDC Platinum, so that we have a source of information on venture capital backed startup firms.<br />
<br />
==Goal==<br />
<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified; a data source that provides links to a large number of incubators, or in-depth descriptions of them, could therefore still be viable.<br />
<br />
<br />
==Evaluation of Sources from Specific Google Searches==<br />
<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g., Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within InBIA; that page contains a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for a web crawler: each link leads to another page within InBIA, and a link on that page leads to the company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Data is often missing from the company's separate page, including the URL of the company's website. The description is often not detailed enough to determine the category of the organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|-<br />
| [https://thembaisdead.com/list-of-startup-accelerators-and-incubators/ The MBA Is Dead ]<br />
| <br />
* Opened source link. <br />
* Selected "Region" >> "US & Canada"<br />
| 186 Results<br />
| <br />
* Click on each accelerator/incubator to get data<br />
* City and Country<br />
* low equity, high offer, high value<br />
* high equity, low offer, low value<br />
* link to company homepage<br />
* categories of companies it accelerates/incubates<br />
| Can search by region or by category of companies<br />
| Appears to cover mostly accelerators, with fewer incubators included<br />
|}<br />
<br />
==Evaluation of Sources from InBIA List of US Incubation Associations==<br />
<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database as some of the incubators listed on the state specific websites are not in the main NBIA database. They could also be helpful in cross-referencing data pulled from other main databases as these sites have reliable links, are filtered to include only incubators, and have a relatively consistent format. However, the sites are limited in scope to a specific state/region and cannot be used as a sole data source.<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Region<br />
! Data<br />
! Benefits<br />
! Limitations<br />
<br />
|-<br />
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]<br />
| Opened source link and counted incubators listed on the home page<br />
| 12<br />
| Alabama<br />
| Incubator Name, Brief Description, and a link to the home page<br />
| Reliable links that are filtered to include only incubators<br />
| only contains information on incubators in Alabama that are associated with NBIA<br />
|-<br />
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]<br />
| Opened source link and then opened links for each of the four regions in Florida<br />
| 66<br />
| Florida<br />
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page<br />
| Provides reliable links. Filtered to include only information on incubators<br />
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.<br />
|-<br />
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]<br />
| Opened source link. Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
| 28<br />
| Louisiana<br />
| <br />
* incubator name<br />
* contact name<br />
* address and phone number<br />
* link to website<br />
| data is filtered to include only incubators, links are reliable<br />
| only incubators in state of Louisiana, limited data set<br />
|-<br />
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]<br />
| Opened source link and counted number of incubators listed on the page<br />
| 35<br />
| Maryland<br />
| Main site contains incubator name, short description, and a link to another page within the main site, which contains a link to the incubator home page<br />
| Reliable links, filtered to include only incubators<br />
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.<br />
|-<br />
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]<br />
| Open source link and count number of incubators listed on the page<br />
| 20<br />
| Massachusetts<br />
| incubator name, short description, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
| [https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA Michigan Business Innovation Association]<br />
| Open source link and count number of incubators listed in the column next to the map<br />
| 15<br />
| Michigan<br />
| incubator name, address, link to location on map, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|- <br />
| [https://livefreeandstart.com/resources/incubators-makerspaces/ NH Tech Alliance ]<br />
| Open source link and count organizations listed under "NHBIN Member Locations"<br />
| 8<br />
| New Hampshire<br />
| incubator name, town within NH, brief description, and link to home page <br />
| reliable links, only data on incubators<br />
| limited dataset, not very structured organization on website<br />
|-<br />
| [http://www.ncincubation.org/NCIncubators.aspx NC Business Incubation Association]<br />
| Open source link, click on each county and count the number of business incubators<br />
| 32<br />
| North Carolina<br />
| Incubator name, address, program directors, and link<br />
| only data on incubators<br />
| limited dataset, hard to navigate site with web crawler, some of the incubators do not have links<br />
|-<br />
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]<br />
| Open source link and count the number of incubators<br />
| 29<br />
| Oklahoma<br />
| Incubator name and link to it<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|}<br />
<br />
<br />
== [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable ==<br />
<br />
===Source: http://www.acceleratorinfo.com/see-all.html===<br />
# Opened source link<br />
# Copied links from first column (“All Startup Support Programs”) into excel and returned 215 results<br />
# Copied links from second column (“All University Programs”) into excel and returned 249 results<br />
# Each link on parent list leads to individual '''home page url''' of organization<br />
'''Review'''<br />
* Provides only links, does not separate between incubator and accelerator, some of the university supported programs may not be considered either an incubator or an accelerator<br />
<br />
===Source: https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/===<br />
# Scrolled down to the section labeled "Startup incubators in Boston"<br />
# Counted the number of incubators in Boston (10)<br />
# Data<br />
:# Company Name and URL<br />
:# Capital Provided<br />
:# Equity taken<br />
:# Application Process<br />
'''Review'''<br />
* The data is relatively unformatted and would be a challenge to use<br />
* It is limited in scope to the Boston Area and only provides information on 10 incubators<br />
<br />
===Source: https://www.galidata.org/accelerators/directory/?keyword=dc&region=north_america===<br />
# Filter by Region: North America<br />
# Search: [US city]<br />
* Contains a mix of incubators and accelerators<br />
* Source: [https://www.brookings.edu/research/accelerating-growth-startup-accelerator-programs-in-the-united-states/ Link]<br />
<br />
<br />
===Source: [[:Crunchbase Database]]===<br />
<br />
See the [[Crunchbase Database]] project page for more information.<br />
<br />
==[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable==<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered; it does not link to incubator websites, and without an incubator URL there is too little information to evaluate an entry<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, so the link does not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
*'''Source:''' https://www.gan.co/engage/accelerators/<br />
:* ''Reason'': does not include information on incubators<br />
:*''Learn More'': https://www.brookings.edu/research/accelerating-growth-startup-accelerator-programs-in-the-united-states/<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=24835Incubator Seed Data2019-03-27T18:46:09Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
=Goal=<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified; a data source that provides links to a large number of incubators, or in-depth descriptions of them, could therefore still be viable.<br />
<br />
<br />
=Evaluation of Sources from Specific Google Searches=<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g., Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within InBIA; that page contains a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for a web crawler: each link leads to another page within InBIA, and a link on that page leads to the company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Data is often missing from the company's separate page, including the URL of the company's website. The description is often not detailed enough to determine the category of the organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|}<br />
<br />
<br />
<br />
<br />
<br />
=Evaluation of Sources from INBIA List of US Incubation Associations=<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database, as some of the incubators listed on the state-specific websites are not in the main NBIA database. They could also be helpful in cross-referencing data pulled from other main databases, as these sites have reliable links, are filtered to include only incubators, and have a relatively consistent format. However, the sites are limited in scope to a specific state/region and cannot be used as a sole data source.<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Region<br />
! Data<br />
! Benefits<br />
! Limitations<br />
<br />
|-<br />
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]<br />
| Opened source link and counted incubators listed on the home page<br />
| 12<br />
| Alabama<br />
| Incubator Name, Brief Description, and a link to the home page<br />
| Reliable links that are filtered to include only incubators<br />
| only contains information on incubators in Alabama that are associated with NBIA<br />
|-<br />
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]<br />
| Opened source link and then opened links for each of the four regions in Florida<br />
| 66<br />
| Florida<br />
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page<br />
| Provides reliable links. Filtered to include only information on incubators<br />
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.<br />
|-<br />
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]<br />
| Opened source link. Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
| 28<br />
| Louisiana<br />
| <br />
* incubator name<br />
* contact name<br />
* address and phone number<br />
* link to website<br />
| data is filtered to include only incubators, links are reliable<br />
| only incubators in state of Louisiana, limited data set<br />
|-<br />
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]<br />
| Opened source link and counted number of incubators listed on the page<br />
| 35<br />
| Maryland<br />
| Main site contains incubator name, short description, and link to another page within main site which contains a link to the incubator home page<br />
| Reliable links, filtered to include only incubators<br />
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.<br />
|-<br />
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]<br />
| Open source link and count number of incubators listed on the page<br />
| 20<br />
| Massachusetts<br />
| incubator name, short description, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|-<br />
| [https://www.viethconsulting.com/members/googlemaps/google_maps.php?mode=normal&orgcode=MBIA Michigan Business Innovation Association]<br />
| Open source link and count number of incubators listed in the column next to the map<br />
| 15<br />
| Michigan<br />
| incubator name, address, link to location on map, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|- <br />
| [https://livefreeandstart.com/resources/incubators-makerspaces/ NH Tech Alliance ]<br />
| Open source link and count organizations listed under "NHBIN Member Locations"<br />
| 8<br />
| New Hampshire<br />
| incubator name, town within NH, brief description, and link to home page <br />
| reliable links, only data on incubators<br />
| limited dataset, not very structured organization on website<br />
|-<br />
| [http://www.ncincubation.org/NCIncubators.aspx NC Business Incubation Association]<br />
| Open source link, click on each county and count the number of business incubators<br />
| 32<br />
| North Carolina<br />
| Incubator name, address, program directors, and link<br />
| only data on incubators<br />
| limited dataset, hard to navigate site with web crawler, some of the incubators do not have links<br />
|-<br />
| [https://www.okbia.org/our-members Oklahoma Business Incubator Association]<br />
| Open source link and count the number of incubators<br />
| 29<br />
| Oklahoma<br />
| Incubator name and link to it<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|}<br />
<br />
<br />
= [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable = <br />
==Source: http://www.acceleratorinfo.com/see-all.html==<br />
# Opened source link<br />
# Copied links from first column (“All Startup Support Programs”) into excel and returned 215 results<br />
# Copied links from second column (“All University Programs”) into excel and returned 249 results<br />
# Each link on parent list leads to individual '''home page url''' of organization<br />
'''Review'''<br />
* Provides only links, does not separate between incubator and accelerator, some of the university supported programs may not be considered either an incubator or an accelerator<br />
==Source: https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/==<br />
# Scrolled down to the section labeled "Startup incubators in Boston"<br />
# Counted the number of incubators in Boston (10)<br />
# Data<br />
:# Company Name and URL<br />
:# Capital Provided<br />
:# Equity taken<br />
:# Application Process<br />
'''Review'''<br />
* The data is relatively unformatted and would be a challenge to use<br />
* It is limited in scope to the Boston Area and only provides information on 10 incubators<br />
<br />
<br />
<br />
<br />
=[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable=<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered/messy, does not provide links to incubator websites and doesn't include enough information for evaluation without incubator url<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, the link will not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=24833Incubator Seed Data2019-03-27T17:40:11Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
=Goal=<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified, therefore if the data source can provide links to a large quantity of incubators or in-depth descriptions, they could still be viable.<br />
<br />
<br />
=Evaluation of Sources from Specific Google Searches=<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g., Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within INBIA; that page contains a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for a web crawler, as each link connects to another page within INBIA, and a link on that page then connects to the company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Data is often missing from the separate page for the company, including the URL to the company's website. The description is often not detailed enough to determine the category of the economic organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|}<br />
<br />
<br />
<br />
<br />
<br />
=Evaluation of Sources from INBIA List of US Incubation Associations=<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database, as some of the incubators listed on the state-specific websites are not in the main NBIA database. They could also be helpful in cross-referencing data pulled from other main databases, as these sites have reliable links, are filtered to include only incubators, and have a relatively consistent format. However, the sites are limited in scope to a specific state/region and cannot be used as a sole data source.<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Region<br />
! Data<br />
! Benefits<br />
! Limitations<br />
<br />
|-<br />
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]<br />
| Opened source link and counted incubators listed on the home page<br />
| 12<br />
| Alabama<br />
| Incubator Name, Brief Description, and a link to the home page<br />
| Reliable links that are filtered to include only incubators<br />
| only contains information on incubators in Alabama that are associated with NBIA<br />
|-<br />
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]<br />
| Opened source link and then opened links for each of the four regions in Florida<br />
| 66<br />
| Florida<br />
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page<br />
| Provides reliable links. Filtered to include only information on incubators<br />
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.<br />
|-<br />
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]<br />
| Opened source link. Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
| 28<br />
| Louisiana<br />
| <br />
* incubator name<br />
* contact name<br />
* address and phone number<br />
* link to website<br />
| data is filtered to include only incubators, links are reliable<br />
| only incubators in state of Louisiana, limited data set<br />
|-<br />
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]<br />
| Opened source link and counted number of incubators listed on the page<br />
| 35<br />
| Maryland<br />
| Main site contains incubator name, short description, and link to another page within main site which contains a link to the incubator home page<br />
| Reliable links, filtered to include only incubators<br />
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.<br />
|-<br />
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]<br />
| Open source link and count number of incubators listed on the page<br />
| 20<br />
| Massachusetts<br />
| incubator name, short description, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|}<br />
<br />
<br />
<br />
<br />
= [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable = <br />
==Source: http://www.acceleratorinfo.com/see-all.html==<br />
# Opened source link<br />
# Copied links from first column (“All Startup Support Programs”) into excel and returned 215 results<br />
# Copied links from second column (“All University Programs”) into excel and returned 249 results<br />
# Each link on parent list leads to individual '''home page url''' of organization<br />
'''Review'''<br />
* Provides only links, does not separate between incubator and accelerator, some of the university supported programs may not be considered either an incubator or an accelerator<br />
==Source: https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/==<br />
# Scrolled down to the section labeled "Startup incubators in Boston"<br />
# Counted the number of incubators in Boston (10)<br />
# Data<br />
:# Company Name and URL<br />
:# Capital Provided<br />
:# Equity taken<br />
:# Application Process<br />
'''Review'''<br />
* The data is relatively unformatted and would be a challenge to use<br />
* It is limited in scope to the Boston Area and only provides information on 10 incubators<br />
<br />
<br />
<br />
<br />
=[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable=<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered/messy, does not provide links to incubator websites and doesn't include enough information for evaluation without incubator url<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, the link will not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=24831Incubator Seed Data2019-03-27T17:37:47Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
=Goal=<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified, therefore if the data source can provide links to a large quantity of incubators or in-depth descriptions, they could still be viable.<br />
<br />
<br />
=Evaluation of Sources from Specific Google Searches=<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g., Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within INBIA; that page contains a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for a web crawler, as each link connects to another page within INBIA, and a link on that page then connects to the company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Data is often missing from the separate page for the company, including the URL to the company's website. The description is often not detailed enough to determine the category of the economic organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|}<br />
<br />
<br />
<br />
<br />
<br />
=Evaluation of Sources from INBIA List of US Incubation Associations=<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database, as some of the incubators listed on the state-specific websites are not in the main NBIA database.<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Region<br />
! Data<br />
! Benefits<br />
! Limitations<br />
<br />
|-<br />
| [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]<br />
| Opened source link and counted incubators listed on the home page<br />
| 12<br />
| Alabama<br />
| Incubator Name, Brief Description, and a link to the home page<br />
| Reliable links that are filtered to include only incubators<br />
| only contains information on incubators in Alabama that are associated with NBIA<br />
|-<br />
| [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association]<br />
| Opened source link and then opened links for each of the four regions in Florida<br />
| 66<br />
| Florida<br />
| source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page<br />
| Provides reliable links. Filtered to include only information on incubators<br />
| May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators.<br />
|-<br />
| [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]<br />
| Opened source link. Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
| 28<br />
| Louisiana<br />
| <br />
* incubator name<br />
* contact name<br />
* address and phone number<br />
* link to website<br />
| data is filtered to include only incubators, links are reliable<br />
| only incubators in state of Louisiana, limited data set<br />
|-<br />
| [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]<br />
| Opened source link and counted number of incubators listed on the page<br />
| 35<br />
| Maryland<br />
| Main site contains incubator name, short description, and link to another page within main site which contains a link to the incubator home page<br />
| Reliable links, filtered to include only incubators<br />
| It would be challenging for a web crawler to navigate, as the main page contains links internal to the site which then link to the home pages of incubators. Limited dataset with only incubators in Maryland.<br />
|-<br />
| [https://www.massincubators.org/ Massachusetts Association of Business Incubators]<br />
| Open source link and count number of incubators listed on the page<br />
| 20<br />
| Massachusetts<br />
| incubator name, short description, and link to incubator home page<br />
| reliable links, only data on incubators<br />
| limited dataset<br />
|}<br />
<br />
<br />
<br />
<br />
= [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable = <br />
==Source: http://www.acceleratorinfo.com/see-all.html==<br />
# Opened source link<br />
# Copied links from first column (“All Startup Support Programs”) into excel and returned 215 results<br />
# Copied links from second column (“All University Programs”) into excel and returned 249 results<br />
# Each link on parent list leads to individual '''home page url''' of organization<br />
'''Review'''<br />
* Provides only links, does not separate between incubator and accelerator, some of the university supported programs may not be considered either an incubator or an accelerator<br />
==Source: https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/==<br />
# Scrolled down to the section labeled "Startup incubators in Boston"<br />
# Counted the number of incubators in Boston (10)<br />
# Data<br />
:# Company Name and URL<br />
:# Capital Provided<br />
:# Equity taken<br />
:# Application Process<br />
'''Review'''<br />
* The data is relatively unformatted and would be a challenge to use<br />
* It is limited in scope to the Boston Area and only provides information on 10 incubators<br />
<br />
<br />
<br />
<br />
=[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable=<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered/messy, does not provide links to incubator websites and doesn't include enough information for evaluation without incubator url<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, the link will not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=24826Incubator Seed Data2019-03-27T17:16:38Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
=Goal=<br />
We will evaluate data sources based on the number of incubators they have data on and the type of information they supply on these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified, therefore if the data source can provide links to a large quantity of incubators or in-depth descriptions, they could still be viable.<br />
<br />
<br />
=Evaluation of Sources from Specific Google Searches=<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! How many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* Url to home page of incubator<br />
| Links to the home page of incubator<br />
| May not be able to get specific information from home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g., Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within INBIA; that page contains a link to the incubator's homepage<br />
| The database contains information on a lot of economic development institutions and would provide a mass quantity of data<br />
| Challenging for a web crawler, as each link connects to another page within INBIA, and a link on that page then connects to the company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links there were: 4 incubators, 2 educational programs, 1 broken link, 3 other economic development programs<br />
|-<br />
| [https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations Clustermapping]<br />
| Opened Source Link<br />
| 292<br />
| <br />
* Company name with link to a separate page within cluster mapping<br />
* on that page there is a link to the incubator's website<br />
| Provides a long list of entrepreneurship organizations<br />
| Data is often missing from the separate page for the company, including the URL to the company's website. The description is often not detailed enough to determine the category of the economic organization without going to the company's website. Different types of entrepreneurship organizations are mixed together. <br />
Using the first 10 links, three were accelerators, six were missing links (two were self-proclaimed incubators in description), and one was another type of support organization.<br />
|}<br />
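Several of the sources above only reach an incubator's home page through an intermediate page inside the directory itself. A minimal sketch of that two-step navigation is shown here, using only the Python standard library. The function names, the <code>fetch</code> callable, and the example URLs are all hypothetical and are not taken from the crawler described on this wiki; a real run would also need to filter out navigation links on the listing page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, base_url):
    """Return absolute URLs for all links found in html."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, h) for h in parser.hrefs]


def two_hop_homepages(listing_url, fetch):
    """Follow listing page -> internal profile page -> external home page.

    fetch is any callable mapping a URL to its HTML text (an assumption;
    it could wrap urllib or a Selenium driver).
    """
    site = urlparse(listing_url).netloc
    homepages = {}
    for profile_url in extract_links(fetch(listing_url), listing_url):
        if urlparse(profile_url).netloc != site:
            continue  # only internal profile pages are followed
        external = [u for u in extract_links(fetch(profile_url), profile_url)
                    if urlparse(u).netloc not in ("", site)]
        # None records a profile page with no outbound link (a missing link)
        homepages[profile_url] = external[0] if external else None
    return homepages


# Made-up pages standing in for a directory site, so no network is needed.
pages = {
    "http://directory.example/list":
        '<a href="/org/1">A</a> <a href="/org/2">B</a>',
    "http://directory.example/org/1":
        '<a href="/list">Back</a> <a href="http://incubator-a.example/">Site</a>',
    "http://directory.example/org/2":
        '<a href="/list">Back</a>',
}
result = two_hop_homepages("http://directory.example/list", pages.__getitem__)
```

Recording <code>None</code> for profile pages without an outbound link makes it possible to count missing links directly, similar to the manual check of the first ten entries above.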
<br />
<br />
<br />
<br />
<br />
=Evaluation of Sources from INBIA List of US Incubation Associations=<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database, as some of the incubators listed on the state-specific websites are not in the main NBIA database.<br />
<br />
==Source: [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]==<br />
# Opened source link<br />
# Counted incubators listed on the home page and found information for 12 incubators<br />
# Data<br />
:* Incubator Name, Brief Description, and a link to the home page<br />
'''Review'''<br />
* Provides reliable links to incubators within Alabama<br />
* Benefit: data is filtered to include only incubators<br />
* Limitation: only incubators within Alabama<br />
==Source: [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association (FBIA)]==<br />
# Opened source link<br />
# Opened links for each of the four regions in Florida<br />
# Counted incubators listed in each of the four regions to get information on 66 incubators<br />
# Data<br />
:* main site contains links to four regions in Florida, each region contains the following data:<br />
::* incubator name, address, phone number and link to home page<br />
'''Review'''<br />
* Provides reliable links to incubators within Florida<br />
* Benefit: data is filtered to include only incubators<br />
* Limitation: may be challenging for a web crawler to navigate, as the main page has links to regions, which in turn have links to the home pages of incubators <br />
* Limitation: only incubators within Florida<br />
==Source: [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]==<br />
# Opened source link<br />
# Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
# Site contained data on 28 incubators in the state of Louisiana<br />
# Data<br />
:* Main site contains:<br />
::* incubator name<br />
::* contact name<br />
::* address and phone number<br />
::* link to website<br />
'''Review'''<br />
* Provides reliable links to incubators within Louisiana<br />
* Benefit: data is filtered to include only incubators<br />
* Limitation: may be challenging for a web crawler to navigate, as the main page does not contain information on the incubators themselves but rather has links to their home pages <br />
* Limitation: only incubators within Louisiana<br />
==Source: [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]==<br />
# Opened source link<br />
# Counted number of incubators listed on the page and found information on 35 incubators<br />
# Data<br />
:* Main site contains incubator name, short description, and a "read more" link to another page within the main site, which contains:<br />
::* link to the incubator's home page<br />
'''Review'''<br />
* Provides reliable links to incubators within Maryland<br />
* Benefit: data is filtered to include only incubators<br />
* Limitation: may be challenging for a web crawler to navigate, as main page contains links internal to the site which then link to the home pages of incubators<br />
* Limitation: only incubators within Maryland<br />
==Source: [https://www.massincubators.org/ Massachusetts Association of Business Incubators]==<br />
# Opened source link<br />
# Counted number of incubators listed on the page and found information on 20 incubators<br />
# Data<br />
:* Main site contains incubator name, short description and link to the incubator's home page<br />
'''Review'''<br />
* Provides reliable links to incubators within Massachusetts<br />
* Benefit: data is filtered to include only incubators<br />
* Limitation: only incubators within Massachusetts<br />
<br />
<br />
<br />
<br />
= [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable = <br />
==Source: http://www.acceleratorinfo.com/see-all.html==<br />
# Opened source link<br />
# Copied links from the first column (“All Startup Support Programs”) into Excel, which returned 215 results<br />
# Copied links from the second column (“All University Programs”) into Excel, which returned 249 results<br />
# Each link on the parent list leads to the '''home page URL''' of the individual organization<br />
'''Review'''<br />
* Provides only links and does not distinguish between incubators and accelerators. Some of the university-supported programs may not be considered either an incubator or an accelerator<br />
==Source: https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/==<br />
# Scrolled down to the section labeled "Startup incubators in Boston"<br />
# Counted the number of incubators in Boston (10)<br />
# Data<br />
:# Company Name and URL<br />
:# Capital Provided<br />
:# Equity taken<br />
:# Application Process<br />
'''Review'''<br />
* The data is relatively unformatted and would be a challenge to use<br />
* It is limited in scope to the Boston Area and only provides information on 10 incubators<br />
<br />
<br />
<br />
<br />
=[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable=<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered and messy, does not provide links to incubator websites, and does not include enough information for evaluation without the incubator URL<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, so the link will not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreemanhttp://www.edegan.com/mediawiki/index.php?title=Incubator_Seed_Data&diff=24824Incubator Seed Data2019-03-27T17:12:51Z<p>AnneFreeman: </p>
<hr />
<div>{{Project<br />
|Has title=Incubator Seed Data<br />
|Has owner=Anne Freeman,<br />
|Has project status=Active<br />
|Is dependent on=Crunchbase Database,<br />
}}<br />
=Goal=<br />
We will evaluate data sources based on the number of incubators they cover and the type of information they supply about these incubators. We will also record whether or not these data sources collect information on any other types of entrepreneurship organizations. Ideally, these data sources would provide some or all of the variables that were identified as most important for identifying incubators ([[Formulate_baseline_attributes]]). However, it is unlikely that one data source will contain all of the baseline attributes identified; a data source that provides links to a large number of incubators or in-depth descriptions could therefore still be viable.<br />
<br />
<br />
=Evaluation of Sources from Specific Google Searches=<br />
*Searches included:<br />
:* "incubator database"<br />
:* "us business incubators database"<br />
{| class="wikitable"<br />
|-<br />
! Source<br />
! Directions<br />
! Data on how many?<br />
! Data<br />
! Benefits<br />
! Limitations<br />
|-<br />
| [https://www.whartoneclub.com/resources/entrepreneurship/incubators/ Whartoneclub Incubators]<br />
| <br />
* Opened source link. <br />
* Copied results from "U.S. Based Incubators" into an Excel spreadsheet. <br />
| 21<br />
| <br />
* Name, City, State<br />
* URL of the incubator's home page<br />
| Links to the home page of each incubator<br />
| May not be able to get specific information from the home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g., Y Combinator)<br />
|-<br />
| [http://exchange.inbia.org/network/findacompany/ National Business Incubation Association]<br />
| <br />
* Opened source link<br />
* Entered "United States" for country and clicked "Find Companies"<br />
| 415<br />
| <br />
* Company Name and address<br />
* Link to another page within INBIA; that page contains a link to the incubator's homepage<br />
| The database contains information on many economic development institutions and would provide a large quantity of data<br />
| Challenging for a web crawler, as each link connects to another page within INBIA, and a link on that page then connects to the company's homepage. Not all of the institutions listed are incubators. <br />
Out of the first ten links, there were 4 incubators, 2 educational programs, 1 broken link, and 3 other economic development programs<br />
|}<br />
<br />
<br />
<br />
==Source: https://www.clustermapping.us/organization-type/innovation-and-entrepreneurship-support-organizations ==<br />
# Opened source link<br />
# Received 292 results for Innovation and Entrepreneurship Support Organizations in the US<br />
# Data<br />
:# Brief Description<br />
:# Company name with link to a separate page within Clustermapping<br />
::# Link to Company Website<br />
::# Regions<br />
'''Review'''<br />
* Provides a long list of entrepreneurship organizations<br />
* Limitation: Data is often missing from the company's separate page, including the URL of the company's website. The description is often not detailed enough to determine the category of the economic organization without going to the company's website.<br />
* Limitation: Different types of entrepreneurship organizations are mixed together<br />
* Of the first 10 links, three were accelerators, six were missing links (two of these described themselves as incubators), and one was another type of support organization.<br />
<br />
<br />
<br />
=Evaluation of Sources from INBIA List of US Incubation Associations=<br />
The National Business Incubation Association maintains a list of [https://inbia.org/services/resources/ U.S. Incubation Associations]. We went through this list and evaluated each association as a potential data source. These sites generally contain a list of incubators that are working in collaboration with the NBIA and are within that specific state. The sites could be useful in cross-referencing data pulled from the NBIA main database, as some of the incubators listed on the state-specific websites are not in the main NBIA database.<br />
<br />
==Source: [http://asbdc.org/start-ups/incubators-in-alabama/ Alabama Business Incubation Network]==<br />
# Opened source link<br />
# Counted incubators listed on the home page and found information for 12 incubators<br />
# Data<br />
:* Incubator Name, Brief Description, and a link to the home page<br />
'''Review'''<br />
* Provides reliable links to incubators within Alabama<br />
* Benefit: data is filtered to include only incubators<br />
* Limitation: only incubators within Alabama<br />
==Source: [http://www.fbiaonline.org/Incubators/incubators.htm Florida Business Incubation Association (FBIA)]==<br />
# Opened source link<br />
# Opened links for each of the four regions in Florida<br />
# Counted incubators listed in each of the four regions to get information on 66 incubators<br />
# Data<br />
:* main site contains links to four regions in Florida, each region contains the following data:<br />
::* incubator name, address, phone number and link to home page<br />
'''Review'''<br />
* Provides reliable links to incubators within Florida<br />
* Benefit: data is filtered to include only incubators<br />
* Limitation: may be challenging for a web crawler to navigate, as the main page has links to regions, which in turn have links to the home pages of incubators <br />
* Limitation: only incubators within Florida<br />
==Source: [https://www.louisianaincubation.org/current-members Louisiana Business Incubation Association]==<br />
# Opened source link<br />
# Copied the data on incubators into a text editor and searched for how many times the word "E-Mail" appeared<br />
# Site contained data on 28 incubators in the state of Louisiana<br />
# Data<br />
:* Main site contains:<br />
::* incubator name<br />
::* contact name<br />
::* address and phone number<br />
::* link to website<br />
'''Review'''<br />
* Provides reliable links to incubators within Louisiana<br />
* Benefit: data is filtered to include only incubators<br />
* Limitation: may be challenging for a web crawler to navigate, as the main page does not contain information on the incubators themselves but rather has links to their home pages <br />
* Limitation: only incubators within Louisiana<br />
==Source: [http://incubatemaryland.org/incubators/ Maryland Business Incubation Association]==<br />
# Opened source link<br />
# Counted number of incubators listed on the page and found information on 35 incubators<br />
# Data<br />
:* Main site contains incubator name, short description, and a "read more" link to another page within the main site, which contains:<br />
::* link to the incubator's home page<br />
'''Review'''<br />
* Provides reliable links to incubators within Maryland<br />
* Benefit: data is filtered to include only incubators<br />
* Limitation: may be challenging for a web crawler to navigate, as main page contains links internal to the site which then link to the home pages of incubators<br />
* Limitation: only incubators within Maryland<br />
==Source: [https://www.massincubators.org/ Massachusetts Association of Business Incubators]==<br />
# Opened source link<br />
# Counted number of incubators listed on the page and found information on 20 incubators<br />
# Data<br />
:* Main site contains incubator name, short description and link to the incubator's home page<br />
'''Review'''<br />
* Provides reliable links to incubators within Massachusetts<br />
* Benefit: data is filtered to include only incubators<br />
* Limitation: only incubators within Massachusetts<br />
<br />
<br />
<br />
<br />
= [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are Potentially Viable = <br />
==Source: http://www.acceleratorinfo.com/see-all.html==<br />
# Opened source link<br />
# Copied links from the first column (“All Startup Support Programs”) into Excel, which returned 215 results<br />
# Copied links from the second column (“All University Programs”) into Excel, which returned 249 results<br />
# Each link on the parent list leads to the '''home page URL''' of the individual organization<br />
'''Review'''<br />
* Provides only links and does not distinguish between incubators and accelerators. Some of the university-supported programs may not be considered either an incubator or an accelerator<br />
==Source: https://bostonstartupsguide.com/guide/every-boston-startup-accelerator-incubator/==<br />
# Scrolled down to the section labeled "Startup incubators in Boston"<br />
# Counted the number of incubators in Boston (10)<br />
# Data<br />
:# Company Name and URL<br />
:# Capital Provided<br />
:# Equity taken<br />
:# Application Process<br />
'''Review'''<br />
* The data is relatively unformatted and would be a challenge to use<br />
* It is limited in scope to the Boston Area and only provides information on 10 incubators<br />
<br />
<br />
<br />
<br />
=[[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that are not viable=<br />
* '''Source:''' http://www.seed-db.com/accelerators<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fwww.seed-db.com.2Faccelerators | Previous Research]]<br />
* '''Source:''' https://www.f6s.com/programs?type<br />
:* ''Reason'': data is cluttered and messy, does not provide links to incubator websites, and does not include enough information for evaluation without the incubator URL<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.f6s.com.2Fprograms.3Ftype | Previous Research]]<br />
* '''Source:''' http://gust.com/usa-canada-accelerator-report-2015/<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_http:.2F.2Fgust.com.2Fusa-canada-accelerator-report-2015.2F | Previous Research]]<br />
* '''Source:''' https://www.corporate-accelerators.net/database/<br />
:* ''Reason'': this website is no longer active, so the link will not work<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fwww.corporate-accelerators.net.2Fdatabase.2F | Previous Research]]<br />
<br />
* '''Source:''' https://github.com/florianheinemann/www-corporate-accelerators-net/blob/master/_data/Accelerators.json<br />
:* ''Reason'': does not include information on incubators<br />
:* ''Learn More'': [[Accelerator_Seed_List_(Data)#Source:_https:.2F.2Fgithub.com.2Fflorianheinemann.2Fwww-corporate-accelerators-net.2Fblob.2Fmaster.2F_data.2FAccelerators.json| Previous Research]]</div>AnneFreeman