|Has owner=Anne Freeman,
|Depends upon it=Incubator Seed Data
==Initial Review of INBIA==
The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] that contains information for 415 incubators within the United States. Each directory entry links to a secondary page within the INBIA domain. This page lists the incubator's name, address, a link to the home page of its website, and information for key contacts. The secondary pages share the same HTML structure and are reliable in the data they contain, making INBIA an ideal candidate for collecting data from the internal pages with a web crawler.
==Retrieve Data from INBIA==
We wrote a web crawler that
# reads the csv file into a pandas dataframe
# changes the urls by
#* replacing ''?c=companyprofile&'' with ''companyprofile?'' and
#* prepending the base url http://exchange.inbia.org/network/findacompany to each url
# opens each url and extracts information using an element tree parser
# writes the information for each url to a csv file
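
The url rewriting step above can be sketched as follows. This is a minimal illustration, not the code in inbia_scrape.py: the function name <code>rewrite_url</code>, the sample query parameter <code>id=123</code>, and the use of a single <code>/</code> to join the base url and the rewritten path are all assumptions.

```python
# Base url taken from the steps above.
BASE_URL = "http://exchange.inbia.org/network/findacompany"


def rewrite_url(raw_url: str) -> str:
    """Turn a directory link into a company profile url (hypothetical helper)."""
    # Replace the query-style prefix with the path-style one.
    path = raw_url.replace("?c=companyprofile&", "companyprofile?")
    # Assumption: base url and path are joined with exactly one slash.
    return BASE_URL.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    # "id=123" is a made-up query string used only for illustration.
    print(rewrite_url("?c=companyprofile&id=123"))
```

With the links loaded into a pandas dataframe, the same function could be applied to a whole column, for example <code>df["url"].map(rewrite_url)</code>, where the column name <code>url</code> is hypothetical.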
The crawler generates a csv file called INBIA_data.csv with the columns [company_name, street_address, city, state, zipcode, country, website, contact_person], populated with information from the 415 entries in the database.
The csv file and the python script (inbia_scrape.py) are located in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\INBIA
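
The output format described above can be sketched with Python's standard <code>csv</code> module. This is an illustrative sketch only: the column names come from the list above, while the helper name <code>write_rows</code> and the sample record are assumptions, and the actual script may write the file differently.

```python
import csv
import io

# Column names as listed for INBIA_data.csv.
FIELDS = [
    "company_name", "street_address", "city", "state",
    "zipcode", "country", "website", "contact_person",
]


def write_rows(rows, handle):
    """Write one dict per incubator to an open text handle (hypothetical helper)."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # Fields that could not be extracted are left empty.
        writer.writerow({name: row.get(name, "") for name in FIELDS})


if __name__ == "__main__":
    # Made-up record used only to show the layout.
    sample = [{"company_name": "Example Incubator", "city": "Houston", "state": "TX"}]
    buffer = io.StringIO()
    write_rows(sample, buffer)
    print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, the handle could be opened with <code>open("INBIA_data.csv", "w", newline="")</code>.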