==Retrieve Data from URLs Generated==
We wrote a web crawler that
# reads the csv file containing the URLs to scrape into a pandas dataframe
# rewrites each URL by replacing ''?c=companyprofile&'' with ''companyprofile?'' and prepending the domain http://exchange.inbia.org/network/findacompany to each URL
# opens each URL and extracts information using an element tree parser
# collects the information from each URL and stores it in a txt file
The crawler generates a tab-separated text file called INBIA_data.txt containing [company_name, street_address, city, state, zipcode, country, website, contact_person], populated with information from the 415 entries in the database.
The txt file and the python script (inbia_scrape.py) are located in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\INBIA
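A minimal sketch of this workflow is shown below. It is not the actual inbia_scrape.py: the input filename (INBIA_urls.csv), the ''url'' column name, and the element paths used to pull each field are placeholders, and it assumes the profile pages parse as well-formed XML.
<syntaxhighlight lang="python">
# Minimal sketch of the crawler described above; the input filename,
# 'url' column, and element paths are hypothetical placeholders.
import csv
import urllib.request
import xml.etree.ElementTree as ET

import pandas as pd

DOMAIN = "http://exchange.inbia.org/network/findacompany"
FIELDS = ["company_name", "street_address", "city", "state",
          "zipcode", "country", "website", "contact_person"]

# Step 1: read the generated URLs into a pandas dataframe (assumes a 'url' column).
urls = pd.read_csv("INBIA_urls.csv")["url"]

# Step 2: rewrite each URL and prepend the domain.
urls = DOMAIN + urls.str.replace("?c=companyprofile&", "companyprofile?", regex=False)

# Steps 3-4: open each URL, parse it with an element tree parser, and write
# one tab-separated row per company to INBIA_data.txt.
with open("INBIA_data.txt", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(FIELDS)
    for url in urls:
        with urllib.request.urlopen(url) as resp:
            root = ET.fromstring(resp.read())
        # The .//tag paths below stand in for whatever elements actually
        # hold each field on the profile page.
        writer.writerow([root.findtext(f".//{tag}", default="") for tag in FIELDS])
</syntaxhighlight>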