=Overview=
The goal of this project was to acquire accelerator data from [http://gan.co/members]. <br>
==Desired Fields==
1. Name
2. Country/Location
3. Seed Money
4. Equity
5. Funding Raised
6. Companies
7. Companies Funded
8. Companies Funding Raised
9. Exits
10. Exit Funding
11. Employees
12. Mentors
13. Years
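
For reference, these fields can be kept together as a single list of column names so the scraper and the output file stay in sync. This is only a sketch; the variable name DESIRED_FIELDS is illustrative and not part of the project code.
<pre>
# The thirteen desired fields, as column names for the output file.
# Names are copied from the list above; the variable name is illustrative.
DESIRED_FIELDS = [
    "Name", "Country/Location", "Seed Money", "Equity", "Funding Raised",
    "Companies", "Companies Funded", "Companies Funding Raised",
    "Exits", "Exit Funding", "Employees", "Mentors", "Years",
]
</pre>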

==Scraper Use==
The scraper is implemented in Python using BeautifulSoup, an HTML parsing library. <br>
The scraper requires the following libraries:
1. Pandas
2. BeautifulSoup
It takes as input the full HTML file of the website, converts it to a BeautifulSoup ("soup") object, and scrapes the desired fields from the parsed HTML. <br>
The scraped items are loaded into a Pandas DataFrame, which is then written out as a tab-separated text file. <br>
When writing the text file, make sure to explicitly set encoding = "utf-8", sep = "\t", and index = False. <br>
These settings ensure, respectively, that strings are encoded properly, that the file is tab separated, and that no extra index column is written. <br>
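
Below is a minimal sketch of this workflow. It assumes the page HTML has already been saved locally; the input and output file names and the CSS selectors (such as "div.member" and "span.location") are illustrative assumptions, since the actual markup of the GAN members page may differ.
<pre>
# Minimal sketch of the scrape-and-export workflow described above.
# File names and CSS selectors are assumptions, not the project's actual ones.
import pandas as pd
from bs4 import BeautifulSoup

# Read the locally saved HTML of http://gan.co/members
with open("GAN_data.txt", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

rows = []
for member in soup.find_all("div", class_="member"):  # one block per accelerator (assumed)
    name_tag = member.find("h3")
    location_tag = member.find("span", class_="location")
    rows.append({
        "Name": name_tag.get_text(strip=True) if name_tag else "",
        "Country/Location": location_tag.get_text(strip=True) if location_tag else "",
        # ...the remaining desired fields would be pulled out the same way
    })

df = pd.DataFrame(rows)

# encoding="utf-8" keeps non-ASCII strings intact, sep="\t" makes the file
# tab separated, and index=False drops the extra row-index column.
df.to_csv("GAN_members_scraped.txt", encoding="utf-8", sep="\t", index=False)
</pre>
The resulting tab-separated file can then be opened in Excel or read back with pandas.read_csv(..., sep="\t").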

==Code Location and Necessary Files==
The code and the resulting text file are located here:
E:\McNair\Projects\Accelerators\Web Scraping for Accelerators
The html file to scrape is located here:
E:\McNair\Projects\Accelerators\GAN_data.txt
To better understand what is being scraped, see the parser specs here:
[http://mcnair.bakerinstitute.org/wiki/Talk:Accelerator_Seed_List_(Data)#Match_Potential_Accelerators_with_Cleaned_Cohort_Data_using_The_Matcher_.28Tool.29]