=Overview=
The goal of this project was to acquire accelerator data from [http://gan.co/members]. <br>
==Desired Fields==
1. Name
2. Country/Location
3. Seed Money
4. Equity
5. Funding Raised
6. Companies
7. Companies Funded
8. Companies Funding Raised
9. Exits
10. Exit Funding
11. Employees
12. Mentors
13. Years
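
For reference, these fields can be kept together as a single list of column names so the scraper and the output file stay in sync. This is only a sketch; the variable name DESIRED_FIELDS is illustrative and not part of the project code.
<pre>
# The thirteen desired fields, as column names for the output file.
# Names are copied from the list above; the variable name is illustrative.
DESIRED_FIELDS = [
    "Name", "Country/Location", "Seed Money", "Equity", "Funding Raised",
    "Companies", "Companies Funded", "Companies Funding Raised",
    "Exits", "Exit Funding", "Employees", "Mentors", "Years",
]
</pre>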

==Scraper Use==
The scraper is implemented in Python using BeautifulSoup, an HTML parsing library. <br>
The scraper requires the following libraries:
1. Pandas
2. BeautifulSoup
It takes as input the full HTML file of the website, converts it to a BeautifulSoup ("soup") object, and scrapes the desired fields from the parsed HTML. <br>
The scraped items are loaded into a Pandas DataFrame, which is then written out as a tab-separated text file. <br>
When writing the text file, make sure to explicitly set encoding = "utf-8", sep = "\t", and index = False. <br>
These settings ensure, respectively, that strings are encoded properly, that the file is tab separated, and that no extra index column is written. <br>
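
Below is a minimal sketch of this workflow. It assumes the page HTML has already been saved locally; the input and output file names and the CSS selectors (such as "div.member" and "span.location") are illustrative assumptions, since the actual markup of the GAN members page may differ.
<pre>
# Minimal sketch of the scrape-and-export workflow described above.
# File names and CSS selectors are assumptions, not the project's actual ones.
import pandas as pd
from bs4 import BeautifulSoup

# Read the locally saved HTML of http://gan.co/members
with open("GAN_data.txt", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

rows = []
for member in soup.find_all("div", class_="member"):  # one block per accelerator (assumed)
    name_tag = member.find("h3")
    location_tag = member.find("span", class_="location")
    rows.append({
        "Name": name_tag.get_text(strip=True) if name_tag else "",
        "Country/Location": location_tag.get_text(strip=True) if location_tag else "",
        # ...the remaining desired fields would be pulled out the same way
    })

df = pd.DataFrame(rows)

# encoding="utf-8" keeps non-ASCII strings intact, sep="\t" makes the file
# tab separated, and index=False drops the extra row-index column.
df.to_csv("GAN_members_scraped.txt", encoding="utf-8", sep="\t", index=False)
</pre>
The resulting tab-separated file can then be opened in Excel or read back with pandas.read_csv(..., sep="\t").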

==Code Location and Necessary Files==
The code and the resulting text file are located here:
E:\McNair\Projects\Accelerators\Web Scraping for Accelerators
The html file to scrape is located here:
E:\McNair\Projects\Accelerators\GAN_data.txt
To better understand what is being scraped, see the parser specs here:
[http://mcnair.bakerinstitute.org/wiki/Talk:Accelerator_Seed_List_(Data)#Match_Potential_Accelerators_with_Cleaned_Cohort_Data_using_The_Matcher_.28Tool.29]