290 bytes added ,  15:14, 1 May 2019
==Crawler==
We built a web crawler using Selenium to search for incubators on AngelList, starting from the company search URL ''https://angel.co/companies?'' with the ''locations[]='' option appended to specify a state (the 50 states and the District of Columbia). The crawler loaded each page and clicked the "load more" button for as long as there were more results to load. No state exceeded 500 results. The crawler then collected information for every company listed: state, company name, a brief description, and the company's URL within AngelList. This information was stored in a tab-delimited text file. The master file containing the unique results from the two crawlers described below contains 1512 results.
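The load-and-store loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's script: the function names, the <code>div.more</code> selector, the column order, and the sample row are all assumptions. The <code>driver</code> argument is expected to be a Selenium WebDriver (anything exposing <code>find_elements</code> works).

```python
import csv
import io
import time

# "css selector" is the string value of selenium's By.CSS_SELECTOR constant.
CSS = "css selector"


def load_all_results(driver, more_selector, pause=0.0, max_clicks=200):
    """Click the 'load more' button until it no longer appears.

    more_selector is a hypothetical CSS selector for the button.
    max_clicks guards against looping forever if the button never goes away.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = driver.find_elements(CSS, more_selector)
        if not buttons:
            break  # nothing left to load
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # give the page time to append new results
    return clicks


def write_rows(rows, out_file):
    """Write (state, name, description, url) tuples as tab-delimited text."""
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    writer.writerow(["state", "name", "description", "url"])
    writer.writerows(rows)


# Usage with an in-memory file and a made-up row (not real crawl data).
buffer = io.StringIO()
write_rows([("Texas", "Acme", "A demo", "https://angel.co/acme")], buffer)
print(buffer.getvalue())
```

In a real run, <code>pause</code> would probably need to be a second or more, and an explicit wait on the newly loaded elements would be more robust than a fixed sleep.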
===Crawler By Company Type===
This crawler appended ''company_types[]=Incubator'' to the URL so that the search results included only companies whose listed company type was incubator. It yielded 1068 results. The script (angelList_companyTypeIncubator.py) and the data it generated (AngelList_companyTypeIncubator.txt) are on the RDP in the folder AngelList.
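As an illustration of how such a search URL might be assembled, here is a minimal sketch. The function name is hypothetical, and passing the state as a plain name is an assumption; the site may have expected a different location value.

```python
from urllib.parse import urlencode

BASE_URL = "https://angel.co/companies?"


def incubator_search_url(state):
    """Build a search URL filtered by state and by company type Incubator.

    Assumes the state is passed as a plain name; safe="[]" keeps the
    square brackets in the parameter names unescaped.
    """
    params = [("locations[]", state), ("company_types[]", "Incubator")]
    return BASE_URL + urlencode(params, safe="[]")


print(incubator_search_url("Texas"))
# https://angel.co/companies?locations[]=Texas&company_types[]=Incubator
```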
===Crawler By Keyword===
This crawler clicked on the search bar and entered the keyword "incubator" so that the companies appearing in the results contained the keyword incubator somewhere on their company page. It yielded 840 results. The script (angelList_keywordIncubator.py) and the data it generated (AngelList_keywordIncubator.txt) are on the RDP in the folder AngelList.
 
== Master File of Results ==
We performed a diff of the two files to create a master file with only unique results. The master file containing the unique results from the two crawlers contains 1512 results. We used the URLs to determine whether results were unique because the same company was occasionally listed in different states, leading to duplicate results.
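One way to reproduce this kind of URL-based deduplication is sketched below. It uses a set of seen URLs rather than a diff tool, so it is an alternative technique, not the method actually used. The function name, the absence of a header row, and the column order (state, name, description, URL) are assumptions, and the sample rows are made up.

```python
import csv
import io


def merge_unique_by_url(*tsv_files, url_column=3):
    """Merge tab-delimited files, keeping the first row seen for each URL."""
    seen = set()
    merged = []
    for tsv_file in tsv_files:
        for row in csv.reader(tsv_file, delimiter="\t"):
            if len(row) <= url_column:
                continue  # skip blank or malformed lines
            url = row[url_column].strip()
            if url in seen:
                continue  # same company already recorded, e.g. under another state
            seen.add(url)
            merged.append(row)
    return merged


# Usage with in-memory stand-ins for the two crawler output files.
by_type = io.StringIO(
    "TX\tAcme\tdesc\thttps://angel.co/acme\n"
    "CA\tAcme\tdesc\thttps://angel.co/acme\n"
)
by_keyword = io.StringIO(
    "NY\tBeta\tdesc\thttps://angel.co/beta\n"
    "TX\tAcme\tdesc\thttps://angel.co/acme\n"
)
master = merge_unique_by_url(by_type, by_keyword)
print(len(master))
```

Keying on the URL rather than the company name means that a company listed under several states collapses to a single row, which matches the reasoning given above.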