{{Team Member
|Went to school=Georgetown
|Has team sponsor=Kauffman Incubator Project
|Has name=Anne Freeman
|Has headshot=Headshot_small_2018.png
|Has team position=Student
|Has job title=Tech Team
|Has team status=Active
|Has or doing degree=Bachelor
|Has academic major=CS
|Has skills=SQL, Some ML
|Has email=aof7@georgetown.edu
}}
I worked on various aspects of the Kauffman Incubator Project during the Spring of 2019.
== What is an incubator? ==
One of my initial goals was to define what an incubator is. I began by researching how other experts in the field defined incubators and how this research group had approached the issue in the past. We developed a working definition, which can be found on our [[Defining Incubators]] page. We identified several dimensions along which to define an incubator, such as its purpose within the larger entrepreneurship ecosystem, the length of time companies spend at the incubator, and the application process companies undergo. We also sought to differentiate between an incubator and an accelerator and to define high-growth technology incubators.
== What attributes define an incubator? ==
Starting from that definition, I began exploring the baseline attributes that should be collected on all entrepreneurship ecosystem organizations in order to properly classify incubators, specifically high-growth technology incubators. There is a more in-depth description of this work on the [[Formulate baseline attributes]] page. We created a working list of attributes focused on characterizing the timeline, affiliation, services provided, type of company incubated, and other key demographics.
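To make the attribute list concrete, here is a minimal sketch of what a single incubator record along those dimensions might look like. The field names and values are hypothetical illustrations, not the project's actual schema.

<pre>
# Hypothetical example of one incubator record along the attribute
# dimensions listed above (timeline, affiliation, services provided,
# type of company incubated, key demographics). Illustrative only.
incubator_record = {
    "name": "Example Tech Incubator",
    "founded_year": 2012,                      # timeline
    "program_duration_months": 12,             # timeline
    "university_affiliated": True,             # affiliation
    "services": ["mentoring", "office space", "seed funding"],  # services provided
    "company_type": "high-growth technology",  # type of company incubated
    "city": "Houston",
    "state": "TX",                             # key demographics
}
</pre>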
== Evaluating Potential Data Sources ==
We evaluated potential data sources to select databases that would provide sufficient information on incubators. There is more information on this project on the [[Incubator Seed Data]] page. We started by going through the [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that the previous research group used, then performed Google searches for other viable sources and evaluated them in a systematic manner. We decided that Crunchbase, INBIA, a Google crawler, and AngelList were the best sources, because their data could be collected in an automated manner and they were the databases with the most information on incubators.

== INBIA Data ==
Using a list of URLs from the INBIA website, I created a web crawler that used Beautiful Soup to directly request the URLs, parse the information, and store it in a tab-separated text file. There is more information about this process on the [[INBIA]] page.

== Google Crawler ==
Using a list of locations, I wrote a Google crawler with Beautiful Soup that constructed Google search URLs and directly requested and parsed the results. This crawler was often blocked by Google, so I switched to using Selenium. The Selenium crawler searched Google for each city together with the keyword "incubator", retrieved 10 pages of results, and stored the city, title, and URL in a tab-separated text file. There is more information about this process on the [[Google Crawler]] page.

== AngelList Data ==
Using Selenium, I created a crawler to search the AngelList database by the keyword "incubator" and the state, and another crawler to search the AngelList database for companies of type "incubator" by state. The crawlers clicked the "more" button at the bottom of the page to load all of the results and then saved them in a tab-separated text file. I performed a diff on the two result sets to create a masterFile containing only unique entries. I then used Selenium to open the URL for each incubator on the AngelList website and download the page to a local folder. Finally, using Beautiful Soup, I parsed the static HTML files for information on the company, the employees, and the portfolio. There is more information on this process on the [[AngelList Database]] page.

== Things that still need work ==
The Selenium Google crawler pushes URLs directly rather than typing the query into Google and hitting enter. It also collects the same page 10 times rather than advancing to the next page of results. In addition, the AngelList script that parses employee information is not collecting data from all of the incubators and needs to be adjusted.
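As a starting point for the Google crawler fixes described above, the following is a minimal sketch, assuming Selenium with a Firefox driver, of how the crawler could type the query into the search box, press Enter, and click through to subsequent result pages. The selectors (the search box named "q", the "div.g" result containers, and the "pnnext" id on the Next link) are assumptions about Google's markup and may need to be updated.

<pre>
# Minimal sketch: type the query, hit Enter, and page through results.
# Selectors for Google's markup are assumptions and may change.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get("https://www.google.com")

# Type the search into the box and press Enter instead of pushing a URL.
box = driver.find_element(By.NAME, "q")            # assumed name of the search box
box.send_keys('Houston "incubator"')
box.send_keys(Keys.RETURN)
time.sleep(2)

rows = []
for page in range(10):
    # Collect the title and URL of each organic result on the current page.
    for result in driver.find_elements(By.CSS_SELECTOR, "div.g"):  # assumed result container
        try:
            title = result.find_element(By.TAG_NAME, "h3").text
            url = result.find_element(By.TAG_NAME, "a").get_attribute("href")
            rows.append(("Houston", title, url))
        except Exception:
            continue
    # Advance to the next page so each page is only collected once.
    try:
        driver.find_element(By.ID, "pnnext").click()  # assumed id of Google's "Next" link
    except Exception:
        break
    time.sleep(2)

# Store the city, title, and URL in a tab-separated text file.
with open("google_incubators.txt", "w", encoding="utf-8") as out:
    for city, title, url in rows:
        out.write("\t".join([city, title, url]) + "\n")

driver.quit()
</pre>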
