Difference between revisions of "Anne Freeman"

From edegan.com
Line 9: Line 9:
 
|Has email=aof7@georgetown.edu
 
|Has email=aof7@georgetown.edu
 
}}
 
}}
I'm working on the Kauffman Incubator Project.  
+
I worked on various aspects of the Kauffman Incubator Project during the spring of 2019.
  
My initial goal is to define data fields within an excel spreadsheet/relational database for entrepreneurship ecosystem organizations. These fields should be numeric, yes/no, or categories with a defined number of options.  
+
== What is an incubator? ==
 +
One of my initial goals was to define what an incubator is. I began by researching how experts in the field define incubators and how this research group had approached the question in the past. We developed a working definition, which can be found on our other wiki page [[Defining Incubators]]. We identified several ways to characterize an incubator, such as its purpose within the larger entrepreneurship system, the length of time companies spend at an incubator, and the application process companies undergo. We also sought to differentiate between an incubator and an accelerator and to define high-growth technology incubators.
  
'''Qualities of the Data'''
+
== What attributes define an incubator? ==
* Distinguish between incubators and other entrepreneurship organizations
+
Starting from the definition of incubators, I explored the baseline attributes that should be collected on all entrepreneurship ecosystem organizations in order to classify incubators properly, specifically high-growth technology incubators. There is a more in-depth description of this work on the [[Formulate baseline attributes]] page. We created a working list of attributes that characterize the timeline, affiliation, services provided, type of company incubated, and key demographics.
* Distinguish between high growth tech incubators and other incubators
 
* Collected using automated/web scraping methods
 
  
'''Plan of Action'''
+
== Evaluating Potential Data Sources ==
* How do experts differentiate incubators and other entrepreneurship organizations, such as accelerators?
+
We evaluated potential data sources to select databases that would provide sufficient information on incubators. There is more information on this project on the page [[Incubator Seed Data]]. We started by going through the [[Accelerator_Seed_List_(Data)#Sources | Accelerator Data Sources]] that the previous research group used. Then we ran Google searches for other viable sources and evaluated them systematically. We decided that Crunchbase, INBIA, a Google crawler, and AngelList were the best sources because their data could be collected in an automated manner and they held the most information on incubators.
:* Background pages on wiki
+
 
:* Internet research
+
== INBIA Data ==
* Do these definitions match the data?
+
Using a list of URLs from the INBIA website, I created a web crawler that requested each URL directly, parsed the information with Beautiful Soup, and stored it in a tab-separated text file. There is more information about this process on the [[INBIA]] page.
:*McNair/Projects/Accelerators → see past data on accelerators and incubators,  
+
 
* How did the previous group approach this problem?
+
== Google Crawler ==
:* Try and make sense of the files on the shared drive (McNair/Projects/Accelerators)
+
Using a list of locations, I wrote a Google crawler that built Google search URLs, requested them directly, and parsed the results with Beautiful Soup. Google often blocked this crawler, so I switched to Selenium. The Selenium crawler searched Google for each city together with the keyword "incubator", retrieved 10 pages of results, and stored the city, title, and URL of each result in a tab-separated text file. There is more information about this process on the [[Google Crawler]] page.
 +
 
 +
== AngelList Data ==
 +
Using Selenium, I created a crawler to search the AngelList database using the keyword "incubator" and the state. I also created a crawler to search the AngelList database for companies with the type "incubator" and the state. The crawlers clicked the "more" button at the bottom of the page until all of the results were visible and then saved them in a tab-separated text file. I performed a diff on the results to create a masterFile containing only unique entries. Then I used Selenium to open each incubator's URL within the AngelList website and download the page to a local folder. Finally, using Beautiful Soup, I parsed the static HTML files for information on the company, the employees, and the portfolio. There is more information on this process on the [[AngelList Database]] page.

Revision as of 11:13, 29 May 2019

Team Member
Tech Team
Anne Freeman
Information
Status: Active
Degree: Bachelor
Major: CS
Skills (Students only): SQL, Some ML
Email: aof7@georgetown.edu
Projects: AngelList Database, Defining Incubators, Ecosystem Organization Classifier, Google Crawler, INBIA, Incubator Classifier - Formulate baseline attributes, Incubator Seed Data
Copyright © 2019 edegan.com. All Rights Reserved.

I worked on various aspects of the Kauffman Incubator Project during the spring of 2019.

What is an incubator?

One of my initial goals was to define what an incubator is. I began by researching how experts in the field define incubators and how this research group had approached the question in the past. We developed a working definition, which can be found on our other wiki page, Defining Incubators. We identified several ways to characterize an incubator, such as its purpose within the larger entrepreneurship system, the length of time companies spend at an incubator, and the application process companies undergo. We also sought to differentiate between an incubator and an accelerator and to define high-growth technology incubators.

What attributes define an incubator?

Starting from the definition of incubators, I explored the baseline attributes that should be collected on all entrepreneurship ecosystem organizations in order to classify incubators properly, specifically high-growth technology incubators. There is a more in-depth description of this work on the Formulate baseline attributes page. We created a working list of attributes that characterize the timeline, affiliation, services provided, type of company incubated, and key demographics.

Evaluating Potential Data Sources

We evaluated potential data sources to select databases that would provide sufficient information on incubators. There is more information on this project on the Incubator Seed Data page. We started by going through the Accelerator Data Sources that the previous research group used. Then we ran Google searches for other viable sources and evaluated them systematically. We decided that Crunchbase, INBIA, a Google crawler, and AngelList were the best sources because their data could be collected in an automated manner and they held the most information on incubators.

INBIA Data

Using a list of URLs from the INBIA website, I created a web crawler that requested each URL directly, parsed the information with Beautiful Soup, and stored it in a tab-separated text file. There is more information about this process on the INBIA page.
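
The parse-and-store step can be illustrated with a minimal sketch. It is not the project's crawler: it uses Python's standard-library `html.parser` in place of Beautiful Soup so that it runs without extra dependencies, it skips the network request, and the fields it extracts (page title and link targets) and the names `parse_page` and `write_tsv` are hypothetical stand-ins for whatever the INBIA pages actually contain.

```python
import csv
from html.parser import HTMLParser


class TitleAndLinkParser(HTMLParser):
    """Collect the <title> text and every href found on a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(url, html):
    """Turn one page's HTML into a flat record (hypothetical fields)."""
    parser = TitleAndLinkParser()
    parser.feed(html)
    return {
        "url": url,
        "title": parser.title.strip(),
        "links": " ".join(parser.links),
    }


def write_tsv(rows, handle):
    """Write the records to an open text handle, one tab-separated line each."""
    writer = csv.DictWriter(
        handle,
        fieldnames=["url", "title", "links"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
```

In a real run, the HTML passed to `parse_page` would come from requesting each URL in the list, and `write_tsv` would receive a file opened for writing.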

Google Crawler

Using a list of locations, I wrote a Google crawler that built Google search URLs, requested them directly, and parsed the results with Beautiful Soup. Google often blocked this crawler, so I switched to Selenium. The Selenium crawler searched Google for each city together with the keyword "incubator", retrieved 10 pages of results, and stored the city, title, and URL of each result in a tab-separated text file. There is more information about this process on the Google Crawler page.
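
As a rough illustration of the URL-building step in the first version of the crawler, the sketch below generates one search URL per results page for a city. The function name `build_search_urls` is hypothetical, and the sketch assumes that Google's `q` parameter carries the query and that `start` pages through results in steps of ten; the actual crawler may have built its URLs differently.

```python
from urllib.parse import urlencode


def build_search_urls(city, keyword="incubator", pages=10, per_page=10):
    """Return one Google search URL per results page for `city` + `keyword`."""
    base = "https://www.google.com/search"
    urls = []
    for page in range(pages):
        # Assumption: `start` is the zero-based offset of the first result.
        query = urlencode({"q": f"{city} {keyword}", "start": page * per_page})
        urls.append(f"{base}?{query}")
    return urls
```

Calling `build_search_urls("Houston")` yields ten URLs, one for each page of results that the crawler stored.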

AngelList Data

Using Selenium, I created a crawler to search the AngelList database using the keyword "incubator" and the state. I also created a crawler to search the AngelList database for companies with the type "incubator" and the state. The crawlers clicked the "more" button at the bottom of the page until all of the results were visible and then saved them in a tab-separated text file. I performed a diff on the results to create a masterFile containing only unique entries. Then I used Selenium to open each incubator's URL within the AngelList website and download the page to a local folder. Finally, using Beautiful Soup, I parsed the static HTML files for information on the company, the employees, and the portfolio. There is more information on this process on the AngelList Database page.
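
One way to combine the two sets of search results into a single list of unique entries is sketched below. It is an assumption-laden illustration rather than the diff procedure actually used: it presumes each tab-separated file has a header row with a `url` column that identifies an entry, and `merge_unique` is a hypothetical helper name.

```python
import csv


def merge_unique(handles, key="url"):
    """Merge tab-separated result files, keeping the first row seen per key."""
    seen = set()
    merged = []
    for handle in handles:
        for row in csv.DictReader(handle, delimiter="\t"):
            identifier = row[key].strip()
            if identifier not in seen:
                seen.add(identifier)
                merged.append(row)
    return merged
```

Passing the keyword-search file and the type-search file as open handles returns the rows in first-seen order, which could then be written out as the master file.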