Difference between revisions of "Kauffman Incubator Project"

From edegan.com

Revision as of 18:08, 4 March 2019

Welcome!

Project Introduction

Our project will create tools to automate the identification and classification of entrepreneurship ecosystem organizations, and the extraction of data from their websites. Specifically, we aim to 1) classify entrepreneurship support organizations, including high-growth technology incubators, startups, and venture capitalists, based on a short textual description; 2) identify the client listing page on an incubator's website; and 3) automate the extraction of information about startups from an incubator's client listing page. At present, we envision using neural networks in these tools, and we expect that the third element will require new computer science.
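
To make the first task concrete, here is a minimal sketch of classifying an organization from a short textual description. It is not the project's method: the project envisions neural networks, and this sketch swaps in a multinomial naive Bayes baseline built on the Python standard library only. The label names, the SEED_EXAMPLES descriptions, and every function and class name are hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of descriptions
        self.vocabulary = set()

    def train(self, examples):
        """Accumulate word and document counts from (text, label) pairs."""
        for text, label in examples:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        """Return the label with the highest log-probability for the text."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = tokenize(text)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            label_total = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written descriptions; real seed data would replace these.
SEED_EXAMPLES = [
    ("We provide office space, mentorship and seed funding to early-stage "
     "technology startups", "incubator"),
    ("A coworking program offering mentorship and workspace to resident "
     "startup companies", "incubator"),
    ("We invest in Series A and Series B rounds from our venture fund",
     "venture capital"),
    ("An investment firm with a fund backing growth-stage companies",
     "venture capital"),
    ("We build a mobile app that helps restaurants manage online orders",
     "startup"),
    ("Our software platform lets customers track shipments in real time",
     "startup"),
]

classifier = NaiveBayesClassifier()
classifier.train(SEED_EXAMPLES)
print(classifier.predict("mentorship and workspace for resident startups"))
```

A word-count baseline like this is mainly useful as a reference point: any neural classifier trained on the seed data would be expected to outperform it on held-out descriptions.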

Project Expected Outputs

By March 2019

  1. identify at least 4 primary sources of ‘seed data’, or secure licenses to extract seed data from such sources, as measured by program records.
  2. have a working prototype of an automated classifier that distinguishes incubators from other entities described in the seed data, as measured by program records.
  3. collect data in at least 5 ecosystems, as measured by the availability of a dataset.
  4. develop a protocol for the tool to extract client company identity information from incubator websites, as measured by program records.

By June 2019

  1. have a working prototype of a tool to identify client company listings on incubator websites, as measured by program records.
  2. upload the collected data to GitHub, Dataverse, or another publicly accessible web platform for use by a set of academics, as measured by program records.