Difference between revisions of "Ecosystem Organization Classifier"
Revision as of 13:51, 30 March 2019
| Ecosystem Organization Classifier | |
|---|---|
| Project Information | |
| Has title | Ecosystem Organization Classifier |
| Has start date | |
| Has deadline date | |
| Has project status | Active |
| Is dependent on | Crunchbase Database, VentureXpert Database |
| Does subsume | Defining Incubators, Incubator Seed Data, Incubators in Five Ecosystems |
| Copyright © 2019 edegan.com. All Rights Reserved. | |
Introduction
The purpose of this project is to build a classifier that takes the description of an ecosystem organization (e.g., a startup, a venture capitalist, or an incubator) and either correctly classifies the organization's type or correctly distinguishes incubators from non-incubators.
Text Processing
There are two possible methods for processing the text of target HTML pages before classification. The first is a "Bag of Words" approach, which uses Term Frequency-Inverse Document Frequency (TF-IDF) weighting to do basic natural language processing and select words or phrases that discriminate between organization types. The second is a Word2Vec approach, which uses shallow two-layer neural networks to reduce each description to a vector with high discriminant potential. (See "Memo for Evan" in E:\mcnair\Projects\Incubators for further detail.)
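As a rough illustration of the "Bag of Words" approach, the following minimal sketch computes TF-IDF weights using only the Python standard library. It is not the project's implementation: the function names, the simple tokenizer, the unsmoothed tf * log(N / df) weighting, and the three example descriptions are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens only (assumed tokenizer;
    # a real pipeline would first strip markup from the HTML pages).
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty text
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical organization descriptions, invented for this example.
docs = [
    "We are an incubator offering mentorship and office space to startups",
    "We are a venture capital fund investing in early stage startups",
    "We are a startup building software for hospitals",
]
vectors = tfidf_vectors(docs)
```

Under this weighting, terms that occur in every description (such as "we" and "are") receive a weight of zero, while a term confined to one description (such as "incubator") is weighted more heavily than one shared by several (such as "startups"). The resulting weights could then feed any standard classifier.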
Related Projects
Subsumed Projects: Defining Incubators, Incubator Seed Data, Incubators in Five Ecosystems
This project is dependent on: Crunchbase Database, VentureXpert Database