Difference between revisions of "Ecosystem Organization Classifier"

Revision as of 16:05, 30 March 2019


Project
Ecosystem Organization Classifier
Project Information
Has title Ecosystem Organization Classifier
Has start date
Has deadline date
Has project status Active
Is dependent on Crunchbase Database, VentureXpert Database
Does subsume Defining Incubators, Incubator Seed Data, Incubators in Five Ecosystems
Copyright © 2019 edegan.com. All Rights Reserved.


Introduction

The purpose of this project is to build a classifier that takes the description of an ecosystem organization (e.g., a startup, a venture capitalist, or an incubator) and either correctly classifies the organization's type or correctly distinguishes incubators from non-incubators.

Text Processing

There are two obvious methods for processing the textual descriptions before classification. The first is a "Bag of Words" approach, which uses Term Frequency – Inverse Document Frequency (TF-IDF) to do basic natural language processing and select words or phrases that have discriminant capabilities. The second is a Word2Vec approach, which uses a shallow two-layer neural network to reduce descriptions to a vector with high discriminant potential. (See "Memo for Evan" in E:\mcnair\Projects\Incubators for further detail.) We will try both approaches.
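As a rough illustration of the bag-of-words idea, the sketch below computes TF-IDF vectors with only the Python standard library and labels a new description by its most similar training description. It is not the project's code. The toy descriptions, the labels, the smoothed IDF formula, and the nearest-neighbor rule are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter

# Hypothetical toy data; real input would be Crunchbase organization descriptions.
DOCS = [
    ("incubator", "we incubate early stage startups and provide mentoring and office space"),
    ("incubator", "an incubator offering mentoring workspace and seed support to founders"),
    ("vc", "venture capital firm investing in series a and series b rounds"),
    ("vc", "we invest growth capital in technology companies"),
]


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every term seen in training."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}


def tfidf(tokens, idf):
    """L2-normalized TF-IDF vector as a sparse dict; unseen terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def classify(text, train_vecs, idf):
    """Return the label of the most similar training description.

    If the text shares no terms with the training data, all similarities
    are zero and the first training label is returned.
    """
    query = tfidf(tokenize(text), idf)
    label, _ = max(train_vecs, key=lambda item: cosine(query, item[1]))
    return label


token_lists = [tokenize(desc) for _, desc in DOCS]
IDF = fit_idf(token_lists)
TRAIN = [(label, tfidf(tokens, IDF)) for (label, _), tokens in zip(DOCS, token_lists)]

if __name__ == "__main__":
    print(classify("mentoring and workspace for startups", TRAIN, IDF))
    print(classify("capital investing in technology companies", TRAIN, IDF))
```

In practice a library vectorizer and a trained classifier would replace the hand-rolled pieces. The point here is only that terms concentrated in one class (such as "mentoring") receive more weight than terms spread across classes (such as "and").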

Code built already

We have previously used bag-of-words in the Demo Day Page Google Classifier and in early versions of the Industry Classifier. Later versions of the Industry Classifier were based on our Deep Text Classifier project.

First data

For the first dataset, we will use organization descriptions from Crunchbase. Run this code on crunchbase3 (see Crunchbase Database):

Related Projects

Subsumed Projects: Defining Incubators, Incubator Seed Data, Incubators in Five Ecosystems

This project is dependent on: Crunchbase Database, VentureXpert Database