There are two obvious methods for classifying the textual descriptions. The first is a "Bag of Words" approach, which uses Term Frequency-Inverse Document Frequency (TF-IDF) weighting to do basic natural language processing and select words or phrases that have discriminant capability. The second is a Word2Vec approach, which uses a shallow two-layer neural network to reduce descriptions to vectors with high discriminant potential. (See "Memo for Evan" in E:\mcnair\Projects\Incubators for further detail.) We are going to try both approaches.
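
As a rough illustration of the bag-of-words idea, the sketch below computes TF-IDF weights in plain Python. It is not the project's code: the function names <code>tokenize</code> and <code>tfidf</code>, the sample descriptions, and the unsmoothed weighting (term frequency times log(N / document frequency)) are all assumptions made for the example. A real run would more likely use a library implementation with its own smoothing and normalization.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(descriptions):
    """Return one {term: weight} dict per description.

    Weight = (count / tokens in description) * log(N / document frequency).
    This is an unsmoothed textbook variant, chosen for clarity.
    """
    docs = [Counter(tokenize(d)) for d in descriptions]
    n_docs = len(docs)

    # Document frequency: number of descriptions containing each term.
    doc_freq = Counter()
    for counts in docs:
        doc_freq.update(counts.keys())

    weights = []
    for counts in docs:
        total = sum(counts.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical descriptions, invented for this example.
sample = [
    "We are an incubator for early stage startups",
    "We are a venture capital fund",
    "We are a university incubator and accelerator",
]
scores = tfidf(sample)

# Highest-weighted term per description.
top_terms = [max(w, key=w.get) for w in scores]
```

Words that appear in every description (here "we" and "are") get a weight of zero, while words confined to one description get the highest weights, which is what makes them candidates for discriminating between classes.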
 
====Code built already====
 
We have previously used bag-of-words in the [[Demo Day Page Google Classifier]] and in early versions of the [[Industry Classifier]]. Later versions of the [[Industry Classifier]] were based on our [[Deep Text Classifier]] project.
 
====First data====
 
For our first dataset, we are going to use organization descriptions from Crunchbase. Run this code on '''crunchbase3''' (see [[Crunchbase Database]]):
==Related Projects==
