Industry classifier yang

Revision as of 16:12, 26 September 2017

==Industry Classifier==

By Yang Zhang

Goal: For each company, we want to classify its industry based on its text description.

Approach:

Step 1: Encode the text description into numerical values.

Step 2: Build a deep neural network that learns to classify.

For step 1, a very naive way is to use the "bag of words" representation. The obvious drawbacks are that it ignores both the correlations between words and their relative order. So, instead, we use "word2vec" (https://en.wikipedia.org/wiki/Word2vec), where, in short, each word is mapped to a vector that represents how likely the other words are to appear around this center word.
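To make the contrast concrete, here is a minimal sketch in plain Python. The two example descriptions and the 2-dimensional `toy_vectors` table are made up for illustration; they are not real word2vec output, which would be learned from a corpus and have many more dimensions. The sketch only shows that bag of words discards word order, while a sequence of word vectors keeps it.

```python
from collections import Counter

def bag_of_words(text):
    """Count how often each lowercase token occurs; word order is discarded."""
    return Counter(text.lower().split())

# Two hypothetical descriptions: same words, different order.
a = "software company sells hardware"
b = "hardware company sells software"

# Bag of words cannot tell the two descriptions apart.
same_bag = bag_of_words(a) == bag_of_words(b)

# Hypothetical 2-d word vectors (assumed values, NOT real word2vec output).
toy_vectors = {
    "software": [0.9, 0.1],
    "hardware": [0.1, 0.9],
    "company": [0.5, 0.5],
    "sells": [0.4, 0.6],
}

def encode(text, vectors):
    """Map a description to an ordered list of word vectors, skipping unknown words."""
    return [vectors[w] for w in text.lower().split() if w in vectors]

# The vector sequences differ because the order of the words is preserved.
seq_a = encode(a, toy_vectors)
seq_b = encode(b, toy_vectors)
different_sequences = seq_a != seq_b

print(same_bag, different_sequences)
```

An ordered sequence like `seq_a` is the kind of input a convolutional or recurrent network can consume in step 2.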

For step 2, we have tried 1D/2D convolutional NNs (neural networks) and an LSTM RNN (recurrent neural network). All the models can achieve 90+% training accuracy and around 60% testing accuracy. Note that this task is hard even for humans, and the baseline of random guessing is around 10%, so 60% is acceptable. Tuning the parameters doesn't help much, meaning we might have reached the models' maximum capability.
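As a rough illustration of what a 1D convolutional classifier computes, the sketch below runs a single forward pass in plain Python: slide each kernel over the sequence of word vectors, keep the maximum response per kernel, and turn the class scores into probabilities with a softmax. All weights, the three-class setup, and the function names are assumptions chosen for readability. This is not the architecture or the trained parameters used in the experiments above, and the training loop is omitted.

```python
import math

def conv1d(sequence, kernel, bias=0.0):
    """Slide a (width x dim) kernel over a (length x dim) sequence, then apply ReLU.

    Assumes the sequence is at least as long as the kernel is wide.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = bias
        for vec, weights in zip(window, kernel):
            total += sum(x * w for x, w in zip(vec, weights))
        outputs.append(max(0.0, total))  # ReLU non-linearity
    return outputs

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(sequence, kernels, class_weights):
    """Conv -> max-over-time pooling -> linear layer -> softmax."""
    features = [max(conv1d(sequence, k)) for k in kernels]
    scores = [sum(f * w for f, w in zip(features, row)) for row in class_weights]
    return softmax(scores)

# Hypothetical input: four words, each a 2-d vector (assumed values).
sequence = [[0.9, 0.1], [0.5, 0.5], [0.4, 0.6], [0.1, 0.9]]

# Two hypothetical kernels of width 2; in practice these are learned.
kernels = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]

# Hypothetical linear layer for three industry classes.
class_weights = [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]]

probs = classify(sequence, kernels, class_weights)
predicted = probs.index(max(probs))
print(probs, predicted)
```

In a real model the kernels and class weights would be learned from labeled descriptions, and there would be one output per industry rather than three.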

Next steps:

Try longer descriptions and see whether the additional information gives us better accuracy.
