Revision as of 15:25, 26 September 2017
Industry Classifier - by Yang Zhang
 
Goal: for each company, we want to classify its industry based on its text description.
 
Approach:
Step 1: encode each text description as numerical values.
Step 2: build a deep neural network that learns to classify the encoded descriptions.
 
For step 1, a naive approach is the bag-of-words representation. Its obvious drawback is that it ignores both the correlations between words and their order. Instead, we use word2vec, in which each word is represented as a dense vector trained to predict which other words are likely to appear around it when it is the center word.
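The sketch below (plain Python; the function names are hypothetical) illustrates both points. Two descriptions with different meanings get identical bag-of-words counts, while word2vec in its skip-gram variant is trained on (center, context) pairs taken from a window around each word. Whether skip-gram or CBOW was used, and the window size, are assumptions not stated in this section. The code only builds training pairs; it does not learn embeddings (a library such as gensim would normally do that).

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercase tokens; word order is thrown away."""
    return Counter(text.lower().split())


def skipgram_pairs(tokens, window=2):
    """Return the (center, context) pairs a skip-gram word2vec model learns to predict.

    `window` (hypothetical default) is how many words on each side count as context.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Different industries, same bag of words: order is the only difference.
a = "a bank that acquires a software firm"
b = "a software firm that acquires a bank"
print(bag_of_words(a) == bag_of_words(b))  # True

print(skipgram_pairs("bank lends to software".split(), window=1))
# [('bank', 'lends'), ('lends', 'bank'), ('lends', 'to'),
#  ('to', 'lends'), ('to', 'software'), ('software', 'to')]
```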
 
For step 2, we have tried a 1D convolutional network and an LSTM. Both reach over 90% training accuracy and around 60% test accuracy. This task is hard even for humans, and random guessing gives a baseline of only around 10%, so 60% is acceptable. Tuning the hyperparameters doesn't help much, which suggests we may have reached the models' maximum capability.
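To make the convolutional variant of step 2 concrete, here is a minimal forward pass of a 1D convolutional text classifier in plain Python: each filter slides over the sequence of word vectors, a ReLU and max-over-time pooling reduce it to one feature, and a linear layer with softmax scores the industries. The filter shapes, weights, and names are hypothetical toy values, and training (backpropagation) and the LSTM variant are omitted. For reference, uniform random guessing over K classes has expected accuracy 1/K, so a ~10% baseline would correspond to roughly 10 industries if guesses are uniform (an assumption).

```python
import math


def conv1d_max_pool(seq, kernels):
    """Apply each filter across the word-vector sequence, then ReLU and max-over-time pooling.

    seq:     list of word vectors (each a list of floats, length = embedding dim)
    kernels: list of filters; a filter is a list of `width` vectors of embedding dim
    returns: one pooled feature per filter
    """
    features = []
    for kernel in kernels:
        width = len(kernel)
        best = 0.0  # ReLU outputs are >= 0; also the result if seq is shorter than the filter
        for start in range(len(seq) - width + 1):
            # Dot product of the filter with `width` consecutive word vectors.
            activation = sum(
                w * x
                for k in range(width)
                for w, x in zip(kernel[k], seq[start + k])
            )
            best = max(best, max(0.0, activation))  # ReLU, then running max over positions
        features.append(best)
    return features


def predict(features, weights, biases):
    """Linear layer + softmax over classes; returns (best class index, probabilities)."""
    logits = [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, biases)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs


# Toy example: 3 words with 2-dimensional embeddings, two width-2 filters, two classes.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, 0.0]]]
features = conv1d_max_pool(seq, kernels)
print(features)  # [2.0, 0.0]
label, probs = predict(features, weights=[[1.0, 0.0], [0.0, 1.0]], biases=[0.0, 0.5])
print(label)  # 0
```

Max-over-time pooling is what lets a convolutional model accept descriptions of varying length: each filter contributes one number no matter how many positions it slid over.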
 
Next steps:
Try longer descriptions and see whether the additional information improves accuracy.
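If "longer descriptions" is implemented by letting the network see more tokens per company (an assumption; this section does not say how description length is currently limited), one simple experiment is to raise a fixed maximum sequence length and retrain. The hypothetical helpers below pad or truncate token-id sequences to `max_len` and measure what fraction of descriptions a given `max_len` would cut, which hints at how much text a longer limit could recover.

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Return exactly max_len ids: long descriptions are cut, short ones are right-padded."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))


def truncation_rate(lengths, max_len):
    """Fraction of descriptions (given their token counts) that lose tokens at this max_len."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n > max_len) / len(lengths)


# Hypothetical token counts for four company descriptions.
lengths = [30, 80, 120, 250]
for max_len in (50, 100, 200):
    print(max_len, truncation_rate(lengths, max_len))
# 50 0.75
# 100 0.5
# 200 0.25

print(pad_or_truncate([5, 8, 13], 5))  # [5, 8, 13, 0, 0]
```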