Revision as of 13:47, 21 September 2020
{{Project
|Has project output=Tool
|Has sponsor=McNair Center
|Has title=Deep Text Classifier
|Has owner=Yang Zhang,
|Has start date=September 2017
|Has keywords=Tool
|Has project status=Active
|Does subsume=Industry Classifier,
}}
=Deep Text Classifier=
 
E:\McNair\Projects\Deep Text Classifier
==Problem Description==
'''For advanced users:'''
1. One important step in data preprocessing is to encode words (strings) into integers. The solution is to build a dictionary mapping each word to a corresponding index. For example, if "hello" is the 17th word in our dictionary, then "hello" is encoded as 17. Our dictionary is ordered by word frequency: the higher the frequency, the smaller the index. That is, you should expect to see words like "the" and "a" among the ten most frequent words, with very small indices: 2, 3, 4, .... Notice that the indices 0 and 1 are intentionally not assigned to any words. The advantage of this scheme is that you can easily ignore very common and meaningless words, like "the", by simply saying you only want to consider words with indices > 20, for example. Also notice that it is possible to encounter words that are not in our dictionary; we always assign them to index 1, so you can again easily ignore them. These words are safe to ignore, given that our dictionary is big enough.
2. Saving a pickle file is a very efficient way to retrieve the data, so that you don't need to redo the data preprocessing every time you want to run your classifier.
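The two preprocessing steps above can be sketched in plain Python. The function names and the reserved-index convention shown here are illustrative assumptions, not taken from the project's actual code:

```python
import pickle
from collections import Counter

def build_vocab(documents, num_reserved=2):
    """Map words to integer indices ordered by frequency.

    Indices 0 and 1 are intentionally left unassigned; index 1 is
    reserved for out-of-vocabulary words at encoding time.
    """
    counts = Counter(word for doc in documents for word in doc.split())
    # The most frequent word gets the smallest available index (2, 3, ...).
    return {word: i + num_reserved
            for i, (word, _) in enumerate(counts.most_common())}

def encode(document, vocab, min_index=0):
    """Encode a document as a list of indices.

    Unknown words map to 1; raising min_index skips the very common
    words, which have the smallest indices.
    """
    indices = [vocab.get(word, 1) for word in document.split()]
    return [i for i in indices if i == 1 or i >= min_index]

docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocab(docs)
encoded = [encode(d, vocab) for d in docs]

# Cache the preprocessed data so this only has to run once.
with open("encoded.pkl", "wb") as f:
    pickle.dump((vocab, encoded), f)
with open("encoded.pkl", "rb") as f:
    vocab_cached, encoded_cached = pickle.load(f)
assert encoded_cached == encoded
```

On later runs the classifier can load `encoded.pkl` directly instead of re-tokenizing the corpus.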
==Model Training/Prediction==
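As a sketch of the Keras workflow this section describes, a minimal embedding + LSTM classifier could look like the following. The vocabulary size, layer widths, and binary output head are illustrative assumptions, not the project's actual settings:

```python
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.models import Sequential

VOCAB_SIZE = 20000  # assumed dictionary size from preprocessing

model = Sequential([
    # Learn a dense vector for each integer word index.
    Embedding(input_dim=VOCAB_SIZE, output_dim=128),
    # The LSTM reads the word vectors in order and summarizes the document.
    LSTM(64),
    # Binary classification head; use more units with softmax for multi-class.
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

Training then reduces to `model.fit(x_train, y_train)` on padded index sequences such as those produced in the preprocessing step.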
==General Guidelines for Tuning the HyperParameters==
We write in [https://www.tensorflow.org/ Tensorflow] for all the classifiers. [https://keras.io/ Keras] is a good wrapper over the Tensorflow framework that allows you to quickly build up a neural network and train it. (If you are new to Deep Learning and Tensorflow, please do stay with Keras.)
* '''Embedding'''
[https://keras.io/layers/embeddings/ Keras Official Documentation]
[https://www.tensorflow.org/tutorials/word2vec Tensorflow : Vector Representations of Words]
[https://en.wikipedia.org/wiki/Word2vec Wiki : Word2vec]
* '''LSTM'''
[http://colah.github.io/posts/2015-08-Understanding-LSTMs/ A Nice Blog about LSTM]
[https://www.tensorflow.org/tutorials/recurrent Tensorflow : Recurrent Neural Networks]
[https://keras.io/layers/recurrent/ Keras Official Documentation]

==Summer 2018 Work==
Code, data, and attempts to run are located in: E:
