Deep Text Classifier

Revision as of 16:10, 12 October 2017


1. One important step in data preprocessing is to encode words (strings) as integers. The solution is to build a dictionary that maps each word to its index. For example, if "hello" is the 17th word in our dictionary, then "hello" is encoded as 17. Our dictionary is ordered by word frequency: the higher the frequency, the smaller the index, so you should expect very common words such as "the" and "a" to have very small indices. Note also that indices 0 and 1 are intentionally not assigned to any dictionary word. The advantage of this ordering is that you can easily ignore very common, uninformative words like "the" simply by considering only words whose index is above some threshold, for example 20. It is also possible to encounter words that are not in our dictionary; these are always mapped to index 1. Such words are safe to ignore as long as our dictionary is large enough.

2. Saving the preprocessed data to a pickle file is an efficient way to reload it later, so that you don't need to redo the preprocessing every time you run your classifier.
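The two preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for this classifier: the function names (`build_dictionary`, `encode`, `drop_common_and_unknown`, `save_preprocessed`, `load_preprocessed`), the lowercase whitespace tokenization, starting real words at index 2, and the layout of the pickled dictionary are all hypothetical choices made for the example.

```python
import os
import pickle
import tempfile
from collections import Counter

# Hypothetical index layout following the convention described above:
# 0 is left unassigned, 1 is used for any word not in the dictionary,
# and real words start at index 2, most frequent word first.
UNKNOWN = 1
FIRST_WORD_INDEX = 2


def build_dictionary(texts):
    """Map each word to an index ordered by descending frequency (assumed tokenizer)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() sorts by count, highest first; equal counts keep first-seen order
    ranked = [word for word, _ in counts.most_common()]
    return {word: i for i, word in enumerate(ranked, start=FIRST_WORD_INDEX)}


def encode(text, dictionary):
    """Encode a string as a list of indices; unknown words map to UNKNOWN (1)."""
    return [dictionary.get(word, UNKNOWN) for word in text.lower().split()]


def drop_common_and_unknown(indices, min_index):
    """Keep only indices strictly greater than min_index.

    Frequent words have small indices, so this removes the most common words;
    with min_index >= 1 it also removes UNKNOWN (1).
    """
    return [i for i in indices if i > min_index]


def save_preprocessed(path, dictionary, encoded):
    """Pickle the dictionary and encoded texts together (hypothetical file layout)."""
    with open(path, "wb") as f:
        pickle.dump({"dictionary": dictionary, "encoded": encoded}, f)


def load_preprocessed(path):
    """Reload what save_preprocessed wrote, skipping the preprocessing step."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["dictionary"], data["encoded"]


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the dog sat on a log"]
    vocab = build_dictionary(corpus)
    encoded = [encode(text, vocab) for text in corpus]
    print(vocab["the"])                            # 2: "the" is the most frequent word
    print(encode("the zebra sat", vocab))          # [2, 1, 3]: "zebra" is unknown
    print(drop_common_and_unknown(encoded[0], 3))  # [5, 4, 6]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "preprocessed.pkl")
        save_preprocessed(path, vocab, encoded)
        vocab2, encoded2 = load_preprocessed(path)
        print(vocab2 == vocab and encoded2 == encoded)  # True
```

Frequency ties are broken by first occurrence, because `Counter.most_common()` keeps that order for equal counts. Since `pickle.load` can execute arbitrary code, only load pickle files that you created yourself or otherwise trust.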

==Model Training/Prediction==

==General Guidelines for Tuning the Hyper-Parameters==

Yangzhang
