Changes

Deep Text Classifier

Revision as of 15:49, 12 October 2017

34 bytes added , 15:49, 12 October 2017

'''To advanced users:'''

1. One important step in data preprocessing is encoding words (strings) as integers. We do this by building a dictionary that maps each word to its index. (For example, if "hello" is the 17th word in our dictionary, then "hello" is encoded as 17.) The dictionary is ordered by word frequency: the higher the frequency, the smaller the index, so you should expect very common words such as "the" and "a" among the ten smallest indices: 2, 3, 4, .... Note that indices 0 and 1 are intentionally not assigned to any word. The advantage of this ordering is that you can easily ignore common, uninformative words, for example by considering only words with index > 20. Any word that is not in our dictionary is encoded with index 1, so again you can easily ignore it.

2. Saving the preprocessed data to a pickle file is a very efficient way to retrieve it later, so you don't need to redo the preprocessing every time you run your classifier.
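The two points above can be sketched in a few lines of Python. This is only an illustration, not the classifier's actual preprocessing code: the function names (`build_dictionary`, `encode`, `save_pickle`, `load_pickle`), the `skip_up_to` parameter, the file name, and the lowercase whitespace tokenization are all assumptions made for the example.

```python
import os
import pickle
import tempfile
from collections import Counter

OOV_INDEX = 1          # words missing from the dictionary are encoded as 1
FIRST_WORD_INDEX = 2   # indices 0 and 1 are never assigned to a real word


def build_dictionary(texts):
    """Map each word to an index; the most frequent word gets index 2."""
    # Assumption: tokens are lowercase, whitespace-separated words.
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending frequency; break ties alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: i for i, (word, _) in enumerate(ordered, start=FIRST_WORD_INDEX)}


def encode(text, dictionary, skip_up_to=None):
    """Encode a text as integers, optionally keeping only indices > skip_up_to."""
    indices = [dictionary.get(word, OOV_INDEX) for word in text.lower().split()]
    if skip_up_to is not None:
        indices = [i for i in indices if i > skip_up_to]
    return indices


def save_pickle(obj, path):
    """Write any picklable object to disk."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Read an object back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


texts = ["the cat sat on the mat", "the dog sat"]
dictionary = build_dictionary(texts)
# "the" occurs 3 times and "sat" twice, so they get indices 2 and 3.
encoded = [encode(t, dictionary) for t in texts]

print(encode("the bird sat", dictionary))                # [2, 1, 3]  ("bird" is unknown)
print(encode("the bird sat", dictionary, skip_up_to=2))  # [3]

# Cache the preprocessed data once, then reload it instead of re-encoding.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "preprocessed.pkl")  # hypothetical file name
    save_pickle({"dictionary": dictionary, "encoded": encoded}, path)
    restored = load_pickle(path)
```

Because unknown words share the reserved index 1, the same threshold that drops the most frequent words also drops out-of-dictionary words. In a real run you would write the pickle to a permanent location rather than a temporary directory.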

Yangzhang

