Changes

Deep Text Classifier (view source)

Revision as of 15:35, 12 October 2017

652 bytes added , 15:35, 12 October 2017

For data preprocessing, we adopt the same standard as in the [http://ai.stanford.edu/~amaas/data/sentiment/ IMDB] dataset.

'''To general users:''' your input file (usually a single ".txt" file containing many examples, one per row) will be split into a training set (80% by default) and a testing set (20% by default). The labels you want to predict become the folder names, and the content of each example (usually a block of text) goes into a separate ".txt" file. To run the script, you need to specify the following:

1. "File Name" : without the ".txt" extension,

2. "Expected Columns" : total number of columns in the input file

4. "Label Index" : the column index of the label
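The split described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual preprocessing script: the function name <code>split_examples</code>, the tab delimiter, the <code>text_index</code> parameter, and the fixed shuffle seed are all hypothetical. It groups the examples by split and label in memory, mirroring the folder layout, and leaves writing the individual ".txt" files out.

```python
import random
from collections import defaultdict


def split_examples(lines, expected_columns, label_index, text_index,
                   train_fraction=0.8, delimiter="\t", seed=0):
    """Parse rows and split them into train/test groups keyed by label.

    Assumptions (not from the original script): rows are tab-delimited and
    rows with an unexpected number of columns are skipped.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != expected_columns:
            continue  # malformed row: wrong number of columns
        rows.append((fields[label_index], fields[text_index]))

    # Shuffle reproducibly, then cut at the training fraction (80% by default).
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)

    # splits["train"]["pos"] plays the role of the folder train/pos/.
    splits = {"train": defaultdict(list), "test": defaultdict(list)}
    for i, (label, text) in enumerate(rows):
        splits["train" if i < cut else "test"][label].append(text)
    return splits


if __name__ == "__main__":
    sample = ["pos\tgreat movie", "neg\tdull plot", "pos\tloved it",
              "neg\ttoo long", "pos\tfine acting"]
    result = split_examples(sample, expected_columns=2,
                            label_index=0, text_index=1)
    for split_name, by_label in result.items():
        print(split_name, {label: len(texts) for label, texts in by_label.items()})
```

Each list under a label would then be written out as one ".txt" file per example inside the folder named after that label.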

The script will generate a pickle file with a ".pkl" extension, named after your input file. Please rename it to indicate the label information, as discussed above, and place this pickle file in the same directory as your classification code, i.e. "classification_MMM_LLL.py".

'''To advanced users:''' one important step in data preprocessing is to convert words (strings) to integers. That is, we need to build a dictionary mapping words to their corresponding indices. Our dictionary is ordered by word frequency: the higher the frequency, the smaller the index. For example, you should expect to see words such as "the" and "a" among the 10 smallest indices (2, 3, 4, ...). Please also note that the indices 0 and 1 are intentionally not assigned to any word. The advantage of this ordering is that you can easily ignore common, uninformative words by, for example, considering only words with index > 20.
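A frequency-ordered dictionary of this kind can be sketched as below. This is a minimal illustration, not the script's actual code: the names <code>build_dictionary</code>, <code>encode</code>, and <code>index_above</code> are hypothetical, and lowercasing plus whitespace tokenization are assumptions made for brevity.

```python
from collections import Counter


def build_dictionary(texts):
    """Map each word to an index; more frequent words get smaller indices.

    Indices start at 2 because 0 and 1 are intentionally left unassigned.
    Tokenization (lowercase, split on whitespace) is an assumption.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() yields (word, count) pairs in descending frequency order.
    return {word: rank + 2 for rank, (word, _) in enumerate(counts.most_common())}


def encode(text, word_index, index_above=1):
    """Convert text to indices, keeping only words with index > index_above."""
    return [word_index[w] for w in text.lower().split()
            if w in word_index and word_index[w] > index_above]


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat", "the bird"]
    word_index = build_dictionary(corpus)
    print(word_index)
    # Drop the single most frequent word by requiring index > 2.
    print(encode("the dog sat", word_index, index_above=2))
```

Raising <code>index_above</code> (to 20, say) drops the most frequent words without needing a separate stop-word list.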

==Model Training/Prediction==

==General Guidelines for Tuning the Hyper-Parameters==

Yangzhang

