==The Classifier==
===Input (Features)===
The input (features) is currently the frequency with which each of X_NUMBER words appears in each document. The words are hand-selected. This is the naive bag-of-words approach.
Idea: Create a matrix whose first column identifies the file by its BibTeX entry and whose remaining columns are the words; the value at (file, word) is the frequency of that word in the file.
Then split the matrix into an array of row vectors and feed each vector into the RNN.
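The matrix construction described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the vocabulary, the document texts, the BibTeX keys, and the helper names (<code>tokenize</code>, <code>build_matrix</code>, <code>to_row_vectors</code>) are all hypothetical, and "frequency" is assumed here to mean a raw word count.

```python
import re
from collections import Counter

# Hypothetical hand-selected vocabulary (stands in for the X_NUMBER chosen words).
VOCAB = ["network", "learning", "market"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_matrix(docs, vocab=VOCAB):
    """Return one row per document: [key, count(word_1), ..., count(word_n)]."""
    rows = []
    for key, text in docs.items():
        counts = Counter(tokenize(text))
        # A Counter returns 0 for words that never occur in the document.
        rows.append([key] + [counts[word] for word in vocab])
    return rows


def to_row_vectors(matrix):
    """Drop the identifier column, leaving one numeric vector per document."""
    return [row[1:] for row in matrix]


# Hypothetical documents, keyed by an assumed BibTeX key.
docs = {
    "smith2010": "Learning a neural network. The network learns.",
    "lee2012": "Market entry and market learning.",
}

matrix = build_matrix(docs)
vectors = to_row_vectors(matrix)
```

Each entry of <code>vectors</code> is then one input to the classifier. Note that exact matching means "learns" is not counted as "learning"; whether to stem words is a separate design choice.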
This bag-of-words input does not seem to give very high accuracy with our LSTM RNN, so I will consider a word2vec approach.
==Reading resources==
http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf