===Text Processing===
There are two obvious classification methods for processing the textual descriptions. The first is a "Bag of Words" approach, which uses Term Frequency-Inverse Document Frequency (TF-IDF) weighting to do basic natural language processing and select words or phrases that have discriminant capability. The second is a Word2Vec approach, which uses a shallow two-layer neural network to reduce descriptions to a vector with high discriminant potential. (See "Memo for Evan" in E:\mcnair\Projects\Incubators for further detail.) We are going to try both approaches.
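As a rough illustration of the Bag of Words idea, the sketch below computes TF-IDF weights using only the Python standard library. It is not the project's implementation: the function names (<code>tokenize</code>, <code>tfidf</code>, <code>top_terms</code>), the sample descriptions, and the particular weighting (relative term frequency times <code>log(N / df)</code>, with no smoothing) are all assumptions made for the example. Library implementations typically use smoothed variants and would give different numbers.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Count, for each term, how many documents contain it at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weight_dict, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(weight_dict.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical descriptions, invented for illustration only.
descriptions = [
    "We are an incubator offering mentorship and office space to startups.",
    "Our accelerator offers seed funding and mentorship to startups.",
    "The university library offers study space to students.",
]

weights = tfidf(descriptions)
for i, w in enumerate(weights):
    print(i, top_terms(w))
```

Under this weighting, a term that occurs in every description (here "to") gets a weight of zero, while a term confined to one description (such as "incubator") gets the highest weight for its frequency. That is the sense in which TF-IDF selects words with discriminant capability. The Word2Vec approach needs a trained embedding model and is not sketched here.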
==Related Projects==
