Changes

LP Extractor Protocol (view source)

Revision as of 16:00, 22 March 2019


There are two possible classification methods for processing the text of target HTML pages. The first is a "Bag of Words" approach, which uses Term Frequency-Inverse Document Frequency (TF-IDF) to do basic natural language processing and select words or phrases that have discriminant capabilities. The second is a Word2Vec approach, which uses shallow two-layer neural networks to reduce descriptions to a vector with high discriminant potential. (See "Memo for Evan" in E:\mcnair\Projects\Incubators for further detail.)
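As a rough illustration of the "Bag of Words" approach, the sketch below scores each word on a page by its term frequency multiplied by the log of its inverse document frequency, so that words appearing on every page score zero and words specific to one page score highest. This is a minimal sketch using only the Python standard library, not the project's actual implementation: the function names (`tokenize`, `tf_idf`, `top_terms`) and the sample page texts are hypothetical, and the text is assumed to have already been extracted from the HTML.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document.

    score = (count of term in doc / tokens in doc) * ln(N / docs containing term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty pages
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(score, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical page texts, invented for illustration only.
pages = [
    "Our incubator offers mentorship and office space to startups",
    "Contact our office for directions and parking",
    "The incubator program accepts startups twice a year",
]
page_scores = tf_idf(pages)
for page, score in zip(pages, page_scores):
    print(top_terms(score), "<-", page)
```

Terms that survive with high scores are candidate features for a downstream classifier. A production version would likely also remove stop words and handle phrases (n-grams), which this sketch omits.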

=== Image Processing ===

=== HTML Tree Structure Analysis ===

LasyaRajan

