===NGram and LCS Matching===
Longest Common Subsequence (LCS) is a widely used fuzzy matching technique. The [http://en.wikipedia.org/wiki/Longest_common_subsequence Longest Common Subsequence page on Wikipedia] provides detailed background. However, LCS matching is extremely processor intensive: the general problem for an arbitrary number of sequences is NP-hard, and even pairwise matching of two datasets multiplies the cost across every pair of strings compared. To avoid long run times, LCS matching is done on only a small subset of strings that have met the NGram criteria detailed below.
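To illustrate why each pairwise comparison is costly, the following is a minimal sketch of the textbook dynamic-programming LCS length computation for two strings. It is not taken from this tool's implementation; the function name <code>lcs_length</code> is hypothetical and chosen only for illustration. The work grows with the product of the two string lengths, which is why restricting LCS to a pre-filtered subset of candidates pays off.

```python
def lcs_length(a: str, b: str) -> int:
    """Return the length of the longest common subsequence of a and b.

    Hypothetical helper for illustration only: the standard
    dynamic-programming approach, O(len(a) * len(b)) time,
    keeping just one previous row to save memory.
    """
    # prev[j] holds the LCS length of the processed prefix of a and b[:j]
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]  # LCS against an empty prefix of b is always 0
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the LCS of both shorter prefixes
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of dropping one character
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(lcs_length("AGGTAB", "GXTXAYB"))
```

A similarity score could then be derived from this length, for example by dividing by the length of the longer string, though how the score is normalized here is an assumption and not something this page specifies.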
NGrams are letter token strings. Source and reference strings are transformed to include only characters from one of the following numbered sets: