Changes

Normalizing Surnames

Revision as of 23:58, 16 June 2009

368 bytes added , 23:58, 16 June 2009


==Encodings==

Surnames can be represented in many different encodings. For comparison purposes, it is convenient to convert all surnames to a single standard encoding, such as the basic Latin alphabet.
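As a minimal sketch of what such a conversion might look like, the following Python function reduces a surname to the letters A to Z. The function name <code>to_latin</code> and the replacement table <code>EXTRA</code> are hypothetical and not part of this article; the table is illustrative rather than complete, and dropping spaces and punctuation is an assumption that may not suit every data set.

```python
import unicodedata

# Hypothetical replacement table (an assumption, not exhaustive): letters that
# Unicode decomposition does not split into a base letter plus a diacritic.
EXTRA = {"Æ": "AE", "Œ": "OE", "Ø": "O", "Ð": "D", "Þ": "TH", "Ł": "L"}


def to_latin(surname: str) -> str:
    """Reduce a surname to the 26 upper-case letters A to Z."""
    # Ignore case by upper-casing; Python expands "ß" to "SS" here.
    upper = surname.upper()
    # Expand ligatures and similar letters using the table above.
    expanded = "".join(EXTRA.get(ch, ch) for ch in upper)
    # NFKD separates base letters from their combining diacritics.
    decomposed = unicodedata.normalize("NFKD", expanded)
    # Keep only A to Z; diacritics, spaces and punctuation are dropped.
    return "".join(ch for ch in decomposed if "A" <= ch <= "Z")


if __name__ == "__main__":
    for name in ["Müller", "Nuñez", "Æbelø", "O'Brien"]:
        print(name, "->", to_latin(name))
```

Under these assumptions, <code>to_latin("Müller")</code> gives <code>MULLER</code> and <code>to_latin("Æbelø")</code> gives <code>AEBELO</code>.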

The Latin alphabet offers the advantage of simplicity. There are only 26 letters, A to Z, provided one ignores case (upper or lower), and there are no ligatures or diacritics. An alphabet of s symbols admits s<sup>n</sup> distinct n-grams, so an encoding with a large number of symbols produces data with a much higher number of dimensions, even for a small value of n. With 26 letters, for example, there are 26<sup>2</sup> = 676 possible bigrams and 26<sup>3</sup> = 17,576 possible trigrams.
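The growth in the number of possible n-grams can be illustrated with a short Python sketch. The helper names <code>ngrams</code> and <code>ngram_space_size</code> are hypothetical, and the 100-symbol alphabet is an arbitrary figure chosen only for comparison.

```python
def ngrams(text: str, n: int) -> list[str]:
    """Return the overlapping n-grams of text, in order."""
    # A text shorter than n yields an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_space_size(symbols: int, n: int) -> int:
    """Number of distinct n-grams over an alphabet of the given size."""
    return symbols ** n


if __name__ == "__main__":
    print(ngrams("SMITH", 2))
    # Compare the 26-letter alphabet with an assumed 100-symbol encoding.
    for n in (1, 2, 3):
        print(n, ngram_space_size(26, n), ngram_space_size(100, n))
```

For trigrams, the 26-letter alphabet gives 17,576 possible dimensions, while the assumed 100-symbol encoding gives 1,000,000.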

==Tussenvoegsel==

Anonymous user

imported>Ed
