*This page is part of a series on [[Classifying Names by Culture]].
 
Extracting features from surnames entails encoding the frequency of [http://en.wikipedia.org/wiki/Ngram n-grams] along with other features, such as string length. Recall that 1-grams, also called unigrams, are single letters or characters; 2-grams are called bigrams or digraphs; and 3-grams are called trigrams. In some applications, entire words, sentences, or other tokens are used as grams.
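
As a rough illustration of the encoding described above, the following minimal Python sketch counts character unigrams, bigrams, and trigrams in a surname and records its length. The function names <code>char_ngrams</code> and <code>surname_features</code>, the <code>__length__</code> key, the lowercasing step, and the use of raw counts rather than relative frequencies are all assumptions made for this example and do not come from this series.

```python
from collections import Counter


def char_ngrams(name, n):
    """Return the overlapping character n-grams of a name, in order."""
    s = name.lower()  # assumption: case is not treated as a feature
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def surname_features(name, max_n=3):
    """Map a surname to n-gram counts (n = 1..max_n) plus its length."""
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(char_ngrams(name, n))
    # Hypothetical key name for the string-length feature.
    features["__length__"] = len(name)
    return dict(features)


if __name__ == "__main__":
    for key, value in sorted(surname_features("Tanaka").items()):
        print(key, value)
```

Keeping every n-gram order in one dictionary makes it easy to pass the result to a sparse vectorizer later; a name shorter than <code>n</code> simply contributes no n-grams of that order.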