Changes

===Exact Matching Units===
The exact matching of units is performed both for the exception units and for the units of "well-formatted" records, that is, records that have comma-separated logical units. Where possible, the postcode is extracted first as a logical unit (to generate the PRS_POSTCODE field). Exact matching is case-insensitive and units are trimmed of leading and trailing spaces, but otherwise the match must be exact.

Units are matched from the bottom to the top, in order of precedence. That is, if the string is Unit1, Unit2, Unit3, Postcode, then Unit3 is matched with precedence over Unit2 and Unit1, and Unit2 with precedence over Unit1.

However, if multiple matches are made for a "Place" and only one match is made for the "Area", preference is given to a Place name that is different from the Area name. This is done because many Areas are also Places, and in this way more information from the source string is used. For example, if the string were Chelsea, London, and both Chelsea and London were recorded in the GNS data as Places but only London was recorded as an Area, then it would be most sensible to record Place=Chelsea, Area=London, rather than Place=London, Area=London. The same 'difference preference' is also applied in the rare cases where there are multiple matches on Area but only one on Place.
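The rules above (trimmed, case-insensitive exact matching; bottom-to-top precedence; and the 'difference preference') can be illustrated with a minimal sketch. It assumes the GNS data has been loaded into a hypothetical in-memory gazetteer mapping lower-cased names to the set of feature classes they carry ("Place", "Area"), and that the postcode unit has already been extracted from the record. The names `Gazetteer`, `exact_matches` and `choose_place_area` are illustrative only and do not come from the actual implementation.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical gazetteer: lower-cased name -> feature classes, e.g. {"Place", "Area"}.
# The real GNS data model is not described here; this structure is an assumption.
Gazetteer = Dict[str, Set[str]]


def exact_matches(units: List[str], gazetteer: Gazetteer, feature: str) -> List[str]:
    """Return the units that exactly match `feature`, highest precedence first.

    Units are scanned bottom to top (last comma-separated unit first); each unit
    is trimmed of leading/trailing spaces and compared case-insensitively.
    """
    matches = []
    for unit in reversed(units):
        trimmed = unit.strip()
        if trimmed and feature in gazetteer.get(trimmed.lower(), set()):
            matches.append(trimmed)
    return matches


def choose_place_area(record: str, gazetteer: Gazetteer) -> Tuple[Optional[str], Optional[str]]:
    """Pick (Place, Area) for a comma-separated record with the postcode already removed."""
    units = record.split(",")
    places = exact_matches(units, gazetteer, "Place")
    areas = exact_matches(units, gazetteer, "Area")

    # Default: the highest-precedence (bottom-most) match for each feature class.
    place = places[0] if places else None
    area = areas[0] if areas else None

    # 'Difference preference': with several Place matches and a single Area match,
    # prefer the highest-precedence Place whose name differs from the Area (and vice versa).
    if len(places) > 1 and len(areas) == 1:
        place = next((p for p in places if p.lower() != area.lower()), place)
    elif len(areas) > 1 and len(places) == 1:
        area = next((a for a in areas if a.lower() != place.lower()), area)
    return place, area


if __name__ == "__main__":
    gns = {"chelsea": {"Place"}, "london": {"Place", "Area"}}
    print(choose_place_area("Chelsea, London", gns))
```

With the toy gazetteer above, "Chelsea, London" yields Place=Chelsea, Area=London rather than Place=London, Area=London, mirroring the example in the text. Returning the trimmed original unit (rather than the lower-cased key) is a design choice of this sketch so that the source casing is preserved.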
===Token Matching===
#String2 (token set length=1, third set)
#String1 (token set length=1, fourth set)
 
As with the Exact Matching, the 'difference preference' for Areas and Places is invoked.
===NGram and LCS Matching===