Changes

Jump to navigation Jump to search
no edit summary
Postal codes, known as ZIP codes in the U.S., vary by national jurisdiction and for historical reasons. The following postal codes formats are posted for reference, as are some simple regular expressions that should safely match most variants:
*Australia: ([http://en.wikipedia.org/wiki/Postcodes_in_australia Sourced from Wikipedia]): NNNN where N is a numeric. Australian postcodes should appear at the end of addresses, and are frequently preceded by the acronym for the territory/state (specifically: NSW, ACT, VIC, QLD, SA, WA, TAS, and NT). In the patent data variations include: NNNN, AU-NNNN, XXX NNNN, Xxx. NNNN X.X.X. NNNN, XXXNNNN, where XXX indicate the two or three characters of the acronym.
**Simple Regex: <tt>(NSW|Nsw|ACT|Act|VIC|Vic|QLD|Qld|SA|Sa|WA|Wa|TAS|Tas|NT|Nt|Au|AU)?(\w\.\w\.\w\.)?\.?\s?-?\s?\d{4,4}</tt>
*United Kingdom ([http://en.wikipedia.org/wiki/UK_postcodes Sourced from Wikipedia]): A9 9AA, A99 9AA, A9A 9AA, AA9 9AA, AA99 9AA, AA9A 9AA. Simple Regex: <tt>([A-Z]{1,2}[0-9]{1,2}[A-Z]{0,1}\s[0-9][A-Z]{2,2})</tt>
*France ([http://en.wikipedia.org/wiki/Postal_codes_in_France Sourced from Wikipedia]): NNNMM or NNMMM where NN and NNN are numerics indicating the préfectures and sous-préfectures, respectively, and MMM are other numerics. However the following formats also appear frequently in the patent data: F NNNNN, F-NNNNN, F - NNNNN, F- NN NNN, FNNNNN, F-NN NNN, F-NN, FR - NNNNN, FR NNNNN, FR-NNNNN, (NNNNN), (NN), - NNNNN, -NNNNN-, -NN, NN, NN - N-NNNNN, NNNN, NN NNN, NN.NNN, NNN/N, NN., F. NNNNN, F.NNNNN, "FRNN,NNN". French postal codes most often, but not exclusively, occur at the start of the address string. If there are fewer than 5 digits, trailing zeros should be added.
*Spain ([http://en.wikipedia.org/wiki/List_of_postal_codes_in_Spain Sourced from Wikipedia]): Post 1976 Spanish postcodes are five digits of the format NNMMM, where NN indicates the province (01-52) or a reserved code (e.g. 80 for P.O. boxes). In the patent data Spansish postcodes are comparatively well behaved, with the following standard variants appearing: NNNNN, NNN NN, NNNN, NN NNNNN, NN- NNNNN, -NNNNN, NNNNN-, "NN, NNNNN", NNN, NN, N NNNNN-, NN-NN, NN-NN NNNNN, NNNNN-IBI, E-NNNNN, E-NNNN, E - NNNNN, E--NNNNN, ES-NNNNN.
**Simple Regex: <tt>(E|ES|)\d{0,2},?\s?-{0,2}\s?\d{2,5}-?(IBI|)</tt>
*Switzerland ([http://en.wikipedia.org/wiki/Postal_codes_in_Switzerland_and_Liechtenstein Sourced from Wikipedia]): Swiss (and Lictenstein) postcodes are hierarchical four-digit numbers of the form District+Area+Route+PONumber, where districts are numbered West to East (would you expect less from the Swiss?). In the patent data Swiss postcodes are comparatively immaculately behaved with the following formats appearing: NNNN, NNNN-, CH-NNNN, CH - NNNN, CH NNNN, CHNNN, CH- NNNN, CHNNN. Though the "H" may sometimes be lowercase.Test
**Simple Regex: <tt>(CH|Ch|)\s?-?\s?\d{3,4}-?</tt>
*Australia: ([http://en.wikipedia.org/wiki/Postcodes_in_australia Sourced from Wikipedia]): NNNN where N is a numeric. Australian postcodes should appear at the end of addresses, and are frequently preceded by the acronym for the territory/state (specifically: NSW, ACT, VIC, QLD, SA, WA, TAS, and NT). In the patent data variations include: NNNN, AU-NNNN, XXX NNNN, Xxx. NNNN X.X.X. NNNN, XXXNNNN, where XXX indicate the two or three characters of the acronym.
**Simple Regex: <tt>(NSW|Nsw|ACT|Act|VIC|Vic|QLD|Qld|SA|Sa|WA|Wa|TAS|Tas|NT|Nt|Au|AU)?(\w\.\w\.\w\.)?\.?\s?-?\s?\d{4,4}</tt>
The Match::PostalCodes.pm perl module provides a method to extract a postcode from a text string for a given ISO3166 code. The simple regular expressions listed above are not used verbatim, as more sophisticed techniques can be employed on per country basis.
Test
==The Matching Process==
Anonymous user

Navigation menu