Changes

Jump to navigation Jump to search
:'''1. Introduction'''
::*Five features (addrline1, addrline2, city, country, postcode) in the table contain location address information.
::*Features addrline1, addrline2 and city are not cleaned. They have city, country and postcode information.
::*The object of this project is to extract city, country and postcode information from the three features above.
::*'\s{2,} CityName [,] State Postcode'
CASE WHEN addrline1 ~* '\s{2,}\w{1,}\s{0,}\w{0,}\s{0,}[,]\s{0,}\w{1,}\s{0,}\w{0,}\s{0,}\d{5}[-]\d{4}' THEN LTRIM(RTRIM(REPLACE(SUBSTRING(SUBSTRING(addrline1, '\s{2,}\w{1,}\s{0,}\w{0,}\s{0,}[,]\s{0,}\w{1,}\s{0,}\w{0,}\s{0,}\d{5} [-]\d{4}'),'.*[,]'),',','')))   800 CHRYSLER DR. EAST AUBURN HILLS, MICHIGAN 48326-2757 |AUBURN HILLS 550 MADISON AVENUE NEW YORK, NEW YORK 10022-3201 |NEW YORK P.O. BOX 15439 WILMINGTON, DE 19850-5439 |WILMINGTON
:::''Some noise exists (just a little).''
::*'[,]\s{1,} CityName [,] State Postcode'
CASE WHEN addrline1 ~* '\s{2,}\w{1,}\s{0,}\w{0,}\s{0,}[,]\s{0,}\w{1,}\s{0,}\w{0,}\s{0,}\d{5}[-]\d{4}' THEN LTRIM(RTRIM(REPLACE(SUBSTRING(SUBSTRING(addrline1, '\s{2,}\w{1,}\s{0,}\w{0,}\s{0,}[,]\s{0,}\w{1,}\s{0,}\w{0,}\s{0,}\d{5}[-]\d{4}'),'.*[,]'),',','')))  920 DISC DRIVE, SCOTTS VALLEY, CA 95067-0360 |CASCOTTS VALLEY 550 MADISON AVENUE, NEW YORK, NY 10022-3201 |NYNEW YORK BALLSTON TOWER ONE 800 NORTH QUINCY STREET, ARLINGTON, VA 22217-5660 |VAARLINGTON
::* 'CityName [,] State Postcode' (no leading spaces)
WHEN addrline1 ~* '^\w{1,}\s{0,}\w{0,}\s{0,}[,]\s{0,}\w{1,}\s{0,}\w{0,}\s{0,}\d{5}[-]\d{4}' THEN REPLACE(SUBSTRING(SUBSTRING(addrline1, '^\w{1,}\s{0,}\w{0,}\s{0,}[,]\s{0,}\w{1,}\s{0,}\w{0,}\s{0,}\d{5}[-]\d{4}'), '.*[,]'),',', '')  PHILADELPHIA, PA 19104-3147 |PAPHILADELPHIA ROCHESTER, NY 14650-2201 |NYROCHESTER
::* 'CityName State(abbreviation) Postcode' (no leading spaces)
WHEN addrline1 ~* '^\w{1,}\s{0,}\w{0,}\s{0,}\w{2}\s{0,}\d{5}[-]\d{4}' THEN regexp_replace(SUBSTRING(addrline1, '^\w{1,}\s{0,1}\w{0,}\s{1,}\w{2}\s{0,}\d{5}[-]\d{4}'), '\w{2}\s\d{5}[-]\d{4}', '')  TARRYTOWN NY 10591-6706 |NYTARRYTOWN
::* 'CityName State (full name) Postcode' (no leading spaces)
:::''This pattern can't be identified because of much noise.''
::* 'CityName POSTCODEPostcode' (no leading spaces)
LITTLE ELM 75068-3787

Navigation menu