Changes

Jump to navigation Jump to search
Some patterns can be used to extract state information.
*a. '[,] State Postcode'
The state and post code are always together, separated by a space. We can extract state information with regular expression
BROOKINGS, SOUTH DAKOTA 57006-0128 | SOUTH DAKOTA
*b. '\s State(abbreviation) Postcode'
'(^|\s)\w{2}\s{1}\d{5}[-]\d{4}'
WAUKEGAN IL 60085-2195 |IL
*c. 'D.C.'
'D[.]C[.]\s\d{5}-\d{4}'
Z:/PatentAddress/
*a. '\s{2,} CityName [,] State Postcode' SQL code:
CASE WHEN addrline1 ~* '\s{2,}\w{1,}\s{0,}\w{0,}\s{0,}[,]\s{0,}\w{1,}\s{0,}\w{0,}\s{0,}\d{5}[-]\d{4}'
THEN LTRIM(RTRIM(REPLACE(SUBSTRING(SUBSTRING(addrline1, '\s{2,}\w{1,}\s{0,}\w{0,}\s{0,}[,]\s{0,}\w{1,}\s{0,}\w{0,}\s{0,}\d{5} [-]\d{4}'),'.*[,]'),',','')))
Examples:
800 CHRYSLER DR. EAST AUBURN HILLS, MICHIGAN 48326-2757 |AUBURN HILLS
550 MADISON AVENUE NEW YORK, NEW YORK 10022-3201 |NEW YORK
P.O. BOX 15439 WILMINGTON, DE 19850-5439 |WILMINGTON
:::''Some noise exists (just a little).''
1313 N. MARKET STREET HERCULES PLAZAWILMINGTON, DE 19894-0001
::*b. '[,]\s{1,} CityName [,] State Postcode'
SQL code:
CASE WHEN addrline1 ~* '\s{2,}\w{1,}\s{0,}\w{0,}\s{0,}[,]\s{0,}\w{1,}\s{0,}\w{0,}\s{0,}\d{5}[-]\d{4}'
THEN LTRIM(RTRIM(REPLACE(SUBSTRING(SUBSTRING(addrline1, '\s{2,}\w{1,}\s{0,}\w{0,}\s{0,}[,]\s{0,}\w{1,}\s{0,}\w{0,}\s{0,}\d{5}[-]\d{4}'),'.*[,]'),',','')))
Example:
920 DISC DRIVE, SCOTTS VALLEY, CA 95067-0360 |SCOTTS VALLEY
550 MADISON AVENUE, NEW YORK, NY 10022-3201 |NEW YORK
BALLSTON TOWER ONE 800 NORTH QUINCY STREET, ARLINGTON, VA 22217-5660 |ARLINGTON
::* c. 'CityName [,] State Postcode' (no leading spaces)
SQL code:
WHEN addrline1 ~* '^\w{1,}\s{0,}\w{0,}\s{0,}[,]\s{0,}\w{1,}\s{0,}\w{0,}\s{0,}\d{5}[-]\d{4}'
THEN REPLACE(SUBSTRING(SUBSTRING(addrline1, '^\w{1,}\s{0,}\w{0,}\s{0,}[,]\s{0,}\w{1,}\s{0,}\w{0,}\s{0,}\d{5}[-]\d{4}'), '.*[,]'),',', '')
Examples:
PHILADELPHIA, PA 19104-3147 |PHILADELPHIA
ROCHESTER, NY 14650-2201 |ROCHESTER
::* d. 'CityName State(abbreviation) Postcode' (no leading spaces)
SQL code:
WHEN addrline1 ~* '^\w{1,}\s{0,}\w{0,}\s{0,}\w{2}\s{0,}\d{5}[-]\d{4}'
THEN regexp_replace(SUBSTRING(addrline1, '^\w{1,}\s{0,1}\w{0,}\s{1,}\w{2}\s{0,}\d{5}[-]\d{4}'), '\w{2}\s\d{5}[-]\d{4}', '')
Examples:
TARRYTOWN NY 10591-6706 |TARRYTOWN
::* e. 'CityName State (full name) Postcode' (no leading spaces)
NEW YORK NEW YORK 10022-3201
:::''This pattern can't be identified because of much noise.''
::* f. 'CityName Postcode' (no leading spaces)
LITTLE ELM 75068-3787
OAK RIDGE 37831-6498
:::''This pattern can't be identified because of the noise:''
MASSACHUSETTS 02780-7319 ('State Postcode')
::*Other Noise:
BATON, ROUGE, LA 70809-4562 (the city name is separated by a comma)
ST. PAUL, MIN 55133-3427 (city name contains a dot)
BOX 87703CHICAGO, IL 60680-0703 (no space between street and city name :(
::SQL code is in:
E:/McNair/Projects/PatentAddress/RxCity.sql
:'''5. Output (Tables)'''::*ptoassigneend_allus
:::This table contains all the U.S. patents extracted from ptoassigneend table. The rule to generate this table is 'country = 'UNITED STATES' ' or any of the features contains U.S. postcode.

Navigation menu