:'''1. Introduction'''
::*Five features (addrline1, addrline2, city, country, postcode) in the table contain address location information.
::*The features addrline1, addrline2 and city are not cleaned. They contain city, country and postcode information.
::*The objective of this project is to extract city, country and postcode information from the three features above.
:: There are some patterns that can be used to extract state information.
::*'[,] State Postcode'
:::The state and postcode are always together, separated by a space, so the state can be extracted with a regular expression:
'([,]|[.])\s\w{2,}\s{0,}\w{0,}\s{1,}\d{5}[-]\d{4}'
BROOKINGS, SOUTH DAKOTA 57006-0128 | SOUTH DAKOTA
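:::A minimal Python sketch of how this regular expression could be applied. The regular expression is the one quoted above; the helper name <code>extract_state_full</code> and the <code>POSTCODE_TAIL</code> pattern are assumptions made for illustration and are not the project's SQL code.

```python
import re

# Regular expression quoted from the text above.
STATE_FULL = re.compile(r'([,]|[.])\s\w{2,}\s{0,}\w{0,}\s{1,}\d{5}[-]\d{4}')
# Hypothetical helper pattern: the ZIP+4 postcode at the end of a match.
POSTCODE_TAIL = re.compile(r'\s+\d{5}-\d{4}$')

def extract_state_full(address):
    """Return the state text in front of a ZIP+4 postcode, or None."""
    match = STATE_FULL.search(address)
    if match is None:
        return None
    # Drop the leading ',' or '.' and the trailing postcode.
    return POSTCODE_TAIL.sub('', match.group(0)[1:]).strip()

print(extract_state_full('BROOKINGS, SOUTH DAKOTA 57006-0128'))  # SOUTH DAKOTA
```

:::Because <code>\w{0,}</code> is optional, the same expression also matches a two-letter abbreviation that follows a comma, so it is not limited to full state names.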
::*'\s State(abbreviation) Postcode'
'(^|\s)\w{2}\s{1}\d{5}[-]\d{4}'
:::For example:
NEW YORK NY 10022-3201 |NY
WAUKEGAN IL 60085-2195 |IL
::*'D.C.'
'D[.]C[.]\s\d{5}-\d{4}'
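:::The three state patterns above could be tried one after another, as in the Python sketch below. The regular expressions are quoted from the text. The order in which they are tried, the labels and the function name <code>extract_state</code> are assumptions of this sketch, not a description of the project's SQL code.

```python
import re

# Patterns quoted from the text; the order (most specific first) is an assumption.
STATE_PATTERNS = [
    ('dc', re.compile(r'D[.]C[.]\s\d{5}-\d{4}')),
    ('abbreviation', re.compile(r'(^|\s)\w{2}\s{1}\d{5}[-]\d{4}')),
    ('full_name', re.compile(r'([,]|[.])\s\w{2,}\s{0,}\w{0,}\s{1,}\d{5}[-]\d{4}')),
]
# Hypothetical helper pattern: the ZIP+4 postcode at the end of a match.
POSTCODE_TAIL = re.compile(r'\s*\d{5}-\d{4}$')

def extract_state(address):
    """Return (label, state) for the first pattern that matches, else None."""
    for label, pattern in STATE_PATTERNS:
        match = pattern.search(address)
        if match:
            # Remove the postcode, then any leading separator and whitespace.
            state = POSTCODE_TAIL.sub('', match.group(0)).lstrip(',.').strip()
            return label, state
    return None

print(extract_state('NEW YORK NY 10022-3201'))              # ('abbreviation', 'NY')
print(extract_state('BROOKINGS, SOUTH DAKOTA 57006-0128'))  # ('full_name', 'SOUTH DAKOTA')
```

:::Trying the abbreviation pattern before the full-name pattern matters, because the full-name expression also accepts a two-letter word after a comma.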
::The extracted state records are stored in the table ptoassigneend_missus_final.
::SQL code is in:
Z:/PatentAddress/
::*'\s{2,} CityName [,] State Postcode'
800 CHRYSLER DR. EAST AUBURN HILLS, MICHIGAN 48326-2757
P.O. BOX 15439 WILMINGTON, DE 19850-5439
:::Some noise exists (just a little):
1313 N. MARKET STREET HERCULES PLAZAWILMINGTON, DE 19894-0001
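:::A Python sketch of how the '\s{2,} CityName [,] State Postcode' pattern could be used to pull out the city. It assumes that the raw data separates the street part from the city with two or more spaces, as the pattern implies, so the sample strings below re-insert a double space before the city. The names <code>TAIL</code> and <code>extract_city_after_gap</code> are hypothetical.

```python
import re

# Hypothetical pattern: ", State Postcode" at the end of the string.
TAIL = re.compile(r',\s*(?P<state>[^,]+?)\s+(?P<postcode>\d{5}-\d{4})\s*$')

def extract_city_after_gap(address):
    """Return (city, state, postcode) when a wide gap precedes the city, else None."""
    match = TAIL.search(address)
    if match is None:
        return None
    # Everything before the comma; the city is the part after the last wide gap.
    parts = re.split(r'\s{2,}', address[:match.start()])
    if len(parts) < 2:
        return None  # no run of two or more spaces, so this pattern does not apply
    return parts[-1].strip(), match.group('state'), match.group('postcode')

print(extract_city_after_gap('800 CHRYSLER DR. EAST  AUBURN HILLS, MICHIGAN 48326-2757'))
# ('AUBURN HILLS', 'MICHIGAN', '48326-2757')
print(extract_city_after_gap('P.O. BOX 15439  WILMINGTON, DE 19850-5439'))
# ('WILMINGTON', 'DE', '19850-5439')
```

:::When the street and city are fused without a space, as in the noisy record above, the returned city still contains the street fragment and needs separate cleaning.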
::*'[,]\s{1,} CityName [,] State Postcode'
920 DISC DRIVE, SCOTTS VALLEY, CA 95067-0360 |CA
550 MADISON AVENUE, NEW YORK, NY 10022-3201 |NY
BALLSTON TOWER ONE 800 NORTH QUINCY STREET, ARLINGTON, VA 22217-5660 |VA
::*'CityName [,] State Postcode' (no leading spaces)
PHILADELPHIA, PA 19104-3147 |PA
ROCHESTER, NY 14650-2201 |NY
::*'CityName State(abbreviation) Postcode' (no leading spaces)
TARRYTOWN NY 10591-6706 |NY
::*'CityName State(full name) Postcode' (no leading spaces)
NEW YORK NEW YORK 10022-3201
This pattern can't be identified because there is too much noise.
::*'CityName Postcode' (no leading spaces)
OAK RIDGE 37831-6498
This pattern can't be identified because of noise such as:
MASSACHUSETTS 02780-7319 ('State Postcode')
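:::A Python sketch of the two "no leading spaces" patterns that can be identified, with the state given after a comma or as a two-letter abbreviation. Both regular expressions and the function name <code>extract_city_state</code> are assumptions made for illustration. The sketch returns nothing for the two patterns described as unidentifiable, because without a comma or an abbreviation it cannot tell a city name from a state name.

```python
import re

# Hypothetical regexes for 'CityName [,] State Postcode' and
# 'CityName State(abbreviation) Postcode', both without leading spaces.
CITY_COMMA_STATE = re.compile(
    r'^(?P<city>[^,\s][^,]*?)\s*,\s*(?P<state>[A-Z][A-Z. ]*?)\s+(?P<postcode>\d{5}-\d{4})$')
CITY_ABBREVIATION = re.compile(
    r'^(?P<city>\S.*?)\s+(?P<state>[A-Z]{2})\s+(?P<postcode>\d{5}-\d{4})$')

def extract_city_state(address):
    """Return (city, state) for the first pattern that matches, else None."""
    for pattern in (CITY_COMMA_STATE, CITY_ABBREVIATION):
        match = pattern.match(address)
        if match:
            return match.group('city'), match.group('state')
    return None

print(extract_city_state('PHILADELPHIA, PA 19104-3147'))   # ('PHILADELPHIA', 'PA')
print(extract_city_state('TARRYTOWN NY 10591-6706'))       # ('TARRYTOWN', 'NY')
print(extract_city_state('NEW YORK NEW YORK 10022-3201'))  # None
print(extract_city_state('OAK RIDGE 37831-6498'))          # None
```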
::*Other noise:
800 NORTH QUINCY STREETARLINGTON, VA 22217-5660 (street name instead of city name)
BATON, ROUGE, LA 70809-4562 (the city name is separated by a comma)
ST. PAUL, MIN 55133-3427 (special: name contains a dot)
IRVINE CA 92713-9658 (no comma between city and state)
QUINCY STREETARLINGTON, VA 22217-5660 (no space between street and city name)
BOX 87703CHICAGO, IL 60680-0703 (no space between street and city name)
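:::Some of the noise listed above could be flagged automatically after extraction. The Python sketch below is only a heuristic under stated assumptions: the keyword list <code>STREET_WORDS</code> and the function <code>noise_flags</code> are hypothetical, and a fused name without a digit or a listed keyword would pass unnoticed.

```python
import re

# Hypothetical keyword list; a real list would need to be much longer.
STREET_WORDS = ('STREET', 'DRIVE', 'AVENUE', 'BOX')

def noise_flags(city):
    """Return a list of reasons why a candidate city name looks noisy."""
    flags = []
    if re.search(r'\d', city):
        flags.append('contains digit')
    if any(word in city for word in STREET_WORDS):
        flags.append('contains street word')
    return flags

for candidate in ('BOX 87703CHICAGO', 'QUINCY STREETARLINGTON', 'ARLINGTON'):
    print(candidate, noise_flags(candidate))
```

:::The keyword check is a plain substring test, so it can also flag genuine city names that happen to contain one of the keywords.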
:::SQL code is in:
