=== Clean Address: more patterns ===
 
After the clean records have been identified, the remaining records are cleaned as follows.
====Clean Postcode====
Identifying a five-digit postcode on its own is risky, because P.O. BOX numbers, SUITE numbers, and similar fields can also be five-digit numbers. One option is to identify the state and the postcode together with the following SQL code (using 'addrline1' as an example):
 SELECT addrline1
Example output (addrline1 | extracted postcode):
 767 FIFTH AVE., NEW YORK, NY 10153 | 10153
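As a rough illustration of why matching the state and postcode together is safer than matching any five-digit number, here is a minimal Python sketch. It is not the SQL used in the project. The regular expressions, the function names, and the P.O. BOX address are hypothetical. The sketch assumes a two-letter state abbreviation, then whitespace, then a five-digit postcode with an optional ZIP+4 suffix.

```python
import re

# Hypothetical pattern: two-letter state abbreviation, whitespace, five-digit
# postcode, optionally followed by a ZIP+4 extension such as "-1234".
STATE_POSTCODE = re.compile(r"\b([A-Z]{2})\s+(\d{5})(?:-\d{4})?\b")
# Naive pattern: any standalone run of exactly five digits.
BARE_POSTCODE = re.compile(r"\b(\d{5})\b")


def extract_state_postcode(addr):
    """Return (state, postcode) from the last state+postcode match, or None."""
    matches = STATE_POSTCODE.findall(addr.upper())
    return matches[-1] if matches else None


def extract_bare_postcode(addr):
    """Naive approach: return the first five-digit run anywhere in the string."""
    m = BARE_POSTCODE.search(addr)
    return m.group(1) if m else None


print(extract_state_postcode("767 FIFTH AVE., NEW YORK, NY 10153"))  # → ('NY', '10153')

# Hypothetical address in which the P.O. box number is also five digits.
addr = "P.O. BOX 12345, AUSTIN, TX 78701"
print(extract_bare_postcode(addr))   # → 12345 (wrong: this is the box number)
print(extract_state_postcode(addr))  # → ('TX', '78701')
```

Anchoring the match on the state abbreviation makes the box number invisible to the second pattern, because "BOX" is not a two-letter token directly followed by whitespace and digits.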
Even after the P.O. BOX # and SUITE # cases are excluded, some noise remains.
The details and SQL function are in E:\McNair\Projects\PatentAddress\Cleang_Step2.sql
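A minimal Python sketch of excluding P.O. BOX # and SUITE # segments before searching for a postcode. The patterns, the function name strip_box_and_suite, and the sample addresses are hypothetical and cover only a few spellings. The last example shows one kind of noise that survives: a five-digit street number.

```python
import re

# Hypothetical patterns; real data likely has more spellings ("PO BOX", "STE", "#").
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*BOX\s*#?\s*\d+", re.IGNORECASE)
SUITE = re.compile(r"\bSUITE\b\s*#?\s*\w+", re.IGNORECASE)


def strip_box_and_suite(addr):
    """Remove P.O. BOX and SUITE segments, then tidy leftover commas and spaces."""
    cleaned = PO_BOX.sub("", addr)
    cleaned = SUITE.sub("", cleaned)
    # Collapse runs of commas left behind by removed fields into a single ", ".
    cleaned = re.sub(r"\s*,\s*(,\s*)+", ", ", cleaned)
    # Collapse repeated whitespace.
    cleaned = re.sub(r"\s{2,}", " ", cleaned)
    return cleaned.strip(" ,")


print(strip_box_and_suite("P.O. BOX 12345, AUSTIN, TX 78701"))
# → AUSTIN, TX 78701
print(strip_box_and_suite("100 MAIN ST, SUITE 20000, HOUSTON, TX 77002"))
# → 100 MAIN ST, HOUSTON, TX 77002
print(strip_box_and_suite("12000 KATY FWY, SUITE 300, HOUSTON, TX 77079"))
# → 12000 KATY FWY, HOUSTON, TX 77079 (the street number is still five digits)
```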
====Clean city====
 
The 'city' field can be cleaned using the following patterns.
*Pattern 1: 'city' has the form 'city name, state ID'
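Pattern 1 could be applied with something like the following Python sketch. It assumes the state ID is a two-letter abbreviation after the last comma. The function name split_city_state and the regular expression are hypothetical and are not taken from the project's SQL.

```python
import re

# Hypothetical Pattern 1 matcher: "city name, state ID", e.g. "NEW YORK, NY".
# The lazy city group plus the anchored two-letter state means the split
# happens at the last comma.
CITY_STATE = re.compile(r"^\s*(.+?)\s*,\s*([A-Za-z]{2})\s*$")


def split_city_state(city):
    """Return (city_name, state_id) if 'city' matches Pattern 1, else None."""
    m = CITY_STATE.match(city)
    if m is None:
        return None
    return m.group(1).upper(), m.group(2).upper()


print(split_city_state("NEW YORK, NY"))  # → ('NEW YORK', 'NY')
print(split_city_state("Houston,tx"))    # → ('HOUSTON', 'TX')
print(split_city_state("NEW YORK"))      # → None
```

Values that do not match return None, so they can be passed on to whatever pattern is tried next.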
