Changes

Jump to navigation Jump to search
====Extract Address Information====
 
So far these instructions apply to:
*Records identified as US (by zip code or country feature) ~3.5m records
*Those that well-formatted zip code (5-4) in addrline1, addrline2, and city (not postcode, etc) ~5,000 records
 
To get the record identified as US, we make the table:
*'''ptoassigneend_allus'''
 
This table contains all the U.S. patents extracted from ptoassigneend table. The rule to generate this table is 'country = 'UNITED STATES' ' if any of the features contains U.S. postcode.
 
Noise exists because of postcode errors.
 
The ptoassigneend_allus table may miss some U.S. patents which lose postcode records.
 
SQL code:
CREATE TABLE ptoassigneend_allus AS
SELECT * FROM ptoassigneend
WHERE city ~* '.*\d{5}[-]\d{4}.*' OR addrline1 ~* '.*\d{5}[-]\d{4}.*' OR
addrline2 ~* '.*\d{5}[-]\d{4}.*' OR postcode ~* '.*\d{5}[-]\d{4}.*' OR
country ~* 'UNITED STATES';
SELECT 3572682
 
Table "public.ptoassigneend_allus"
Column | Type | Modifiers
reelno | integer |
frameno | integer |
name | character varying(500) |
addrline1 | character varying(500) |
addrline2 | character varying(500) |
city | character varying(500) |
state | character varying(500) |
country | character varying(500) |
postcode | character varying(80) |
=====Postcode=====
THEN LTRIM(RTRIM(REPLACE(SUBSTRING(SUBSTRING(addrline1, '\s{2,}\w{1,}\s{0,}\w{0,}\s{0,}[,]\s{0,}\w{1,}\s{0,}\w{0,}\s{0,}\d{5} [-]\d{4}'),'.*[,]'),',','')))
Examples(in any of addrline1, addrline2, and city): source |extracted ------------------------------------------------------------------------------------------------|---------------------
800 CHRYSLER DR. EAST AUBURN HILLS, MICHIGAN 48326-2757 |AUBURN HILLS
550 MADISON AVENUE NEW YORK, NEW YORK 10022-3201 |NEW YORK
Example:
source |extracted
---------------------------------------------------------------------------|---------------------
920 DISC DRIVE, SCOTTS VALLEY, CA 95067-0360 |SCOTTS VALLEY
550 MADISON AVENUE, NEW YORK, NY 10022-3201 |NEW YORK
Examples:
BATON, ROUGE, LA 70809-4562 ('BATON ROUGH' is separated by a comma) ST. PAUL, MIN 55133-3427 ('ST. PAUL' contains a dot)
QUINCY STREETARLINGTON, VA 22217-5660 (no space between street and city name :(
E:/McNair/Projects/PatentAddress/Functions.sql
=====Output (Tables)===== *'''ptoassigneend_allus''' This table contains all the U.S. patents extracted from ptoassigneend table. The rule to generate this table is 'country = 'UNITED STATES' ' or any of the features contains U.S. postcode. Noise exists because of postcode errors.  The ptoassigneend_allus table may miss some U.S. patents which lose postcode records. SQL code: CREATE TABLE ptoassigneend_allus AS SELECT * FROM ptoassigneend WHERE city ~* '.*\d{5}[-]\d{4}.*' OR addrline1 ~* '.*\d{5}[-]\d{4}.*' OR addrline2 ~* '.*\d{5}[-]\d{4}.*' OR postcode ~* '.*\d{5}[-]\d{4}.*' OR country ~* 'UNITED STATES'; SELECT 3572682  Table "public.ptoassigneend_allus" Column | Type | Modifiers reelno | integer | frameno | integer | name | character varying(500) | addrline1 | character varying(500) | addrline2 | character varying(500) | city | character varying(500) | state | character varying(500) | country | character varying(500) | postcode | character varying(80) |
*'''ptoassigneend_us_extracted'''

Navigation menu