Changes

Jump to navigation Jump to search
::*By now, we only focus on cleaning American patents.
:'''2. Tables'''::*ptoassigneend_allus :::This table contains all the U.S. patents extracted from ptoassigneend table. The rule to extract this table is 'country = 'UNITED STATES' ' or any of the features contains U.S. postcode. :::The SQL code is:  CREATE TABLE ptoassigneend_allus AS SELECT * FROM ptoassigneend WHERE city ~* '.*\d{5}[-]\d{4}.*' OR addrline1 ~* '.*\d{5}[-]\d{4}.*' OR addrline2 ~* '.*\d{5}[-]\d{4}.*' OR postcode ~* '.*\d{5}[-]\d{4}.*' OR country ~* 'UNITED STATES'; SELECT 3572682  Table "public.ptoassigneend_allus" Column | Type | Modifiers reelno | integer | frameno | integer | name | character varying(500) | addrline1 | character varying(500) | addrline2 | character varying(500) | city | character varying(500) | state | character varying(500) | country | character varying(500) | postcode | character varying(80) | :::The ptoassigneend_allus table may miss some U.S. patents. ::*ptoassigneend_missus_final :::State and postcode information are extracted from addrline1, addrline2 and city columns and are stored in ptoassigneend_missus_final table. See section 3 and 4. :::This table is a subset of ptoassigneend_allus table.   Table "public.ptoassigneend_missus_final" Column | Type | Modifiers reelno | integer | frameno | integer | name | character varying(500) | addrline1 | character varying(500) | addrline2 | character varying(500) | city | character varying(500) | state | character varying(500) | country | character varying(500) | postcode | character varying(80) | postcode_city | text | postcode_addr1 | text | postcode_addr2 | text | state_city | text | state_addr1 | text | state_addr2 | text | :::postcode_city is the postcode extracted from 'city'. postcode_addr1 is the postcode extracted from 'addrline1'. postcode_addr2 is the postcode extracted from 'addrline2'.  :::state_city is the state name extracted from 'city'. state_addr1 is the state name extracted from 'addrline1'. state_addr2 is the state name extracted from 'addrline2'. ::*ptoassigneend_city :::City information are extracted from addrline1, addrline2 and city columns and are stored in ptoassigneend_city table. See section 5. :::This table is a subset of ptoassigneend_allus table.  :'''3. Postcode(U.S.)'''
::U.S. post code follows the pattern [five digits - four digits]. In this way, U.S. patents can be extracted by searching for post code with regular expression "(^|\s)\d{5}-\d{4}($|\s)"
E:/McNair/Projects/PatentAddress/RxPostcode.sql
:'''43. State (U.S.)'''
:: There are some patterns that can be used to extract state information.
E:/McNair/Projects/PatentAddress/RxPostcode.sql
:'''54. City (U.S.)'''
:: There are some patterns that can be used to extract city information.
E:/McNair/Projects/PatentAddress/CityPatterns.sql
:'''5. Tables'''
::*ptoassigneend_allus
 
:::This table contains all the U.S. patents extracted from ptoassigneend table. The rule to extract this table is 'country = 'UNITED STATES' ' or any of the features contains U.S. postcode.
 
:::The SQL code is:
 
CREATE TABLE ptoassigneend_allus AS
SELECT * FROM ptoassigneend
WHERE city ~* '.*\d{5}[-]\d{4}.*' OR addrline1 ~* '.*\d{5}[-]\d{4}.*' OR
addrline2 ~* '.*\d{5}[-]\d{4}.*' OR postcode ~* '.*\d{5}[-]\d{4}.*' OR
country ~* 'UNITED STATES';
SELECT 3572682
 
Table "public.ptoassigneend_allus"
Column | Type | Modifiers
reelno | integer |
frameno | integer |
name | character varying(500) |
addrline1 | character varying(500) |
addrline2 | character varying(500) |
city | character varying(500) |
state | character varying(500) |
country | character varying(500) |
postcode | character varying(80) |
 
:::The ptoassigneend_allus table may miss some U.S. patents.
 
::*ptoassigneend_missus_final
 
:::State and postcode information are extracted from addrline1, addrline2 and city columns and are stored in ptoassigneend_missus_final table. See section 3 and 4.
 
:::This table is a subset of ptoassigneend_allus table.
 
Table "public.ptoassigneend_missus_final"
Column | Type | Modifiers
reelno | integer |
frameno | integer |
name | character varying(500) |
addrline1 | character varying(500) |
addrline2 | character varying(500) |
city | character varying(500) |
state | character varying(500) |
country | character varying(500) |
postcode | character varying(80) |
postcode_city | text |
postcode_addr1 | text |
postcode_addr2 | text |
state_city | text |
state_addr1 | text |
state_addr2 | text |
 
:::postcode_city is the postcode extracted from 'city'. postcode_addr1 is the postcode extracted from 'addrline1'. postcode_addr2 is the postcode extracted from 'addrline2'.
 
:::state_city is the state name extracted from 'city'. state_addr1 is the state name extracted from 'addrline1'. state_addr2 is the state name extracted from 'addrline2'.
 
::*ptoassigneend_city
 
:::City information are extracted from addrline1, addrline2 and city columns and are stored in ptoassigneend_city table. See section 5.
 
:::This table is a subset of ptoassigneend_allus table.
:'''6. Issues'''

Navigation menu