Changes

Jump to navigation Jump to search
*By now, we only focus on and clean American patents.
Methods: a. Postcode Extraction:
U.S. post code follows the pattern [five digits] or [five digits - four digits]. U.S. patents can be extracted by searching for post code following these patterns using regular expression. Some other countries also use [five digits] for post code, so only post codes following [five digits - four digits] are extracted.
E:/McNair/Projects/PatentAddress/
The extracted records are stored in table ptoassigneend_missus.
 
U.S. postcode is like [five digits - four digits]. In this way, I created a table named 'ptoassigneend_missus' to store records containing [five digits - four digits]. Then, using the method above to extract useful address information.
 
The SQL code is as follows:
 
UPDATE ptoassigneend_missus
SET postcode_city= SUBSTRING(city, '\d{5}[-]\d{4}')
WHERE city ~* '.*\d{5}[-]\d{4}.*';
 
UPDATE ptoassigneend_missus
SET state_city = SUBSTRING(city, '\w{2,}\s{0,}\d{5}[-]\d{4}')
WHERE city ~* '.*\s{1,}\w{2}\s{0,}\d{5}[-]\d{4}.*';
 
UPDATE ptoassigneend_missus
SET city_city = SUBSTRING(city, '\w{3,}\s{0,1}\w{0,}\s{0,}[,]')
WHERE city ~* '.*\w{2}\s{1,}\d{5}[-]\d{4}.*';
 
From addrline1, addrline2 and city, I extracted city, post code and state respectively and stored in 'city_addr1', 'city_addr2', 'city_city'.
Applied similar methods to filter out patent records from Japan. The post code in Japan follows pattern [three digits- four digits].

Navigation menu