**Iterate!
====Introduction====
Currently done:
*For now, we focus only on U.S. patents.
====Extract Address Information====
So far these instructions apply to:
postcode | character varying(80) |
=====Postcode=====
A U.S. postcode (ZIP+4) follows the pattern of five digits, a hyphen, and four digits, for example 12345-6789. U.S. patents can therefore be identified by searching the address fields for postcodes that match this pattern with a regular expression.
E:/McNair/Projects/PatentAddress/Functions.sql
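The postcode pattern described above can be sketched in Python as follows. This is only an illustration, not the SQL stored in Functions.sql: the name <code>extract_postcode</code> is hypothetical, and two choices are assumptions rather than documented behaviour, namely that a bare five-digit code is also accepted and that the last match in the string is preferred (since a street number at the start of an address may also have five digits).

```python
import re

# Five digits, optionally followed by a hyphen and four digits (ZIP+4).
# The word boundaries keep the pattern from matching inside longer digit runs.
US_POSTCODE = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def extract_postcode(text):
    """Return the last U.S.-style postcode found in text, or None.

    Assumption: the postcode usually sits at the end of an address line,
    so the last match is taken rather than the first.
    """
    if not text:
        return None
    matches = [m.group(0) for m in US_POSTCODE.finditer(text)]
    return matches[-1] if matches else None
```

Applied to a value such as <code>12345 MAIN ST, HOUSTON TX 77005</code>, the function skips the leading street number and returns the trailing code.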
=====State=====
The following patterns can be used to extract state information.
E:/McNair/Projects/PatentAddress/Functions.sql
=====City=====
The following patterns can be used to extract city information.
E:/McNair/Projects/PatentAddress/Functions.sql
====Output (Tables)====
*'''ptoassigneend_us_extracted'''
E:/McNair/Projects/PatentAddress/Functions.sql
====Clean Address Info (Master Table)====
=====Introduction=====
As mentioned in Section 2.2.2, city, state and postcode info are extracted from 'addrline1', 'addrline2' and 'city'. The original table also contains 'postcode', 'city' and 'state'. We therefore have four candidates each for city, state and postcode.
'''The objective of this section is to pick the best postcode, city and state for each record, and to create a Master Table with the original features plus the cleaned postcode, city and state.'''
=====Postcode=====
Reminder: 'postcode_city' holds the postcodes extracted from 'city'; 'postcode_addr1' holds the postcodes extracted from 'addrline1'; 'postcode_addr2' holds the postcodes extracted from 'addrline2'.
All the cleaned postcodes for U.S. patents are stored in ptoassigneend_us_cleaned (see feature postcode_cleaned).
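One way to choose among the four postcode candidates is sketched below. The selection rule here is an assumption made for illustration, not the rule used to build ptoassigneend_us_cleaned: candidates are passed in a caller-chosen priority order, a well-formed ZIP+4 value is preferred over a five-digit value, and anything else is discarded. The name <code>pick_postcode</code> is hypothetical.

```python
import re

FULL = re.compile(r"^\d{5}-\d{4}$")   # ZIP+4, e.g. 77005-1892
SHORT = re.compile(r"^\d{5}$")        # five-digit code only


def pick_postcode(candidates):
    """Pick one postcode from candidates listed in priority order.

    Prefers the first well-formed ZIP+4 value, then the first
    five-digit value; returns None when no candidate is well formed.
    """
    # Drop NULL/empty candidates and trim surrounding whitespace.
    cleaned = [c.strip() for c in candidates if c and c.strip()]
    for pattern in (FULL, SHORT):
        for value in cleaned:
            if pattern.match(value):
                return value
    return None
```

A call could pass the original 'postcode' first, followed by 'postcode_city', 'postcode_addr1' and 'postcode_addr2'; that ordering is likewise only an assumption.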
=====State=====
Reminder: 'state_city' holds the states extracted from 'city'; 'state_addr1' holds the states extracted from 'addrline1'; 'state_addr2' holds the states extracted from 'addrline2'.
Note: We might want to convert state names to standard codes.
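Converting state names to standard codes could look like the sketch below. The lookup table is a deliberately small, illustrative subset, and <code>to_state_code</code> is a hypothetical helper; a full implementation would need every state and territory.

```python
# Illustrative subset only; a real table would cover every state and territory.
STATE_CODES = {
    "CALIFORNIA": "CA",
    "NEW YORK": "NY",
    "TEXAS": "TX",
}


def to_state_code(state):
    """Map a state name (or an existing two-letter code) to its code.

    Returns None for empty input or for names missing from the table.
    """
    if not state:
        return None
    # Upper-case and collapse runs of whitespace before the lookup.
    key = " ".join(state.upper().split())
    if key in STATE_CODES.values():
        return key  # already a known code
    return STATE_CODES.get(key)
```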
=====City=====
Reminder: 'city_city' holds the city info extracted from 'city'; 'city_addr1' holds the city info extracted from 'addrline1'; 'city_addr2' holds the city info extracted from 'addrline2'.
All the cleaned cities for U.S. patents are stored in ptoassigneend_us_cleaned (see feature city_cleaned).
=====Output (Table)=====
*ptoassigneend_us_cleaned
The features postcode_cleaned, postcode_f5_cleaned (the first five digits of the postcode), state_cleaned and city_cleaned hold the cleaned postcode, state and city info.
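The relationship between postcode_cleaned and postcode_f5_cleaned can be illustrated with a short Python sketch. The function name <code>first_five</code> is hypothetical, and returning None for values that do not start with five digits is an assumption about how malformed input should be handled.

```python
def first_five(postcode):
    """Return the five-digit prefix of a cleaned postcode, or None."""
    if not postcode:
        return None
    head = postcode.strip()[:5]
    # Require exactly five ASCII digits; anything else is treated as invalid.
    if len(head) == 5 and head.isascii() and head.isdigit():
        return head
    return None
```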
====Functions to Simplify SQL Code====
===== Extraction =====
* Postcode
E:/McNair/Projects/PatentAddress/Functions.sql
===== Cleaning =====
* Clean Postcode
$city is the feature city.
====Issues====
*'''Inconsistency between 'addrline' and 'country' '''
