::* For now, we focus only on cleaning U.S. patents.
:'''2. Postcode and State Extraction'''
::A U.S. postcode (ZIP code) follows the pattern [five digits - four digits], where the hyphen and the final four digits are optional. U.S. patents can therefore be identified by searching the address for this pattern with a regular expression.
::The state and postcode always appear together, separated by a space, so the state can be extracted with the same regular expression.
::The SQL code is in:
::The extracted records are stored in table ptoassigneend_missus.
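::The extraction described above can be sketched in Python as follows. This is only an illustration, not the SQL used for this project: the pattern, the function name <code>extract_state_zip</code> and the sample addresses are assumptions, and the state is assumed to be a two-letter uppercase abbreviation.

```python
import re

# Assumed pattern: a two-letter state abbreviation, a single space, then a
# postcode of five digits with an optional "-" and four more digits.
STATE_ZIP_RE = re.compile(r"\b([A-Z]{2}) (\d{5}(?:-\d{4})?)\b")


def extract_state_zip(address):
    """Return (state, postcode) for the last match in address, or None."""
    matches = STATE_ZIP_RE.findall(address)
    # The state and postcode usually close the address, so prefer the last match.
    return matches[-1] if matches else None


if __name__ == "__main__":
    # Hypothetical sample addresses, for illustration only.
    for sample in ["1 MAIN ST, HOUSTON, TX 77005-1892", "GRAND CAYMAN"]:
        print(sample, "->", extract_state_zip(sample))
```

::Addresses that return <code>None</code> contain no U.S.-style state and postcode pair and would need country-specific patterns.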
:'''3. Issues'''
::* Different countries use different postcode formats. Regular expressions for the postcodes of countries other than the U.S. can be found here:
http://stackoverflow.com/questions/578406/what-is-the-ultimate-postal-code-and-zip-regex
 
::* The city feature needs to be standardized. For example, 'GRAND CAYMAN, CAYMAN ISLAND' and 'GRAND CAYMAN' indicate the same city.
 
::* Some state and country features don't match. For example, 'Beijing' - 'UNITED STATES, 10022'.
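::For the city standardization issue listed above, one possible starting point is to keep only the text before the first comma and to normalize case and whitespace. This is a hedged sketch, not the project's method: the function name <code>normalize_city</code> and the rule itself are assumptions, and the rule would mis-handle city names that legitimately contain a comma.

```python
def normalize_city(city):
    """Uppercase, collapse whitespace, and keep the text before the first comma."""
    # Assumption: anything after the first comma is a region or country suffix.
    head = city.split(",", 1)[0]
    return " ".join(head.upper().split())


if __name__ == "__main__":
    # The two spellings from the example above map to the same value.
    print(normalize_city("GRAND CAYMAN, CAYMAN ISLAND"))
    print(normalize_city("GRAND CAYMAN"))
```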
