{{McNair Projects
|Has image=
|Has title=Patent Data Restructure
|Has owner=Marcela Interiano
|Has deadline=201705
|Has keywords=Patent
|Is billed to=
|Has notes=
|Has project status=Active
|Is dependent on=
|Depends upon it=
}}
In order to restructure the current patent dataset, the data requires rigorous cleaning. The primary areas for improvement are:
Similar methods were applied to filter out patent records from Japan. Post codes in Japan follow the pattern [three digits]-[four digits].
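
The Japanese post code pattern can be checked with a regular expression. The following is a minimal sketch, not the project's actual filter: the names <code>JP_POSTCODE</code>, <code>find_jp_postcode</code> and <code>is_japanese_record</code> are hypothetical, and it assumes each address is available as a single string.

```python
import re

# Three digits, a hyphen, four digits. The lookarounds reject matches that
# are embedded in a longer digit run, such as a U.S. ZIP+4 code (77005-1827).
JP_POSTCODE = re.compile(r"(?<!\d)\d{3}-\d{4}(?!\d)")


def find_jp_postcode(address):
    """Return the first Japan-style post code found in address, or None."""
    match = JP_POSTCODE.search(address)
    return match.group(0) if match else None


def is_japanese_record(address):
    """Flag an address as Japanese if it contains a Japan-style post code."""
    return find_jp_postcode(address) is not None
```

The pattern alone can also match other strings of the same shape, such as a seven-digit phone number, so it is probably safest to combine it with the extracted country information.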
The extracted post code is quite accurate for almost all countries, and so is the country information (and the state, for U.S. addresses).
The extracted city information, however, is much less reliable: city names are often confused with street names. One way to increase accuracy is to list all possible cities in each country and then match the address columns against those lists, although this is time consuming.
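
The list-matching approach could look roughly like the sketch below. It is illustrative only: <code>CITIES_BY_COUNTRY</code> is a tiny hypothetical stand-in for a full per-country city list, <code>match_city</code> is a hypothetical helper, and the rule of preferring the match that ends latest in the address rests on the assumption that the city usually follows the street.

```python
import re

# Hypothetical, tiny stand-in for a complete per-country list of cities.
CITIES_BY_COUNTRY = {
    "US": ["Houston", "New York", "York"],
    "JP": ["Tokyo", "Osaka"],
}


def match_city(address, country, cities_by_country=CITIES_BY_COUNTRY):
    """Return the known city that best matches address, or None.

    Among all whole-word matches, prefer the one that ends latest in the
    address (assumption: the city comes after the street), and break ties
    by taking the longest name, so "New York" beats "York".
    """
    best = None  # (end position, name length, city name)
    for city in cities_by_country.get(country, []):
        pattern = r"(?<!\w)" + re.escape(city) + r"(?!\w)"
        for m in re.finditer(pattern, address, flags=re.IGNORECASE):
            candidate = (m.end(), len(city), city)
            if best is None or candidate > best:
                best = candidate
    return best[2] if best else None
```

For example, <code>match_city("350 Houston Street, New York, NY 10002", "US")</code> returns <code>"New York"</code> rather than <code>"Houston"</code>, because the street name appears earlier in the address. Scanning every city against every address is slow for large lists, which matches the concern about run time noted above.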
