Difference between revisions of "Patent Assignment Data Restructure"

From edegan.com

Revision as of 16:59, 2 March 2017

Before the current patent dataset can be restructured, the data needs rigorous cleaning. The primary areas for improvement are:

1. Clean the ptoassignment table so that its keys are unique.
2. Clean ptoproperties to remove non-utility patents. The patent number field currently includes:
  • 7-digit patent numbers
  • application numbers
  • unknown numbers that cannot be matched to patent numbers in the patent table, for example:
      20090108066
      20100007288
      20090108066
      20100110022
  • Design and Reissue patents ('%D%' or '%RE%')
  • alphanumeric character strings
3. Clean ptoassignee and extract its address components.
4. Check that all patent numbers are accounted for in ptoassignee_currentusa.
5. Clean up the correspondence addresses.
6. Transform the structure of the dataset.
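
As a rough illustration of step 2, the sketch below sorts raw patent number strings into the categories listed there. It is only a sketch under stated assumptions: the function name classify_patent_number and the category labels are hypothetical, the '%D%' and '%RE%' patterns are read as "contains D" and "contains RE", and any all-digit string that is not exactly 7 digits long is lumped into a single bucket, because length alone cannot tell an application number from an unmatched number.

```python
import re


def classify_patent_number(raw: str) -> str:
    """Assign a raw patent number string to one coarse category.

    The labels are illustrative assumptions, not names used in the dataset.
    """
    value = raw.strip().upper()
    if not value:
        return "empty"
    # Exactly 7 ASCII digits: treated here as a utility patent number.
    if re.fullmatch(r"[0-9]{7}", value):
        return "utility"
    # Any other all-digit string (application numbers, unmatched numbers).
    if re.fullmatch(r"[0-9]+", value):
        return "other_numeric"
    # Mirrors the '%RE%' pattern; checked before 'D' so that a reissue
    # number is never counted as a design patent.
    if "RE" in value:
        return "reissue"
    # Mirrors the '%D%' pattern.
    if "D" in value:
        return "design"
    # Whatever remains is an unclassified alphanumeric string.
    return "alphanumeric"


if __name__ == "__main__":
    for sample in ["7654321", "20090108066", "D0456789", "RE12345", "PP12345"]:
        print(sample, classify_patent_number(sample))
```

Under these assumptions, only rows classified as "utility" would be kept in ptoproperties. The "other_numeric" rows would still need to be checked against the patent table before being dropped.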