Difference between revisions of "Patent Assignment Data Restructure"

Revision as of 16:59, 2 March 2017

Before the current patent dataset can be restructured, the data needs rigorous cleaning. The primary areas for improvement are:

1. Clean the ptoassignment table so that its keys are unique.

2. Clean ptoproperties to remove nonutility patents. The patent numbers currently include entries such as:

20090108066

20100007288

20090108066

20100110022

3. Clean ptoassignee: extract the address components and tidy them.

4. Check that all patent numbers are accounted for in ptoassignee_currentusa.

5. Clean up the correspondence addresses.

6. Transform the structure of the dataset.
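
The check in step 4 could be sketched as a set difference. This is only an illustration, not the procedure used on the actual tables: the function name `missing_patents` and the sample values are hypothetical, and in practice the two lists would be read from the reference patent list and from ptoassignee_currentusa.

```python
def missing_patents(expected, present):
    """Return the patent numbers in `expected` that do not appear in `present`, sorted."""
    return sorted(set(expected) - set(present))


# Hypothetical sample values, for illustration only.
all_patents = ["7000001", "7000002", "7000003"]
currentusa_patents = ["7000001", "7000003"]

# Lists every expected patent number that has no row in the second list.
print(missing_patents(all_patents, currentusa_patents))
```

An empty result would mean every expected patent number is accounted for.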

@@ Line 11: / Line 11: @@
 ::::20090108066
 ::::20100110022
+::* Design and Reissue patents ('%D%' or '%RE%')
-::*(including patent numbers, application numbers, something else that we haven't matched yet).
+::* alphanumeric character strings
 :3. Clean ptoassignee to extract address components and clean it up.
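
The categories added in this revision can be turned into a rough filter. The sketch below mirrors the '%D%' and '%RE%' patterns as substring tests and treats any other value containing a non-digit as an alphanumeric string. The function name, the category labels, and the sample values other than 20090108066 are assumptions made for illustration. All-digit values are only labelled "numeric", since such a value may still be an application or publication number rather than a utility patent.

```python
def classify_patent_number(value):
    """Roughly classify one entry from the patent number field."""
    text = value.strip().upper()
    if "RE" in text:          # same idea as the '%RE%' pattern
        return "reissue"
    if "D" in text:           # same idea as the '%D%' pattern
        return "design"
    if not text.isdigit():    # any remaining value with a non-digit character
        return "alphanumeric"
    return "numeric"          # all digits; still needs checking against known utility patents


# "D0456789", "RE12345" and "PP12345" are hypothetical sample values.
for sample in ["20090108066", "D0456789", "RE12345", "PP12345"]:
    print(sample, classify_patent_number(sample))
```

Only entries classified as "numeric" would be kept as candidates for utility patents. The rest would be removed from ptoproperties or set aside for review.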