{{Project
|Has project output=Data
|Has sponsor=McNair Center
|Has title=Patent Data Cleanup - June 2016
|Has owner=Marcela Interiano,
|Has project status=Subsume
|Has keywords=Data
}}
== About this Page ==
This page contains the script that was used to clean up the patents and assignees tables in allpatent.
 
Cleaning up includes:
* Cleaning up 'NULL' strings and -1 inserts: when merging the patentdata and patent_2015 databases, I inserted the string 'NULL' into text columns and -1 into integer columns wherever a column existed in only one of the two databases. This distinguished the placeholders I inserted from genuine NULLs that came from the vendor.
** The 'NULL' strings have been replaced with NULL.
** The -1 values have been replaced with NULL as well.
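
The placeholder cleanup described above can be sketched as follows. This is a minimal illustration, not the script that was actually run: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the PostgreSQL database, and the columns <code>patentnumber</code>, <code>title</code> and <code>claims</code>, as well as the helper <code>null_out_placeholders</code>, are hypothetical. The two UPDATE statements themselves are plain SQL and should carry over to PostgreSQL unchanged.

```python
import sqlite3


def null_out_placeholders(conn, table, text_cols, int_cols):
    """Replace the 'NULL' string and -1 placeholders with real NULLs.

    Table and column names are interpolated directly into the SQL, so
    they must come from a trusted, hard-coded list (never user input).
    """
    cur = conn.cursor()
    for col in text_cols:
        cur.execute(f"UPDATE {table} SET {col} = NULL WHERE {col} = 'NULL'")
    for col in int_cols:
        cur.execute(f"UPDATE {table} SET {col} = NULL WHERE {col} = -1")
    conn.commit()


# Hypothetical miniature of the patent table (assumed columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patent (patentnumber INTEGER, title TEXT, claims INTEGER)"
)
conn.executemany(
    "INSERT INTO patent VALUES (?, ?, ?)",
    [
        (1, "Widget", 12),  # ordinary row, must stay untouched
        (2, "NULL", -1),    # placeholders inserted during the merge
        (3, None, 7),       # genuine NULL from the vendor
    ],
)

null_out_placeholders(conn, "patent", text_cols=["title"], int_cols=["claims"])

rows = conn.execute(
    "SELECT patentnumber, title, claims FROM patent ORDER BY patentnumber"
).fetchall()
print(rows)
```

Note that once the placeholders are replaced, inserted and vendor NULLs can no longer be told apart, so this step only makes sense after the merge has been verified.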
 
* Merging some more columns and dropping unnecessary columns:
** When the tables were merged, some columns, particularly in the patent table, were not merged as they should have been.
** The script that follows merges those columns as well.
NOTE: The patent data page detailing the SQL steps followed to merge the data now shows the updated table structures. The script on this page can be used as a reference when debugging any (unlikely) merging errors.
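
One common way to merge two columns that hold the same information is COALESCE, which keeps the first non-NULL value. The sketch below is an assumption about how such a merge could look, not a record of what the script does: the column names <code>grantdate</code> and <code>grantdate_2015</code> are hypothetical, and sqlite3 again stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical leftover from the merge: the same fact stored in two columns.
conn.execute(
    "CREATE TABLE patent ("
    "patentnumber INTEGER, grantdate TEXT, grantdate_2015 TEXT)"
)
conn.executemany(
    "INSERT INTO patent VALUES (?, ?, ?)",
    [
        (1, "2001-05-01", None),  # value only in the older column
        (2, None, "2015-03-10"),  # value only in the newer column
        (3, None, None),          # value missing in both
    ],
)

# Keep the existing value when present, otherwise fall back to the other column.
conn.execute("UPDATE patent SET grantdate = COALESCE(grantdate, grantdate_2015)")
conn.commit()

# In PostgreSQL the now-redundant column could then be dropped with:
#   ALTER TABLE patent DROP COLUMN grantdate_2015;

merged = conn.execute(
    "SELECT patentnumber, grantdate FROM patent ORDER BY patentnumber"
).fetchall()
print(merged)
```

Running the NULL cleanup before a merge like this matters, because a 'NULL' string or -1 placeholder counts as a value and would otherwise win over the real data in the other column.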
 
* Renaming tables and columns:
** Table names and column names have been standardized.
** The general rule of thumb is short column names and singular table names (for example, patent and not patents).
== Script ==
CREATE DATABASE allpatent_clone WITH TEMPLATE allpatent OWNER dbuser;
 
== Renaming Tables and Columns ==
 
To standardize table and column names, and to make them as user-friendly as possible, a few tables and columns have been renamed.
* '''allpatent''' database -> '''patent'''
* assignees -> assignee
* judges -> judge
* citations -> citation
* matchassignees -> MatchOrgNames
* patents -> patent
* assignees -> ptoassignee
* assignments -> ptoassignment
* assignors -> ptoassignor
* patentassignment -> ptopatentfile
* properties -> ptoproperty
* mslfee -> feestatus
* patentmaintenancefee -> fee
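
Renames like the ones listed above can be driven from a simple mapping. The sketch below is illustrative only: it covers three of the renames from the list, creates empty placeholder tables with an assumed <code>id</code> column, and uses sqlite3 in place of PostgreSQL. The <code>ALTER TABLE ... RENAME TO</code> statement has the same form in both systems.

```python
import sqlite3

# A subset of the old -> new table names from the list above.
RENAMES = {
    "judges": "judge",
    "citations": "citation",
    "patents": "patent",
}

conn = sqlite3.connect(":memory:")

# Empty placeholder tables standing in for the real ones (assumed schema).
for old in RENAMES:
    conn.execute(f"CREATE TABLE {old} (id INTEGER)")

# Names come from the hard-coded mapping above, so interpolation is safe here.
for old, new in RENAMES.items():
    conn.execute(f"ALTER TABLE {old} RENAME TO {new}")
conn.commit()

# In PostgreSQL the database itself could be renamed with:
#   ALTER DATABASE allpatent RENAME TO patent;
# (issued from a different database, with no sessions connected to allpatent).

tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```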
