This page contains the script that was used to clean up the patents and assignees tables in allpatent.
Cleaning up includes:
* Cleaning 'NULL' string and -1 inserts : at the time of merging the patentdata and patent_2015 databases, I inserted 'NULL' strings and -1 in integer columns to differentiate between NULLs that came from the vendor, and 'NULL's that I inserted because of no column overlap.
** The 'NULL's got replaced with NULL
** The -1s got replaced with NULL as well.
* Merging some more columns, and dropping unnecessary columns:
** At the time of merging the tables, some columns, particularly in the patent table, were not merged as they should have been.
** The script that follows merges those columns as well.
NOTE: The patent data page detailing the SQL steps followed to merge the data now has the updated table structures. The script on this page can be used as a reference when trying to debug any (unlikely) merging errors
* Renaming tables and columns
** Table names and column names have been standardized.
** General rule of thumb is : short column names, singular table names (for example : patent and not patents)
== Script ==
ALTER TABLE patents
RENAME COLUMN patentnumber TO patent;
ALTER TABLE patents
DROP COLUMN kind,
DROP COLUMN title,
DROP COLUMN ussubclass, **
DROP COLUMN maingroup, --
DROP COLUMN subgroup,--
DROP COLUMN cpcsubclass, ++
DROP COLUMN cpcmaingroup, ++
DROP COLUMN classificationnationalcountry,
DROP COLUMN classificationnationalclass,** (?)
DROP COLUMN primaryexaminerfirstname,
DROP COLUMN primaryexaminerlastname,
DROP COLUMN primaryexaminerdepartment,
DROP COLUMN filename;
UPDATE patents
SET type = '2015'
WHERE type != 'NULL';
-- RESULT : UPDATE 1646225
UPDATE patents
SET type = '2010'
WHERE type = 'NULL';
-- RESULT : UPDATE 3764926
/* Join the historical patent data from the US PTO with the patents table */
ALTER TABLE PATENTS
ADD COLUMN nber INT,
ADD COLUMN uspc varchar,
ADD COLUMN uspc_sub varchar;
UPDATE patents p
SET nber = hp.nber,
uspc = hp.uspc,
uspc_sub = hp.uspc
FROM historicalpatentdata hp
WHERE hp.patentnumber = CAST(p.patent AS varchar);
-- RESULT : UPDATE 5113655
/* Mergeing some columns - claims and number of claims - column name : claims*/