||Has keywords=Patent,Data|Has project status=ActiveSubsume|Does subsume=Patent Data (Wiki Page), Patent Data Cleanup - June 2016, Patent Data Extraction Scripts (Tool), USPTO Bulk Data Processing,
}}
In order to restructure the current patent dataset, the data requires rigorous cleaning. The primary areas for improvement are:
===Matching Application and Publication Numbers===
The documentids in ptoproperty_cleaned are used to verify the kind of each patent as specified in the ptoproperty tables.
First, the table ptopropertynd was created, containing only the distinct documentids in ptoproperty_cleaned.
DROP TABLE IF EXISTS ptopropertynd;
CREATE TABLE ptopropertynd AS
SELECT DISTINCT * FROM ptoproperty;
--27266638
By creating this table, I also address the duplicate rows caused by the kind XO.
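The deduplication step above can be tried on a toy table. This is a minimal sketch, not the production procedure: it assumes an in-memory SQLite database in place of the real one, a hypothetical two-column layout (documentid, kind), and invented sample rows. It illustrates that SELECT DISTINCT * drops only rows that are identical in every column, so a documentid can still appear more than once if its other columns differ.

```python
import sqlite3


def build_distinct_table(rows):
    """Load rows into a toy ptoproperty table and rebuild ptopropertynd.

    Returns (row count of ptopropertynd, number of distinct documentids).
    The two-column schema is an assumption made for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE ptoproperty (documentid TEXT, kind TEXT)")
    cur.executemany("INSERT INTO ptoproperty VALUES (?, ?)", rows)

    # Same statements as in the text, run against the toy table.
    cur.execute("DROP TABLE IF EXISTS ptopropertynd")
    cur.execute("CREATE TABLE ptopropertynd AS SELECT DISTINCT * FROM ptoproperty")

    total = cur.execute("SELECT COUNT(*) FROM ptopropertynd").fetchone()[0]
    distinct_ids = cur.execute(
        "SELECT COUNT(DISTINCT documentid) FROM ptopropertynd"
    ).fetchone()[0]
    conn.close()
    return total, distinct_ids


# Invented sample data: one exact duplicate, and one documentid with two kinds.
sample = [
    ("100", "B1"),
    ("100", "B1"),  # exact duplicate, removed by DISTINCT *
    ("200", "A1"),
    ("200", "XO"),  # same documentid, different kind, kept by DISTINCT *
    ("300", "B2"),
]

total_rows, distinct_ids = build_distinct_table(sample)
print(total_rows, distinct_ids)
```

Comparing the two counts on the real table is a quick way to check whether any documentid still occurs on more than one row after deduplication.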