||Has keywords=Patent,Data|Has project status=ActiveSubsume|Does subsume=Patent Data (Wiki Page), Patent Data Cleanup - June 2016, Patent Data Extraction Scripts (Tool), USPTO Bulk Data Processing,
}}
To restructure the current patent dataset, the data requires rigorous cleaning. The primary areas for improvement are:
===Matching Application and Publication Numbers===
The documentids in ptoproperty_cleaned were checked again to see which numbers would match and verify the patent kind of different patents, as specified in the ptoproperty tables. First, the table ptopropertynd was made, including only the distinct documentids in ptoproperty_cleaned.
 DROP TABLE ptopropertynd;
 CREATE TABLE ptopropertynd AS SELECT DISTINCT * FROM ptoproperty;
 --27266638
By creating this table, I also address the duplications caused by the kind XO.
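As a rough illustration of what the SELECT DISTINCT step does, the sketch below replays it on a toy table in an in-memory SQLite database. Everything except the table names ptoproperty and ptopropertynd is an assumption made for the example: the two columns, the sample rows, the helper name build_distinct_table, and the use of SQLite instead of the project's actual database.

```python
import sqlite3


def build_distinct_table(conn):
    """Rebuild ptopropertynd from the distinct rows of ptoproperty.

    Returns the number of rows in the new table.
    (Hypothetical helper, not part of the project's scripts.)
    """
    cur = conn.cursor()
    # IF EXISTS lets the rebuild run even when the table is not there yet.
    cur.execute("DROP TABLE IF EXISTS ptopropertynd")
    cur.execute(
        "CREATE TABLE ptopropertynd AS SELECT DISTINCT * FROM ptoproperty"
    )
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM ptopropertynd").fetchone()[0]


# Toy data only: the column layout and the values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ptoproperty (documentid TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO ptoproperty VALUES (?, ?)",
    [
        ("1000001", "B1"),
        ("1000002", "XO"),
        ("1000002", "XO"),  # exact duplicate of the previous row
        ("1000003", "A1"),
    ],
)

total_rows = conn.execute("SELECT COUNT(*) FROM ptoproperty").fetchone()[0]
distinct_rows = build_distinct_table(conn)
print(total_rows, distinct_rows)  # 4 3
```

Note that SELECT DISTINCT * only collapses rows that are identical in every column. If the same documentid appeared with two different kinds, both rows would remain, so a per-documentid check may still be worthwhile after this step.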