||Has keywords=Patent,Data|Has project status=ActiveSubsume|Does subsume=Patent Data (Wiki Page), Patent Data Cleanup - June 2016, Patent Data Extraction Scripts (Tool), USPTO Bulk Data Processing,
}}
Restructuring the current patent dataset requires rigorous data cleaning. The primary areas for improvement are:
===Matching Application and Publication Numbers===
The documentids in ptoproperty_cleaned are used to verify the kind of each patent as specified in the ptoproperty tables. First, the table ptopropertynd was made, including only the distinct documentids in ptoproperty_cleaned.
 DROP TABLE ptopropertynd;
 CREATE TABLE ptopropertynd AS
 SELECT DISTINCT * FROM ptoproperty;
 --27266638
Creating this table also addresses the duplicates that arose when documentids were matched to kind XO.
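The deduplication step above can be tried on a toy table before running it against the full data. The following is a minimal sketch using Python's built-in sqlite3 module rather than the project's actual database. The column names (documentid, kind), the sample rows, and the helper dedupe_table are assumptions made for illustration and are not taken from the real ptoproperty schema.

```python
import sqlite3


def dedupe_table(conn, source, target):
    """Rebuild `target` as the distinct rows of `source`; return its row count."""
    # Table names are interpolated directly, so only pass trusted names.
    conn.execute(f"DROP TABLE IF EXISTS {target}")
    conn.execute(f"CREATE TABLE {target} AS SELECT DISTINCT * FROM {source}")
    return conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]


# Hypothetical two-column stand-in for ptoproperty (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ptoproperty (documentid TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO ptoproperty VALUES (?, ?)",
    [
        ("12345678", "XO"),
        ("12345678", "XO"),  # exact duplicate row, removed by DISTINCT
        ("12345678", "A1"),  # same documentid, different kind: kept
        ("7654321", "B2"),
    ],
)

n = dedupe_table(conn, "ptoproperty", "ptopropertynd")
print(n)  # 3
```

Note that SELECT DISTINCT * only removes rows that are identical in every column. A documentid that appears with two different kinds still yields two rows, so a separate check per documentid may be needed if one row per documentid is the goal.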