||Has keywords=Patent,Data|Has project status=ActiveSubsume|Does subsume=Patent Data (Wiki Page), Patent Data Cleanup - June 2016, Patent Data Extraction Scripts (Tool), USPTO Bulk Data Processing,
}}
In order to restructure the current patent dataset, the data requires rigorous cleaning. The primary areas for improvement are:
SELECT documentid, max(recorded_date) as recorded_date FROM ptoassigneev2 GROUP BY documentid;
FROM ptotracking2 M1, ptoassigneend M2 WHERE (M1.reelno = M2.reelno) AND (M1.frameno = M2.frameno);
--16581236
DROP TABLE ptoassignee_current;
CREATE TABLE ptoassignee_current AS
SELECT M1.reelno, M1.frameno, M2.documentid, M2.recorded_date FROM ptoassignee_patent M1, datecheck M2
WHERE (M1.documentid = M2.documentid)
AND (M1.recorded_date = M2.recorded_date);
--6729698
A final version of the ptoassignee_current table was made using ptoassigneev2 given the larger pool of documentids included in the table by matching using documentid and recorded dates from datecheck.
DROP TABLE ptoassignee_current;
FROM ptoassigneev2 M1, datecheck M2
WHERE (M1.documentid = M2.documentid) AND (M1.recorded_date = M2.recorded_date);
--50040384994869 These codes should be used to recreate this table using Sonia's updated address information. ===Matching Application and Publication Numbers===The ptoproperty_cleaned documentids to verify the kind of different patents as specified in the ptoproperty tables. First the table ptopropertynd was made, including only the distinct documentids in ptoproperty_cleaned. DROP ptopropertynd; CREATE TABLE ptopropertynd AS SELECT DISTINCT * FROM ptoproperty; --27266638
ADD IN THE HISTPATENT STUFF FOR THE APPNUMBy creating this table, I also address the duplications caused by the kind XO.