Difference between revisions of "Patent Data Processing - SQL Steps"

From edegan.com
Jump to navigation Jump to search
Line 129: Line 129:
 
--3818842
 
--3818842
  
 +
Note : The assignees table was updated on 6/23 to remove the 'null' string and the '-1' values.
  
 
+
==Patents ==
--
 
 
 
 
 
  
 
  Patentdata:
 
  Patentdata:
Line 147: Line 145:
 
  appdate | date              |
 
  appdate | date              |
 
  appyear | integer          |
 
  appyear | integer          |
 
Column Names:
 
patent int,
 
kind varchar,
 
claims int,
 
apptype int,
 
appnum int,
 
gdate date,
 
gyear int,
 
appdate date,
 
appyear int
 
 
 
  
 
   
 
   
Line 190: Line 175:
 
   filename                      | varchar    |
 
   filename                      | varchar    |
 
    
 
    
  Column Names:
 
    patentnumber int,
 
    kind varchar,
 
    grantdate date,
 
    type varchar,
 
    applicationnumber varchar,
 
    filingdate date,
 
    prioritydate date,
 
    prioritycountry varchar,
 
    prioritypatentnumber varchar,
 
    ussubclass varchar,
 
    maingroup varchar,
 
    subgroup varchar,
 
    cpcsubclass varchar,
 
    cpcmaingroup varchar,
 
    cpcsubgroup varchar,
 
    classificationnationalcountry varchar,
 
    classificationnationalclass varchar,
 
    title varchar,
 
    numberofclaims int,
 
    primaryexaminerfirstname varchar,
 
    primaryexaminerlastname varchar,
 
    primaryexaminerdepartment varchar,
 
    pctpatentnumber varchar,
 
    filename varchar
 
 
 
Combined Schema:
 
Combined Schema:
  
Line 228: Line 187:
 
  appdate date,
 
  appdate date,
 
  appyear int
 
  appyear int
 
+
patentnumber int, -- patent  
 
+
kind varchar, -- kind
    patentnumber int, -- patent  
+
grantdate date, --gdate
    kind varchar, -- kind
+
type varchar,
    grantdate date, --gdate
+
applicationnumber varchar,
    type varchar,
+
filingdate date,
    applicationnumber varchar,
+
prioritydate date,
    filingdate date,
+
prioritycountry varchar,
    prioritydate date,
+
prioritypatentnumber varchar,
    prioritycountry varchar,
+
ussubclass varchar,
    prioritypatentnumber varchar,
+
maingroup varchar,
    ussubclass varchar,
+
subgroup varchar,
    maingroup varchar,
+
cpcsubclass varchar,
    subgroup varchar,
+
cpcmaingroup varchar,
    cpcsubclass varchar,
+
cpcsubgroup varchar,
    cpcmaingroup varchar,
+
classificationnationalcountry varchar,
    cpcsubgroup varchar,
+
classificationnationalclass varchar,
    classificationnationalcountry varchar,
+
title varchar,
    classificationnationalclass varchar,
+
numberofclaims int,
    title varchar,
+
primaryexaminerfirstname varchar,
    numberofclaims int,
+
primaryexaminerlastname varchar,
    primaryexaminerfirstname varchar,
+
primaryexaminerdepartment varchar,
    primaryexaminerlastname varchar,
+
pctpatentnumber varchar,
    primaryexaminerdepartment varchar,
+
filename varchar
    pctpatentnumber varchar,
+
claims int,
    filename varchar
+
apptype int,
    claims int,
+
appnum int,
    apptype int,
+
gyear int,
    appnum int,
+
appdate date,
    gyear int,
+
appyear int
    appdate date,
+
    appyear int
 
   
 
 
      
 
      
 
     Output Schema: patents
 
     Output Schema: patents
Line 363: Line 320:
 
primaryexaminerdepartment,
 
primaryexaminerdepartment,
 
pctpatentnumber,
 
pctpatentnumber,
filename,
+
  filename,
 
-1,
 
-1,
 
-1,
 
-1,
Line 493: Line 450:
  
  
********** INDEX CREATION **************
+
==== INDEX CREATION ====
  
 
ALTER TABLE patents ADD PRIMARY KEY (patentnumber);
 
ALTER TABLE patents ADD PRIMARY KEY (patentnumber);
Line 876: Line 833:
 
CREATE INDEX
 
CREATE INDEX
  
 
+
== Citations==
 
 
 
 
 
 
**Citations
 
  
 
To Extract Patents with Numbers Only and to Ignore Other RegExes
 
To Extract Patents with Numbers Only and to Ignore Other RegExes
Line 892: Line 845:
 
$$ LANGUAGE plperl;
 
$$ LANGUAGE plperl;
  
 
************ Columns *********************
 
 
patentdata:
 
patentdata:
 
  Column    |      Type        | Modifiers
 
  Column    |      Type        | Modifiers

Revision as of 17:03, 23 June 2016


Objective

The McNair Center owns two sets of patent data - one set that is inherited from Harvard, the Harvard dataverse, which is stored in the database patentdata and another that is generated by crawlers pulling data from the USPTO website, which is stored in the database patent_2015.

We are now merging and cleaning the two data sets, and storing them in a schema that is amalgamation of the two underlying schema for the citations tables, assignees tables, and patents tables. The destination schema is allpatent.

Assignees Data

The schema for the assignees table in patentdata database is:

  Column    |       Type        | Modifiers

+-------------------+-----------

patent      | integer           |
asgtype     | integer           |
assignee    | character varying |
city        | character varying |
state       | character varying |
country     | character varying |
nationality | character varying |
residence   | character varying |
asgseq      | integer           |

The schema for the assignees table in patent_2015 is :

   Column     |  Type   | Modifiers

+---------+-----------

lastname      | text    |
firstname     | text    |
orgname       | text    |
city          | text    |
country       | text    |
patentcountry | text    |
patentnumber  | integer |
state         | text    |
address       | text    |
postcode      | text    |

To merge both schemas, we have some columns that overlap, and some columns that don't.

Overlapping Columns

patent_2015  | patentdata

+--------------

orgname      | assignee 
city         | city 
country      | country
patentnumber | patent 
state        | state

These columns will have entries for most rows in the table, because they exist in both tables. The rest of the columns will be populated based on which table the row is coming from.

Final Schema

        Table "public.assignees"
   Column     |       Type        | Modifiers

+-------------------+-----------

lastname      | character varying |
firstname     | character varying |
address       | character varying |
postcode      | character varying |
orgname       | character varying |
city          | character varying |
country       | character varying |
patentnumber  | integer           |
state         | character varying |
patentcountry | character varying |
nationality2  | character varying |
residence     | character varying |
asgseq        | integer           |
asgtype       | integer           |

Non-overlapping Columns These are the columns that belong to either one of the assignees tables, and not to both. For these cases, to help users understand where the row is coming from, the following insert rules have been followed:

  • For columns of type int, insert -1
  • For columns of type string (character varying), the string 'null' has been inserted.

Therefore, if a row has appropriate values for orgname, state, city ,etc, but 'null' values for lastname, firstname, address and postcode, the row has come from the patentdata table.

Sample insert and copy commands

INSERT INTO assignees_merge ( SELECT 'null', 'null', 'null', 'null', a.assignee, a.city, a.country, a.patent, a.state, 'null', a.nationality, a.residence, a.asgseq, a.asgtype FROM assignees a );

INSERT INTO assignees_merge ( SELECT assignees.lastname, assignees.firstname, assignees.address, assignees.postcode, assignees.orgname, assignees.city, assignees.country, assignees.patentnumber, assignees.state, assignees.patentcountry, 'null', 'null', -1, -1 FROM assignees );


\COPY assignees_merge TO '/tmp/assignees_merge_export.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; \COPY assignees FROM '/tmp/assignees_merge_export.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; --1607724

\COPY assignees_merge TO '/tmp/assignees_merge_export1.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; \COPY assignees FROM '/tmp/assignees_merge_export1.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; --3818842

Note : The assignees table was updated on 6/23 to remove the 'null' string and the '-1' values.

Patents

Patentdata:
Column  |       Type        | Modifiers
---------+-------------------+-----------
patent  | integer           |
kind    | character varying |
claims  | integer           |
apptype | integer           |
appnum  | integer           |
gdate   | date              |
gyear   | integer           |
appdate | date              |
appyear | integer           |


Patent_2015:
            Column             |  Type   | Modifiers
-------------------------------+---------+-----------
 patentnumber                  | int | not null
 kind                          | varchar    |
 grantdate                     | date    |
 type                          | varchar    |
 applicationnumber             | varchar    |
 filingdate                    | date    |
 prioritydate                  | date    |
 prioritycountry               | varchar    |
 prioritypatentnumber          | varchar    |
 ussubclass                    | varchar    |
 maingroup                     | varchar    |
 subgroup                      | varchar    |
 cpcsubclass                   | varchar    |
 cpcmaingroup                  | varchar    |
 cpcsubgroup                   | varchar    |
 classificationnationalcountry | varchar    |
 classificationnationalclass   | varchar    |
 title                         | varchar    |
 numberofclaims                | int |
 primaryexaminerfirstname      | varchar    |
 primaryexaminerlastname       | varchar    |
 primaryexaminerdepartment     | varchar    |
 pctpatentnumber               | varchar    |
 filename                      | varchar    |
 

Combined Schema:

Column Names:

patent int,
kind varchar,
claims int,
apptype int,
appnum int,
gdate date,
gyear int,
appdate date,
appyear int
patentnumber int, -- patent 
kind varchar, -- kind
grantdate date, --gdate
type varchar,
applicationnumber varchar,
filingdate date,
prioritydate date,
prioritycountry varchar,
prioritypatentnumber varchar,
ussubclass varchar,
maingroup varchar,
subgroup varchar,
cpcsubclass varchar,
cpcmaingroup varchar,
cpcsubgroup varchar,
classificationnationalcountry varchar,
classificationnationalclass varchar,
title varchar,
numberofclaims int,
primaryexaminerfirstname varchar,
primaryexaminerlastname varchar,
primaryexaminerdepartment varchar,
pctpatentnumber varchar,
filename varchar
claims int,
apptype int,
appnum int,
gyear int,
appdate date,
appyear int

   
   Output Schema: patents
   
     CREATE TABLE patents_merged( 
       patentnumber int, 
       kind varchar, 
       grantdate date,
       type varchar,
       applicationnumber varchar,
       filingdate date,
       prioritydate date,
       prioritycountry varchar,
       prioritypatentnumber varchar,
       ussubclass varchar,
       maingroup varchar,
       subgroup varchar,
       cpcsubclass varchar,
       cpcmaingroup varchar,
       cpcsubgroup varchar,
       classificationnationalcountry varchar,
       classificationnationalclass varchar,
       title varchar,
       numberofclaims int,
       primaryexaminerfirstname varchar,
       primaryexaminerlastname varchar,
       primaryexaminerdepartment varchar,
       pctpatentnumber varchar,
       filename varchar,
       claims int,
       apptype int,
       appnum int,
       gyear int,
       appdate date,
       appyear int
       );
       
   
   patentdata:
   INSERT INTO patents_merged
   (
   		SELECT	
   		patent, 

kind, gdate, 'NULL', 'NULL', NULL, NULL, 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', -1, 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', claims, apptype, appnum, gyear, appdate,

       	appyear
       	FROM patents
   );
   -- RESULT : INSERT 0 3984771
   
   patent_2015:
   INSERT INTO patents_merged
   (
   	SELECT
   		patentnumber, 

kind, grantdate, type, applicationnumber, filingdate, prioritydate, prioritycountry, prioritypatentnumber, ussubclass, maingroup, subgroup, cpcsubclass, cpcmaingroup, cpcsubgroup, classificationnationalcountry, classificationnationalclass, title, numberofclaims, primaryexaminerfirstname, primaryexaminerlastname, primaryexaminerdepartment, pctpatentnumber,

  		filename,

-1, -1, -1, -1, NULL,

       	-1
   	FROM patents
   );
   -- RESULT : INSERT 0 1646225
   

COPY SCRIPTS: patentdata: \COPY patents_merged TO '/tmp/merged_patents_export1.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; --COPY 3984771

patent_2015: \COPY patents_merged TO '/tmp/merged_patents_export.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; --COPY 1646225

PATENTS TABLE \COPY patents FROM '/tmp/merged_patents_export1.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; -- RESULT : COPY 3984771 \COPY patents FROM '/tmp/merged_patents_export.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; -- RESULT : COPY 1646225


TESTING: select count(*) FROM (SELECT DISTINCT patentnumber FROM patents) AS t; --RESULT: 5411151 EXPECTED: 5426566

SELECT COUNT(*), * FROM patents GROUP BY patentnumber, kind, grantdate, type, applicationnumber, filingdate, prioritydate, prioritycountry, prioritypatentnumber, ussubclass, maingroup, subgroup, cpcsubclass, cpcmaingroup, cpcsubgroup, classificationnationalcountry, classificationnationalclass, title, numberofclaims, primaryexaminerfirstname, primaryexaminerlastname, primaryexaminerdepartment, pctpatentnumber, filename, claims, apptype, appnum, gyear, appdate, appyear HAVING COUNT(*) > 1;

SELECT patentnumber, count(*) FROM patents GROUP BY patentnumber HAVING count(*)>1; --7640598


SELECT * FROM patents op WHERE op.patentnumber IN ( SELECT ip.patentnumber FROM patents ip GROUP BY ip.patentnumber HAVING COUNT(*)>1 ) ORDER BY op.patentnumber;

( SELECT * INTO patentsCleaned FROM patents op WHERE op.patentnumber IN ( SELECT ip.patentnumber FROM patents ip GROUP BY ip.patentnumber HAVING COUNT(*)=1 ) ORDER BY op.patentnumber ) --SELECT 5191306


INSERT INTO patentsCleaned( SELECT * FROM patents op WHERE op.patentnumber IN ( SELECT ip.patentnumber FROM patents ip GROUP BY ip.patentnumber HAVING COUNT(*)>1 ) AND op.applicationnumber NOT LIKE 'NULL' ORDER BY op.patentnumber );

--219845

TESTING: allpatent=# select count(*) from patentsCleaned;

 count

5411151

(1 row)

allpatent=# select count(*), patentnumber FROM patentsCleaned group by patentnumber having count(*) > 1;

count | patentnumber

+--------------

(0 rows)


INDEX CREATION

ALTER TABLE patents ADD PRIMARY KEY (patentnumber); -- RESULT : ALTER TABLE allpatent=# CREATE UNIQUE INDEX patent_idx ON patents (patentnumber);

allpatent=# CREATE INDEX ON assignees (orgname); CREATE INDEX




    • Patents

Patentdata: Column | Type | Modifiers


+-------------------+-----------

patent  | integer           |
kind    | character varying |
claims  | integer           |
apptype | integer           |
appnum  | integer           |
gdate   | date              |
gyear   | integer           |
appdate | date              |
appyear | integer           |

Column Names:

patent int,
kind varchar,
claims int,
apptype int,
appnum int,
gdate date,
gyear int,
appdate date,
appyear int



Patent_2015:
            Column             |  Type   | Modifiers
-------------------------------+---------+-----------
 patentnumber                  | int | not null
 kind                          | varchar    |
 grantdate                     | date    |
 type                          | varchar    |
 applicationnumber             | varchar    |
 filingdate                    | date    |
 prioritydate                  | date    |
 prioritycountry               | varchar    |
 prioritypatentnumber          | varchar    |
 ussubclass                    | varchar    |
 maingroup                     | varchar    |
 subgroup                      | varchar    |
 cpcsubclass                   | varchar    |
 cpcmaingroup                  | varchar    |
 cpcsubgroup                   | varchar    |
 classificationnationalcountry | varchar    |
 classificationnationalclass   | varchar    |
 title                         | varchar    |
 numberofclaims                | int |
 primaryexaminerfirstname      | varchar    |
 primaryexaminerlastname       | varchar    |
 primaryexaminerdepartment     | varchar    |
 pctpatentnumber               | varchar    |
 filename                      | varchar    |
 
 Column Names:
   patentnumber int,
   kind varchar,
   grantdate date,
   type varchar,
   applicationnumber varchar,
   filingdate date,
   prioritydate date,
   prioritycountry varchar,
   prioritypatentnumber varchar,
   ussubclass varchar,
   maingroup varchar,
   subgroup varchar,
   cpcsubclass varchar,
   cpcmaingroup varchar,
   cpcsubgroup varchar,
   classificationnationalcountry varchar,
   classificationnationalclass varchar,
   title varchar,
   numberofclaims int,
   primaryexaminerfirstname varchar,
   primaryexaminerlastname varchar,
   primaryexaminerdepartment varchar,
   pctpatentnumber varchar,
   filename varchar

Combined Schema:

Column Names:

patent int,
kind varchar,
claims int,
apptype int,
appnum int,
gdate date,
gyear int,
appdate date,
appyear int


   patentnumber int, -- patent 
   kind varchar, -- kind
   grantdate date, --gdate
   type varchar,
   applicationnumber varchar,
   filingdate date,
   prioritydate date,
   prioritycountry varchar,
   prioritypatentnumber varchar,
   ussubclass varchar,
   maingroup varchar,
   subgroup varchar,
   cpcsubclass varchar,
   cpcmaingroup varchar,
   cpcsubgroup varchar,
   classificationnationalcountry varchar,
   classificationnationalclass varchar,
   title varchar,
   numberofclaims int,
   primaryexaminerfirstname varchar,
   primaryexaminerlastname varchar,
   primaryexaminerdepartment varchar,
   pctpatentnumber varchar,
   filename varchar
   claims int,
   apptype int,
   appnum int,
   gyear int,
   appdate date,
   appyear int
   
   
   Output Schema: patents
   
     CREATE TABLE patents_merged( 
       patentnumber int, 
       kind varchar, 
       grantdate date,
       type varchar,
       applicationnumber varchar,
       filingdate date,
       prioritydate date,
       prioritycountry varchar,
       prioritypatentnumber varchar,
       ussubclass varchar,
       maingroup varchar,
       subgroup varchar,
       cpcsubclass varchar,
       cpcmaingroup varchar,
       cpcsubgroup varchar,
       classificationnationalcountry varchar,
       classificationnationalclass varchar,
       title varchar,
       numberofclaims int,
       primaryexaminerfirstname varchar,
       primaryexaminerlastname varchar,
       primaryexaminerdepartment varchar,
       pctpatentnumber varchar,
       filename varchar,
       claims int,
       apptype int,
       appnum int,
       gyear int,
       appdate date,
       appyear int
       );
       
   
   patentdata:
   INSERT INTO patents_merged
   (
   		SELECT	
   		patent, 

kind, gdate, 'NULL', 'NULL', NULL, NULL, 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', -1, 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', claims, apptype, appnum, gyear, appdate,

       	appyear
       	FROM patents
   );
   -- RESULT : INSERT 0 3984771
   
   patent_2015:
   INSERT INTO patents_merged
   (
   	SELECT
   		patentnumber, 

kind, grantdate, type, applicationnumber, filingdate, prioritydate, prioritycountry, prioritypatentnumber, ussubclass, maingroup, subgroup, cpcsubclass, cpcmaingroup, cpcsubgroup, classificationnationalcountry, classificationnationalclass, title, numberofclaims, primaryexaminerfirstname, primaryexaminerlastname, primaryexaminerdepartment, pctpatentnumber, filename, -1, -1, -1, -1, NULL,

       	-1
   	FROM patents
   );
   -- RESULT : INSERT 0 1646225
   

COPY SCRIPTS: patentdata: \COPY patents_merged TO '/tmp/merged_patents_export1.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; --COPY 3984771

patent_2015: \COPY patents_merged TO '/tmp/merged_patents_export.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; --COPY 1646225

PATENTS TABLE \COPY patents FROM '/tmp/merged_patents_export1.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; -- RESULT : COPY 3984771 \COPY patents FROM '/tmp/merged_patents_export.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; -- RESULT : COPY 1646225


TESTING: select count(*) FROM (SELECT DISTINCT patentnumber FROM patents) AS t; --RESULT: 5411151 EXPECTED: 5426566

SELECT COUNT(*), * FROM patents GROUP BY patentnumber, kind, grantdate, type, applicationnumber, filingdate, prioritydate, prioritycountry, prioritypatentnumber, ussubclass, maingroup, subgroup, cpcsubclass, cpcmaingroup, cpcsubgroup, classificationnationalcountry, classificationnationalclass, title, numberofclaims, primaryexaminerfirstname, primaryexaminerlastname, primaryexaminerdepartment, pctpatentnumber, filename, claims, apptype, appnum, gyear, appdate, appyear HAVING COUNT(*) > 1;

SELECT patentnumber, count(*) FROM patents GROUP BY patentnumber HAVING count(*)>1; --7640598


SELECT * FROM patents op WHERE op.patentnumber IN ( SELECT ip.patentnumber FROM patents ip GROUP BY ip.patentnumber HAVING COUNT(*)>1 ) ORDER BY op.patentnumber;

( SELECT * INTO patentsCleaned FROM patents op WHERE op.patentnumber IN ( SELECT ip.patentnumber FROM patents ip GROUP BY ip.patentnumber HAVING COUNT(*)=1 ) ORDER BY op.patentnumber ) --SELECT 5191306


INSERT INTO patentsCleaned( SELECT * FROM patents op WHERE op.patentnumber IN ( SELECT ip.patentnumber FROM patents ip GROUP BY ip.patentnumber HAVING COUNT(*)>1 ) AND op.applicationnumber NOT LIKE 'NULL' ORDER BY op.patentnumber );

--219845

TESTING: allpatent=# select count(*) from patentsCleaned;

 count

5411151

(1 row)

allpatent=# select count(*), patentnumber FROM patentsCleaned group by patentnumber having count(*) > 1;

count | patentnumber

+--------------

(0 rows)


                    • INDEX CREATION **************

ALTER TABLE patents ADD PRIMARY KEY (patentnumber); -- RESULT : ALTER TABLE allpatent=# CREATE UNIQUE INDEX patent_idx ON patents (patentnumber);

allpatent=# CREATE INDEX ON assignees (orgname); CREATE INDEX

Citations

To Extract Patents with Numbers Only and to Ignore Other RegExes CREATE OR REPLACE FUNCTION cleanpatno (text) RETURNS bigint AS $$ if ($_[0]) { my $var=$_[0]; if ($var=~/^\d*$/) {return $var;} return undef; } return undef; $$ LANGUAGE plperl;

patentdata:

Column    |       Type        | Modifiers

+-------------------+-----------

patent      | integer           |
cit_date    | date              |
cit_name    | character varying |
cit_kind    | character varying |
cit_country | character varying |
citation    | integer           |
category    | character varying |
citseq      | integer           |

SELECT patent as citingpatentnumber, citation AS citedpatentnumber INTO citations_merged FROM citations; --SELECT 38452957

patent_2015:

Column        |  Type   | Modifiers

+---------+-----------

citingpatentnumber  | integer |
citingpatentcountry | text    |
citedpatentnumber   | text    |
citedpatentcountry  | text    |

SELECT CAST(citingpatentnumber AS bigint), CAST(cleanpatno( citedpatentnumber) AS bigint) as citedpatentnumber INTO citations_merged FROM citations; -- RESULT : SELECT 59227881

FINAL TABLE:

CREATE TABLE citations ( citingpatentnumber bigint, citedpatentnumber bigint );


Copy Statements:

patentdata: \COPY citations_merged TO '/tmp/merged_citations_export1.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; --COPY 38452957

patent_2015: \COPY citations_merged TO '/tmp/merged_citations_export.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; -- RESULT : COPY 59227881

allpatent: \COPY citations FROM '/tmp/merged_citations_export.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; --RESULT : COPY 59227881

\COPY citations FROM '/tmp/merged_citations_export1.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; --RESULT: COPY 38452957


CLONING: CREATE DATABASE allpatentsProcessed WITH TEMPLATE allpatent OWNER researcher;

USPTO Consolidated Patent Data

Scripts:

/* creating patent data tables from : https://bulkdata.uspto.gov/data2/patent/maintenancefee/*/

CREATE TABLE PatentMaintenanceFee( patentnumber varchar, applicationnumber int, smallentity varchar, filingdate date, grantissuedate date, maintenancefeedate date, maintenancefeecode varchar );

\COPY PatentMaintenanceFee FROM '/bulk/USPTO_Consolidated/MaintFeeEvents_20160613.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; -- RESULT : COPY 14042059

/* creating tables for historical patent data - USPTO */

CREATE TABLE HistoricalPatentData( applicationid int, pubno varchar, patentnumber varchar, NBER int, USPC varchar, USPC_sub varchar, applicationdate date, prioritydate date, pubdate date, displaydate date, disptype varchar, exp_dt date, exp_dt_max date, pta int );

\COPY historicalpatentdata FROM '/bulk/USPTO_Consolidated/HistoricalFiles/historical_masterfile.csv' DELIMITER AS ',' HEADER NULL AS CSV;

--COPY 11191813