Changes

Jump to navigation Jump to search
no edit summary
[[Category:Internal]]
[[Internal Classification::Legacy| ]]** Assignees Data:CREATE TABLE assignees_temp2 (lastname varchar,== Objective ==firstname varchar,address varchar,postcode varchar,orgname varchar, --assignee varchar,city varchar, --city2 varchar,country varchar, The McNair Center owns two sets of patent data --country2 varcharone set that is inherited from Harvard,patentnumber intthe Harvard dataverse, --patent integerwhich is stored in the database patentdata and another that is generated by crawlers pulling data from the USPTO website,which is stored in the database patent_2015.state varchar, --state2 varchar,patentcountry varchar,nationality2 varcharWe are now merging and cleaning the two data sets,residence varcharand storing them in a schema that is amalgamation of the two underlying schema for the citations tables,asgseq intassignees tables,asgtype integer);and patents tables. The destination schema is '''allpatent'''.
CREATE TABLE assignees(lastname varchar,firstname varchar,address varchar,postcode varchar,orgname varchar,city varchar, country varchar, patentnumber int,state varchar, patentcountry varchar,nationality2 varchar,residence varchar,asgseq int,asgtype integer);== Assignees Data==
The schema for the assignees table in '''patentdata''' database is:
PatentData: Column | Type | Modifiers -------------+-------------------+-----------
patent | integer |
asgtype | integer |
asgseq | integer |
(The schema for the assignees table in patent_2015 is :  Column | Type | Modifiers---------------+---------+----------- lastname | text | firstname | text | orgname | text | city | text | country | text | patentcountry | text | patentnumber | integer | state | text | address | text | patent intpostcode | text | To merge both schemas, we have some columns that overlap,and some columns that don't. '''Overlapping Columns'''  asgtype int,patent_2015 | patentdata--------------+-------------- orgname | assignee varchar , city varchar , | city country | country patentnumber | patent state varchar | state These columns will have entries for most rows in the table,because they exist in both tables. The rest of the columns will be populated based on which table the row is coming from. '''Final Schema''' Table "public.assignees" Column | Type | Modifiers---------------+-------------------+----------- lastname | character varying | firstname | character varying | address | character varying | postcode | character varying | orgname | character varying | city | character varying | country varchar , | character varying | patentnumber | integer | state | character varying | patentcountry | character varying | nationality varchar ,nationality2 | character varying | residence varchar , | character varying | asgseq | integer | asgtype | integer | '''Non-overlapping Columns'''These are the columns that belong to either one of the assignees tables, and not to both. For these cases, to help users understand where the row is coming from, the following insert rules have been followed: *For columns of type int , insert -1*For columns of type string (character varying), the string 'null' has been inserted. Therefore, if a row has appropriate values for orgname, state, city ,etc, but 'null' values for lastname, firstname, address and postcode, the row has come from the patentdata table. ====Sample insert and copy commands ====
INSERT INTO assignees_merge
FROM assignees a
);
 
/// Reference:
CREATE TABLE assignees_temp2 (
 
lastname varchar,
firstname varchar,
address varchar,
postcode varchar,
orgname varchar, --assignee varchar,
city varchar, --city2 varchar,
country varchar, --country2 varchar,
patentnumber int, --patent integer,
state varchar, --state2 varchar,
patentcountry varchar,
nationality2 varchar,
residence varchar,
asgseq int,
asgtype integer
 
);
//////
 
 
 
Patent_2015:
Column | Type | Modifiers
---------------+---------+-----------
lastname | text |
firstname | text |
orgname | text |
city | text |
country | text |
patentcountry | text |
patentnumber | integer |
state | text |
address | text |
postcode | text |
 
 
(
lastname varchar,
firstname varchar,
orgname varchar,
city varchar,
country varchar,
patentcountry varchar,
patentnumber int,
state varchar,
address varchar,
postcode varchar
)
INSERT INTO assignees_merge
 
 
SELECT count(*), *
from assignees
GROUP BY lastname, firstname, address, postcode, orgname, city, country, patentnumber, state, patentcountry, nationality2, residence, asgseq, asgtype
HAVING COUNT(*) > 1
;
 
CREATE TABLE Assignees2 AS
SELECT *
from assignees
GROUP BY lastname, firstname, address, postcode, orgname, city, country, patentnumber, state, patentcountry, nationality2, residence, asgseq, asgtype
HAVING COUNT(*) = 1
;
 
DROP TABLE Assignees;
ALTER TABLE Assignees2 RENAME TO Assignees;
--

Navigation menu