Bulk Patent Assignee Processing

From edegan.com

USPTO Assignees Data

We would like to download data from this location on the USPTO website and load it into our tables. The objective is to determine whether this dataset is better than the current version of our patent data (a combination of the data in the patent_2015 and patentdata databases).

Steps Followed to Extract the Data

Extracting Data from XML Files

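All the historical USPTO assignment data is available as XML files. Here is the tree structure for the XML files: the root us-patent-assignments element holds a patent-assignments element, which contains either a data-available-code or one or more patent-assignment nodes. Each patent-assignment has four internal nodes: assignment-record, patent-assignors, patent-assignees and patent-properties.
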
Each of the above internal nodes is mandatory and groups logically related information fields. Each node has a corresponding table whose columns largely mirror the node's XML elements.

The corresponding tables (XML node : table) are:

  • assignment-record : assignment
  • patent-assignors : assignors
  • patent-assignees : assignees
  • patent-properties : properties
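The following is a minimal sketch of the node-to-table split. It assumes Python's standard xml.etree.ElementTree and uses a small made-up sample file with invented values. Most assignment-record fields are omitted for brevity. The sketch groups the four mandatory child nodes of every patent-assignment under the table names listed above and raises an error if one is missing. The names group_by_table, NODE_TO_TABLE and SAMPLE are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# XML node -> table name, following the list above.
NODE_TO_TABLE = {
    "assignment-record": "assignment",
    "patent-assignors": "assignors",
    "patent-assignees": "assignees",
    "patent-properties": "properties",
}

# Invented, heavily trimmed sample; real files carry many more fields.
SAMPLE = """<us-patent-assignments>
  <action-key-code>XX</action-key-code>
  <transaction-date><date>20150102</date></transaction-date>
  <patent-assignments>
    <patent-assignment>
      <assignment-record><reel-no>012345</reel-no><frame-no>0678</frame-no></assignment-record>
      <patent-assignors><patent-assignor><name>DOE, JANE</name></patent-assignor></patent-assignors>
      <patent-assignees><patent-assignee><name>EXAMPLE CORP</name></patent-assignee></patent-assignees>
      <patent-properties><patent-property><invention-title lang="en">WIDGET</invention-title></patent-property></patent-properties>
    </patent-assignment>
  </patent-assignments>
</us-patent-assignments>"""


def group_by_table(xml_text: str) -> dict:
    """Return {table name: [XML nodes]} for every patent-assignment in the file."""
    root = ET.fromstring(xml_text)
    tables = defaultdict(list)
    for assignment in root.iter("patent-assignment"):
        for node_name, table in NODE_TO_TABLE.items():
            node = assignment.find(node_name)
            if node is None:  # the DTD makes all four nodes mandatory
                raise ValueError(f"patent-assignment is missing <{node_name}>")
            tables[table].append(node)
    return dict(tables)


if __name__ == "__main__":
    for table, nodes in group_by_table(SAMPLE).items():
        print(table, len(nodes))
```

Raising on a missing node, rather than silently skipping it, surfaces malformed files early instead of leaving orphan rows in the other tables.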

Additionally, each downloaded file comes with some file-level metadata (its specs), all of which is stored in the PatentAssignment table. Here is the data model diagram.
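The file-level fields can be read directly from the root element. The sketch below is only an assumption about what a PatentAssignment row might hold. It uses only what the DTD defines:

  • the dtd-version and date-produced attributes
  • action-key-code
  • transaction-date
  • the data-available-code that the DTD allows in place of assignments

The column names, the file_metadata function and the sample values are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample of a file that carries no assignments, only a data-available-code.
EMPTY_FILE = """<us-patent-assignments dtd-version="0.1" date-produced="20150102">
  <action-key-code>XX</action-key-code>
  <transaction-date><date>20150102</date></transaction-date>
  <patent-assignments><data-available-code>N</data-available-code></patent-assignments>
</us-patent-assignments>"""


def file_metadata(xml_text: str, filename: str) -> dict:
    """Build one hypothetical PatentAssignment row describing a downloaded file."""
    root = ET.fromstring(xml_text)
    payload = root.find("patent-assignments")
    assignments = payload.findall("patent-assignment") if payload is not None else []
    return {
        "filename": filename,
        "dtd_version": root.get("dtd-version"),
        "date_produced": root.get("date-produced"),
        "action_key_code": root.findtext("action-key-code"),
        "transaction_date": root.findtext("transaction-date/date"),
        # Per the DTD, patent-assignments holds EITHER this code OR the assignments.
        "data_available_code": payload.findtext("data-available-code") if payload is not None else None,
        "assignment_count": len(assignments),
    }


if __name__ == "__main__":
    print(file_metadata(EMPTY_FILE, "sample.xml"))
```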

Assignment Records

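The fields in the assignment record, as defined in the DTD below, are:

  • reel-no
  • frame-no
  • last-update-date
  • purge-indicator
  • recorded-date
  • page-count (optional)
  • correspondent (a name and up to four optional address lines)
  • conveyance-text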
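The following is a minimal sketch of flattening one assignment-record node into a row. It assumes ElementTree and snake_case column names; the column names, the assignment_row function and the sample values are hypothetical. Date fields wrap their value in a nested date element, and the optional correspondent address lines are joined into one column. reel-no and frame-no are kept as text in case they carry leading zeros. We also assume the reel/frame pair can serve as the key that links rows in the other tables to their assignment.

```python
import xml.etree.ElementTree as ET

# assignment-record children that hold plain #PCDATA (page-count is optional).
TEXT_FIELDS = ["reel-no", "frame-no", "purge-indicator", "page-count", "conveyance-text"]
# Children whose value sits inside a nested <date> element.
DATE_FIELDS = ["last-update-date", "recorded-date"]

# Invented sample record; page-count is deliberately left out.
RECORD = """<assignment-record>
  <reel-no>012345</reel-no>
  <frame-no>0678</frame-no>
  <last-update-date><date>20150105</date></last-update-date>
  <purge-indicator>N</purge-indicator>
  <recorded-date><date>20150102</date></recorded-date>
  <correspondent>
    <name>EXAMPLE LAW LLP</name>
    <address-1>1 MAIN ST</address-1>
    <address-2>HOUSTON, TX 77002</address-2>
  </correspondent>
  <conveyance-text>ASSIGNMENT OF ASSIGNORS INTEREST</conveyance-text>
</assignment-record>"""


def assignment_row(record: ET.Element) -> dict:
    """Flatten one assignment-record node into a dict keyed by snake_case column names."""
    # Missing optional elements come back as None from findtext().
    row = {f.replace("-", "_"): record.findtext(f) for f in TEXT_FIELDS}
    for f in DATE_FIELDS:
        row[f.replace("-", "_")] = record.findtext(f + "/date")
    row["correspondent_name"] = record.findtext("correspondent/name")
    # address-1 .. address-4 are optional; join the ones that are present.
    lines = [record.findtext(f"correspondent/address-{i}") for i in range(1, 5)]
    row["correspondent_address"] = ", ".join(line for line in lines if line)
    return row


if __name__ == "__main__":
    print(assignment_row(ET.fromstring(RECORD)))
```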

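Assignors

Each patent-assignor has a name (optionally typed as natural or legal), an optional execution-date and an optional date-acknowledged.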
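A patent-assignment can list several assignors, so the assignors table gets one row per patent-assignor. The following is a minimal sketch. It assumes ElementTree and that each row carries the parent assignment's reel/frame pair as a foreign key. The column names, the assignor_rows function and the trimmed sample are hypothetical. The optional name-type attribute (natural or legal) is kept because it separates individuals from organizations.

```python
import xml.etree.ElementTree as ET

# Invented, trimmed sample: one assignment with two assignors.
ASSIGNMENT = """<patent-assignment>
  <patent-assignors>
    <patent-assignor>
      <name name-type="natural">DOE, JANE</name>
      <execution-date><date>20141215</date></execution-date>
    </patent-assignor>
    <patent-assignor>
      <name name-type="legal">OLD HOLDINGS INC</name>
    </patent-assignor>
  </patent-assignors>
</patent-assignment>"""


def assignor_rows(assignment: ET.Element, reel_no: str, frame_no: str) -> list:
    """One row per patent-assignor, linked to its assignment by reel/frame."""
    rows = []
    for assignor in assignment.iterfind("patent-assignors/patent-assignor"):
        name = assignor.find("name")  # mandatory in the DTD, but guard anyway
        rows.append({
            "reel_no": reel_no,
            "frame_no": frame_no,
            "name": name.text if name is not None else None,
            "name_type": name.get("name-type") if name is not None else None,
            "execution_date": assignor.findtext("execution-date/date"),
            "date_acknowledged": assignor.findtext("date-acknowledged/date"),
        })
    return rows


if __name__ == "__main__":
    for r in assignor_rows(ET.fromstring(ASSIGNMENT), "012345", "0678"):
        print(r)
```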


Here is the DTD specified by the USPTO:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE us-patent-assignments [
  <!ELEMENT us-patent-assignments (action-key-code, transaction-date, patent-assignments)>
  <!ATTLIST us-patent-assignments dtd-version CDATA #IMPLIED date-produced CDATA #IMPLIED>
  <!ELEMENT action-key-code (#PCDATA)>
  <!ELEMENT transaction-date (date)>
  <!ELEMENT patent-assignments (data-available-code | patent-assignment+)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT data-available-code (#PCDATA)>
  <!ELEMENT patent-assignment (assignment-record, patent-assignors, patent-assignees, patent-properties)>
  <!ELEMENT assignment-record (reel-no, frame-no, last-update-date, purge-indicator, recorded-date, page-count?, correspondent, conveyance-text)>
  <!ELEMENT patent-assignors (patent-assignor+)>
  <!ELEMENT patent-assignees (patent-assignee+)>
  <!ELEMENT patent-properties (patent-property+)>
  <!ELEMENT reel-no (#PCDATA)>
  <!ELEMENT frame-no (#PCDATA)>
  <!ELEMENT last-update-date (date)>
  <!ELEMENT purge-indicator (#PCDATA)>
  <!ELEMENT recorded-date (date)>
  <!ELEMENT page-count (#PCDATA)>
  <!ELEMENT correspondent (name, address-1?, address-2?, address-3?, address-4?)>
  <!ELEMENT conveyance-text (#PCDATA)>
  <!ELEMENT patent-assignor (name, execution-date?, date-acknowledged?)>
  <!ELEMENT patent-assignee (name, address-1?, address-2?, city?, state?, country-name?, postcode?)>
  <!ELEMENT patent-property (document-id*, invention-title?)>
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST name name-type (natural | legal) #IMPLIED>
  <!ELEMENT address-1 (#PCDATA)>
  <!ELEMENT address-2 (#PCDATA)>
  <!ELEMENT address-3 (#PCDATA)>
  <!ELEMENT address-4 (#PCDATA)>
  <!ELEMENT execution-date (date)>
  <!ELEMENT date-acknowledged (date)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT country-name (#PCDATA)>
  <!ELEMENT postcode (#PCDATA)>
  <!ELEMENT document-id (country, doc-number, kind?, name?, date?)>
  <!ELEMENT invention-title (#PCDATA | b | i | u | sup | sub)*>
  <!ATTLIST invention-title id ID #IMPLIED lang CDATA #REQUIRED>
  <!ELEMENT country (#PCDATA)>
  <!ELEMENT doc-number (#PCDATA)>
  <!ELEMENT kind (#PCDATA)>
  <!ELEMENT b (#PCDATA | i | u | smallcaps)*>
  <!ELEMENT i (#PCDATA | b | u | smallcaps)*>
  <!ELEMENT u (#PCDATA | b | i | smallcaps)*>
  <!ATTLIST u style (single | double | dash | dots ) 'single' >
  <!ELEMENT sup (#PCDATA | b | u | i)*>
  <!ELEMENT sub (#PCDATA | b | u | i)*>
  <!ELEMENT smallcaps (#PCDATA | b | u | i)*>
]>
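Two parts of this DTD need care when loading patent-property nodes:

  • a property may hold zero or more document-id elements
  • invention-title is mixed content that can embed b, i, u, sup and sub markup

The following is a minimal sketch that writes one properties row per document-id. It assumes ElementTree; the column names, the property_rows function and the sample values are hypothetical. The sketch uses ElementTree's itertext() to flatten the title markup to plain text. A property without any document-id still yields one row, so its title is not lost.

```python
import xml.etree.ElementTree as ET

# Invented, trimmed sample: one property with two document-ids and a title
# containing <sub> markup, plus one title-only property.
ASSIGNMENT = """<patent-assignment>
  <patent-properties>
    <patent-property>
      <document-id><country>US</country><doc-number>12345678</doc-number><date>20140101</date></document-id>
      <document-id><country>US</country><doc-number>8765432</doc-number></document-id>
      <invention-title lang="en">H<sub>2</sub>O FILTER</invention-title>
    </patent-property>
    <patent-property>
      <invention-title lang="en">WIDGET</invention-title>
    </patent-property>
  </patent-properties>
</patent-assignment>"""


def property_rows(assignment: ET.Element, reel_no: str, frame_no: str) -> list:
    """One row per document-id (or one per title-only property), keyed by reel/frame."""
    rows = []
    props = assignment.iterfind("patent-properties/patent-property")
    for seq, prop in enumerate(props, start=1):
        title_el = prop.find("invention-title")
        # Mixed content: itertext() concatenates the text inside b/i/u/sup/sub tags.
        title = "".join(title_el.itertext()).strip() if title_el is not None else None
        # document-id* may be empty; an empty placeholder element makes every
        # findtext() below return None, so the property still produces a row.
        doc_ids = prop.findall("document-id") or [ET.Element("document-id")]
        for doc in doc_ids:
            rows.append({
                "reel_no": reel_no,
                "frame_no": frame_no,
                "property_seq": seq,
                "country": doc.findtext("country"),
                "doc_number": doc.findtext("doc-number"),
                "kind": doc.findtext("kind"),
                "doc_date": doc.findtext("date"),
                "title": title,
            })
    return rows


if __name__ == "__main__":
    for r in property_rows(ET.fromstring(ASSIGNMENT), "012345", "0678"):
        print(r)
```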

Inserting Extracted Data into Tables
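The loading step can be sketched with Python's built-in sqlite3 module, which stands in here for the project's actual database. The table layout, the insert_rows helper and the sample rows are hypothetical. Each batch is inserted with executemany inside a transaction, so a failure partway through a batch leaves none of that batch behind. Table and column names are formatted into the SQL string, so they must come from our own code and never from the XML. Values are always passed as bound parameters.

```python
import sqlite3


def insert_rows(conn: sqlite3.Connection, table: str, rows: list) -> int:
    """Insert a list of same-shaped dicts into `table`; returns the row count."""
    if not rows:
        return 0
    cols = list(rows[0])
    # Identifiers come from our own code (trusted); values are bound as parameters.
    sql = 'INSERT INTO "{}" ({}) VALUES ({})'.format(
        table, ", ".join(cols), ", ".join("?" for _ in cols))
    with conn:  # commits on success, rolls back the whole batch on any error
        conn.executemany(sql, [tuple(r[c] for c in cols) for r in rows])
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE assignment (reel_no TEXT, frame_no TEXT, "
                 "recorded_date TEXT, PRIMARY KEY (reel_no, frame_no))")
    rows = [
        {"reel_no": "012345", "frame_no": "0678", "recorded_date": "20150102"},
        {"reel_no": "012345", "frame_no": "0679", "recorded_date": "20150102"},
    ]
    insert_rows(conn, "assignment", rows)
    print(conn.execute("SELECT COUNT(*) FROM assignment").fetchone()[0])
```

The reel/frame primary key in this example also makes an accidental reload of the same file fail loudly, rather than duplicating assignments.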

Clean Up