Data Model (Deprecated)

From edegan.com
Revision as of 09:54, 24 May 2017 by OliverC (talk | contribs) (OliverC moved page Data Model to Data Model (Deprecated))

Return to Patent Data (Wiki Page).

The USPTO and Harvard Dataverse data have been combined into a single database spanning 1901 to 2016. The image below summarizes the data model that the database was built to follow.

[Image: Patent Data.png (data model diagram)]

XML Schema

Tags we are using:


Tags we aren't using:

Fields of Interest

To satisfy the data model, the following fields were of particular interest when extracting data from the XML files and loading it into tables.

  • type
  • applicationnumber
  • filingdate

For priority, if there is more than one priority claim, we want the one with sequence 01:

  • prioritydate
  • prioritycountry (should use ISO country codes - may need a lookup table)
  • prioritypatentnumber
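
The priority rule above can be sketched as follows. This is only an illustration: the record shape (dicts with `sequence`, `country`, `doc_number` and `date` keys), the function names, and the three-entry lookup table are assumptions, not the layout used by the actual extraction scripts.

```python
from typing import Dict, List, Optional

# Hypothetical lookup table for free-text country names; a real table
# would cover the full ISO 3166-1 alpha-2 list.
ISO_COUNTRY_CODES = {"UNITED STATES": "US", "JAPAN": "JP", "GERMANY": "DE"}


def normalize_country(raw: str) -> Optional[str]:
    """Map a country value to a two-letter code, or None if unknown."""
    value = raw.strip().upper()
    if len(value) == 2 and value.isalpha():
        return value  # already looks like a two-letter code; not validated
    return ISO_COUNTRY_CODES.get(value)


def primary_priority(claims: List[Dict[str, str]]) -> Optional[Dict[str, Optional[str]]]:
    """Return the priority fields for the claim we want to keep.

    With a single claim we keep it; with several we keep sequence "01".
    """
    if not claims:
        return None
    if len(claims) == 1:
        chosen = claims[0]
    else:
        chosen = next((c for c in claims if c.get("sequence") == "01"), None)
        if chosen is None:
            return None
    return {
        "prioritydate": chosen.get("date"),
        "prioritycountry": normalize_country(chosen.get("country", "")),
        "prioritypatentnumber": chosen.get("doc_number"),
    }
```

Returning `None` for an unknown country, rather than passing the raw string through, makes gaps in the lookup table easy to find later.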

Classification IPC

Classification CPC - we only need the main one

CPC (Cooperative Patent Classification) is a classification scheme set up jointly by the USPTO and the European Patent Office (EPO). The first classification codes rolled out on November 9, 2012.[1] Full implementation of the CPC system occurred in January 2015, coinciding with version 4.5 of the USPTO patent bulk data.[2]

  • Section, Class, Subclass
  • Main Group, Subgroup
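
For reference, a CPC symbol written in its usual display form (for example "H04L 9/32") breaks down into the five parts listed above. The sketch below assumes that display form; the regular expression, the `CpcSymbol` type and the `parse_cpc` function are illustrative names, and the XML files may already provide these parts as separate elements.

```python
import re
from typing import NamedTuple, Optional


class CpcSymbol(NamedTuple):
    section: str      # one letter, A-H or Y
    cpc_class: str    # two digits ("class" is a Python keyword)
    subclass: str     # one letter
    main_group: str   # one to four digits
    subgroup: str     # digits after the slash


# Assumed display form: section, class, subclass, optional space,
# main group, "/", subgroup.
_CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d+)$")


def parse_cpc(symbol: str) -> Optional[CpcSymbol]:
    """Split a CPC symbol into its parts, or return None if it does not match."""
    match = _CPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        return None
    return CpcSymbol(*match.groups())
```

For example, `parse_cpc("H04L 9/32")` yields section `H`, class `04`, subclass `L`, main group `9` and subgroup `32`.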

Classification National: Note that the one below comes out to 2/2.11 (http://www.google.com/patents/US8925112#classifications)

  • Country
  • Class

Title of the patent

Number of Claims

Primary examiner:

  • FirstName, LastName, Department

PCT/Regional Patent Number:

Patent Citations (we need all of them):

  • CitingPatentNumber (from the patent)
  • CitingPatentCountry (from the patent)
  • CitedPatentNumber
  • CitedPatentCountry
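
One way to lay out the citation table is a row per (citing, cited) pair. The sketch below assumes each cited reference arrives as a dict with `number` and `country` keys; that shape and the function name are hypothetical, and the cited numbers in the usage note are made up.

```python
from typing import Dict, Iterable, List, Tuple

# (CitingPatentNumber, CitingPatentCountry, CitedPatentNumber, CitedPatentCountry)
CitationRow = Tuple[str, str, str, str]


def citation_rows(
    citing_number: str,
    citing_country: str,
    cited: Iterable[Dict[str, str]],
) -> List[CitationRow]:
    """Build one row per cited patent, repeating the citing patent's identity."""
    return [
        (citing_number, citing_country, ref["number"], ref["country"])
        for ref in cited
    ]
```

Calling `citation_rows("8925112", "US", [{"number": "5000000", "country": "US"}, {"number": "1234567", "country": "EP"}])` gives two rows, both starting with `("8925112", "US", ...)`.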

For non-patent references, we are just going to count them:

  • NoNonPatRefs

Inventors

  • PatentNumber (and country) to build a key
  • We need a standard name and address object for each inventor

Assignees

  • PatentNumber (and country) to build a key
  • We need a "standard" name and address object for each assignee
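
A possible shape for the key and the shared name and address object is sketched below. The field names (`name`, `city`, `state`, `country`), the `PartyRecord` wrapper and the key format (country code followed by patent number) are all assumptions made for illustration; the page does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameAddress:
    """Hypothetical "standard" name and address object for inventors and assignees."""
    name: str
    city: Optional[str] = None
    state: Optional[str] = None
    country: Optional[str] = None  # intended to hold an ISO country code


def patent_key(patent_number: str, country: str) -> str:
    """Combine country and patent number into one key, e.g. "US8925112"."""
    return f"{country.strip().upper()}{patent_number.strip()}"


@dataclass(frozen=True)
class PartyRecord:
    """Links one inventor or assignee to a patent."""
    key: str            # output of patent_key()
    role: str           # "inventor" or "assignee"
    party: NameAddress
```

Using the same `NameAddress` type for both roles keeps the two tables structurally identical, so they can share cleaning and matching code.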


For further information on Assignee data from the USPTO, see USPTO Assignees Data.

Fields with Potential

  • Abstract
  • Claims (other than their count)