===Utility patent grants fields===

====Patent====

*patent number
*kind: http://www.uspto.gov/patents-application-process/patent-search/authority-files/uspto-kind-codes
*grantdate

For version 4.5:
<publication-reference>
<document-id>
<country>US</country>
<doc-number>08925112</doc-number>
<kind>B2</kind>
<date>20150106</date>
</document-id>
</publication-reference>

*type
*applicationnumber
*filingdate
<application-reference appl-type="utility">
<document-id>
<country>US</country>
<doc-number>13824291</doc-number>
<date>20110929</date>
</document-id>
</application-reference>
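The publication and application references share the same document-id layout, so one helper can read both. The following is a minimal Python sketch, not the Perl scripts described later on this page; the `<biblio>` wrapper element, the function name and the dictionary keys are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Sample built from the v 4.5 snippets above; <biblio> is a hypothetical wrapper.
SAMPLE = """<biblio>
<publication-reference>
<document-id>
<country>US</country>
<doc-number>08925112</doc-number>
<kind>B2</kind>
<date>20150106</date>
</document-id>
</publication-reference>
<application-reference appl-type="utility">
<document-id>
<country>US</country>
<doc-number>13824291</doc-number>
<date>20110929</date>
</document-id>
</application-reference>
</biblio>"""


def read_document_id(reference):
    """Read country, number, kind and date from a *-reference element."""
    doc = reference.find("document-id")
    return {
        "country": doc.findtext("country"),
        "number": doc.findtext("doc-number"),
        "kind": doc.findtext("kind"),  # None when <kind> is absent
        "date": datetime.strptime(doc.findtext("date"), "%Y%m%d").date(),
    }


root = ET.fromstring(SAMPLE)
publication = read_document_id(root.find("publication-reference"))
application_node = root.find("application-reference")
application = read_document_id(application_node)

patent = {
    "patentnumber": publication["number"],
    "kind": publication["kind"],
    "grantdate": publication["date"],
    "type": application_node.get("appl-type"),
    "applicationnumber": application["number"],
    "filingdate": application["date"],
}
```

Strict date parsing would reject a value such as 19140700, which appears in the citation sample further down, so cited dates are probably better kept as strings.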

For priority claims, if there is more than one, we want the one with sequence="01":
*prioritydate
*prioritycountry (should use ISO country codes - may need a lookup table)
*prioritypatentnumber
*'''find 4.3 file with priority claim'''

<priority-claims>
<priority-claim sequence="01" kind="national">
<country>GB</country>
<doc-number>1016384.8</doc-number>
<date>20100930</date>
</priority-claim>
</priority-claims>
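Picking the sequence "01" claim could look like the Python sketch below. The second claim in the sample is invented purely to exercise the selection, and the function name and returned keys are assumptions that mirror the field list above.

```python
import xml.etree.ElementTree as ET

# First claim is the sample above; the second one is made up for illustration.
SAMPLE = """<priority-claims>
<priority-claim sequence="02" kind="national">
<country>GB</country>
<doc-number>0000000.0</doc-number>
<date>20110104</date>
</priority-claim>
<priority-claim sequence="01" kind="national">
<country>GB</country>
<doc-number>1016384.8</doc-number>
<date>20100930</date>
</priority-claim>
</priority-claims>"""


def first_priority(claims):
    """Return the sequence="01" claim as a dict, or None if there is none."""
    if claims is None:  # patent without a <priority-claims> element
        return None
    claim = claims.find("priority-claim[@sequence='01']")
    if claim is None:
        return None
    return {
        "prioritycountry": claim.findtext("country"),
        "prioritypatentnumber": claim.findtext("doc-number"),
        "prioritydate": claim.findtext("date"),
    }


priority = first_priority(ET.fromstring(SAMPLE))
```

Selecting on the attribute rather than on position means the result does not depend on the order in which the claims appear in the file.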

Classification IPC - we only need the first one: http://www.wipo.int/export/sites/www/classifications/ipc/en/guide/guide_ipc.pdf
*Section, Class, SubClass - Together these concord to US subclass: http://www.uspto.gov/web/patents/classification/international/ipc/ipc8/ipc_concordance/ipcsel.htm#a
*MainGroup, SubGroup

<classifications-ipcr>
<classification-ipcr>
<ipc-version-indicator>
<date>20060101</date>
</ipc-version-indicator>
<classification-level>A</classification-level>
<section>B</section>
<class>64</class>
<subclass>G</subclass>
<main-group>6</main-group>
<subgroup>00</subgroup>
<symbol-position>F</symbol-position>
<classification-value>I</classification-value>
...
</classification-ipcr>
...
</classifications-ipcr>

Classification CPC - we only need the main one

CPC is a classification scheme set up by the USPTO and the European Patent Office (EPO). The first classification codes rolled out on November 9, 2012.[http://www.cooperativepatentclassification.org/cpcSchemeAndDefinitions.html] Full implementation of the CPC classification system occurred in January 2015, at the same time as version 4.5 of the USPTO patent bulk data.[http://www.uspto.gov/sites/default/files/about/advisory/ppac/120927-09a-international_cpc.pdf]

*Section, Class, Subclass
*Main Group, Subgroup
*'''v 4.2, 4.3, 4.4 do not have this'''

<classifications-cpc>
<main-cpc>
<classification-cpc>
<cpc-version-indicator>
<date>20130101</date>
</cpc-version-indicator>
<section>B</section>
<class>64</class>
<subclass>D</subclass>
<main-group>10</main-group>
<subgroup>00</subgroup>
<symbol-position>F</symbol-position>
<classification-value>I</classification-value>
...
</classification-cpc>
</main-cpc>
</classifications-cpc>
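Since the IPC and CPC entries carry the same five fields, a single reader can serve both. This is a hedged Python sketch: the `<biblio>` wrapper, the helper names and the symbol formatting are assumptions, and the sample keeps only the wanted fields from the snippets above.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the IPC and CPC samples; <biblio> is a hypothetical wrapper.
SAMPLE = """<biblio>
<classifications-ipcr>
<classification-ipcr>
<section>B</section>
<class>64</class>
<subclass>G</subclass>
<main-group>6</main-group>
<subgroup>00</subgroup>
</classification-ipcr>
</classifications-ipcr>
<classifications-cpc>
<main-cpc>
<classification-cpc>
<section>B</section>
<class>64</class>
<subclass>D</subclass>
<main-group>10</main-group>
<subgroup>00</subgroup>
</classification-cpc>
</main-cpc>
</classifications-cpc>
</biblio>"""

FIELDS = ("section", "class", "subclass", "main-group", "subgroup")


def read_classification(node):
    """Return the five wanted fields, or None when the element is missing."""
    if node is None:
        return None
    return {name: node.findtext(name) for name in FIELDS}


def symbol(fields):
    """Join the fields into a compact symbol such as 'B64G 6/00'."""
    return "{}{}{} {}/{}".format(*(fields[name] for name in FIELDS))


root = ET.fromstring(SAMPLE)
# find() returns the first match in document order, i.e. the first IPC entry.
ipc = read_classification(root.find("classifications-ipcr/classification-ipcr"))
# Only the main CPC entry is wanted; files without a CPC block yield None.
cpc = read_classification(root.find("classifications-cpc/main-cpc/classification-cpc"))
```

Returning None for a missing element lets the same code run over the versions that have no CPC block.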

Classification National: Note that the one below comes out to 2/2.11 (http://www.google.com/patents/US8925112#classifications)
*Country
*Class

'''THIS IS NOT UNIQUE. What classifications are we searching for?'''
<classification-national>
<country>US</country>
<main-classification>2 211</main-classification>
</classification-national>

Title of the patent:
<invention-title id="d2e61">Aircrew ensembles</invention-title>

Number of Claims:
<number-of-claims>12</number-of-claims>

Primary examiner:
*FirstName, LastName, Department

<examiners>
<primary-examiner>
<last-name>Patel</last-name>
<first-name>Tejash</first-name>
<department>3765</department>
</primary-examiner>
...
</examiners>

PCT/Regional Patent Number:
*PCTNumber (just the doc number - if it starts with PCT set a flag)
*'''not in all v 4.5'''
*'''not in v 4.2, 4.3, 4.4'''
*'''not all patents may be filed under PCT; we need code that searches all files for this element'''

<pct-or-regional-filing-data>
<document-id>
<country>WO</country>
<doc-number>PCT/EP2011/067014</doc-number>
<kind>00</kind>
<date>20110929</date>
</document-id>
...
</pct-or-regional-filing-data>
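Because the element is optional, the reader has to tolerate its absence. A small Python sketch of the doc-number extraction and the "starts with PCT" flag follows; the `<biblio>` wrapper and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample from above, wrapped in a hypothetical <biblio> element.
SAMPLE = """<biblio>
<pct-or-regional-filing-data>
<document-id>
<country>WO</country>
<doc-number>PCT/EP2011/067014</doc-number>
<kind>00</kind>
<date>20110929</date>
</document-id>
</pct-or-regional-filing-data>
</biblio>"""


def read_pct_number(root):
    """Return (doc_number, is_pct); (None, False) when the element is absent."""
    doc = root.find("pct-or-regional-filing-data/document-id")
    if doc is None:
        return None, False
    number = (doc.findtext("doc-number") or "").strip()
    return number, number.startswith("PCT")


pct_number, is_pct = read_pct_number(ET.fromstring(SAMPLE))
```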

====Citations====

Patent Citations (we need all of them):
*CitingPatentNumber (from the patent)
*CitingPatentCountry (from the patent)

<publication-reference>
<document-id>
<country>US</country>
<doc-number>08925112</doc-number>
<kind>B2</kind>
<date>20150106</date>
</document-id>
</publication-reference>

*CitedPatentNumber
*CitedPatentCountry
*'''v 4.2 does not have <us-references-cited>'''

<us-references-cited>
<us-citation>
<patcit num="00001">
<document-id>
<country>US</country>
<doc-number>1105569</doc-number>
<kind>A</kind>
<name>Lacrotte</name>
<date>19140700</date>
</document-id>
</patcit>
<category>cited by examiner</category>
<classification-national>
<country>US</country>
<main-classification>2 214</main-classification>
</classification-national>
</us-citation>
...
</us-references-cited>

For non-patent references, we are just going to count them:
*NoNonPatRefs

<us-references-cited>
...
<us-citation>
<nplcit num="00020">
<othercit>
European Search Report dated Jan. 20, 2011 as received in European Patent Application No. GB1016384.8.
</othercit>
</nplcit>
<category>cited by applicant</category>
</us-citation>
</us-references-cited>
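Both requirements, listing every cited patent and counting the non-patent references, can be met in one pass over the citations. The Python sketch below assumes the v 4.3+ layout shown above; the function name and the tuple layout are illustrative choices.

```python
import xml.etree.ElementTree as ET

# The two sample citations from above, merged into one element.
SAMPLE = """<us-references-cited>
<us-citation>
<patcit num="00001">
<document-id>
<country>US</country>
<doc-number>1105569</doc-number>
<kind>A</kind>
<name>Lacrotte</name>
<date>19140700</date>
</document-id>
</patcit>
<category>cited by examiner</category>
</us-citation>
<us-citation>
<nplcit num="00020">
<othercit>European Search Report dated Jan. 20, 2011</othercit>
</nplcit>
<category>cited by applicant</category>
</us-citation>
</us-references-cited>"""


def read_citations(references):
    """Return ([(country, number), ...], count of non-patent references)."""
    cited = []
    non_patent = 0
    for citation in references.findall("us-citation"):
        doc = citation.find("patcit/document-id")
        if doc is not None:
            cited.append((doc.findtext("country"), doc.findtext("doc-number")))
        elif citation.find("nplcit") is not None:
            non_patent += 1
    return cited, non_patent


cited_patents, no_non_pat_refs = read_citations(ET.fromstring(SAMPLE))
```

The citing patent number and country come from the publication-reference of the patent being parsed, so they are left out of this helper.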

====Inventors====

*'''For v 4.3, 4.4, 4.5'''
*PatentNumber (and country) to build a key
*We need a "standard" name and address object for each inventor
<us-parties>
<us-applicants>
...
</us-applicants>
<inventors>
<inventor sequence="001" designation="us-only">
<addressbook>
<last-name>Oliver</last-name>
<first-name>Paul</first-name>
<address>
<city>Rhyl</city>
<country>GB</country>
</address>
</addressbook>
</inventor>
...
</inventors>
...
</us-parties>


*'''For v 4.2'''

<parties>
<applicants>
<applicant sequence="001" app-type="applicant-inventor" designation="us-only">
<addressbook>
<last-name>Kamath</last-name>
<first-name>Sandeep</first-name>
<address>
<city>Bangalore</city>
<country>IN</country>
</address>
</addressbook>
<nationality>
<country>omitted</country>
</nationality>
<residence>
<country>IN</country>
</residence>
</applicant>
...
</applicants>
...
</parties>
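One way to get a "standard" inventor object out of both layouts is sketched below in Python. It assumes, as the v 4.2 sample suggests, that inventors in that version are the applicants marked app-type="applicant-inventor"; the `<doc>` wrapper, the `Inventor` class and its field names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# Trimmed copies of the two samples above; <doc> is a hypothetical wrapper.
V45 = """<doc><us-parties><inventors>
<inventor sequence="001" designation="us-only">
<addressbook>
<last-name>Oliver</last-name>
<first-name>Paul</first-name>
<address><city>Rhyl</city><country>GB</country></address>
</addressbook>
</inventor>
</inventors></us-parties></doc>"""

V42 = """<doc><parties><applicants>
<applicant sequence="001" app-type="applicant-inventor" designation="us-only">
<addressbook>
<last-name>Kamath</last-name>
<first-name>Sandeep</first-name>
<address><city>Bangalore</city><country>IN</country></address>
</addressbook>
</applicant>
</applicants></parties></doc>"""


@dataclass
class Inventor:
    last_name: Optional[str]
    first_name: Optional[str]
    city: Optional[str]
    country: Optional[str]


def read_inventors(root) -> List[Inventor]:
    """Read inventors from the v 4.3+ layout, falling back to v 4.2."""
    nodes = root.findall("us-parties/inventors/inventor")
    if not nodes:
        nodes = root.findall(
            "parties/applicants/applicant[@app-type='applicant-inventor']"
        )
    inventors = []
    for node in nodes:
        book = node.find("addressbook")
        inventors.append(Inventor(
            last_name=book.findtext("last-name"),
            first_name=book.findtext("first-name"),
            city=book.findtext("address/city"),
            country=book.findtext("address/country"),
        ))
    return inventors


new_layout = read_inventors(ET.fromstring(V45))
old_layout = read_inventors(ET.fromstring(V42))
```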

====Assignees====

*PatentNumber (and country) to build a key
*We need a "standard" name and address object for each assignee

<assignees>
<assignee>
<addressbook>
<orgname>Survitec Group Limited</orgname>
<role>03</role>
<address>
<city>Merseyside</city>
<country>GB</country>
</address>
</addressbook>
</assignee>
</assignees>
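The key and the assignee rows might be assembled as in this Python sketch. The key format (country code followed by patent number) is only an assumption, as are the function name and the row keys.

```python
import xml.etree.ElementTree as ET

# Sample from above.
SAMPLE = """<assignees>
<assignee>
<addressbook>
<orgname>Survitec Group Limited</orgname>
<role>03</role>
<address>
<city>Merseyside</city>
<country>GB</country>
</address>
</addressbook>
</assignee>
</assignees>"""


def read_assignees(patent_country, patent_number, assignees):
    """Return one row per assignee, each tagged with the patent key."""
    key = f"{patent_country}{patent_number}"  # assumed key format
    rows = []
    for assignee in assignees.findall("assignee"):
        book = assignee.find("addressbook")
        rows.append({
            "patent_key": key,
            "orgname": book.findtext("orgname"),
            "role": book.findtext("role"),
            "city": book.findtext("address/city"),
            "country": book.findtext("address/country"),
        })
    return rows


rows = read_assignees("US", "08925112", ET.fromstring(SAMPLE))
```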


====Other things we might want====

*Abstract
*Claims (other than their count)

====Things we don't need====

General:
*Series Code: http://www.uspto.gov/web/offices/ac/ido/oeip/taf/filingyr.htm

Classification related:
*Level - This appears to be either core or advanced. Not sure it matters.
*SymbolPosition, ClassificationValue - we likely don't need them
*Classification status and data source - no idea what these do

====About the scripts====

The scripts to process the Patent Data are all located under /bulk/Software/Scripts/PatentData/ ("E:\Software\Scripts\PatentData\")

There are currently 5 .pm files: PatentApplication.pm, Inventor.pm, Claim.pm, Addressbook.pm, and Loader.pm.

Each of the first 4 represents an object type. The last one is a helper object that, given a schema file, extracts the wanted fields as a Perl object.
Future work to support more schema files should be done in Loader.pm.

Example Usage:
perl PatentParser.pl -file=ipa150319.xml
This parses the XML file ipa150319.xml and extracts each patent (in this case, each patent application) into a temporary XML file. A Loader object with a specified schema file, in this case "us-patent-application-v44-2014-04-03.dtd", then extracts each of the 4 object types from every patent.
If an error occurs while parsing a file, that file is moved to a directory called "failed_files". A file that fails parsing is most likely not a utility patent.
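The split-then-parse flow can be illustrated with the Python sketch below. It is not the code in PatentParser.pl: it assumes the bulk file is a concatenation of XML documents that each start with their own `<?xml` declaration on a new line, it keeps failures in a list instead of moving files to "failed_files", and all names and the toy sample are made up.

```python
import xml.etree.ElementTree as ET


def split_documents(lines):
    """Yield each XML document of a concatenated bulk file as one string."""
    current = []
    for line in lines:
        # A new declaration marks the start of the next document.
        if line.startswith("<?xml") and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


def parse_all(lines):
    """Parse every document; return (parsed roots, texts that failed)."""
    parsed, failed = [], []
    for text in split_documents(lines):
        try:
            parsed.append(ET.fromstring(text))
        except ET.ParseError:
            failed.append(text)
    return parsed, failed


# Toy input: the second document is deliberately left unclosed.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<us-patent-application><n>1</n></us-patent-application>\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<us-patent-application><n>2</n>\n"
)

parsed, failed = parse_all(SAMPLE.splitlines(keepends=True))
```

With a real file, `parse_all` can be given the open file handle directly, since iterating over a text file yields lines.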

====About the Harvard Dataverse====
The patents from 1975-2010, loaded as .sqlite3 and .csv files, can be found at

[https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/15705 Harvard Dataverse]

I have also downloaded all of them onto the database server, where they can be found with
cd /bulk/patent
