Difference between revisions of "Patent Data"
Revision as of 13:59, 22 March 2017
Patent Data | |
---|---|
Project Information | |
Project Title | Patent Data (Wiki Page) |
Owner | Marcela Interiano |
Start Date | Spring 2016 |
Deadline | |
Keywords | Patent, Database, Data |
Primary Billing | |
Notes | |
Has project status | Subsume |
Copyright © 2016 edegan.com. All Rights Reserved. |
This project maintains and updates patent data so that McNair Center staff can extract meaningful data for academic papers and reports. Currently, there are two primary sources for this data: the US Patent and Trademark Office (USPTO) and the Harvard Dataverse. Data from the Lex Machina online database has been added to provide information on patent litigation. All acquired data is stored in normalized tables that are accessed and modified using PostgreSQL. The patent data has been separated into multiple databases by data source or subject matter. Each database consists of several tables, and the known issues for each table have been recorded.
Data Sources
Data has been extracted from the USPTO Bulk Data, the Harvard Dataverse, and Lex Machina, an online patent litigation database. The USPTO makes bulk data on patent transactions, applications, properties, reassignments, and history available to the general public as XML files. These files have been downloaded and the data has been compiled into tables using PostgreSQL. The objective of processing the bulk data is to enhance the McNair Center's historical datasets (patent_2015 and patentdata) and to track the entirety of US patent activity, specifically concerning utility patents.
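As a rough illustration of the XML-to-table step described above, the following Python sketch flattens a small XML document into rows ready for loading into a table, keeping only utility patents. It is a minimal sketch under stated assumptions: the element names (`patent`, `number`, `kind`, `grant-date`, `title`), the sample records, and the function `extract_utility_rows` are hypothetical, and the real USPTO bulk files use a different and much richer schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample. The real USPTO bulk XML schema differs.
SAMPLE_XML = """
<patents>
  <patent>
    <number>1234567</number>
    <kind>utility</kind>
    <grant-date>2015-03-10</grant-date>
    <title>Example widget</title>
  </patent>
  <patent>
    <number>D765432</number>
    <kind>design</kind>
    <grant-date>2015-03-10</grant-date>
    <title>Example ornamental design</title>
  </patent>
</patents>
"""


def extract_utility_rows(xml_text):
    """Return (number, grant_date, title) tuples for utility patents only."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for patent in root.iter("patent"):
        # Skip anything that is not a utility patent (assumed 'kind' element).
        if patent.findtext("kind") != "utility":
            continue
        rows.append(
            (
                patent.findtext("number"),
                patent.findtext("grant-date"),
                patent.findtext("title"),
            )
        )
    return rows


if __name__ == "__main__":
    for row in extract_utility_rows(SAMPLE_XML):
        print(row)
```

Each tuple corresponds to one row of a flat table. In practice the rows would then be bulk-loaded into PostgreSQL, a step this sketch leaves out.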
The Harvard Dataverse provides clean versions of the U.S. utility patent datasets spanning 1975-2010. This data has already undergone author disambiguation. The sources used were intended to follow the overall Data Model established by the McNair Center.
Database Specifics
The Patent Database contains the merged datasets from the USPTO bulk data and the Harvard Dataverse, built using PostgreSQL. Specifics on how the datasets were merged are given in Patent Data Processing - SQL Steps. The Patent Database focuses on patents, patent litigation, patent maintenance, patent assignment, and other details on patent owners.
The USPTO Assignees Database (version 2) focuses on patent assignments: transactions between one or more patent owners and one or more other parties in which ownership of, or an interest in, one or more patents is assigned or shared. The database consists of historical assignment data provided by the USPTO in XML files. Specifics on the database are given on the USPTO Assignees Data Processing Page.
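Because an assignment can involve several owners, several receiving parties, and several patents at once, it maps naturally onto normalized tables joined by an assignment identifier. The sketch below illustrates that shape only. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and the table names, column names, and sample records are all hypothetical, not the actual McNair Center schema.

```python
import sqlite3


def build_assignment_db():
    """Create an in-memory database holding one hypothetical assignment."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE assignment (
            assignment_id INTEGER PRIMARY KEY,
            recorded_date TEXT
        );
        -- One row per patent covered by an assignment.
        CREATE TABLE assignment_patent (
            assignment_id INTEGER REFERENCES assignment(assignment_id),
            patent_number TEXT
        );
        -- One row per party; role marks the giving or receiving side.
        CREATE TABLE assignment_party (
            assignment_id INTEGER REFERENCES assignment(assignment_id),
            party_name TEXT,
            role TEXT CHECK (role IN ('assignor', 'assignee'))
        );
        """
    )
    conn.execute("INSERT INTO assignment VALUES (1, '2016-01-15')")
    conn.executemany(
        "INSERT INTO assignment_patent VALUES (?, ?)",
        [(1, "1234567"), (1, "7654321")],
    )
    conn.executemany(
        "INSERT INTO assignment_party VALUES (?, ?, ?)",
        [
            (1, "Alpha Labs", "assignor"),
            (1, "Beta Corp", "assignee"),
            (1, "Gamma LLC", "assignee"),
        ],
    )
    return conn


def assignees_of(conn, patent_number):
    """List the parties that received an interest in the given patent."""
    rows = conn.execute(
        """
        SELECT p.party_name
        FROM assignment_patent AS ap
        JOIN assignment_party AS p ON p.assignment_id = ap.assignment_id
        WHERE ap.patent_number = ? AND p.role = 'assignee'
        ORDER BY p.party_name
        """,
        (patent_number,),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(assignees_of(build_assignment_db(), "7654321"))
```

Keeping patents and parties in separate link tables avoids repeating the assignment record for every patent and party combination, which is the usual motivation for normalizing this kind of many-to-many data.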
Academic Projects
'Little Guy' Academic Paper
The first application of the refined database will be the Little Guy Academic Paper.
Patent Trolls
Academic Paper: The patent database will also be used to explore the existence of patent trolls and their characteristic litigation activity. An academic paper may be developed that defines patent trolls and distinguishes them from other entities often confused with patent trolls. The data from Lex Machina will be used to track troll behavior and associated outcomes, as well as the impact of other patent intermediaries and assertion bodies.
Issue Brief: Based on an analysis of the litigation data from Lex Machina, an issue brief on patent troll activity, tentatively titled The Truth Behind Patent Trolls, may be written to report on how best to curb abuses through innovation policy and reform.