Changes

78 bytes removed, 13:44, 21 September 2020
{{Project|Has project output=Data,Tool,Content,Guide|Has sponsor=McNair Center|Has title=Patent Data|Has owner=Marcela Interiano|Has start date=Spring 2016||Has keywords=Patent , Database,Data page |Has project status=Subsume|Due Date=NA}}This project is concerned with maintaining and updating patent data, to enable the McNair Center staff to extract meaningful data for academic papers and reports. Currently, there are two primary sources for this data: the US Patent and Trademark Office and the Harvard Dataverse. Data from the Lex Machina online database has been added to provide information on [[Guide to Patent Litigation (Wiki Page) | patent litigation]]. All the USPTO-acquired data is stored in normalized tables to be accessed and modified using PostgreSQL. The patent data has been separated into multiple databases based on data source or subject matter. Each database consists of several tables for which known [[Patent Data Issues | issues]] have been recorded.

==Data Sources==
Data has been extracted from the '''[[USPTO Bulk Data]]''', the '''[[Harvard Dataverse]]''', and '''[[Lex Machina]]''', an online patent litigation database. {{#section:USPTO_Bulk_Data_Processing|bulk}}{{#section:Harvard_Dataverse|dataverse}} The sources used were intended to follow the overall [[Data Model]] established by the McNair Center.

==Database Specifics==
The [[Patent|Patent Database]] contains the merged datasets from the USPTO bulk data and Harvard Dataverse using PostgreSQL. Specifics on how the datasets were merged are given in [[Patent Data Processing - SQL Steps]]. The Patent Database focuses on patents, patent litigation, patent maintenance, patent assignment, and other details on patent owners.

The [[USPTOAssigneesData|USPTO Assignees Database (version 2)]] focuses on patent assignments, a transaction between one or more patent owners with one or more parties where ownership or interest in one or more patents is assigned or shared. The database consists of historical assignment data provided by the USPTO in XML files. Specifics on the database are given on the [[USPTO Assignees Data Processing]] page.

==Academic Projects==
===[[Little Guy Academic Paper|'Little Guy' Academic Paper]]===
The first application of the refined database will be the [[Little Guy Academic Paper]]. {{#section:Little_Guy_Academic_Paper|Little Guy}}

===Patent Trolls===
Academic Paper: The patent database will also be used to explore the existence of patent trolls and characteristic litigation activity. An academic paper may be developed defining patent trolls and other entities often confused with patent trolls. The data from Lex Machina will be used to track troll behavior and associated outcomes as well as the impact of other patent intermediary and assertion bodies.

Issue Brief: Based on an analysis of the litigation data from Lex Machina, an issue brief, tentatively titled [[The Truth Behind Patent Trolls Issue Brief| The Truth Behind Patent Trolls]], on patent troll activity may be written to report on how best to curb abuses through [[Innovation Policy| innovation policy]] and reform.

The sections below cover how to get the patent data, how to use the database, and the documentation of our database.

== ER diagram ==
See [http://www.edegan.com/wiki/images/0/06/Patent_Data.png ER Diagram]

== Downloading the files ==
The files (in XML format) for granted patent data can be obtained at [https://www.google.com/googlebooks/uspto-patents-grants-text.html granted patent]

The files for patent application data can be obtained at [https://www.google.com/googlebooks/uspto-patents-applications-text.html patent applications]

The files for maintenance fees data can be obtained at [https://www.google.com/googlebooks/uspto-patents-maintenance-fees.html maintenance]

Scripts are available to perform a bulk download of all the above files:

[http://www.edegan.com/wiki/index.php/Image:Applications_download_2001-2004.sh Script to download patent application data from 2001-2004]

[http://www.edegan.com/wiki/index.php/Image:Applications_download_2005-2015.sh Script to download patent application data from 2005-2015]
[http://www.edegan.com/wiki/index.php/Image:Grant_download_1976-2000.sh Script to download patent grant data from 1976-2000]

[http://www.edegan.com/wiki/index.php/Image:Grant_download_2001-2004.sh Script to download patent grant data from 2001-2004]

[http://www.edegan.com/wiki/index.php/Image:Grant_download_2005-2015.sh Script to download patent grant data from 2005-2015]

To use the scripts, save them as shell scripts, then either run one with sh

 ~ leodu$ sh Applications_download_2001-2004.sh

or first make the script executable and then execute it

 ~ leodu$ chmod a+x Applications_download_2001-2004.sh
 ~ leodu$ ./Applications_download_2001-2004.sh

Note that several hundred .zip files of roughly 100 MB each will be downloaded, so the process might take a while. When all the files are downloaded, unzip all of them. The wildcard is quoted so that unzip expands it itself; with an unquoted *.zip the shell expands the pattern, and unzip treats every name after the first as a member to extract from the first archive.

 ~ leodu$ unzip '*.zip'

==XML Schema Notes==

Tags we are using:
*CPC Classification: https://en.wikipedia.org/wiki/Cooperative_Patent_Classification

Tags we aren't using:
*Kind codes: http://www.uspto.gov/learning-and-resources/support-centers/electronic-business-center/kind-codes-included-uspto-patent
*Series codes: http://www.uspto.gov/web/offices/ac/ido/oeip/taf/filingyr.htm

== Parsing and Processing the XML files ==

The ParserSpliter.pl script first splits a large Patent Data XML file into smaller XML files, one for each patent, and then parses and processes each of the smaller files. Some of the files are malformed and will be moved to a ./failed_files directory. Adding a character anywhere in such a file somehow makes it processable by the script.

In order to use this script, you will need to have XML::Simple and Try::Tiny installed.
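The splitting step described above can be sketched in shell. This is not ParserSpliter.pl itself, only a minimal illustration under one assumption: every patent record in the large file starts on a line beginning with an XML declaration (<?xml). The function name split_patents and the patent_NNNNNN.xml output naming are hypothetical.

```shell
#!/bin/sh
# Sketch: split a concatenated XML file into one file per record.
# Assumption: every record starts on a line beginning with "<?xml".
# split_patents and the patent_NNNNNN.xml naming are hypothetical.
split_patents() {
    input=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        /^<\?xml/ {
            # A new XML declaration marks the start of the next record.
            if (out != "") close(out)
            n++
            out = sprintf("%s/patent_%06d.xml", dir, n)
        }
        # Lines before the first declaration are skipped.
        out != "" { print > out }
    ' "$input"
}
```

A call such as split_patents bulk_file.xml split_files (both names are placeholders) would write one numbered file per record into split_files. Closing each output file before opening the next keeps the number of open files small, which matters when a bulk file holds thousands of records.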
To install the Perl modules, open up the CPAN shell:

 leodu$ perl -e shell -MCPAN

Install:

 cpan[0]> install XML::Simple
 cpan[1]> install Try::Tiny
 cpan[2]> install Switch

==Other Resources==
[http://www.uspto.gov/learning-and-resources/xml-resources Documentation for the XML files]

[http://www.uspto.gov/learning-and-resources/xml-resources/xml-resources-retrospective See Also]

[https://www.w3.org/2000/04/schema_hack/ Tool to convert DTD to XSD]
