Changes

Reproducible Patent Data (view source)

Revision as of 00:30, 1 June 2017

200 bytes added, 00:30, 1 June 2017

no edit summary

}}

A continuation of [[Redesigning Patent Database]] that aims to write faster, more centralized code to deal with data from the United States Patent and Trademark Office (USPTO ~~data~~). With an end-to-end pipeline, we can easily reproduce or update data without worrying about unintentional side effects or missing data. Currently, it handles three stages: bulk downloading from the USPTO; streaming file splitting, that is, splitting large concatenated files into their component parts in memory; and parsing XML to Java objects, APS to Java Maps, and maintenance fee data to Java objects.
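The streaming split described above can be illustrated with a minimal sketch. It is not the project's actual splitter: the class name <code>ConcatenatedXmlSplitter</code> is hypothetical, and it assumes that every component document in a concatenated file begins with an XML declaration (<code>&lt;?xml</code>) at the start of a line. Only one component document is held in memory at a time, and each one is handed to a caller-supplied consumer as soon as it is complete.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a streaming splitter; not the project's real class.
class ConcatenatedXmlSplitter {
    // Assumption: each component document starts with an XML declaration
    // at the beginning of a line.
    private static final String DOC_START = "<?xml";

    // Reads the input line by line and passes each component document to
    // the sink as soon as the start of the next document is seen.
    static void split(BufferedReader in, Consumer<String> sink) throws IOException {
        StringBuilder current = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith(DOC_START) && current.length() > 0) {
                sink.accept(current.toString());
                current.setLength(0); // reuse the buffer for the next document
            }
            current.append(line).append('\n');
        }
        // Emit the final document, which has no following declaration.
        if (current.length() > 0) {
            sink.accept(current.toString());
        }
    }

    public static void main(String[] args) throws IOException {
        // Two tiny made-up documents standing in for a concatenated bulk file.
        String concatenated =
            "<?xml version=\"1.0\"?>\n<doc id=\"1\"/>\n"
          + "<?xml version=\"1.0\"?>\n<doc id=\"2\"/>\n";
        List<String> parts = new ArrayList<>();
        split(new BufferedReader(new StringReader(concatenated)), parts::add);
        System.out.println(parts.size() + " documents found");
    }
}
```

In a real run the <code>BufferedReader</code> would wrap a downloaded file rather than a string, and the consumer would pass each document to the parser instead of collecting it in a list.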

== Progress ==

# <del>Splitter</del> ''done''

# <del>Parser</del> ''done''

# ~~Data Source Merger (''only USPTO'' not Harvard Dataverse or Lex Machina currently)~~

# Create tooling for minions

# Set up PostgreSQL JDBC

# Create naive schema based on previous approaches

# Create new data structures

# Database Insert (modify <code>models/</code> files with some mapping to database fields)

# Data Cleanup (reference [[Patent_Assignment_Data_Restructure|Marcela and Sonia's work]])

# Set up a pipeline script to complete all of these steps in series

# Data Source Merger (''only USPTO granted, maintfee, assignment'' not USPTO applications or Harvard Dataverse or Lex Machina currently)
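As a rough illustration of the planned "Database Insert" step, the sketch below builds a parameterized <code>INSERT</code> statement from a mapping of column names to values. Everything in it is an assumption made for illustration: the class <code>InsertSqlBuilder</code>, the table name <code>patents</code>, and the column names are hypothetical and do not come from the project's <code>models/</code> files or schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; the real mapping in models/ may look quite different.
class InsertSqlBuilder {
    // Builds "INSERT INTO table (c1, c2) VALUES (?, ?)" from the map's keys.
    // Column order follows the map's iteration order, so pass a LinkedHashMap
    // to keep it predictable.
    static String buildInsert(String table, Map<String, ?> columns) {
        StringJoiner names = new StringJoiner(", ");
        StringJoiner marks = new StringJoiner(", ");
        for (String column : columns.keySet()) {
            names.add(column);
            marks.add("?"); // one placeholder per column, bound later
        }
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + marks + ")";
    }

    public static void main(String[] args) {
        // Made-up field values standing in for a parsed model object.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("patent_number", "0000000");
        row.put("title", "Example title");
        System.out.println(buildInsert("patents", row));
    }
}
```

The resulting string could then be passed to <code>java.sql.Connection.prepareStatement</code>, with each value bound in the same order through <code>PreparedStatement.setObject</code>. Placeholders keep the values out of the SQL text, so unusual characters in patent titles are not interpreted as SQL.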

== Directory Layout ==

All of the information for this project is located at <code>E:\McNair\Projects\SimplerPatentData</code>

There are ~~three~~ four interesting directories:

* <code>data/downloads/</code> is USPTO bulkdata, unmodified straight from the scraper

|January 1976 to December 2001

|APS

|~~Yes (syntactic parsing but little semantic knowledge)~~Only syntax

|-

|<del>January 2001 to December 2001</del>

|January 2002 to December 2004

|XML Version 2.5

|~~No~~Only syntax

|-

|January 2005 to December 2005

OliverC

