|Has title=Redesigning Patent Database
|Has owner=Shelby Bice
|Has keywords=Database, Patent
|Has notes= |Is dependent on= |Depends upon it=|Has project status= Active
Documentation on the process and eventual designs for the new patent database. Not to be confused with "Patent Data Restructure", which deals with condensing and cleaning USPTO assignees data.
Deliverables by end of semester: Design and build the new patent database with core tables from the data, add tables from the Patent Data Restructure and the Small Inventors Project, and write thorough documentation on the schema of the new patent database, including instructions on what documentation should be written when it is altered.
TODO on 3/28/2017
* Find all SQL and documentation related to merging Harvard and USPTO data
* Start documentation for the "core" tables of the databases that should not be deleted
* Make ER Diagram for "core" tables in database
* Start documentation for how one should go about changing/altering the "core" tables
* Start documentation for how new tables should be added to database
== '''Redesigning Patent Database''' ==
*[[Patent Data Restructure]]
*[[Small Inventors Project]] - uses Fee Status and Citations
*[[Medical Centers and Grants]] - uses patent assignees, specifically their zipcodes and organizations
'''Previous documentation on the patent database:'''
'''As of 3/21/2017, the most up-to-date database containing patent data is "patent", not "allpatent" or "allpatent_clone". "patent" is also the database that the other patent data redesign project, Restructuring Patent Data (link above), is working with.'''
*[http://mcnair.bakerinstitute.org/wiki/Patent_Data_(Wiki_Page) Patent Data] - overview of what the data is and where it came from; probably the starting point for changing the documentation
*[http://mcnair.bakerinstitute.org/wiki/Patent Patent Database] - overview of the schema of the database (specifically, the database patent, which includes data from Harvard Dataverse (originally stored in patentdata) and USPTO (patent_2015))
*[http://mcnair.bakerinstitute.org/wiki/USPTOAssigneesData USPTO Assignees Database] - enhances assignee info in the patent database; also being redesigned
*[http://mcnair.bakerinstitute.org/wiki/Patent_Data_Issues Problems with Patent Database] - lists issues with the current schema
*[http://mcnair.bakerinstitute.org/wiki/Data_Model Previous ER Diagram] - does not match the schema described in [http://mcnair.bakerinstitute.org/wiki/Patent Patent Database] and contains an outdated list of what we want to pull from the XML files
*[http://mcnair.bakerinstitute.org/wiki/Patent_Data_Processing_-_SQL_Steps Processing Patent Data] - states that allpatent is the newest database and an amalgamation of patentdata and patent_2015
== Description ==
== Development ==
Design will be built upon a relational database model. I will be referencing this article on database design as I develop the design (http://en.tekstenuitleg.net/articles/software/database-design-tutorial/one-to-many.html), and I will be creating an ER diagram using [https://erdplus.com/#/standalone ERDPlus] or [https://creately.com/app/?tempID=hqdgwjki1&login_type=demo# Creately].
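As a rough illustration of the one-to-many relationship covered in the referenced tutorial, the sketch below links one patent to several assignees through a foreign key. The table and column names (patent, patent_assignee, patent_number, organization, zipcode) are hypothetical and are not taken from the actual schema, and SQLite is used only so the example runs without a database server.

```python
import sqlite3

# Hypothetical table and column names, for illustration only;
# they are not the actual schema of the patent database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
CREATE TABLE patent (
    patent_number TEXT PRIMARY KEY,
    title         TEXT NOT NULL
);
-- "Many" side: each assignee row points back to exactly one patent.
CREATE TABLE patent_assignee (
    assignee_id   INTEGER PRIMARY KEY,
    patent_number TEXT NOT NULL REFERENCES patent(patent_number),
    organization  TEXT,
    zipcode       TEXT
);
""")

conn.execute("INSERT INTO patent VALUES (?, ?)", ("0000001", "Example patent"))
conn.executemany(
    "INSERT INTO patent_assignee (patent_number, organization, zipcode) "
    "VALUES (?, ?, ?)",
    [("0000001", "Example Org A", "77005"), ("0000001", "Example Org B", "02138")],
)

# One patent joins to many assignee rows.
rows = conn.execute(
    "SELECT p.patent_number, a.organization "
    "FROM patent p JOIN patent_assignee a ON a.patent_number = p.patent_number "
    "ORDER BY a.assignee_id"
).fetchall()

# An assignee that references a nonexistent patent is rejected.
try:
    conn.execute(
        "INSERT INTO patent_assignee (patent_number, organization) VALUES (?, ?)",
        ("9999999", "Orphan Org"),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Keeping the foreign key on the "many" side means a patent row never needs to change when assignees are added or removed, which is one reason this layout is a common starting point for an ER diagram.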
== Current Design and Scripts Documentation ==
The following pages are relevant to how previous databases are built/how to build tables in the database:
*[http://mcnair.bakerinstitute.org/wiki/Harvard_Dataverse_Data Harvard Dataverse Data] - explains how to make tables from Harvard Dataverse data, where to find scripts, etc.
*[http://mcnair.bakerinstitute.org/wiki/USPTO_Bulk_Data_Processing USPTO Data] - explains how to make tables from USPTO data, where to find scripts, etc., specifically for assignment data
*[http://mcnair.bakerinstitute.org/wiki/Patent_Data_Extraction_Scripts_(Tool) Patent Data Extraction] - explains locations of XML files and lists (at the bottom) where the Perl scripts can be found
*[http://mcnair.bakerinstitute.org/wiki/Patent_Data_Cleanup_-_June_2016 Patent Data Cleanup] - explains changes that were made to clean up problems in the database allpatent as a result of merging the Harvard Dataverse data and the USPTO data
*[http://mcnair.bakerinstitute.org/wiki/Patent_Data_Processing_-_SQL_Steps Patent Data Processing - SQL Steps] - explains SQL needed to merge two existing databases, one that contained the Harvard Dataverse data and one that contained the USPTO data
== Specifications of USPTO Data To Extract ==
Go to https://bulkdata.uspto.gov/ to access bulk data from the USPTO.
To see a description of what each file in the USPTO bulk data contains, go to the bulk drive and navigate to McNair/Projects/Redesigning Patent Database/2017BulkDataProductDescriptions.
== Test Plan ==
* Some tables that will later be deleted were included on the spreadsheet because tables are currently being built to replace them
* May try to just move all the (twenty-something) "pto-" tables that have been created due to the "Restructuring Patent Data" project from "patent" to the new database
* Will work next week on understanding the SQL for filling the new database, using http://mcnair.bakerinstitute.org/wiki/Patent_Data_Processing_-_SQL_Steps and http://mcnair.bakerinstitute.org/wiki/Patent_Data_Cleanup_-_June_2016
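The idea of moving every "pto-" table from "patent" into the new database could be sketched roughly as follows. This is only an outline under several assumptions: SQLite stands in for the real database engine so the example is runnable, and the table names, the schema name newpatent, and the helper copy_prefixed_tables are all hypothetical.

```python
import sqlite3


def copy_prefixed_tables(conn, prefix, target_schema):
    """Copy each table in the main database whose name starts with
    `prefix` into the attached database `target_schema`.

    Hypothetical helper. CREATE TABLE ... AS SELECT copies columns and
    rows only; keys, constraints, and indexes would need to be recreated.
    """
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        if row[0].startswith(prefix)
    ]
    for name in names:
        conn.execute(
            f'CREATE TABLE "{target_schema}"."{name}" AS '
            f'SELECT * FROM main."{name}"'
        )
    return names


# Stand-in for the existing "patent" database, with made-up tables.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS newpatent")  # stand-in for the new database
conn.executescript("""
CREATE TABLE "pto-assignees" (id INTEGER, name TEXT);
CREATE TABLE "pto-assignments" (id INTEGER, reel TEXT);
CREATE TABLE legacy (id INTEGER);
INSERT INTO "pto-assignees" VALUES (1, 'Example Org');
""")

copied = copy_prefixed_tables(conn, "pto-", "newpatent")
```

Filtering by prefix in Python rather than with SQL LIKE avoids having to escape LIKE wildcards if the real table names turn out to use underscores instead of hyphens.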