Redesigning Patent Database


Documentation on the process and eventual designs for the new patent database.

TODO on 3/7/2017

  • Continue working on the Excel spreadsheet analyzing the current schema and the new schema design (this spreadsheet can be found under Projects/Redesigning Patent Database/Comparing current schema and new schema)



McNair Project
Redesigning Patent Database
Project Information
Project Title Redesigning Patent Database
Owner Shelby Bice
Start Date 201701
Deadline 201705
Keywords Database, Patent
Primary Billing
Notes
Has project status Active
Copyright © 2016 edegan.com. All Rights Reserved.


Redesigning Patent Database

  • Design a better representation for database
  • Fix scripts if necessary
  • Start moving data into new database by querying existing databases (using SQL)
  • Use scripts to query new data
  • Test database
  • Remove extraneous information from database (copies, patents that we're not interested in, etc.)
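
The migration and clean-up steps above can be sketched as a single INSERT ... SELECT that copies rows from an old table into a new one while dropping exact copies and unwanted rows. This is only an illustration: it uses Python's built-in sqlite3 module rather than the actual patent database, and the table names (old_patent, new_patent), the country column, and the sample rows are all made up for the example.

```python
import sqlite3

def migrate_demo():
    """Copy de-duplicated US rows from a hypothetical old table to a new one."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical old and new tables with made-up sample rows.
    conn.executescript("""
        CREATE TABLE old_patent (patent_number TEXT, country TEXT);
        CREATE TABLE new_patent (patent_number TEXT PRIMARY KEY, country TEXT);
        INSERT INTO old_patent VALUES
            ('1000001', 'US'),
            ('1000001', 'US'),   -- exact copy of the row above
            ('1000002', 'US'),
            ('9000001', 'DE');   -- a patent we are not interested in
    """)
    # Fill the new table by querying the old one:
    # DISTINCT drops exact copies, WHERE drops the unwanted rows.
    conn.execute("""
        INSERT INTO new_patent (patent_number, country)
        SELECT DISTINCT patent_number, country
        FROM old_patent
        WHERE country = 'US'
    """)
    conn.commit()
    return conn

conn = migrate_demo()
migrated = conn.execute(
    "SELECT patent_number FROM new_patent ORDER BY patent_number"
).fetchall()
print(migrated)
```

Filtering during the copy keeps the new table clean from the start; the alternative is to copy everything and delete afterwards, which is easier to audit but leaves the unwanted rows in the new database for a while.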

Documentation I need to include:

  • Schema of the new database, with a justification of the design and, ideally, a visual representation
  • SQL commands that were used to fill database with explanation of what they do
  • Clear instructions on where to find scripts in bulk drive and an explanation of what each script does
  • Visual representation of example table entries that isn't just copied and pasted from a CSV file

Documentation Relevant to Current Patent Database

Description

The purpose of this project is to create a new, redesigned database to hold all of the patent information that the McNair Center has accumulated and document the process so that the design can be easily understood and replicated or edited as needed.

This database will include design patents, utility patents, and reissues.

Development

The design will be built on the relational database model. As I develop it, I will refer to this article on database design (http://en.tekstenuitleg.net/articles/software/database-design-tutorial/one-to-many.html), and I will create an ER diagram.
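
To make the one-to-many idea concrete, here is a minimal sketch of a patent table with a child fee table, where each fee row points back to exactly one patent through a foreign key. It is not the final schema: the table names, column names, and sample values are assumptions chosen for illustration, and it runs on Python's built-in sqlite3 module rather than on the real database.

```python
import sqlite3

# Hypothetical schema: one patent can have many fee rows.
SCHEMA = """
CREATE TABLE patent (
    patent_number TEXT PRIMARY KEY,
    patent_type   TEXT NOT NULL
                  CHECK (patent_type IN ('utility', 'design', 'reissue')),
    file_date     TEXT
);
CREATE TABLE fee (
    fee_id        INTEGER PRIMARY KEY,
    patent_number TEXT NOT NULL REFERENCES patent(patent_number),
    amount        INTEGER NOT NULL
);
"""

def build_demo():
    """Create the two tables in memory and load made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO patent VALUES ('1234567', 'utility', '2001-05-01')"
    )
    conn.executemany(
        "INSERT INTO fee (patent_number, amount) VALUES (?, ?)",
        [("1234567", 800), ("1234567", 1800)],
    )
    conn.commit()
    return conn

def fees_per_patent(conn):
    """Return (patent_number, number_of_fee_rows) for every patent."""
    return conn.execute(
        """
        SELECT p.patent_number, COUNT(f.fee_id)
        FROM patent p
        LEFT JOIN fee f ON f.patent_number = p.patent_number
        GROUP BY p.patent_number
        """
    ).fetchall()

demo = build_demo()
print(fees_per_patent(demo))
```

Because the fee table stores only the patent number, details such as the filing date live in one place (the patent table) and are reached through a join instead of being repeated in every fee row.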

Current Design and Scripts information

The scripts for querying data for the patent database are in McNair/software/scripts/patent. If the schema of the database tables is altered too much, I may have to write new scripts.

Test Plan

Log:

2/16/2017 - Talked over project with Ed, began reading existing wiki pages related to patent data and databases

2/21/2017 - Brushed up on SQL and the entity-relationship model of database design

  • In the documentation, I want to briefly explain what the entity-relationship model is before including the diagram so that readers have a little bit of background

  • Found a tool for creating a visual representation, ERDPlus.com; it can be used standalone without creating an account, and the diagram can be downloaded

Learning commands from Patent Data - SQL Steps

  • copy is a PostgreSQL command that copies data between a table and a file; here it is used to write a SQL table to a text file
    • DELIMITER sets the character that separates columns in the text file
    • HEADER specifies that the text file will start with a header line containing the names of the columns
  • Definitely need to include more detail about what these do in the documentation
  • insert into command inserts a new entry into the table
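
As a rough illustration of the commands in this list, the sketch below inserts one row with INSERT INTO and then writes the table to delimited text with a header line. It is a stand-in, not the PostgreSQL command itself: it uses Python's built-in sqlite3 and csv modules, and the export_table helper, the patent table, and its sample row are hypothetical. The delimiter and header parameters play the roles that DELIMITER and HEADER play in copy.

```python
import csv
import io
import sqlite3

def export_table(conn, table, out, delimiter="\t", header=True):
    """Write every row of `table` to the text stream `out`.

    `delimiter` separates the columns; `header` controls whether the
    first line lists the column names.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    if header:
        # cursor.description holds one tuple per column; item 0 is its name.
        writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patent (patent_number TEXT, title TEXT)")

# INSERT INTO adds a new entry (row) to the table.
conn.execute(
    "INSERT INTO patent (patent_number, title) VALUES (?, ?)",
    ("1234567", "Example widget"),
)

buf = io.StringIO()
export_table(conn, "patent", buf)
print(buf.getvalue())
```

Passing an open file instead of the StringIO buffer would write the same text to disk.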

2/23/2017 - Read great database design article, dug through some more wiki articles, started reviewing Perl

  • What client do we use to interact with the current patent database?
  • Will need to determine all the fields that need to be included in the database before finishing the design and ER diagram; this will need Ed's input

3/2/2017 - Started compiling a list of what fields to include and how they would be related.

  • Created an Excel spreadsheet that records, for each table: its current attributes in the existing patentdata table; what I think the attributes should be in the new table; its relationship to a patent (i.e. one-to-many, many-to-many, etc.); its primary key; questions I have relating to the table; future steps for cleaning up the data in the table (i.e., once all the data has been moved to the new database, removing patents that are not US-based); and current problems that have been recorded with the existing table for that information (if an existing table exists)
  • Once Excel spreadsheet is completed (and questions in the Questions column are answered or removed from the spreadsheet entirely) I will look into trying to embed it on my "Redesigning Patent Database" wiki page so that future users can sort of follow my thought process. I will also create separate wiki pages to explain each table once the new database is created
  • Making spreadsheet led me to realize there is some data that is repeated (filedate in fee table when it is also located in patent table, and the fee table includes the patent