Difference between revisions of "Shelby Bice (Research Plan)"

From edegan.com
 
(7 intermediate revisions by one other user not shown)
Line 22: Line 22:
 
* SQL commands that were used to fill database with explanation of what they do
 
 
* Clear instructions on where to find scripts in bulk drive and an explanation of what each script does
 
 +
* Visual representation of example table entries that isn't just copied and pasted from a CSV file
  
 
'''Project Pages:'''
 
[[Shelby Bice (Redesigning Patent Database)]]
+
[[Redesigning Patent Database]]
  
 
== Log ==
 
 
+
[[Category:Work Log]]
'''2/16/2017''' - Talked over project with Ed, began reading existing wiki pages related to patent data and databases
 
 
 
'''2/21/2017''' - Brushed up on SQL and the entity-relationship model of database design
 
* In the documentation, I want to briefly explain what the entity-relationship model is before including the diagram so that readers have a little bit of background
 
* Found a tool for creating a visual representation, ERDPlus.com - can be used as a standalone instead of creating an account, and diagrams can be downloaded
 
Learning commands from Patent Data - SQL Steps
 
* The copy command is a PostgreSQL command that copies a SQL table to a text file
 
** DELIMITER sets what will separate columns in the text file
 
** HEADER specifies that there will be a header in the text file with the names of the columns
 
* Definitely need to include more detail about what these do in the documentation
 
* The insert into command inserts a new entry into the table
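A minimal sketch of the two commands noted above, so the documentation has something concrete to point at. It is an illustration under assumptions, not the commands actually used on the patent database: an in-memory SQLite database stands in for PostgreSQL so the example runs without a server, and the <code>patent</code> table, its columns, and the <code>export_table</code> helper are hypothetical. In PostgreSQL itself the export would be a single statement along the lines of <code>COPY patent TO '/some/path/patent.txt' WITH (FORMAT csv, DELIMITER E'\t', HEADER);</code>, where the path is a placeholder.

```python
import csv
import io
import sqlite3

# SQLite (in memory) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")

# Hypothetical table; the real patent tables have different columns.
conn.execute("CREATE TABLE patent (patent_number TEXT PRIMARY KEY, title TEXT)")

# INSERT INTO adds one new entry (row) to the table per statement.
conn.execute(
    "INSERT INTO patent (patent_number, title) VALUES (?, ?)",
    ("0000001", "Example widget"),
)
conn.execute(
    "INSERT INTO patent (patent_number, title) VALUES (?, ?)",
    ("0000002", "Example gadget"),
)


def export_table(connection, table, delimiter="\t", header=True):
    """Return every row of `table` as delimited text, roughly what COPY ... TO produces.

    `delimiter` plays the role of DELIMITER; `header` plays the role of HEADER.
    The table name is interpolated directly, so only pass trusted names.
    """
    cursor = connection.execute(f"SELECT * FROM {table} ORDER BY 1")
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    if header:
        # cursor.description holds one tuple per column; item 0 is the column name.
        writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


text = export_table(conn, "patent")
print(text)
```

Calling <code>export_table(conn, "patent", delimiter=",", header=False)</code> instead gives comma-separated rows with no column-name line, which mirrors leaving HEADER off in the PostgreSQL command.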
 
 
 
'''2/23/2017''' - Read a great database design article, dug through some more wiki articles, started reviewing Perl
 
* What client do we use to interact with the current patent database?
 
* Will need to determine all the fields that need to be included in the database before finishing the design and ER diagram; will need Ed's input
 
 
 
'''3/2/2017''' - Started compiling a list of what fields to include and how they would be related.
 
* Created an Excel sheet that records each table, its attributes, its relationship to a patent (e.g., one-to-many or many-to-many), its primary key, questions I have about the table, future steps for cleaning up the data in the table (e.g., removing patents that are not US-based once all the data has been moved to the new database), and current problems that have been recorded with the existing table for that information (if an existing table exists)
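To make the relationship types in that sheet concrete, here is a small sketch of how a one-to-many and a many-to-many relationship to a patent could be expressed as tables with primary keys. Everything in it is assumed for illustration: the <code>claim</code> and <code>inventor</code> tables, their columns, and the sample rows are hypothetical and are not the fields chosen for the real database, and SQLite is used in place of PostgreSQL so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE patent (
    patent_number TEXT PRIMARY KEY,
    title         TEXT NOT NULL
);

-- One-to-many: one patent has many claims, each claim belongs to one patent.
CREATE TABLE claim (
    patent_number TEXT    NOT NULL REFERENCES patent(patent_number),
    claim_number  INTEGER NOT NULL,
    claim_text    TEXT,
    PRIMARY KEY (patent_number, claim_number)
);

-- Many-to-many: a patent can have several inventors and an inventor
-- can appear on several patents, so a linking table holds the pairs.
CREATE TABLE inventor (
    inventor_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE patent_inventor (
    patent_number TEXT    NOT NULL REFERENCES patent(patent_number),
    inventor_id   INTEGER NOT NULL REFERENCES inventor(inventor_id),
    PRIMARY KEY (patent_number, inventor_id)
);
""")

# Made-up sample rows, only to exercise the schema.
conn.execute("INSERT INTO patent VALUES ('0000001', 'Example widget')")
conn.executemany(
    "INSERT INTO inventor VALUES (?, ?)",
    [(1, "A. Inventor"), (2, "B. Inventor")],
)
conn.executemany(
    "INSERT INTO patent_inventor VALUES (?, ?)",
    [("0000001", 1), ("0000001", 2)],
)
conn.executemany(
    "INSERT INTO claim VALUES (?, ?, ?)",
    [("0000001", 1, "First claim"), ("0000001", 2, "Second claim")],
)


def inventors_of(connection, patent_number):
    """Follow the many-to-many link from a patent to its inventors' names."""
    rows = connection.execute(
        "SELECT i.name FROM inventor AS i "
        "JOIN patent_inventor AS pi ON pi.inventor_id = i.inventor_id "
        "WHERE pi.patent_number = ? ORDER BY i.name",
        (patent_number,),
    )
    return [name for (name,) in rows]


print(inventors_of(conn, "0000001"))
```

The composite primary keys keep a claim number from repeating within a patent and keep the same patent and inventor from being linked twice, which is one way the design itself can prevent some of the duplicate entries that would otherwise need cleaning up later.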
 

Latest revision as of 17:17, 21 March 2017

Overview

Overall goals:

  • Create a better database that includes all the patent data to which the McNair Center has access.
  • More importantly, create documentation of the process so it can be improved upon/replicated in the future.

General Outline - updated 2/21/2017

  • Familiarize myself with SQL, Perl, and database design
  • Familiarize myself with existing scripts and schema for existing database
  • Design a better representation for database
  • Fix scripts if necessary
  • Start moving data into new database by querying existing databases (using SQL)
  • Use scripts to query new data
  • Test database
  • Remove extraneous information from database (copies, patents that we're not interested in, etc.)

Documentation I need to include:

  • Schema of new database (with justification of design), would like to include a visual representation
  • SQL commands that were used to fill database with explanation of what they do
  • Clear instructions on where to find scripts in bulk drive and an explanation of what each script does
  • Visual representation of example table entries that isn't just copied and pasted from a CSV file

Project Pages: Redesigning Patent Database

Log