Difference between revisions of "Patent Design Main Page"

McNair Project
Patent Design Main Page
Project Information
Project Title	Patent Design Main Page
Start Date
Deadline
Keywords	patent
Primary Billing
Notes
Has project status	Active
Is dependent on	Reproducible Patent Data

Revision as of 16:19, 2 November 2017

This is the main page for all the research, work, and design that has been put into working with the patent data. As of Fall 2017, Oliver Chang, Joe Reilly, and Shelby Bice are working on three tasks: designing new patent and assignment databases (Shelby), creating a new parser and scripts to pull and parse the data (Oliver), and identifying all the paths within the XML files that lead to data that should be included in the databases (all three).
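To illustrate what identifying the paths within an XML file can look like, here is a minimal sketch that counts every distinct root-to-element path in a document. It assumes Python's standard xml.etree.ElementTree module and is not Oliver's parser. The function name element_paths and the sample record are hypothetical, and the tag names are made up rather than taken from the USPTO schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(xml_text):
    """Count every distinct root-to-element path in an XML string."""
    counts = Counter()

    def walk(node, prefix):
        # Build the path for this element, then recurse into its children.
        path = f"{prefix}/{node.tag}"
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


# Hypothetical sample record; the tags are illustrative, not the USPTO schema.
SAMPLE = """<patent>
  <title>Widget</title>
  <inventors>
    <inventor><name>A. Smith</name></inventor>
    <inventor><name>B. Jones</name></inventor>
  </inventors>
</patent>"""

paths = element_paths(SAMPLE)
for path, count in sorted(paths.items()):
    print(path, count)
```

A listing like this makes it easier to go through the paths one by one and decide which of them should map to a column in the new databases.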

There has been a lot of work on storing information about the patents in databases, including how to clean the data and which data to include. Some of it is obsolete and some of it is incorrect. Generally, the newer pages are the most relevant, but it can be helpful to see what was done in the past, especially since some of the methodology (like cleaning the data) hasn't changed much.

Shelby's Work

Most recent work (Redesigning the whole patent database and assignment database): This project page details my design for a new patent database and a new assignment database. http://mcnair.bakerinstitute.org/wiki/Redesign_Assignment_and_Patent_Database

Older work from when we were going to modify the existing database: http://mcnair.bakerinstitute.org/wiki/Redesigning_Patent_Database It has a lot of information on the methodology we previously tried to use to update the database, including our frustration with the Perl scripts that pulled and parsed the data, which eventually led to Oliver Chang writing new scripts for us.

Earlier work done by a former McNair Center intern, now subsumed by my current project. It includes information on how the assignment database should be formatted: http://mcnair.bakerinstitute.org/wiki/Patent_Assignment_Data_Restructure

Here are some somewhat outdated pages describing aspects of the patent data:

http://mcnair.bakerinstitute.org/wiki/Patent_Data - this page details the purpose of the patent data and its original sources. For the current work, we are only pulling data from the USPTO.
http://mcnair.bakerinstitute.org/wiki/Patent - overview of the old schema for one of the patent databases. Not very helpful beyond showing how the data was broken down into tables.
http://mcnair.bakerinstitute.org/wiki/Patent_Data_Processing_-_SQL_Steps - originally, patent data came from two sources: the Harvard Dataverse and the USPTO. Because of this, a lot of work had to be done to merge the two datasets after they had been parsed and to format them into tables. This page provides details on that process, including the SQL scripts for creating tables and inserting the patent data into them.
http://mcnair.bakerinstitute.org/wiki/USPTOAssigneesData - old schema for the assignment database. This has been subsumed by my current design for an assignment database.
http://mcnair.bakerinstitute.org/wiki/Patent_Assignment_Data_Restructure - this details the process for cleaning up patent data and restructuring it (parsing addresses from long address strings, for example).
http://mcnair.bakerinstitute.org/wiki/Patent_Data_Issues - this lists some issues with the older patent database. It might be a good resource in the future for cleaning the data in the new database, since some of the same problems might arise.
