Changes

Redesigning Patent Database

Revision as of 18:40, 14 April 2017

1,067 bytes added , 18:40, 14 April 2017

→‎Current Design and Scripts Documentation

Now we need to split the files into individual, valid xml files. To do this:

Move the files to be split into E:/McNair/PatentData/Queue

Go to: E:/McNair/PatentData (this is the version of the splitter to use)

Run the command:

perl splitter.pl

Each file will then be split out into a directory of xml files in E:/McNair/PatentData/Processed

Notes:

*Change line 26 of the script to reflect the year, i.e. the name of the directory that you want the split files put into. The script will then put the files directly in that directory, so you don't have to copy them.
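The splitting step can be illustrated with a minimal Python sketch. This is not the code of splitter.pl; it assumes that each document inside a bulk file begins with its own <code><?xml ...?></code> declaration, and the function names and the output naming (<code>stem_n.xml</code>) are hypothetical.

```python
from pathlib import Path

# Assumption: every concatenated document starts with its own XML declaration.
XML_MARKER = "<?xml "


def split_concatenated_xml(text):
    """Return the individual XML documents contained in one bulk file."""
    pieces = text.split(XML_MARKER)
    # pieces[0] is whatever precedes the first declaration (normally empty).
    return [XML_MARKER + piece.strip() for piece in pieces[1:] if piece.strip()]


def write_split_files(bulk_path, out_root):
    """Write each document to <out_root>/<stem>/<stem>_<n>.xml and return the count."""
    bulk_path = Path(bulk_path)
    out_dir = Path(out_root) / bulk_path.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    documents = split_concatenated_xml(bulk_path.read_text(encoding="utf-8"))
    for number, document in enumerate(documents, start=1):
        target = out_dir / f"{bulk_path.stem}_{number}.xml"
        target.write_text(document, encoding="utf-8")
    return len(documents)
```

Calling <code>write_split_files</code> on one bulk file would create one sub-directory named after that file and fill it with numbered xml files.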

=====xmlparser_4.5_4.4_4.3.pl=====

The next step is to parse the actual files. '''Do not use the perl script PatentParser.pl'''. This script is out of date. Instead go to E:/McNair/PatentData/Processed and use the perl script called xmlparser_4.5_4.4_4.3.pl by running (for example):

perl xmlparser_4.5_4.4_4.3.pl E:\McNair\PatentData\Processed\2016

This will load the data into the database. To use this parser, you will have to move all the files you would like to parse into a directory (ex. 2016, for the files from year 2016). Then pass the path to the directory to the parser and it will parse the data and load it.

Notes:
*The parser will open a connection to a database on the RDP's installation of postgres. It will then put the data directly into this database. Once complete, we manually move the tables to the dbase server's database (i.e. patent).
*The password hint is ''tsn''. You can run pgAdmin III to connect to it.
*The default database is called PatentDB
*If you want to make a new dbase, you'll have to run a sql script to make the tables. It is E:\McNair\Software\Scripts\Patent\createTables.sql
*The script populates an inventors table. We may have failed to move this table over to the production dbase server.
*If you are updating the data, make sure that you don't add duplicate records to the dbase. The easiest way to avoid this is to make sure that you don't parse duplicate xml files.
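One way to check for duplicates before parsing is to read the publication id out of each xml file and compare it with what is already loaded. The following Python sketch is only an illustration and is not part of the parser. It assumes the id sits in a <code>document-id</code> element under <code>publication-reference</code>, and <code>loaded_doc_numbers</code> is a hypothetical set that would have to be filled from the dbase.

```python
import xml.etree.ElementTree as ET


def publication_id(xml_text):
    """Return (doc-number, kind, date) of the publication, or None if absent.

    Assumption: the id is stored in publication-reference/document-id.
    """
    root = ET.fromstring(xml_text)
    doc_id = root.find(".//publication-reference/document-id")
    if doc_id is None:
        return None
    return (
        doc_id.findtext("doc-number"),
        doc_id.findtext("kind"),
        doc_id.findtext("date"),
    )


def is_already_loaded(xml_text, loaded_doc_numbers):
    """True when the file's doc-number is in the (hypothetical) loaded set."""
    ident = publication_id(xml_text)
    return ident is not None and ident[0] in loaded_doc_numbers
```

Files for which <code>is_already_loaded</code> returns true would be left out of the directory passed to the parser.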

During the last update:
*ipg160322_6745.xml was the last loaded XML file. It covered <doc-number>09295186</doc-number><kind>B2</kind><date>20160322</date>. I confirmed that this was the last file in E:\McNair\PatentData\Processed\ipg160322.
*I also confirmed that this was the highest patent number with the latest grant date in the dbase PatentDB
*I therefore put every folder from ipg160329 to ipg161227 into the E:\McNair\PatentData\Processed\2016 folder
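Picking the folders that still need to be loaded can be sketched in Python as follows. This is a hypothetical helper, not an existing script; it assumes the folders are named <code>ipgYYMMDD</code> and that all dates fall in the same century, so the six digits sort chronologically as plain strings.

```python
import re

# Assumption: weekly folders are named ipg followed by a YYMMDD date.
FOLDER_NAME = re.compile(r"^ipg(\d{6})$")


def folders_after(folder_names, last_loaded):
    """Return the ipgYYMMDD folders dated after last_loaded, oldest first."""
    last_match = FOLDER_NAME.match(last_loaded)
    if last_match is None:
        raise ValueError(f"not an ipgYYMMDD folder name: {last_loaded}")
    last_key = last_match.group(1)
    newer = []
    for name in folder_names:
        match = FOLDER_NAME.match(name)
        # Names that do not follow the pattern (e.g. "2016") are ignored.
        if match and match.group(1) > last_key:
            newer.append(name)
    return sorted(newer)
```

With <code>last_loaded</code> set to ipg160322, the function keeps only the later weekly folders, which are the ones that would be moved into the 2016 folder.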

5) This parser will open an ODBC (or similar) connection to a database on the RDP's installation of postgres. It will then put the data directly into this database. Once complete, we manually move the tables to the dbase server's database (i.e. patent).

====For the USPTO Assignment Data====

Ed

