Changes

Now we need to split the files into individual, valid xml files. To do this:
Move the files to be split into E:/McNair/PatentData/Queue
Go to: E:/McNair/PatentData (this is the version of the splitter to use)
Run the command:
perl splitter.pl
Each file will then be blown out into a directory of xml files in E:/McNair/PatentData/Processed
 
Notes:
*Change line 26 of the script to reflect the year (the name of the directory that you want to put the split files into). The script will then put the split files in that directory, so you don't have to copy them.
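To make the splitting step concrete, here is a minimal Python sketch of the general idea. It is not the logic of splitter.pl, which is not shown on this page. It assumes that each bulk file is a concatenation of XML documents, that every document starts with its own <?xml ...?> declaration at the beginning of a line, and that the output should follow the ipg160322_6745.xml naming pattern mentioned elsewhere on this page. The function names are made up for illustration.

```python
from pathlib import Path


def split_concatenated_xml(text):
    """Return one string per XML document found in `text`.

    Assumption: a new document begins on every line that starts with
    an XML declaration ("<?xml ").
    """
    docs, current = [], []
    for line in text.splitlines():
        if line.startswith("<?xml ") and current:
            # A new declaration closes the document collected so far.
            docs.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        docs.append("\n".join(current))
    return docs


def write_split_files(source, out_root):
    """Write each document of `source` to <out_root>/<stem>/<stem>_<n>.xml.

    The directory layout and file naming are assumptions for this sketch.
    """
    source = Path(source)
    out_dir = Path(out_root) / source.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    docs = split_concatenated_xml(source.read_text(encoding="utf-8"))
    for number, doc in enumerate(docs, start=1):
        target = out_dir / f"{source.stem}_{number}.xml"
        target.write_text(doc + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Each written file then holds exactly one declaration and one document, which is what makes it a valid xml file on its own.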
 
=====xmlparser_4.5_4.4_4.3.pl=====
The next step is to parse the actual files. '''Do not use the perl script PatentParser.pl'''. This script is out of date. Instead, go to E:/McNair/PatentData/Processed and use the perl script called xmlparser_4.5_4.4_4.3.pl by running (for example):
perl xmlparser_4.5_4.4_4.3.pl E:\McNair\PatentData\Processed\2016
This will load the patent data into the database. To use this parser, you will have to move all the files you would like into a directory (ex. 2016, for the files from year 2016). Then pass the path to the directory to the parser and it will parse the data and load it.

Notes:
*The parser will open a connection to a database on the RDP's installation of postgres. It will then put the data directly into this database. Once complete, we manually move the tables to the dbase server's database (i.e. patent).
*The password hint is ''tsn''. You can run pgAdmin III to connect to it.
*The default database is called PatentDB
*If you want to make a new dbase, you'll have to run a sql script to make the tables. It is E:\McNair\Software\Scripts\Patent\createTables.sql
*The script populates an inventors table. We may have failed to move this table over to the production dbase server.
*If you are updating the data, make sure that you don't add duplicate records to the dbase. The easiest way to fix this is to make sure that you don't parse duplicate xml files.

During the last update:
*ipg160322_6745.xml was the last loaded XML file. It covered <doc-number>09295186</doc-number><kind>B2</kind><date>20160322</date>. I confirmed that this was the last file in E:\McNair\PatentData\Processed\ipg160322.
*I also confirmed that this was the highest patent number with the highest grant date in the dbase PatentDB.
*I therefore put every folder from ipg160329 to ipg161227 into the E:\McNair\PatentData\Processed\2016 folder.

Earlier notes on PatentParser.pl:
*Based on the existing documentation, it looks like PatentParser.pl, found in E:/McNair/Software/Scripts/Patent, has to be run on each xml file that was downloaded and unzipped during the previous step. (For future updates to the perl files, we should update this script so that it can be run on a directory of files like the parser for USPTO assignment data).
*It then stores the parsed xml files all in a text file called "Results.txt" (which I assume will have to be deleted afterward). This script utilizes the Claim.pm, Inventor.pm, PatentApplication.pm, and Loader.pm modules. It no longer uses the AddressBook.pm module.
*If we have a perl module for getting the inventor, why do we not have an inventors table in the database? THIS IS A GOOD QUESTION!
*This parser will open an ODBC (or similar) connection to a database on the RDP's installation of postgres. It will then put the data directly into this database. Once complete, we manually move the tables to the dbase server's database (i.e. patent).
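One way to avoid parsing duplicate xml files when updating is to pick only the weekly folders that are newer than the last one loaded. The Python sketch below illustrates that selection. It is not part of the existing scripts: the function names are hypothetical, and it assumes folder names of the form ipgYYMMDD with years in the 2000s.

```python
import re
from datetime import date

# Assumed folder naming: "ipg" followed by a two-digit year, month and day.
FOLDER_NAME = re.compile(r"^ipg(\d{2})(\d{2})(\d{2})$")


def folder_date(name):
    """Return the date encoded in a folder name, or None if it does not fit."""
    match = FOLDER_NAME.match(name)
    if not match:
        return None
    yy, mm, dd = (int(group) for group in match.groups())
    try:
        return date(2000 + yy, mm, dd)  # assumption: all years are 20xx
    except ValueError:
        return None  # digits that do not form a real calendar date


def folders_to_load(names, last_loaded, year):
    """Pick the folders dated after `last_loaded` that fall within `year`."""
    cutoff = folder_date(last_loaded)
    if cutoff is None:
        raise ValueError(f"not an ipgYYMMDD folder name: {last_loaded!r}")
    picked = []
    for name in names:
        found = folder_date(name)
        if found is not None and found > cutoff and found.year == year:
            picked.append(name)
    return sorted(picked)
```

For example, with ipg160322 as the last loaded folder, calling folders_to_load(["ipg160322", "ipg160329", "ipg161227", "ipg170103"], "ipg160322", 2016) keeps only ipg160329 and ipg161227, which matches the range moved into the 2016 folder during the last update.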
====For the USPTO Assignment Data====
