Now we need to split the files into individual, valid XML files. To do this:
*Move the files to be split into E:/McNair/PatentData/Queue
*Run the command:
*Each file will then be blown out into a directory of XML files in E:/McNair/PatentData/Processed
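The split step above can be sketched in Python. This is purely illustrative of the idea (carving a bulk file that concatenates many XML documents into one file per document, keyed on each XML declaration); the function name, paths, and split marker are assumptions, not the actual tool's logic.

```python
import os

def split_bulk_xml(bulk_path, out_dir, marker='<?xml version'):
    """Split a file of concatenated XML documents into one file per document.

    Assumes each individual document begins with its own XML declaration,
    which is how USPTO bulk grant files are typically concatenated.
    """
    os.makedirs(out_dir, exist_ok=True)
    with open(bulk_path, encoding='utf-8', errors='replace') as f:
        text = f.read()
    # Re-attach the declaration that split() strips from each chunk.
    chunks = [marker + c for c in text.split(marker) if c.strip()]
    paths = []
    for i, chunk in enumerate(chunks):
        path = os.path.join(out_dir, 'doc_%05d.xml' % i)
        with open(path, 'w', encoding='utf-8') as out:
            out.write(chunk)
        paths.append(path)
    return paths
```

Pointing this at a file in Queue and an output directory under Processed would reproduce the behavior described above, one directory of XML files per bulk file.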
The next step is to parse the actual files.
For the patent data files, the existing documentation indicates that PatentParser, found in McNair/Software/Scripts/Patent, has to be run on each XML file that was downloaded and unzipped during the previous step.
It then stores the parsed output in a single text file called "Results.txt" (which will presumably have to be deleted afterward). This script uses the Claim.pm, Inventor.pm, PatentApplication.pm, and Loader.pm modules.
*For future updates to the Perl files, we should update this script so that it can be run on a directory of files, like the parser for the USPTO assignment data.
*It no longer uses the AddressBook.pm module.
*If we have a Perl module for getting the inventor, why do we not have an inventors table in the database? THIS IS A GOOD QUESTION!
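The suggested improvement (running the parser over a whole directory instead of one file at a time) can be sketched as a thin wrapper. Here `parse_one` is a hypothetical stand-in for a single PatentParser invocation; only the directory walk is the point of the example.

```python
import os

def parse_one(xml_path):
    # Stand-in: the real step would invoke PatentParser (and its Perl
    # modules) on this single XML file.
    return os.path.basename(xml_path)

def parse_directory(xml_dir):
    """Run the per-file parser over every .xml file under xml_dir."""
    results = []
    for root, _dirs, files in os.walk(xml_dir):
        for name in sorted(files):
            if name.lower().endswith('.xml'):
                results.append(parse_one(os.path.join(root, name)))
    return results
```

With a wrapper like this, the whole Processed tree could be parsed in one run, matching how the USPTO-assignment parser already works.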
5) This parser will open an ODBC (or similar) connection to a database on the RDP's installation of Postgres and put the data directly into that database. Once complete, we manually move the tables to the dbase server's database (i.e. patent).
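One way to do the manual hand-off is pg_dump piped into psql. This is a sketch only: the host names, user, source database, and table name below are placeholders, not our actual configuration.

```shell
# Copy one parsed table from the RDP's local Postgres into the dbase
# server's patent database. All names here are placeholders.
pg_dump -h localhost -U postgres -t parsed_patents sourcedb \
  | psql -h dbase -U postgres -d patent
```

Repeating this per table (or dumping without -t to move everything) replaces the manual table move described above.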
====For the USPTO Assignment Data====