Changes

Jump to navigation Jump to search
*It no longer uses the AddressBook.pm module.
*If we have a perl module for getting the inventor, why do we not have an inventors table in the database? THIS IS A GOOD QUESTION!
 
For the USPTO Assignment Data, the parsing file is called USPTO_Assignee_XML_parser. It takes the path to the files that need to be parsed (an example I think is ":E/PatentData/Processed/year" where
"year" is the name of the folder you've placed the xml files to be parsed. It iterates through all the files in the "year" directory that you passed. This file directly loads the information into the database
while it parses the file.
5) This parser will open an ODBC (or similar) connection to a database on the RDP's installation of postgres. It will then put the data directly into this database. Once complete. we manually move the tables to the dbase server's database (i.e. patent).
For USPTO Assignment Data, there is a script, under McNair/usptoAssignment, called USPTO_Assignee_Download, which lets a user pass it a text file (file ending in.txt) which contains the url(s) of the assignment data that needs to be downloaded. The script then downloads all the zip files available at that URL. An example called BaseUrls.txt (containing the url that you will probably be using to download the assignment data, unless you're downloading the data from this currrent year, which is in a different link) can be found in McNair/usptoAssignment It then places the downloaded zip files in "E:/McNair/usptoAssigneeData/name", where "name" is the name of the file. If you want to check which files have already been processed, check "McNair/usptoAssigneeData/Finished" to see the finished zip files. (In the future, this should be updated, if possible to specify which years to download, since all assignment data that is not from this current year is under one url, and we've already downloaded most of it.)
 
Then, to parse the actual files, do the following:
 
For the USPTO Assignment Data, the parsing file is called USPTO_Assignee_XML_parser. It takes the path to the files that need to be parsed (an example I think is ":E/PatentData/Processed/year" where
"year" is the name of the folder you've placed the xml files to be parsed. It iterates through all the files in the "year" directory that you passed. This file directly loads the information into the database
while it parses the file.
== Specifications of USPTO Data To Extract ==

Navigation menu