==== NAME ====
uspto_assignees_XML_parser.plx - Parses XML files and populates a database. Specifically, it parses every file in a directory according to a schema (see above), then populates a database on the RDP.
==== SYNOPSIS ====
==== USAGE & FEATURES ====
 
'''Arguments'''
The full path to the directory is provided as a command line argument. The directory should contain the XML files to parse and no other files.
This path should be specified in Windows format (with '\') and NOT in Unix format.
 
'''Features and Effects'''
As each XML file is parsed, a database on the local host (RDP) is populated. If an error occurs at any point, for example if a particular
XML file is bad/invalid or a psql statement cannot be executed, the program aborts with a message.
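
The abort-on-first-error behaviour described above can be sketched as follows. This is a minimal illustration using only core Perl, not the script's actual code: the helper name <code>process_directory</code> and the <code>$handler</code> callback (standing in for the parse-and-insert step) are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: call $handler on every regular file in $dir and
# abort on the first failure, naming the file that caused it.
sub process_directory {
    my ($dir, $handler) = @_;

    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!\n";
    my @files = sort grep { -f File::Spec->catfile($dir, $_) } readdir($dh);
    closedir($dh);

    my $count = 0;
    for my $file (@files) {
        my $path = File::Spec->catfile($dir, $file);
        # eval traps any die() raised by the handler (e.g. invalid XML
        # or a failed database statement).
        my $ok = eval { $handler->($path); 1 };
        die "Aborting on '$file': $@" unless $ok;
        $count++;
    }
    return $count;    # number of files processed successfully
}
```

A caller would pass the directory from the command line and a code reference that parses one file and inserts its rows; any <code>die</code> inside that code reference stops the whole run.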
 
We populate a local database because remote connections are too slow. The database is later moved to the database server manually.
 
==== TESTS ====
The first version works as expected. It was used to populate the assignees database by parsing XML files from the USPTO (see above).
We parsed all XML files dated up to 7/4/2016.
 
==== TO DO ====
*Add more command line options to improve usability.
*Improve portability to allow Unix/Linux pathnames. This is straightforward to do with the Perl modules File::Basename and File::Spec.
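
As a rough sketch of the portability item above, the two core modules can build and split paths without hard-coding '\' or '/'. The helper names <code>xml_path</code> and <code>split_xml_name</code> are hypothetical and are not part of the existing script.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Join a directory and a file name using the host OS's separator.
sub xml_path {
    my ($dir, $name) = @_;
    return File::Spec->catfile($dir, $name);
}

# Split a path into (base name, suffix); only a trailing ".xml"
# (case-insensitive) is treated as a suffix.
sub split_xml_name {
    my ($path) = @_;
    my ($base, $dirs, $suffix) = fileparse($path, qr/\.xml/i);
    return ($base, $suffix);
}
```

Because <code>File::Spec</code> selects its behaviour from the platform Perl is running on, the same code would accept Windows paths on Windows and Unix paths on Unix/Linux.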
