Note: the zip files should appear briefly (one at a time) in E:/McNair/Software/Scripts/Patent before disappearing and reappearing, unzipped, in E:/McNair/PatentData.
 
3b) Now we need to split the files into individual, valid XML files. To do this:
Move the files to be split into E:/McNair/PatentData/Queue
Run the command:
perl splitter.pl
Each file will then be blown out into a directory of XML files in E:/McNair/PatentData/Processed.
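USPTO bulk files concatenate many XML documents into one file, each document starting with its own <?xml declaration, which is why a split step is needed before parsing. The following Python sketch illustrates the kind of split splitter.pl likely performs; the function name, output-file naming, and directory layout here are assumptions, not the script's actual interface:

```python
import os

def split_bulk_xml(bulk_path, out_dir):
    """Split a concatenated USPTO bulk file into individual XML files.

    Each embedded document begins with its own '<?xml' declaration,
    so a new output file is started every time one is seen.
    (Illustrative sketch only; splitter.pl is the real implementation.)
    """
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    out = None
    with open(bulk_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.startswith("<?xml"):  # start of a new document
                if out:
                    out.close()
                count += 1
                out = open(os.path.join(out_dir, f"doc_{count:06d}.xml"),
                           "w", encoding="utf-8")
            if out:
                out.write(line)
    if out:
        out.close()
    return count
```

Running this on one bulk file yields a directory of valid, individually parseable XML files, matching the "blown out into a directory of xml files" behavior described above.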
4) The next step would be to parse the actual files.
For the patent data files, based on the existing documentation, it looks like PatentParser, found in McNair/Software/Scripts/Patent, has to be run on each XML file that was downloaded and unzipped during the previous step.
*It stores all of the parsed output in a single text file called "Results.txt" (which I assume will have to be deleted afterward). This script uses the Claim.pm, Inventor.pm, PatentApplication.pm, and Loader.pm modules.
*It no longer uses the AddressBook.pm module.
*If we have a Perl module for getting the inventor, why do we not have an inventors table in the database? THIS IS A GOOD QUESTION!
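To make the per-file parse step concrete, here is a minimal Python sketch of extracting a couple of fields from one split XML file and appending them, tab-separated, to Results.txt. The tag names (doc-number, invention-title) are assumptions based on common USPTO grant XML; the real PatentParser extracts far more via its Claim, Inventor, and PatentApplication modules:

```python
import xml.etree.ElementTree as ET

def parse_patent(xml_path, results_path="Results.txt"):
    """Pull a few fields from one patent XML file and append them,
    tab-separated, to Results.txt.
    (Sketch only; tag names are assumed, not taken from PatentParser.)
    """
    root = ET.parse(xml_path).getroot()

    def text(tag):
        node = root.find(f".//{tag}")
        return node.text.strip() if node is not None and node.text else ""

    row = [text("doc-number"), text("invention-title")]
    with open(results_path, "a", encoding="utf-8") as out:
        out.write("\t".join(row) + "\n")
    return row
```

Because every file appends to the same Results.txt, running the parser over a whole Processed directory accumulates one flat, loadable text file, which is consistent with the note above about deleting it between runs.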
For the USPTO Assignment Data, the parsing file is called USPTO_Assignee_XML_parser. It takes the path to the files that need to be parsed (an example, I think, is E:/PatentData/Processed/year) and works through them one at a time while it parses.
5) For the patent data files, this parser will open an ODBC (or similar) connection to a database on the RDP's installation of Postgres. It will then put the data, I assume, directly into this database. Once complete, we manually move the tables to the dbase server's database (a PostgreSQL database, i.e. patent). The next step would be to create a table from the text file, possibly using CreateTables.
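The load step above amounts to turning the tab-delimited Results.txt into a Postgres table. As a sketch of what a CreateTables-style step might execute, the helper below builds the two SQL statements involved; the table name, column names, and all-text column types are assumptions for illustration, not the actual schema:

```python
def load_results_sql(table, columns, results_path="Results.txt"):
    """Build the CREATE TABLE and COPY statements that would load a
    tab-delimited results file into Postgres.
    (Sketch only; real table/column names and types are assumptions.)
    """
    cols = ", ".join(f"{c} text" for c in columns)
    create = f"CREATE TABLE {table} ({cols});"
    # Postgres text format defaults to tab-delimited, matching Results.txt.
    copy = f"COPY {table} FROM '{results_path}' WITH (FORMAT text);"
    return create, copy
```

These statements would then be run over the ODBC (or psql) connection mentioned above; keeping the load as plain SQL also makes the later manual move of tables to the dbase server's patent database straightforward to script.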
== Specifications of USPTO Data To Extract ==
