The data is described in a USPTO Economic Working Paper by Marco, Myers, Graham and others: https://www.uspto.gov/sites/default/files/documents/USPTO_Patents_Assignment_Dataset_WP.pdf
==Pre-load checks==
The data is large, and the main dbase server doesn't have enough space for it (15G free at the time of writing):
df -h
/dev/nvme1n1p2 235G 208G 15G 94% /var/postgresql
Note: To check dbase space usage on the dbase server see [[Posgres_Server_Configuration#Size.2C_Backup_.26_Restore]].
The postgres dbase on the RDP, however, currently has more than 300GB free and is on a solid-state drive, so its performance should be acceptable.
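The space check above can be scripted. The following is a minimal sketch, assuming a POSIX <code>df</code> and <code>awk</code>; the helper name <code>has_free_gb</code> and the 50 GB threshold are hypothetical and are not part of the original procedure.

```shell
# Hypothetical helper: succeed if the filesystem holding path $1
# has at least $2 GB available.
has_free_gb() {
    path="$1"
    need_gb="$2"
    # df -Pk prints one line per filesystem in 1K blocks;
    # field 4 of the second line is the available space.
    avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Example call (the 50 GB threshold is an assumption, adjust as needed):
# has_free_gb /var/postgresql 50 && echo "enough space" || echo "not enough space"
```

Run it against the mount point that holds the postgres data directory on whichever machine will receive the load.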
==Getting the data==
The data is available pre-processed (see the working paper) from https://bulkdata.uspto.gov/#addt. Specifically, download csv.zip (1284462233, 2017-03-28 15:47) from https://bulkdata.uspto.gov/data/patent/assignment/economics/2016/
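A minimal sketch of the download and a sanity check follows. It assumes that the number listed next to csv.zip is the file size in bytes and that the file sits directly under the directory URL above; the helper name <code>size_matches</code> is hypothetical.

```shell
# Hypothetical helper: succeed if file $1 is exactly $2 bytes long.
size_matches() {
    # wc -c may pad its output with spaces on some systems, so strip them.
    actual=$(wc -c < "$1" | tr -d ' ')
    [ "$actual" -eq "$2" ]
}

# Assumed full URL: csv.zip appended to the directory given above.
# wget https://bulkdata.uspto.gov/data/patent/assignment/economics/2016/csv.zip
# size_matches csv.zip 1284462233 && echo "size OK" || echo "size mismatch"
```

A size mismatch would suggest an incomplete download or that the file has been updated since this page was written.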
The load script is:
LoadUSPTOPAD.sql
To get the data into ASCII or UTF-8, move it to the dbase server, then:
*Check its encoding using:
file -i Car.java
*Convert it to UTF-8 using (the TRANSLIT option approximates characters that can't be directly encoded)