Edit Project: USPTO Patent Assignment Dataset

This project describes the build-out and basic use of the USPTO Patent Assignment Dataset. The data, scripts, etc. are in:
 E:\McNair\Projects\USPTO Patent Assignment Dataset

The data is described in a USPTO Economic Working Paper by Marco, Myers, Graham and others: https://www.uspto.gov/sites/default/files/documents/USPTO_Patents_Assignment_Dataset_WP.pdf

==Pre-load checks==

The data is large, and we don't have space for it on the main dbase server:
 df -h
 /dev/nvme1n1p2  235G  208G   15G  94% /var/postgresql

Note: To check dbase space usage on the dbase server see [[Posgres_Server_Configuration#Size.2C_Backup_.26_Restore]].

The postgres dbase on the RDP, however, currently has more than 300GB free and is on a solid state drive, so its performance should be acceptable.

==Getting the data==

The data is available pre-processed (see the working paper) from https://bulkdata.uspto.gov/#addt. Specifically, download csv.zip (1284462233, 2017-03-28 15:47) from https://bulkdata.uspto.gov/data/patent/assignment/economics/2016/

The load script is:
 LoadUSPTOPAD.sql

To get the data into UTF-8 or ASCII, move it to the dbase server, then:
*Check its encoding using (replace Car.java with the name of the file to check):
 file -i Car.java
*Convert it to UTF-8 using (the TRANSLIT option approximates characters that can't be directly encoded):
 iconv -f oldformat -t UTF-8//TRANSLIT file -o outfile
**The -s and -c options make iconv suppress warnings, drop characters it cannot convert, and move on:
 iconv -sc -f oldformat -t UTF-8//TRANSLIT file -o outfile
*Bash scripts to convert all of the csvs are in Z:\USPTO_assigneesdata; make them executable and then run whichever you need:
 chmod +x encoding.sh
 ./encoding.sh
*Note that the final source encoding was Win1252 and the final target encoding was ASCII.
*All bar three of the files had to be manually fixed to remove errors. Final files are in E:\McNair\Projects\USPTO Patent Assignment Dataset
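The conversion step for a whole directory of csvs can be sketched as below. This is a hypothetical illustration only, not the contents of the project's encoding.sh, which is not reproduced on this page. The function name convert_csvs, its arguments, and the default source encoding name WINDOWS-1252 (taken from the Win1252 note above) are assumptions. Output is written by redirection rather than with -o, and a failed conversion is reported rather than aborting the loop, since some files needed manual fixes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: convert every csv in a directory to ASCII.
# Not the project's actual encoding.sh; names and defaults are assumptions.

# convert_csvs SRC_DIR OUT_DIR [FROM_ENCODING]
convert_csvs() {
    local src_dir="$1"
    local out_dir="$2"
    local from="${3:-WINDOWS-1252}"   # assumed source encoding (Win1252)
    local f name

    mkdir -p "$out_dir"
    for f in "$src_dir"/*.csv; do
        [ -e "$f" ] || continue       # the glob matched nothing
        name="$(basename "$f")"
        # -c drops characters that cannot be converted;
        # //TRANSLIT approximates them where an approximation exists.
        if ! iconv -c -f "$from" -t ASCII//TRANSLIT "$f" > "$out_dir/$name"; then
            echo "warning: iconv reported a problem with $name" >&2
        fi
    done
}
```

For example, after sourcing the file, convert_csvs ./csv ./csv_ascii would write converted copies into ./csv_ascii and leave the originals untouched, so any file that still needs manual fixing can be compared against its source.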
