{{Project|Has project output=Data|Has sponsor=McNair Center
|Has title=USPTO Patent Assignment Dataset
|Has owner=Ed Egan,
LoadUSPTOPAD.sql
To get the data into UTF-8 or ASCII, move it to the dbase server then:
*Check its encoding using:
file -i Car.java
*Convert it to UTF-8 using the following (the //TRANSLIT option approximates characters that can't be directly encoded):
iconv -f oldformat -t UTF-8//TRANSLIT file -o outfile
*The -sc options make iconv drop characters it cannot convert (-c) and suppress warnings (-s), so it moves on past bad characters:
iconv -sc -f oldformat -t UTF-8//TRANSLIT file -o outfile
*Bash scripts to convert all of the csvs are in Z:\USPTO_assigneesdata; make them executable and then run whichever you need
chmod +x encoding.sh
./encoding.sh
*Note that the final source encoding was Win1252 and the final target encoding was ASCII
*All bar three of the files had to be manually fixed to remove errors. Final files are in E:\McNair\Projects\USPTO Patent Assignment Dataset
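
The batch step above could look roughly like the sketch below. This is not the original encoding.sh, whose contents are not shown here. The function name convert_csvs, the directory arguments, and the script name convert.sh are hypothetical, and the encodings are assumed from the note above (Win1252 source, ASCII target). The exact replacement characters that //TRANSLIT produces depend on the iconv build and locale.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a batch conversion script; NOT the original encoding.sh.
# Assumes Windows-1252 input and ASCII output, as stated in the notes above.

convert_csvs() {
    local src_dir="$1"   # directory holding the original .csv files
    local out_dir="$2"   # directory that receives the converted copies
    local f
    mkdir -p "$out_dir"
    for f in "$src_dir"/*.csv; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # -c drops characters that cannot be converted;
        # //TRANSLIT approximates the ones that have a close ASCII equivalent.
        if ! iconv -c -f WINDOWS-1252 -t ASCII//TRANSLIT "$f" > "$out_dir/$(basename "$f")"; then
            # A non-zero exit status flags a file worth checking by hand.
            echo "check manually: $f" >&2
        fi
    done
}

# Usage: ./convert.sh <src_dir> <out_dir>
if [ "$#" -eq 2 ]; then
    convert_csvs "$1" "$2"
fi
```

Writing the converted copies to a separate directory leaves the originals untouched, which helps when some files still need manual fixes afterwards.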
