2017-09-24T04:49:37Z

OliverC:

[[Oliver Chang]] [[Work Logs]]
[[Oliver Chang (Work Log)|(logpage)]]

''To-do List'':

* Expand XPath use in the patent data
* Edit to include Application data
* Finish ID joining
* Look into NIH document similarity algorithm
* Sysadminy stuff

''Projects'':

* [[Reproducible_Patent_Data|Reproducible Patent Data]]
* [[Patent_Validity_Ideas_for_ML|Predictive Patent Validity Machine Learning Ideas]]
* [[Equivalent_XPath_and_APS_Queries|Equivalent XPath and APS Queries]]
* [[US_Address_Verification|US Address Verification]]
* [[GPU_Build|GPU Computer Build]]
* [[Parallel_Enclosing_Circle_Algorithm|Parallel Enclosing Circle Algorithm]]

''Uploads'':

* [[File:PADX-File-Description-v2_Hague.pdf]]
** Describes patent kind codes (notably, what the hell X0 represents)
* [[File:PatentFullTextAPSDoc_GreenBook_pgs13-22.pdf]]
** Describes the fields in APS, their supposed character lengths, and whether they are required or optional
* [[File:Aps-wku-modulus11.pdf]]
** Describes the layout of the check digit on magnetic tape
* [[File:Mod-11-algorithm.pdf]]
** Describes the algorithm used to calculate the check digit

== Day-by-Day (in reverse chronological order) ==

=== September 2017 ===

* Sept 23: make Project/OliverLovesCircles usable and add initial splitting ability
* Sept 22: goal setting & server debugging & meet with Yang

=== August 2017 ===

* Aug 4: set up parallel-instance Python framework for job reporting; begin test run
* Aug 2: finish up some documentation of the code and for the wiki
* Aug 1: discuss with Abhi & Ed alternatives to the Java port, because of algorithmic constants that would be hard to port; run test batches in Python with the addition of equality operators and early stopping on convergence

=== July 2017 ===

* July 31: sketch out parallel enclosing circle algorithm
* July 28: field questions and data cleanup questions from Kerda & Joe & Adrian
* ''travelling''
* July 19: try to remove duplicated records (esp. those with empty titles) which are preventing the addition of a unique constraint
* July 18: run correspondent join on properties and correspondents table to match previous project; sync with Adrian and Abhi
* July 17: redo db operations after cleaning up granted patent number bugs
* July 13: powwow about parallelizing Enclosing Circle Algorithm; sketch out what to do for the rest of the summer; work more on joins
* July 12: generate some example data illustrating the difficulty of joining different tables
* July 11: track down some bugs that occur very rarely and were missed in the initial QA phase
* July 7: catch up on documentation
* July 6: try (unsuccessfully) to understand docid mapping; create exploration scripts
* July 5: add invention title to proper grouping of assignment properties; optimize XML parsing

=== June 2017 ===

* June 30: powwow with James, Abhi, Ed about optimization issues; discuss document ids, X0, etc. with Ed; pinpoint issues with APS doc numbers (see Repro Pat Dat#Gotchas for more info)
* June 29: add logging of copy commands, more chattiness to scripts, debug assignment data failure
* June 28: create examples for expansion to plant, reissue, design patent collection; start optimizing XML
* June 27: write SQL to replicate assignees, extract postcodes for ongoing projects
* June 26: speed up code, abstract in-memory file splitters to avoid repetition and some weird edge cases
* June 25: create mappings for APS, assignment properties, XML 2.5 for data import; run data imports for granted data
* June 23: clean up hacky models with a better set of abstractions; clean up IDE warnings; redefine patent-address mapping
* June 22: create postcode<->patent table
* June 21: document granted patent queries and equivalencies
* June 20: sketch out APS driver; discuss patent id problem; further document, with evidence, the validity of the zipcode data
* June 19: skim address regular expressions; cursory investigation of patent table
* June 16: create method of getting all data into the database, whether it likes it or not; copy over assignments, granted data using new scheme
* June 15: add more robust error reporting, fix race conditions; build out assignment driver; build out fee event driver; add error logging
* June 14: migrate bulk inserts to copy command; refresh on address data and start in on that; convert processor to multi-threaded application
* June 13: spot check SQL tables; fix broken final case endlessly looping; investigate smarter insert methods
* June 12: add XML printer, use it to inspect applications; extend BaseScraper to fetch patent application data; add applications documentation to my project page; add CREATE of other tables
* June 8: add foreign key inserts; create pretty printer for XML analysis
* June 7: finalize DB abstraction layer; migrate code to bulk inserts; upgrade webserver software and do optimization on RDP postgres with Ed
* June 6: add JDBC; create basic schema; add db interaction; schedule meeting for later in the week
* June 5: look into PostgreSQL; refresh on PostGIS; add some notes to the Enclosing Circle Algorithm page
* June 1: add RDP git remote; add more documentation to wiki page; refactor downloader scripts; start creation of tooling for interacting with data

=== May 2017 ===

* May 31: finish copy-pasting attributes into the wiki page; retroactively fill out work log; meet with Ed to discuss next steps
* May 30: update documentation on wiki, restructure large binary files to have more hierarchy instead of a flat listing at the root
* May 29: expand to APS; expand to raw assignment data
* May 27: expand to maintenance fee data
* May 26: create models, translate <code>xmlparser*.pl</code> file into Java; start using builder pattern
* May 25: sketch out OO design of project; download bulk data
* May 24: move wiki pages around; start git repository for project
* May 21: discuss technical details of previous work with Ed
* May 8: clean up dead links on wiki and start reading about previous work; discuss current project status with Ed
* May 4: set up wiki account, rdp account, database training

[[Category:Work Log]]
