Difference between revisions of "Oliver Chang (Work Log)"

From edegan.com
(remove old todo list)
(transition to new format)

Revision as of 14:32, 9 November 2017

Projects:

Uploads:

Fall 2017

Oliver Chang Work Logs (log page)

2017-10-25: start ingestion of application xml files and deal with all the bugs which accompany that

2017-10-21: create xml explorer script to mass-inspect xpaths (can be found at E:\McNair\Projects\SimplerPatentData\src\main\java\org\bakerinstitute\mcnair\xml_schema_explorer)

2017-10-20: re-run assignment import, review xpaths

2017-10-19: repopulate patents database and create spreadsheet of assignment equivalencies

2017-10-11: add issues to work on to the PECA wiki page; clean up the PECA git repo; start patent application code from the granted patent code and begin customizing it to the new domain

2017-10-03: troubleshoot vc_circles.py and make command line interface a little nicer

2017-10-02: discuss mapping strategies & investigate missing eca data

2017-09-23: make Project/OliverLovesCircles usable and add initial splitting ability

2017-09-22: goal setting & server debugging & meet with Yang
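
Note on the 2017-10-21 entry: the explorer script itself lives at the Java path given there and is not reproduced here. As a rough illustration of what "mass-inspecting xpaths" can mean, the sketch below counts how often each element path occurs across a set of XML documents. It is a minimal Python sketch, not the project's code; the function names and the sample documents are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_xpaths(xml_text):
    """Count how often each element path occurs in one XML document."""
    counts = Counter()

    def walk(element, prefix):
        # Build a simple slash-separated path such as /patent/assignee/name.
        path = f"{prefix}/{element.tag}"
        counts[path] += 1
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def count_xpaths_many(documents):
    """Sum the per-document path counts over an iterable of XML strings."""
    total = Counter()
    for xml_text in documents:
        total.update(count_xpaths(xml_text))
    return total


# Hypothetical sample documents, invented for illustration only.
SAMPLE_DOCS = [
    "<patent><title>A</title><assignee><name>X</name></assignee></patent>",
    "<patent><title>B</title><assignee><name>Y</name><city>Z</city></assignee></patent>",
]

if __name__ == "__main__":
    for path, n in sorted(count_xpaths_many(SAMPLE_DOCS).items()):
        print(n, path)
```

Paths that appear in only a few documents (here `/patent/assignee/city`) are the ones worth a closer look when reviewing import mappings.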


Summer 2017

2017-08-04: set up parallel-instance Python framework for job reporting; begin test run

2017-08-02: finish up some documentation of the code and for the wiki

2017-08-01: discuss alternatives to the Java port with Abhi & Ed, because of algorithmic constants that would be hard to port; run test batches in Python with the addition of equality operators and early stopping on convergence

2017-07-31: sketch out parallel enclosing circle algorithm

2017-07-28: field questions and data cleanup questions from Kerda & Joe & Adrian

2017-07-19: try to remove duplicated records (esp. those with empty titles) which are preventing the addition of a unique constraint

2017-07-18: run correspondent join on properties and correspondents table to match previous project; sync with Adrian and Abhi

2017-07-17: redo db operations after cleaning up granted patent number bugs

2017-07-13: powwow about parallelizing Enclosing Circle Algorithm; sketch out what to do for the rest of the summer; work more on joins

2017-07-12: generate some example data illustrating the difficulty of joining different tables

2017-07-11: track down some bugs that happen very rarely and were missed in the initial qa phase

2017-07-07: catch up on documentation

2017-07-06: try (unsuccessfully) to understand docid mapping; create exploration scripts

2017-07-05: add invention title to proper grouping of assignment properties; optimize XML parsing

2017-06-30: powwow with James, Abhi, Ed about optimization issues; discuss document ids, X0 etc. with Ed; pinpoint issues with APS doc numbers (see Repro Pat Dat#Gotchas for more info)

2017-06-29: add logging of copy commands, more chattiness to scripts, debug assignment data failure

2017-06-28: create examples for expansion to plant, reissue, design patent collection; start optimizing xml

2017-06-27: write SQL to replicate assignees, extract postcodes for ongoing projects

2017-06-26: speed up code; abstract in-memory file splitters to avoid repetition and some weird edge cases

2017-06-25: create mappings for APS, assignment properties, XML 2.5 for data import; run data imports for granted data

2017-06-23: clean up hacky models with a better set of abstractions; clean up IDE warnings; redefine patent-address mapping

2017-06-22: create postcode<->patent table

2017-06-21: document granted patent queries and equivalencies

2017-06-20: sketch out APS driver; discuss patent id problem; further document, with evidence, the validity of the zipcode data

2017-06-19: skim address regular expressions; cursory investigation of patent table

2017-06-16: create method of getting all data into the database, whether it likes it or not; copy over assignments, granted data using new scheme


2017-06-15: add more robust error reporting, fix race conditions; build out assignment driver; build out fee event driver; add error logging

2017-06-14: migrate bulk inserts to copy command; refresh on address data and start in on that; convert processor to multi-threaded application

2017-06-13: spot check SQL tables; fix broken final case endlessly looping; investigate smarter insert methods

2017-06-12: add XML printer, use it to inspect applications; extend BaseScraper to fetch patent application data; add applications documentation to my project page; add CREATE of other tables

2017-06-08: add foreign key inserts; create pretty printer for XML analysis

2017-06-07: finalize DB abstraction layer; migrate code to bulk inserts; upgrade webserver software and do optimization on RDP postgres with Ed

2017-06-06: add jdbc; create basic schema; add db interaction; schedule meeting for later in the week

2017-06-05: look into PostgreSQL; refresh on PostGIS; add some notes to the Enclosing Circle Algorithm page

2017-06-01: add RDP git remote; add more documentation to wiki page; refactor downloader scripts; start creation of tooling for interacting with data

2017-05-31: finish copy-pasting attributes into the wiki page; retroactively fill out work log; meet with Ed to discuss next steps

2017-05-30: update documentation on wiki, restructure large binary files to have more hierarchy instead of a flat listing at the root

2017-05-29: expand to APS; expand to raw assignment data

2017-05-27: expand to maintenance fee data

2017-05-26: create models, translate xmlparser*.pl file into Java; start using builder pattern

2017-05-25: sketch out OO design of project; download bulk data

2017-05-24: move wiki pages around; start git repository for project

2017-05-21: discuss technical details of previous work with Ed

2017-05-08: clean up dead links on wiki and start reading about previous work; discuss current project status with Ed

2017-05-04: set up wiki account and RDP account; database training
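
Note on the 2017-07-19 entry: the log does not record how the duplicated records were removed. One plausible approach, sketched below under stated assumptions, is to keep a single record per key and prefer a copy with a non-empty title, so that a unique constraint can then be added. The field names `doc_id` and `title` and the sample rows are hypothetical, and the real cleanup was presumably done against the database rather than in Python.

```python
def dedupe_records(records, key="doc_id"):
    """Keep one record per key, preferring a record with a non-empty title.

    `records` is a list of dicts; the field names are assumptions made for
    this sketch and do not come from the project's schema.
    """
    best = {}
    for record in records:
        k = record[key]
        current = best.get(k)
        # Take the first record seen for a key, and replace it only if the
        # stored copy has an empty title and the new one does not.
        if current is None or (not current["title"] and record["title"]):
            best[k] = record
    return list(best.values())


# Hypothetical rows, invented for illustration only.
rows = [
    {"doc_id": 1, "title": ""},
    {"doc_id": 1, "title": "Widget"},
    {"doc_id": 2, "title": "Gadget"},
    {"doc_id": 2, "title": "Gadget"},
]

if __name__ == "__main__":
    for row in dedupe_records(rows):
        print(row)
```

After a pass like this, every key occurs once, which is the precondition a unique constraint needs.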