Difference between revisions of "Oliver Chang (Work Log)"
Latest revision as of 23:19, 2 December 2017
Projects:
- PostGIS Installation
- Reproducible Patent Data
- Predictive Patent Validity Machine Learning Ideas
- Equivalent XPath and APS Queries
- US Address Verification
- GPU Computer Build
- Parallel Enclosing Circle Algorithm
- Python on the RDP
- Hierarchical Clustering
Uploads:
- File:Path-example.PNG
- Screenshot of how to set the path environment variable
- File:PADX-File-Description-v2 Hague.pdf
- Describes patent kind codes (notably, what the hell X0 represents)
- File:PatentFullTextAPSDoc GreenBook pgs13-22.pdf
- Describes the fields in APS, their supposed character lengths, and if they are required/optional
- File:Aps-wku-modulus11.pdf
- Describes the layout of the check digit on magnetic tape
- File:Mod-11-algorithm.pdf
- Describes the algorithm used to calculate the check digit
Fall 2017
Oliver Chang Work Logs (log page)
2017-12-02: communicated results of running ECA on 2 stddev table (not the error source); update db and web server software; check if scipy elision is ECA bug (it is not)
2017-12-01: re-tasked Kyran with the hierarchical clustering algorithm implementation; create 2 standard deviations tables; freed up DB space
2017-11-30: stub out implementation, add parsing code and mapping code...just need the meat of the algorithm now
2017-11-27: documentation & bug finding on the parallel enclosing circle project; research hierarchical linkage approach
2017-11-14: hand off work on xpathing to Shelby; walked through some program design decisions
2017-11-13: finish javadoc of common/ and some trickier parts about downloading; added descriptions, results to one-off java/python scripts so that they actually make sense in context
2017-11-10: add javadoc documentation to patent reproducibility project after forgetting half of the stuff myself
2017-10-25: start ingestion of application xml files and deal with all the bugs which accompany that
2017-10-21: create xml explorer script to mass-inspect xpaths (can be found at E:\McNair\Projects\SimplerPatentData\src\main\java\org\bakerinstitute\mcnair\xml_schema_explorer)
2017-10-20: re-run assignment import, review xpaths
2017-10-19: repopulate patents database and create spreadsheet of assignment equivalencies
2017-10-11: add issues to work on to the PECA wiki page; clean up PECA git repo; start patent application code from granted patent code and start customizing to new domain
2017-10-03: troubleshoot vc_circles.py and make command line interface a little nicer
2017-10-02: discuss mapping strategies & investigate missing eca data
2017-09-23: make Project/OliverLovesCircles usable and add initial splitting ability
2017-09-22: goal setting & server debugging & meet with Yang
Summer 2017
2017-08-04: set up parallel instance python framework for job reporting; begin test run
2017-08-02: finish up some documentation of the code and for the wiki
2017-08-01: discuss alternatives to the Java port with Abhi & Ed, because of algorithmic constants that would be hard to port; run test batches on python with addition of equality operators and convergence early stopping
2017-07-31: sketch out parallel enclosing circle algorithm
2017-07-18: field questions and data cleanup questions from Kerda & Joe & Adrian
2017-07-19: try to remove duplicated records (esp. those with empty titles) which are preventing the addition of a unique constraint
2017-07-18: run correspondent join on properties and correspondents table to match previous project; sync with Adrian and Abhi
2017-07-17: redo db operations after cleaning up granted patent number bugs
2017-07-13: powwow about parallelizing Enclosing Circle Algorithm; sketch out what to do for the rest of the summer; work more on joins
2017-07-12: generate some example data illustrating the difficulty of joining different tables
2017-07-11: track down some bugs that happen very rarely and were missed in the initial qa phase
2017-07-07: catch up on documentation
2017-07-06: try (unsuccessfully) to understand docid mapping...create exploration scripts
2017-07-05: add invention title to proper grouping of assignment properties; optimize XML parsing
2017-06-30: powwow with James, Abhi, Ed about optimization issues; discuss document ids, X0 etc with Ed; pinpoint issues with APS doc numbers (see Repro Pat Dat#Gotchas for more info)
2017-06-29: add logging of copy commands, more chattiness to scripts, debug assignment data failure
2017-06-28: create examples for expansion to plant, reissue, design patent collection; start optimizing xml
2017-06-27: write SQL to replicate assignees, extract postcodes for ongoing projects
2017-06-26: speed up code, abstract in-memory file splitters to avoid repetition and some weird edge cases
2017-06-25: create mappings for APS, assignment properties, XML 2.5 for data import; run data imports for granted data
2017-06-23: clean up hacky models with a better set of abstractions; clean up IDE warnings; redefine patent-address mapping
2017-06-22: create postcode<->patent table
2017-06-29: document granted patent queries and equivalencies
2017-06-20: sketch out APS driver; discuss patent id problem; further document, with evidence, the validity of the zipcode data
2017-06-19: skim address regular expressions; cursory investigation of patent table
2017-06-16: create method of getting all data into the database, whether it likes it or not; copy over assignments, granted data using new scheme
2017-06-15: add more robust error reporting, fix race conditions; build out assignment driver; build out fee event driver; add error logging
2017-06-14: migrate bulk inserts to copy command; refresh on address data and start in on that; convert processor to multi-threaded application
2017-06-13: spot check SQL tables; fix broken final case endlessly looping; investigate smarter insert methods
2017-06-12: add XML printer, use it to inspect applications; extend BaseScraper to fetch patent application data; add applications documentation to my project page; add CREATE of other tables
2017-06-08: add foreign key inserts; create pretty printer for XML analysis
2017-06-07: finalize DB abstraction layer; migrate code to bulk inserts; upgrade webserver software and do optimization on RDP postgres with Ed
2017-06-06: add jdbc; create basic schema; add db interaction; schedule meeting for later in the week
2017-06-05: look into postgresql; refresh on postgis; add some notes to the Enclosing Circle Algorithm page
2017-06-01: add RDP git remote; add more documentation to wiki page; refactor downloader scripts; start creation of tooling for interacting with data
2017-05-31: finish copy-pasting attributes into the wiki page; retroactively fill out work log; meet with Ed to discuss next steps
2017-05-30: update documentation on wiki, restructure large binary files to have more hierarchy instead of a flat listing at the root
2017-05-29: expand to APS; expand to raw assignment data
2017-05-27: expand to maintenance fee data
2017-05-26: create models, translate xmlparser*.pl file into Java; start using builder pattern
2017-05-25: sketch out OO design of project; download bulk data
2017-05-24: move wiki pages around; start git repository for project
2017-05-21: discuss technical details of previous work with Ed
2017-05-08: clean up dead links on wiki and start reading about previous work; discuss current project status with Ed
2017-05-04: set up wiki account, rdp account, database training