Changes

Jump to navigation Jump to search
2,993 bytes added ,  13:47, 21 September 2020
no edit summary
{{Project|Has project output=Tool|Has sponsor=McNair ProjectsCenter
|Has title=US Address Verification
|Has owner=Oliver Chang,
|Has start date=June 2017
|Has deadline=June 2017
|Has project status=ActiveComplete
}}
|+Number of Rows
|-
! scope="col" | Table Name !! scope="col" | count(*) !! scope="col" | Descriptionof Tables in <code>patent</code>
|-
| ptoassigneend
| 5,343
| cleaned up complete addresses; too small to be worthwhile
|-
| june_2017_hotfix
| 3,370,613
| quick and dirty fix to get first 5 postcode digits from ptoassigneend_us_cleaned using heuristics (see [[#A_Stopgap_Measure]])
|-
| june_2017_zipcode_join
| 1,365,408
| hack to join zipcodes to patent ids; see <code>E:\McNair\Projects\SimplerPatentData\src\db\ZipcodeMapAssignmentToPatent.sql</code> for methodology
|}
Moreover, selecting from the tables that <code>ptoassigneend_us_cleaned</code> is derived from did not yield this string. Therefore, there is likely an error in the SQL script, perhaps with some wonky copy-pasting from the internet.
 
==== ...this case is hit when postcode is present but none of the other fields are present ====
 
<nowiki>
patent=# select count(*) from ptoassigneend_us_cleaned where postcode_cleaned = E'£\u009B\u0084Ê\u0082Ò£\u009B\u0084Ë\u
0082';
count
---------
3371136
(1 row)
patent=# select count(*) from ptoassigneend_us_cleaned where postcode_cleaned = E'£\u009B\u0084Ê\u0082Ò£\u009B\u0084Ë\u
0082' and postcode is not null;
count
---------
3371136
(1 row)
 
patent=# select count(*) from ptoassigneend_us_cleaned where postcode_cleaned = E'£\u009B\u0084Ê\u0082Ò£\u009B\u0084Ë\u 0082' and postcode is not null and postcode_addr1 is null and postcode_addr2 is null and postcode_city is null;
count
---------
3371136
(1 row)</nowiki>
==== ...easy gains can be gotten as a quick fix ====
==== ...9 digit postcode works ok (two forms) ====
<nowiki>
patent=# select postcode from ptoassigneend_us_cleaned where postcode_cleaned = E'£\u009B\u0084Ê\u0082Ò£\u009B\u0084Ë\u0082' and char_length(postcode) = 9 limit 12;
postcode
==== ...10 digit postcode works great ====
<nowiki>
patent=# select postcode from ptoassigneend_us_cleaned where postcode_cleaned = E'£\u009B\u0084Ê\u0082Ò£\u009B\u0084Ë\u0082' and char_length(postcode) = 10 limit 12;
postcode
</nowiki>
== Conclusion A Stopgap Measure == As a quick and dirty fix, walk through the cases and do the best we can. The script for that is located at <code>E:\McNair\Projects\SimplerPatentData\src\db\hacks\June2017PostcodeHotfix.sql</code> and it creates the following table:  <nowiki>DROP VIEW IF EXISTS june_2017_postcodecode_hotfix;CREATE VIEW june_2017_postcode_hotfix AS SELECT reelno, frameno, name, city_cleaned as city, GodHelpUsAll(postcode_cleaned, postcode) as postcode FROM ptoassigneend_us_cleaned;</nowiki>  <nowiki>patent=# select count(*) from june_2017_postcode_hotfix ; count--------- 3572605(1 row)
patent=# select count(* Consistent Rows:) from june_2017_postcode_hotfix where postcode is not null;* Inconsistent Rows: count--------- 3370613* Uncertain Rows:(1 row)</nowiki>

Navigation menu