Changes

ptoassigneend_us_identify4                   | 38
ptoassigneend_us_temp5                       | 376797
---------------------------------------------|-------------
ptoassigneend_us_identify_subtotal           | 3195769
---------------------------------------------|-------------
ptoassigneend_us_candid2 (postcode is clean) | 184123
Take the union of ptoassigneend_us_identify0 through ptoassigneend_us_identify4 to get ptoassigneend_us_identify_subtotal (3195769 records) with clean city, state and postcode. The remaining 10.5% of records are left in ptoassigneend_us_temp5.
Table "public.ptoassigneend_us_identify_subtotal"
postcode_cleaned | text |
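The union step and the 10.5% figure can be sketched as follows. This is a minimal illustration, not the original query: it uses an in-memory SQLite database, the short table names and the `id` column are hypothetical stand-ins, and the page does not say whether `UNION` or `UNION ALL` was used. The percentage assumes the denominator is the subtotal plus the rows left in ptoassigneend_us_temp5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for ptoassigneend_us_identify0..4 (one row each).
# The "id" column is hypothetical; only postcode_cleaned appears in the source.
for i in range(5):
    cur.execute(f"CREATE TABLE identify{i} (id INTEGER, postcode_cleaned TEXT)")
    cur.execute(f"INSERT INTO identify{i} VALUES (?, ?)", (i, f"9000{i}"))

# UNION ALL keeps every row; a plain UNION would also drop duplicate rows.
union_sql = " UNION ALL ".join(f"SELECT * FROM identify{i}" for i in range(5))
cur.execute(f"CREATE TABLE identify_subtotal AS {union_sql}")
subtotal_rows = cur.execute("SELECT COUNT(*) FROM identify_subtotal").fetchone()[0]

# Share of records still unresolved, using the counts reported on this page.
identified = 3_195_769  # ptoassigneend_us_identify_subtotal
remaining = 376_797     # ptoassigneend_us_temp5
share_left = remaining / (identified + remaining)
print(f"{share_left:.1%}")
```

With the reported counts, the share comes out to roughly 10.5%, which matches the figure quoted above.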
* ptoassigneend_us_candid1 is a subset of ptoassigneend_us_temp5. It contains clean city and state info, but the postcode is missing. 6.7% of the data is left in ptoassigneend_us_temp6.
* ptoassigneend_us_candid2 is also a subset of ptoassigneend_us_temp5. It contains clean postcode info. 5.0% of the data is left in ptoassigneend_us_temp7. I randomly checked city_extracted in ptoassigneend_us_candid2, and it is actually quite clean. But since these cities don't exist in ptoassigneend_us_citylist2, we have no way yet to identify the clean records. Maybe we can restrict the length of the city_extracted values to pick out the clean cities.
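The length restriction suggested for city_extracted could look something like the sketch below. The bounds (3 to 25 characters), the character check, and the sample values are all assumptions made for illustration; they are not taken from the data and would need tuning against real records.

```python
import re

# Hypothetical rule: letters, spaces, periods, apostrophes and hyphens only.
CITY_PATTERN = re.compile(r"[A-Za-z][A-Za-z .'\-]*")

def plausible_city(value, min_len=3, max_len=25):
    """Return True if value has a city-like length and character set."""
    text = value.strip()
    if not (min_len <= len(text) <= max_len):
        return False
    return CITY_PATTERN.fullmatch(text) is not None

# Made-up examples of what city_extracted might contain.
samples = [
    "PALO ALTO",
    "ST. LOUIS",
    "SUITE 400 1234 MAIN STREET BUILDING C SAN JOSE",
    "CA",
]
clean = [s for s in samples if plausible_city(s)]
```

A filter like this only narrows the candidates. Values that pass would still need a spot check, since a short string is not necessarily a real city.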
Note:
About 60 records are missing. For example, the number of records in ptoassigneend_us_temp plus the number of records in ptoassigneend_us_identify0 does not equal the number of records in ptoassigneend_allus.
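One way to track down the missing records is to list the rows of the full table that appear in neither split. The sketch below shows the idea on toy data in an in-memory SQLite database. The `id` key column and the shortened table names are hypothetical, since the page does not say which column identifies a record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for ptoassigneend_allus, ptoassigneend_us_temp and
# ptoassigneend_us_identify0; "id" is an assumed key column.
cur.executescript("""
CREATE TABLE allus (id INTEGER);
CREATE TABLE us_temp (id INTEGER);
CREATE TABLE us_identify0 (id INTEGER);
INSERT INTO allus VALUES (1), (2), (3), (4), (5);
INSERT INTO us_identify0 VALUES (1), (2);
INSERT INTO us_temp VALUES (3), (4);
""")

def count(table):
    return cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# Difference in counts: rows unaccounted for after the split.
gap = count("allus") - (count("us_temp") + count("us_identify0"))

# Rows of allus that ended up in neither table.
missing = [row[0] for row in cur.execute("""
    SELECT id FROM allus
    EXCEPT SELECT id FROM us_temp
    EXCEPT SELECT id FROM us_identify0
    ORDER BY id
""")]
```

If the count gap is larger than the number of rows returned by the `EXCEPT` query, the remaining difference may come from duplicate keys rather than dropped rows.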
