
postcode_cleaned | text |
====Output: ptoassigneend_us_candid1====
One problem with the records in ptoassigneend_us_temp5 is that the postcode is missing.

ptoassigneend_us_candid1 is a subset of ptoassigneend_us_temp5. It contains clean city and state info, but postcode is missing.
SQL code:
 CREATE TABLE ptoassigneend_us_candid1 AS
 SELECT *
 FROM ptoassigneend_us_temp5
 WHERE city_extracted2 IN (SELECT citylist FROM ptoassigneend_us_citylist2)
   AND state IS NOT NULL
   AND state != '';
 SELECT 136958
Remaining records are in table ptoassigneend_us_temp6 (SELECT 239837).

====Output: ptoassigneend_us_candid2====
ptoassigneend_us_candid2 is also a subset of ptoassigneend_us_temp5. It contains clean postcode info, but city and state are not identified.

I randomly checked the city_extracted values in ptoassigneend_us_candid2, and they are quite clean. Some city records are misspelt, such as 'Oklahama City'. We may be able to identify clean city records based on record length.

Note: About 60 records are missing. For example, the number of records in ptoassigneend_us_temp plus the number in ptoassigneend_us_identify0 does not equal the number in ptoassigneend_allus.
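The record-count discrepancy described in the note could be checked with a query along the following lines. This is only a sketch: it assumes that ptoassigneend_us_temp and ptoassigneend_us_identify0 were both carved directly out of ptoassigneend_allus with no overlap, which the page does not state explicitly. The column aliases (n_allus, n_temp, n_identify0, n_missing) are made up for illustration.

```sql
-- Sketch: compare the size of the source table with the two tables
-- that are assumed to partition it. Aliases are hypothetical.
SELECT
    (SELECT COUNT(*) FROM ptoassigneend_allus)        AS n_allus,
    (SELECT COUNT(*) FROM ptoassigneend_us_temp)      AS n_temp,
    (SELECT COUNT(*) FROM ptoassigneend_us_identify0) AS n_identify0,
    -- A nonzero value here means some records were dropped
    -- (or duplicated, if negative) along the way.
    (SELECT COUNT(*) FROM ptoassigneend_allus)
      - (SELECT COUNT(*) FROM ptoassigneend_us_temp)
      - (SELECT COUNT(*) FROM ptoassigneend_us_identify0) AS n_missing;
```

If n_missing is nonzero, comparing the filter conditions used to build the two tables (for example, how NULL values are handled) may show where the records were lost.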
