Changes

Jump to navigation Jump to search
Since the city list is not long, I briefly cleaned the list by hand, and stored it in ptoassigneend_us_citylist2.
 
* Actually, we can buy a full list of U.S. cities online: https://www.uscitieslist.org/
Then, similar to Section 4.2, identify clean data that meets all the requirements: postcode_extracted with 5-4 or 5 digits, state not null or not spaces, and city_extracted in ptoasigneend_us_citylist2.
postcode_extracted | text |
city_extracted | text |
 
Remaining records are in table ptoassigneend_us_temp4.
 
One reason for the REMAINING records is that the postcode is missing.
 
So relax the requirements for postcode, and we get clean data (city that is a city, state that is a state) stored in ptoassigneend_us_identify4.
CREATE TABLE ptoassigneend_us_identify4 AS
SELECT *
FROM ptoassigneend_us_temp4
WHERE city_extracted IN (
SELECT citylist
FROM ptoassigneend_us_citylist2) AND
state IS NOT NULL AND state != '';
SELECT 136958
 
Remaining records are in table ptoassigneend_us_temp5 (SELECT 239875).

Navigation menu