Changes

Jump to navigation Jump to search
5,106 bytes added ,  12:56, 7 August 2018
==GeoCoding Companies, Firms, and Branch Offices==
A helpful page here is the [[Geocode.py]] page which explains how to use the Geocoding script. You will have to tweak the Geocode script when geocoding as each of these tables has a different primary key. It is vital that you include the primary keys in the file you input and output from the Geocoding script. Without these, you will not be able to join the latitudes and longitudes back to the firm, branch office, or company base tables.
 
Geocoding costs money since we are using the Google Maps API. The process doesn't cost much, but in order to save money I tried to salvage as many of the preexisting geocode information I could find.
===Companies===
I found the table of old companies with latitudes and longitudes in vcdb2 and loaded these into vcdb3.
 
DROP TABLE oldgeocords;
CREATE TABLE oldgeocords (
coname varchar(255),
statecode varchar(10),
datefirstinv date,
ivestedk real,
city varchar(255),
addr1 varchar(255),
addr2 varchar(100),
latitude numeric,
longitude numeric
);
\COPY oldgeocords FROM 'companybasegeomaster.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--44740
 
The API occasionally will give erroneous latitude and longitude readings. In order to catch only the good ones, I found the latitude and longitude lines that encompass the mainland US and created an exclude flag to make sure companies were in this box. I then created flags to include companies in Puerto Rico, Hawaii, and Alaska.
 
DROP TABLE geoallcoords;
CREATE TABLE geoallcoords AS
SELECT *, CASE
WHEN longitude < -125 OR longitude > -66 OR latitude < 24 OR latitude > 50 OR latitude IS NULL OR longitude IS NULL THEN 1::int ELSE
0::int END AS excludeflag FROM oldgeocords;
--44740
 
DROP TABLE geoallcoords1;
CREATE TABLE geoallcoords1 AS SELECT
*, CASE WHEN statecode='PR' AND latitude!=44.9331 AND longitude!=7.54012 THEN 1::int ELSE 0::int END as prflag,
CASE WHEN statecode='HI' AND latitude!=44.9331 AND longitude!=7.54012 THEN 1::int ELSE 0::int END as hiflag,
CASE WHEN statecode='AK' AND latitude!=44.9331 AND longitude!=7.54012 THEN 1::int ELSE 0::int END as akflag
FROM geoallcoords;
--44740
 
I then included only companies that were either in the mainland US, Hawaii, Alaska, or Puerto Rico.
 
DROP TABLE goodgeoold;
CREATE TABLE goodgeoold AS SELECT
A.*, B.latitude, B.longitude, B.prflag, B.excludeflag, B.hiflag, B.akflag FROM companybasecore AS A LEFT JOIN geoallcoords1 AS B ON
A.coname=B.coname AND A.statecode=B.statecode AND A.datefirstinv=B.datefirstinv WHERE excludeflag=0 or prflag=1 or hiflag=1 or akflag=1;
--38498
 
I then found the remaining companies that needed to be geocoded. Only companies that have addresses listed are able to be accurately geocoded. If we attempt to geocode based on city, the location returned will simply be the center of the city. Thus, I chose the companies that we did not already have and had a valid address.
 
DROP TABLE remaininggeo;
CREATE TABLE remaininggeo AS SELECT A.coname, A.statecode, A.datefirstinv, A.addr1, A.addr2, A.city, A.zip FROM companybasecore AS A LEFT JOIN goodgeoold AS B ON A.coname=B.coname
AND A.statecode=B.statecode AND A.datefirstinv=B.datefirstinv
WHERE B.coname IS NULL AND A.addr1 IS NOT NULL;
--5955
 
\COPY remaininggeo TO 'RemainingGeo.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--5955
 
I copied this table into excel to concatenate the address, city, state, and zipcode columns into one column. This can and should be done in SQL, but I was not aware this could be done. I then ran remaininggeo through the Geocode script with columns coname, statecode, datefirstinv, and address.
 
DROP TABLE remaining;
CREATE TABLE remaining (
coname varchar(255),
statecode varchar(10),
datefirstinv date,
latitude numeric,
longitude numeric
);
\COPY remaining FROM 'RemainingLatLong.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--5955
 
I then ran the same geographical checks on the newly geocoded companies and found all of the good geocodes.
 
DROP TABLE geoallcoords2;
CREATE TABLE geoallcoords2 AS
SELECT *, CASE
WHEN longitude < -125 OR longitude > -66 OR latitude < 24 OR latitude > 50 OR latitude IS NULL OR longitude IS NULL THEN 1::int ELSE
0::int END AS excludeflag FROM remaining;
--5955
 
DROP TABLE geoallcoords3;
CREATE TABLE geoallcoords3 AS
SELECT *, CASE WHEN statecode='PR' AND latitude!=44.9331 AND longitude!=7.54012 THEN 1::int 0::int END as prflag,
CASE WHEN statecode='HI' AND latitude!=44.9331 AND longitude!=7.54012 THEN 1::int ELSE 0::int END as hiflag,
CASE WHEN statecode='AK' AND latitude!=44.9331 AND longitude!=7.54012 THEN 1::int ELSE 0::int END as akflag
FROM geoallcoords2;
--5955
 
DROP TABLE goodgeonew;
CREATE TABLE goodgeonew AS
SELECT A.*, B.latitude, B.longitude, B.prflag, B.excludeflag, B.hiflag, B.akflag FROM companybasecore AS A LEFT JOIN geoallcoords3 AS B ON
A.coname=B.coname AND A.statecode=B.statecode AND A.datefirstinv=B.datefirstinv WHERE excludeflag=0 or prflag=1 or hiflag=1 or akflag=1;
--5913
 
I then combined the old and new geocodes and matched them back to the company base table to get a geo table for companies.
 
DROP TABLE geocodesportco;
CREATE TABLE geocodesportco AS SELECT
A.* from goodgeonew
UNION
SELECT B.* from goodgeoold;
--44411
 
DROP TABLE portcogeo;
CREATE TABLE portcogeo AS SELECT
A.*, B.latitude, B.longitude FROM companybasecore AS A LEFT JOIN Geocodesportco AS B ON A.coname=B.coname AND A.datefirstinv=B.datefirstinv AND A.statecode=B.statecode;
--48001
158

edits

Navigation menu