Changes

Jump to navigation Jump to search
1,798 bytes added ,  12:11, 25 July 2017
==Name Based Matching geo keys to companybase keys==
Get a list of geokeys and companybasekeys and run them through the [[The Matcher]]. The key is (coname, city, startyear) so you'll need to extract the year from the datefirstinv from the companybasecore table. See below.
DROP TABLE geokeys;
CREATE TABLE geokeys AS
SELECT coname, city, startyear
FROM geocore
WHERE noaddress = 0::boolean;
--33628
\COPY geokeys TO 'geokeys.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
 
CREATE TABLE portcokeysforgeo AS
SELECT coname, city, EXTRACT(YEAR FROM datefirstinv)
FROM companybasecore;
--44740
\COPY portcokeysforgeo TO 'portcokeysforgeo.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
After you run the matcher you will notice there are a ton of matching errors as usual. If you simply import this into your vcdb2 and try joining companybasecore with geocore your tables will start to explode. Notice how the line count jumps from 44,740 to 45,018.
DROP TABLE matcherportcogeo;
CREATE TABLE matcherportcogeo (
portcoconame varchar(255),
portcocity varchar(100),
portcostartyear integer,
geoconame varchar(255),
geocity varchar(100),
geodatefirstyear integer
);
\COPY matcherportcogeo FROM 'matcheroutputportcogeo.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--33608
--try matching companybase to geo through the matcherportcogeo
CREATE TABLE companybasecorejoingeo AS
SELECT c.coname, c.statecode, c.datefirstinv, c.investedk, c.city, c.addr1, c.addr2, g.lattitude, g.longitude
FROM companybasecore c
LEFT JOIN matcherportcogeo AS m ON m.portcoconame = c.coname AND m.portcocity = c.city AND m.portcostartyear = EXTRACT(YEAR FROM
c.datefirstinv)
LEFT JOIN geocore AS g ON g.coname = m.geoconame AND m.geocity = g.city AND m.geodatefirstyear = g.startyear;
--45018
Okay so we need to fix this.
==Gathering geo data from company addresses==

Navigation menu