Tiger Geocoder

Revision as of 16:18, 31 October 2017 by Peterjalbert (talk | contribs)


McNair Project
Tiger Geocoder
Project Information
Project Title: Tiger Geocoder
Owner: Peter Jalbert
Start Date: Fall 2017
Keywords: Tiger, Geocoder, Database
Copyright © 2016 edegan.com. All Rights Reserved.


This page documents how to use the Tiger Geocoder on PostgreSQL, where it is available as part of the PostGIS extension. The following wiki pages may also be of use to you:

PostGIS Installation
Database Server Documentation

The official documentation for installing and using the Tiger Geocoder can be found on the following pages:

General Instructions
Installation Instructions
Geocoder Documentation

Location

The data is currently loaded into a PostgreSQL database called geocoder. The tables contain the geocoding information, and there is a test table called "coffeeshops" that contains addresses of Houston coffee shops according to Yelp. To access the database, first log in to the McNair DB Server. Then,

psql geocoder


Installation

Install and Nation Data

I began by adding the extensions listed below. First, open a Postgres session by using the psql command. Then:

--Add Extensions to database
CREATE EXTENSION postgis;
CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION postgis_tiger_geocoder;
CREATE EXTENSION address_standardizer;

You can test that the installation worked by running the following query:

SELECT na.address, na.streetname,na.streettypeabbrev, na.zip
	FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na;

This should return the following:

 address | streetname | streettypeabbrev |  zip
---------+------------+------------------+-------
       1 | Devonshire | Pl               | 02109
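
It can also be useful to confirm from the shell that all four extensions were actually created. The following is a minimal sketch that queries the pg_extension catalog. The function name missing_extensions and the PSQL variable are hypothetical helpers introduced here for illustration, not part of PostGIS, and the database name is assumed to be passed as an argument.

```shell
# Hypothetical helper: print each required extension that is NOT yet
# installed in the given database. PSQL can be overridden if psql is
# not on the PATH.
PSQL="${PSQL:-psql}"

missing_extensions() {
    db="$1"
    # -tA prints bare values, one extension name per line
    installed=$("$PSQL" -d "$db" -tA -c "SELECT extname FROM pg_extension")
    for ext in postgis fuzzystrmatch postgis_tiger_geocoder address_standardizer; do
        # -x matches whole lines only, so "postgis" does not match
        # "postgis_tiger_geocoder"
        printf '%s\n' "$installed" | grep -qx "$ext" || printf '%s\n' "$ext"
    done
}

# Usage (assumes the database is called geocoder):
#   missing_extensions geocoder
```

No output means every extension is present. If the psql call itself fails, all four names are printed, so an unexpected full list is worth checking against the connection settings first.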

Next, create a new loader profile. The following command copies the existing 'sh' profile in tiger.loader_platform into a new profile named 'test'.

INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, 
		   loader, environ_set_command, county_process_command)
SELECT 'test', declare_sect, pgbin, wget, unzip_command, psql, path_sep,
	   loader, environ_set_command, county_process_command
  FROM tiger.loader_platform
  WHERE os = 'sh';

The installation instructions also provide the following note:

As of PostGIS 2.4.1 the Zip code-5 digit tabulation area zcta5 load step was revised to load current zcta5 data and is part of the Loader_Generate_Nation_Script when enabled. It is turned off by default because it takes quite a bit of time to load (20 to 60 minutes), takes up quite a bit of disk space, and is not used that often.

If you would like this feature, you can enable it by using the following command. This should be done before generating the nation script.

UPDATE tiger.loader_lookuptables SET load = true WHERE table_name = 'zcta510';

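The paths in declare_sect need to be edited so they match our server locations. One option is to edit the declare_sect column in the tiger.loader_platform table. In that case, the edited declare_sect looks like the following (the trailing + on each line is psql's marker for a line break inside the value, not part of the script):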

TMPDIR="${staging_fold}/temp/"                +
UNZIPTOOL=unzip                               +
WGETTOOL="/usr/bin/wget"                      +
export PGBIN=/usr/lib/postgresql/9.6/bin      +
export PGPORT=5432                            +
export PGHOST=localhost                       +
export PGUSER=postgres                        +
export PGPASSWORD=yourpasswordhere            +
export PGDATABASE=geocoder                    +
PSQL=${PGBIN}/psql                            +
SHP2PGSQL=shp2pgsql                           +
cd ${staging_fold}
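
If the value is copied from an aligned psql listing like the one above, the trailing + markers have to be stripped before the text can be used as shell. A minimal sketch, assuming the listing was pasted into a text file; the function name strip_psql_plus and the file name in the usage line are hypothetical:

```shell
# Hypothetical helper: remove psql's trailing "+" continuation markers
# (and the padding in front of them) from text read on standard input.
strip_psql_plus() {
    sed 's/[[:space:]]*+[[:space:]]*$//'
}

# Usage (the file name is only an example):
#   strip_psql_plus < pasted_declare_sect.txt
```

This would also alter a line that legitimately ends in a + character, so it is only suitable for listings like this one where no such line exists.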

Another option is to edit the generated sh file before running it. We will use this option until further notice. Simply use your favorite command line editor to change the fields to their correct values. The generated script is saved in the following directory:

/gisdata

The gisdata directory must contain a subdirectory called "temp". To generate the script, run the following from the command line, replacing databasename with the name of your database (geocoder on our server):

psql -c "SELECT Loader_Generate_Nation_Script('test')" -d databasename -tA > /gisdata/nation_script_load.sh
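
Because the loader expects the staging folder and its temp subdirectory to exist, it may help to create and check them before generating or running the script. A minimal sketch, with a hypothetical function name and the staging path passed as an argument:

```shell
# Hypothetical helper: make sure the staging folder and its temp
# subdirectory exist and are writable by the current user.
prepare_staging() {
    staging="$1"
    mkdir -p "$staging/temp" || return 1
    [ -w "$staging" ] && [ -w "$staging/temp" ]
}

# Usage (path taken from this page):
#   prepare_staging /gisdata && echo "staging folder ready"
```

The function returns a non-zero status if the directories cannot be created or are not writable, which usually means the command needs to be run by a user with access to that directory.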

This will create a script in the /gisdata directory. If you did not edit the paths in the declare_sect column in psql, then you will need to edit this file so that it contains the correct paths.

Change directories:

cd /gisdata

Edit the script using your favorite command line text editor. Specifically, edit the following fields:

PGUSER=postgres
PGPASSWORD=(ask Anne for this password)

Everything else remains the same.
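
For repeated runs, the same edit can be scripted instead of done by hand. The following is a minimal sketch, not the method used for the original load. The function name set_export is hypothetical, and it assumes the variable is assigned on a line of its own that starts with either "export NAME=" or "NAME=".

```shell
# Hypothetical helper: replace the value assigned to a variable in a
# loader script. Arguments: script path, variable name, new value.
set_export() {
    script="$1"; name="$2"; value="$3"
    tmp=$(mktemp) || return 1
    # Pass name and value through the environment so awk uses them
    # verbatim (no backslash or "&" interpretation).
    SET_NAME="$name" SET_VALUE="$value" awk '
        BEGIN { n = ENVIRON["SET_NAME"]; v = ENVIRON["SET_VALUE"] }
        index($0, "export " n "=") == 1 { print "export " n "=" v; next }
        index($0, n "=") == 1           { print n "=" v; next }
        { print }
    ' "$script" > "$tmp" && cat "$tmp" > "$script"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (the password is a placeholder):
#   set_export /gisdata/nation_script_load.sh PGUSER postgres
#   set_export /gisdata/nation_script_load.sh PGPASSWORD yourpasswordhere
```

The value is written into the script exactly as given, so a password containing spaces or shell metacharacters must already be quoted when it is passed in.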

Run the script by using:

sh nation_script_load.sh
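
Since the load can run for a long time, keeping a log of its output makes failures easier to diagnose afterwards. A minimal sketch, assuming a hypothetical wrapper function run_logged and a log file name chosen only for illustration:

```shell
# Hypothetical helper: run a script with sh, show its output live, keep
# a copy in a log file, and return the script's own exit status.
run_logged() {
    script="$1"; log="$2"
    status_file=$(mktemp) || return 1
    # A plain "sh script | tee log" would report tee's status, so the
    # script's status is saved to a temporary file instead.
    {
        if sh "$script" 2>&1; then s=0; else s=$?; fi
        echo "$s" > "$status_file"
    } | tee "$log"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}

# Usage (run from /gisdata; the log name is only an example):
#   run_logged nation_script_load.sh nation_load.log
```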

The database now has a barebones table that will hold the information for the nation. The next step is to download the data for each state.

State Data
