9,642 bytes added ,  11:40, 21 October 2019
[[Category: McNair Admin]]

==PostgreSQL on edegan.com==

We have a dedicated Postgres server available for use by interns, affiliates, and researchers. It is available by SSH directly and through the RDP.

===Connecting to the dbase server===

To connect through the RDP, get a copy of PuTTY (put PuTTY.exe on your desktop) from:
E:/installs

The preferred way to connect is from the RDP, where you can stay inside the private network and have gigabit connection speeds. Connect to:
researcher@192.168.2.92

It is possible to connect from other machines over the Internet. You shouldn't do this unless you have to, but in this case connect via SSH to:
researcher@199.188.177.215
or
researcher@ssh.edegan.com

All of the data files (tab-delimited text) that need to be loaded in and out of a dbase for your project should be stored in:
/bulk/YourDbase

Note: To make your life easy, map the database's bulk drive on your RDP account. [[Help:Access_RDP_Sever#Mapping the Database Server as Z|Follow the instructions]] to do this. We refer to the database server's bulk drive as either dbase/bulk or as Z:, as this is the drive letter most commonly mapped to.

==Working with psql==

After you have ssh'd onto the server, change directory to your data directory:
cd /bulk/yourdir

Then connect to a database:
psql DBName

Note: to use a local copy of psql (if you have it installed locally), connect using the username researcher and DBname:
psql -h dbase.edegan.com -U researcher dbname

You MUST store all of your SQL commands in a file named yourfilename.sql that is stored in:
E:/projects/YourProject/

There are NO EXCEPTIONS to this. All of your code must go into a .sql file. Even exploratory code. You can copy out of there line by line to run code.

===Useful PostgreSQL commands===

Useful commands are:
\q Quits
\l List all dbases
\i basics.sql Run script basics.sql
\dt List tables
\d tablename Shows the schema of the table
\r Reset the query buffer
Ctrl-c Abort the current query
q Go back to the prompt when viewing a dataset
\COPY Psql's version of copy (See below)

===Dumping and Restoring a Database===
This can be done in pgAdmin in Windows, but the commands (needed on Linux with SSH access) are:
pg_restore -d DBName db.backup

We typically use compression (Postgres' custom format), so the best commands are:
pg_dump -Fc dbase > dbase_fc.dump
pg_restore -Fc -d DBName db.backup

All backups are stored in /bulk/backups. You can drop a database after it has been dumped with:
dropdb dbase

To selectively restore a single table, use the -t tablename option. See https://www.postgresql.org/docs/9.2/static/app-pgrestore.html for the other options.
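
The dump and restore commands above can be assembled programmatically when many databases need backing up. The following is only a sketch: the helper names <code>dump_command</code> and <code>restore_command</code> are hypothetical, the dump file name follows the <code>dbase_fc.dump</code> pattern used above, and pg_dump's <code>-f</code> output-file option is swapped in for the shell redirection. The sketch only builds and prints the commands; it does not run them.

```python
import shlex

BACKUP_DIR = "/bulk/backups"  # backup location named in the text


def dump_command(dbname):
    """Build a pg_dump command that writes a custom-format (compressed) dump."""
    dump_file = f"{BACKUP_DIR}/{dbname}_fc.dump"  # naming pattern is an assumption
    return ["pg_dump", "-Fc", "-f", dump_file, dbname]


def restore_command(dbname, dump_file, table=None):
    """Build a pg_restore command; pass table to restore a single table with -t."""
    cmd = ["pg_restore", "-Fc", "-d", dbname]
    if table is not None:
        cmd += ["-t", table]
    cmd.append(dump_file)
    return cmd


# Print the commands so they can be reviewed before being run by hand.
print(shlex.join(dump_command("dbase")))
print(shlex.join(restore_command("DBName", f"{BACKUP_DIR}/dbase_fc.dump", table="tablename")))
```

Each list could be handed to <code>subprocess.run</code> on the dbase server, but that step is left out here on purpose.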
==SQL Commands==

===CREATE, DROP, COPY===
There is a list of [http://www.postgresql.org/docs/7.3/static/sql-commands.html SQL commands] that may help.
DROP TABLE tablename;
CREATE TABLE tablename (
field1 varchar(100),
field2 int,
field3 date,
field4 real
);
 
Functions can be written in Perl, Python and other languages. See below for more information.
 
CREATE FUNCTION getreal (text) RETURNS real AS $$
if ($_[0]=~/^\d{1,}\.\d{0,}$/) { return $_[0]; }
return undef;
$$ LANGUAGE plperl;

DROP FUNCTION correctyear(int,int);
'''Do not do any of the following:'''

Populate data with COPY, INSERT and UPDATE:
INSERT INTO tablename VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
COPY tablename FROM '/home/user/weather.txt'; --http://www.postgresql.org/docs/8.4/interactive/sql-copy.html
UPDATE tablename SET kind = 'Dramatic' WHERE kind = 'Drama';
UPDATE accounts SET (contact_last_name, contact_first_name) =
(SELECT last_name, first_name FROM salesmen
WHERE salesmen.id = accounts.sales_id);
 
'''Instead, always build a stack of tables using:'''
 
CREATE TABLE tablename AS
SELECT * FROM tablename WHERE ...;
DROP TABLE tablename;
 
Always load/unload data using the PostgreSQL specific copy function below. Always load tab-delimited data that is UTF-8 encoded, with PC or UNIX line endings, and that has a header row. NEVER DEVIATE FROM THIS.
 
Load using:
<nowiki>\COPY tablename FROM 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV</nowiki>
 
Unload (copy to txt file) using:
<nowiki>\COPY tablename TO 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV</nowiki>
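
A file in the required load format (tab-delimited, UTF-8, header row, empty field for NULL) can be produced with a short script. This is a minimal sketch, not part of the original workflow: the function name <code>write_tab_delimited</code> and the sample columns are made up for illustration.

```python
import csv
import io


def write_tab_delimited(rows, header, out):
    """Write a header row and data rows as tab-delimited text.

    None is written as an empty field, which the \\COPY options above
    (NULL AS '' with CSV) read back as NULL.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow(["" if value is None else value for value in row])


# Demonstration with an in-memory buffer. For a real file use:
#   open("filename.txt", "w", encoding="utf-8", newline="")
buffer = io.StringIO()
write_tab_delimited(
    [("San Francisco", 46, None)],   # hypothetical sample row
    ["city", "temp_lo", "prcp"],     # hypothetical column names
    buffer,
)
print(buffer.getvalue())
```

The column order in the file has to match the column order of the target table, since the \COPY command above does not name columns.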
 
===SELECT===
Retrieve results with SELECT:
SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'tablename';
===ALTER===
 
'''DON'T DO THIS. CREATE A NEW TABLE INSTEAD!'''
 
Change a table with ALTER:
ALTER TABLE tablename ADD COLUMN colname real;
ALTER TABLE tablename RENAME COLUMN product_no TO product_number;
===EXPLAIN===
 
Find out how a query will be executed with EXPLAIN (a PostgreSQL command):
EXPLAIN ANALYZE SELECT * FROM x;
 
===CREATE OR DROP INDEX===
 
If the dbase is large or you just need things to run faster, add an index to your key fields.
 
CREATE UNIQUE INDEX title_idx ON films (title);
CREATE INDEX title_idx ON films (title);
DROP INDEX title_idx;
 
See https://www.postgresql.org/docs/9.5/static/sql-createindex.html for more options
 
===SEQUENCES===
 
If you want to create a sequence:
CREATE SEQUENCE serial START 101;
 
To use the sequence call:
nextval('serial');
 
==Perl Functions==
 
NOTE: Perl and Python functions only work on the dbase server, not the RDP (where perl has a dependency error in plperl.dll and python has unknown issues).

PLPerl was installed into template1 (and hence all new databases) when the server was first set up.
 
An example perl function is:
<nowiki>
CREATE OR REPLACE FUNCTION getint (text) RETURNS int AS $$
if ($_[0]) {
my $var=$_[0];
if ($var=~/^\d\d\d\d\d\d\d+$/) {
return 1;
}
return undef;
}
return undef;
$$ LANGUAGE plperl;
</nowiki>
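
The logic of the function above can be checked outside the database. The following plain Python sketch mirrors it under the assumption that the intent is "return 1 when the text is seven or more digits, otherwise NULL"; it is an illustration for testing inputs, not a replacement for the PLPerl function.

```python
import re

# Seven or more digits, matching the \d\d\d\d\d\d\d+ pattern in the Perl code.
SEVEN_PLUS_DIGITS = re.compile(r"^\d{7,}$")


def getint(text):
    """Return 1 if text is seven or more digits, else None (SQL NULL)."""
    if text and SEVEN_PLUS_DIGITS.match(text):
        return 1
    return None


for sample in ["1234567", "123456", "12345a7", ""]:
    print(repr(sample), getint(sample))
```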
 
==Python Functions==
 
[[Installing python in a database]] - if Python is not already installed
 
Source: https://www.postgresql.org/docs/9.4/static/plpython-funcs.html
 
To get into a database via the terminal:
1) ssh researcher@ssh.edegan.com
2) cd /bulk/folder_name
3) psql database_name
 
Creating Functions:
CREATE FUNCTION pymax (a integer, b integer)
RETURNS integer
AS $$
    if a > b:
        return a
    return b
$$ LANGUAGE plpythonu;
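
In PL/Python a SQL NULL argument arrives as Python <code>None</code>, and comparing <code>None</code> with an integer either raises an error or gives a meaningless answer, depending on the Python version. The sketch below shows one way the function body could guard against that. It is written as an ordinary Python function (the name <code>pymax_null_safe</code> is hypothetical) so it can be tried outside the database.

```python
def pymax_null_safe(a, b):
    """Return the larger of a and b, or None if either input is None (SQL NULL)."""
    if a is None or b is None:
        return None
    if a > b:
        return a
    return b


print(pymax_null_safe(3, 5))
print(pymax_null_safe(None, 5))
```

Declaring the SQL function as STRICT is another option: PostgreSQL then returns NULL without calling the function when any argument is NULL.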
 
==PostGIS Resources==
 
See:
*http://postgis.net/features/
*https://www.census.gov/geo/maps-data/data/tiger-line.html
*https://www.census.gov/geo/maps-data/data/tiger.html
*https://en.wikipedia.org/wiki/GIS_file_formats
 
===Useful PostGIS functions for spatial joins===
 
'''sum(expression)''': aggregate to return a sum for a set of records
'''count(expression)''': aggregate to return the size of a set of records
'''ST_Area(geometry)''' returns the area of the polygons
'''ST_AsText(geometry)''' returns WKT text
'''ST_Buffer(geometry, distance)''': For geometry: Returns a geometry that represents all points whose distance from this Geometry is less than or equal to distance. Calculations are in the Spatial Reference System of this Geometry. For geography: Uses a planar transform wrapper.
'''ST_Contains(geometry A, geometry B)''' returns true if geometry A contains geometry B
'''ST_Distance(geometry A, geometry B)''' returns the minimum distance between geometry A and geometry B
'''ST_DWithin(geometry A, geometry B, radius)''' returns true if geometry A is radius distance or less from geometry B
'''ST_GeomFromText(text)''' returns geometry
'''ST_Intersection(geometry A, geometry B)''': Returns a geometry that represents the shared portion of geomA and geomB. The geography implementation does a transform to geometry to do the intersection and then transform back to WGS84
'''ST_Intersects(geometry A, geometry B)''' returns true if geometry A intersects geometry B
'''ST_Length(linestring)''' returns the length of the linestring
'''ST_Touches(geometry A, geometry B)''' returns true if the boundary of geometry A touches geometry B
'''ST_Within(geometry A, geometry B)''' returns true if geometry A is within geometry B
geometry_a '''&&''' geometry_b: Returns TRUE if A’s bounding box overlaps B’s.
geometry_a '''=''' geometry_b: Returns TRUE if A’s bounding box is the same as B’s.
'''ST_SetSRID(geometry, srid)''': Sets the SRID on a geometry to a particular integer value.
'''ST_SRID(geometry)''': Returns the spatial reference identifier for the ST_Geometry as defined in spatial_ref_sys table.
'''ST_Transform(geometry, srid)''': Returns a new geometry with its coordinates transformed to the SRID referenced by the integer parameter.
'''ST_Union()''': Returns a geometry that represents the point set union of the Geometries.
'''substring(string [from int] [for int])''': PostgreSQL string function to extract a substring by start position and length.
'''ST_Relate(geometry A, geometry B)''': Returns a text string representing the DE9IM relationship between the geometries.
'''ST_GeoHash(geometry A)''': Returns a text string representing the GeoHash of the bounds of the object.
 
===Native functions for geography===
 
'''ST_AsText(geography)''' returns text
'''ST_GeographyFromText(text)''' returns geography
'''ST_AsBinary(geography)''' returns bytea
'''ST_GeogFromWKB(bytea)''' returns geography
'''ST_AsSVG(geography)''' returns text
'''ST_AsGML(geography)''' returns text
'''ST_AsKML(geography)''' returns text
'''ST_AsGeoJson(geography)''' returns text
'''ST_Distance(geography, geography)''' returns double
'''ST_DWithin(geography, geography, float8)''' returns boolean
'''ST_Area(geography)''' returns double
'''ST_Length(geography)''' returns double
'''ST_Covers(geography, geography)''' returns boolean
'''ST_CoveredBy(geography, geography)''' returns boolean
'''ST_Intersects(geography, geography)''' returns boolean
'''ST_Buffer(geography, float8)''' returns geography [1]
'''ST_Intersection(geography, geography)''' returns geography [1]
 
===Functions for Linear Referencing===
'''ST_LineInterpolatePoint(geometry A, double measure)''': Returns a point interpolated along a line.
'''ST_LineLocatePoint(geometry A, geometry B)''': Returns a float between 0 and 1 representing the location of the closest point on LineString to the given Point.
'''ST_Line_Substring(geometry A, double from, double to)''': Return a linestring being a substring of the input one starting and ending at the given fractions of total 2d length.
'''ST_Locate_Along_Measure(geometry A, double measure)''': Return a derived geometry collection value with elements that match the specified measure.
'''ST_Locate_Between_Measures(geometry A, double from, double to)''': Return a derived geometry collection value with elements that match the specified range of measures inclusively.
'''ST_AddMeasure(geometry A, double from, double to)''': Return a derived geometry with measure elements linearly interpolated between the start and end points. If the geometry has no measure dimension, one is added.
 
===3-D Functions===
'''ST_3DClosestPoint''' — Returns the 3-dimensional point on g1 that is closest to g2. This is the first point of the 3D shortest line.
'''ST_3DDistance''' — For geometry types, returns the 3-dimensional Cartesian minimum distance (based on spatial ref) between two geometries in projected units.
'''ST_3DDWithin''' — For 3D (z) geometry types, returns true if the 3D distance between two geometries is within the given number of units.
'''ST_3DDFullyWithin''' — Returns true if all of the 3D geometries are within the specified distance of one another.
'''ST_3DIntersects''' — Returns TRUE if the geometries “spatially intersect” in 3D (only for points and linestrings).
'''ST_3DLongestLine''' — Returns the 3-dimensional longest line between two geometries
'''ST_3DMaxDistance''' — For geometry types, returns the 3-dimensional Cartesian maximum distance (based on spatial ref) between two geometries in projected units.
'''ST_3DShortestLine''' — Returns the 3-dimensional shortest line between two geometries
 
===Relevant PostgreSQL Commands===
'''\dt *.*''' Show all tables
'''\q''' Quit psql
 
===To make a circle===
 
SELECT ST_Buffer(''[desired point]'', ''[desired radius]'', 'quad_segs=8')
FROM ''[desired table]''
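quad_segs=8 sets the number of line segments used to approximate each quarter of the circle (8 is the default); larger values give a smoother circle.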
 
[[File: CirclePostGIS.png]]
 
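For a more precise circle, buffer in geography (where the radius is in meters) and transform back:
SELECT ST_Transform(geometry(
ST_Buffer(geography(
ST_Transform( ''[desired point]'', 4326 )),
''[desired radius]'')),
900913) FROM ''[desired table]''
4326 and 900913 are spatial reference identifiers (SRIDs), not precision settings: 4326 is WGS 84 longitude/latitude, and 900913 is an unofficial code for the spherical Mercator projection used by web maps.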
 
===Decimal Degrees===
 
We are working with longitude and latitude in decimal degrees. See https://en.wikipedia.org/wiki/Decimal_degrees
 
To convert a radius from decimal degrees to km, multiply by 111.3199. For area, multiply by (111.3199)^2 = 12,392.12013601.
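
The conversion can be written as a pair of small helpers. This is a sketch using only the factor quoted above; the function names are hypothetical. Note that the factor is most accurate near the equator, because the east-west length of a degree of longitude shrinks with latitude.

```python
KM_PER_DEGREE = 111.3199  # conversion factor quoted in the text


def degrees_to_km(radius_deg):
    """Convert a length in decimal degrees to kilometres."""
    return radius_deg * KM_PER_DEGREE


def square_degrees_to_km2(area_deg2):
    """Convert an area in square decimal degrees to square kilometres."""
    return area_deg2 * KM_PER_DEGREE ** 2


print(degrees_to_km(0.5))          # a radius of half a degree, in km
print(square_degrees_to_km2(1.0))  # one square degree, in square km
```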
==Configuring a copy of Postgres on Windows==
 
If you'd like to set up a copy of PostgreSQL on your windows laptop or desktop, the following instructions may be helpful.
 
===Adding PostgreSQL to the PATH===
 
Add PostgreSQL to the path if it isn't already:
Control Panel->System->Advanced->Environment Variables
Add: C:\Program Files\PostgreSQL\9.0\bin
 
or from a cmd window (though sometimes this doesn't stick):
SET PATH=%PATH%;C:\Program Files\PostgreSQL\9.0\bin;
 
===PostgreSQL and Perl===
 
To use PLPerl on windows, you will need to be careful to mix and match the right versions of PostgreSQL and Perl.
 
Perl version 5.10.1 (a 64bit build is available from [http://www.activestate.com/activeperl/downloads ActiveState]) works with PostgreSQL 9.0.1 (64bit build 1500). To see your PostgreSQL version type:
 
psql -c "select version();" template1
 
To see your perl version type:
 
perl -v
 
With these two versions together you should be able to add plperl to template1 (which all new dbs will inherit) with the command:
 
createlang plperl template1
 
There is a [http://www.postgresql.org/docs/7.3/static/reference-client.html list of commands/client applications], with links to documentation, which is useful.
 
===Basic Performance Tuning===
 
Note that the dbase server at ssh.edegan.com does not use the settings below. Its configuration is much more aggressive.
 
You will almost surely want to 'performance tune' your postgresql database, as the default settings are near useless. In particular edit postgresql.conf (which is in the data directory of your install) to change:
 
shared_buffers = 512MB #Use about 10-15% of available RAM, to a max of 512MB on Windows
effective_cache_size = 3GB #Use half to 3/4 of available RAM, depending on your pref
work_mem = 128MB # This is the memory allocated to each query sort
maintenance_work_mem = 256MB #This is for vacuum, and a max of 256 is recommended.
 
Note that ''work_mem'' is the allocation for each sort. Each query you run may do many sorts and you may have many users, so total memory use can explode quickly. 128MB is an aggressive setting that assumes only a single user. See the [http://wiki.postgresql.org/wiki/Performance_Optimization various documentation resources], especially the [http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server official performance optimization page] for more information.
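
A rough worst-case estimate makes the risk concrete. The sketch below simply multiplies the per-sort allocation by the number of sorts and concurrent queries. The function name and the example figures (4 sorts per query, 10 concurrent queries) are illustrative assumptions, not measurements from any server.

```python
def worst_case_sort_memory_mb(work_mem_mb, sorts_per_query, concurrent_queries):
    """Upper bound on sort memory if every sort uses its full work_mem allocation."""
    return work_mem_mb * sorts_per_query * concurrent_queries


# Single user running one sort at a time: the setting is the whole budget.
print(worst_case_sort_memory_mb(128, 1, 1))   # 128

# Ten concurrent queries with four sorts each at the same setting.
print(worst_case_sort_memory_mb(128, 4, 10))  # 5120
```

If the estimate approaches the machine's RAM, a lower ''work_mem'' is the safer choice.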
 
Restart the server (on windows use the 'services' control panel) for many changes to take effect:
 
pg_ctl restart
 
===Creating Users and Dbases===
 
Create a user using pgAdmin or the createuser command:
 
createuser username
 
And then create a database again using pgAdmin or the createdb command:
 
createdb -O username DBName
 
[[admin_classification::IT Build| ]]
