Postgres Tips & Tricks

From The Crunchy Data Team

Just a few helpful tips how to use Postgres to it's fullest. Have one you'd like to share or something is unclear? Share with us on twitter @crunchydata

locks

Find processes that are idle and possibly blocked by a lock

This query looks at the pg_stat_activity view for processes that are active but have a wait_event or wait_event_type that are non-NULL.

SELECT pid, datname, usename, application_name, client_addr, client_port, to_char (now (), 'YYYY-MM-DD HH24:MI:SS') as now, to_char (now () - xact_start, 'DD HH24:MI:SS MS') as xact_time, to_char (now () - query_start, 'DD HH24:MI:SS MS') as query_time, state, to_char (now () - state_change, 'DD HH24:MI:SS MS') as state_time, wait_event, wait_event_type, left (query, 40) FROM pg_stat_activity WHERE state != 'idle' and pid != pg_backend_pid () ORDER BY query_time desc;

Find processes that have a wait event, rolling up to the PID holding the initial lock.

This query looks at at the pg_stat_activity and pg_locks view showing the pid, state, wait_event, and lock mode, as well as blocking pids.

WITH sos AS ( SELECT array_cat(array_agg(pid), array_agg((pg_blocking_pids(pid))[array_length(pg_blocking_pids(pid),1)])) pids FROM pg_locks WHERE NOT granted ) SELECT a.pid, a.usename, a.datname, a.state, a.wait_event_type || ': ' || a.wait_event AS wait_event, current_timestamp-a.state_change time_in_state, current_timestamp-a.xact_start time_in_xact, l.relation::regclass relname, l.locktype, l.mode, l.page, l.tuple, pg_blocking_pids(l.pid) blocking_pids, (pg_blocking_pids(l.pid))[array_length(pg_blocking_pids(l.pid),1)] last_session, coalesce((pg_blocking_pids(l.pid))[1]||'.'||coalesce(case when locktype='transactionid' then 1 else array_length(pg_blocking_pids(l.pid),1)+1 end,0),a.pid||'.0') lock_depth, a.query FROM pg_stat_activity a JOIN sos s on (a.pid = any(s.pids)) LEFT OUTER JOIN pg_locks l on (a.pid = l.pid and not l.granted) ORDER BY lock_depth;

Set a lock timeout

It can be a good idea to set a lock_timeout within a session so that it will cancel the transaction and relinquish any locks it was holding after a certain period of time.

ALTER SYSTEM SET lock_timeout = '10s';

Logging

Control which statement types get logged

Control the types of statements that are logged for your database.

ALTER DATABASE postgres SET log_statement = 'all';

Valid values include all, ddl, none, mod

performance

Use statement timeouts to control runaway queries

Setting a statement timeout prevents queries from running longer than the specified time. You can set a statement timeout on the database, user, or session level. We recommend you set a global timeout on Postgres and then override that one specific users or sessions that need a longer allowed time to run.

ALTER DATABASE mydatabase SET statement_timeout = '60s';

Monitor connections in Postgres

This query will provide the number of connection based on type.

SELECT count(*), state FROM pg_stat_activity GROUP BY state;

If you see idle connections is above 20, it is recommended to explore using PgBouncer.

Query size of specific table

Will give you the size of the specific relation you pass in.

SELECT pg_relation_size('table_name'); -- For prettier formatting you can wrap with: SELECT pg_size_pretty(pg_relation_size('table_name'));

Query all relation sizes

Will report on all table sizes in descending order

SELECT relname AS relation, pg_size_pretty ( pg_total_relation_size (C .oid) ) AS total_size FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C .relnamespace) WHERE nspname NOT IN ( 'pg_catalog', 'information_schema' ) AND C .relkind <> 'i' AND nspname !~ '^pg_toast' ORDER BY pg_total_relation_size (C .oid) DESC

Check for unused indexes

Will return the unused indexes in descending order of size. Keep in mind you want to also check replicas before dropping indexes.

SELECT schemaname || '.' || relname AS table, indexrelname AS index, pg_size_pretty(pg_relation_size(i.indexrelid)) AS "index size", idx_scan as "index scans" FROM pg_stat_user_indexes ui JOIN pg_index i ON ui.indexrelid = i.indexrelid WHERE NOT indisunique AND idx_scan < 50 AND pg_relation_size(relid) > 5 * 8192 ORDER BY pg_relation_size(i.indexrelid) / nullif(idx_scan, 0) DESC NULLS FIRST, pg_relation_size(i.indexrelid) DESC;

Get approximate counts for a table

Will return the approximate count for a table based on PostgreSQL internal statistics. Useful for large tables where performing a `SELECT count(*)` is costly on performance.

SELECT reltuples::numeric as count FROM pg_class WHERE relname='table_name';

Non-blocking index creation

Adding `CONCURRENTLY` during index creation, while not permitted in a transaction, will not hold a lock on the table while creating your index.

CREATE INDEX CONCURRENTLY foobar ON foo (bar);

Psql

Automatically log query time in psql

Will automatically print the time it took to run a query from within psql. *Of note this is the round trip time not simply query execution time.*

\timing

You can save this in your `.psqlrc` to be a default setting

Autoformat query results in psql

Will automatically reorganize the query output based on your terminal window for better readability.

\x auto

You can save this in your `.psqlrc` to be a default setting

Set a value for nulls

Will render the nulls as whatever character you specify. Handy for easier parsing of nulls vs. blank text.

\pset null 👻

You can save this in your `.psqlrc` to be a default setting

Show queries issued by internal psql commands

Add "-E" (or --echo-hidden) option to psql in the command line. This option will display queries that internal psql commands generate (like "\dt mytable"). This is a cool way to learn more about system catalogs, or reuse queries issued by psql in your own tool.

psql -E

Get data back, and only the data

Add "-qtA" options to psql in the command line. Those options will have psql run in quiet mode ("-q"), return tuples only ("-t") in an unaligned fashion ("-A"). Combined with "-c" option to send a single query, it can be useful for your scripts if you want the data and only that back from Postgres. Returns one line per row.

psql -qtA

Get results as an HTML table

Add "-qtH" options to psql in the command line. Those options will have psql run in quiet mode ("-q"), return tuples only ("-t") in an HTML table ("-H"). Combined with "-c" option to send a single query, can be a fast way to embed the result of a query in an HTML page.

psql -qtH

Search previous queries with Ctrl + R

Ctrl + R will start a search session and you can start typing part of the query or command to find and run it again. If you tag specific queries with a comment, this can help with searching later.

(reverse-i-search)

Continually run a query with watch

Will automatically run the last query every 2 seconds and display the output. You can also specify the query that will run after watch as well.

\watch

Show non-default configs

Will list the parameter and value for Postgres configuration parameters that have been changes from the default parameters.

\dconfig

Export a CSV from directly in psql

When providing the `--csv` value with a query, this command will run the specific query and return CSV to STDOUT.

psql <connection-string> --csv -c 'select * from test;'

Provide clean border within psql

Will give you a border around the output of your queries when in psql.

\pset border 2

You can save this in your `.psqlrc` to be a default setting

Save queries in psqlrc

Add these sample queries to psqlrc for long running queries, cache hit ratio, unused_indexes, and table sizes. Then to execute inside psql use :long_running, :cache_hit, :unused_indexes, :table_sizes.

\set long_running 'SELECT pid, now() - pg_stat_activity.xact_start AS duration, query, state FROM pg_stat_activity WHERE (now() - pg_stat_activity.xact_start) > interval ''5 minutes'' ORDER by 2 DESC;' \set cache_hit 'SELECT ''index hit rate'' AS name, (sum(idx_blks_hit)) / nullif(sum(idx_blks_hit + idx_blks_read),0) AS ratio FROM pg_statio_user_indexes UNION ALL SELECT ''table hit rate'' AS name, sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read),0) AS ratio FROM pg_statio_user_tables;' \set unused_indexes 'SELECT schemaname || ''.'' || relname AS table, indexrelname AS index, pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size, idx_scan as index_scans FROM pg_stat_user_indexes ui JOIN pg_index i ON ui.indexrelid = i.indexrelid WHERE NOT indisunique AND idx_scan < 50 AND pg_relation_size(relid) > 5 * 8192 ORDER BY pg_relation_size(i.indexrelid) / nullif(idx_scan, 0) DESC NULLS FIRST, pg_relation_size(i.indexrelid) DESC;' \set table_sizes 'SELECT c.relname AS name, pg_size_pretty(pg_table_size(c.oid)) AS size FROM pg_class c LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace) WHERE n.nspname NOT IN (''pg_catalog'', ''information_schema'') AND n.nspname !~ ''^pg_toast'' AND c.relkind=''r'' ORDER BY pg_table_size(c.oid) DESC;'

Set linestyle to unicode

Changes the linestyle to unicode, which when combined with above tip leads to much cleaner formatting

\pset linestyle unicode

You can save this in your `.psqlrc` to be a default setting

SQL

Look for skewed data

This query looks in pg_statistics to find cases where certain values or attributes are a larger percentage of column values. This retrieves table name, column name, and the attributes with their percentage of common value. This formats results in psql, but can also be run as a query.

SELECT starelid::regclass AS table_name,attname AS column_name, (SELECT string_agg('',format(E''%s': %s%% ', v,ROUND(n::numeric*100, 2))) FROM unnest(stanumbers1,stavalues1::text::text[])nvs(n,v)) pcts FROM pg_statistic JOIN pg_attribute ON attrelid=starelid AND attnum = staattnum JOIN pg_class ON attrelid = pg_class.oid JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid WHERE nspname NOT LIKE 'pg_%' AND nspname <> 'information_schema' \x\g\x

Replace nulls with other value

Coalesce will use the value and if the value is null display your specified string.

SELECT id, coalesce(ip, 'no IP') FROM logs;

You can supply two columns as well prior to your replacement value and the function will use first not null value.

Import Schema with Mapping a Foreign Data Wrapper (FDW)

Import foreign schema creates foreign tables representing those from the foreign server.

IMPORT FOREIGN SCHEMA "public";

You can IMPORT FOREIGN SCHEMA when mapping a foreign data wrapper to save you from building a new one

Generate data with generate_series

Generates values from the start to the end values supplied based on the interval. Values can be numbers or timestamps. Can be used in a FROM or JOIN clause or CTE. Commonly used when building charts and reports that require all dates to be filled.

SELECT * FROM generate_series(now() - '3 month'::interval, now(), '1 day');

Round dates with date_trunc

Will truncate the date to the specified level of precision. Some example precision levels include: month, week, day, hour, minute.

SELECT date_trunc('day', now());

Perform time math with intervals

You can add or subtract specific amounts of time of a timestamp by casting the value you want as an interval.

SELECT now() - '1 month'::interval;

Make your session rest a bit

This function will make your session sleep for 2.5 seconds. Useful in any testing tool executing a script in a given loop where you want to pause a bit between iterations, as an example.

select pg_sleep(2.5);

utility

Are you close to overflowing an integer?

Identify all auto-incrementing columns, which SEQUENCE object it owns, data types of the column and SEQUENCE object, and percent until the sequence value exceeds the sequence or column data type.

SELECT seqs.relname AS sequence, format_type(s.seqtypid, NULL) sequence_datatype, CONCAT(tbls.relname, '.', attrs.attname) AS owned_by, format_type(attrs.atttypid, atttypmod) AS column_datatype, pg_sequence_last_value(seqs.oid::regclass) AS last_sequence_value, TO_CHAR(( CASE WHEN format_type(s.seqtypid, NULL) = 'smallint' THEN (pg_sequence_last_value(seqs.relname::regclass) / 32767::float) WHEN format_type(s.seqtypid, NULL) = 'integer' THEN (pg_sequence_last_value(seqs.relname::regclass) / 2147483647::float) WHEN format_type(s.seqtypid, NULL) = 'bigint' THEN (pg_sequence_last_value(seqs.relname::regclass) / 9223372036854775807::float) END) * 100, 'fm9999999999999999999990D00%') AS sequence_percent, TO_CHAR(( CASE WHEN format_type(attrs.atttypid, NULL) = 'smallint' THEN (pg_sequence_last_value(seqs.relname::regclass) / 32767::float) WHEN format_type(attrs.atttypid, NULL) = 'integer' THEN (pg_sequence_last_value(seqs.relname::regclass) / 2147483647::float) WHEN format_type(attrs.atttypid, NULL) = 'bigint' THEN (pg_sequence_last_value(seqs.relname::regclass) / 9223372036854775807::float) END) * 100, 'fm9999999999999999999990D00%') AS column_percent FROM pg_depend d JOIN pg_class AS seqs ON seqs.relkind = 'S' AND seqs.oid = d.objid JOIN pg_class AS tbls ON tbls.relkind = 'r' AND tbls.oid = d.refobjid JOIN pg_attribute AS attrs ON attrs.attrelid = d.refobjid AND attrs.attnum = d.refobjsubid JOIN pg_sequence s ON s.seqrelid = seqs.oid WHERE d.deptype = 'a' AND d.classid = 1259;

Create read only user

Once already creating a specific user role, you can user the `pg_read_all_data` to grant read only access to all tables.

GRANT pg_read_all_data TO username;

View tables using TOAST

View what source tables have TOAST (the oversized attribute storage technique) tables with data and how big the TOAST tables are.

SELECT c.relname AS source_table_name, c.relpages AS source_table_number_of_pages, c.reltuples AS source_table_number_of_tuples, c.reltoastrelid AS toast_table_oid, t.relname AS toast_table_name, t.relpages AS toast_table_number_of_pages, t.reltuples AS toast_table_number_of_tuples FROM pg_class c JOIN pg_class t ON c.reltoastrelid = t.oid WHERE t.relpages > 0;

Find active and idle users

View roles, state, and connection count.

SELECT usename, state, count(1) cnt FROM pg_stat_activity WHERE usename is not null GROUP BY usename, state ORDER BY usename, state;