What is pg_dump compression?

pg_dump is a PostgreSQL utility for backing up a local or remote PostgreSQL database. It creates a logical backup file that contains either plain SQL commands for recreating the database, or a binary file that can be restored with the pg_restore utility. The binary backup file can be used to restore the database from scratch, or to restore only selected parts of the database.

Binary files produced by pg_dump can be compressed to reduce their size. This is done by passing the -Fc or --format=custom flag to pg_dump, which causes it to produce a compressed backup file. The file can then be decompressed with pg_restore later. To indicate the level of compression, you should provide option -Z, followed by a number which indicates the compression level (0-9, where 0 is no compression and 9 is the maximum compression level.) The compression algorithm used in pg_dump is the same as used in gzip.

What is new for pg_dump compression in PostgreSQL 16?

Georgios Kokolatos implemented a patch which allows you to specify not only the level of compression to be used but also a method for compression.

The patch was reviewed and committed by Michael Paquier. The commit message is:

Compression specifications are currently used by pg_basebackup and
pg_receivewal, and are able to let the user control in an extended way
the method and level of compression used.  As an effect of this commit,
pg_dump's -Z/--compress is now able to use more than just an integer, as
of the grammar "method[:detail]".

The method can be either "none" or "gzip", and can optionally take a
detail string. If the detail string is only an integer, it defines the
compression level. A comma-separated list of keywords can also be used
method allows for more options, the only keyword supported now is
"level".

The change is backward-compatible, hence specifying only an integer
leads to no compression for a level of 0 and gzip compression when the
level is greater than 0.

Most of the code changes are straight-forward, as pg_dump was relying on
an integer tracking the compression level to check for gzip or no
compression. These are changed to use a compression specification and
the algorithm stored in it.

As of this change, note that the dump format is not bumped because there
is no need yet to track the compression algorithm in the TOC entries.
Hence, we still rely on the compression level to make the difference
when reading them. This will be mandatory once a new compression method
is added, though.

In order to keep the code simpler when parsing the compression
specification, the code is changed so as pg_dump now fails hard when
using gzip on -Z/--compress without its support compiled, rather than
enforcing no compression without the user knowing about it except
through a warning. Like before this commit, archive and custom formats
are compressed by default when the code is compiled with gzip, and left
uncompressed without gzip.

As you can see, only two method specifiers are implemented here: none and gzip. But the importance of this change is that it prepares the infrastructure for adding all other compression methods available at the time, like gzip, lz4, zstd.

Let’s try it out!

~$ pg_dump --version
pg_dump (PostgreSQL) 16devel

~$ pgbench --initialize --scale=100
dropping old tables...
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
10000000 of 10000000 tuples (100%) done (elapsed 18.34 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 23.30 s (drop tables 0.00 s, create tables 0.00 s, client-side generate 18.49 s, vacuum 0.32 s, primary keys 4.48 s).

~$ psql --command="select pg_size_pretty(pg_database_size('postgres'))"
 pg_size_pretty 
----------------
 1503 MB
(1 row)

~$ pg_dump --format=custom --compress=9 > 1_compress_9.gz

~$ pg_dump --format=custom --compress=0 > 2_compress_0.gz

~$ pg_dump --format=custom --compress=none:9 > 3_none_9.gz
pg_dump: error: invalid compression specification: compression algorithm "none" does not accept a compression level

~$ pg_dump --format=custom --compress=gzip:9 > 4_gzip_9.gz

~$ pg_dump --format=custom --compress=gzip:0 > 5_gzip_0.gz
pg_dump: error: invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9 (default at -1)

~$ pg_dump --format=custom --compress=gzip:1 > 6_gzip_1.gz

~$ ls -l --block-size=M
total 1061M
-rw-rw-r-- 1 postgres postgres  27M Dec  2 12:12 1_compress_9.gz
-rw-rw-r-- 1 postgres postgres 981M Dec  2 12:12 2_compress_0.gz
-rw-rw-r-- 1 postgres postgres   0M Dec  2 12:12 3_none_9.gz
-rw-rw-r-- 1 postgres postgres  27M Dec  2 12:13 4_gzip_9.gz
-rw-rw-r-- 1 postgres postgres   0M Dec  2 12:13 5_gzip_0.gz
-rw-rw-r-- 1 postgres postgres  28M Dec  2 12:13 6_gzip_1.gz

The results show that the non-zero compression level raises an error when the compression algorithm is none. The results also show that a compression level other than zero is required when the compression algorithm is gzip.

Finally…

pg_dump’s -Z/--compress in PostgreSQL 16 will support more than just an integer. It can be used to specify the method and level of compression used. The default is still gzip with a level of 6. As I already said, pg_dump is sometimes used to update and/or upgrade the database. In case you want to understand the difference between an update and an upgrade, check out this blog post by Hans-Jürgen Schönig. Or check our other related publications about updating and upgrading.