pgdump, text and xz

I have a database that contains around 50 GB of data. I do continuous backups through pgBackRest, and I also do regular pg_dump runs in directory format via multiple jobs, so I'm fine with backups.
However, why not have a look at SQL backups?
First of all: the content of the database is mostly numeric, since it is quite a large container of sensor data. This means the data should compress very well.
Moreover, tables are partitioned on a per-year and per-month basis, so I have a regular structure with one yearly table and twelve monthly children. For instance, for the current year there is a table named y2021 with partitions named y2021m01 through y2021m12.
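
To give an idea, the layout is something along the following lines; the parent table name and its columns are hypothetical, only the respi schema and the yYYYY / yYYYYmMM naming reflect the real database:

% psql -h miguel -U postgres sensorsdb <<'SQL'
-- hypothetical parent table: only the schema name and the
-- yYYYY / yYYYYmMM partition names mirror the real database
CREATE TABLE respi.sensors (
    ts     timestamp NOT NULL,
    sensor int       NOT NULL,
    value  numeric
) PARTITION BY RANGE ( ts );

-- one partition per year, itself partitioned per month
CREATE TABLE respi.y2021 PARTITION OF respi.sensors
       FOR VALUES FROM ( '2021-01-01' ) TO ( '2022-01-01' )
       PARTITION BY RANGE ( ts );

CREATE TABLE respi.y2021m01 PARTITION OF respi.y2021
       FOR VALUES FROM ( '2021-01-01' ) TO ( '2021-02-01' );
-- ... and so on up to respi.y2021m12
SQL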

pg_dump in text mode

I did a simple for loop in my shell to produce a few backup files, one per year:

% for y in $(echo 2018 2019 2020 2021 2022 ); do
echo "Backup year $y"
time pg_dump -h miguel -U postgres -f sensorsdb.$y.sql -t "respi.y${y}*" sensorsdb
done



This produces the following amount of data:

% ls -sh1 *.sql     
3,5G sensorsdb.2018.sql
 13G sensorsdb.2019.sql
 12G sensorsdb.2020.sql
 10G sensorsdb.2021.sql
 20K sensorsdb.2022.sql
The following table summarizes the file sizes and the time required to create them:

year   SQL size   time
2018   3.5 GB     7 minutes
2019   13 GB      20 minutes
2020   12 GB      20 minutes
2021   10 GB      17 minutes

Compress them!

I used xz with its default settings, which according to my installation means compression level 6:

% for y in $(echo 2018 2019 2020 2021 2022 ); do
echo "Compress year $y"
time xz sensorsdb.$y.sql                                                          
done

Compress year 2018
xz sensorsdb.$y.sql  2911,75s user 12,62s system 98% cpu 49:22,22 total
Compress year 2019
xz sensorsdb.$y.sql  7411,57s user 41,22s system 98% cpu 2:06:24,38 total
Compress year 2020
xz sensorsdb.$y.sql  6599,22s user 19,08s system 98% cpu 1:52:07,38 total
Compress year 2021
xz sensorsdb.$y.sql  5487,37s user 15,25s system 98% cpu 1:33:08,32 total
Compress year 2022
xz sensorsdb.$y.sql  0,01s user 0,01s system 36% cpu 0,069 total



It takes from roughly fifty minutes to two hours to compress each file, as summarized in the following table:

File size   Time         Compressed size   Compression ratio
3.5 GB      50 minutes   227 MB            92 %
13 GB       2 hours      766 MB            94 %
12 GB       2 hours      658 MB            94 %
10 GB       1.5 hours    566 MB            94 %


Therefore, xz is a great tool for compressing dump data, especially when that data is textual and mostly in numeric form. Unfortunately, xz turns out to be quite slow at the default compression level.
How long does it take to decompress the data? Around 4 minutes per file, which is much faster than the compression.
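
As a side note, restoring one of these plain-text dumps is just a matter of decompressing the file and feeding it to psql; a minimal sketch, reusing the host and database of the dump commands above, could be:

% xz -d sensorsdb.2019.sql.xz
% psql -h miguel -U postgres -f sensorsdb.2019.sql sensorsdb

or, in a single pass, without keeping the decompressed file on disk:

% xz -dc sensorsdb.2019.sql.xz | psql -h miguel -U postgres sensorsdb
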
Just as a comparison, compressing with -2 instead of -6 takes around a quarter of the time while achieving roughly a third less compression: for example, the 13 GB file required 35 minutes instead of 120 minutes, and took 1.1 GB of disk space instead of 0.77 GB. Let's see the results using -2 as the compression level:

File size   Time         Compressed size   Compression ratio
3.5 GB      10 minutes   338 MB            90 %
13 GB       35 minutes   1.1 GB            91 %
12 GB       37 minutes   918 MB            92 %
10 GB       30 minutes   786 MB            92 %


As you can see, using compression level -2 greatly improves the compression speed with a minimal extra disk space requirement.
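
For reference, the -2 run is the very same loop as before, just passing the compression level explicitly to xz:

% for y in $(echo 2018 2019 2020 2021 2022 ); do
echo "Compress year $y"
time xz -2 sensorsdb.$y.sql
done
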
What about the directory dump format? Well, the same backup done with pg_dump -Fd, which compresses its objects by default, required 4.7 GB of disk space. The xz versions require from 3.1 GB (compression -2) down to 2.2 GB (compression -6).
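
For completeness, the directory-format dump mentioned above is obtained with something along these lines; the job count and the exact table selection are assumptions, since the text only mentions multiple jobs:

% time pg_dump -h miguel -U postgres -Fd -j 4 -f sensorsdb.dir -t "respi.y*" sensorsdb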

Conclusions

xz can help you save a lot of disk storage for textual (SQL) backups, but the default compression level can require a huge amount of time, especially on not-so-powerful machines. However, a lower compression level can make pg_dump plus xz roughly as fast as pg_dump -Fd, while still saving some extra space.
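
As a final sketch, the intermediate uncompressed file can be avoided altogether by piping pg_dump straight into xz, for instance:

% pg_dump -h miguel -U postgres -t "respi.y2021*" sensorsdb | xz -2 > sensorsdb.2021.sql.xz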

The article pgdump, text and xz has been posted by Luca Ferrari on December 6, 2021