Reduce WAL by increasing checkpoint distance

If you are running PostgreSQL in production you might have already noticed that adjusting checkpoints can be a very beneficial thing to tune the server and to improve database performance overall. However, there is more to this: Increasing checkpoint distances can also help to actually reduce the amount of WAL that is created in the first place. That is right. It is not just a performance issue, but it also has an impact on the volume of log created.

Table of Contents

You want to know why? Let us take a closer look!

Creating a test installation

For my test I am using a normal PostgreSQL 12 installation. To do that I am running initdb by hand:

 % initdb -D /some_where/db12 restart

1	% initdb -D /some_where/db12 restart

Then I have adjusted two parameters in postgresql.conf:

       max_wal_size = 128MB
       synchronous_commit = off

1 2	max_wal_size = 128MB synchronous_commit = off

and restarted the server:

 % pg_ctl -D /some_where/db12 restart

1	% pg_ctl -D /some_where/db12 restart

First of all I have reduced max_wal_size to make sure that PostgreSQL checkpoint a lot more often than this would be the case in the default installation. Performance will drop like a stone. The second parameter is to tell PostgreSQL to use asynchronous commits. This does not impact WAL generation but will simply speed up the benchmark to get results faster.

Finally I am using pgbench to create a test database containing 10 million rows (= 100 x 100.000)

% createdb test
% pgbench -s 100 -i test
dropping old tables...
NOTICE:  table 'pgbench_accounts' does not exist, skipping
NOTICE:  table 'pgbench_branches' does not exist, skipping
NOTICE:  table 'pgbench_history' does not exist, skipping
NOTICE:  table 'pgbench_tellers' does not exist, skipping
creating tables...
generating data...
100000 of 10000000 tuples (1%) done (elapsed 0.13 s, remaining 12.90 s)
200000 of 10000000 tuples (2%) done (elapsed 0.28 s, remaining 13.79 s)
…
9700000 of 10000000 tuples (97%) done (elapsed 12.32 s, remaining 0.38 s)
9800000 of 10000000 tuples (98%) done (elapsed 12.46 s, remaining 0.25 s)
9900000 of 10000000 tuples (99%) done (elapsed 12.59 s, remaining 0.13 s)
10000000 of 10000000 tuples (100%) done (elapsed 12.68 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.

% createdb test

% pgbench -s 100 -i test

dropping old tables...

NOTICE: table 'pgbench_accounts' does not exist, skipping

NOTICE: table 'pgbench_branches' does not exist, skipping

NOTICE: table 'pgbench_history' does not exist, skipping

NOTICE: table 'pgbench_tellers' does not exist, skipping

creating tables...

generating data...

100000 of 10000000 tuples (1%) done (elapsed 0.13 s, remaining 12.90 s)

200000 of 10000000 tuples (2%) done (elapsed 0.28 s, remaining 13.79 s)

…

9700000 of 10000000 tuples (97%) done (elapsed 12.32 s, remaining 0.38 s)

9800000 of 10000000 tuples (98%) done (elapsed 12.46 s, remaining 0.25 s)

9900000 of 10000000 tuples (99%) done (elapsed 12.59 s, remaining 0.13 s)

10000000 of 10000000 tuples (100%) done (elapsed 12.68 s, remaining 0.00 s)

vacuuming...

creating primary keys...

done.

Running a PostgreSQL benchmark

Let us run a test now: We want to run 10 x 1 million transactions. If you want to repeat the tests on your database make sure the test runs long enough. Let us take a look at the test!

 % pgbench -c 10 -j 5 -t 1000000 test
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 5
number of transactions per client: 1000000
number of transactions actually processed: 10000000/10000000
latency average = 0.730 ms
tps = 13699.655677 (including connections establishing)
tps = 13699.737288 (excluding connections establishing)

% pgbench -c 10 -j 5 -t 1000000 test

starting vacuum...end.

transaction type: <builtin: TPC-B (sort of)>

scaling factor: 100

query mode: simple

number of clients: 10

number of threads: 5

number of transactions per client: 1000000

number of transactions actually processed: 10000000/10000000

latency average = 0.730 ms

tps = 13699.655677 (including connections establishing)

tps = 13699.737288 (excluding connections establishing)

As you can see we managed to run roughly 13.700 transactions per second (on my personal laptop). Let us take a look at the amount of WAL created:

 test=# SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
13/522921E8
(1 row)

test=# SELECT pg_current_wal_insert_lsn();

pg_current_wal_insert_lsn

---------------------------

13/522921E8

(1 row)

The way to figure out the current WAL position is to call the pg_current_insert_lsn() function. To most people this format might be incomprehensible and therefore it is easier to transform the value into a readable number:

 test=# SELECT pg_current_wal_insert_lsn() - '0/00000000'::pg_lsn;
?column?
-------------
82982805992
(1 row)

test=# SELECT pg_current_wal_insert_lsn() - '0/00000000'::pg_lsn;

?column?

-------------

82982805992

(1 row)

The pg_size_pretty() function will return the number in a readily understandable format:

 test=# SELECT pg_size_pretty(pg_current_wal_insert_lsn() - '0/00000000'::pg_lsn);
pg_size_pretty
----------------
77 GB
(1 row)

test=# SELECT pg_size_pretty(pg_current_wal_insert_lsn() - '0/00000000'::pg_lsn);

pg_size_pretty

----------------

77 GB

(1 row)

Increasing checkpoint distances

Let us stop the database, delete the data directory and initialize it again. This time my postgresql.conf parameters will be set to way higher values:

  
max_wal_size = 150GB
checkpoint_timeout = 1d
synchronous_commit = off

max_wal_size = 150GB

checkpoint_timeout = 1d

synchronous_commit = off

Then we start the database again, populate it with data and run pgbench with the exact same parameters. Keep in mind that to repeat the process it is necessary to run a fixed number of transactions instead of running the test for a certain time. This way we can fairly compare the amount of WAL created.

Let us take a look and see how much WAL has been created this time:

 test=# SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
1/4F210328
(1 row)


test=# SELECT pg_size_pretty(pg_current_wal_insert_lsn() - '0/00000000'::pg_lsn);
pg_size_pretty
----------------
5362 MB
(1 row)

test=# SELECT pg_current_wal_insert_lsn();

pg_current_wal_insert_lsn

---------------------------

1/4F210328

(1 row)

test=# SELECT pg_size_pretty(pg_current_wal_insert_lsn() - '0/00000000'::pg_lsn);

pg_size_pretty

----------------

5362 MB

(1 row)

Wow, only 5.3 GB of WAL. This is a major improvement but why did it happen?

Understanding WAL creation

The idea of the transaction log is to make sure that in case of a crash the system will be able to recover nicely. However, PostgreSQL cannot keep the transaction log around forever. The purpose of a checkpoint is ensuring that all data is safe in the data file before the WAL can be removed from disk.

Here comes the trick: If a block is modified after a checkpoint for the first time it has to be logged completely. The entire block will be written to the WAL. Every additional change does not require the entire block to be written – only the changes have to be logged which is, of course, a lot less.

This is the reason for the major difference we see here. By increasing the checkpoint distances to a really high value, we basically reduced checkpoints to an absolute minimum which in turn reduces the number of “first blocks” written.

Changing WAL settings is not just good for performance in the long run – it will also reduce the volume of WAL.