postgresql when it's not your job

13:21

The importance of max_wal_size

23 March 2023

The reality is that most PostgreSQL configuration parameters don’t have a huge impact on overall system performance. There are, however, a couple that really can make a huge difference when tuned from the defaults. work_mem is one of them, and max_wal_size is another.

max_wal_size controls how large the write-ahead log can get on disk before PostgreSQL does a checkpoint. It’s not a hard limit; PostgreSQL adapts checkpoint frequency to keep the WAL on disk no larger than that, but excursions above it can definitely happen. The only thing it costs you is disk space; there’s no other problem with it being too large.

Having max_wal_size too small can cause checkpoints to happen very frequently. Frequent checkpointing is bad for two reasons:

  1. Checkpoints themselves are expensive, since all of the dirty buffers in shared_buffers need to be written out.
  2. The first time a page is changed after a checkpoint, the entire page is written to the WAL rather than just the change. On a busy system, this can be a very significant burst of WAL activity.

Here’s a process to set max_wal_size properly:

First, set the general checkpoint parameters. This is a good start:

checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_compression = on
log_checkpoints = on
max_wal_size = 16GB

Then, let the system run, and check the logs (or any other tools you may have to determine checkpoint frequency). If the checkpoints are happening more frequently than every 15 minutes, increase max_wal_size until they are being triggered by the timeout.

How about min_wal_size? This controls the amount of reserved WAL files that PostgreSQL will retain on disk even if it doesn’t need it for other reasons. This can speed up the WAL slightly, since PostgreSQL can use one of those retained files instead of having to create a new one. There’s no harm in bumping it up (again, all it costs is disk space), but on nearly every environment, the performance impact is small.

Lukas Fittl at 17:19, 29 March 2023:

Re: “The only thing it costs you is disk space; there’s no other problem with it being too large.”

Doesn’t this omit the fact that a higher max_wal_size leads to longer recovery times after a crash? In my experience that was the reason why you wouldn’t want max_wal_size to e.g. be 100GB, since it means your database might take a while to get back up and running after crashes.