Monday, April 10, 2023

Updating the Cost Limit On The Fly

Rejoice and be glad! I was so pleased this morning to see that Melanie Plageman's patch to make autovacuum absorb new cost limit settings more quickly was committed by Daniel Gustafsson while I was busy enjoying a long Easter weekend. It's a minor change in the grand scheme of things, but there's a reasonably common situation where it's going to make life a lot easier.

Imagine that you realize that you've got a vacuum running on your largest table which has been running for way too long. Perhaps several days, perhaps a week. Maybe it's made a significant amount of progress, but still has a long way to go. You investigate and find out that, actually, VACUUM isn't really doing that much work. It's not running at 100% CPU time, and it isn't doing that much I/O, either.

At this point, you may realize that your problem is related to vacuum_cost_limit and related parameters -- by related, I mean all of the ones whose names start with vacuum_cost or autovacuum_vacuum_cost. By default, when autovacuum runs VACUUM, these parameters limit the speed at which VACUUM runs. The intention is to avoid overloading the system with too much autovacuum work all at once, but sometimes the result is that the VACUUM does not finish in a timely manner.

If you knew in advance that you were going to have a problem with VACUUM not running fast enough,  the fix would be quite simple. You could just raise vacuum_cost_limit and, assuming that you raised it high enough and that related settings have reasonable values, that would fix the whole problem. The trouble is that, in existing releases, changing vacuum_cost_limit doesn't affect any autovacuum operations that are already in progress.

So, in the scenario I describe above, you would be faced with a bit of a dilemma. You could adjust the setting for the future and then just tough it out for present, but sometimes that's not workable. You could kill off the current vacuum and start a new one with a new setting, but that loses all the work that vacuum has already done up to that point. I've had to advise multiple customers to fire up gdb -p $AUTOVACUUM_PID and use gdb to set VacuumCostActive = false. That's really not something you want to be doing on a production system, but we just haven't had a viable alternative.

Now, we will. With this change, starting in PostgreSQL 16, a new vacuum_cost_limit should affect all workers almost as soon as you reload the configuration. That should make dealing with this problem much less aggravating. Hooray!

No comments:

Post a Comment