Anyone trying the upcoming PostgreSQL 15 might observe that one of the background processes is missing.

if we compare this with PostgreSQL 14:

Yes, the “stats collector” is missing, and it is gone for good. One of the major bottlenecks and headaches is gone forever.

What does the stats collector do?

Novice users might be wondering what it is and why it is needed for PG 14 and older versions. At least a few users get confused about table-level statistics collection (ANALYZE), which is used for query planning. But this is different. PostgreSQL tracks all activities of each process to have cumulative stats like how many times a table or index is scanned, or when the last vacuum or autovacuum ran on the table, or how many times the autovacuum ran on a table, etc. All the information collected by the stats collector is available through different pg_stat_* views.

What was wrong?

Since each backend of a session is an individual process in PostgreSQL, collecting stats and transmitting them is not an easy task. Each backend sends the information about what activity they have done to a single “stats collector” process. This communication used to happen over the UDP socket. There were a lot of problems with this approach, this is not a scalable model. Users often reported different types of issues like 1. stale statistics, 2. stats collector not running, 3. autovacuum not working/starting, etc.

It used to be really hard to understand what was wrong if a stats collector had a problem with a specific machine.

Another adverse effect of “stats collector” is the IO it causes. if you enable the DEBUG level 2, you might see messages that keep coming to PostgreSQL log like:

This can cause considerable IO on the mount point where your data directory is located.
This is the place pointed by the value of the parameter stats_temp_directory. On many systems, it will be the pg_stat_tmp within the data directory.

On Ubuntu/Debian, it will be in /var/run/postgresql, for example:

What is new in PostgreSQL 15?

Instead of using the files and filesystem, statistics now use dynamic shared memory.

You can refer to the commit here by Andres Freund for a summary :

Previously the statistics collector received statistics updates via UDP and shared statistics data by writing them out to temporary files regularly. These files can reach tens of megabytes and are written out up to twice a second. This has repeatedly prevented us from adding additional useful
statistics.

Now statistics are stored in shared memory. Statistics for variable-numbered objects are stored in a dshash hashtable (backed by dynamic shared memory). Fixed-numbered stats are stored in plain shared memory.

The header for pgstat.c contains an overview of the architecture.

The stats collector is not needed anymore, remove it.

By utilizing the transactional statistics drop infrastructure introduced in a prior commit statistics entries cannot “leak” anymore. Previously leaked statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. Onsystems with many small relations pgstat_vacuum_stat() could be quite
expensive.

Now that replicas drop statistics entries for dropped objects, it is not necessary anymore to reset stats when starting from a cleanly shut down replica.

Obviously, the parameter stats_temp_directory is gone. So we don’t need the pg_stat_tmp the directory which gets created within the data directory (or other location) where all the stats files are generated and read from. However, this directory is retained for not breaking many extensions like pg_stat_statements, which depend on the directory. The directory remains empty until the extension libraries are loaded,  For example, if we load the pg_stat_statements library, a file appears in the directory.

Of course, the extensions are not free. They carry their own cost.

In the new architecture, most stats updates are first accumulated locally in each process as “pending” (each backend has a backend-local hashtable). “Pending” in the sense that they are accumulated but not yet submitted to the shared stats system.  This is later flushed to shared memory just after a commit or by timeout.

Since stats are getting updated concurrently while someone tries to read, read consistency comes into the picture. So PostgreSQL 15 introduces a new parameter: stats_fetch_consistency which can take three values none, cache or snapshot.

none” is the most efficient. But that won’t give read consistency if there are monitoring queries that expect that. But should be OK for most of the use.  “cache” ensures repeat accesses yield the same values, which is essential for queries involving e.g. self-joins. “snapshot” can be useful when interactively inspecting statistics but has higher overhead. The default is “cache“.

If it is in shared memory, how does it survive a restart?

They are written out to the filesystem by the checkpointer process just before the shutdown and again loaded back during the startup by the startup process. As usual, stats will be discarded if there is a crash.

Will this affect my monitoring tool/script?

All stats monitoring views pg_stat_* will continue to work as it is. But, please make sure to select the appropriate value for stats_fetch_consistency . As mentioned above, the pg_stat_tmp directory is preserved not to break extensions developed using this approach. However, it is up to the extension developer to thoroughly test the extension against PostgreSQL 15

What else?

People like me use PostgreSQL wait events to understand where PostgreSQL and its sessions are spending their time. The data collection and analysis tools like pg_gather, which we use in our day-to-day life, make use of these wait event analyses to understand problems. Three new wait events are introduced for better monitoring.

PgStatsDSAWaiting for stats dynamic shared memory allocator access
PgStatsHashWaiting for stats shared memory hash table access
PgStatsDataWaiting for shared memory stats data access

With all the overhead of the stats collector and its maintenance going away, other subsystems like autovacuum have less work to do.

Additionally, monitoring tools that query the stats information frequently are expected to cause much less load on the system.

Thanks to the community

Thanks to the entire PostgreSQL community, especially the hackers, for this amazing improvement. The whole discussion started four years back when Kyotaro Horiguchi started discussing the idea and patches. It is finally materialized by the great work of Andres Freund, Melanie Plageman, and the team. We can see that it was indeed great teamwork of many contributors like Alvaro Herrera, David G Johnston, Thomas Munro, Tomas Vondra, Arthur Zakirov, Antonin Houska, Justin Pryzby, Tom Lane, Fujii Masao, Greg Stark, Robert Haas, Stephen Frost, Bertrand Drouvot, Magnus Hagander, and many others.

It is time to celebrate that PostgreSQL is becoming slim and trim while acquiring many more capabilities.

Subscribe
Notify of
guest

1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Charlie Arehart

Very helpful. Thanks for calling this out, Jobin.