Recently I blogged about a significant improvement in PostgreSQL 15: PostgreSQL 15: Stats Collector Gone? What’s New? While there is great cheer for this upcoming improvement, we could see a few comments about “inefficiencies” in previous versions.

That brought me to the realization that even though the feature to tune stats collector is part of the official documentation and recommendations, and there were many blog posts in the past about it, I rarely see someone trying to tune it in practice.  So I feel it is worth a reminder again.

Option for PostgreSQL 14 and older

Simply, moving the directory in which the stats files are generated (stats_temp_directory) to a location in RAMFS or TEMPFS, which are RAM-based filesystems, can save the IO overhead while making the stats collector much more efficient and accurate.

PostgreSQL documentation also officially advises so to reduce the impact.

For better performance, stats_temp_directory can be pointed at a RAM-based file system, decreasing physical I/O requirements. When the server shuts down cleanly, a permanent copy of the statistics data is stored in the pg_stat subdirectory so that statistics can be retained across server restarts.

Reference: PostgreSQL Documentation

How much space is required?

The current location where stats files are generated can be checked by inspecting the value of stats_temp_directory.

On Red Hat clones, the default location will be within the data directory.

and on Debian/Ubuntu, it will be in /var/run/postgresql, for example:

Once the location is identified, it is just a matter of checking the size of that directory using df -h.

Generally, it won’t be higher than a few hundred MBs. The size depends on the number of databases and objects (tables and indexes) within those databases. More importantly, what are the statistics collected, which is controlled by parameters like track_activities , track_activity_query_size , track_commit_timestamp , track_counts , track_functions  and track_io_timing  as mentioned in PostgreSQL Documentation.

Ramfs or tempfs?

There are two main RAM-based filesystems: ramfs and tempfs.

A ramfs can be mounted using a /etc/fstab entry like

However, there are a couple of disadvantages. Even if we specify the uid and gid, as shown above, the ramfs will be mounted as root :(. and we need a script or method to change ownership or grant permission to the “postgres” account under which PostgreSQL will be running.

Another problem with ramfs is that we won’t be able to see the size using the df command. But there is an advantage also: ramfs is dynamically resizable, and it can grow dynamically as needed.  This eliminates the problem of estimating the size required. However, ramfs cannot use swap if required, so there is a slight risk of a system hang, especially on those systems with high memory constraints.

Considering the risk and demerits, ramfs is less popular, and tempfs is the one which is generally used.

Here is a sample /etc/fstab entry for tempfs:

Once this filesystem is mounted, it will appear as a regular filesystem, unlike ramfs:

And it is now a matter of instructing PostgreSQL to use this location as the stats_temp_directory. Please remember that this is a dynamic parameter and doesn’t need a PostgreSQL to bounce.

We just need to signal (SIGHUP) the PostgreSQL to reload the new configuration.

Another option is to make use of /dev/shm. However, it is recommended to maintain a subdirectory that is owned by the Postgres user account with strict permissions.

What are the options for DBaaS?

Restrictions imposed by Database as a Service (DBaaS) generally prevent the users from even starting to estimate the RAM-based filesystem required. Some of the cloud vendors like AWS give instructions on how to setup ramdisk for their DBaaS solution (RDS). Similar documentation exists for the Aurora offering, also. However, I couldn’t see any doc for size estimation.

Many other cloud vendors who offer DBaaS solutions are entirely silent about the stats_temp_directory.

Additional note

Taking the stat_temp_directory outside of the data directory needs additional care if we plan to run multiple instances in the same host machine. Each PostgreSQL instance/cluster needs to have its own stat_temp_directory.

Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments