I recently blogged about how Patroni addresses the problem of logical replication slot failover in a PostgreSQL cluster. In fact, nothing else hurt logical replication as much as this problem. Even as I write this post, I can see customers and users who don't have Patroni struggling to address it. Thanks to the Patroni community for solving this problem in the most elegant way: no patch to PostgreSQL, no extensions required. A completely non-invasive solution.

With this biggest roadblock gone, we expect more and more users to start looking into, or reconsidering, logical replication, especially those who discarded it due to practical difficulties. I want to let them know that there are many more exciting new features related to logical replication/decoding in newer versions like PostgreSQL 13 and 14.

Before getting into the new features, let us look at what else hurt logical replication in older PostgreSQL versions.

Memory Usage and Disk Usage

PostgreSQL used to keep only 4096 changes (max_changes_in_memory) per transaction in memory. For a very lengthy transaction, the rest of the changes were spilled to disk as spill files. This has two important implications. First, if each change is large and there are sub-transactions, memory consumption can easily run into several gigabytes; this can even affect the host machine's stability and invite the OOM killer. On the other hand, if the changes are small but very numerous, a lengthy transaction still spills them to disk, causing I/O overhead.

Massive Replication Delays and CPU Load

It was almost a regular complaint from many users that they kept seeing huge replication lag. A closer inspection would show the WAL sender consuming a lot of CPU, with single-core saturation being the commonly reported case. Often, further investigation revealed a long-running transaction or a bulk data load causing the generation of spill files. The system was busy going through the spill files and preparing the commit order to be sent to a logical replica.

Again, we witnessed a few cases where users opted for logical replication to reduce the load on the primary, but the cost (CPU and I/O usage) of logical decoding by the WAL sender wiped out all potential gains.

These problems were not unknown to the PostgreSQL community. In fact, discussions about the problems and their fixes started around the time PostgreSQL 10 was released. The good news is that all of these are addressed in recent development.

I would like to express gratitude, on behalf of the user community, to those who contributed their time and effort to developing these wonderful solutions, namely Tomas Vondra, Amit Kapila, Dilip Kumar, Masahiko Sawada, and Vignesh C, and to the many more who gave very valuable input, such as Peter Eisentraut and Andres Freund.

Improvements in PostgreSQL 13

The problem of memory and disk usage is largely addressed in PostgreSQL 13. The max_changes_in_memory limit (4096) is no longer used while adding changes. Instead, the total memory usage across all transactions and the memory usage of each individual transaction are tracked, and a new parameter, logical_decoding_work_mem, is introduced. The buffer is spilled to disk only when this limit is exceeded, and only the largest transaction, the one consuming the most memory, is chosen as the victim to be spilled. This is smarter and also reduces unwanted disk spills.

Reference: ReorderBufferCheckMemoryLimit (src/backend/replication/logical/reorderbuffer.c)

Improvements in PostgreSQL 14

Spilling to disk when logical_decoding_work_mem is full is one approach. But what about transmitting the changes directly to subscribers instead of spilling to disk? This is the major improvement in PostgreSQL 14. It is not as easy as it sounds, because we are dealing with ongoing transactions, and the overall logic and feature set of logical replication had to undergo huge changes. But yes, PostgreSQL 14 introduces the option to stream the reorder buffer to the subscriber rather than spill it to disk first. Naturally, this new feature to stream ongoing transactions required improvements to the replication protocol: new message formats like “Stream Start”, “Stream Stop”, “Stream Commit”, and “Stream Abort” are added. Please refer to the PostgreSQL documentation for more details.

Corresponding changes were also required on the output plugin interface, another improvement in PG 14. Please refer to commit 45fdc9738b and the PostgreSQL documentation for more details.

Streaming is considered when logical_decoding_work_mem is exceeded. This doesn’t mean that the buffer is never spilled to disk: spilling remains the fallback if streaming is not possible, which happens when the information currently available is not sufficient to decode the transaction.

The message of commit 7259736a6e5b7c7588fff9578370736a6648acbb summarizes the big improvement:

Instead of serializing the transaction to disk after reaching the logical_decoding_work_mem limit in memory, we consume the changes we have in memory and invoke stream API methods added by commit 45fdc9738b. However, sometimes if we have incomplete toast or speculative insert we spill to the disk because we can’t generate the complete tuple and stream. And, as soon as we get the complete tuple we stream the transaction including the serialized changes.

We can do this incremental processing thanks to having assignments (associating subxact with toplevel xacts) in WAL right away, and thanks to logging the invalidation messages at each command end. These features are added by commits 0bead9af48 and c55040ccd0 respectively.

Now that we can stream in-progress transactions, the concurrent aborts may cause failures when the output plugin consults catalogs (both system and user-defined). We handle such failures by returning ERRCODE_TRANSACTION_ROLLBACK sqlerrcode from system table scan APIs to the backend or WALSender decoding a specific uncommitted transaction. The decoding logic on the receipt of such a sqlerrcode aborts the decoding of the current transaction and continue with the decoding of other transactions.

How to Setup

The necessary features are available only from PostgreSQL 14 onward, and the client needs to initiate a replication connection with streaming enabled. To facilitate this, CREATE SUBSCRIPTION takes a new parameter, “streaming”, which is off by default. The following is an example:
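A minimal sketch on the subscriber side; the subscription name, connection string, and publication name are illustrative:

```sql
-- Run on the subscriber. Host, database, user, and publication
-- names below are placeholders for your own setup.
CREATE SUBSCRIPTION sub1
    CONNECTION 'host=publisher-host dbname=demo user=repuser'
    PUBLICATION pub1
    WITH (streaming = on);
```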

Please make a note of the new parameter streaming = on, which specifies whether streaming of in-progress transactions should be enabled for this subscription.

Alternatively, an existing subscription can be modified to enable streaming.
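For example, assuming a subscription named sub1 already exists:

```sql
-- Enable streaming of in-progress transactions on an existing subscription.
ALTER SUBSCRIPTION sub1 SET (streaming = on);
```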

Monitoring Improvements

There are two major improvements in terms of monitoring.

Monitoring the Initial Data Copy

PostgreSQL 14 allows users to monitor the progress of the COPY command using a new monitoring view pg_stat_progress_copy. This is a great value addition when someone is setting up the logical replication. Please refer to the documentation for more details.

The following is the sample output of: select * from pg_stat_progress_copy ;  from the PUBLISHER side using psql’s watch
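A sketch of the query as it can be run in psql, selecting the progress columns most relevant here (the refresh interval is arbitrary):

```sql
-- On the publisher, inside psql:
SELECT relid::regclass, command, type,
       bytes_processed, tuples_processed
FROM pg_stat_progress_copy;
-- Re-run it every 2 seconds:
\watch 2
```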

Since we know how many tuples are in the table, we have no difficulty understanding how far it has progressed.

Similar monitoring is possible from the SUBSCRIBER side also:
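The same view can be queried on the subscriber while the initial table synchronization is copying data in:

```sql
-- On the subscriber, during initial table sync:
SELECT relid::regclass, command, type, tuples_processed
FROM pg_stat_progress_copy;
```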

Monitoring the Logical Replication

Monitoring logical replication is possible through a new view available from PostgreSQL 14, pg_stat_replication_slots, on the PUBLISHER side. (The name is similar to pg_replication_slots, but this view is a great improvement.)

This is of great use even if we are not using the new streaming feature, because there is a higher chance of generating spill files.
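For instance, the spill-related columns of the view show how often and how much each slot has spilled to disk:

```sql
-- On the publisher: per-slot spill statistics.
SELECT slot_name, spill_txns, spill_count, spill_bytes
FROM pg_stat_replication_slots;
```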

As we can see in the above case, there was a single bulky transaction, which caused a lot of spill files.

Statistics related to a particular slot can be reset using the function pg_stat_reset_replication_slot(), which takes the slot name as its argument.

With the streaming enabled, we can get details of streaming of ongoing transactions:
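The same view exposes streaming counterparts to the spill columns:

```sql
-- On the publisher: how much each slot has streamed
-- from in-progress transactions.
SELECT slot_name, stream_txns, stream_count, stream_bytes
FROM pg_stat_replication_slots;
```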

It is recommended to adjust the value of logical_decoding_work_mem (default 64MB), which sets the maximum amount of memory each walsender process may use for decoding. With this, we can avoid many spills to disk while at the same time avoiding excessive memory usage.

For example:
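A sketch of raising the limit cluster-wide; the 200MB value is just an illustration, to be sized against the number of walsenders and available memory:

```sql
-- Persist the new setting and reload the configuration;
-- logical_decoding_work_mem does not require a restart.
ALTER SYSTEM SET logical_decoding_work_mem = '200MB';
SELECT pg_reload_conf();
```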

Conclusion

In this blog post, I want to encourage PostgreSQL users who abandoned logical replication in the past due to its shortcomings to reconsider it, as PostgreSQL 13 and 14, along with Patroni, solve most of the difficulties. Lengthy, bulky transactions are known to cause severe problems for logical replication. The effect was very serious in previous versions, but with the new improvements it is mitigated to a good extent and is expected to reduce the load on the publisher side considerably.

However, this doesn’t mean that it is perfect. The community and developers are aware of many more areas for improvement, especially on the subscriber side. We should expect such changes in upcoming versions.
