I recently blogged about how Patroni addresses the problem of logical replication slot failover in a PostgreSQL cluster. In fact, nothing else hurt logical replication as much as this problem. Even as I write this post, I can see customers and users who don't have Patroni struggling to address it. Thanks to the Patroni community for solving this problem in the most elegant way: no patch to PostgreSQL, no extensions required. A completely non-invasive solution.

With this biggest roadblock gone, we expect more and more users to start looking into, or reconsidering, logical replication, especially those who discarded it due to practical difficulties. I want to let them know that there are many more exciting new features related to logical replication/decoding in newer versions like PostgreSQL 13 and 14.

Before getting into the new features, let us look at what else hurt logical replication in older PostgreSQL versions.

Memory Usage and Disk Usage

PostgreSQL used to keep only 4096 changes (max_changes_in_memory) per transaction in memory. For a very lengthy transaction, the rest of the changes were spilled to disk as spill files. This has two important implications. First, if each change is large and there are sub-transactions, memory consumption can easily run into several gigabytes; this can even affect the host machine's stability and invite the OOM killer. On the other hand, if the changes are small but very numerous, a lengthy transaction still spills them to disk, causing I/O overhead.

Massive Replication Delays and CPU Load

It was almost a regular complaint from many users that they kept seeing huge replication lag. A closer inspection would show the WAL sender consuming a lot of CPU, with single-core saturation being the commonly reported case. Often, further investigation revealed a long-running transaction or a bulk data load causing the generation of spill files. The system was busy going through the spill files and preparing the commit order to be sent to a logical replica.

Again, we witnessed a few cases where users opted for logical replication to reduce the load on the primary, but the cost (CPU and I/O usage) of logical decoding by the WAL sender wiped out all potential gains.

These problems were not unknown to the PostgreSQL community. In fact, discussions about the problems and their fixes started around the time PostgreSQL 10 was released. The good news is that all of these are addressed in recent development.

I would like to express gratitude, on behalf of the user community, to those who contributed their time and effort to developing these wonderful solutions, namely Tomas Vondra, Amit Kapila, Dilip Kumar, Masahiko Sawada, and Vignesh C, and to the many more who gave very valuable input, such as Peter Eisentraut and Andres Freund.

Improvements in PostgreSQL 13

The problem of memory and disk usage is largely addressed in PostgreSQL 13. The max_changes_in_memory limit (4096) is no longer used while adding changes. Instead, the total memory usage across all transactions and the memory usage of each individual transaction are tracked, and a new parameter, logical_decoding_work_mem, is introduced. The buffer is spilled to disk only when this limit is exceeded, and only the largest transaction, the one consuming the most memory, is chosen as the victim to be spilled. This is smarter and also reduces unwanted disk spills.

Reference: ReorderBufferCheckMemoryLimit (src/backend/replication/logical/reorderbuffer.c)

Improvements in PostgreSQL 14

Spilling to disk when logical_decoding_work_mem is full is one approach. But what about transmitting the changes directly to subscribers instead of spilling to disk? This is the major improvement in PostgreSQL 14. It is not as easy as it sounds, because we are dealing with ongoing transactions, and the overall logic and feature set of logical replication had to undergo huge changes. But yes, PostgreSQL 14 introduces the option to stream the reorder buffer to the subscriber rather than spill it to disk first. Naturally, this new feature to stream ongoing transactions required improvements to the replication protocol: new message formats like “Stream Start”, “Stream Stop”, “Stream Commit”, and “Stream Abort” are added. Please refer to the PostgreSQL documentation for more details.

Corresponding changes were also required on the output plugin interface, another improvement in PG 14. Please refer to commit 45fdc9738b and the PostgreSQL documentation for more details.

Streaming is considered when logical_decoding_work_mem is exceeded. This doesn’t mean that the buffer is never spilled to disk: spilling remains the fallback if streaming is not possible, which happens when the information currently available is not sufficient to decode the transaction.

The message of commit 7259736a6e5b7c7588fff9578370736a6648acbb summarizes the big improvement:

Instead of serializing the transaction to disk after reaching the logical_decoding_work_mem limit in memory, we consume the changes we have in memory and invoke stream API methods added by commit 45fdc9738b. However, sometimes if we have incomplete toast or speculative insert we spill to the disk because we can’t generate the complete tuple and stream. And, as soon as we get the complete tuple we stream the transaction including the serialized changes.

We can do this incremental processing thanks to having assignments (associating subxact with toplevel xacts) in WAL right away, and thanks to logging the invalidation messages at each command end. These features are added by commits 0bead9af48 and c55040ccd0 respectively.

Now that we can stream in-progress transactions, the concurrent aborts may cause failures when the output plugin consults catalogs (both system and user-defined). We handle such failures by returning ERRCODE_TRANSACTION_ROLLBACK sqlerrcode from system table scan APIs to the backend or WALSender decoding a specific uncommitted transaction. The decoding logic on the receipt of such a sqlerrcode aborts the decoding of the current transaction and continue with the decoding of other transactions.

How to Setup

The necessary features are available only from PostgreSQL 14 onward, and the client needs to initiate a replication connection with streaming enabled. To facilitate this, CREATE SUBSCRIPTION takes a new parameter, “streaming”, which is off by default. The following is an example:
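A minimal sketch on the subscriber side; the subscription name, connection string, and publication name are illustrative:

```sql
-- Run on the subscriber. Host, database, user, and publication
-- names below are placeholders for your own setup.
CREATE SUBSCRIPTION sub1
    CONNECTION 'host=publisher-host dbname=demo user=repuser'
    PUBLICATION pub1
    WITH (streaming = on);
```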

Please make a note of the new parameter streaming = on, which specifies whether streaming of in-progress transactions should be enabled for this subscription.

Alternatively, an existing subscription can be modified to enable streaming.
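For example, assuming a subscription named sub1 already exists:

```sql
-- Enable streaming of in-progress transactions on an existing subscription.
ALTER SUBSCRIPTION sub1 SET (streaming = on);
```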

Monitoring Improvements

There are two major improvements in terms of monitoring.

Monitoring the Initial Data Copy

PostgreSQL 14 allows users to monitor the progress of the COPY command using a new monitoring view pg_stat_progress_copy. This is a great value addition when someone is setting up the logical replication. Please refer to the documentation for more details.

The following is the sample output of: select * from pg_stat_progress_copy ;  from the PUBLISHER side using psql’s watch
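A sketch of the query as it can be run in psql, selecting the progress columns most relevant here (the refresh interval is arbitrary):

```sql
-- On the publisher, inside psql:
SELECT relid::regclass, command, type,
       bytes_processed, tuples_processed
FROM pg_stat_progress_copy;
-- Re-run it every 2 seconds:
\watch 2
```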

Since we know how many tuples are in the table, we have no difficulty understanding how far it has progressed.

Similar monitoring is possible from the SUBSCRIBER side also:
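The same view can be queried on the subscriber while the initial table synchronization is copying data in:

```sql
-- On the subscriber, during initial table sync:
SELECT relid::regclass, command, type, tuples_processed
FROM pg_stat_progress_copy;
```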

Monitoring the Logical Replication

Monitoring logical replication is possible through a new view available from PostgreSQL 14, pg_stat_replication_slots, on the PUBLISHER side. (The name is similar to pg_replication_slots, but this view is a great improvement.)

This is of great use even if we are not using the new streaming feature, because there is a higher chance of generating spill files.
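For instance, the spill-related columns of the view show how often and how much each slot has spilled to disk:

```sql
-- On the publisher: per-slot spill statistics.
SELECT slot_name, spill_txns, spill_count, spill_bytes
FROM pg_stat_replication_slots;
```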

As we can see in the above case, there was a single bulky transaction, which caused a lot of spill files.

Statistics related to a particular slot can be reset using the function pg_stat_reset_replication_slot(), which takes the slot name as its argument.

With the streaming enabled, we can get details of streaming of ongoing transactions:
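The same view exposes streaming counterparts to the spill columns:

```sql
-- On the publisher: how much each slot has streamed
-- from in-progress transactions.
SELECT slot_name, stream_txns, stream_count, stream_bytes
FROM pg_stat_replication_slots;
```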

It is recommended to adjust the value of logical_decoding_work_mem (default 64MB), which sets the maximum amount of memory each walsender process may use for decoding. With this, we can avoid many spills to disk while at the same time avoiding excessive memory usage.

For example:
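A sketch of raising the limit cluster-wide; the 200MB value is just an illustration, to be sized against the number of walsenders and available memory:

```sql
-- Persist the new setting and reload the configuration;
-- logical_decoding_work_mem does not require a restart.
ALTER SYSTEM SET logical_decoding_work_mem = '200MB';
SELECT pg_reload_conf();
```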

Conclusion

In this blog post, I want to encourage PostgreSQL users who abandoned logical replication in the past due to its shortcomings to reconsider it, as PostgreSQL 13 and 14, along with Patroni, solve most of the difficulties. Lengthy, bulky transactions are known to cause severe problems for logical replication. The effect was very serious in previous versions, but with the new improvements it is mitigated to a good extent and is expected to reduce the load on the publisher side considerably.

However, this doesn’t mean that it is perfect. The community and developers are aware of many more areas for improvement, especially on the subscriber side. We should expect such changes in upcoming versions.
