Categories
Uncategorized

What happens to logical replication after running pg_upgrade?

Even if logical replication is a method of major upgrade itself, sometimes you’ll want to mix and match them. Consider the following scenario: a big and busy database that has a logical replica, which has its specific tables and indexes. When it’s time to upgrade, you will probably want to upgrade both instances – the publisher and the subscriber. As the database is big an busy, using logical replication itself or dump-and-restore aren’t the best choices – those methods will require massive amount of storage, and dump-and-restore will also require a long period of downtime. So, the most viable option is to use the pg_upgrade tool, capable of upgrading a multi-terabyte cluster in the matter of minutes with –link option. But…

pg_upgrade will destroy all replication slots

After successful pg_upgrade run, the new cluster will have no replication slots. That means if you allow your users and applications to connect to the upgraded clusters immediately, any data changes won’t be replicated to subscribers. So, it’s very important to ensure no data is written to the publisher database until logical replication is up and running.

Things to do when upgrading a publisher-subscriber environment

So, how to correctly run pg_upgrade without losing any data changes? Those steps worked for me:

  1. Block all traffic to the primary database from apps and users – using firewall settings, the “host all all 0.0.0.0/0 reject” rule in pg_hba.conf file, or both. Allow only replication connections from the subscriber(s).
  2. Check if there is any replication lag, and when it reaches zero, run DROP SUBSCRIPTION on the subscriber. This will drop the replication slot on publisher – but it would be removed by pg_upgrade anyway.
  3. Upgrade the subscriber cluster.
  4. Upgrade the publisher cluster.
  5. On the subscriber, CREATE SUBSCRIPTION … WITH (copy_data=FALSE)
  6. Check if the logical replication works as expected, for example add a bogus row to some replicated table, check for its existence on subscriber and delete it afterwards.
  7. Recreate any other logical replication slots (for example for Debezium), if required.
  8. Unlock the write traffic.
  9. Profit!

While it’s possible to detach a subscription from the replication slot (using ALTER SUBSCRIPTION … SET (slot_name=NONE)), recreate the slot, and attach the subscription to the slot again – in my test environment it resulted in the subscriber database not accepting new changes, so I finally decided to drop and recreate the subscription. This worked without any problems both in test and production.