Experiencing WAL REDO in PostgreSQL

Enterprise PostgreSQL Solutions

Comments are off

Experiencing WAL REDO in PostgreSQL

1. Overview

Write-Ahead Logging (WAL) is a standard method used in PostgreSQL to ensure data integrity. Many key features rely on this WAL design, such as streaming replication, and Point-In-Time recovery, etc. While there is a detailed online book explaining how WAL works in PostgreSQL, there is a lack of detailed documentation or blogs describing the process of adding a new XLOG record to WAL. This blog aims to share the steps on how to add a piece of customized information as an XLOG record for use during WAL REDO.

2. WAL REDO

WAL not only ensures data integrity but is also crucial for replication and recovery. Sometimes, you might consider adding a piece of information as an XLOG to help achieve specific tasks. All the REDO functions implemented by the PostgreSQL community are defined in `src/include/access/rmgrlist.h`.

/* symbol name, textual name, redo, desc, identify, startup, cleanup */
PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL, NULL)
PG_RMGR(RM_XACT_ID, "Transaction", xact_redo, xact_desc, xact_identify, NULL, NULL)
PG_RMGR(RM_SMGR_ID, "Storage", smgr_redo, smgr_desc, smgr_identify, NULL, NULL)
PG_RMGR(RM_CLOG_ID, "CLOG", clog_redo, clog_desc, clog_identify, NULL, NULL)
PG_RMGR(RM_DBASE_ID, "Database", dbase_redo, dbase_desc, dbase_identify, NULL, NULL)
PG_RMGR(RM_TBLSPC_ID, "Tablespace", tblspc_redo, tblspc_desc, tblspc_identify, NULL, NULL)
PG_RMGR(RM_MULTIXACT_ID, "MultiXact", multixact_redo, multixact_desc, multixact_identify, NULL, NULL)
PG_RMGR(RM_RELMAP_ID, "RelMap", relmap_redo, relmap_desc, relmap_identify, NULL, NULL)
PG_RMGR(RM_STANDBY_ID, "Standby", standby_redo, standby_desc, standby_identify, NULL, NULL)
PG_RMGR(RM_HEAP2_ID, "Heap2", heap2_redo, heap2_desc, heap2_identify, NULL, NULL)
PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, heap_identify, NULL, NULL)
PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_identify, NULL, NULL)
PG_RMGR(RM_HASH_ID, "Hash", hash_redo, hash_desc, hash_identify, NULL, NULL)
PG_RMGR(RM_GIN_ID, "Gin", gin_redo, gin_desc, gin_identify, gin_xlog_startup, gin_xlog_cleanup)
PG_RMGR(RM_GIST_ID, "Gist", gist_redo, gist_desc, gist_identify, gist_xlog_startup, gist_xlog_cleanup)
PG_RMGR(RM_SEQ_ID, "Sequence", seq_redo, seq_desc, seq_identify, NULL, NULL)
PG_RMGR(RM_SPGIST_ID, "SPGist", spg_redo, spg_desc, spg_identify, spg_xlog_startup, spg_xlog_cleanup)
PG_RMGR(RM_BRIN_ID, "BRIN", brin_redo, brin_desc, brin_identify, NULL, NULL)
PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_identify, NULL, NULL)
PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL)
PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL)
PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL)

To add an extra REDO and extend the XLOG, you can refer to git commit logs. For example, below commit introduces a new XLOG record to help track commit timestamps.

commit 73c986adde5d73a5e2555da9b5c8facedb146dcd (HEAD -> redo3)
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   Wed Dec 3 11:53:02 2014 -0300

    Keep track of transaction commit timestamps
    
    Transactions can now set their commit timestamp directly as they commit,
    or an external transaction commit timestamp can be fed from an outside
    system using the new function TransactionTreeSetCommitTsData().  This
    data is crash-safe, and truncated at Xid freeze point, same as pg_clog.
    ...

And another git commit below adds Generic Messages for Logical Decoding.

commit 3fe3511d05127cc024b221040db2eeb352e7d716 (HEAD -> redo2)
Author: Simon Riggs <simon@2ndQuadrant.com>
Date:   Wed Apr 6 10:05:41 2016 +0100

    Generic Messages for Logical Decoding
    
    API and mechanism to allow generic messages to be inserted into WAL that are
    intended to be read by logical decoding plugins. This commit adds an optional
    new callback to the logical decoding API.
    ...

While these changes are complex, providing a comprehensive picture of what needs to be changed to introduce a new WAL REDO to PostgreSQL, sometimes you may want to experience a quick change using WAL to evaluate possible solutions. Here, I will use smgr_redo to walk through a quick change, and hopefully, it can help someone speed up the evaluation.

Here is the general picture of what I want to achieve using WAL REDO.

First, define a new XLOG type, a data structure to pass the data through an XLOG record, and declare a function to insert XLOG like below in src/include/catalog/storage_xlog.h

#define XLOG_SMGR_PAGE_NEW     0x40    /* record a page new xlog */

typedef struct xl_smgr_page_new
{
       RelFileNode     rnode;
       ForkNumber      forkNum;
       BlockNumber blockNum;
       /*
        * required parameters
        * */
} xl_smgr_page_new;

extern void log_smgr_page_new(const RelFileNode *rnode, ForkNumber forkNum, BlockNumber blockNum);

Second, implement the function to insert XLOG record and also add the logic (for example, print some message) in src/backend/catalog/storage.c, a simple diff like below:

diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index c06e414a38..3403e39ce9 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -1058,6 +1058,31 @@ smgr_redo(XLogReaderState *record)

                FreeFakeRelcacheEntry(rel);
        }
+       else if (info == XLOG_SMGR_PAGE_NEW)
+       {
+               xl_smgr_page_new *xlrec = (xl_smgr_page_new *) XLogRecGetData(record);
+
+               // logic to handle redo, for example standby only
+               ereport(WARNING, (errmsg("REDO PAGE_NEW: spcNode %u, dbNode %u, relNode %u, blocknum %u, forknum %d ",
+                               xlrec->rnode.spcNode, xlrec->rnode.dbNode, xlrec->rnode.relNode, xlrec->blockNum, xlrec->forkNum)));
+       }
        else
                elog(PANIC, "smgr_redo: unknown op code %u", info);
 }
+
+void log_smgr_page_new(const RelFileNode *rnode, ForkNumber forkNum, BlockNumber blockNum)
+{
+       xl_smgr_page_new xlrec;
+       XLogRecPtr      EndPos;
+
+       xlrec.rnode = *rnode;
+       xlrec.forkNum = forkNum;
+       xlrec.blockNum = blockNum;
+
+       XLogBeginInsert();
+       XLogRegisterData((char *) &xlrec, sizeof(xlrec));
+       EndPos = XLogInsert(RM_SMGR_ID, XLOG_SMGR_PAGE_NEW);
+       XLogFlush(EndPos);
+
+}

Third, add some corresponding description to smgrdesc.c to allow the user to see the details when using pg_waldump.

diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index 7547813254..74bae53524 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -40,6 +40,17 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
                                                 xlrec->blkno, xlrec->flags);
                pfree(path);
        }
+       else if (info == XLOG_SMGR_PAGE_NEW)
+       {
+               xl_smgr_page_new *xlrec = (xl_smgr_page_new *) rec;
+               char       *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+
+               appendStringInfo(buf, "%s to buftag: %u/%u/%u/%d/%u",
+                                                path, xlrec->rnode.spcNode, xlrec->rnode.dbNode,
+                                                xlrec->rnode.relNode, xlrec->forkNum, xlrec->blockNum);
+
+               pfree(path);
+       }
 }

 const char *
@@ -55,6 +66,9 @@ smgr_identify(uint8 info)
                case XLOG_SMGR_TRUNCATE:
                        id = "TRUNCATE";
                        break;
+               case XLOG_SMGR_PAGE_NEW:
+                       id = "PAGENEW";
+                       break;
        }

        return id;

Lastly, but not least, find somewhere to insert the XLOG record. As I use smgr_extend to trigger this XLOG record, I add one line below to the beginning of smgrextend function in smgr.c.

log_smgr_page_new(&reln->smgr_rnode.node, forknum, blocknum);

After recompiling and installing Postgres, if you create a table and insert some records, you should be able to see the new XLOG record dumped out using pg_waldump like below:

$ pg_waldump -p /tmp/pgdata/pg_wal   000000010000000000000003 |grep NEW
pg_waldump: error: error in WAL record at 0/3015E60: invalid record length at 0/3015E98: wanted 24, got 0
rmgr: Storage     len (rec/tot):     46/    46, tx:        738, lsn: 0/03015CC8, prev 0/03015C90, desc: PAGENEW base/5/16391 to buftag: 1663/5/16391/0/0

If you have a standby set up to replicate the XLOG record and perform the REDO, then you should see some messages like below on the standby side:

$ tail -f logfile-standby 
2023-11-17 14:58:20.223 PST [2461464] WARNING:  REDO PAGE_NEW: spcNode 1663, dbNode 5, relNode 16391, blocknum 0, forknum 0 
2023-11-17 14:58:20.223 PST [2461464] CONTEXT:  WAL redo at 0/3015CC8 for Storage/PAGENEW: base/5/16391 to buftag: 1663/5/16391/0/0

3. Summary

In this blog post, I provide a quick way to experience the WAL REDO process, and hopefully, it can help you better understand the WAL REDO concepts.