Good database maintenance includes not only the standard adding, updating, and deleting of records but periodic edits to the table schema as well. Operations such as adding, editing, and removing table columns are part of today’s life-cycle reality, as new functionality is constantly being added.

In quieter, less demanding times, one could get away with schema updates with minimal impact by performing the operation during periods of low load. However, as database systems have become more critical to the business bottom line, maintenance windows have become, by necessity, much smaller and more time-sensitive.

So the question is asked: How can one update a multi-terabyte table with near-zero downtime?

Looking deep into our Postgres bag of tricks, we reach not for the most recent and advanced features but instead leverage a very old capability that has been part of Postgres since it was first released as an open source project: its object-relational implementation of inheritance.

Use Case

Before going into the details of the solution, let’s define just what we are trying to solve.

Strictly speaking, there are two use cases that come to mind when using inheritance as the prime ETL data migration mechanism:

  • Removing table OIDs, such as when moving to Postgres version 12 or greater.
  • Performing DML/DDL operations which include:
    • updating the data
    • adding or removing table columns

Life starts getting complicated when dealing with issues such as updating data types. But we’ll talk about this at the end of the blog.

About Inheritance

Unlike object-oriented programming languages, where the child inherits attributes from the parent, in Postgres it’s the parent that, in a sense, inherits from the child (if only this were true in real life ;-)). While a child table inherits its column definitions from the parent, the data flows the other way: rows inserted into a child have the potential of becoming visible in the parent relation.

Consider the following snippet of code: Two parents and three children are created and populated with records:
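The original snippet isn’t reproduced here, but a minimal sketch along these lines, using the hypothetical names parent_1, parent_2, and child_1 through child_3, behaves the same way:

    -- two parents sharing the column c2, which is merged in any child
    -- inheriting from both
    CREATE TABLE parent_1 (c1 int, c2 text);
    CREATE TABLE parent_2 (c2 text, c3 date);

    -- three children; child_2 and child_3 declare extra columns of their own
    CREATE TABLE child_1 ()       INHERITS (parent_1);
    CREATE TABLE child_2 (c4 int) INHERITS (parent_1, parent_2);
    CREATE TABLE child_3 (c5 int) INHERITS (parent_2);

    INSERT INTO child_1 (c1, c2)         VALUES (1, 'one');
    INSERT INTO child_2 (c1, c2, c3, c4) VALUES (2, 'two', CURRENT_DATE, 4);
    INSERT INTO child_3 (c2, c3, c5)     VALUES ('three', CURRENT_DATE, 5);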

Columns declared in the parents must therefore exist in the child, i.e., they are merged. But notice that columns unique to a child are not propagated to the parents:
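A quick check against the sketch above confirms both behaviors:

    -- c2 was declared in both parents but appears only once in child_2
    \d child_2

    -- a column unique to child_2 is invisible through the parent
    SELECT c4 FROM parent_1;
    -- ERROR:  column "c4" does not exist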


And even though records populated in the child can be seen by the parent, the reverse is NOT true: records populated into the parent are not seen by the child:
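Continuing with the same hypothetical tables:

    -- rows inserted into the children are visible through the parent
    SELECT * FROM parent_1;        -- returns the rows from child_1 and child_2

    -- but a row inserted directly into the parent stays there
    INSERT INTO parent_1 (c1, c2) VALUES (0, 'zero');
    SELECT * FROM child_1;         -- still returns only (1, 'one')
    SELECT * FROM ONLY parent_1;   -- returns just (0, 'zero')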

Developing The ETL/Migration Model

Performing data migration under production conditions should take into consideration these four (4) distinct query operations:

  • SELECT from both the target and source tables at the same time.
  • UPDATE and/or DELETE records from both the target and source tables.
  • INSERT new records into the target table.
  • Moving data from the source to the target tables.

For the sake of discussion we’ll demonstrate using one source table and one target table, each inheriting from a single parent:
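Something along these lines, with the hypothetical names t, t_source, and t_target:

    -- the parent keeps the name the application already queries
    CREATE TABLE t (c1 int, c2 text);

    -- the source holds the existing data; the target starts out empty
    CREATE TABLE t_source () INHERITS (t);
    CREATE TABLE t_target () INHERITS (t);

    -- seed the source with some records
    INSERT INTO t_source
        SELECT id, 'row ' || id FROM generate_series(1, 10000) AS id;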

Querying Both Target And Source Tables

Inheritance makes the SELECT query a straightforward operation. This query checks all the tables for the queried record(s):
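For example, using the tables above:

    -- one query transparently covers the parent and both children
    SELECT * FROM t WHERE c1 = 42;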

UPDATE and/or DELETE Records

Similarly to SELECT queries, one doesn’t have to worry about editing the existing application’s DML operation(s) when performing UPDATE and DELETE. Again, notice how both source and target tables are queried as well as the parent:
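Again, hypothetical but representative:

    -- both statements reach rows wherever they currently live
    UPDATE t SET c2 = 'edited' WHERE c1 = 42;
    DELETE FROM t WHERE c1 = 13;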

INSERT New Records Into The Target Table

The thing to keep in mind about INSERT is that without a redirect mechanism all records are inserted into the parent.

Since everybody already knows about triggers, I thought it would be fun to use a REWRITE RULE instead:
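A minimal rule, assuming the tables sketched above, might look like this:

    -- redirect any INSERT aimed at the parent into the target child
    CREATE RULE t_redirect AS
        ON INSERT TO t
        DO INSTEAD
            INSERT INTO t_target VALUES (NEW.*);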

And here’s our validation; notice how the INSERT is redirected from parent to target:
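Using the rule sketched above:

    INSERT INTO t (c1, c2) VALUES (99999, 'redirected');

    SELECT * FROM ONLY t;     -- empty: nothing lands in the parent itself
    SELECT * FROM t_target;   -- (99999, 'redirected')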

Moving Records Between Source And Target Tables

It’s time to introduce the last mechanism needed to perform the actual data migration. Essentially, the data is moved in batches; if you can instead afford the downtime of moving your records in one very large transaction, this dog and pony show is a bit redundant.

In the real world, we need to anticipate multiple processes attempting EXCLUSIVE LOCKS simultaneously. In the case that one or more records are locked by another operation, the following example demonstrates how one can simply skip over them:
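A sketch of the batch move, again with the hypothetical tables above; FOR UPDATE SKIP LOCKED is what lets concurrently locked rows be passed over:

    -- claim up to 1,000 unlocked rows, delete them from the source,
    -- and insert them into the target in a single transaction
    WITH batch AS (
        SELECT c1
        FROM t_source
        LIMIT 1000
        FOR UPDATE SKIP LOCKED
    ), moved AS (
        DELETE FROM t_source s
        USING batch
        WHERE s.c1 = batch.c1
        RETURNING s.*
    )
    INSERT INTO t_target
    SELECT * FROM moved;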

Putting It All Together

It’s time to demonstrate a Proof Of Concept using pgbench.

SETUP

Initialize database db02:
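Assuming a scratch instance; the scale factor here is arbitrary:

    createdb db02
    pgbench -i -s 10 db02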

Create the parent and child tables, as sketched after the note below.

NOTE: In order to demonstrate data migration from a deprecated table, table pgbench_accounts is altered by adding OIDs.
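The original DDL isn’t reproduced here; one plausible arrangement, on a pre-Postgres-12 server (where WITH OIDS is still legal), is to demote pgbench_accounts to a child and put an empty parent in its place:

    -- the deprecated source table gets OIDs
    ALTER TABLE pgbench_accounts SET WITH OIDS;

    -- the old table steps aside and becomes the source child
    ALTER TABLE pgbench_accounts RENAME TO pgbench_accounts_source;
    CREATE TABLE pgbench_accounts (LIKE pgbench_accounts_source);
    ALTER TABLE pgbench_accounts_source INHERIT pgbench_accounts;

    -- the OID-free target child
    CREATE TABLE pgbench_accounts_target () INHERITS (pgbench_accounts);

    -- redirect new INSERTs away from the parent
    CREATE RULE pgbench_accounts_redirect AS
        ON INSERT TO pgbench_accounts
        DO INSTEAD
            INSERT INTO pgbench_accounts_target VALUES (NEW.*);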

TEST

This query is at the heart of the solution. Any exclusively locked record that it tries to move is automatically skipped, and a new attempt can be made the next time the script is invoked.
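Adapted to the POC tables, and assuming the names from the setup sketch:

    -- move up to 1,000 unlocked rows from source to target
    WITH batch AS (
        SELECT aid
        FROM pgbench_accounts_source
        LIMIT 1000
        FOR UPDATE SKIP LOCKED
    ), moved AS (
        DELETE FROM pgbench_accounts_source s
        USING batch
        WHERE s.aid = batch.aid
        RETURNING s.aid, s.bid, s.abalance, s.filler
    )
    INSERT INTO pgbench_accounts_target
    SELECT * FROM moved;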

VALIDATE
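A quick sanity check, again with the sketched names: the parent should stay empty, the source should shrink, and the target should grow.

    SELECT (SELECT count(*) FROM ONLY pgbench_accounts)        AS parent,
           (SELECT count(*) FROM ONLY pgbench_accounts_source) AS source,
           (SELECT count(*) FROM ONLY pgbench_accounts_target) AS target;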

MIGRATION SCRIPT

The query has been incorporated into a script that moves 1,000 records every 5 seconds.
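The script itself isn’t reproduced here; a minimal psql equivalent appends \watch to the batch-move query, which re-executes it on a fixed interval until cancelled:

    -- migrate.sql: the batch-move query from the TEST section, followed by:
    \watch 5

    -- invoked as:  psql db02 -f migrate.sql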

BENCHMARKING

All the while the aforementioned script is active, pgbench runs the benchmark.
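Something like the following, where the client count and duration are arbitrary:

    pgbench -c 4 -j 2 -T 600 db02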

CAVEAT

This is a simple and powerful method, but there is a limitation: columns common to the source and target tables must be of the same datatype.

For example, if column c1 in table source is of datatype int and you want to migrate the data into table target, where the same column c1 is of datatype bigint, then this method won’t work. An alternate solution, however, could take advantage of Updatable Views combined with the appropriate triggers and rewrite rules.
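A very rough sketch of that alternative, with hypothetical names and an INSTEAD OF trigger standing in for whatever DML handling a real migration would need:

    CREATE TABLE source (c1 int,    c2 text);
    CREATE TABLE target (c1 bigint, c2 text);

    -- the view presents one relation, casting the narrower column on the fly
    CREATE VIEW v AS
        SELECT c1::bigint AS c1, c2 FROM source
        UNION ALL
        SELECT c1, c2 FROM target;

    -- a UNION ALL view is not auto-updatable, so new rows are steered
    -- into the target table by an INSTEAD OF trigger
    CREATE FUNCTION v_insert() RETURNS trigger AS $$
    BEGIN
        INSERT INTO target VALUES (NEW.c1, NEW.c2);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER v_insert INSTEAD OF INSERT ON v
        FOR EACH ROW EXECUTE FUNCTION v_insert();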
