A Look at Postgres 15: MERGE Command with Examples

Jean-Paul Argudo

4 min read

With PostgreSQL 15 comes a new SQL Command called MERGE. MERGE has been in the SQL standard for quite a while, but it just made its way into the PostgreSQL codebase. Prior to MERGE, if you wanted to refresh a target table from a source table, prior to Postgres 15, you could use the "upsert" method with the ON CONFLICTclause.

Now, MERGE can be used instead! Some situations where MERGE makes a lot of sense are:

  • data loading from external sources, thru foreign data wrappers
  • staged and batched process jobs

About MERGE

Let's look at the synopsis in the documentation:

[[ WITH with_query [, ...] ]
MERGE INTO target_table_name [ [ AS ] target_alias ]
USING data_source ON join_condition
when_clause [...]

where data_source is:

{ source_table_name | ( source_query ) } [ [ AS ] source_alias ]

and when_clause is:

{ WHEN MATCHED [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
  WHEN NOT MATCHED [ AND condition ] THEN { merge_insert | DO NOTHING } }

and merge_insert is:

INSERT [( column_name [, ...] )]
[ OVERRIDING { SYSTEM | USER } VALUE ]
{ VALUES ( { expression | DEFAULT } [, ...] ) | DEFAULT VALUES }

and merge_update is:

UPDATE SET { column_name = { expression | DEFAULT } |
             ( column_name [, ...] ) = ( { expression | DEFAULT } [, ...] ) } [, ...]

and merge_delete is:

DELETE

We can read that the source of the data_source can be either a table or some data brought to the command thanks to the prior WITH with_query, or inside the MERGE after USING source_query. When the data matches, i.e. there's a matching record in the target_table_name, we can specify what to do:

  • do an UPDATE as defined in the merge_update (see UPDATE SET...) part or
  • do a DELETE as defined in the merge_delete or
  • do nothing

When the data doesn't matches, i.e. there's no matching record in the target_table_name, then we do an INSERT (see merge_update).

MERGE example - remote sensors & batches

In working with clients, I have seen a need for data loading with remote sensors or stations. Anything “sending regularly data” and loading that data into a dataset is a common need. I’m particularly excited about using MERGE to solve these issues for Postgres users.

In order to explain this data loading use case of MERGE, I’m going to set up an example. We’ll have a database of stations that’s a remote measurement tool with data coming in intermittently.

  • Streaming data, data arriving continuously
  • Stations emit periodic and intermittent data measures
  • There's a batch collecting this data
  • station_data_new is a temp table for common storage
  • Data is stored long term station_data_actual with the last possible values
  • We want to keep track of when the station has been created

Let's create some tables for testing purpose. Beware that the temporary table station_data_new will only exist in the context of a given session.

create temporary table station_data_new (
    station_id   integer
  , a            integer
  , b            integer
);

create table station_data_actual (
    station_id   integer   primary key
  , a integer
  , b integer
  , created      timestamp default current_timestamp
  , updated      timestamp default current_timestamp
);

Let's create some sample data into station_data_new: our 1st 5 stations are up and sent the 1st batch of data:

with measures as (
  select *
  from generate_series(1,5)
)
insert into station_data_new (
    station_id
  , a
  , b
)
select
    generate_series
  , round(random()*100+1)
  , round(random()*100+1)
from
  measures;

Basic MERGE

At this point, we could just do a plain INSERT with SELECT from station_data_new to station_data_actual. Instead we’ll use MERGE because we’re planning for cases when the data already exists, an UPDATE will be issued and not an INSERT.

merge into station_data_actual sda
using station_data_new sdn
on sda.station_id = sdn.station_id
when matched then
  update set a = sdn.a, b = sdn.b, updated = default
when not matched then
  insert (station_id, a, b)
  values (sdn.station_id, sdn.a, sdn.b);

If you execute it once, you'll have in return MERGE 5, and the data is inserted into station_data_actual, where the timestamps in created and updated have the same value.

To do even more testing you could:

  • truncate table station_data_new;
  • Re-create data in it, by changing generate_series(1,5) with generate_series(1,10)
  • Issue again the very same MERGE as we ran before

You'll see the data in station_data_actual updated. Stations 1 to 5 will have updated status and stations 6 to 10 will be created and updated.

Conclusion

MERGE was a long awaited feature for PostgreSQL fans! Now we have it. I encourage you to review your processes when it's about merging existing data in your database. In this example, new data entering with existing data can be a good place to use this.

MERGE opens new usages of aggregating and/or merging data from many databases.  In distributed models, where all data exists across different locations, using foreign data wrappers with MERGE could be a really elegant solution. MERGE may simplify your processing of data all along the database's life. With strong and fast SQL statements as opposed to functions or other complex operations.

Avatar for Jean-Paul Argudo

Written by

Jean-Paul Argudo

October 14, 2022 More by this author