COPY command and bulk loading in PostgreSQL 12

Posted on 2019-06-05 by Hans-Jürgen Schönig

If you are relying heavily on the PostgreSQL COPY command to load data into PostgreSQL quickly, PostgreSQL 12 might offer a feature, which is most likely very beneficial to you. Bulkloading is an important operation and every improvement in this area is certainly going to help many people out there, who want to import data into PostgreSQL as fast as possible.

COPY: Loading and unloading data as fast as possible

When taking a closer look at the syntax of the COPY command in PostgreSQL 12 you will quickly see two things:

• \h will now point to the correct page in the documentation
• COPY now supports a WHERE condition

Here is the complete syntax overview:

db12=# \h COPY
Command:     COPY
Description: copy data between a file and a table
Syntax:
COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]
    [ WHERE condition ]

COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
    TO { 'filename' | PROGRAM 'command' | STDOUT }
    [ [ WITH ] ( option [, ...] ) ]

where option can be one of:

    FORMAT format_name
    FREEZE [ boolean ]
    DELIMITER 'delimiter_character'
    NULL 'null_string'
    HEADER [ boolean ]
    QUOTE 'quote_character'
    ESCAPE 'escape_character'
    FORCE_QUOTE { ( column_name [, ...] ) | * }
    FORCE_NOT_NULL ( column_name [, ...] )
    FORCE_NULL ( column_name [, ...] )
    ENCODING 'encoding_name'

URL: https://www.postgresql.org/docs/12/sql-copy.html

While having a link to the documentation around is certainly beneficial, the WHERE condition added to PostgreSQL 12 might even be more important. What is the purpose of this new feature? So far it was possible to completely import a file. However, in some cases this has been a problem: More often than not people only wanted to load a subset of data and had to write a ton of code to filter data before the import or once data has been written into the database already.

COPY … WHERE: Applying filters while importing data

In PostgreSQL data can be filtered while importing easily. The COPY become is pretty flexible and allows a lot of trickery. To show you, how the new WHERE clause works, I have compiled a simple example:

db12=# CREATE TABLE t_demo AS 
		SELECT * FROM generate_series(1, 1000) AS id;
SELECT 1000

First of all 1000 rows are generated to make sure that we got some data to play. Then we export the content of this table to a file:

db12=# COPY t_demo TO '/tmp/file.txt';
COPY 1000

Finally, we can try to import this data again:

db12=# CREATE TABLE t_import (x int);
CREATE TABLE
db12=# COPY t_import FROM '/tmp/file.txt' WHERE x < 5;
COPY 4
db12=# SELECT * FROM t_import;
 x 
---
 1
 2
 3
 4
(4 rows)

As you can see filtering data is pretty simple and very straight forward. One important thing to note here is: I exported an “id” column and imported it as “x”. Keep in mind that the text file does not know the data structure of our target table – you have to make sure that you filter on the column name of the table you want to import.

Old gems revisited …

If you are new to PostgreSQL in general I also want to present one of the older features, which I like a lot personally. COPY can send data to the UNIX pipe or read data from a pipe. Here is how it works:

db12=# COPY t_demo TO PROGRAM 'gzip -c > /tmp/file.txt.gz';
COPY 1000
db12=# COPY t_import FROM PROGRAM 'gunzip -c /tmp/file.txt.gz' 
	WHERE x BETWEEN 100 AND 103;
COPY 4
db12=# SELECT * FROM t_import WHERE x >= 100;
  x  
-----
 100
 101
 102
 103
(4 rows)

In some cases, you might want to do more than to just export data. In this case I decided to compress the data while exporting. Before the data is imported again it is uncompressed and again filtered. As you can see it is pretty simple to combine those features in a flexible way.

If you want to learn more about PostgreSQL and loading data in general, check out our post about rules and triggers. If you want to learn more about COPY, checkout the PostgreSQL documentation.