The last two posts introduced the TOASTing strategies and what TOAST tables look like. Have a look at those posts if you are not yet familiar with TOAST in PostgreSQL. In this post we’ll look a bit under the hood to understand how TOASTing really works.

We’ll use the same table as in the last post, which is still there but empty right now:

postgres=# \d t
                 Table "public.t"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | text    |           |          | 

postgres=# \d :reltoastrelid
TOAST table "pg_toast.pg_toast_16418"
   Column   |  Type   
------------+---------
 chunk_id   | oid
 chunk_seq  | integer
 chunk_data | bytea
Owning table: "public.t"
Indexes:
    "pg_toast_16418_index" PRIMARY KEY, btree (chunk_id, chunk_seq)

postgres=# select count(*) from t;
 count 
-------
     0
(1 row)
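In case the :reltoastrelid psql variable from the last post is no longer set in your session, one way to re-create it (assuming the table is public.t) is with \gset:

select reltoastrelid::regclass as reltoastrelid
  from pg_class
 where oid = 'public.t'::regclass \gset

This stores the name of the TOAST table in :reltoastrelid, which is what the queries below rely on.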

What happens if we insert a row like this:

postgres=# insert into t values (1, repeat('x',10000));
INSERT 0 1

The second column now contains a string with a length of 10000 characters. Did this trigger the TOASTer?

postgres=# select count(*) from :reltoastrelid;
 count 
-------
     0
(1 row)

It did not, and the reason is easy to explain: a string of 10,000 identical characters compresses extremely well, so the compressed value is small enough to be stored in line and there is no need to move it out to the TOAST table. This leads us to another question: which compression algorithm was used to compress the data? The default is this:

postgres=# show default_toast_compression;
 default_toast_compression 
---------------------------
 pglz
(1 row)
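If you want to verify that the string really stayed in the heap table in compressed form, pg_column_size() returns the stored size of a datum and, from PostgreSQL 14 on, pg_column_compression() reports the algorithm that was used (both are standard functions, the exact sizes will vary):

select pg_column_size(b)        as stored_size,   -- compressed size on disk, far below 10000
       octet_length(b)          as logical_size,  -- uncompressed length in bytes (10000 here)
       pg_column_compression(b) as compression    -- algorithm used, pglz in this case
  from t;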

Starting with PostgreSQL 14 you have an additional option:

postgres=# set default_toast_compression = lz4;
SET
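SET changes the value for the current session only. To make LZ4 the default for the whole instance you can persist the parameter, for example with ALTER SYSTEM (this typically requires superuser privileges; a reload is sufficient, no restart is needed):

alter system set default_toast_compression = 'lz4';
select pg_reload_conf();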

LZ4 “is focused on compression and decompression speed” and usually gives you better results than the built-in PGLZ algorithm. Support for LZ4 needs to be compiled in, otherwise it is not available. The packages provided by the PostgreSQL community have it enabled by default; if you compile from source you need to enable it yourself:

postgres@debian11pg:/home/postgres/postgresql/ [pg16] ./configure --help | grep lz4
  --with-lz4              build with LZ4 support
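If you are not sure whether the binaries you are running were built with LZ4 support, the configure options of an existing installation can be checked with pg_config (assuming it is in your PATH):

pg_config --configure | grep lz4

If nothing is returned, LZ4 support was not compiled in.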

You also have the flexibility to change the compression algorithm on a per-column basis:

postgres=# alter table t alter column b set compression lz4;
ALTER TABLE
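If you already know which method you want when you create a table, the compression can also be specified directly in the column definition (t2 is just a hypothetical example):

create table t2 ( a int, b text compression lz4 );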

As soon as you change the compression algorithm for a column, this is recorded in the attcompression column of pg_attribute ('p' stands for PGLZ, 'l' for LZ4, and it is empty if the default compression is used):

postgres=# select attcompression from pg_attribute where attrelid = 't'::regclass and attname = 'b';
 attcompression 
----------------
 l
(1 row)
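From PostgreSQL 14 on, psql displays the same information in a Compression column when you describe the table with \d+, so you do not necessarily need to query pg_attribute:

\d+ t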

The performance benefit of LZ4 compared to PGLZ is substantial, both for the time it takes to insert the data and for the resulting size of the table, as you can see in the little test below (the first insert uses LZ4, the second one PGLZ):

postgres=# \timing
Timing is on.
postgres=# truncate t;
TRUNCATE TABLE
Time: 18.493 ms
postgres=# insert into t select i, repeat('x',10000) from generate_series(1,1000000) i;
INSERT 0 1000000
Time: 3431.030 ms (00:03.431)
postgres=# select pg_size_pretty(pg_total_relation_size('t'));
 pg_size_pretty 
----------------
 89 MB
(1 row)
Time: 1.361 ms
postgres=# truncate t;
TRUNCATE TABLE
Time: 45.045 ms
postgres=# alter table t alter column b set compression pglz;
ALTER TABLE
Time: 13.442 ms
postgres=# insert into t select i, repeat('x',10000) from generate_series(1,1000000) i;
INSERT 0 1000000
Time: 51065.485 ms (00:51.065)
postgres=# select pg_size_pretty(pg_total_relation_size('t'));
 pg_size_pretty 
----------------
 160 MB
(1 row)
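If you want to double check which algorithm was actually applied to the stored values, pg_column_compression() can be aggregated over the whole table (after the last change above this should report pglz for all one million rows):

select pg_column_compression(b) as compression,
       count(*)
  from t
 group by pg_column_compression(b);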

How can we force the TOASTer to kick in? One way of doing that is to generate a string that is not easily compressible, like this:

postgres=# truncate t;
TRUNCATE TABLE
postgres=# with dummy_string as
( select string_agg (md5(random()::text),'') as dummy 
    from generate_series(1,5000) )
  insert into t 
  select 1
       , dummy_string.dummy 
    from dummy_string;
INSERT 0 1

This will give us 81 chunks in the corresponding TOAST table:

postgres=# select count(*) from :reltoastrelid;
 count 
-------
    81
(1 row)
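That number matches what we would expect: the generated string is 5000 * 32 = 160,000 characters long, and with a maximum chunk size of roughly 1996 bytes (TOAST_MAX_CHUNK_SIZE for the default 8 kB block size) this works out to ceil(160000 / 1996) = 81 chunks.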

… and the data looks like this:

postgres=# select * from :reltoastrelid limit 3;
 chunk_id | chunk_seq |                                                                                                      >
----------+-----------+------------------------------------------------------------------------------------------------------>
    16455 |         0 | \x313839623233643865653830316434323863336230333463633731313435333766396133393130356262383739343765373>
    16455 |         1 | \x313461633239646532633936613433306664636664303337346332343737626563336266373663346435386563633166663>
    16455 |         2 | \x633262373562616563393536636537383537323434633565363639396337646164653138623430366635633265383135643>
(3 rows)

We managed to trigger the TOASTer. What still needs to be answered is: at what size of a row is the TOASTer triggered? The answer is in “src/include/access/heaptoast.h”: if a new tuple is larger than roughly 2000 bytes (TOAST_TUPLE_THRESHOLD with the default 8 kB block size), the TOASTer kicks in and does its job.
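While the trigger threshold itself is a compile time constant, the target size the TOASTer tries to shrink a tuple to once it has been triggered can be adjusted per table with the toast_tuple_target storage parameter (valid values range from 128 bytes up to just under the block size), for example:

alter table t set (toast_tuple_target = 256);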

Some final words: you should never use “select *” against tables anyway, but especially not against tables which contain data that is moved out of line. If you do, you will always pull in the TOASTed data as well, and this might take some time. Only retrieve it if you really need it.