PostgreSQL: ltree vs. WITH RECURSIVE

Posted on 2020-06-23 by Hans-Jürgen Schönig

After my last post about ltree and recursive data in PostgreSQL people have asked me privately about performance issues. To share this information, I decided to come up with a follow up post to discuss this topic in a bit more detail. WITH RECURSIVE in PostgreSQL is efficient. However, ltree does have its strengths as well. Let us take a closer look …

Preparing sample data

In preparation for a test I have created a table which contains a balanced tree:

test=# CREATE TABLE tree AS
         WITH RECURSIVE tree (id, parent, lvl) AS 
         (
            SELECT generate_series(1, 5) AS id, 
                   NULL :: int4 AS parent, 
                   1 AS lvl 
            UNION ALL
            SELECT n, 
                   id, 
                   lvl + 1
            FROM tree, 
                 generate_series(power(5, lvl) :: int4 + (id - 1)*5 + 1,
                 power(5, lvl) :: int4 + (id -1)*5 + 5 ) g(n)
            WHERE lvl < 10
         ) 
        SELECT * FROM tree ORDER BY id;

There is a bit of magic involved here but let us not worry too much about the details at this point. Once the data is generated, you should see the following content in your table:

test=# SELECT * FROM tree LIMIT 20;
 id | parent | lvl
----+--------+-----
  1 |        |   1
  2 |        |   1
  3 |        |   1
  4 |        |   1
  5 |        |   1
  6 |      1 |   2
  7 |      1 |   2
  8 |      1 |   2
  9 |      1 |   2
 10 |      1 |   2
 11 |      2 |   2
 12 |      2 |   2
 13 |      2 |   2
 14 |      2 |   2
 15 |      2 |   2
 16 |      3 |   2
 17 |      3 |   2
 18 |      3 |   2
 19 |      3 |   2
 20 |      3 |   2
(20 rows)

test=# SELECT count(*) FROM tree;
  count   
----------
 12207030
(1 row)

12 million rows will be a nice way to show how efficient ltree can really be. As stated before, using a recursive query on this data might take a bit more time than running the query on prepared data. What you can do in the next listing is to resolve the tree and materialize an ltree column containing all the data. Make sure ltree is enabled in your database:

test=# CREATE EXTENSION ltree;
CREATE EXTENSION

Materializing tree data in PostgreSQL

Here is a materialized view containing the same data in a different format. The goal is to materialize the tree column to speed up searching:

test=# CREATE MATERIALIZED VIEW mat AS
         WITH RECURSIVE x AS 
         (
           SELECT  
             id::text, parent::text, 
             id::text::ltree AS mypath                                                                                                      
           FROM tree
           WHERE parent IS NULL
           UNION ALL
           SELECT  
             y.id::text, y.parent::text, 
             ltree_addtext(x.mypath, y.id::text) AS mypath
           FROM x, tree AS y                                                                                                                                           
           WHERE x.id::text = y.parent::text
         )
         SELECT * FROM x;

To access the materialized view efficiently, we have to create a Gist index on the ltree column:

test=# CREATE INDEX idx_mypath ON mat USING gist (mypath);
CREATE INDEX

Note that a btree is not what we are looking for here. A gist index is necessary. The reason is that a btree only supports sorting which is not quite what we need here in this case.

Observing superior performance gains

If your data is static materializing makes a lot of sense. In case your data changes frequently and in case you always need 100% correct results it is of course pretty counterproductive to pre-calculate data. However, let us take a look at the speed of the materialized case:

test=# explain analyze
	SELECT *
	FROM mat
	WHERE '4.21.129.770.4475.25499.143118.793712.4359182.23749035' @> mypath;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on mat  
	(cost=161.88..4775.02 rows=1221 width=112)
	(actual time=0.117..0.117 rows=1 loops=1)
   Recheck Cond: ('4.21.129.770.4475.25499.143118.793712.4359182.23749035'::ltree @> mypath)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on idx_mypath  (cost=0.00..161.57 rows=1221 width=0) (actual time=0.105..0.105 rows=1 loops=1)
         Index Cond: (mypath <@ '4.21.129.770.4475.25499.143118.793712.4359182.23749035'::ltree)
 Planning Time: 0.098 ms
 Execution Time: 0.270 ms
(7 rows)

The query executes in way less than a millisecond, which is exceptional.

Finally …

Performance is king and we have written a lot about PostgreSQL performance in the past. One of the key aspects is VACUUM. Read on to find out how to enable and disable AUTOVACUUM.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.