Waiting for PostgreSQL 13 – Allow vacuum command to process indexes in parallel.

On 20th of January 2020, Amit Kapila committed patch:

Allow vacuum command to process indexes in parallel.
 
This feature allows the vacuum to leverage multiple CPUs in order to
process indexes.  This enables us to perform index vacuuming and index
cleanup with background workers.  This adds a PARALLEL option to VACUUM
command where the user can specify the number of workers that can be used
to perform the command which is limited by the number of indexes on a
table.  Specifying zero as a number of workers will disable parallelism.
This option can't be used with the FULL option.
 
Each index is processed by at most one vacuum process.  Therefore parallel
vacuum can be used when the table has at least two indexes.
 
The parallel degree is either specified by the user or determined based on
the number of indexes that the table has, and further limited by
max_parallel_maintenance_workers.  The index can participate in parallel
vacuum iff it's size is greater than min_parallel_index_scan_size.
 
Author: Masahiko Sawada and Amit Kapila
 
Mahendra Singh and Sergei Kornilov
 
Discussion:
https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com

Description is pretty long, so let's see how it really works.
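At the syntax level, the new option looks like this (a sketch on a hypothetical table):

=$ VACUUM (PARALLEL 4) some_table;
 
=$ VACUUM (PARALLEL 0) some_table;

The first form requests up to 4 workers for index vacuuming; the second disables parallelism entirely. If PARALLEL is not given at all, the number of workers is picked automatically based on the number of indexes, capped by max_parallel_maintenance_workers.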

First, I will need a sample table with several indexes:

=$ CREATE TABLE test (
    id                INT4 GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    some_int          INT4,
    some_timestamp    TIMESTAMPTZ,
    other_int         INT4,
    other_timestamp   TIMESTAMPTZ
);
CREATE TABLE
 
=$ INSERT INTO test (some_int, some_timestamp, other_int, other_timestamp)
SELECT
    random() * 500000000,
    '2000-01-01'::DATE + '20 years'::INTERVAL * random(),
    random() * 500000000,
    '1970-01-01'::DATE + '20 years'::INTERVAL * random()
FROM
    generate_series(1,100000000) i;
INSERT 0 100000000

OK, so we have some rows, which look like:

=$ SELECT * FROM test LIMIT 10;
 id | some_int  |       some_timestamp        | other_int |       other_timestamp
----+-----------+-----------------------------+-----------+-----------------------------
  1 | 179275930 | 2010-01-28 23:02:57.0048+01 |  31223069 | 1975-10-21 03:30:13.536+01
  2 | 119009254 | 2019-05-01 14:40:19.5168+02 | 390536066 | 1974-02-18 16:12:31.7952+01
  3 | 153965899 | 2010-04-26 00:36:46.3968+02 | 109395281 | 1985-10-09 01:30:36.0288+01
  4 | 123106154 | 2006-06-28 18:59:21.0624+02 | 399537003 | 1982-12-08 21:13:32.5056+01
  5 | 338157258 | 2006-11-04 07:21:34.7328+01 | 487378393 | 1975-02-14 05:59:28.7232+01
  6 | 108837322 | 2006-10-10 04:07:35.2704+02 |  53539283 | 1987-05-17 00:59:49.5744+02
  7 | 434671405 | 2011-04-09 00:21:43.4304+02 | 374841058 | 1980-05-13 17:12:37.1808+02
  8 | 407587896 | 2013-08-02 15:26:41.3376+02 | 180180561 | 1985-01-06 04:22:03.6768+01
  9 | 450852732 | 2008-10-27 18:14:00.4992+01 |  81128068 | 1975-06-26 01:55:29.8848+01
 10 | 306987401 | 2013-12-17 08:39:19.1232+01 |  28668776 | 1970-08-01 15:43:29.0208+01
(10 ROWS)

Now, let's add some indexes:

=$ CREATE INDEX i1 ON test (some_int);
CREATE INDEX
 
=$ CREATE INDEX i2 ON test (some_timestamp);
CREATE INDEX
 
=$ CREATE INDEX i3 ON test (other_int);
CREATE INDEX
 
=$ CREATE INDEX i4 ON test (other_timestamp);
CREATE INDEX
 
=$ CREATE INDEX i5 ON test (some_int, some_timestamp);
CREATE INDEX
 
=$ CREATE INDEX i6 ON test (other_int, other_timestamp);
CREATE INDEX

So, I have 7 indexes (the six I just created, plus the primary key), and the table itself:

=$ SELECT c.relname, c.relkind, pg_size_pretty(pg_relation_size(c.oid))
    FROM pg_class c
    WHERE c.relname = 'test' OR
        c.oid IN ( SELECT i.indexrelid FROM pg_index i WHERE i.indrelid = 'test'::regclass );
  relname  | relkind | pg_size_pretty
-----------+---------+----------------
 test      | r       | 5744 MB
 test_pkey | i       | 2142 MB
 i1        | i       | 2142 MB
 i2        | i       | 2142 MB
 i3        | i       | 2142 MB
 i4        | i       | 2142 MB
 i5        | i       | 3004 MB
 i6        | i       | 3004 MB
(8 ROWS)
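As the commit message explains, an index is eligible for parallel vacuum only if it is larger than min_parallel_index_scan_size. All of my indexes are gigabytes in size, so every one of them qualifies. Assuming default configuration, the threshold can be checked with:

=$ SHOW min_parallel_index_scan_size;
 min_parallel_index_scan_size
------------------------------
 512kB
(1 ROW)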

With this in place, and with autovacuum disabled, I ran:

=$ DELETE FROM test WHERE random() < 0.5;

So that the indexes (and the table itself) would have some cleanup to do.
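To verify that there really is something to clean up, one can peek at the statistics view before vacuuming (I'm not showing output here, since the exact numbers depend on when the stats collector catches up):

=$ SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'test';

After a DELETE of roughly half the rows, n_dead_tup should be somewhere around 50 million.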

Then I ran vacuum without any parallelization:

=$ vacuum (verbose ON, analyze ON, parallel 0) test;

Afterwards I recreated the whole thing, and ran:

=$ SET max_parallel_maintenance_workers = 2;
=$ vacuum (verbose ON, analyze ON) test;

and finally, after another recreate:

=$ SET max_parallel_maintenance_workers = 8;
=$ vacuum (verbose ON, analyze ON) test;

Logs showed, for sequential vacuum:

=$ vacuum (verbose ON, analyze ON, parallel 0) test;
psql:test1.sql:2: INFO:  vacuuming "public.test"
...
CPU: USER: 663.45 s, system: 87.05 s, elapsed: 1505.04 s.
psql:test1.sql:2: INFO:  analyzing "public.test"
psql:test1.sql:2: INFO:  "test": scanned 30000 OF 735295 pages, containing 2040581 live ROWS AND 0 dead ROWS; 30000 ROWS IN sample, 50014300 estimated total ROWS
VACUUM
TIME: 1505238.738 ms (25:05.239)

for vacuum with 2 workers:

=$ SET max_parallel_maintenance_workers = 2;
SET
 
=$ vacuum (verbose ON, analyze ON) test;
psql:test2.sql:3: INFO:  vacuuming "public.test"
psql:test2.sql:3: INFO:  launched 2 parallel vacuum workers FOR INDEX vacuuming (planned: 2)
...
CPU: USER: 119.29 s, system: 43.63 s, elapsed: 694.13 s.
psql:test2.sql:3: INFO:  analyzing "public.test"
psql:test2.sql:3: INFO:  "test": scanned 30000 OF 735295 pages, containing 2039828 live ROWS AND 0 dead ROWS; 30000 ROWS IN sample, 49995844 estimated total ROWS
VACUUM
TIME: 694336.035 ms (11:34.336)

and when I picked 8 workers:

=$ SET max_parallel_maintenance_workers = 8;
SET
 
=$ vacuum (verbose ON, analyze ON) test;
psql:test3.sql:3: INFO:  vacuuming "public.test"
psql:test3.sql:3: INFO:  launched 6 parallel vacuum workers FOR INDEX vacuuming (planned: 6)
CPU: USER: 134.24 s, system: 51.37 s, elapsed: 776.12 s.
psql:test3.sql:3: INFO:  analyzing "public.test"
psql:test3.sql:3: INFO:  "test": scanned 30000 OF 735295 pages, containing 2040985 live ROWS AND 0 dead ROWS; 30000 ROWS IN sample, 50024202 estimated total ROWS
VACUUM
TIME: 776326.118 ms (12:56.326)

It looks like it would need a bit more tuning to benefit from more workers, but in my test case I have only 7 indexes, so there is not going to be a big gain. Note that even with max_parallel_maintenance_workers = 8, only 6 workers were launched – presumably because the leader process vacuums one of the indexes itself.

Also – speed seemed to be best with 2 workers, but this could be related to some side load on the server. Either way – vacuum is much faster when handling indexes in parallel.

Great stuff, thanks a lot to all involved.