Waiting for PostgreSQL 16 – Add pg_stat_io view, providing more detailed IO statistics

On 11st of February 2023, Andres Freund committed patch:

Add pg_stat_io view, providing more detailed IO statistics
 
Builds on 28e626bde00 and f30d62c2fc6. See the former for motivation.
 
Rows of the view show IO operations for a particular backend type, IO target
object, IO context combination (e.g. a client backend's operations on
permanent relations in shared buffers) and each column in the view is the
total number of IO Operations done (e.g. writes). So a cell in the view would
be, for example, the number of blocks of relation data written from shared
buffers by client backends since the last stats reset.
 
In anticipation of tracking WAL IO and non-block-oriented IO (such as
temporary file IO), the "op_bytes" column specifies the unit of the "reads",
"writes", and "extends" columns for a given row.
 
Rows for combinations of IO operation, backend type, target object and context
that never occur, are ommitted entirely. For example, checkpointer will never
operate on temporary relations.
 
Similarly, if an IO operation never occurs for such a combination, the IO
operation's cell will be null, to distinguish from 0 observed IO
operations. For example, bgwriter should not perform reads.
 
Note that some of the cells in the view are redundant with fields in
pg_stat_bgwriter (e.g. buffers_backend). For now, these have been kept for
backwards compatibility.
 
Bumps catversion.
 
Author: Melanie Plageman <melanieplageman@gmail.com>
Author: Samay Sharma <smilingsamay@gmail.com>
Reviewed-by: Maciek Sakrejda <m.sakrejda@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200124195226.lth52iydq2n2uilq@alap3.anarazel.de

I missed this post when it happened, but thanks to post by Lukas Fittl, I got 2nd chance to write about it.

Lukas described in depth certain aspects of the stats. Me, I prefer the everyday dba use.

Let's see how it looks.

When I just make new database, pg_stat_io looks like this:

$ SELECT * FROM pg_stat_io ;
    backend_type     │   io_object   │ io_context │  reads  │ writes  │ extends │ op_bytes │ evictions │ reuses  │ fsyncs │          stats_reset
─────────────────────┼───────────────┼────────────┼─────────┼─────────┼─────────┼──────────┼───────────┼─────────┼────────┼───────────────────────────────
 autovacuum launcher │ relation      │ bulkread   │       00[NULL]819200[NULL]2023-02-27 13:25:39.725072+01
 autovacuum launcher │ relation      │ normal     │       31[NULL]81922[NULL]02023-02-27 13:25:39.725072+01
 autovacuum worker   │ relation      │ bulkread   │       00[NULL]819200[NULL]2023-02-27 13:25:39.725072+01
 autovacuum worker   │ relation      │ normal     │  1852196565538192169395[NULL]02023-02-27 13:25:39.725072+01
 autovacuum worker   │ relation      │ vacuum     │ 13042871127571081923961303766[NULL]2023-02-27 13:25:39.725072+01
 client backend      │ relation      │ bulkread   │  273190270239[NULL]81921615270692[NULL]2023-02-27 13:25:39.725072+01
 client backend      │ relation      │ bulkwrite  │    70916650366884388192134024555022[NULL]2023-02-27 13:25:39.725072+01
 client backend      │ relation      │ normal     │  65737149272346446981921112974[NULL]02023-02-27 13:25:39.725072+01
 client backend      │ relation      │ vacuum     │       000819200[NULL]2023-02-27 13:25:39.725072+01
 client backend      │ temp relation │ normal     │       00081920[NULL][NULL]2023-02-27 13:25:39.725072+01
 background worker   │ relation      │ bulkread   │  268221268189[NULL]81920268189[NULL]2023-02-27 13:25:39.725072+01
 background worker   │ relation      │ bulkwrite  │       000819200[NULL]2023-02-27 13:25:39.725072+01
 background worker   │ relation      │ normal     │  2536134641908192253613[NULL]02023-02-27 13:25:39.725072+01
 background worker   │ relation      │ vacuum     │       000819200[NULL]2023-02-27 13:25:39.725072+01
 background worker   │ temp relation │ normal     │       00081920[NULL][NULL]2023-02-27 13:25:39.725072+01
 background writer   │ relation      │ normal     │  [NULL]38628[NULL]8192[NULL][NULL]02023-02-27 13:25:39.725072+01
 checkpointer        │ relation      │ normal     │  [NULL]32121[NULL]8192[NULL][NULL]33562023-02-27 13:25:39.725072+01
 standalone backend  │ relation      │ bulkread   │       00[NULL]819200[NULL]2023-02-27 13:25:39.725072+01
 standalone backend  │ relation      │ bulkwrite  │       008819200[NULL]2023-02-27 13:25:39.725072+01
 standalone backend  │ relation      │ normal     │     68997746481920[NULL]02023-02-27 13:25:39.725072+01
 standalone backend  │ relation      │ vacuum     │      1000819200[NULL]2023-02-27 13:25:39.725072+01
 startup             │ relation      │ bulkread   │       00[NULL]819200[NULL]2023-02-27 13:25:39.725072+01
 startup             │ relation      │ bulkwrite  │       000819200[NULL]2023-02-27 13:25:39.725072+01
 startup             │ relation      │ normal     │       00081920[NULL]02023-02-27 13:25:39.725072+01
 startup             │ relation      │ vacuum     │       000819200[NULL]2023-02-27 13:25:39.725072+01
 walsender           │ relation      │ bulkread   │       00[NULL]819200[NULL]2023-02-27 13:25:39.725072+01
 walsender           │ relation      │ bulkwrite  │       000819200[NULL]2023-02-27 13:25:39.725072+01
 walsender           │ relation      │ normal     │       00081920[NULL]02023-02-27 13:25:39.725072+01
 walsender           │ relation      │ vacuum     │       000819200[NULL]2023-02-27 13:25:39.725072+01
 walsender           │ temp relation │ normal     │       00081920[NULL][NULL]2023-02-27 13:25:39.725072+01
(30 ROWS)

This is strangly much. SO much read/writes? Well, the stats in here are global. So it doesn't matter what DB we're checking, we can see stats on IO. That is interesting.

Let's create pgbench database, not big:

$ pgbench -s 1000 -i bench
dropping old tables...
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
100000000 of 100000000 tuples (100%) done (elapsed 67.19 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 87.74 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 67.86 s, vacuum 0.20 s, primary keys 19.68 s).

Immediately afterwards pg_stat_io looks very different:

    backend_type     |   io_object   | io_context |  reads  | writes  | extends | op_bytes | evictions | reuses  | fsyncs |          stats_reset
---------------------+---------------+------------+---------+---------+---------+----------+-----------+---------+--------+-------------------------------
 autovacuum launcher | relation      | bulkread   |       0 |       0 |         |     8192 |         0 |       0 |        | 2023-02-27 13:25:39.725072+01
 autovacuum launcher | relation      | normal     |       3 |       1 |         |     8192 |         2 |         |      0 | 2023-02-27 13:25:39.725072+01
 autovacuum worker   | relation      | bulkread   |       0 |       0 |         |     8192 |         0 |       0 |        | 2023-02-27 13:25:39.725072+01
 autovacuum worker   | relation      | normal     |  185219 |     656 |     553 |     8192 |    169395 |         |      0 | 2023-02-27 13:25:39.725072+01
 autovacuum worker   | relation      | vacuum     | 1304287 | 1127571 |       0 |     8192 |       396 | 1303766 |        | 2023-02-27 13:25:39.725072+01
 client backend      | relation      | bulkread   | 1086086 |  270239 |         |     8192 |      1647 | 1083556 |        | 2023-02-27 13:25:39.725072+01
 client backend      | relation      | bulkwrite  |    7091 | 2302311 | 2327783 |     8192 |    136076 | 2192315 |        | 2023-02-27 13:25:39.725072+01
 client backend      | relation      | normal     |  657835 |  492723 |  464530 |     8192 |   1113499 |         |      0 | 2023-02-27 13:25:39.725072+01
 client backend      | relation      | vacuum     |   29978 |       0 |       0 |     8192 |        32 |   29946 |        | 2023-02-27 13:25:39.725072+01
 client backend      | temp relation | normal     |       0 |       0 |       0 |     8192 |         0 |         |        | 2023-02-27 13:25:39.725072+01
 background worker   | relation      | bulkread   | 1092586 |  268189 |         |     8192 |        32 | 1092522 |        | 2023-02-27 13:25:39.725072+01
 background worker   | relation      | bulkwrite  |       0 |       0 |       0 |     8192 |         0 |       0 |        | 2023-02-27 13:25:39.725072+01
 background worker   | relation      | normal     |  253613 |   46419 |       0 |     8192 |    253613 |         |      0 | 2023-02-27 13:25:39.725072+01
 background worker   | relation      | vacuum     |       0 |       0 |       0 |     8192 |         0 |       0 |        | 2023-02-27 13:25:39.725072+01
 background worker   | temp relation | normal     |       0 |       0 |       0 |     8192 |         0 |         |        | 2023-02-27 13:25:39.725072+01
 background writer   | relation      | normal     |         |   38628 |         |     8192 |           |         |      0 | 2023-02-27 13:25:39.725072+01
 checkpointer        | relation      | normal     |         |   35418 |         |     8192 |           |         |   3713 | 2023-02-27 13:25:39.725072+01
 standalone backend  | relation      | bulkread   |       0 |       0 |         |     8192 |         0 |       0 |        | 2023-02-27 13:25:39.725072+01
 standalone backend  | relation      | bulkwrite  |       0 |       0 |       8 |     8192 |         0 |       0 |        | 2023-02-27 13:25:39.725072+01
 standalone backend  | relation      | normal     |     689 |     977 |     464 |     8192 |         0 |         |      0 | 2023-02-27 13:25:39.725072+01
 standalone backend  | relation      | vacuum     |      10 |       0 |       0 |     8192 |         0 |       0 |        | 2023-02-27 13:25:39.725072+01
 startup             | relation      | bulkread   |       0 |       0 |         |     8192 |         0 |       0 |        | 2023-02-27 13:25:39.725072+01
 startup             | relation      | bulkwrite  |       0 |       0 |       0 |     8192 |         0 |       0 |        | 2023-02-27 13:25:39.725072+01
 startup             | relation      | normal     |       0 |       0 |       0 |     8192 |         0 |         |      0 | 2023-02-27 13:25:39.725072+01
 startup             | relation      | vacuum     |       0 |       0 |       0 |     8192 |         0 |       0 |        | 2023-02-27 13:25:39.725072+01
 walsender           | relation      | bulkread   |       0 |       0 |         |     8192 |         0 |       0 |        | 2023-02-27 13:25:39.725072+01
 walsender           | relation      | bulkwrite  |       0 |       0 |       0 |     8192 |         0 |       0 |        | 2023-02-27 13:25:39.725072+01
 walsender           | relation      | normal     |       0 |       0 |       0 |     8192 |         0 |         |      0 | 2023-02-27 13:25:39.725072+01
 walsender           | relation      | vacuum     |       0 |       0 |       0 |     8192 |         0 |       0 |        | 2023-02-27 13:25:39.725072+01
 walsender           | temp relation | normal     |       0 |       0 |       0 |     8192 |         0 |         |        | 2023-02-27 13:25:39.725072+01
(30 ROWS)

We can see what is most responsible for IO by doing, for example:

$ SELECT * FROM pg_stat_io ORDER BY reads DESC NULLS LAST LIMIT 10;
    backend_type    │ io_object │ io_context │  reads  │ writes  │ extends │ op_bytes │ evictions │ reuses  │ fsyncs │          stats_reset
────────────────────┼───────────┼────────────┼─────────┼─────────┼─────────┼──────────┼───────────┼─────────┼────────┼───────────────────────────────
 autovacuum worker  │ relation  │ vacuum     │ 13042871127571081923961303766[NULL]2023-02-27 13:25:39.725072+01
 background worker  │ relation  │ bulkread   │ 1092586268189[NULL]8192321092522[NULL]2023-02-27 13:25:39.725072+01
 client backend     │ relation  │ bulkread   │ 1086086270239[NULL]819216471083556[NULL]2023-02-27 13:25:39.725072+01
 client backend     │ relation  │ normal     │  65783549272346453081921113499[NULL]02023-02-27 13:25:39.725072+01
 background worker  │ relation  │ normal     │  2536134641908192253613[NULL]02023-02-27 13:25:39.725072+01
 autovacuum worker  │ relation  │ normal     │  1852196565538192169395[NULL]02023-02-27 13:25:39.725072+01
 client backend     │ relation  │ vacuum     │   299780081923229946[NULL]2023-02-27 13:25:39.725072+01
 client backend     │ relation  │ bulkwrite  │    70912302311232778381921360762192315[NULL]2023-02-27 13:25:39.725072+01
 standalone backend │ relation  │ normal     │     68997746481920[NULL]02023-02-27 13:25:39.725072+01
 standalone backend │ relation  │ vacuum     │      1000819200[NULL]2023-02-27 13:25:39.725072+01
(10 ROWS)

or

$ SELECT * FROM pg_stat_io ORDER BY writes DESC NULLS LAST LIMIT 10;
    backend_type    │ io_object │ io_context │  reads  │ writes  │ extends │ op_bytes │ evictions │ reuses  │ fsyncs │          stats_reset
────────────────────┼───────────┼────────────┼─────────┼─────────┼─────────┼──────────┼───────────┼─────────┼────────┼───────────────────────────────
 client backend     │ relation  │ bulkwrite  │    70912302311232778381921360762192315[NULL]2023-02-27 13:25:39.725072+01
 autovacuum worker  │ relation  │ vacuum     │ 13042871127571081923961303766[NULL]2023-02-27 13:25:39.725072+01
 client backend     │ relation  │ normal     │  65783549272346453081921113499[NULL]02023-02-27 13:25:39.725072+01
 client backend     │ relation  │ bulkread   │ 1086086270239[NULL]819216471083556[NULL]2023-02-27 13:25:39.725072+01
 background worker  │ relation  │ bulkread   │ 1092586268189[NULL]8192321092522[NULL]2023-02-27 13:25:39.725072+01
 background worker  │ relation  │ normal     │  2536134641908192253613[NULL]02023-02-27 13:25:39.725072+01
 background writer  │ relation  │ normal     │  [NULL]38628[NULL]8192[NULL][NULL]02023-02-27 13:25:39.725072+01
 checkpointer       │ relation  │ normal     │  [NULL]35418[NULL]8192[NULL][NULL]37132023-02-27 13:25:39.725072+01
 standalone backend │ relation  │ normal     │     68997746481920[NULL]02023-02-27 13:25:39.725072+01
 autovacuum worker  │ relation  │ normal     │  1852196565538192169395[NULL]02023-02-27 13:25:39.725072+01
(10 ROWS)

While cool, it doesn't tell us which tables, or which backends are to blame, but at the very least we know when the IO is related to clients, autovacuum, checkpoints, or what.

Reads and writes columns are obvious. Extends means how many operations happened that caused some relation to grow. Since this was new db, then virtually all of them grew their tables, but in long running db there will be high(er) difference between writes and extends.

All of reads/writes/extends is in number of operations, and to get total size of data read/written/extended, it should be multipled by op_bytes. For example: 7091 value in reads means that ~ 55MB of data was read (7091 * 8192 bytes).

Evictions are related to situation where a block has to be written to disk, so that some space will be freed in shared_buffers for other user.

Reuses aren't all that really clear to me, the docs say: “The number of times an existing buffer in a size-limited ring buffer outside of shared buffers was reused as part of an I/O operation in the bulkread, bulkwrite, or vacuum io_contexts". Generally seems to be related only to bulk operations (and vacuum, which is kinda bulk too), so I would assume that this number in production databases without external imports/exports/dumps, shouldn't be high.

All in all it seems to be pretty interesting. Let's see what else we can do about it. First, I'll reset the stats:

$ SELECT pg_stat_reset_shared('io');
 pg_stat_reset_shared
──────────────────────
 
(1 ROW)

Now, let's see test pgbench run:

=$ pgbench -c 5 -T 15 bench
pgbench (16devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1000
query mode: simple
number of clients: 5
number of threads: 1
maximum number of tries: 1
duration: 15 s
number of transactions actually processed: 58702
number of failed transactions: 0 (0.000%)
latency average = 1.277 ms
initial connection time = 5.987 ms
tps = 3914.702930 (without initial connection time)

Afterwards, the stats (removing rows with no reads/writes):

=$ SELECT * FROM pg_stat_io WHERE reads > 0 OR writes > 0;
   backend_type    │ io_object │ io_context │ reads  │ writes │ extends │ op_bytes │ evictions │ reuses │ fsyncs │          stats_reset
───────────────────┼───────────┼────────────┼────────┼────────┼─────────┼──────────┼───────────┼────────┼────────┼───────────────────────────────
 autovacuum worker │ relation  │ normal     │    41710308192417[NULL]02023-02-27 16:36:37.394792+01
 autovacuum worker │ relation  │ vacuum     │    6663330819232634[NULL]2023-02-27 16:36:37.394792+01
 client backend    │ relation  │ normal     │ 12879811066513988192130079[NULL]02023-02-27 16:36:37.394792+01
 background writer │ relation  │ normal     │ [NULL]11957[NULL]8192[NULL][NULL]02023-02-27 16:36:37.394792+01
 checkpointer      │ relation  │ normal     │ [NULL]1666[NULL]8192[NULL][NULL]352023-02-27 16:36:37.394792+01
(5 ROWS)

I already wish I had it in my production servers (which are, obviously, not running on Pg 16 :).

Amazing stuff, thanks to all involved.