Tuesday, July 18, 2017

PostgreSQL Index bloat under a microscope

I've posted a snippet query to the PostgreSQL Wiki that "summarizes the keyspace" of a target B-Tree index. This means that it displays which range of indexed values belong on each page, starting from the root. It requires pageinspect. The query recursively performs a breadth-first search. Along the way, it also displays information about the space utilization of each page, and the number of distinct key values that actually exist on the page, allowing you to get a sense of how densely filled each page is relative to what might be expected.

The query is available from:

https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index


If I use the query against the largest index that results from initializing a pgbench database at scale factor 10 (pgbench_accounts_pkey), the query takes about 3 seconds to execute on my laptop, and returns the following:

       
 level | l_item | blkno | btpo_flags | type | live_items | dead_items | avg_item_size | page_size | free_size | distinct_real_item_keys | highkey | distinct_block_pointers 
-------+--------+-------+------------+------+------------+------------+---------------+-----------+-----------+-------------------------+---------+-------------------------
     2 |      1 |   290 |          2 | r    |         10 |          0 |            15 |      8192 |      7956 |                      10 |         |                      10
     1 |      1 |     3 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 |                     284 |  103945 |                     284
     1 |      2 |   289 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 |                     284 |  207889 |                     284
     1 |      3 |   575 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 |                     284 |  311833 |                     284
     1 |      4 |   860 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 |                     284 |  415777 |                     284
     1 |      5 |  1145 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 |                     284 |  519721 |                     284
     1 |      6 |  1430 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 |                     284 |  623665 |                     284
     1 |      7 |  1715 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 |                     284 |  727609 |                     284
     1 |      8 |  2000 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 |                     284 |  831553 |                     284
     1 |      9 |  2285 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 |                     284 |  935497 |                     284
     1 |     10 |  2570 |          0 | i    |        177 |          0 |            15 |      8192 |      4616 |                     177 |         |                     177
     0 |      1 |     1 |          1 | l    |        367 |          0 |            16 |      8192 |       808 |                     366 |     367 |                       6
     0 |      2 |     2 |          1 | l    |        367 |          0 |            16 |      8192 |       808 |                     366 |     733 |                       6
     0 |      3 |     4 |          1 | l    |        367 |          0 |            16 |      8192 |       808 |                     366 |    1099 |                       6
...
     0 |   2730 |  2741 |          1 | l    |        367 |          0 |            16 |      8192 |       808 |                     366 |  999181 |                       6
     0 |   2731 |  2742 |          1 | l    |        367 |          0 |            16 |      8192 |       808 |                     366 |  999547 |                       6
     0 |   2732 |  2743 |          1 | l    |        367 |          0 |            16 |      8192 |       808 |                     366 |  999913 |                       6
     0 |   2733 |  2744 |          1 | l    |         88 |          0 |            16 |      8192 |      6388 |                      88 |         |                       2
(2744 rows)
       
 

Note that I changed the query to use int4_from_page_data() here, so that the split points/high key values are displayed as ordinary int4 output. The items are in logical order (the int4 sort order the index uses).

A few interesting things are displayed here, or can be inferred from what is displayed:
  • There are 3 levels - one root level, an additional level of internal pages, and the leaf level, which is always level 0. (Note that most leaf pages are omitted for brevity.)
B-Trees are almost always rather short, and in general tend to look a lot more like a bush than a tree when they are greater than a few pages in size. We see that here.
  • Leaf pages point to 6 distinct table blocks in all cases, with the sole exception of the rightmost page (at block 2744). Since the index is 21MB and the table is 128MB, the ratio of index size to table size is just over 6:1. They match.
The size of the 11 internal index pages is negligible. The width of both (leaf) index tuples and heap (table) tuples happens to be completely uniform with any pgbench table. The ratio of the width of every index tuple to any heap tuple almost exactly matches the ratio of the overall size of the index to the size of the table it indexes, which almost exactly matches the number of heap tuples pointed to from within each leaf page. The logical/physical correlation between index and table must be very close to 1.0. This is good for B-Tree index scans that read through many leaf pages.
  • There are no duplicates within pages.
The lack of any duplicates is evidenced by the number of live items (not including the high key) exactly matching the number of distinct real items. They're almost always both 366 here (once again, the rightmost page is the only exception). Note that there is usually one more "live item" than real item (item with a real table pointer). The page high key is counted as a live item by pageinspect, but is not counted as a "real item value" by my query, since it's just metadata. This is a little confusing, but does make sense when considered in the broader context of how PostgreSQL B-Trees work.

Even though this is a unique index, it might still have physical duplicates at some point in the future. Not right now, though.

Production issues

I'm not aware that anyone has used a query like this to debug tricky production performance problems with index bloat before now. It could certainly help with that. Bloat can sometimes be localized to one part of an index, and feedback from this query could allow someone to tie it back to a problem in application code.

The query might also help users provide information on production performance issues to a mailing list like pgsql-performance or pgsql-hackers, for the perusal of hackers like myself. It might be possible to use this kind of feedback to improve how VACUUM and related mechanisms handle index bloat. The query could take a while to execute for larger indexes. It probably wouldn't be unreasonable to run in production at an off-peak time with a moderately large index.

No comments:

Post a Comment