Waiting for 9.5 – Add support for index-only scans in GiST.

On 26th of March, Heikki Linnakangas committed patch:

Add support for index-only scans in GiST.
 
This adds a new GiST opclass method, 'fetch', which is used to reconstruct
the original Datum from the value stored in the index. Also, the 'canreturn'
index AM interface function gains a new 'attno' argument. That makes it
possible to use index-only scans on a multi-column index where some of the
opclasses support index-only scans but some do not.
 
This patch adds support in the box and point opclasses. Other opclasses
can added later as follow-on patches (btree_gist would be particularly
interesting).
 
Anastasia Lubennikova, with additional fixes and modifications by me.

After this commit there was also a bunch of others that add necessary logic so that index only gist scans can be used on other datatypes too, but since I'm just writing about new functionality, I figured I can use points:

$ CREATE TABLE test (id serial PRIMARY KEY, some_point point);
$ INSERT INTO test (some_point) SELECT point(random() * 5000, random() * 5000) FROM generate_series(1,100000) i;
$ CREATE INDEX tst_idx ON test USING gist (some_point);

And now, let's try to use index only scan to find 10 points nearest to given location:

$ EXPLAIN analyze SELECT some_point, some_point <-> point(500,500) FROM test ORDER BY some_point <-> point(500,500) LIMIT 10;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 LIMIT  (cost=0.28..1.09 ROWS=10 width=16) (actual TIME=0.169..0.284 ROWS=10 loops=1)
   ->  INDEX ONLY Scan USING tst_idx ON test  (cost=0.28..8136.28 ROWS=100000 width=16) (actual TIME=0.168..0.280 ROWS=10 loops=1)
         ORDER BY: (some_point <-> '(500,500)'::point)
         Heap Fetches: 10
 Planning TIME: 0.129 ms
 Execution TIME: 0.332 ms
(6 ROWS)

Of course, for sanity checking, the values:

$ SELECT some_point, some_point <-> point(500,500) FROM test ORDER BY some_point <-> point(500,500) LIMIT 10;
             some_point              |     ?COLUMN?     
-------------------------------------+------------------
 (496.12135393545,491.772019304335)  | 9.09634880720175
 (479.593751952052,492.762844078243) | 21.6515908244684
 (478.387083858252,508.600422181189) | 23.2612425688083
 (476.656502578408,504.556254018098) | 23.7839929900202
 (494.28480444476,526.699291076511)  | 27.3041316328299
 (474.246060475707,515.573592856526) | 30.0965478997474
 (490.935954730958,529.630510136485) |  30.985868514334
 (530.101526528597,485.602258704603) |  33.367601858105
 (466.621380764991,489.233953412622) | 35.0719258261832
 (526.196013670415,523.584464099258) | 35.2485188209342
(10 ROWS)

Of course, if I wanted to add id column, it would switch to normal index scan, because index doesn't contain values for id column:

$ EXPLAIN analyze SELECT id, some_point, some_point <-> point(500,500) FROM test ORDER BY some_point <-> point(500,500) LIMIT 10;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 LIMIT  (cost=0.28..1.09 ROWS=10 width=20) (actual TIME=0.128..0.205 ROWS=10 loops=1)
   ->  INDEX Scan USING tst_idx ON test  (cost=0.28..8136.28 ROWS=100000 width=20) (actual TIME=0.126..0.200 ROWS=10 loops=1)
         ORDER BY: (some_point <-> '(500,500)'::point)
 Planning TIME: 0.168 ms
 Execution TIME: 0.254 ms
(5 ROWS)

Which, of course, can be fixed:

$ DROP INDEX tst_idx ;
$ CREATE extension btree_gist ;
$ CREATE INDEX tst_idx ON test USING gist (some_point, id);

And now:

$ EXPLAIN analyze SELECT id, some_point, some_point <-> point(500,500) FROM test ORDER BY some_point <-> point(500,500) LIMIT 10;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 LIMIT  (cost=0.28..1.17 ROWS=10 width=20) (actual TIME=0.066..0.107 ROWS=10 loops=1)
   ->  INDEX ONLY Scan USING tst_idx ON test  (cost=0.28..8908.28 ROWS=100000 width=20) (actual TIME=0.065..0.106 ROWS=10 loops=1)
         ORDER BY: (some_point <-> '(500,500)'::point)
         Heap Fetches: 10
 Planning TIME: 0.114 ms
 Execution TIME: 0.130 ms
(6 ROWS)

Nice. That's definitely going to be useful. Thanks Anastasia and Heikki.

One thought on “Waiting for 9.5 – Add support for index-only scans in GiST.”

  1. Heikki, with some help from me, added index-only scan support for many of the btree_gist types and ranges and inet/cidr.

Comments are closed.