
Improve performance of ORDER BY / DISTINCT aggregates

ORDER BY / DISTINCT aggregates have, ever since they were implemented in
Postgres, been executed by always performing a sort in nodeAgg.c to put
the tuples in the current group into the correct order before calling the
transition function on the sorted tuples.  This was not ideal, as often
there is an index that could have provided presorted input and allowed
the transition functions to be called as the rows come in, rather than
having to store them in a tuplestore in order to sort them once all the
tuples for the group have arrived.

Here we change the planner so that it requests a path with a sort order
which suits the largest number of ORDER BY / DISTINCT aggregate functions,
and add new code to the executor to allow it to process ORDER BY /
DISTINCT aggregates whose input tuples are already sorted in the correct
order.

Since there can be many ORDER BY / DISTINCT aggregates in any given query
level, it's very possible that we can't find an order that suits all of
these aggregates.  The sort order that the planner chooses is simply the
one that suits the most aggregate functions.  We take the most strictly
sorted variation of each order and see how many aggregate functions can
use that, then we try again with the order of the remaining aggregates to
see if another order would suit more aggregate functions.  For example:

SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ...

would request the sort order to be {a, b} because {a} is a subset of the
sort order of {a,b}, whereas:

SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ...

would just pick a plan ordered by {a} (we give precedence to aggregates
which are earlier in the targetlist).  Finally:

SELECT agg(a ORDER BY a),agg2(a ORDER BY b),agg3(a ORDER BY b) ...

would choose to order by {b} since two aggregates suit that vs just one
that requires input ordered by {a}.
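
The selection rule described above can be modelled with a short Python
sketch.  This is a simplified illustration and not the planner's C code:
the function names and the use of column-name tuples to stand for sort
orders are assumptions made for the example, and DISTINCT handling is left
out.  "Can use that order" is modelled as the aggregate's ORDER BY list
being a leading prefix of the chosen order.

```python
def is_prefix(shorter, longer):
    """True if `shorter` is a leading prefix of `longer`."""
    return len(shorter) <= len(longer) and longer[:len(shorter)] == shorter


def choose_sort_order(agg_orders):
    """Pick the sort order that suits the most aggregates.

    agg_orders: one tuple of column names per aggregate, listed in
    targetlist order (hypothetical representation for this sketch).
    """
    remaining = list(range(len(agg_orders)))
    best_order, best_count = (), 0

    while remaining:
        # Start from the first unprocessed aggregate's order and widen it
        # to the most strictly sorted compatible variation.
        current = tuple(agg_orders[remaining[0]])
        served = []
        for i in remaining:
            order = tuple(agg_orders[i])
            if is_prefix(order, current):
                served.append(i)
            elif is_prefix(current, order):
                current = order
                served.append(i)

        # A strict comparison keeps the earlier candidate on ties, which
        # gives precedence to aggregates earlier in the targetlist.
        if len(served) > best_count:
            best_order, best_count = current, len(served)

        # Try again with the aggregates that this order did not suit.
        remaining = [i for i in remaining if i not in served]

    return best_order


print(choose_sort_order([("a",), ("a", "b")]))     # ('a', 'b')
print(choose_sort_order([("a",), ("c",)]))         # ('a',)
print(choose_sort_order([("a",), ("b",), ("b",)])) # ('b',)
```

The three calls mirror the three example queries above.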

Author: David Rowley
Reviewed-by: Ronan Dunklau, James Coleman, Ranier Vilela, Richard Guo, Tom Lane
Discussion: https://postgr.es/m/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com

author	David Rowley <drowley@postgresql.org>
	Tue, 2 Aug 2022 11:11:45 +0000 (23:11 +1200)
committer	David Rowley <drowley@postgresql.org>
	Tue, 2 Aug 2022 11:11:45 +0000 (23:11 +1200)
commit	1349d2790bf48a4de072931c722f39337e72055e
tree	3b525f30da6d37513522cdb5ea34ce14b653de87
parent	a69959fab2f3633992b5cabec85acecbac6074c8

contrib/postgres_fdw/expected/postgres_fdw.out
contrib/postgres_fdw/sql/postgres_fdw.sql
src/backend/executor/execExpr.c
src/backend/executor/execExprInterp.c
src/backend/executor/nodeAgg.c
src/backend/jit/llvm/llvmjit_expr.c
src/backend/jit/llvm/llvmjit_types.c
src/backend/optimizer/path/pathkeys.c
src/backend/optimizer/plan/planagg.c
src/backend/optimizer/plan/planner.c
src/backend/optimizer/prep/prepagg.c
src/backend/parser/parse_expr.c
src/backend/parser/parse_func.c
src/include/catalog/catversion.h
src/include/executor/execExpr.h
src/include/executor/nodeAgg.h
src/include/nodes/pathnodes.h
src/include/nodes/primnodes.h
src/include/optimizer/paths.h
src/test/regress/expected/aggregates.out
src/test/regress/expected/partition_aggregate.out
src/test/regress/expected/sqljson.out
src/test/regress/expected/tuplesort.out
src/test/regress/sql/aggregates.sql