What’s new in the Citus 9.4 extension to Postgres

Written by Marco Slot
September 5, 2020

Our latest release to the Citus extension to Postgres is Citus 9.4. If you’re not yet familiar, Citus transforms Postgres into a distributed database, distributing your data and your SQL queries across multiple nodes. This post is basically the Citus 9.4 release notes.

If you’re ready to get started with Citus, it’s easy to download Citus open source packages for 9.4.

I always recommend people check out docs.citusdata.com to learn more. The Citus documentation has rigorous tutorials, details on every Citus feature, explanations of key concepts—things like choosing the distribution column—tutorials on how you can set up Citus locally on a single server, how to install Citus on multiple servers, how to build a real-time analytics dashboard, how to build a multi-tenant database, and more...

A more advanced resource for you is the actual source code which you can find on the Citus open source repo, so you can see what we’re doing and what’s going to be in future releases of the Citus extension to Postgres.

What’s new in Citus 9.4

Citus 9.4 open source packages and docs have already rolled out, so it seemed like a good time to walk through everything new and notable in Citus 9.4. And if you want to catch up on other recent releases, take a peek at the release notes for Citus 9.3 as well as all the HTAP performance goodness in Citus 9.2.

Before diving into what’s new in Citus 9.4, here’s an overview of the new capabilities:

  • EXPLAIN ANALYZE improvements 
  • INSERT INTO local SELECT .. FROM distributed 
  • Joins between local tables and CTEs 
  • COPY connection management 
  • Calculating percentiles at scale with t-digest 
  • Modifying CTE pushdown in router queries
  • Optional support for binary protocol
  • Citus downgrade scripts

Understanding performance with EXPLAIN ANALYZE

The Postgres EXPLAIN ANALYZE command is essential for you to understand your query performance. In Citus 9.4, we improved how EXPLAIN ANALYZE works with Citus to give you more insights about your distributed Citus queries.

If you peek under the covers into the internals of Citus (Citus is open source after all), you’ll see that the Citus EXPLAIN ANALYZE command used to run the distributed Citus query twice. Once to get the results from the workers to build the EXPLAIN plan on the Citus coordinator and then again to get the EXPLAIN plans from the Citus worker nodes.

This two-step approach could give misleading results for workloads with frequent cache misses, since the EXPLAIN was always a cache hit.

In Citus 9.4, we combined the two steps into one to give you more accurate EXPLAIN ANALYZE results. In addition, when explaining all the tasks of a distributed Citus query, we now create the EXPLAIN plans in parallel across the cluster, which gives you EXPLAIN ANALYZE results a lot faster. The new approach also enables EXPLAIN ANALYZE for inserts and other DML commands.

We also added additional information that is relevant to Citus, including:

  • EXPLAIN now shows the number of bytes transferred over the network
  • EXPLAIN can now show the Postgres queries that are sent to and received from the worker nodes

The EXPLAIN ANALYZE improvements in Citus 9.4 make it much easier for you to understand and tune your Postgres query performance in your Citus database cluster. An example of getting an average over a large table. You can see that Citus only transfers 44 bytes over the network because it only gets the sum and the count from each shard:

EXPLAIN (ANALYZE, VERBOSE) SELECT avg(rating) FROM item_ratings;
                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=500.00..500.02 rows=1 width=8) (actual time=95.942..95.942 rows=1 loops=1)
   Output: (sum(remote_scan.avg) / (pg_catalog.sum(remote_scan.avg_1))::double precision)
   ->  Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=100000 width=16) (actual time=95.917..95.918 rows=2 loops=1)
     Output: remote_scan.avg, remote_scan.avg_1
     Task Count: 2
     Tuple data received from nodes: 44 bytes
     Tasks Shown: All
     ->  Task
       Query: SELECT sum(rating) AS avg, count(rating) AS avg FROM item_ratings_102077 item_ratings WHERE true
       Tuple data received from node: 22 bytes
       Node: host=localhost port=9700 dbname=postgres
       ->  Aggregate  (cost=10872.68..10872.69 rows=1 width=16) (actual time=92.390..92.391 rows=1 loops=1)
         Output: sum(rating), count(rating)
         ->  Seq Scan on public.item_ratings_102077 item_ratings  (cost=0.00..8141.12 rows=546312 width=8) (actual time=0.120..53.408 rows=495406 loops=1)
           Output: user_id, rating
       Planning Time: 0.051 ms
       Execution Time: 92.424 ms
     ->  Task
       Query: SELECT sum(rating) AS avg, count(rating) AS avg FROM item_ratings_102078 item_ratings WHERE true
       Tuple data received from node: 22 bytes
       Node: host=localhost port=9701 dbname=postgres
       ->  Aggregate  (cost=11075.68..11075.69 rows=1 width=16) (actual time=93.468..93.468 rows=1 loops=1)
         Output: sum(rating), count(rating)
         ->  Seq Scan on public.item_ratings_102078 item_ratings  (cost=0.00..8293.12 rows=556512 width=8) (actual time=0.053..53.639 rows=504594 loops=1)
           Output: user_id, rating
       Planning Time: 0.090 ms
       Execution Time: 93.500 ms
 Planning Time: 0.370 ms
 Execution Time: 96.191 ms
(29 rows)

Making sure your app can leverage both local & distributed Citus tables

INSERT INTO local SELECT .. FROM distributed:

Inserting or upserting the result of a query into a table is a powerful data transformation mechanism in PostgreSQL and in Citus. As of Citus 9.4, you can now insert or upsert the result of a SQL query on a distributed table directly into a local table. An important use case this enables is to maintain small rollup tables that are placed on the coordinator for low latency, but it also simplifies other workflows. For instance, you can now write the result of a query on a distributed table directly to a temporary table, which is commonly needed for more complex analytics. With this feature, Citus now supports INSERT..SELECT for any combination of tables.

Joins between local tables and CTEs:

JOINs between local tables on the Citus coordinator and Common Table Expressions (CTEs) that query distributed Citus tables are now supported. That makes it easier to use hybrid data models that use a mixture of distributed, reference, and local tables.

Making sure you don’t run out of Postgres connections with COPY connection management

When running many concurrent COPY commands in the past, Citus sometimes overwhelmed worker nodes with a large number of Postgres connections.

As of Citus 9.4, you can now use the citus.max_adaptive_executor_pool_size setting to reduce the number of Postgres connections made by each individual COPY command—so you can support much higher concurrency. In the next release (Citus 9.5), we will make the connection management for COPY fully automatic.

Calculating percentiles at scale with t-digest

Aggregate functions from the t-digest extension to Postgres can now be distributed by Citus, which enables very fast, parallel percentile calculations on Citus and satisfies an important need for some of our Citus open source users. Props to Postgres committer Tomas Vondra for creating the open source t-digest extension and being so helpful in making t-digest work with Citus. The open source t-digest extension is now supported on the Citus managed service on Azure, too—I’m told this docs page about Hyperscale (Citus) extension support for the Citus managed service on Azure will soon be updated to include t-digest as well. Update in October 2022: Hyperscale (Citus) is now known as Azure Cosmos DB for PostgreSQL.

Are percentile approximations useful? Yes. One of my teammates, Nils Dijk, is writing a blog post about how one of our customers improved Postgres percentile calculations by 45x with t-digest and Hyperscale (Citus). And Matt Watson from the Stackify team just wrote about how to use Postgres, t-digest, and Citus to create SQL percentile rollups at scale. Also, the Windows Data and Intelligence team—who manage a petabyte-scale analytics dashboard using Citus on Microsoft Azure—are using t-digest with Citus, too.

Performance improvements

Modifying CTE pushdown in router queries:

Modifying CTEs can be used to modify a table and then do an additional query using the modified rows. For instance, we may want to delete completed TODO items, but also lower a counter:

WITH deleted_todo AS (
  DELETE FROM todo WHERE user_id = 423 AND isdone RETURNING *
)
UPDATE users
SET remaining_todos = remainining_todos - (SELECT count(*) FROM deleted_todo)
WHERE user_id = 423;

When a query with modifying CTE has filters on the distribution column (e.g. the user ID or tenant_id), it can now be fully delegated to the worker in a single network round trip, which means these queries now run with much lower latency and higher throughput.

Optional support for binary protocol: 

In Citus 9.4, you can enable the binary PostgreSQL protocol for worker-coordinator communication.

Normally, values are sent over the wire as text, which can be very efficient for some data. Consider that a bigint with the value 9 only requires 1 byte in text, but 8 bytes in binary. On the other hand, there are also data types such as timestamps where the text format is way bigger than the binary format. For instance, PostgreSQL uses 8 bytes to store a timestamptz, but the text form (e.g. 2020-09-04 09:58:36.788916+02) is a whopping 29 bytes. When queries return millions of rows, that can be a lot of extra network traffic.

By enabling the citus.enable_binary_protocol, you can perform queries using the binary protocol, which can reduce network traffic and speed up queries for data types whose binary representation is much more compact than their text representation (e.g. timestamp, float, tdigest). The binary protocol is disabled by default because it can increase the network traffic in some cases (e.g. small integers) but will help performance especially if your queries return a lot of timestamps, floating point numbers, or other data types whose text representations is very long.

More controls for operating your distributed Citus cluster

Citus downgrade scripts: 

Previously, the Citus extension could only ever be upgraded to a new version, because the Citus upgrade script makes schema changes and those cannot be reverted. As of Citus 9.4, we have now committed to maintaining a downgrade script for each upgrade script. That means that if you upgrade and for some reason you need to revert back, you would be able to install an older version of Citus and revert the schema changes by running ALTER EXTENSION.

It’s all about performance

Citus users have some of the most challenging data problems around. It’s not just that Citus users have a lot of data. It’s that Citus users typically need their Postgres queries to be just as fast with a lot of data as the queries are with a tiny bit of data—in order to make sure their application continues to work smoothly even while growing rapidly.

To achieve these levels of performance, you need to get the most out of your hardware, and that’s why the 9.4 release is packed with better performance insights, performance improvements, and new performance tuning possibilities.

Let us know what you think of the new 9.4 release on the Citus Slack or if you find any bug or performance issue please report it on GitHub. Your feedback will help make Citus even better.

Marco Slot

Written by Marco Slot

Former lead engineer for the Citus database engine at Microsoft. Speaker at Postgres Conf EU, PostgresOpen, pgDay Paris, Hello World, SIGMOD, & lots of meetups. Talk selection team member for Citus Con: An Event for Postgres. PhD in distributed systems. Loves mountain hiking.

@marcoslot marcocitus