#379 — October 28, 2020

Web Version

Postgres Weekly

▶  How a DBA Used Postgres to Discover 40K Voters Purged by Mistake — A long-time database administrator was running queries against voter data to learn how to use Postgres and stumbled across a significant voter purge error. (The part mentioning Postgres comes around 4 minutes in.)

VICE (on YouTube)

Improving Postgres Connection Scalability: Snapshots — Following on from his analysis of connection scalability, Andres explains the snapshot-related improvements he’s making in Postgres 14, complete with before-and-after performance figures.

Andres Freund

Best Practices on How to Speed Up Your Postgres Queries. Free eBook — We share what we’ve learned from helping companies like Atlassian, Robinhood, and others speed up their queries.

pganalyze sponsor

Slonik: A Sophisticated Node.js Postgres Client Library — A battle-tested framework that abstracts repetitive code patterns, protects against unsafe behavior, and provides a rich debugging experience. If you’re working with Postgres from Node at all, this deserves a look.

Gajus Kuizinas

Using Postgres to Shape and Prepare Scientific Data — Rather than lean on Google Sheets or a quick Python script (which are both fine!), Steve wanted to use just Postgres for a data modelling task, and here’s what happened.

Steve Pousty

Using SQL to Randomly Sample Data — A follow-up to the data modelling post (above) that moves on to sampling the data.

Steve Pousty

Detecting Gaps in Time-Series Data in Postgres — A creative solution that’s worth looking at even if you don’t have this problem (yet).

David Christensen

Automating Postgres Extension Releases on GitHub and PGXN — I’d be intrigued to know how many folks actually write their own extensions, but if you do, there’s some useful knowledge here on automating the release of extensions on both GitHub and PGXN by way of GitHub Actions.

David E. Wheeler

Keep Your Data In-House with Scalefield — Your Private Postgres Cloud — Scalefield allows you to easily host PostgreSQL clusters within your own cloud – fully flexible, scalable and reliable.

CYBERTEC sponsor

Foreign Keys and Insertion Order in SQL
Hans-Jürgen Schönig

Joins Using `LIKE` or Why PostgreSQL FTS is a Powerful Alternative
Vadim Yatsenko

▶  An Extensive Webinar on Postgres Partitioning
2ndQuadrant

supported by Cockroach Labs

💡 Tip of the Week

Using partial indexes to improve performance

A partial index, as the name implies, indexes only a subset of rows in a table. Partial indexes can improve performance compared to full indexes, because queries that use partial indexes don’t need to scan as many rows.
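
As a quick illustrative sketch (the table and predicate here are hypothetical), a partial index is declared like any other index, with a WHERE clause limiting the rows it covers:

CREATE INDEX active_users_idx ON users (email)
  WHERE active = true;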

You can see the performance difference by running EXPLAIN ANALYZE against the same query with a full index and then with a partial index, and comparing the number of rows scanned in each plan.

Suppose we have an orders table with 1,000,000 rows of data:

CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_id INT,
  status TEXT
);

INSERT INTO orders (id, customer_id, status)
SELECT
  i,
  (random() * 10000)::INT,
  CASE (random() * 2)::INT
    WHEN 0 THEN 'pending'
    WHEN 1 THEN 'shipped'
    WHEN 2 THEN 'completed'
  END
FROM generate_series(1, 1000000) i;
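
If the plans you see differ from the ones below, the planner’s statistics may not have caught up with the freshly inserted rows yet; running ANALYZE updates them (autovacuum would get there eventually on its own):

ANALYZE orders;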

First we add an index on the customer_id column:

CREATE INDEX full_idx ON orders (customer_id);

Assume that incomplete orders of specific customers are frequently queried. A simple EXPLAIN ANALYZE with an example SELECT query shows that 108 rows are scanned in full_idx, with 36 being removed by the filter afterward:

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 1001 AND status != 'completed';

            QUERY PLAN
------------------------------------
 Bitmap Heap Scan on orders
  (actual time=0.041..0.116 rows=72 loops=1)
   Recheck Cond: (customer_id = 1001)
   Filter: (status <> 'completed'::text)
   Rows Removed by Filter: 36
   Heap Blocks: exact=108
   ->  Bitmap Index Scan on full_idx
       (actual time=0.026..0.026 rows=108 loops=1)
         Index Cond: (customer_id = 1001)

By creating a partial index that covers only incomplete orders, we can reduce the number of rows scanned. Running the same EXPLAIN ANALYZE as above shows that only the 72 rows satisfying the query are scanned in partial_idx:

CREATE INDEX partial_idx ON orders (customer_id)
  WHERE status != 'completed';
 
EXPLAIN ANALYZE SELECT * FROM orders WHERE
  customer_id = 1001 AND status != 'completed';

            QUERY PLAN
------------------------------------
 Bitmap Heap Scan on orders
  (actual time=0.043..0.100 rows=72 loops=1)
   Recheck Cond: ((customer_id = 1001) AND
                  (status <> 'completed'::text))
   Heap Blocks: exact=72
   ->  Bitmap Index Scan on partial_idx
       (actual time=0.034..0.034 rows=72 loops=1)
         Index Cond: (customer_id = 1001)
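
The partial index is also smaller on disk than the full one, since it stores entries only for incomplete orders. As a quick check (a sketch using Postgres’s built-in size functions and the index names from this example):

SELECT pg_size_pretty(pg_relation_size('full_idx')) AS full_size,
       pg_size_pretty(pg_relation_size('partial_idx')) AS partial_size;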

This week’s tip is sponsored by Cockroach Labs, the creators of CockroachDB. Partial indexes will be available soon in Version 20.2. Learn about CockroachDB in this demo: “Distributed SQL: a modern, cloud-native PostgreSQL.”