#395 — March 3, 2021

Web Version

👋 Good news, folks. The Tip of the Week is back. It's a bit of an unusual one too, as it's a mixture of showing off an embarrassingly bad database design (of mine), how it led me into discovering a poor choice made by the query planner, and how I made the query a lot faster with a counterintuitive workaround.
__
Peter Cooper, your editor

Postgres Weekly

Designing High-Performance Time Series Data Tables on Amazon RDS — Storing time series data at scale is enough of a distinct database problem that things like TimescaleDB and Timestream exist, but some workloads suit vanilla Postgres just fine if you design things properly, as we see here.

Jim Mlodgenski and Andy Katz

Connection Queuing in pgBouncer: Is it a Magical Remedy? — Jobin refers to connection queuing (no, not pooling) as a ‘non-celebrity’ feature of pgBouncer but one that can address some of the problems related to connection management, particularly when absorbing spikes in load.

Jobin Augustine

Fast, Cheerful, Collaborative Project Management — Whether you are in a startup striving for product-market fit, or a large org that has strict ship dates to hit, Clubhouse helps engineering teams collaborate and succeed. Easily import from Jira or Trello.

Clubhouse.io sponsor

Amazon RDS Now Supports Postgres 13 — If you’re using RDS, you can now upgrade your database to Postgres 13, if you like.

Amazon Web Services

How Many Engineers Does It Take to Make Subscripting Work? — If you use JSONB columns a lot, you might, like me, be a bit tired of the arcane functions and syntax you sometimes have to use to set values. Good news is on the way for Postgres 14 (hopefully!) – this post not only shows off the potential but also how the feature came together, and how new features in Postgres generally involve a lot of engineers.

Dmitry Dolgov

Estimating Connection Pool Size with New Postgres Statistics — How you can use a variety of additions to pg_stat_database to work out an upper limit for the correct size of a connection pool.

Laurenz Albe

Where Can Postgres Config Parameters Be Set? — Proving that it’s not just in postgresql.conf.

Hubert depesz Lubaczewski

Helm, GitOps and the Postgres Operator — How can you apply GitOps principles to running PostgreSQL on Kubernetes with Helm?

Jonathan S. Katz

🔧 Tools and Code

pspg 4.3.0 Released — pspg is a Unix pager for working with Postgres query results as well as CSV or TSV data. 4.3 now lets you select rows, columns, or blocks of data for export.

Pavel Stěhule

An RFC6238 TOTP Implementation in Pure PL/pgSQL — On some sites that use 2FA, you scan a QR code into an app like Google Authenticator or Authy and then get a (usually) six digit code to enter to log in to the service. Those are “time-based one time passwords” (TOTPs) and the algorithm to calculate them is reasonably simple.

Dan Lynch

Build Internal Apps (Faster) on Top of Postgres — Build internal apps without the mundane, boring bits (wrestling with UI libraries or hacking together data sources & APIs).

Retool sponsor

EdgeDB 1.0 Beta 1 Unveiled — EdgeDB is a mildly stealthy database we’ve covered a few times in the past few years and which is very gradually approaching its 1.0 release. It’s built on top of Postgres and aims to offer the best parts of NoSQL, but with a new set-logic-based query language aimed at addressing SQL’s shortcomings.

Łukasz Langa and Victor Petrovykh

Pgpool-II's Six Clustering Modes — Pgpool-II is a more versatile tool than you might remember, with support for streaming replication, native replication, snapshot isolation, logical replication, Slony mode, and a ‘you’re on your own’ raw mode.

Bo Peng


💡 Tip of the Week

Beware the query planner and random_page_cost

This week's tip is a weird one as it's based on an issue I ran into just today! We store all of the links included in any of our newsletters in a naively designed table, like so:

CREATE TABLE links (
  uid CHAR(60) PRIMARY KEY,
  data TEXT,
  timestamp INT
);

CREATE INDEX idx_trgm ON links USING GIN (data gin_trgm_ops);
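(One note if you try this at home: the gin_trgm_ops operator class comes from the pg_trgm extension, so that has to be enabled before the index can be created.)

CREATE EXTENSION IF NOT EXISTS pg_trgm;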

Yes, this is a terrible design, but it's solely for internal use and was only meant to be a rough prototype of an idea. data is, rather stupidly, a TEXT column that contains JSON (I know, I know..) and we check for the presence of links in a similarly ghastly manner:

SELECT * FROM links WHERE data ILIKE '%whatever we want%' LIMIT 1;

It works quite well at low volume, but query times were occasionally pushing 300ms and I was intrigued as to why.

One EXPLAIN ANALYZE later, I got to see that Postgres wasn't using the GIN index at all and was doing a sequential scan over the whole table. Yet if I dropped the LIMIT 1, the query would use the index and take only 5ms! Why was Postgres ignoring the index?

Postgres's query planner isn't really designed around people doing silly things like scanning an entire table with ILIKE, but it is concerned with whether index scans are likely to be faster or slower than a table scan. The random_page_cost setting defines the relative 'cost' of a non-sequential page fetch compared to seq_page_cost, and the planner uses it to decide whether using an index is 'worth it' or not.
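You can check the current values in psql. The stock defaults (4 and 1 respectively) assume a random page read is four times as expensive as a sequential one, a ratio tuned for spinning disks that can be far too pessimistic on SSD-backed databases:

SHOW random_page_cost;  -- 4 by default
SHOW seq_page_cost;     -- 1 by default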

In this situation doing an index scan is worth it, but the query planner disagreed due to the simple-seeming LIMIT 1 (after all, a sequential scan can stop as soon as it finds a single matching row) and went ahead with the full scan anyway. Here's how I confirmed my hunch in psql:

SET random_page_cost = 1;
EXPLAIN ANALYZE SELECT * ... LIMIT 1;
    [see the index being used]
    
SET random_page_cost = DEFAULT;    
EXPLAIN ANALYZE SELECT * ... LIMIT 1;
    [see the index NOT being used]

Other than the obvious 'fix that terrible schema!' way forward, this little trick of seeing what Postgres would do if it felt an index scan cost no more than sequentially scanning the table provided a fantastic way to confirm what was going on.
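If you do conclude that a lower random_page_cost genuinely reflects your hardware (values around 1.1 are a common suggestion for SSD storage, though do benchmark your own workload), the setting can be made to stick beyond the current session. Here 'mydb' is a placeholder for your own database name:

-- per database (takes effect for new connections)
ALTER DATABASE mydb SET random_page_cost = 1.1;

-- or cluster-wide, without hand-editing postgresql.conf
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();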

So if you end up with a query that seems to be unnaturally avoiding your indexes in favor of a table scan, it's worth checking whether the query planner is being diverted away from them by the perceived cost of using them (and if so, why? Is it because the design is bad, as in our case?). And, as always, use EXPLAIN ANALYZE to see what's going on with any slow query!

This week’s tip is sponsored by Fauna. An operations-free, scalable and flexible alternative to Postgres.