#​405 — May 12, 2021

Web Version

Postgres Weekly

Skip Scan: A Way to Make DISTINCT Queries Up to 8000x Faster — First, it’s worth noting this is a TimescaleDB feature, but given Timescale is a Postgres extension, it’s an interesting development all the same. “Skip Scan” is a new feature that works around Postgres’s inability to efficiently pull a list of unique values from an ordered index. A similar feature is coming to Postgres 15 but TimescaleDB users can get it now and here’s how it works. Even if you have no plans to use TimescaleDB, it’s worth understanding the underlying concept here.

Sven Klemm and Ryan Booz (Timescale)

Using Postgres as a Data Warehouse — Postgres isn’t usually the first ‘go to’ for creating a data warehouse but with some tweaking and best practices it can work pretty well at certain scales if you don’t want to move data into something like Snowflake or BigQuery.

Cedric Dussud

Avoiding Integer Overflows with Zero Downtime — Learn how Buildkite migrated over 2 billion rows in one of our largest tables across three partitions covering 8.5TB, plus plenty more rows with foreign keys with the same data type. Read blog from one of our engineers now.

Buildkite sponsor

IN BRIEF:

Creating a Read-Only Postgres User — Learn how to create the quintessential read-only user in Postgres both now and with a look ahead to Postgres 14.

Jonathan S. Katz

Getting Random Rows Faster. Very Much Faster — It feels like the topic of grabbing a random row from a table has come up a lot over the years because there are numerous ways to do it, all with their own pros and cons. This approach uses TABLESAMPLE as a way to make it fast.

Magnus Hagander

On Clustering a Postgres Table — Bruce’s opening sold this for me: “Having written over 600 blog entries, I thought I would have already covered the complexities of the CLUSTER command, but it seems I have not, so let’s do that now.” 😄

Bruce Momjian

📺 May 19th Livestream: ALTER DATABASE <db> SURVIVE REGION FAILURE — A distributed systems engineer explains multi-region application architecture & how it’s not as complex as you think.

Cockroach Labs sponsor

▶  ETL Patterns with Postgres — It dates from 2018, but this talk on doing ETL and data warehousing work with Postgres has come strongly recommended. Practical, great slides, and clearly a lot of experience was distilled into this talk.

Dr. Martin Loetzsch

To WAL or Not to WAL? When Unlogged Becomes Logged...
Luca Ferrari

🔧 Tools and Code

windyquery: A Non-Blocking Python Postgres Query Builder — A non-blocking PostgreSQL query builder using Asyncio

windymile

PGSync 2.0: A Postgres to Elasticsearch Syncing Tool — Middleware for syncing Postgres to Elasticsearch so you can expose denormalized documents in Elasticsearch for querying.

Tolu Aina

💡 An Interesting Query

As seen in this tweet:

Don't forget (or did you know?): you can use Postgres generate_series to generate series of e.g. dates, useful to build reporting dashboards of all sorts via joins!

Example of a query actually used in France's national transport data system:

You can see a similar query here or a gist of the above query if you want to play with the idea.

Thanks to Thibaut for giving us permission to include this here.