#300 — April 10, 2019

Postgres Weekly

How To Improve The Performance of COUNT(*) — COUNT(*) is often quite slow in Postgres. This article explores a variety of options for counting rows faster using approximations and other tricks.

Laurenz Albe

When A Vulnerability is Not a Vulnerability — Sometimes intentional features get recorded as ‘vulnerabilities’, as with CVE-2019-9193 and Postgres’s COPY TO/FROM PROGRAM feature. If you administer a Postgres system at all, it’s worth knowing about in case it has abuse potential on your deployment.

Magnus Hagander

Real-Time Postgres Performance Monitoring — Collect out-of-the-box and custom PostgreSQL metrics and correlate them with data from your distributed infrastructure, applications, and logs. Gain real-time performance insights and set intelligent alerts with Datadog. Start a free trial.

Datadog sponsor

PGCon 2019 Registration Now Open — This year the popular Postgres conference takes place in Ottawa, Canada from May 28-31, including two days of tutorials, two days of talks, and an unconference.

PGCon

pgmetrics 1.6.2: Collect and Display Stats from a Running Postgres Server — The homepage has a lot more info including example output.

RapidLoop

Controlling Postgres' Planner with Explicit JOIN Clauses — It’s straight from the docs, but this is a handy set of advice, particularly if, like me, you usually prefer the ANSI SQL 89 join style.

Postgres Documentation

Why hot_standby_feedback can be misleading — When setting up replication in Postgres, hot_standby_feedback is a setting whose behavior can be more complicated than you might expect.

Richard Yen

Continuous Replication From a Legacy Postgres Version to a Newer Version Using Slony — Native streaming replication in PostgreSQL works only between servers running the same major version, but Slony provides an application-level alternative.

Nickolay Ihalainen

dbdot: Generate DOT Descriptions for Postgres Schemas — Essentially another way to create diagrams of your database schema.

Aashish Karki

How CockroachDB Brought JSONB to Their Distributed Database — ...plus why they went with Postgres syntax.

Cockroach Labs sponsor

vipsql: A Vim Plugin for Interacting with 'psql' — Version 2.0 came out just a couple of weeks ago.

Martin Gammelsæter

supported by GitPrime

💡 Tip of the Week

Selecting a 'sample' of a table's rows

You have a table full of data and you want to select a random sample of it. Using SELECT with ORDER BY RANDOM() is one way to do it, but it can have performance issues on large tables. Thankfully, there's a better way.
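For reference, the naive approach looks something like this — Postgres has to generate a random value for every row and sort the whole table just to return a handful of rows:

```sql
-- Slow on large tables: assigns RANDOM() to every row, sorts, then limits
SELECT * FROM numbers
ORDER BY RANDOM()
LIMIT 5;
```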

First, let's create a table that contains 1000 rows — the numbers 1 to 1000:

CREATE TABLE numbers (number int);
INSERT INTO numbers (number)
  SELECT id FROM generate_series(1,1000) AS id;

Postgres 9.5 introduced a new TABLESAMPLE clause that lets you sample tables in different ways (two methods by default, but more can be added via extensions). Let's give it a go at retrieving a random 0.5% of the rows from our table:

SELECT * FROM numbers TABLESAMPLE BERNOULLI (0.5);
 number
--------
    101
    213
    278
    433

The BERNOULLI method here selects each row with a 0.5% probability (so statistically we'd get 5 rows on average), but there's also a more efficient SYSTEM method that uses a cruder block-based technique (with the possible downside that it tends to return clusters of physically adjacent rows).
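As a sketch, here's the SYSTEM method in action, plus the optional REPEATABLE clause (which fixes the sampling seed so the same query returns the same rows each time):

```sql
-- Block-based sampling: faster, but rows tend to come in physical clusters
SELECT * FROM numbers TABLESAMPLE SYSTEM (0.5);

-- REPEATABLE (seed) makes the sample deterministic across runs
SELECT * FROM numbers TABLESAMPLE BERNOULLI (0.5) REPEATABLE (42);
```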

You can also combine TABLESAMPLE with other SQL statements like UPDATE or DELETE (by doing the sampling in a subquery) if, say, you wanted to delete roughly 30% of a table's rows for whatever reason.
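Since TABLESAMPLE itself belongs in a FROM clause, one way to sketch that delete is to sample in a subquery (this assumes the numbers table from above):

```sql
-- Delete roughly 30% of the rows by sampling them in a subquery
DELETE FROM numbers
WHERE number IN (
  SELECT number FROM numbers TABLESAMPLE BERNOULLI (30)
);
```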

And if you didn't know about generate_series(), consider that an extra tip for free :-)

This week's tip is sponsored by GitPrime. Get your copy of their new field guide '20 Patterns to Watch for in Engineering Teams', filled with actionable insights to help debug your development process with data.