Replication Conflicts and How to Deal with Them — How can replication conflicts occur if the standby server is read-only? Enter vacuums, locks, buffer pins, and more to make life difficult! Luckily, there are various ways to mitigate the problems and Laurenz shares some tips to do so.
Laurenz Albe
phpPgAdmin 7.13.0 Released — Popular Web-based Postgres administration tool. Drops PHP 7.1 support, adds support for Postgres 13 (and 14, provisionally).
Robert Treat
NoisePage: A 'Self Driving' Database Management System — NoisePage is an open source, relational database (that uses a Postgres-compatible wire protocol) out of CMU entirely focused on testing out autonomous deployment and machine learning optimization techniques.
Carnegie Mellon University Database Group
💡 Tip of the Week
Even Faster \COPY by Ryan Booz
Postgres and psql have built-in support for copying (i.e. bulk loading) data TO and FROM files in various formats. It's billed as an easy way to shuffle data around between tables and databases, and rightly so!
With psql, the command is called \COPY, and it makes it easy to load data from both text and binary files that reside on your local workstation.
Given a table named readings with the following columns:
id | time | reading
----|------------------------|---------
1 | 2020-10-01 00:00:00-00 | 12.3
And a CSV file named readings.csv with the following data:
1,2020-10-01 01:00:00-00,15.4
1,2020-10-01 01:30:00-00,18.1
2,2020-10-01 01:00:00-00,20.2
...
We could bulk load thousands (or millions) of records with this \COPY command at the psql prompt:
\COPY readings FROM readings.csv CSV;
One disadvantage of \COPY is that it is single-threaded and doesn't (yet) support batched or parallel copying, so loading a large file can take a significant amount of time.
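To make the batching idea concrete, here is a minimal, illustrative sketch in Python (not Parallel Copy's actual Go source) of how a loader might chop a CSV stream into fixed-size batches before handing each batch to a worker connection. The names (batch_rows, BATCH_SIZE) are our own, and the database step is only described in comments:

```python
# Illustrative sketch of batched loading (assumed names, not from the tool):
# read the CSV in fixed-size batches so each batch can be handed to one of
# several worker connections, each issuing its own COPY ... FROM STDIN.
import csv
import io
from itertools import islice

BATCH_SIZE = 5000  # Parallel Copy's default batch size

def batch_rows(rows, batch_size=BATCH_SIZE):
    """Yield successive lists of at most batch_size rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# In a real loader, each batch would be sent over its own connection with
# COPY readings FROM STDIN. Here we just demonstrate the batching itself,
# using the sample data from above and a tiny batch size of 2.
sample = io.StringIO(
    "1,2020-10-01 01:00:00-00,15.4\n"
    "1,2020-10-01 01:30:00-00,18.1\n"
    "2,2020-10-01 01:00:00-00,20.2\n"
)
batches = list(batch_rows(csv.reader(sample), batch_size=2))
print([len(b) for b in batches])  # → [2, 1]
```

Keeping batches bounded is the key trade-off: each worker commits a modest amount of work per round trip, which lowers per-row transactional overhead without building the whole file in memory.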
Enter Parallel Copy, a free and open source tool written in Go that wraps COPY and performs parallel inserts of batched data. This means that data can be loaded at higher rates with lower transactional overhead.
Copying the same data with Parallel Copy would look like this, including the helpful option of periodic progress updates:
timescaledb-parallel-copy --connection {CONNECTION STRING} --db-name {DATABASE NAME} --table readings --file {PATH TO `readings.csv`} --workers 4 --reporting-period 30s
By default, this copies rows in batches of 5,000.
Note that Parallel Copy was developed primarily for use with TimescaleDB (a time-series database built on top of Postgres), but it works with standard Postgres too.
This week’s tip is not sponsored, but we thank Ryan Booz of Timescale for supplying it to us. If you have a tip we can use, let us know.