Postgres Weekly Issue 553: May 1, 2024

#553 — May 1, 2024

Together with

Crunchy Bridge for Analytics: Your Data Lake in Postgres? — Marco ponders: “What if you could use PostgreSQL to easily query your data lake with state-of-the-art analytics performance?” Crunchy Bridge for Analytics is a new part of Crunchy Data’s Postgres platform offering S3 integration, CSV/Parquet querying support, and bidirectional data lake ↔ Postgres import/export.

Marco Slot (Crunchy Data)

The 150x pgvector Speedup: A Year in Review — pgvector provides Postgres with extra data types, indexing, and querying facilities to work with high dimensional vectors (such as are common with LLMs and embeddings). The past year has seen a huge leap in its capabilities and performance.

Jonathan Katz

💡 pgvector 0.7.0 has also just been released with new halfvec and sparsevec types, useful extra functions, support for scalar and binary quantization, and more.

Improve Postgres Performance and Reduce Costs with OtterTune — Spending more on AWS does not equal better performance. OtterTune gets 146% more throughput than standard RDS settings by optimizing knobs and provides index and query recommendations, plus table and autovacuum insights. Get a free, 30-day trial.

OtterTune sponsor

Postgres Bloat Minimization — Tables occupying more space than expected is a common cause of complaint, but there are a variety of strategies you can employ so that space is used more efficiently.

Pavel Borisov

IN BRIEF:

Peter Eisentraut looks back at Postgres' supported platforms over time.
Timescale's CEO reflects on Postgres as being 'the bedrock for the future of data', using community sentimenent and survey data to make his case.
📅 If you want to keep up to date with the state of development around David Wheeler's proposed improvements to the Postgres extension packaging and distribution ecosystem, there's a virtual 'mini-summit' taking place in just a few hours (and another on May 15).
It looks like PG Day Chicago 2024 went well!

Papa's Got a Brand New RAG — RAG (Retrieval-Augmented Generation) is a technique used alongside large language models where you fetch data to include in prompts to improve the responses LLMs give. Shaun demonstrates bringing all the parts together to build a simple RAG-based query system in Python with pg_vectorize doing the heavy lifting on the Postgres side.

Shaun M. Thomas

📄 Nightly Postgres Backups via GitHub Actions Josh Strange

📄 When and Why Postgres Indexes are Ignored Clusterity s.r.o.

📄 Benchmarking Connection Poolers: PgBouncer, PgCat and Supavisor Binidxaba (Tembo)

📄 Test and Release pgrx Extensions with pgxn-tools David Wheeler

📄 LDAP Authentication in PgBouncer Through PAM Jobin Augustine

🛠 Code and Tools

PeerDB: A Simple, Fast Way to Replicate Postgres Data Elsewhere — A data replication system built by former Microsoft and Citus Data employees that targets use cases where you need to stream data from Postgres to data warehouses, queues, and other storage engines. This week’s v0.13.0 release introduces an Elasticsearch connector. GitHub repo.

PeerDB

🎉 The company behind PeerDB also raised $3.6m in seed funding this month.

pg_tier: Tier Data to External Amazon S3 Storage — Once data becomes older or much less frequently accessed, it might be more cost effective to store it elsewhere. pg_tier makes it easier to push a table over to S3 while still maintaining the ability to query it (at the price of some added latency, naturally).

Tembo

Squawk: A Linter for Postgres Migrations — Prevent unexpected downtime caused by database migrations and encourage best practices around schemas and SQL.

Steve Dignam

HypoPG 1.4.1 – Hypothetical indexes for Postgres. Now with Postgres 17 support.
pgmoneta 0.11 – Backup / restore solution. Now supports macOS and adds disk and network rate limiting features.
Orafce 4.10.0 – Emulate a subset of functions and packages from Oracle Database to aid migrations.
Lantern 0.2.5 – HNSW/usearch vector extension for Postgres.
pgwire 0.22 – The Postgres wire protocol in a Rust library.