#553 — May 1, 2024 |
|
Postgres Weekly |
Crunchy Bridge for Analytics: Your Data Lake in Postgres? — Marco ponders: “What if you could use PostgreSQL to easily query your data lake with state-of-the-art analytics performance?” Crunchy Bridge for Analytics is a new part of Crunchy Data’s Postgres platform offering S3 integration, CSV/Parquet querying support, and bidirectional data lake ↔ Postgres import/export. Marco Slot (Crunchy Data) |
The 150x pgvector Speedup: A Year in Review — pgvector provides Postgres with extra data types, indexing, and querying facilities to work with high dimensional vectors (such as are common with LLMs and embeddings). The past year has seen a huge leap in its capabilities and performance. Jonathan Katz |
💡 pgvector 0.7.0 has also just been released with new |
Improve Postgres Performance and Reduce Costs with OtterTune — Spending more on AWS does not equal better performance. OtterTune gets 146% more throughput than standard RDS settings by optimizing knobs and provides index and query recommendations, plus table and autovacuum insights. Get a free, 30-day trial. OtterTune sponsor |
Postgres Bloat Minimization — Tables occupying more space than expected is a common cause of complaint, but there are a variety of strategies you can employ so that space is used more efficiently. Pavel Borisov |
IN BRIEF:
|
Papa's Got a Brand New RAG — RAG (Retrieval-Augmented Generation) is a technique used alongside large language models where you fetch data to include in prompts to improve the responses LLMs give. Shaun demonstrates bringing all the parts together to build a simple RAG-based query system in Python with pg_vectorize doing the heavy lifting on the Postgres side. Shaun M. Thomas |
📄 Nightly Postgres Backups via GitHub Actions Josh Strange 📄 When and Why Postgres Indexes are Ignored Clusterity s.r.o. 📄 Benchmarking Connection Poolers: PgBouncer, PgCat and Supavisor Binidxaba (Tembo) 📄 Test and Release pgrx Extensions with pgxn-tools David Wheeler 📄 LDAP Authentication in PgBouncer Through PAM Jobin Augustine |
🛠 Code and Tools |
PeerDB: A Simple, Fast Way to Replicate Postgres Data Elsewhere — A data replication system built by former Microsoft and Citus Data employees that targets use cases where you need to stream data from Postgres to data warehouses, queues, and other storage engines. This week’s v0.13.0 release introduces an Elasticsearch connector. GitHub repo. PeerDB |
🎉 The company behind PeerDB also raised $3.6m in seed funding this month. |
pg_tier: Tier Data to External Amazon S3 Storage — Once data becomes older or much less frequently accessed, it might be more cost effective to store it elsewhere. pg_tier makes it easier to push a table over to S3 while still maintaining the ability to query it (at the price of some added latency, naturally). Tembo |
Squawk: A Linter for Postgres Migrations — Prevent unexpected downtime caused by database migrations and encourage best practices around schemas and SQL. Steve Dignam |
|