Psycopg3: Powerful Tech Preview with SQLAlchemy 2.0

PostgreSQL development in Python just got a lot better with psycopg3. SQLAlchemy’s development branch already supports it, and Django will in a future release. Data gurus can get a jump start with this article and the included sample project. We will cover the improved file copy and show how to leverage SQLAlchemy to get the most out of the new driver. To explore this stack, we will:

  1. Install psycopg3 stable and a supported version of SQLAlchemy.
  2. Load a CSV of accounts and associated addresses with a file copy.
  3. Perform sync and async queries with the ORM.
  4. Perform low-level SQL queries while taking advantage of connection pooling.
  5. Learn how to access new DBAPI from SQLAlchemy and examine Prepared Statements.
  6. Try it ourselves with a GitHub demo.

Installation

SQLAlchemy 2.0 is installed from GitHub because it’s still in pre-beta. We will run PostgreSQL 14.1 from the official Alpine image in docker-compose.
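A typical install might look like the following sketch; the extras and the branch name are assumptions, so adjust them to taste:

```shell
# psycopg3 stable, with binary wheels and the connection-pool extra
pip install "psycopg[binary,pool]"

# SQLAlchemy 2.0 pre-beta, straight from the main branch on GitHub
pip install "git+https://github.com/sqlalchemy/sqlalchemy.git@main"
```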

Engine

Psycopg3 has great pooling of its own, but it requires custom recycling of dead connections in a separate thread. SQLAlchemy is a great choice for pure SQL or ORM development because it provides its own connection pool through the engine.

The connection string in SQLAlchemy has to start with “postgresql+psycopg://” for version 3; if you need to use version 2, stick with “postgresql://”. In addition, let’s set up an async engine using SQLAlchemy’s asyncio extension.

Data Model

Notice how the engine is decoupled from the tables in the new version. This means we can swap engines and run queries in async or sync mode. The data model here is not perfectly 3NF for purposes of demonstration.

Streamlined Raw Connection Recipe

The engine’s connection will easily support raw SQL and ORM queries together but will not expose many of the driver’s features. If we want to use psycopg more we have to access the raw connection but will not have context management. Python’s contextlib to the rescue:
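One way to write that recipe, a minimal sketch using `contextlib.contextmanager` (the function name is ours):

```python
from contextlib import contextmanager

@contextmanager
def raw_connection(engine):
    """Check a raw DBAPI-proxy connection out of the pool and always return it."""
    conn = engine.raw_connection()  # a _ConnectionFairy proxying psycopg3
    try:
        yield conn
    finally:
        conn.close()  # releases the connection back to the pool

# Usage:
#   with raw_connection(engine) as conn:
#       ...  # use psycopg3 features directly
```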

This should call the close method of the raw connection, cleaning it up nicely. The connection will not actually be closed but released back to the pool. Great!

The _ConnectionFairy class serves as a proxy to the underlying DBAPI, giving us access to those properties and methods. It could also be useful to examine the dbapi_connection as a quick API reference or for hacking purposes.

Efficient data loading with COPY

Version 2 of psycopg’s copy supported only a fraction of the features of PostgreSQL’s COPY. This version allows you to write SQL to perform these copy operations:

  1. CSV to Postgres.
  2. Binary to Postgres.
  3. Python data structures to Postgres.
  4. COPY from Postgres into Python or to a file.
  5. COPY from Postgres to another server, table, or database.

We will use sample CSV data from GitHub with 100 fictitious accounts and associated addresses, and copy the data directly in 4 KB blocks. Feel free to increase the block size for larger data loads.

SQLAlchemy’s engine cannot perform a low-level copy operation, so we have to pull a cursor from the raw connection object. The COPY statement specifies columns in the order they are loaded from disk. Our data is in PostgreSQL’s default COPY text format, tab-delimited:
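A sketch of the load, assuming we hand it the underlying psycopg3 connection (e.g. `engine.raw_connection().dbapi_connection`); the function names are ours:

```python
BLOCK_SIZE = 4096  # 4 KB blocks, per the article; raise this for larger loads

def copy_sql(table, columns):
    # Default COPY text format: tab-delimited, \N for NULL, no header
    return f"COPY {table} ({', '.join(columns)}) FROM STDIN"

def load_file(dbapi_conn, path, table, columns):
    """Stream a file into Postgres through psycopg3's COPY protocol."""
    with dbapi_conn.cursor() as cur:
        with cur.copy(copy_sql(table, columns)) as copy:
            with open(path, "rb") as f:
                while block := f.read(BLOCK_SIZE):
                    copy.write(block)
    dbapi_conn.commit()
```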

This is one of the most efficient methods of loading data, but using Python data structures can be more straightforward. Pickles could also be of value here because they simplify data loads for automated testing. COPY will greatly improve performance for your pipelines, even when the data originates in Python.

ORM Usage

Thanks to the hard-working contributors at SQLAlchemy, our driver upgrade is well supported by the ORM and the sync engine. In this snippet we get some quick counts of the data we just loaded:

We are saving some vertical space here by creating a session without a context manager.

Async Engine

Raw SQL

In this query we will select a random account and its associated city 5 times in 5 concurrent tasks. We’ll use the engine’s connection, execute a query wrapped in text(), and return a tuple of the completion time and the query result. Then we can run that function n_tasks times and gather the results into a list with asyncio.
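A sketch of those pieces; the SQL assumes the hypothetical accounts/addresses schema, and the helper names are ours:

```python
import asyncio
from datetime import datetime
from sqlalchemy import text

# Hypothetical query against the sample schema
RANDOM_CITY = text(
    "SELECT accounts.name, addresses.city "
    "FROM accounts JOIN addresses ON addresses.account_id = accounts.id "
    "ORDER BY random() LIMIT 1"
)

async def one_query(async_engine):
    """Run the query and pair the result with its completion time."""
    async with async_engine.connect() as conn:
        result = await conn.execute(RANDOM_CITY)
        return datetime.now(), result.first()

async def run_tasks(task, n_tasks=5):
    # Launch n_tasks copies concurrently and gather them into a list
    return await asyncio.gather(*(task() for _ in range(n_tasks)))

# Usage (needs a live database):
#   rows = asyncio.run(run_tasks(functools.partial(one_query, async_engine)))
```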

Sorting by date shows us the order the tasks finish.

ORM

As of this writing there is no query API available on AsyncSession; passing a select into the session’s execute method does the trick. Now you can easily use an async framework of your choice (aiohttp, FastAPI, etc.) and reap the performance benefits.
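A sketch of that pattern, assuming the hypothetical Account model from the data-model section is in scope:

```python
from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession

async def random_account(async_engine):
    # No query API on AsyncSession yet: pass a select() to execute() instead
    async with AsyncSession(async_engine) as session:
        result = await session.execute(
            select(Account).order_by(func.random()).limit(1)  # Account: hypothetical model
        )
        return result.scalar_one_or_none()
```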

Plugging this function into run_tasks should give us a similar result.

Prepared Statements

Psycopg3 takes an automated approach to prepared statements: once a query has been executed 5 times, the parsed and planned query is stored in the server session. We cannot change the prepare threshold from SQLAlchemy because its API does not expose the psycopg3 constructor. There is also a limit of 100 prepared statements. SQLAlchemy will use these defaults:

Overriding the prepare flag is supported by the raw connection’s execute method, unlike the engine’s. We could disable prepared statements for custom SQL to avoid caching a bad query plan. However, the default behavior is to prepare, which suggests the developers believe it’s a net gain for most uses. Query plan caching is no replacement for proper data modeling and indexing, so always adhere to the best practices which produce the best plans!

Prepare  Behavior
-------  --------
None     Default, determined by the threshold
True     Query is immediately prepared
False    Query preparation is deferred

Behavior of the prepare flag
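One way to opt a single statement out of preparation, a sketch using the raw connection (the helper name is ours; psycopg3’s execute accepts a keyword-only prepare flag):

```python
def run_unprepared(engine, sql, params=None):
    """Execute raw SQL with psycopg3's automatic preparation turned off."""
    conn = engine.raw_connection()
    try:
        with conn.cursor() as cur:
            # prepare=False: never cache a server-side plan for this statement
            cur.execute(sql, params, prepare=False)
            return cur.fetchall()
    finally:
        conn.close()  # back to the pool
```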

Recipe: Autocommit

While autocommit is supported through engine.begin(), the raw connection requires an explicit call to commit. The engine initializes the connection pool at startup and does not accept psycopg’s autocommit parameter in version 2.0. raw_connection will not help us here either, because it simply hands us a proxy around a free psycopg connection from the pool. A custom context manager can remedy this:
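A sketch of that context manager, assuming the pooled connection is idle when we borrow it (psycopg3 refuses to flip autocommit mid-transaction); the function name is ours:

```python
from contextlib import contextmanager

@contextmanager
def autocommit_connection(engine):
    """Borrow a raw pooled connection with psycopg3 autocommit switched on."""
    conn = engine.raw_connection()
    dbapi_conn = conn.dbapi_connection  # the underlying psycopg3 connection
    previous = dbapi_conn.autocommit
    dbapi_conn.autocommit = True
    try:
        yield conn
    finally:
        # leave the pooled connection as we found it before releasing it
        dbapi_conn.autocommit = previous
        conn.close()
```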

What’s next?

Version 3.1 of psycopg will support pipelining of SQL statements, enabling batch execution that saves round trips to the database. SQLAlchemy’s engine and ORM are unlikely to support these features by default, but the raw connection recipe will help us here. These projects are iterating rapidly, and we can leverage upgrades to get the best possible performance out of our stacks.

If we’re just planning to write SQL, it might be worthwhile to check out the official psycopg pool library and build our own stack. On the other hand, we have relatively easy workarounds for autocommit and for auto-releasing connections, so it’s hard to argue that rolling our own is the easier path. We might also consider the performance gains from prepared statements too small to justify tuning those parameters.

Try it yourself: https://github.com/realcodespace/psycopg3-sqlalchemy2-preview
