Binary data performance in PostgreSQL

binary data performance nightmare — © Laurenz Albe 2020

Table of Contents

A frequently asked question in this big data world is whether it is better to store binary data inside or outside of a PostgreSQL database. Also, since PostgreSQL has two ways of storing binary data, which one is better?

I decided to benchmark the available options to have some data points next time somebody asks me, and I thought this might be interesting for others as well.

Alternatives for storing binary data

Storing the data outside the database

For that, you store the binary data in files outside the database and store the path of the file in the database.

The obvious downside is that consistency is not guaranteed that way. With all data inside the database, PostgreSQL can guarantee the atomicity of transactions. So it will make the architecture and the implementation somewhat more complicated to store the data outside the database.

One approach to consistency is to always add a file before storing its metadata in the database. When a file's metadata are deleted, you simply leave the file on the file system. This way, there can be no path in the database that does not exist in the file system. You can clean up the filesystem with offline reorganization runs that get rid of orphaned files.

There are two big advantages with this approach:

It keeps the database small, which makes maintenance easier. For example, it is easier to perform an incremental backup of a file system than of a database.
The performance of reading a file directly from the file system must be better. After all, the database is also stored on files, and there must be a certain overhead.

Storing the data in Large Objects

PostgreSQL Large Objects are the “old way” of storing binary data in PostgreSQL. The system assigns an oid (a 4-byte unsigned integer) to the Large Object, splits it up in chunks of 2kB and stores it in the pg_largeobject catalog table.

You refer to the Large Object by its oid, but there is no dependency between an oid stored in a table and the associated Large Object. If you delete the table row, you have to delete the Large Object explicitly (or use a trigger).

Large Objects are cumbersome, because the code using them has to use a special Large Object API. The SQL standard does not cover that, and not all client APIs have support for it.

There are two advantages of Large Objects:

you can store arbitrary large data with Large Objects
the Large Object API has support for streaming, that is, reading and writing Large Objects in chunks

Storing the data as `bytea`

bytea (short for “byte array”) is the “new way” is storing binary data in PostgreSQL. It uses TOAST (The Oversized-Attribute Storage Technique, proudly called “the best thing since sliced bread” by the PostgreSQL community) to transparently store data out of line.

A bytea is stored directly in the database table and vanishes when you delete the table row. No special maintenance is necessary.

The main disadvantages of bytea are:

like all TOASTed data types, there is an absolute length limit of 1GB
when you read or write a bytea, all data have to be stored in memory (no streaming support)

Important TOAST considerations

If you choose bytea, you should be aware of how TOAST works:

for a new table row that would exceed 2000 bytes, variable length data types are compressed, if possible
if the compressed data would still exceed 2000 bytes, PostgreSQL splits variable length data types in chunks and stores them out of line in a special “TOAST table”

Now for already compressed data, the first step is unnecessary and even harmful. After compressing the data, PostgreSQL will realize that the compressed data have actually grown (because PostgreSQL uses a fast compression algorithm) and discard them. That is an unnecessary waste of CPU time.

Moreover, if you retrieve only of a substring of a TOASTed value, PostgreSQL still has to retrieve all chunks that are required to decompress the value.

Fortunately, PostgreSQL allows you to specify how TOAST should handle a column. The default EXTENDED storage type works as described above. If we choose EXTERNAL instead, values will be stored out of line, but not compressed. This saves CPU time. It also allows operations that need only a substring of the data to access only those chunks that contain the actual data.

So you should always change the storage type for compressed binary data to EXTERNAL. This also allows us to implement streaming, at least for read operations, using the substr function (see below).

The bytea table that I use in this benchmark is defined like

CREATE TABLE bins (
   id bigint PRIMARY KEY,
   data bytea NOT NULL
);

ALTER TABLE bins ALTER COLUMN data SET STORAGE EXTERNAL;

CREATE TABLE bins (

id bigint PRIMARY KEY,

data bytea NOT NULL

);

ALTER TABLE bins ALTER COLUMN data SET STORAGE EXTERNAL;

Benchmarking the different approaches

I chose to write my little benchmark in Java, which is frequently used for application code. I wrote an interface for the code that reads the binary data, so that it is easy to test the different implementations with the same code. This also makes it easier to compare the implementations:

import java.io.EOFException;
import java.io.IOException;
import java.sql.SQLException;

public interface LOBStreamer {
	public final static int CHUNK_SIZE = 1048576;
	public int getNextBytes(byte[] buf)
			throws EOFException, IOException, SQLException;
	public void close() throws IOException, SQLException;
}

import java.io.EOFException;

import java.io.IOException;

import java.sql.SQLException;

public interface LOBStreamer {

public final static int CHUNK_SIZE = 1048576;

public int getNextBytes(byte[] buf)

throws EOFException, IOException, SQLException;

public void close() throws IOException, SQLException;

}

CHUNK_SIZE is the unit in which the data will be read.

Code to read binary data from the file system

In the constructor, the database is queried to get the path of the file. That file is opened for reading; the chunks are read in getNextBytes.

import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;

public class FileStreamer implements LOBStreamer {
	private FileInputStream file;
	
	public FileStreamer(java.sql.Connection conn, long objectID)
			throws IOException, SQLException {
		PreparedStatement stmt = conn.prepareStatement(
				'SELECT path FROM lobs WHERE id = ?');
		stmt.setLong(1, objectID);
		ResultSet rs = stmt.executeQuery();
		rs.next();
		String path = rs.getString(1);

		this.file = new FileInputStream(new File(path));

		rs.close();
		stmt.close();
	}

	@Override
	public int getNextBytes(byte[] buf)
			throws EOFException, IOException {
		int result = file.read(buf);

		if (result == -1)
			throw new EOFException();

		return result;
	}

	@Override
	public void close() throws IOException {
		file.close();
	}
}

import java.io.IOException;

import java.sql.PreparedStatement;

import java.sql.ResultSet;

import java.sql.SQLException;

import java.io.EOFException;

import java.io.File;

import java.io.FileInputStream;

public class FileStreamer implements LOBStreamer {

private FileInputStream file;

public FileStreamer(java.sql.Connection conn, long objectID)

throws IOException, SQLException {

PreparedStatement stmt = conn.prepareStatement(

'SELECT path FROM lobs WHERE id = ?');

stmt.setLong(1, objectID);

ResultSet rs = stmt.executeQuery();

rs.next();

String path = rs.getString(1);

this.file = new FileInputStream(new File(path));

rs.close();

stmt.close();

}

@Override

public int getNextBytes(byte[] buf)

throws EOFException, IOException {

int result = file.read(buf);

if (result == -1)

throw new EOFException();

return result;

}

@Override

public void close() throws IOException {

file.close();

}

Code to read binary data from a Large Object

The Large Object is opened in the constructor. Note that all read operations must take place in the same database transaction that opened the large object.

Since Large Objects are not covered by the SQL or JDBC standard, we have to use the PostgreSQL-specific extensions of the JDBC driver. That makes the code not portable to other database systems.

However, since Large Objects specifically support streaming, the code is simpler than for the other options.

import java.io.EOFException;
import java.io.IOException;
import java.sql.Connection;
import java.sql.SQLException;

import org.postgresql.PGConnection;
import org.postgresql.largeobject.LargeObject;
import org.postgresql.largeobject.LargeObjectManager;

public class LargeObjectStreamer implements LOBStreamer {
	private LargeObject lob;

	public LargeObjectStreamer(Connection conn, long objectID)
			throws SQLException {
		PGConnection pgconn = conn.unwrap(PGConnection.class);
		this.lob = pgconn.getLargeObjectAPI().open(
						objectID, LargeObjectManager.READ);
	}

	@Override
	public int getNextBytes(byte[] buf)
			throws EOFException, SQLException {
		int result = lob.read(buf, 0, buf.length);

		if (result == 0)
			throw new EOFException();

		return result;
	}

	@Override
	public void close() throws IOException, SQLException {
		lob.close();
	}
}

import java.io.EOFException;

import java.io.IOException;

import java.sql.Connection;

import java.sql.SQLException;

import org.postgresql.PGConnection;

import org.postgresql.largeobject.LargeObject;

import org.postgresql.largeobject.LargeObjectManager;

public class LargeObjectStreamer implements LOBStreamer {

private LargeObject lob;

public LargeObjectStreamer(Connection conn, long objectID)

throws SQLException {

PGConnection pgconn = conn.unwrap(PGConnection.class);

this.lob = pgconn.getLargeObjectAPI().open(

objectID, LargeObjectManager.READ);

}

@Override

public int getNextBytes(byte[] buf)

throws EOFException, SQLException {

int result = lob.read(buf, 0, buf.length);

if (result == 0)

throw new EOFException();

return result;

}

@Override

public void close() throws IOException, SQLException {

lob.close();

}

Code to read binary data from a `bytea`

The constructor retrieves the length of the value and prepares a statement that fetches chunks of the binary data.

Note that the code is more complicated that in the other examples, because I had to implement streaming myself.

With this approach, I don't need to read all chunks in a single transaction, but I do so to keep the examples as similar as possible.

import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ByteaStreamer implements LOBStreamer {
	private PreparedStatement stmt;
	private Connection conn;
	private int position = 1, size;

	public ByteaStreamer(Connection conn, long objectID)
			throws SQLException {
		PreparedStatement len_stmt = conn.prepareStatement(
				'SELECT length(data) FROM bins WHERE id = ?');
		len_stmt.setLong(1, objectID);
		ResultSet rs = len_stmt.executeQuery();

		if (!rs.next())
			throw new SQLException('no data found', 'P0002');

		size = rs.getInt(1);

		rs.close();
		len_stmt.close();

		this.conn = conn;
		this.stmt = conn.prepareStatement(
				'SELECT substr(data, ?, ?) FROM bins WHERE id = ?');
		this.stmt.setLong(3, objectID);
	}

	@Override
	public int getNextBytes(byte[] buf)
			throws EOFException, IOException, SQLException {
		int result = (position > size + 1 - buf.length) ?
						(size - position + 1) : buf.length;

		if (result == 0)
			throw new EOFException();

		this.stmt.setInt(1, position);
		this.stmt.setInt(2, result);

		ResultSet rs = this.stmt.executeQuery();

		rs.next();

		InputStream is = rs.getBinaryStream(1);
		is.read(buf);

		is.close();
		rs.close();

		position += result;

		return result;
	}

	@Override
	public void close() throws SQLException {
		this.stmt.close();
	}
}

import java.io.EOFException;

import java.io.IOException;

import java.io.InputStream;

import java.sql.Connection;

import java.sql.PreparedStatement;

import java.sql.ResultSet;

import java.sql.SQLException;

public class ByteaStreamer implements LOBStreamer {

private PreparedStatement stmt;

private Connection conn;

private int position = 1, size;

public ByteaStreamer(Connection conn, long objectID)

throws SQLException {

PreparedStatement len_stmt = conn.prepareStatement(

'SELECT length(data) FROM bins WHERE id = ?');

len_stmt.setLong(1, objectID);

ResultSet rs = len_stmt.executeQuery();

if (!rs.next())

throw new SQLException('no data found', 'P0002');

size = rs.getInt(1);

rs.close();

len_stmt.close();

this.conn = conn;

this.stmt = conn.prepareStatement(

'SELECT substr(data, ?, ?) FROM bins WHERE id = ?');

this.stmt.setLong(3, objectID);

}

@Override

public int getNextBytes(byte[] buf)

throws EOFException, IOException, SQLException {

int result = (position > size + 1 - buf.length) ?

(size - position + 1) : buf.length;

if (result == 0)

throw new EOFException();

this.stmt.setInt(1, position);

this.stmt.setInt(2, result);

ResultSet rs = this.stmt.executeQuery();

rs.next();

InputStream is = rs.getBinaryStream(1);

is.read(buf);

is.close();

rs.close();

position += result;

return result;

}

@Override

public void close() throws SQLException {

this.stmt.close();

}

Benchmark results for reading binary data

I performed the benchmark the code on a laptop with and Intel^® Core™ i7-8565U CPU and SSD storage. The PostgreSQL version used was 12.2. Data were cached in RAM, so the results don't reflect disk I/O overhead. The database connection used the loopback interface to reduce the network impact to a minimum.

This code was used to run the test:

Class.forName('org.postgresql.Driver');

Connection conn = DriverManager.getConnection(
		'jdbc:postgresql:test?user=laurenz&password=...');

// important for large objects
conn.setAutoCommit(false);

byte[] buf = new byte[LOBStreamer.CHUNK_SIZE];

long start = System.currentTimeMillis();

for (int i = 0; i < LOBStreamTester.ITERATIONS; ++i) {
	// set LOBStreamer implementation and object ID as appropriate
	LOBStreamer s = new LargeObjectStreamer(conn, 62409);

	try {
		while (true)
			s.getNextBytes(buf);
	} catch (EOFException e) {
		s.close();
	}

	conn.commit();
}

System.out.println(
		'Average duration: ' +
		(double)(System.currentTimeMillis() - start)
			/ LOBStreamTester.ITERATIONS);
		
conn.close();

Class.forName('org.postgresql.Driver');

Connection conn = DriverManager.getConnection(

'jdbc:postgresql:test?user=laurenz&password=...');

// important for large objects

conn.setAutoCommit(false);

byte[] buf = new byte[LOBStreamer.CHUNK_SIZE];

long start = System.currentTimeMillis();

for (int i = 0; i < LOBStreamTester.ITERATIONS; ++i) {

// set LOBStreamer implementation and object ID as appropriate

LOBStreamer s = new LargeObjectStreamer(conn, 62409);

try {

while (true)

s.getNextBytes(buf);

} catch (EOFException e) {

s.close();

}

conn.commit();

}

System.out.println(

'Average duration: ' +

(double)(System.currentTimeMillis() - start)

/ LOBStreamTester.ITERATIONS);

conn.close();

I ran each test multiple times in a tight loop, both with a large file (350 MB) and a small file (4.5 MB). The files were compressed binary data.

	350 MB data	4.5 MB data
file system	46 ms	1 ms
Large Object	950 ms	8 ms
`bytea`	590 ms	6 ms

Summary

In my benchmark, retrieving binary objects from the database is roughly ten times slower that reading them from files in a file system. Surprisingly, streaming from a bytea with EXTERNAL storage is measurably faster than streaming from a Large Object. Since Large Objects specifically support streaming, I would have expected the opposite.

To sum it up, here are the advantages and disadvantages of each method:

Binary data in the file system:

Advantages:

by far the fastest method
small database for ease of maintenance, e.g. backups

Disadvantages:

more difficult to guarantee consistency
more complicated architecture

Binary data as Large Objects:

Advantages:

constistence automatically guaranteed
API supports streaming

Disadvantages:

very bad performance
non-standard API
requires special maintenance like DELETE triggers in the database
database becomes large and unwieldy

Binary data as `bytea`:

Advantages:

consistence automatically guaranteed
works with standard SQL

Disadvantages:

bad performance
writes cannot be streamed (need lots of RAM)
database becomes large and unwieldy

Storing large binary data in the database is only a good idea on a small scale, when a simple architecture and ease of coding are the prime goals and high performance is not important. Then the increased database size also won't be a big problem.

Large Objects should only be used if the data exceed 1GB or streaming writes to the database is important.

If you need help with the performance and architecture of your PostgreSQL database, don't hesitate to ask us.

0 0 votes

Article Rating

15 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Damir Reic

1 year ago

Hi! Maybe a thing od two to mention about bytea types. I have a 10GB big database, a table 7.5GB big in it with bytea column. It takes 12+ hours to do backup and when I was troubleshooting slowness I found that pg_dump did 240GB of network traffic while backup speed was ~45Mbit on average. Furthur troubleshooting discovered that bytea type in that big table was the reason for backup to take so long. Unfortunately, I still didn't find a solution to fix but based on my research I should migrate data to hex field type. Any inputs on that?

laurenz

1 year ago

Reply to Damir Reic

You should definitely use bytea and store the data as they are.
I do not have a ready explanation for the slowness or the amount of traffic; this would need further investigation.
Perhaps pg_dumpbinary can improve the situation for you.

Shiyao Jin

2 years ago

Really good article!
One question: if I change the TOAST strategy from extended to external on existing column in existing table by using "ALTER TABLE bins ALTER COLUMN data SET STORAGE EXTERNAL", what will happen to the existing data in that column? The existing data will be decompressed and then stored out-of-line?

laurenz

2 years ago

Reply to Shiyao Jin

No, existing data won't be affected.
But if they are compressed binary data, PostgreSQL won't have compressed them anyway (it will have attempted compression and then realized that the data grew, as described in my article).

Shiyao Jin

2 years ago

Reply to laurenz

Thank you for your prompt response.
The tricky part in my case is that my data is binary data (from binary file) but not compressed. I'm not sure if PostgreSQL has compressed them (or all of them in different rows) because I guess there is the probability that PostgreSQL's compression algorithm might make binary data in some rows smaller but not for binary data in other rows. This might result in that in some rows binary data is compressed while in other rows binary data is not compressed. I'm concerned that if I change to "EXTERNAL", there will be some compatible issues, e.g. PostgreSQL cannot read the existing compressed data

laurenz

2 years ago

Reply to Shiyao Jin

I see.
You don't have to worry about this, because it is stored in the column header if the data are compressed or not. So regardless of the column setting, PostgreSQL will decompress data that were stored compressed.

Shiyao Jin

2 years ago

Reply to laurenz

Thank you for your answer. Really appreciate it

Pól Ua Laoínecháin

2 years ago

Hi,

You say above:

he performance of reading a file directly from the file system must be better. After all, the database is also stored on files, and there must be a certain overhead.

However, there is this from SQLite:

SQLite reads and writes small blobs (for example, thumbnail images) 35% faster¹ than the same blobs can be read from or written to individual files on disk using fread() or fwrite().

Now, I know that they are both different systems and that tipping points (if any) may be (very) different, but your statement may not hold globally?

Has this been benchmarked at all? Would be an interesting one?

Pól...

xedsdsss

3 years ago

dag, i luv this article on all things BLOB in PG

Pavel Stěhule

3 years ago

maybe LO is slow, because native protocol support is not used. Probably using lo_export should be much faster.

laurenz

3 years ago

Reply to Pavel Stěhule

I am not sure what you mean by the native protocol.
The JDBC driver uses the fast-path API to call the large object server functions.

Pavel Stěhule

3 years ago

Reply to laurenz

libpq allows to use lo_import function. https://www.postgresql.org/docs/9.0/lo-interfaces.html. But probably JDBC driver doesn't use libpq. Another question is impact of Java implementation. Can you check a performance of lo_import from psql? When I look on your examples, there is not too much reasons why LO is too much slower than byte.

There are another argument for LO. Import, export LO typically needs significantly less more RAM (client side, server side) than bytea. I don't know how are PHP limits today, but 15 years ago it was important factor

Jürgen Strobel

3 years ago

Streaming writes to bytea (from a client perspective) is theoretically
possible using UPDATE and concatenation. I guess with a lot of write
amplification though.

laurenz

3 years ago

Reply to Jürgen Strobel

From a client perspective it looks that way, but really it isn't. If you
UPDATE tab SET bincol = bincol || 'xdeadbeef';
the complete value will by read from and written to storage, and you may run out of memory on the server.

Pavel Luzanov

3 years ago

Thank you for an article. But one question remains.
What if we change storage strategy for pg_largeobject.data column to external?

Stay tuned with our

Binary data performance in PostgreSQL

Alternatives for storing binary data

Storing the data outside the database

Storing the data in Large Objects

Storing the data as `bytea`

Important TOAST considerations

Benchmarking the different approaches

Code to read binary data from the file system

Code to read binary data from a Large Object

Code to read binary data from a `bytea`

Benchmark results for reading binary data

Summary

Binary data in the file system:

Binary data as Large Objects:

Binary data as `bytea`:

Laurenz Albe

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

Binary data performance in PostgreSQL

Alternatives for storing binary data

Storing the data outside the database

Storing the data in Large Objects

Storing the data as bytea

Important TOAST considerations

Benchmarking the different approaches

Code to read binary data from the file system

Code to read binary data from a Large Object

Code to read binary data from a bytea

Benchmark results for reading binary data

Summary

Binary data in the file system:

Binary data as Large Objects:

Binary data as bytea:

Laurenz Albe

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

Storing the data as `bytea`

Code to read binary data from a `bytea`

Binary data as `bytea`: