The 150x pgvector speedup: a year-in-review

Tue, Apr 30, 2024 17-minute read

I wanted to write a “year-in-review” covering all the performance improvements pgvector has made over the past year (with significant credit to Andrew Kane), highlighting specific areas where pgvector has improved (including one 150x improvement!) and areas where we can continue to do better.

A few weeks ago, I started outlining this post and began my data collection. While I was working on this over a two-week period, no fewer than three competitive benchmarks against pgvector were published. To me, this is a testament to how well pgvector (and, by extension, PostgreSQL) handles vector workloads: people are using it as the baseline to compare their vector search systems against.

Some of these benchmarks did contain information that identified areas where we can continue to improve both PostgreSQL and pgvector, but I was generally disappointed in the methodology used to make these comparisons. Of course I’d like to see pgvector perform well in benchmarks, but it’s important to position technologies fairly and to be vocally self-critical about where your system can improve in order to build trust in what you’re building.

I have a separate blog post planned on how best to present benchmark studies between different systems for vector similarity search (it’s a topic I’m interested in). Today, though, I want to compare pgvector against itself, highlight areas where it’s improved over the past year, and look at where the project can continue to go and grow.

How I ran these tests

An important aspect of any benchmark is transparency. First, I’ll discuss the test methodology I used, describe the test environment setup (instances, storage, database configuration), and then discuss the results. If you’re not interested in this part, you can skip ahead to “The 150x pgvector speedup”, but this information can help you with your own testing!

First, what are we testing for? We’ll be looking at these specific attributes in these tests:

  • Recall: A measurement of the relevancy of our results - what percentage of the expected results are returned during a vector search? Arguably, this is the most important measurement - it doesn’t matter if you have the highest query throughput if your recall is poor. (There’s a short SQL sketch of measuring recall right after this list.)
  • Storage size: This could be related to storing your original vector/associated data, and any data you store in a vector index. Because PostgreSQL is a database, at a minimum you’ll have to store the vector in the table, and pay additional storage costs for building a vector index.
  • Load time / index build time: How long does it take to load your vector data into an existing index? If your data is preloaded, how long does it take to build an index? Spending more time building your index can help improve both recall and query performance, but this is often the most expensive part of a vector database and can impact overall system performance.
  • Latency (p99): Specifically, how long it takes to return a single result, but representing the 99th percentile (“very slow”) queries. This serves as an “upper bound” on latency times.
  • Single-connection Throughput / queries per second (QPS): How many queries can be executed each second? This impacts how much load you can put on a single system.

(More on the “single-connection” distinction in a future blog post).
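To make the recall measurement concrete, here’s a minimal sketch of how you can compute recall@10 for a single query in SQL. It assumes a hypothetical items table (the names, dimensionality, and query vector are just for illustration); the exact neighbors come from a sequential scan with index scans disabled, and the approximate neighbors come from the HNSW index:

```sql
-- Minimal sketch: recall@10 against a hypothetical "items" table
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding)
    SELECT ARRAY[random(), random(), random()]::vector
    FROM generate_series(1, 10000);
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

-- Exact top-10 neighbors ("ground truth"), forcing a sequential scan
BEGIN;
SET LOCAL enable_indexscan = off;
CREATE TEMP TABLE exact_results AS
    SELECT id FROM items ORDER BY embedding <-> '[0.5, 0.5, 0.5]' LIMIT 10;
COMMIT;

-- Approximate top-10 neighbors from the HNSW index, compared to the exact set
WITH approx AS (
    SELECT id FROM items ORDER BY embedding <-> '[0.5, 0.5, 0.5]' LIMIT 10
)
SELECT count(*)::float / 10 AS recall_at_10
FROM approx
JOIN exact_results USING (id);
```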

This is a “year-in-review” post, so I ran tests against the following releases and configurations of pgvector. I’m including the shorthand that I’ll use in the test results.

| pgvector version | Index type | Test name (r7gd) | Test name (r7i) | Notes |
|---|---|---|---|---|
| 0.4.1 | IVFFlat | r7gd.041 | r7i.041 | |
| 0.4.4 | IVFFlat | r7gd.044 | r7i.044 | |
| 0.5.0 | HNSW | r7gd.050 | r7i.050 | |
| 0.5.1 | HNSW | r7gd.051 | r7i.051 | |
| 0.6.0 | HNSW | r7gd.060 | r7i.060 | |
| 0.6.2 | HNSW | r7gd.062 | r7i.062 | |
| 0.7.0 | HNSW | r7gd.070 | r7i.070 | |
| 0.7.0 | HNSW - SQ16 | r7gd.070.fp16 | r7i.070.fp16 | Stores 2-byte float representation of vectors in the index |
| 0.7.0 | HNSW - BQ + Jaccard rerank | r7gd.070.bq-jaccard-rerank | r7i.070.bq-jaccard-rerank | Stores binary representation of vectors in the index using Jaccard distance; results are re-ranked using the original vectors after the index search |
| 0.7.0 | HNSW - BQ + Hamming rerank | r7gd.070.bq-hamming-rerank | r7i.070.bq-hamming-rerank | Stores binary representation of vectors in the index using Hamming distance; results are re-ranked using the original vectors after the index search |

Test setup

To simplify the comparison, I kept the index build parameters the same for all of the tests. Adjusting build parameters can impact all five of the key metrics (please see previous posts and talks), but the purpose of this blog post is to show how pgvector has evolved over the past year and choosing a fixed set of parameters does serve to show how it’s improved and where it can grow. Below are the build parameters used for each index type:

| Index type | Build parameters |
|---|---|
| IVFFlat | lists: 1000 |
| HNSW | m: 16; ef_construction: 256 |
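
For reference, here’s roughly how those build parameters translate into index definitions. This is a sketch against a hypothetical items table with a vector embedding column, using cosine distance; swap in the operator class that matches your distance function:

```sql
-- IVFFlat with lists = 1000
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 1000);

-- HNSW with m = 16 and ef_construction = 256
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 256);
```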

For the testing, I used an r7gd.16xlarge and an r7i.16xlarge, both of which have 64 vCPU and 512GiB of RAM. I stored the data on the local NVMe on the r7gd, and on gp3 storage for the r7i. If this test were looking at behaviors around storage, that fact would matter heavily, but these tests focused specifically on CPU and memory characteristics.

For these tests, I used PostgreSQL 16.2 (aside: the upcoming PostgreSQL 17 release is expected to be able to use AVX-512 SIMD instructions for the pg_popcount function, which is used by the Jaccard distance; these tests don’t account for those optimizations) with the following configuration, using parallelism where available:

| Parameter | Value |
|---|---|
| checkpoint_timeout | 2h |
| effective_cache_size | 256GB |
| jit | off |
| maintenance_work_mem | 64GB |
| max_parallel_maintenance_workers | 63 |
| max_parallel_workers | 64 |
| max_parallel_workers_per_gather | 64 |
| max_wal_size | 20GB |
| max_worker_processes | 128 |
| shared_buffers | 128GB |
| wal_compression | zstd |
| work_mem | 64MB |
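
If you want to apply a similar configuration yourself, here’s a sketch of doing so with ALTER SYSTEM (several of these settings, such as shared_buffers, require a server restart to take effect):

```sql
ALTER SYSTEM SET shared_buffers = '128GB';
ALTER SYSTEM SET effective_cache_size = '256GB';
ALTER SYSTEM SET maintenance_work_mem = '64GB';
ALTER SYSTEM SET max_parallel_maintenance_workers = 63;
ALTER SYSTEM SET max_parallel_workers = 64;
ALTER SYSTEM SET max_parallel_workers_per_gather = 64;
ALTER SYSTEM SET max_worker_processes = 128;
ALTER SYSTEM SET jit = off;
-- ... and so on for the remaining parameters; then restart PostgreSQL
```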

I used the ANN Benchmarks framework to run the tests, with some modifications to its pgvector module.

For each index type I used the following search parameters, which are the defaults for what’s in the pgvector module for ANN Benchmarks:

| Index type | Search parameters |
|---|---|
| IVFFlat | ivfflat.probes: [1, 2, 4, 10, 20, 40, 100] |
| HNSW | hnsw.ef_search: [10, 20, 40, 80, 120, 200, 400, 800] |
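
These are session-level settings at query time. As a sketch of how one data point in the sweep is gathered (reusing the tiny hypothetical items table from the recall example above):

```sql
-- HNSW: raise hnsw.ef_search (default 40) to trade query speed for recall
SET hnsw.ef_search = 200;
SELECT id FROM items ORDER BY embedding <-> '[0.5, 0.5, 0.5]' LIMIT 10;

-- IVFFlat: raise ivfflat.probes (default 1) to trade query speed for recall
SET ivfflat.probes = 40;
SELECT id FROM items ORDER BY embedding <-> '[0.5, 0.5, 0.5]' LIMIT 10;
```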

Finally, the test results below will show the recall target (e.g. 0.90 or 90%). The results are shown at the first search-parameter setting where each test passed that recall level (if it passed it at all). I could probably have fine-tuned this further to find the exact hnsw.ef_search value where the test crossed the threshold, which would give a more accurate representation of the performance characteristics at a recall target, but again, the main goal is to show the growth and growth areas of pgvector over the past year.

And now it’s time for…

The 150x pgvector speedup

For the first test, we’ll review the results from the dbpedia-openai-1000k-angular benchmark at 99% recall. The results are below:

dbpedia-openai-1000k-angular @ 99% recall on a r7gd.16xlarge

| Test | Recall | Single Connection Throughput (QPS) | QPS Speedup | p99 Latency (ms) | p99 Speedup | Index Build (s) | Index Build Speedup | Index Size (GiB) | Size Improvement |
|---|---|---|---|---|---|---|---|---|---|
| r7gd.041 | 0.994 | 8 | 1 | 150.16 | 1 | 474 | 16 | 7.56 | 1 |
| r7gd.044 | 0.994 | 8 | 1 | 155.25 | 1 | 476 | 16 | 7.56 | 1 |
| r7gd.050 | 0.993 | 243 | 30.4 | 5.74 | 27 | 7479 | 1 | 7.55 | 1 |
| r7gd.051 | 0.992 | 247 | 30.9 | 5.67 | 27.4 | 5088 | 2 | 7.55 | 1 |
| r7gd.060 | 0.992 | 252 | 31.5 | 5.52 | 28.1 | 253 | 30 | 7.55 | 1 |
| r7gd.062 | 0.992 | 253 | 31.6 | 5.54 | 28 | 252 | 30 | 7.55 | 1 |
| r7gd.070 | 0.992 | 253 | 31.6 | 5.51 | 28.2 | 250 | 30 | 7.55 | 1 |
| r7gd.070.fp16 | 0.993 | 263 | 32.9 | 5.3 | 29.3 | 146 | 51 | 3.78 | 2 |
| r7gd.070.bq-hamming-rerank | 0.99 | 236 | 29.5 | 5.4 | 28.8 | 49 | 153 | 0.46 | 16.4 |
| r7gd.070.bq-jaccard-rerank | 0.99 | 234 | 29.3 | 5.38 | 28.9 | 50 | 150 | 0.46 | 16.4 |

And there it is: between pgvector 0.5.0 (where HNSW was introduced) and pgvector 0.7.0, we see that we can get a 150x speedup in the index build time when we use the “binary quantization” methods. Note that we can’t always use binary quantization with our data, but we can see that scalar quantization to 2-byte floats shows over a 50x speedup from the initial HNSW implementation in pgvector 0.5.0. A lot of this speedup is attributed to the use of parallel workers (in this case, 64) during the index build process. For fun, here’s how this looks in a bar chart:
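
If you want to try these configurations yourself, here’s a rough sketch of what the .070.fp16 and .070.bq-hamming-rerank setups look like in SQL. It reuses the tiny 3-dimensional items table from the recall sketch above purely for illustration (use your real dimensionality and distance function); expression indexes over halfvec and binary_quantize are pgvector 0.7.0 features:

```sql
-- SQ16 / halfvec: store 2-byte float representations of the vectors in the index
CREATE INDEX ON items USING hnsw ((embedding::halfvec(3)) halfvec_l2_ops)
    WITH (m = 16, ef_construction = 256);

-- BQ: store binary-quantized vectors in the index, using Hamming distance
CREATE INDEX ON items USING hnsw ((binary_quantize(embedding)::bit(3)) bit_hamming_ops)
    WITH (m = 16, ef_construction = 256);

-- BQ query with re-ranking: pull a generous candidate set from the binary index,
-- then re-rank the candidates by exact distance against the original vectors
SELECT id
FROM (
    SELECT id, embedding
    FROM items
    ORDER BY binary_quantize(embedding)::bit(3) <~> binary_quantize('[0.5, 0.5, 0.5]')
    LIMIT 100
) candidates
ORDER BY embedding <-> '[0.5, 0.5, 0.5]'
LIMIT 10;
```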

[Chart: pgvector-150x-r7gd-dbpedia.png]

(Note: I do chuckle a bit, as it reminds me of a time I fixed a query I wrote to get a 100x speedup. It was a recursive query, but I had used UNION ALL when I actually wanted UNION. Unlike my goofy mistake, I do take this work in pgvector to be a bona fide speedup due to all of the improvements in the pgvector implementation).

Additionally, we see that the addition of HNSW allows us to get a 30x QPS boost and an almost 30x p99 latency improvement over IVFFlat at 99% recall. Queries were executed serially; we’d need to run additional tests to see how pgvector scales with clients concurrently querying the data.

dbpedia-openai-1000k-angular @ 99% recall on a r7i.16xlarge

Different CPU families can impact the results of a test based upon the availability of acceleration instructions (e.g. SIMD). pgvector 0.7.0 added support for SIMD dispatching functions on the x86-64 architecture, so it’s important to test what impact this has on our test runs. For these tests, I used Ubuntu 22.04, with the pgvector code compiled with gcc 12.3 and clang-15. Below are the results from the dbpedia-openai-1000k-angular benchmark at 99% recall:

| Test | Recall | Single Connection Throughput (QPS) | QPS Speedup | p99 Latency (ms) | p99 Speedup | Index Build (s) | Index Build Speedup | Index Size (GiB) | Size Improvement |
|---|---|---|---|---|---|---|---|---|---|
| r7i.041 | 0.994 | 8 | 1 | 153.01 | 1 | 496 | 15.0 | 7.56 | 1 |
| r7i.044 | 0.994 | 8 | 1 | 156.58 | 1 | 494 | 15.1 | 7.56 | 1 |
| r7i.050 | 0.992 | 255 | 31.9 | 5.42 | 28.9 | 7443 | 1.0 | 7.55 | 1 |
| r7i.051 | 0.992 | 245 | 30.6 | 5.66 | 27.7 | 5201 | 1.4 | 7.55 | 1 |
| r7i.060 | 0.992 | 261 | 32.6 | 5.28 | 29.7 | 773 | 9.6 | 7.55 | 1 |
| r7i.062 | 0.992 | 265 | 33.1 | 5.22 | 30.0 | 382 | 19.5 | 7.55 | 1 |
| r7i.070 | 0.993 | 255 | 31.9 | 5.40 | 29.0 | 388 | 19.2 | 7.55 | 1 |
| r7i.070.fp16 | 0.993 | 282 | 35.3 | 4.87 | 32.2 | 227 | 32.8 | 3.78 | 2 |
| r7i.070.bq-hamming-rerank | 0.99 | 269 | 33.6 | 4.78 | 32.8 | 64 | 116.3 | 0.46 | 16.4 |
| r7i.070.bq-jaccard-rerank | 0.99 | 267 | 33.4 | 4.77 | 32.8 | 66 | 112.8 | 0.46 | 16.4 |

Again, we see a 100x+ speedup in index build time when using the “binary quantization” methods, and overall performance comparable to what we saw with the r7gd family. We also see a more than 30x improvement in both throughput and latency. Here is a chart that shows how the index build times have decreased on the r7i:

[Chart: pgvector-150x-r7i-dbpedia.png]

(I’ll note here that I really need to level up my matplotlib skills; likely Excel too, as it was taking me a while to get the data charted there. Anyway, this is all the charting I’m doing in this blog post).

As explored in the previous blog post on scalar and binary quantization, we can’t always use binary quantization and achieve our recall target due to lack of bit diversity in the indexed vectors. We saw this with both the sift-128-euclidean and gist-960-euclidean datasets. However, both still have nice speedups over the course of the year.

Below are the results from the sift-128-euclidean benchmark @ 99% recall on both architectures:

sift-128-euclidean @ 99% recall on a r7gd.16xlarge

| Test | Recall | Single Connection Throughput (QPS) | QPS Speedup | p99 Latency (ms) | p99 Speedup | Index Build (s) | Index Build Speedup | Index Size (GiB) | Size Improvement |
|---|---|---|---|---|---|---|---|---|---|
| r7gd.041 | 0.999 | 33 | 1.0 | 44.05 | 1.0 | 58 | 41.6 | 0.51 | 1.5 |
| r7gd.044 | 0.999 | 33 | 1.0 | 42.39 | 1.0 | 59 | 40.9 | 0.51 | 1.5 |
| r7gd.050 | 0.994 | 432 | 13.1 | 2.98 | 14.8 | 2411 | 1.0 | 0.76 | 1.0 |
| r7gd.051 | 0.994 | 432 | 13.1 | 2.98 | 14.8 | 1933 | 1.2 | 0.76 | 1.0 |
| r7gd.060 | 0.994 | 453 | 13.7 | 2.84 | 15.5 | 67 | 36.0 | 0.76 | 1.0 |
| r7gd.062 | 0.994 | 458 | 13.9 | 2.81 | 15.7 | 57 | 42.3 | 0.76 | 1.0 |
| r7gd.070 | 0.994 | 487 | 14.8 | 2.65 | 16.6 | 56 | 43.1 | 0.76 | 1.0 |
| r7gd.070.fp16 | 0.994 | 482 | 14.6 | 2.68 | 16.4 | 48 | 50.2 | 0.52 | 1.5 |

sift-128-euclidean @ 99% recall on a r7i.16xlarge

| Test | Recall | Single Connection Throughput (QPS) | QPS Speedup | p99 Latency (ms) | p99 Speedup | Index Build (s) | Index Build Speedup | Index Size (GiB) | Size Improvement |
|---|---|---|---|---|---|---|---|---|---|
| r7i.041 | 0.999 | 31 | 1.1 | 48.57 | 1.1 | 43 | 51.3 | 0.51 | 1.5 |
| r7i.044 | 0.999 | 29 | 1.0 | 51.34 | 1.0 | 44 | 50.2 | 0.51 | 1.5 |
| r7i.050 | 0.994 | 436 | 15.0 | 2.96 | 17.3 | 2208 | 1.0 | 0.76 | 1.0 |
| r7i.051 | 0.994 | 426 | 14.7 | 3.06 | 16.8 | 1722 | 1.3 | 0.76 | 1.0 |
| r7i.060 | 0.994 | 503 | 17.3 | 2.57 | 20.0 | 581 | 3.8 | 0.76 | 1.0 |
| r7i.062 | 0.994 | 497 | 17.1 | 2.57 | 20.0 | 74 | 29.8 | 0.76 | 1.0 |
| r7i.070 | 0.994 | 492 | 17.0 | 2.60 | 19.7 | 74 | 29.8 | 0.76 | 1.0 |
| r7i.070.fp16 | 0.994 | 544 | 18.8 | 2.36 | 21.8 | 62 | 35.6 | 0.52 | 1.5 |

Across the board, there are some nice speedups, including the 50x index build time improvement for the quantized halfvec test (r7gd.070.fp16), similar to the dbpedia-openai-1000k-angular test.

Let’s take a quick look at the gist-960-euclidean data. With the previous tests, we looked at the results targeting 99% recall, as the QPS/p99 speedups were more pronounced with those. However, gist-960-euclidean tends to be particularly challenging to get good throughput/performance results at high recall (though with binary quantization, I can get over 6,000 QPS at 0% recall!), and interestingly I observed the best speedups at 90% recall.

gist-960-euclidean @ 90% recall on a r7gd.16xlarge

| Test | Recall | Single Connection Throughput (QPS) | QPS Speedup | p99 Latency (ms) | p99 Speedup | Index Build (s) | Index Build Speedup | Index Size (GiB) | Size Improvement |
|---|---|---|---|---|---|---|---|---|---|
| r7gd.041 | 0.965 | 13 | 1.0 | 128.91 | 1.0 | 300 | 22.6 | 3.82 | 2.0 |
| r7gd.044 | 0.968 | 14 | 1.1 | 123.66 | 1.0 | 297 | 22.9 | 3.82 | 2.0 |
| r7gd.050 | 0.923 | 215 | 16.5 | 5.53 | 23.3 | 6787 | 1.0 | 7.50 | 1.0 |
| r7gd.051 | 0.924 | 215 | 16.5 | 5.59 | 23.1 | 4687 | 1.4 | 7.50 | 1.0 |
| r7gd.060 | 0.924 | 229 | 17.6 | 5.16 | 25.0 | 204 | 33.3 | 7.50 | 1.0 |
| r7gd.062 | 0.923 | 224 | 17.2 | 5.31 | 24.3 | 198 | 34.3 | 7.50 | 1.0 |
| r7gd.070 | 0.922 | 229 | 17.6 | 5.18 | 24.9 | 197 | 34.5 | 7.50 | 1.0 |
| r7gd.070.fp16 | 0.921 | 248 | 19.1 | 4.83 | 26.7 | 137 | 49.5 | 2.50 | 3.0 |

gist-960-euclidean @ 90% recall on a r7i.16xlarge

| Test | Recall | Single Connection Throughput (QPS) | QPS Speedup | p99 Latency (ms) | p99 Speedup | Index Build (s) | Index Build Speedup | Index Size (GiB) | Size Improvement |
|---|---|---|---|---|---|---|---|---|---|
| r7i.041 | 0.966 | 16 | 1.1 | 111.47 | 1.1 | 282 | 22.2 | 3.82 | 2.0 |
| r7i.044 | 0.965 | 15 | 1.0 | 120.90 | 1.0 | 289 | 21.7 | 3.82 | 2.0 |
| r7i.050 | 0.923 | 226 | 15.1 | 5.20 | 23.3 | 6273 | 1.0 | 7.50 | 1.0 |
| r7i.051 | 0.925 | 228 | 15.2 | 5.26 | 23.0 | 4212 | 1.5 | 7.50 | 1.0 |
| r7i.060 | 0.924 | 246 | 16.4 | 4.84 | 25.0 | 1109 | 5.7 | 7.50 | 1.0 |
| r7i.062 | 0.923 | 245 | 16.3 | 4.88 | 24.8 | 301 | 20.8 | 7.50 | 1.0 |
| r7i.070 | 0.924 | 238 | 15.9 | 4.97 | 24.3 | 295 | 21.3 | 7.50 | 1.0 |
| r7i.070.fp16 | 0.921 | 271 | 18.1 | 4.33 | 27.9 | 180 | 34.9 | 2.50 | 3.0 |

Again, we can see the effects of parallelism on speeding up the HNSW builds, as well as the effect of using 2-byte floats on shrinking the index size. Also, similar to the sift-128-euclidean test, we’re unable to use binary quantization to achieve 90% recall.

For completeness, here are a few more sets of results. I chose the “recall” values to optimize for where I saw the biggest performance gains:

glove-25-angular @ 99% recall on a r7gd.16xlarge

| Test | Recall | Single Connection Throughput (QPS) | QPS Speedup | p99 Latency (ms) | p99 Speedup | Index Build (s) | Index Build Speedup | Index Size (GiB) | Size Improvement |
|---|---|---|---|---|---|---|---|---|---|
| r7gd.041 | 0.997 | 26 | 1.0 | 53.50 | 1.0 | 31 | 81.9 | 0.14 | 3.2 |
| r7gd.044 | 0.997 | 26 | 1.0 | 53.98 | 1.0 | 33 | 76.9 | 0.14 | 3.2 |
| r7gd.050 | 0.995 | 493 | 19.0 | 2.64 | 20.4 | 2538 | 1.0 | 0.45 | 1.0 |
| r7gd.051 | 0.995 | 495 | 19.0 | 2.64 | 20.4 | 1922 | 1.3 | 0.45 | 1.0 |
| r7gd.060 | 0.995 | 514 | 19.8 | 2.55 | 21.2 | 53 | 47.9 | 0.45 | 1.0 |
| r7gd.062 | 0.995 | 470 | 18.1 | 2.79 | 19.3 | 49 | 51.8 | 0.45 | 1.0 |
| r7gd.070 | 0.995 | 522 | 20.1 | 2.50 | 21.6 | 48 | 52.9 | 0.45 | 1.0 |
| r7gd.070.fp16 | 0.995 | 521 | 20.0 | 2.51 | 21.5 | 48 | 52.9 | 0.40 | 1.1 |

glove-25-angular @ 99% recall on a r7i.16xlarge

| Test | Recall | Single Connection Throughput (QPS) | QPS Speedup | p99 Latency (ms) | p99 Speedup | Index Build (s) | Index Build Speedup | Index Size (GiB) | Size Improvement |
|---|---|---|---|---|---|---|---|---|---|
| r7i.041 | 0.997 | 23 | 1.0 | 59.08 | 1.0 | 38 | 63.5 | 0.14 | 3.2 |
| r7i.044 | 0.997 | 24 | 1.0 | 59.59 | 1.0 | 30 | 80.5 | 0.14 | 3.2 |
| r7i.050 | 0.995 | 539 | 23.4 | 2.41 | 24.7 | 2414 | 1.0 | 0.45 | 1.0 |
| r7i.051 | 0.995 | 545 | 23.7 | 2.39 | 24.9 | 1827 | 1.3 | 0.45 | 1.0 |
| r7i.060 | 0.995 | 557 | 24.2 | 2.34 | 25.5 | 471 | 5.1 | 0.45 | 1.0 |
| r7i.062 | 0.995 | 574 | 25.0 | 2.27 | 26.3 | 64 | 37.7 | 0.45 | 1.0 |
| r7i.070 | 0.995 | 569 | 24.7 | 2.28 | 26.1 | 63 | 38.3 | 0.45 | 1.0 |
| r7i.070.fp16 | 0.995 | 569 | 24.7 | 2.28 | 26.1 | 60 | 40.2 | 0.40 | 1.1 |

The interesting thing about both of these tests is that the IVFFlat index builds are both faster and smaller than the HNSW index builds - and that is without using any parallelism during the IVFFlat build. However, the HNSW numbers show a significant boost in throughput and p99 latency.

Finally, here are the results from the glove-100-angular test. In my test, I wasn’t able to get much above 95% recall. I would likely need to increase the m build parameter to get towards 99% recall, but as mentioned earlier, the goal of this testing was primarily to see how pgvector has improved over the course of the year and not optimize parameters for a particular dataset:

glove-100-angular @ 95% recall on a r7gd.16xlarge

| Test | Recall | Single Connection Throughput (QPS) | QPS Speedup | p99 Latency (ms) | p99 Speedup | Index Build (s) | Index Build Speedup | Index Size (GiB) | Size Improvement |
|---|---|---|---|---|---|---|---|---|---|
| r7gd.041 | 0.963 | 29 | 1.0 | 47.10 | 1.0 | 68 | 58.0 | 0.48 | 1.7 |
| r7gd.044 | 0.963 | 29 | 1.0 | 46.20 | 1.0 | 69 | 57.1 | 0.48 | 1.7 |
| r7gd.050 | 0.965 | 65 | 2.2 | 21.05 | 2.2 | 3941 | 1.0 | 0.82 | 1.0 |
| r7gd.051 | 0.965 | 65 | 2.2 | 20.90 | 2.3 | 2965 | 1.3 | 0.82 | 1.0 |
| r7gd.060 | 0.965 | 63 | 2.2 | 21.22 | 2.2 | 83 | 47.5 | 0.82 | 1.0 |
| r7gd.062 | 0.965 | 62 | 2.1 | 21.68 | 2.2 | 78 | 50.5 | 0.82 | 1.0 |
| r7gd.070 | 0.965 | 66 | 2.3 | 20.07 | 2.3 | 77 | 51.2 | 0.82 | 1.0 |
| r7gd.070.fp16 | 0.965 | 67 | 2.3 | 19.97 | 2.4 | 68 | 58.0 | 0.57 | 1.4 |

glove-100-angular @ 95% recall on a r7i.16xlarge

| Test | Recall | Single Connection Throughput (QPS) | QPS Speedup | p99 Latency (ms) | p99 Speedup | Index Build (s) | Index Build Speedup | Index Size (GiB) | Size Improvement |
|---|---|---|---|---|---|---|---|---|---|
| r7i.041 | 0.963 | 27 | 1.0 | 50.43 | 1.0 | 53 | 66.8 | 0.48 | 1.7 |
| r7i.044 | 0.962 | 26 | 1.0 | 52.13 | 1.0 | 56 | 63.3 | 0.48 | 1.7 |
| r7i.050 | 0.965 | 81 | 3.1 | 16.70 | 3.1 | 3543 | 1.0 | 0.82 | 1.0 |
| r7i.051 | 0.965 | 82 | 3.2 | 16.49 | 3.2 | 2517 | 1.4 | 0.82 | 1.0 |
| r7i.060 | 0.965 | 79 | 3.0 | 16.64 | 3.1 | 692 | 5.1 | 0.82 | 1.0 |
| r7i.062 | 0.965 | 83 | 3.2 | 15.90 | 3.3 | 98 | 36.2 | 0.82 | 1.0 |
| r7i.070 | 0.965 | 81 | 3.1 | 16.27 | 3.2 | 95 | 37.3 | 0.82 | 1.0 |
| r7i.070.fp16 | 0.965 | 86 | 3.3 | 15.27 | 3.4 | 84 | 42.2 | 0.57 | 1.4 |

Overall with glove-100-angular on the selected build parameters, there are definite speedups on build times for HNSW indexes, and we do see improvements in throughput/latency. For this specific dataset, I’d recommend rerunning it with different HNSW build parameters to see if we can improve query performance numbers at higher levels of recall, but that’s an experiment for another day.

Where do we go from here?

It’s been quite a year for pgvector on many fronts, not least the many people who are already building amazing apps with it today! A “billion-scale” vector storage problem is attainable with pgvector today, with much of that attributable to the work of the past year. And while I can’t say enough about the work Andrew Kane has done on pgvector, I do want to give mentions to Heikki Linnakangas, Nathan Bossart, Pavel Borisov, and Arda Aytekin, who all made contributions to improve pgvector performance (and apologies if I missed someone).

However, much like the almost 40-year-old database PostgreSQL, there are still ways pgvector can continue to grow. I’m going to talk more in depth about some longer term goals to better support vector workloads with pgvector and PostgreSQL at PGConf.dev 2024, but I’ll give a brief preview here.

Over the past year, pgvector has made significant gains across the board in index build times, index sizes, throughput, and latency, particularly on vector queries over an entire vector data set. One area to keep working on is filtering (aka the WHERE clause): pgvector and PostgreSQL already support this, but there are some areas where we can make it easier and more efficient. Additionally, there are other search patterns gaining popularity, such as “hybrid search”, for example using vector similarity search and full-text search simultaneously to return results. Again, this is something PostgreSQL already supports natively, but there are areas where we can simplify the process with pgvector. We’re also seeing more work in pgvector to support hardware acceleration, along with further optimizations. And finally, there are some areas of PostgreSQL we can improve to better support distributed pgvector workloads, but I’ll still emphasize that most workloads that involve PostgreSQL and pgvector will scale vertically (which means showing more concurrency testing!).
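
To make the “hybrid search” idea concrete, here’s a naive sketch of what PostgreSQL can already express today: a hypothetical documents table with both a tsvector column and an embedding, blended with arbitrary weights. Real implementations often use something like reciprocal rank fusion instead, and this particular ORDER BY won’t use the vector index, which is exactly the kind of thing that can be made easier and more efficient:

```sql
-- Hypothetical table with full-text and vector representations of each document
CREATE TABLE documents (
    id bigserial PRIMARY KEY,
    content text,
    content_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED,
    embedding vector(3)
);
CREATE INDEX ON documents USING gin (content_tsv);
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Naive hybrid search: filter by full-text match, then blend text rank and
-- vector similarity (the 0.5 weights are arbitrary) to order the results
SELECT id, content
FROM documents
WHERE content_tsv @@ websearch_to_tsquery('english', 'postgres vector search')
ORDER BY 0.5 * ts_rank(content_tsv, websearch_to_tsquery('english', 'postgres vector search'))
       + 0.5 * (1 - (embedding <=> '[0.5, 0.5, 0.5]')) DESC
LIMIT 10;
```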

We’ll also have to see how vector search workloads evolve, as that will also dictate what new features we’ll see in pgvector. Please keep giving feedback on what you’re building with pgvector and how your experience is - as that is how we can continue to make the project better!