Will Postgres use my index or not?

Denis Magda
4 min readApr 11, 2024

Since joining Postgres subreddit, I spend some time daily checking for interesting questions and conversations. One of the last questions I bumped into was about Postgres indexes and sounded as follows:

Hi folks, how does this scenario works under the hood. Imagine table Person along with Id(key), Name(indexed), DateOfBirth(not indexed).

Select Name, DateOfBirth
from Person
where Name=’Jon’ and DateOfBirth = ‘1.1.2000’

I got confused here, how indexed and non-indexed columns work together, does the db scan the actual table bcs of the non-indexed column or does it take it from the indexed one?

I had a lunch break and decided to provide a detailed answer. As a result, the answer was good enough for a post. So, let’s take a look at my findings.

Postgres will try its best to select the most efficient execution plan. It might or might not use the index. The decision is based on index selectivity, the number of records in the table, the columns that you want to retrieve, and many other factors.

Let’s do this simple experiment. First, I’m creating a sample table and populating it with 100 dummy records:

create table account(id int, name text, birthday date);

INSERT INTO account (id, name, birthday)
SELECT
id,
(ARRAY[
'John', 'Emma', 'Olivia', 'Ava', 'Isabella',
'Sophia', 'Charlotte', 'Mia', 'Amelia', 'Harper',
'Evelyn', 'Abigail', 'Emily', 'Elizabeth', 'Mila',
'Ella', 'Avery', 'Sofia', 'Camila', 'Aria',
'Scarlett', 'Victoria', 'Madison', 'Luna', 'Grace',
'Chloe', 'Penelope', 'Layla', 'Riley', 'Zoey'
])[floor(random() * 30 + 1)::int],
('1970-01-01'::date + (random() * (CURRENT_DATE - '1970-01-01'::date))::int) AS birthday
FROM generate_series(1, 100) AS id;

Next, let’s index the name column and force Postgres to update its statistics (which Postgres reiles on to select the most appropriate execution plans) using the vacuum analyze command:

create index on account(name);

vacuum analyze;

After, let’s see how many Johns we have:

select name, birthday from account where name = 'John';

name | birthday
John | 1972-05-14
John | 1989-02-04
John | 1970-02-08
(3 rows)

Has Postgres used the index to get this data? Let’s check using the explain analyze statement:

explain analyze select name, birthday from account where name = 'John';
QUERY PLAN
Seq Scan on account (cost=0.00..2.25 rows=5 width=10)(actual time=0.060..0.069 rows=5 loops=1)
Filter: (name = 'John'::text)
Rows Removed by Filter: 95
Planning Time: 0.339 ms
Execution Time: 0.146 ms
(5 rows)

Nope, Postgres decided to ignore the index and scanned the table from start to finish (see the seq scan access method). The database will continue doing the full table scan if you add the birthday = '1989-02-04' to the condition:

explain analyze select name, birthday from account where name = 'John' and birthday = '1989-02-04';

QUERY PLAN

Seq Scan on account (cost=0.00..2.50 rows=1 width=10) (actual time=0.059..0.060 rows=0 loops=1)
Filter: ((name = 'John'::text) AND (birthday = '1989-02-04'::date))
Rows Removed by Filter: 100
Planning Time: 0.492 ms
Execution Time: 0.110 ms
(5 rows)

There are 100 records in the table and Postgres doesn’t bother doing the index scan.

Next, let’s add 10000 records to the table and see how the execution plan changes.

INSERT INTO account (id, name, birthday)
SELECT
id,
(ARRAY[
'John', 'Emma', 'Olivia', 'Ava', 'Isabella',
'Sophia', 'Charlotte', 'Mia', 'Amelia', 'Harper',
'Evelyn', 'Abigail', 'Emily', 'Elizabeth', 'Mila',
'Ella', 'Avery', 'Sofia', 'Camila', 'Aria',
'Scarlett', 'Victoria', 'Madison', 'Luna', 'Grace',
'Chloe', 'Penelope', 'Layla', 'Riley', 'Zoey'
])[floor(random() * 30 + 1)::int],
('1970-01-01'::date + (random() * (CURRENT_DATE - '1970-01-01'::date))::int) AS birthday
FROM generate_series(101, 10_000) AS id;

vacuum analyze;

Now, when I search for Johns, Postgres prefers doing the index scan because I have more records, and the index has good selectivity:

explain analyze select name, birthday from account where name = 'John';

QUERY PLAN
Bitmap Heap Scan on account (cost=7.14..67.76 rows=369 width=10) (actual time=0.100..0.602 rows=369 loops=1)
Recheck Cond: (name = 'John'::text)
Heap Blocks: exact=56
Bitmap Index Scan on account_name_idx (cost=0.00..7.05 rows=369 width=0) (actual time=0.064..0.064 rows=369 loops=1)
Index Cond: (name = 'John'::text)
Planning Time: 0.509 ms
Execution Time: 0.672 ms
(7 rows)

Reading the plan from bottom to top you can see that Postgres uses the index to build a bitmap of pages that hold records with John (Bitmap Index Scan) and then uses the bitmap to access specific pages with data in the account table (Bitmap Heap Scan). This last phase is necessary because you want Postgres to return the birthday column which is not part of the index. But if you decide that you don't need the birthday in the output and you just want to get the namecolumn, then Postgres will perform the most efficient index-only scan:

explain analyze select name from account where name = 'John';

QUERY PLAN

Index Only Scan using account_name_idx on account (cost=0.29..10.74 rows=369 width=6) (actual time=0.051..0.085 rows=369 loops=1)
Index Cond: (name = 'John'::text)
Heap Fetches: 0
Planning Time: 0.179 ms
Execution Time: 0.136 ms
(5 rows)

Finally, if I execute the query from the original question, then the execution plan should be as follows (for my data set with 10000 records):

explain analyze select name, birthday from account where name = 'John' and birthday = '1989-02-04';

QUERY PLAN

Bitmap Heap Scan on account (cost=7.05..68.59 rows=1 width=10) (actual time=0.725..0.727 rows=0 loops=1)
Recheck Cond: (name = 'John'::text)
Filter: (birthday = '1989-02-04'::date)
Rows Removed by Filter: 369
Heap Blocks: exact=56
Bitmap Index Scan on account_name_idx (cost=0.00..7.05 rows=369 width=0) (actual time=0.117..0.117 rows=369 loops=1)
Index Cond: (name = 'John'::text)
Planning Time: 0.542 ms
Execution Time: 0.813 ms
(9 rows)

Postgres will use the index on the name column to build a bitmap of pages with John and then will perform the bitmap heap scan over the table filtering out records with Johns that don't satisfy the birthday condition.

Hope it makes sense. If you’d like to explore more:

--

--