A preliminary exploration of dynamic hashing in PostgreSQL


In computing, a hash table (hash map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored. — from Wikipedia

The src/backend/utils/hash/dynahash.c file of the PG source code implements the functions of the dynamic hash table. It is used in many places in PG: it provides efficient hash table lookups, and when the table needs to grow, the expansion can be done at a low cost. So how is it implemented internally?

Creation of the dynamic hash table

First of all, when creating a dynamic hash table, we must estimate the number of elements (nelem) we want to store. To make things concrete, let's assume we expect to store 1000 elements. Other parameters let us customize familiar things such as the hash function, the hash key, and the element size; we don't need to worry about them here.
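
In the PG source, such a table is created with hash_create(). Below is a minimal sketch of what this typically looks like in backend code, assuming a simple entry type of our own (MyEntry, its fields, and the literal table name are purely illustrative); note that the key must be the first field of the entry.

#include "postgres.h"
#include "utils/hsearch.h"

/* Illustrative entry type: the hash key must be the first field. */
typedef struct MyEntry
{
    int32       key;            /* hash key */
    int32       value;          /* payload */
} MyEntry;

static HTAB *
create_my_table(void)
{
    HASHCTL     ctl;

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(int32);        /* size of the hash key */
    ctl.entrysize = sizeof(MyEntry);    /* size of a whole entry */

    /*
     * nelem = 1000 is our estimate of the number of elements; dynahash
     * rounds the bucket count up to the next power of 2 (1024 here).
     * HASH_BLOBS selects a built-in hash function for binary keys.
     */
    return hash_create("my table", 1000, &ctl, HASH_ELEM | HASH_BLOBS);
}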

Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash collisions where the hash function generates the same index for more than one key. Such collisions are typically accommodated in some way.

It is natural to think that, in the "ideal" situation where each bucket holds exactly one element, we need the number of buckets (nbuckets) to equal the number of elements (nelem). In fact, however, we use the next power of 2 greater than or equal to this number as the number of buckets [1] (why do we do this? We will come back to it later). By default, when nelem is 1000, nbuckets is 1024.
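
The rounding itself is simple; here is a standalone sketch of it (illustrative only, not the helper used in the PG source):

/* Round n up to the nearest power of 2 (illustrative; PG has its own helper). */
static unsigned int
next_pow2(unsigned int n)
{
    unsigned int result = 1;

    while (result < n)
        result <<= 1;
    return result;              /* next_pow2(1000) == 1024 */
}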

In a computer, many small memory allocations are less efficient than a few large ones. Therefore, we introduce a new term, the segment. A segment represents a collection of buckets (256 by default), so instead of allocating space for a large number of buckets one by one, we allocate a few segments. In our example, the number of segments equals nbuckets (1024) / 256 = 4.

Finally, we allocate space for the hash table, and the structure of the created hash table looks like this:

As shown in the figure, the set of segments is called the directory. Each segment contains 256 buckets, and a total of 1024 buckets are used to store data. It is worth noting that although the directory has 256 segment slots by default, only a few segments are actually allocated; the others will be used during capacity expansion. Also note that, as with bucket 769, when multiple elements land in the same bucket they are chained together in a linked list.
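
To make the figure concrete, here is a simplified sketch of the layout in C. The real definitions in dynahash.c carry more bookkeeping, and the type and field names here are partly illustrative, but the shape is the same: a directory of segments, each segment an array of bucket chain heads.

/* One stored element; elements in the same bucket form a singly linked list. */
typedef struct HashElement
{
    struct HashElement *link;       /* next element in the same bucket */
    unsigned int        hashvalue;  /* saved hash of the key */
    /* the key and the user data follow in memory */
} HashElement;

typedef HashElement *HashBucket;    /* a bucket is the head of a chain      */
typedef HashBucket  *HashSegment;   /* a segment is an array of 256 buckets */

typedef struct HashHeader
{
    HashSegment  *dir;              /* the directory: an array of segments          */
    unsigned int  max_bucket;       /* highest bucket number in use (1023)          */
    unsigned int  low_mask;         /* mask for bucket numbers <= max_bucket (1023) */
    unsigned int  high_mask;        /* mask covering the next doubling (2047)       */
} HashHeader;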

Search and insertion in the hash table

The idea of hashing is to distribute the entries (key/value pairs) across an array of buckets. Given a key, the algorithm computes an index that suggests where the entry can be found:

index = f(key, array_size)

Often this is done in two steps:

hash = hashfunc(key)
index = hash % array_size

In this method, the hash is independent of the array size, and it is then reduced to an index (a number between 0 and array_size − 1) using the modulo operator (%).

Accordingly, in the PG hash table implementation, array_size is nbuckets, and it likewise provides a corresponding hashfunc to convert the key into hash_value. What may be puzzling is that it does not simply compute hash_value % nbuckets next; the pseudo-code looks like this:

bucket = hash_value & high_mask;
if (bucket > max_bucket)
    bucket = bucket & low_mask;

Let's set aside the meaning of high_mask and low_mask for a moment and look at what they do. max_bucket and low_mask are set to nbuckets - 1 when the hash table is created, which is 1023 in our example, and high_mask is set to nbuckets * 2 - 1, which is 2047. Now look at the algorithm: hash_value is ANDed with high_mask, and 2047 in binary is 0111 1111 1111. In other words, we use the lower 11 bits of hash_value as the bucket number. If this value exceeds the maximum bucket number, it is ANDed with low_mask (binary 0011 1111 1111), i.e. the lower 10 bits are used as the bucket number, so the result is guaranteed to be within range. We can also find the explanation in the wiki:

In the case that the array size is a power of two, the remainder operation is reduced to masking, which improves speed.
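
Putting the masking rule into a small, self-contained function makes the example easy to check. The constants below match the 1024-bucket example; the function is only a sketch modeled on the pseudo-code above.

#include <stdio.h>

#define MAX_BUCKET 1023u            /* nbuckets - 1     */
#define LOW_MASK   1023u            /* nbuckets - 1     */
#define HIGH_MASK  2047u            /* nbuckets * 2 - 1 */

/* Map a hash value to a bucket number without using the modulo operator. */
static unsigned int
calc_bucket(unsigned int hash_value)
{
    unsigned int bucket = hash_value & HIGH_MASK;   /* keep the low 11 bits */

    if (bucket > MAX_BUCKET)
        bucket = bucket & LOW_MASK;                 /* fall back to the low 10 bits */
    return bucket;
}

int
main(void)
{
    printf("%u\n", calc_bucket(952));   /* 952:  already within range            */
    printf("%u\n", calc_bucket(1500));  /* 1500 > 1023, so 1500 & 1023 gives 476 */
    return 0;
}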

Now we can come back to the doubtful point [1]. The reason we want the number of buckets to be a power of 2 is so that low_mask and high_mask can be used as masks to correctly compute a bucket's position. However, there is still a question: for the calculation above, using low_mask alone seems to be enough, so when does high_mask come into play? [2] We will answer this after studying the expansion of the hash table.

Expansion of the hash table

Now we come to the core of the dynamic hash table: how it achieves low-cost expansion.

Loosely speaking, when the number of inserted elements divided by the total number of buckets exceeds the fill factor, the hash table is expanded; the function that does this is expand_table. Continuing with the example above: at present the maximum bucket number is 1023, so the next new bucket is 1024. Each segment in this example stores 256 buckets, so the new bucket should be placed in the segment with number 4. So far we only have space for segments 0-3, so we allocate space for segment 4 and then do max_bucket++.
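
A rough, self-contained sketch of this segment bookkeeping follows. The real logic lives in expand_table() in dynahash.c; the Bucket typedef and the use of calloc here are simplifications for illustration.

#include <stdlib.h>

#define SEGMENT_SIZE 256            /* buckets per segment (the default) */

typedef void *Bucket;               /* chain head; details omitted in this sketch */

/*
 * Add one bucket to the table, allocating a new segment only when the
 * bucket falls into a segment that has no space yet.
 */
static void
add_one_bucket(unsigned int *max_bucket, Bucket **dir)
{
    unsigned int new_bucket = *max_bucket + 1;              /* 1023 + 1 = 1024 */
    unsigned int new_segnum = new_bucket / SEGMENT_SIZE;    /* 1024 / 256 = 4  */

    /* allocate the segment lazily, only when its first bucket is needed */
    if (dir[new_segnum] == NULL)
        dir[new_segnum] = calloc(SEGMENT_SIZE, sizeof(Bucket));

    (*max_bucket)++;                /* bucket 1024 is now part of the table */
}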

But this is not enough. Although we have allocated a new segment and added a new bucket, we still need to move some elements so that lookups using low_mask and high_mask can find them correctly. Via new_bucket & low_mask we can compute which old bucket currently holds the elements that should move to the new bucket. By traversing the elements of that old bucket and recomputing each element's bucket with the masks, we move the elements whose result is the new bucket over to it. At this point we can answer question [2]: if an element needs to be placed in the new bucket, then its hash value ANDed with high_mask equals 1024; since max_bucket has already been updated, this value no longer exceeds the range, so the element is placed in the new bucket correctly.
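
The relocation step for a single new bucket can be sketched as follows. This is not the PG source: the Element type is minimal, the masks are passed in explicitly, and the caller is assumed to have located old_chain via new_bucket & low_mask.

/* Minimal element type for this sketch: chained by 'link', with a saved hash. */
typedef struct Element
{
    struct Element *link;
    unsigned int    hashvalue;
} Element;

/*
 * Walk the old bucket's chain and move every element whose recomputed
 * bucket number equals new_bucket onto the new bucket's chain.
 */
static void
split_bucket(Element **old_chain, Element **new_chain, unsigned int new_bucket,
             unsigned int max_bucket, unsigned int low_mask, unsigned int high_mask)
{
    Element *elem = *old_chain;

    *old_chain = NULL;
    *new_chain = NULL;

    while (elem != NULL)
    {
        Element     *next = elem->link;
        unsigned int bucket = elem->hashvalue & high_mask;  /* same rule as lookup */

        if (bucket > max_bucket)
            bucket = bucket & low_mask;

        if (bucket == new_bucket)
        {
            elem->link = *new_chain;    /* belongs in the new bucket */
            *new_chain = elem;
        }
        else
        {
            elem->link = *old_chain;    /* stays in the old bucket */
            *old_chain = elem;
        }
        elem = next;
    }
}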

It is easy to see that when we go on to add the new bucket with number 1025, since the segment with number 4 has already been allocated, we only need max_bucket++ and then move some elements from the corresponding old bucket to the new one. Next, consider what happens when the number of the new bucket reaches 2048. The new bucket number 2048 corresponds to the binary value 1000 0000 0000, while high_mask (2047) corresponds to 0111 1111 1111. Obviously, we need to enlarge low_mask and high_mask to accommodate the new bucket calculation.

First of all, we set low_mask = high_mask, because by now all the new buckets in the range from the old low_mask up to the old high_mask have been added and their elements have been reallocated, so the old low_mask is no longer needed. Then high_mask becomes the bitwise OR of new_bucket and the new low_mask. From the example we can see that this operation adds one more bit to high_mask: the new high_mask value is 4095, whose binary value is 1111 1111 1111. After this, the logic of hash table expansion and lookup works exactly as before.
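
The mask adjustment itself is tiny; a sketch of it, with the values from the example in the comments:

/* Widen the masks when the new bucket number no longer fits under high_mask. */
static void
update_masks(unsigned int new_bucket,
             unsigned int *low_mask, unsigned int *high_mask)
{
    if (new_bucket > *high_mask)                    /* e.g. 2048 > 2047 */
    {
        *low_mask = *high_mask;                     /* low_mask:  1023 -> 2047 */
        *high_mask = new_bucket | *low_mask;        /* high_mask: 2047 -> 4095 */
    }
}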

Summary

From the above introduction, we learned the following facts about the dynamic hash table in PG:

  1. When the hash table needs to be expanded, buckets are added one at a time, but memory is allocated a whole segment (many buckets) at a time. This avoids the performance degradation caused by frequent small allocations.
  2. After adding a bucket, we only need to traverse the elements of the single corresponding old bucket and move the elements that belong in the new bucket over to it. This means the cost of adjusting elements after each expansion is also kept under control.

Through these ingenious designs, PG implements a high-performance dynamic hash table. For the sake of brevity, most of the details have been omitted in this blog. I hope it serves as a starting point that sparks your interest in exploring the powerful and interesting features of PostgreSQL.