If you’re thinking about using a key value store for your technology stack, you shouldn’t just consider all the new kids on the block. Postgres has a some very cool tricks up its sleeve. It has its own integrated key value data type. And today we’ll go through how to write your own aggregations for postgres hstore.
The use case
At adeven our technology stack is heavily based on postgres. Our latest product adjust.io tracks downloads, clicks, impressions and many more. To see the distribution of the parameters we track, we need to implement our own aggregations.
Let’s say you have a simple test table which stores creation time along with country:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
Count countries
If you want to count the number of times different countries are represented in the dataset, you can do it using the postgres hstore function.
1 2 3 4 5 6 7 8 |
|
You’re returned an hstore document with country as key and the count as value. Note that both key and value in hstores must be of type text.
This basically counts the occurrence of values in an array since array_agg(country)
returns an array of countries. And in order to not confuse your coworker, you can put that in a handy function:
1 2 3 4 5 6 7 8 9 10 11 |
|
So your query becomes simply:
1 2 3 4 5 |
|
Sum hstore
Let’s say you want to store pre-aggregated results in some other table.
1 2 3 4 5 6 7 8 9 |
|
To get the distribution of the whole table you now need to aggregate (sum) hstore documents. Luckily you can define your own aggregations in postgres; to do so, you need a function that knows how to add two hstores.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
|
With this you can tell postgres how to sum hstores.
1 2 3 4 5 6 |
|
Note that you can overload functions in postgres - so this sum doesn’t destroy the inbuilt postgres sum for numbers.
If you now query:
1 2 3 4 5 |
|
You get what you expect.
Conclusion
Before you think about adding a new fancy key value store with complicated map reduce queries to your toolchain try postgres hstore.