Bump Charts in Pure PostgreSQL
In a previous post I wrote about creating charts with Postgres, but let’s go beyond simple time series charts. Bump charts are 🔥 🔥 🔥 and here’s how to create them in pure PostgreSQL without any post-query transformations!
What is a bump chart, exactly?
A bump chart plots a datapoint’s rank among values instead of the value itself. For example, if you have three sets of values:
A: [123, 200, 90]
B: [100, 221, 30]
C: [12, 279, 1560]
A bump chart would plot them with these values instead.
A: [1, 3, 2]
B: [2, 2, 3]
C: [3, 1, 1]
Not every chart tool can plot in reverse order (ie. 1 at the top), and it’s easier to make 3 the highest value instead of 1 so we’ll do that here.
A: [3, 1, 2]
B: [2, 2, 1]
C: [1, 3, 3]
Bump charts are useful when you have a wide range of values, especially if one or two values dominate the entire chart.
To make them we need to define three temporary tables and use a few sub queries. In an earlier draft of this post I described each one, but it was impossible to follow. Instead, I’ll point out the information you need and then build it for you!
I assume you’re tracking an event and want to compare it across different groups.
EVENT_TABLE = The name of your event’s table
EVENT_DATE = The date column for when the event occurred
TIME_UNIT = The postgres date format for the time interval you’re using. For example, to use a daily time interval, use ‘YYYY-MM-DD’.
EVENT_GROUP = The column we’ll group by. This defines who or what we’re comparing in the bump chart (eg, user id, event type, etc,)
INNER_FILTERS = Any additional filters you want to tack on to the JOIN between the temporary date_cat table and the event table. This can help with performance by cutting down on the number of comparisons the DB needs to make.
OUTER_FILTERS = A set of WHERE clauses if you need to filter out other rows from this table. An example is j.type = ‘SQL Report Builder Usage’
SERIES_LIMIT = The limit on the number of series you want in your chart. If you want the top ten values, use 10.
ROW_LIMIT = The maximum rows you want this query to return.
Replace the bold text below with the names of tables, columns, etc. from your database.
with all_dates as
(select distinct to_char(EVENT_DATE, TIME_UNIT) as time_unit from EVENT_TABLE),
categories as
(select distinct EVENT_GROUP from EVENT_TABLE), date_cat as (select time_unit, EVENT_GROUP from all_dates, categories) select "count", EVENT_GROUP, time_unit, rank() over (partition by time_unit order by "count" asc) as "rank"
from (
select sum("count") over (
partition by EVENT_GROUP
order by time_unit rows unbounded preceding) as "count", EVENT_GROUP, time_unit from (select count(EVENT_TABLE.*) as "count", date_cat.EVENT_GROUP, time_unit
from date_cat
left join EVENT_TABLE on date_cat.time_unit=(to_char(EVENT_DATE, TIME_UNIT)) and date_cat.EVENT_GROUP = EVENT_TABLE.EVENT_GROUP INNER_FILTERS
where date_cat.EVENT_GROUP in
(select EVENT_GROUP from
(select count(*) as "count", j.EVENT_GROUP
from EVENT_TABLE j OUTER_FILTERS
GROUP BY j.EVENT_GROUP
ORDER BY "count" desc SERIES_LIMIT) as "cat")
group by date_cat.EVENT_GROUP, time_unit) as "action"
) as "catcount"
order by time_unit ROW_LIMIT;
And if you don’t want to do that yourself, I created a handy observablehq notebook to do this for you. Now get bumping!
Acknowledgements:
Huge thanks to my friend and colleague, Akash Agrawal of Nupanch for co-writing that SQL with me!