Tutorial Instructions

CTEs and Window Functions

We're going to dig into SQL with the public dataset of US births from 2000-2014. But first we're going to work on creating the dataset in a more Postgres centric way with a new table and the date format:

CREATE TABLE births ( id serial, day date, births int ); INSERT INTO births (day, births) SELECT make_date(year, month, date_of_month), births FROM US_births_20002014_SSA;

Now, lets try out first query:

SELECT * FROM births LIMIT 5;

Now let’s start to roll up births by week.

SELECT date_trunc('week', day) week, sum(births) births FROM births GROUP BY 1 ORDER BY week DESC;

The output of your query should look something like this:

week                    | births
------------------------+--------
 2003-08-04 00:00:00+00 |  83402
 2002-03-04 00:00:00+00 |  77719
 2006-03-27 00:00:00+00 |  79467
 2011-09-05 00:00:00+00 |  78593
 2007-10-22 00:00:00+00 |  83645

CTEs and window functions

We propose a solution using both CTEs and window functions. A CTE is a common table expression that allows you to split a complex query into different named parts and reference those parts later in the query. The WITH statement is how you define CTEs, in our case weekly_births. A Window Function lets you divide your result set into frames (i.e. distinct groups) to and run functions to compare between frames.  In this case, we are using the lag(births, 1) OVER (ORDER BY WEEK DESC) prev_births as a window function to return the previous week’s births values. We can see this in action via this result set, which aggregates all births in a week, then collects the previous week as well:

WITH weekly_births AS ( SELECT date_trunc('week', day) week, sum(births) births FROM births GROUP BY 1 ) SELECT week, births, lag(births, 1) OVER ( ORDER BY week DESC ) prev_births FROM weekly_births;

We could add another level to the CTE to find the two weeks in this set with the greatest difference:

WITH weekly_births AS ( SELECT date_trunc('week', day) week, sum(births) births FROM births GROUP BY 1 ), weekly_with_prev AS ( SELECT week, births, lag(births, 1) OVER ( ORDER BY week DESC ) prev_births FROM weekly_births ) SELECT * FROM weekly_with_prev WHERE prev_births IS NOT NULL AND week >= '2000-01-01' ORDER BY abs(births - prev_births) DESC LIMIT 1;

Loading terminal...

Loading terminal...