Commenting your Postgres database

Written by Craig Kerstiens
October 17, 2018

At Citus whether it's looking at our own data or helping a customer debug a query I end up writing a lot of SQL. When I do write SQL I do my best to make sure it's readable in case others need to come along and understand or modify, but admittedly I do have some bad habits from time to time such as using implicit joins. Regardless of my bad habits I still try to make my SQL and database as easy to understand for someone not already familiar with it. One of the biggest tools for that is comments.

Even early on in learning to program we take advantage of comments to explain and describe what our code is doing, even in times when it seems obvious. I see this less commonly in SQL and databases, which is a shame because data is just as valuable so making it easier to reason and work with seems logical. Postgres has a few great mechanisms you can start leveraging when it comes to commenting so you can better document things.

Inline commenting your queries

The place I most use comments is in larger queries I write. Yes, [common table expressions] can be an optimization fence, but they also allow you to create building blocks within your SQL making a query easier to understand. Though you shouldn't stop there. Within SQL you can have a line comment when you preface it with --. You can do this at the start of a line, or at the end with what follows becoming a comment. We can see this in action on our query example from our earlier CTE blog post:

-- Getting a list of all opportunities opened longer than 30 days ago, but earlier than 60 days ago
WITH opp_list AS (
  SELECT opportunities.id as opportunity_id,
         account_id,
         accounts.name as account_name,
         opportunities.amount as opportunity_amount,
         opportunities.created_at as opportunity_created
  FROM opportunities,
       accounts
  WHERE opportunities.created_at <= now() - '30 days'::interval
    AND opportunities.created_at >= now() - '60 days'::interval
    AND opportunities.account_id = accounts.id
),

-- Get a list of all contacts we've contacted in the last 30 days
recently_contacted AS (
  SELECT email,
         account_id,
         last_contacted
  FROM contacts
  WHERE contacts.last_contacted >= now() - '30 days'::interval
)

-- Find list of recent lists and each person who contacted them recently to show active work
SELECT account_name,
       opportunity_created,
       email,
       last_contacted
FROM opp_list
JOIN recently_contacted
       ON opp_list.account_id = recently_contacted.account_id

Our example above may seem straight-forward, but when you begin writing longer SQL and have a query that is 100+ lines anyone that comes to read your queries, including future you, and finds comments will greatly thank you.

Commenting your schema

Commenting 100+ line queries is great, but to be able to write a query against your database you need to be able to understand it. You could do this by reading through the application code to see where joins happen and the relationship between models, or perhaps you have great referential integrity in place. Another option is to document the schema directly within the database itself. You can do this by using the COMMENT command within Postgres.

Within Postgres you can COMMENT either a table or a column within a table:

COMMENT ON TABLE opportunities IS 'Sales qualified opportunities';
COMMENT ON COLUMN opportunities.deleted_at IS 'Logically but not physically deleted opportunities, should be often filtered in querying'

Now when you describe a table with \d+ you'll see your comments included within the output:


\d+ opportunities

                                Table "public.opportunities"
   Column   |            Type             | ... | Description
------------+-----------------------------+-...-+-----------------------------------------
 id         | integer                     | ... | 
 amount     | integer                     | ... | 
 account    | integer                     | ... | 
 created_at | timestamp without time zone | ... | 
 deleted_at | timestamp without time zone | ... | Logically but not physically deleted...
Indexes:
    "idx_user_created" btree (date_trunc('day'::text, created_at))
Has OIDs: no

Comments aren't just for code

Data is extremely important for powering your business and delivering insights. But your data has to be intrepreted and understood correctly for it to be valuable. Comments help us understand what our application code is doing, and database comments provide the equivalent understanding of how things are structured about our data. The next time you add a column to your database, or write a complex report considering using comments to leave behind some helping guidance for future you.

Craig Kerstiens

Written by Craig Kerstiens

Former Head of Cloud at Citus Data. Ran product at Heroku Postgres. Countless conference talks on Postgres & Citus. Loves bbq and football.