CROSS JOINS - promiscuity extreme!
© Laurenz Albe 2021

 

For many people, “cross join” is something to be afraid of. They remember the time when they forgot the join condition and the DBA was angry, because the query hogged the CPU and filled the disk. However, there are valid use cases for cross joins which I want to explore in this article.

What is a cross join?

The term comes from relational algebra, which also calls the Cartesian product between two relations cross product and denotes it by A × B. This is the most basic kind of join: it combines every row of one table with every row of another table.

A simple example:

Table ATable B
namebirthdaystreetcity
John1989-08-24Roman RoadKexborough
Paul2001-10-03Earl StreetFitzwilliam
Claude1996-04-18

The cross product of the tables would be:

A × B
namebirthdaystreetcity
John1989-08-24Roman RoadKexborough
Paul2001-10-03Roman RoadKexborough
Claude1996-04-18Roman RoadKexborough
John1989-08-24Earl StreetFitzwilliam
Paul2001-10-03Earl StreetFitzwilliam
Claude1996-04-18Earl StreetFitzwilliam

Cross join in SQL

There are two ways to write the cross join of A and B in SQL.

A comma separated list in the FROM clause:

SELECT name, birthday, street, city
FROM a, b;

With the explicit CROSS JOIN syntax:

SELECT name, birthday, street, city
FROM a CROSS JOIN b;

What is problematic about cross joins?

Cross joins are the most basic joins, and you can think of an inner join as a cross join with an additional filter condition. Of course, PostgreSQL doesn’t calculate inner joins that way. It uses more efficient join strategies.

If you write your joins using a comma separated table list (like in the first example above), an inner join and a cross join look very similar. The only difference is a WHERE condition. Now it is a frequent mistake to forget a join condition when you develop an SQL query. The result of such an omission is that you get way more result rows than you reckoned with: a cross join between two tables with a million rows each would result in a trillion rows!

Such a query will run forever. If the query contains an ORDER BY clause, the database server has to cache the whole result set in order to sort it. Since this result set doesn’t fit into memory, PostgreSQL will start writing temporary files to hold the data. These temporary files can fill up the disk. As soon as the query runs out of disk space, PostgreSQL rolls it back and deletes the temporary files. However, if the timing is bad, even a short out-of-disk condition can cause the database server to crash.

How can I protect myself from unintended cross joins?

Never use the “comma separated list” syntax to write joins!

If you write your inner joins as a JOIN b, it is a syntax error to omit the join condition (ON or USING). A cross join is then explicitly written as CROSS JOIN and cannot happen by mistake.

Valid use cases for cross joins

The above sounds pretty discouraging, but there are situations when a cross join is just what you need. In the following, I present two typical cases:

Lateral cross join

In a lateral join, a join relation (an expression or subquery) can refer to earlier entries in the FROM clause. This is very often used in combination with table functions: if you want to join a row with all the table function results that belong to it, you use a lateral join. In that case,  LATERAL already implies that each row is only joined to the function results that belong to it, so there is no need for an extra join condition.

Here is an example (not recommended for your production database!):

CREATE TABLE mytab (
   id bigint PRIMARY KEY,
   data jsonb
);

INSERT INTO mytab VALUES
   (1, '{ "key": ["five", "six"] }'),
   (2, '{ "key": ["pick", "up", "sticks"] }');

Then the arrays could be unpacked with a lateral expression like this:

SELECT mytab.id, j.elem
FROM mytab
   CROSS JOIN LATERAL
      jsonb_array_elements_text(
         mytab.data -> 'key'
      ) AS j(elem);

 id |  elem  
----+--------
  1 | five
  1 | six
  2 | pick
  2 | up
  2 | sticks
(5 rows)

Joining with a “variable”

Sometimes you have a more complicated or expensive expression that you want to use in several places with one query. In that case, it can be a good idea to write a common table expression. You have to use that CTE in the FROM clause, typically with a cross join:

WITH config AS (
   SELECT language, preview_lines
   FROM config_data
   WHERE username = current_user
)
SELECT d.line
FROM description AS d
   CROSS JOIN config AS c
WHERE d.topic = 'SQL joins'
  AND d.lang = c.language
  AND d.line_nr <= c.preview_lines
ORDER BY d.line_nr;

Conclusion

You don’t need cross joins very often, but sometimes they do come in handy.

Avoid the “comma separated list” join syntax, so that you don’t get cross joins by mistake. Such a statement can run forever and use up your database machine’s resources.

 In case you need help to run your database in the most efficient way possible, CYBERTEC offers 24/7 support services to customers around the world.