We all know and value SQL functions as a handy shortcut. PostgreSQL v14 has introduced a new, better way to write SQL functions. This article will show the advantages of the new syntax.

An example of an SQL function

Let’s create a simple example of an SQL function with the “classical” syntax so that we have some material for demonstrations:

CREATE EXTENSION unaccent;

CREATE FUNCTION mangle(t text) RETURNS text
   LANGUAGE sql
   AS 'SELECT lower(unaccent(t))';

You can use the new function like other database functions:

SELECT mangle('Schön dumm');

   mangle   
════════════
 schon dumm
(1 row)

Why SQL functions?

You may ask what good an SQL function is. After all, the main purpose of a database function is to be able to run procedural code inside the database, something you cannot do with SQL. But SQL functions have their use:

  • code reuse for expressions frequently used in different SQL statements
  • to make SQL statements more readable by factoring out part of the code into a function with a meaningful name
  • whenever you need a function for syntactical reasons, like in CREATE AGGREGATE or CREATE OPERATOR

Moreover, simple SQL functions can be inlined, that is, the optimizer can replace the function call with the function definition at query planning time. This can make SQL functions singularly efficient:

  • it removes the overhead of an actual function call
  • since functions are (mostly) black boxes to the optimizer, replacing the function with its definition usually gives you better estimates

We can see function inlining if we use EXPLAIN (VERBOSE) on our example function:

EXPLAIN (VERBOSE, COSTS OFF) SELECT mangle('Schön dumm');

                  QUERY PLAN                   
═══════════════════════════════════════════════
 Result
   Output: lower(unaccent('Schön dumm'::text))
(2 rows)

Shortcomings of PostgreSQL functions

PostgreSQL functions are great. One of the nice aspects is that you are not restricted to a single programming language. Out of the box, PostgreSQL supports functions written in SQL, C, PL/pgSQL (a clone of Oracle’s PL/SQL), Perl, Python and Tcl. But that is not all: in PostgreSQL, you can write a plugin that allows you to use any language of your choice inside the database. To allow that flexibility, the function body of a PostgreSQL function is simply a string constant that the call handler of the procedural language interprets when PostgreSQL executes the function. This has some undesirable side effects:

Lack of dependency tracking

Usually, PostgreSQL tracks dependencies between database objects in the pg_depend and pg_shdepend catalog tables. That way, the database knows the relationships between objects: it will either prevent you from dropping objects on which other objects depend (like a table with a foreign key reference) or drop dependent objects automatically (like dropping a table drops all indexes on the table).

Since the body of a function is just a string constant that PostgreSQL cannot interpret, it won’t track dependencies between a function and objects used in the function. A procedural language can provide a validator that checks the function body for syntactic correctness (if check_function_bodies = on). The validator can also test if the objects referenced in the function exist, but it cannot keep you from later dropping an object used by the function.

Let’s demonstrate that with our example:

DROP EXTENSION unaccent;

SELECT mangle('boom');
ERROR:  function unaccent(text) does not exist
LINE 1: SELECT lower(unaccent(t))
                     ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
QUERY:  SELECT lower(unaccent(t))
CONTEXT:  SQL function "mangle" during inlining

We will fix the problem by creating the extension again. However, it would be better to get an error message when we run DROP EXTENSION without using the CASCADE option.

search_path as a security problem

Since PostgreSQL parses the function body at query execution time, it uses the current setting of search_path to resolve all references to database objects that are not qualified with the schema name. That is not limited to tables and views, but also extends to functions and operators. We can use our example function to demonstrate the problem:

SET search_path = pg_catalog;

SELECT public.mangle('boom');
ERROR:  function unaccent(text) does not exist
LINE 1: SELECT lower(unaccent(t))
                     ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
QUERY:  SELECT lower(unaccent(t))
CONTEXT:  SQL function "mangle" during inlining

In our example, it is a mere annoyance that we can avoid by using public.unaccent() in the function call. But it can be worse than that, particularly with SECURITY DEFINER functions. Since it is cumbersome to schema-qualify each function and operator, the recommended solution is to force a search_path on the function:

ALTER FUNCTION mangle(text) SET search_path = public;

Note that the schemas on the search_path should allow CREATE only to privileged users, so the above is not a good idea on versions older than v15!

An unpleasant downside of setting a search_path is that it prevents the inlining of the SQL function.

The new SQL function syntax in PostgreSQL v14

From PostgreSQL v14 on, the body of SQL functions and procedures need no longer be a string constant. You can now use one of the following forms for the function body:

CREATE FUNCTION function_name(...) RETURNS ...
RETURN expression;

CREATE FUNCTION function_name(...) RETURNS ...
BEGIN ATOMIC
   statement;
   ...
END;

The first form requires the function body to be an expression. So if you want to perform a query, you have to wrap it in parentheses (turning it into a subquery, which is a valid expression). For example:

CREATE FUNCTION get_data(v_id bigint) RETURNS text
RETURN (SELECT value FROM data WHERE is = v_id);

The second form allows you to write a function with more than one SQL statement. As it used to be with multi-statement SQL functions, the result of the function will be the result of the final SQL statement. You can also use the second form of the new syntax to create SQL procedures. The first form is obviously not suitable for a procedure, since procedures don’t have a return value.

We can easily rewrite our example function to use the new syntax:

CREATE OR REPLACE FUNCTION mangle(t text) RETURNS text
RETURN lower(unaccent(t));

Note that these new SQL functions can be inlined into SQL statements just like the old ones!

Advantages of the new SQL function syntax

The main difference is that the new-style SQL functions and procedures are parsed at function definition time and stored in parsed form in the prosqlbody column of the pg_proc system catalog. As a consequence, the two shortcomings noted above are gone:

Dependency tracking with new-style SQL functions

Because the function body is available in parsed form, PostgreSQL can track dependencies. Let’s try that with our redefined example function:

DROP EXTENSION unaccent;
ERROR:  cannot drop extension unaccent because other objects depend on it
DETAIL:  function mangle(text) depends on function unaccent(text)
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

Fixed search_path with new-style SQL functions

search_path is only relevant when SQL is parsed. Since this now happens when CREATE FUNCTION runs, we don’t have to worry about the current setting of that parameter at function execution time:

SET search_path = pg_catalog;

SELECT public.mangle('Schön besser');

    mangle    
══════════════
 schon besser
(1 row)

Problems with interactive clients

You may notice that the multi-statement form for defining SQL functions contains semicolons to terminate the SQL statements. That will not only confuse the usual suspects like HeidiSQL (which never learned dollar quoting), but it will be a problem for any client that recognizes semicolons as separator between SQL statements. Even older versions of psql have a problem with that syntax:

psql (13.7, server 15beta2)
WARNING: psql major version 13, server major version 15.
         Some psql features might not work.
Type "help" for help.

test=> CREATE FUNCTION tryme() RETURNS integer
BEGIN ATOMIC
   SELECT 42;
END;
ERROR:  syntax error at end of input
LINE 3:    SELECT 42;
                     ^
WARNING:  there is no transaction in progress
COMMIT

psql thinks that the semicolon after “SELECT 42” terminates the CREATE FUNCTION statement. The truncated statement causes an error. The final END is treated as its own statement, which is a synonym for COMMIT and causes a warning.

In v14 and above, psql handles such statements correctly. pgAdmin 4 has learned the new syntax with version 6.3. But I am sure that there are many clients out there that have not got the message yet.

Conclusion

The new syntax for SQL function introduced by PostgreSQL v14 has great advantages for usability and security. Get a client that supports the new syntax and start using it for your SQL functions. You should consider rewriting your existing functions to make use of these benefits.

Read another great post to increase your PostgreSQL syntax savvy: my post on Cross Join in PostgreSQL.

Find out more about how to get the most performance out of your PostgreSQL database with Hans’ post on how to find and fix a missing index.