AWS Database Blog

Handle empty strings when migrating from Oracle to PostgreSQL

An Oracle-to-PostgreSQL migration in the AWS Cloud can be a multistage process with different technologies and skills involved, starting from the assessment stage to the cutover stage. AWS offers services that make PostgreSQL database deployments easy to set up, manage, and scale for the cloud in a cost-efficient manner. These services are Amazon Relational Database Service (Amazon RDS) for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition.

During the schema or code conversion phase, we can use AWS Schema Conversion Tool (AWS SCT) to convert the majority of the code. For code conversion exceptions raised by AWS SCT, we can refer to migration patterns and the Oracle to PostgreSQL migration playbook.

Converting code from Oracle to a PostgreSQL-compatible engine may involve multiple corner cases that are important for your overall migration success. Handling empty strings (”) with all NULL-compatible operators or expressions while migrating from Oracle to PostgreSQL is vital to achieve overall functional acceptance on the converted code and have a holistic impact.

Oracle databases treat empty strings and NULL as the same. However, in PostgreSQL, empty strings and NULL are different. If you’re running workloads on Oracle, you might implement procedural code with business logic, assuming that empty strings are treated as NULL when performing any comparison or NULL check. In this post, we cover different variants of handling empty strings in PostgreSQL and viable solutions during migration. We show examples to understand the overall components, impacts, and considerations of migration.

NULL vs. empty strings

Oracle databases treat NULL and empty strings as the same; you can use empty strings and NULL interchangeably in any operation. The following conditional statements in Oracle are all evaluated to NULL:

SQL> set NULL (NULL);
 
SQL> SELECT length(NULL) res FROM DUAL;
 
       RES
	----------
	  (NULL)
 
 
SQL> SELECT length('') res FROM DUAL;
 
       RES
	----------
	  (NULL)

However, in PostgreSQL, NULL means the value is unknown or not given, and an empty string is a string of 0 length. Therefore, NULL and empty strings are interpreted, represented, and operated differently. A PostgreSQL database treats empty strings as an identifiable value that is processed similar to other available characters:

postgres=> \pset null (NULL)

postgres=> SELECT length(NULL) res;

  res
--------
 (NULL)
(1 row)


postgres=> SELECT length('') res;

 res
-----
   0
(1 row)

The following table summarizes these differences.

Expression Oracle Result PostgreSQL Result
NULL = NULL NULL NULL
” = NULL NULL NULL
” = ” NULL: Empty strings are treated as NULL and therefore not comparable to anything, not even to itself TRUE: Empty strings are comparable because they’re strings of 0 length in PostgreSQL

Note the following differences between NULL and empty strings in PostgreSQL:

  • NULL can be assigned to any type, as opposed to empty strings, which aren’t compatible with date or numerical fields. See the following example code:
postgres=> select null::int, null::date;
 int4 | date
------+------
 |
(1 row)

postgres=> select ''::int;
ERROR: invalid input syntax for type integer: ""
LINE 1: select ''::int;
^

postgres=> select ''::date;
ERROR: invalid input syntax for type date: ""
LINE 1: select ''::date;
^
  • NULL is an unknown value. It doesn’t have a value, as opposed to an empty string, which is a value but an empty one.
  • Finding the size or length of NULL results in NULL. An empty string is a string of 0 length.
  • NULL = NULL results in NULL, as opposed to ”=”, which evaluates to TRUE.
  • NULL is not comparable to anything, not even to itself, whereas empty strings can be compared.

One of the biggest differences between Oracle and PostgreSQL is how NULL and empty strings operate with regards to composite unique indexes and constraints. PostgreSQL allows multiple rows with NULL in unique indexes. In contrast, Oracle restricts multiple NULL instances and empty strings in a composite unique constraint or unique index. When encountering an empty string, both Oracle and PostgreSQL throw an error for multiple entries.

The following is an example using Oracle :

SQL> CREATE TABLE test_unq(c1 integer, c2 varchar2(10));

SQL> ALTER TABLE test_unq ADD CONSTRAINT test_unq_c UNIQUE(c1, c2);

SQL> insert into test_unq values(1, null);

1 row created.

SQL> insert into test_unq values(1, null);
insert into test_unq values(1, null)
*
ERROR at line 1:
ORA-00001: unique constraint (TESTORA.IDX_TEST_UNQ) violated

Because Oracle treats empty strings as NULL, multiple entries of empty strings to unique constraints or unique indexes are violated:

SQL> insert into test_unq values(2, '');

1 row created.

SQL> insert into test_unq values(2, '');
insert into test_unq values(2, '')
*
ERROR at line 1:
ORA-00001: unique constraint (TESTORA.TEST_UNQ_C) violated

The following is the same example using PostgreSQL:

postgres=> CREATE TABLE test_unq(c1 integer, c2 character varying(10));

postgres=> ALTER TABLE test_unq ADD CONSTRAINT test_unq_c UNIQUE(c1, c2);

postgres=> insert into test_unq values(1, null);
INSERT 0 1

postgres=> insert into test_unq values(1, null);
INSERT 0 1

PostgreSQL databases treat empty strings and NULL as different. All empty strings are equal, and therefore multiple entries of empty strings to a unique constraint or unique index are violated:

postgres=> insert into test_unq values(2, '');
INSERT 0 1

postgres=> insert into test_unq values(2, '');
ERROR:  duplicate key value violates unique constraint "test_unq_c"
DETAIL:  Key (c1, c2)=(2, ) already exists.

Handling empty strings in PostgreSQL

From the preceding discussion, it’s evident that when migrating from Oracle to PostgreSQL, empty string handling is important and crucial.

Let’s create a sample table and insert data to it. You can use the following SQL statements to create a table and insert data in both Oracle and PostgreSQL:

CREATE TABLE Test_Empty_or_Null(
  tid    INTEGER,
  tname  VARCHAR(10),
  txdate DATE
);

-- insert statement #1
INSERT INTO Test_Empty_or_Null VALUES(1, 'Test1', current_date);

-- insert statement #2
INSERT INTO Test_Empty_or_Null VALUES(2, NULL, current_date);

-- insert statement #3
INSERT INTO Test_Empty_or_Null VALUES(3, '', current_date);

COMMIT;

In Oracle, because empty strings are treated as NULL, the preceding insert statements #2 and #3 will store NULL for column tname in the table. However, in PostgreSQL, the table will store NULL for the #2 and an empty string for the #3 insert statements.

You can see the difference in Oracle and PostgreSQL if rows are selected with the IS NULL or IS NOT NULL operator.

The following is the Oracle code:

SQL> SELECT * FROM Test_Empty_or_Null WHERE tname IS NULL;

     TID    TNAME      TXDATE
---------- ---------- ---------
      2 		  	  20-AUG-21
      3 		 	  20-AUG-21

The following is the PostgreSQL code:

postgres=> SELECT * FROM Test_Empty_or_Null WHERE tname IS NULL;
   tid 	| tname | txdate
   -----+-------+------------
    2 	| 	    | 2021-08-20
(1 row)

If you want to select all rows that have NULL or empty strings in PostgreSQL, you must explicitly add the empty string condition in the query’s WHERE clause:

 postgres=> SELECT * FROM Test_Empty_or_Null WHERE tname IS NULL OR tname = '';
   tid 	| tname | txdate
   -----+-------+------------
    2 	| 	    | 2021-08-20
    3 	| 	    | 2021-08-20
(2 rows)   

Optionally, you can use NULL handling functions such as COALESCE or NULLIF in PostgreSQL to deal with scenarios containing empty strings.

COALESCE function

The COALESCE function returns the first of its arguments that is not NULL. NULL is returned only if all arguments are NULL. You can adjust the WHERE tname IS NULL filter condition to WHERE coalesce(tname, '') = '' to get all rows whose tname is either empty or not known.

However, changing WHERE clauses in SQL queries to support the functionality might affect the run plan of query. The database might not optimize the plan because the particular filter is changed, and any index created on such column wouldn’t be used. There are chances that your query could be slower. In those cases, you must analyze the query run plan, revisit the indexes created, and create partial indexes if required on the table to tune the query. Query tuning is beyond the scope of this post.

The following code is an example of the COALESCE function:

PostgreSQL:

postgres=> SELECT * FROM Test_Empty_or_Null WHERE coalesce(tname, '') = '';

	    tid	| tname |   txdate
	    ----+-------+------------
         2 	| 	    |  2021-08-20
         3 	| 	    |  2021-08-20
        (2 rows)

Oracle:

SQL> SELECT * FROM Test_Empty_or_Null WHERE tname IS NULL;

             TID      TNAME      TXDATE
          ---------- ---------- ---------
             2 		             20-AUG-21
             3 		             20-AUG-21

Using COALESCE to handle IS NULL or IS NOT NULL for empty strings requires a lot of code changes. A simpler way would be to use the NULLIF function.

NULLIF function

NULLIF returns NULL if both the arguments are same. It returns the 1st argument when both the arguments are different. We use it to compare the input expression with the empty string and return NULL:

postgres=> \pset null (NULL)

postgres=> SELECT NULLIF('', '') res;
  res
--------
 (NULL)
(1 row)

We transform native PostgreSQL function with NULLIF for cases where we might encounter empty strings (”) as input. For example, if you have a condition like “tname IS NULL” where “tname” can contain NULL, then you can change to “nullif(tname, '') IS NULL” in PostgreSQL.

postgres=> SELECT * FROM Test_Empty_or_Null WHERE tname IS NULL;
	   tid 	| tname | txdate
	   -----+-------+------------
        2 	| 	    | 2021-08-20
 (1 row)

/* NULLIF is required as part of expression as PostgreSQL “IS NULL” operator do not handle empty string */

postgres=> SELECT * FROM Test_Empty_or_Null WHERE nullif(tname, '') IS NULL;
	   tid 	| tname | txdate
	   -----+-------+------------
		2 	| 	    | 2021-08-20
		3 	| 	    | 2021-08-20
(2 rows) 

With all such cases of NULL and empty strings, let’s dive deep into other database expressions or functions that require proper attention for porting Oracle procedural code to PostgreSQL.

NULL handling functions and challenges with empty strings

NULL handling functions in both Oracle and PostgreSQL work the same in any comparison or expression. However, because an empty string is another way of representing NULL in Oracle, no difference is observed when NULL handling functions are operated with NULL or empty strings. But in a PostgreSQL database, NULL handling functions in conjunction with empty strings evaluate to FALSE. This creates a challenge when migrating from Oracle to PostgreSQL to simulate the same behavior for NULL handling functions when dealing with empty strings.

The following table summarizes the overall behavior between Oracle and PostgreSQL regarding empty strings.

Input Expression Oracle Condition Evaluation PostgreSQL Condition Evaluation
” IS NULL True False
” IS NOT NULL False True
NVL(”, ‘A’) A Not available in PostgreSQL. You can use the Orafce extension NVL function. You must use NULLIF with NVL function to parse empty string as NULL.
COALESCE(”, ‘A’) A ” (empty string)
DECODE(”, NULL, ‘A’, ‘B’) A Not available in PostgreSQL. You can use the Orafce extension DECODE function. You must use NULLIF with the DECODE function to parse empty strings as NULL. Another approach is to convert DECODE to CASE.
CASE WHEN ” IS NULL True False
CASE ” WHEN NULL False False

Now that we have a better understanding of the differences between Oracle and PostgreSQL, we can walk through some of the workarounds and best practices to use during the conversion phase.

Handling IS NULL and IS NOT NULL

If you’re using conditional expressions like IS NULL and IS NOT NULL multiple times as part of procedural code in Oracle, then you must validate that they will work in PostgreSQL with respect to empty strings. Use the NULLIF function in PostgreSQL for situations where a column or variable value might become an empty string during an expression evaluation.

During code migration from your Oracle to PostgreSQL database, adding NULLIF is required only for character varying data types. This is not compatible for other data types like NUMERIC or DATE.

Handling NVL and COALESCE

PostgreSQL databases don’t have a built-in NVL function, and instead support the COALESCE function, which is ANSII compliant for NVL. The COALESCE function works perfectly fine for data types other than strings. This is because strings can accept empty strings. For situations that might accept empty strings as input, you must handle them carefully with the PostgreSQL COALESCE function.

The following is the Oracle code:

SQL> SELECT coalesce(NULL, 'A') res FROM DUAL;

  RES
----------
   A

The following is the PostgreSQL code:

postgres=> SELECT coalesce(NULL, 'A') res;
   res
  -----
   A
(1 row)    

The following example shows how the functionality of COALESCE differs with empty strings as input.

The following is the Oracle code:

SQL> SELECT coalesce('', 'A') res FROM DUAL;

   RES
----------
    A

The following is the PostgreSQL code:

postgres=> SELECT coalesce('', 'A') res;
  res
 -----

 (1 row)

Although the behavior of COALESCE in PostgreSQL conforms to the standard, it’s not always compatible based on input or variable values while migrating from Oracle. Let’s understand the workaround for COALESCE for empty string handling. The following is the PostgreSQL code:

postgres=> SELECT coalesce(nullif('', ''), 'a') res;
  res
 -----
   a
(1 row)

Orafce NVL function

Orafce is an extension supported in Amazon Relational Database Service (Amazon RDS) for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition. When you create this extension in PostgreSQL, it brings modules containing some useful functions that can help with porting your Oracle application to PostgreSQL. Orafce provides the NVL function, which you can use while migrating code from Oracle without code changes.

The following example shows how Orafce NVL works for NULL input but doesn’t treat empty strings as NULL as the Oracle database does:

postgres=> SELECT nvl(null, 'A'::text) res;
 res
-----
 A
(1 row)

postgres=> SELECT nvl('', 'A'::text) res;
 res
-----

(1 row)

Therefore, if you’re using the Orafce NVL function as part of your code migration or development, you must make sure you handle empty strings for character varying or text data types, as illustrated in the following example:

postgres=> SELECT nvl(nullif('', ''), 'A'::text) res;
 res
-----
 A
(1 row)

Mock NVL function as wrapper for character varying or text arguments

Adding the NULLIF function to each character varying or text data input in COALESCE or Orafce NVL could make the code migration process cumbersome and time-consuming. To mitigate this risk, a workaround is to create a wrapper function in PostgreSQL that parses empty strings to NULL and at the same time is performant.

The following shows a wrapper code for the NVL function with two arguments for the text type:

CREATE OR REPLACE FUNCTION nvl(text, text)
RETURNS text
LANGUAGE sql
AS $$
    select coalesce(NULLIF($1,''), NULLIF($2,''))
$$;

Let’s test the NVL wrapper function we created and see if it can handle NULL as well empty strings:

postgres=> SELECT coalesce('', 'A') res;
  res
 -----

(1 row)


postgres=> SELECT nvl(NULL, 'A') res;
  res
 -----
   A
(1 row)


postgres=> SELECT nvl('', 'A') res;
  res
 -----
   A
(1 row)

Because the empty string is possibly an input argument for the character or text data type in Oracle, you can create the preceding wrapper function to mimic similar behavior of the Oracle NVL function and expedite the code migration process.

The NVL wrapper function we created is designed for character varying or text data types only, because of empty string behavior in PostgreSQL. For other data types, you can choose between COALESCE or the Orafce NVL function.

Handling DECODE and CASE expressions

Expressions like CASE and DECODE can be part of many dynamic queries or conditional expressions, and you need to make sure it adheres to similar behavior for NULL and ” as input. See the following Oracle code:

SET serveroutput on;

DECLARE
    var  varchar2(100);
    res  varchar2(100);

BEGIN
--CASE Expression - Input as NULL
    var := NULL;
    SELECT CASE WHEN var IS NULL THEN 'NULL' 
                ELSE 'NOT NULL'
           END 
      INTO res
      FROM dual;
dbms_output.put_line('CASE Expression - Input as NULL, Result is '|| res);

--CASE Expression - Input as Empty String
    var := '';
    SELECT CASE WHEN var IS NULL THEN 'NULL' 
                ELSE 'NOT NULL'
           END 
      INTO res
      FROM dual;
dbms_output.put_line('CASE Expression - Input as empty string, Result is '|| res);

--DECODE Expression - Input as NULL
    var := NULL;
    SELECT DECODE(var, null, 'NULL', 'NOT NULL') 
      INTO res 
      FROM dual;
dbms_output.put_line('DECODE Expression - Input as NULL, Result is '|| res);

--DECODE Expression - Input as Empty String
    var := '';
    SELECT DECODE(var, null, 'NULL', 'NOT NULL') 
      INTO res 
      FROM dual;
dbms_output.put_line('DECODE Expression - Input as empty string, Result is '|| res);

END;
/

You get the following output:

CASE Expression - Input as NULL, Result is NULL
CASE Expression - Input as empty string, Result is NULL
DECODE Expression - Input as NULL, Result is NULL
DECODE Expression - Input as empty string, Result is NULL

The PostgreSQL engine offers additional functions and extensions to mitigate empty string conditional evaluation and provides the same behavior as the Oracle engine:

SET client_min_messages = debug;

DO $$
DECLARE
    var  varchar(100);
    res  varchar(100);

BEGIN
--CASE Expression - Input as NULL
    var := NULL;
    SELECT CASE WHEN var IS NULL THEN 'NULL' 
                ELSE 'NOT NULL'
           END 
      INTO res;
Raise debug using message := concat_ws('', 'CASE Expression - Input as NULL, Result is ', res);

--CASE Expression - Input as Empty String
    var := '';
    SELECT CASE WHEN var IS NULL THEN 'NULL' 
                ELSE 'NOT NULL'
           END 
      INTO res;
Raise debug using message := concat_ws('', 'CASE Expression - Input as empty string, Result is ', res);

--DECODE Expression – No DECODE function in PostgreSQL

END $$;

You get the following output:

DEBUG:  CASE Expression - Input as NULL, Result is NULL
DEBUG:  CASE Expression - Input as empty string, Result is NOT NULL

No function like DECODE exists in PostgreSQL. You need to convert to CASE expressions manually. You can also use the Orafce DECODE function.

Both DECODE and CASE expressions treat NULL and empty strings as the same in Oracle and need proper handling when converting them to PostgreSQL.

With that understanding, let’s explore various options and best practices to migrate CASE or DECODE into PostgreSQL without breaking the functionality.

Option 1: Convert DECODE as a CASE expression

You can convert all DECODE functions to traditional CASE expressions in PostgreSQL while migrating code from Oracle. But you must always remember the correct implementation of CASE when empty strings might appear as input to the CASE expression.

The following is the Oracle code:

SELECT DECODE(var, null, 'NULL', 'NOT NULL') 
    FROM DUAL;

The following is the right approach in PostgreSQL to write a searched CASE expression:

SELECT CASE WHEN nullif(var, '') IS null THEN 'NULL' 
            ELSE 'NOT NULL'
       END; 

Option 2: Use the Orafce DECODE function in PostgreSQL

Let’s test the DECODE function provided by the Orafce extension with different types of input parameters and learn how to correctly handle empty strings.

When the input is a non-NULL string, Oracle DECODE and Orafce DECODE provide the same output.

The following is the Oracle code:

SQL> SELECT DECODE('one', null, 'NULL', 'NOT NULL') res FROM DUAL;

  RES
--------
NOT NULL

The following is the PostgreSQL code:

postgres=> SELECT DECODE('one'::text, null, 'NULL', 'NOT NULL') res;
   res
----------
 NOT NULL
(1 row)

When the input is NULL, Oracle DECODE and Orafce DECODE provide the same output.

The following is the Oracle code:

SQL> SELECT DECODE(null, null, 'NULL', 'NOT NULL') res FROM DUAL;
         RES
         ----
         NULL

The following is the PostgreSQL code:

postgres=> SELECT DECODE(null::text, null, 'NULL', 'NOT NULL') res;
   res
----------
   NULL
 (1 row)

When the input is an empty string, Oracle DECODE and Orafce DECODE provide different output.

The following is the Oracle code:

SQL> SELECT DECODE('', null, 'NULL', 'NOT NULL') res FROM DUAL;
          RES
          ----
          NULL

The following is the PostgreSQL code:

postgres=> SELECT DECODE(''::text, null, 'NULL', 'NOT NULL') res;
   res
----------
NOT NULL
(1 row)

In this case, you must either convert to CASE, as described earlier, or use NULLIF with DECODE:

postgres=> SELECT DECODE(nullif('', ''), null, 'NULL', 'NOT NULL') res;
   res
----------
  NULL
 (1 row)

Conclusion

Handling empty strings and their evaluation with different operators, expressions, or functions like IS NULL, NVL, CASE, and DECODE should be an important consideration when migrating your database from Oracle to PostgreSQL. In this post, we have discussed the importance of handling empty strings and viable solutions during migration from Oracle to PostgreSQL. Our examples cover different variants of handling empty strings to understand the overall components, impacts, and considerations of migration to PostgreSQL.

If you have any questions or suggestions about this post, leave a comment.


About the Authors

Sashikanta Pattanayak works as a Lead Consultant with the Professional services team at AWS. He works with customers to build scalable, highly available and secure solutions in the AWS cloud. He specializes in homogeneous and heterogeneous database migrations.

Deepak Mahto was a Consultant with the AWS Proserve Team in India. He worked as Database Migration Lead, helping and enabling customers to migrate from commercial engines to Amazon RDS. His passion is automation and has designed and implemented multiple database or migration related tools.

Vinay Paladi is a Database Consultant with the Professional Services team at Amazon Web Services. He works as a database migration specialist, helping and enabling customers to build high-available, cost-effective database solutions and migrate their commercial engines to AWS cloud. He is passionate about building innovative solutions to accelerate database journey to cloud.