Using JSON in Postgres and Node.js

[EDIT] As of 2013-04-22, Brian Carlson has implemented JSON support! So a lot of what’s in here is wrong now :)

The easiest way to get Postgres installed on OSX is via Heroku’s Postgress.app. If you’re on linux, just install from aptitude. Working with node.js, use Brian Carlson’s wonderful Node Postgres Package.

npm install pg

You don’t need any of the fancy new postgres features to store json. You can just store it as text. As long as you don’t need to query anything against that data, you’re good to go. This is usually the case for simple activity streams:

// language: javascript;
var query = 'insert into activity (uid, type, data) values ($1, $2, $3)';
db.query(query, [ 1, 'like', JSON.stringify({ relevantId: 3 }) ], function(error, result) {
  /* .. */
});

/* ... */

// Meanwhile, in selection land

var query = 'select id, uid, type, data from activity';

db.query(query, function(error, result){
  if (error) return res.send(500);

  if (result.rows.length == 0) return res.json(result.rows);

  // Node Postgres parses results as JSON, but the JSON
  // we returned in `data` is just text.
  // So we need to parse the data object for all rows(n)
  result.rows.map(function(row){
    try {
      row.data = JSON.parse(row.data);
    } catch (e) {
      row.data = null;
    }

    return row;
  });

  res.json(result.rows);
});

Node postgres does a really good job parsing JSON for you. However, since there’s currently no real JSON type in Postgres, there’s no way to tell the driver to parse arbitrary text as JSON.

Querying Against JSON

But what if you do need to be able to query against the json? If you know your json is only one level deep (there are no sub-objects), then the easiest thing to do is store it as an hstore. The hstore type in Postgres is a simple key-val store, supporting string and number values.

-- language: sql;
create table "activity" (
  id          serial primary key
, uid         references users(id)
, type        text
, stamp       timestamp
, data        text                  -- Stores our hstore/json
);

You’ll need to create a helper function to convert objects to hstore values. Here’s a simple one that takes the first level of an object and creates an hstore value:

// language: javascript;
function hstoreValue(obj){
  var out = [], isNum = false;
  for (var key in obj){
    isNum = typeof obj[key] == 'number';
    out.push('"' + key + '"=>' + (isNum ? '"' : '') + obj[key] + (isNum ? '"' : ''));
  }
  return "'" + out.join(', ') + "'";
}

// relevantId=>5
var query = "insert into activity (type, data) values ('foobar', " + hstoreValue({ relevantId: 1 }) + )";

When using hstore, you’re just storing the data as text. When you pull it out, you cast it to hstore to query against it and you tell Postgres that you want the output format to be json.

-- language: sql;
select
  id
, uid
, type
, stamp
, to_json(data::text) as data
from activity
where data::hstore->'relevantId' = 1::text;

After running the query, the resulting data field will be valid JSON ready to be parsed. Since as of this writing, the latest stable version of Postgres doesn’t have great JSON support, you do have to do some work manually parsing each row, but as you can see from my first example, it’s really easy.

But you see the unfortunate consequence of hstores: they’re really just text. As such, all queries run against them need to be casted to text. Not terrible, but definitely not ideal.

Row to JSON and Embedding Arbitrary Joins as Sub-Objects

Suppose you have a users table and an extension table called consumers

-- users

 id | email | password
----+-------+----------
  1 | a@b.c | fjaslkfj
  2 | b@c.d | asdajdkf

-- consumers
 id | first_name | last_name
----+------------+-----------
  1 |     Bob    |   LaBlah
  2 |     Law    |    Bomb

You want a query users such that each result returns a json object with the corresponding user record embedded.

// language: javascript;
{
  id: 1
, email: 'a@b.c'
, password: 'fjaslkfj'
, consumer: {
    id: 1
  , first_name: 'Bob'
  , last_name: 'LaBlah'
  }
}

You could use the awesome row_to_json function alongside aliased sub-queries and joins:

-- language: sql;
select row_to_json(row) from (
  select users.*, c as consumer from users
  inner join (
    select * from consumers
  ) c on users.id = c.id
  where users.id = 1
) row;

That’s pretty nifty if you’re not using node-pg, which already converts the response to JSON. This query would make a little more sense if you’re using node-pg:

-- language: sql;
select users.*, row_to_json(c) as consumer from users
inner join (
  select * from consumers
) c on users.id = c.id
where users.id = 1

But node-pg doesn’t support the JSON data type yet. So, it won’t actually recognize the consumer field as being an object. It will just be a string which you can manually and safely parse. I say safely because postgres has already recognized it to be valid JSON at this point. Once node-pg supports json, we won’t have to do this silly manual parsing after the fact business.

Brand Spankin’ New JSON Support

So, in Postgres 9.3, or the devel version, you can do some really rad querying on json plus a lot of other neato functions for dealing with JSON. If you want to play with these features you need to build Postgres from the source. I’ll be doing this on my Ubuntu 12.10 server. Also, I’ve already got an installation of postgres running on port 5432, so I’m going to use port 5433.

Building Postgres Devel

Download the latest snapshot:

curl -O http://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.gz

Unpack:

gunzip postgresql-snapshot.tar.gz
tar xf postgresql-snapshot.tar

Configure and install:

cd postgresql-9.3devel/
./configure
make
sudo make install

Setup data directory:

sudo mkdir /usr/local/pgsql/data
sudo chown postgres /usr/local/pgsql/data

Initialize Postgres:

sudo su postgres
/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data

Ensure you got the right version!

sudo -u postgres /usr/local/pgsql/bin/postgres --version
postgres (PostgreSQL) 9.3devel

Create test database:

/usr/local/pgsql/bin/createdb -p 5433 test

Start Postgres:

/usr/local/pgsql/bin/postgres -p 5433 -D /usr/local/pgsql/data

You should see something like this:

LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

Open a new session and let’s see if you can get connected using psql.

sudo -u postgres /usr/local/pgsql/bin/psql -p 5433 -d test

KABLAMO! That’s it. Obviously this is just for dev purposes. Wouldn’t want to setup your server like this.

Now the fun begins

I created a table called activity that looks like this:

-- language: sql;
create table "activity" (
  id          serial primary key
, type        text
, stamp       timestamp
, data        json                -- Straight up JSON!
);

I put some data in it:

-- language: sql;
select * from activity;

Results:

 id |  type  |           stamp            |         data
----+--------+----------------------------+-----------------------
  1 | foobar | 2013-04-21 23:10:33.73449  | {"foo":"baz","baz":1}
  2 | foobar | 2013-04-21 23:10:47.18456  | {"foo":"bar","baz":1}
  3 | foobar | 2013-04-21 23:10:57.878797 | {"foo":"bah","baz":2}
  4 | foobar | 2013-04-21 23:12:05.994723 | {"foo":"bah","baz":2}

Now we can query against the json:

-- language: sql;
select data from activity where data->>'foo' = 'bah';


 id |  type  |           stamp            |         data
----+--------+----------------------------+-----------------------
  3 | foobar | 2013-04-21 23:10:57.878797 | {"foo":"bah","baz":2}
  4 | foobar | 2013-04-21 23:12:05.994723 | {"foo":"bah","baz":2}

Notice the double arrow syntax used. That will automatically cast the selected value to text. From there it’s easy to cast to int:

-- language: sql;
select * from activity where (data->>'baz')::int = 2;


 id |  type  |           stamp            |         data
----+--------+----------------------------+-----------------------
  3 | foobar | 2013-04-21 23:10:57.878797 | {"foo":"bah","baz":2}
  4 | foobar | 2013-04-21 23:12:05.994723 | {"foo":"bah","baz":2}

You can even order/group by json values:

-- language: sql;
select * from activity order by (data->>'baz')::int desc;

Nested object queries:

-- language: sql;
select * from activity where (data->'a'->'b'->>'c')::int = 5;

Pretty farkin’ awesome. What about existential queries?

-- language: sql;
select * from activity where data->>'foo' is not null;

Result:

ERROR:  operator does not exist: json ->> boolean
LINE 1: select * from activity where data->>'foo' is not null;
                                         ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.

No bueno. Clearly the JSON support isn’t quite there for some stuff. For that sort of query, you’ll have to get creative:

-- language: sql;
select * from activity where 'foo' in (select json_object_keys(data) from activity);

The json_object_keys function looks like it could come in handy. Not sure what for, but yeah, I guess the example above is useful.

Just playing around, I found a way to put your entire query into a single column and row which is ready to JSON.parsed:

-- language: sql;
select json_agg( row_to_json(row) ) from ( select * from activity ) row;

This is pretty crazy actually. This would allow you to dump your data directly from the database query result to the response body. You wouldn’t really want to do that, but it’s kind of cool none-the-less. The json_agg function could also allow you to write a REALLY node postgres driver. Just wrap all incoming queries in:

-- language: sql;
select json_agg( row_to_json(row) ) from ( /* Your Query */ ) row;

You won’t have to deal with parsing the result. You can just use JSON.parse.

Conclusion

  • Node-PG needs JSON data-type support
  • Postgres 9.3 is awesome
  • Querying against JSON is fantastic