Searching Serialized Fields in Rails Using Postgres Arrays

Ryan Stenberg, Former Developer

Article Category: #Code

Posted on

In Rails, ActiveRecord allows you store and retrieve data structures like arrays and hashes in a single string or text-type database column through application-level serialization.

It is a common pattern with models having fields like tags or nicknames, where you want to be able to store an arbitrary list of strings:

class Post < ActiveRecord::Base
  serialize :tags
end

This is all fine n' dandy until you're like, "I want to find all posts tagged with either 'hobbits' or 'gandalf'!" Well..

simply

The database only knows it has a string, not a comma-separated list of strings. It is possible to search application-level serialized fields, but it's pretty hacky and brittle:

class Post
  scope :including_all_tags, -> (tags) { where(matching_tag_query(tags, 'AND')) }
  scope :including_any_tags, -> (tags) { where(matching_tag_query(tags, 'OR')) }

  private

  def matching_tag_query(tags, condition_separator = 'OR')
    tags.map { |tag| "(tags LIKE '%#{tag}%')" }.join(" #{condition_separator} ")
  end
end

# In use

Post.including_all_tags(['hobbits', 'gandalf'])
Post.including_any_tags(['hobbits', 'gandalf'])

This approach builds a WHERE condition and matches against the given tags, which grows and becomes increasingly inefficient as the number of searched tags increases. Also, if you have tags like gandalf and gandalf-the-grey, you'd get false positives with the above example. While passable for simple cases, there's got to be a better way! Fortunately for us, along comes Postgres.

types

PostgreSQL Arrays

Postgres has built-in array-type columns, which we can take advantage of in our Rails 4.X apps with the postgres_ext gem. What does this mean for our Rails apps? If we can store arrays in a single database column, then our database and application have the same understanding of the structure of the data. This isn't possible with the out-of-the-box serialization model. Most importantly, we can use SQL to effectively search inside our array-type columns.

With PostgresExt installed, we can add an array: true flag to string/text columns:

class PostgresHaveMyBabiesMigration < Migration
  def change
    create_table :posts do |t|
      t.text :tags, array: true
    end
  end
end

Now, the model is no longer concerned with serializing the :tags field:

class Post < ActiveRecord::Base
  # crickets
end

Searching

As mentioned, one of the primary benefits of Postgres's array-type column is the enhanced searchability through array-specific SQL. The Postgres Extensions gem unlocks a number of array operators and functions for usage in SQL strings. Let's take a look at operators in both their SQL and Arel forms. If you're unfamiliar with Arel, check out this post where I expand on its usefulness.

Overlap Operator

The overlap operator (&&) is useful when you're comparing two arrays to see if they share one or more elements in common.

In SQL:

SELECT *
FROM posts
WHERE posts.tags && '{hobbits,gandalf}'

Utilizing Arel:

Post.where(Post.arel_table[:tags].overlap(['hobbits', 'gandalf']))
# or
Post.where.overlap(tags: ['hobbits', 'gandalf'])

These queries would return any posts having either 'hobbit' or 'gandalf' as a tag.

Contains Operator

The contains operator (@>) is useful when you're looking for arrays that contain all the elements of another.

In SQL:

SELECT *
FROM posts
WHERE posts.tags @> '{hobbits,gandalf}'

Utilizing Arel:

Post.where(Post.arel_table[:tags].contains(['hobbits', 'gandalf']))
# or
Post.where.contains(tags: ['hobbits', 'gandalf'])

These queries would return any posts having both the 'hobbits' and 'gandalf' tags.

mmm

Wrapping Up

Postgres is awesome. The array extensions are magical. Instead of application-level serialization, let Postgres handle array persistance for you. It unifies the understanding of array-type data between the database and application and simplifies searching.

Related Articles