Options for scaling from 1 to 100,000 tenants

Written by Craig Kerstiens
June 28, 2018

When you first start building a SaaS application, you talk about that day in the future when you'll have scaling problems, and how that would be a good problem to have. You focus on getting the first few customers and making sure they have a great experience, and suddenly you're at 10s of customers, then 100s. You've upgraded your app server to a larger one, then gone from one EC2 app server to several behind an ELB. You've upgraded your Postgres database on AWS from an r3.large to an r3.xlarge, and now you're eyeing that r3.2xlarge next month. In the back of your mind, though, you're starting to plan for future growth and wondering how much larger you can keep going. Your database performs well at 100 tenants (tenants = customers), and your back-of-the-napkin math says you'll be able to scale your app up to 1,000 tenants, but after that you know you're going to have to explore some options.

What are those options and what are the trade-offs and benefits?

Move off the cloud & back to on-premise proprietary database appliance

As you're eyeing that 488 GB instance on AWS, you start to compare cloud providers. On AWS the r line tops out at 488 GB of memory, but if you migrate everything to Google you can go up to 624 GB. Then you think about what it would take to move your entire application, and migrating clouds for an extra 136 GB of memory doesn't look like much of a win.

On-premise proprietary licenses and dedicated database appliances can help you scale up. However, dedicated appliances and proprietary solutions not only come with a steep price tag, they also aren't transparent about how they work or whether issues exist within the software. Contrast this with open source databases, which often communicate a clear roadmap and have public issue trackers: with open source, the chance of surprises in what you're getting is greatly reduced.

Even if you're comfortable with the initial cost of a large, proprietary scale-up database appliance, the hidden cost over time is the biggest pain for companies. As you start building against these systems you plan to keep your code generic, but before you know it you're coding to proprietary APIs and have locked yourself in. That lock-in lasts not just a year or two, but decades. Add the ongoing cost of maintaining the database when DBAs are hard to find and hire, and you're paying far more than you anticipated, with no end in sight.

Manually shard at the application layer

Proprietary databases and appliances are out; you're going open source... that'll keep costs down, right? Now you just have to figure out how to make your open source database (hopefully it's Postgres) keep scaling out.

Sharding clearly scales; few dispute that. After all, it worked for Google, Facebook, Instagram, and Salesforce. You just have to dig in and do it, perhaps starting with a talk from a team that has done it before, like Instagram. Sharding seems straightforward enough on the surface:

  1. Re-architect your application to be sharding aware
  2. Handle edge cases for multi-node differences
  3. Expand your DevOps team to manage tens, hundreds, thousands of separate databases
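To make step 1 concrete, here is a minimal Python sketch of what "sharding aware" tends to mean in practice: every code path has to work out which database holds a tenant's data before it can run a query. The shard list, the connection strings, and the helper names are hypothetical, and the hash-modulo placement is a simple assumption for illustration, not a recommendation.

```python
import hashlib

# Hypothetical shard connection strings; in a real deployment each would
# point at a separate Postgres server.
SHARD_DSNS = [
    "postgresql://db-shard-0/app",
    "postgresql://db-shard-1/app",
    "postgresql://db-shard-2/app",
    "postgresql://db-shard-3/app",
]


def shard_for_tenant(tenant_id: int, shard_count: int = len(SHARD_DSNS)) -> int:
    """Map a tenant id to a shard index with a stable hash.

    Python's built-in hash() is salted per process for str/bytes, so a
    SHA-256 digest is used to keep the mapping identical across app
    servers and restarts.
    """
    digest = hashlib.sha256(str(tenant_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def dsn_for_tenant(tenant_id: int) -> str:
    """Every query the app issues must first pick the right database."""
    return SHARD_DSNS[shard_for_tenant(tenant_id)]
```

The modulo scheme also hints at why step 3 gets expensive: going from 4 to 5 shards moves most tenants to a different shard (a tenant stays put only when its hash leaves the same remainder for both counts), so adding capacity means migrating data, not just adding a server.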

As you dig in, you realize that each of these items breaks down into several sub-problems. The differences between a single-node and a sharded setup alone start to become overwhelming:

  • For consistency, your model changes: for operations that span shards, you may have to rebuild transaction logic directly into your app.
  • For failures, you have to decide what happens when a single node goes offline: do you let other transactions succeed, or fail them?
  • For connections, do you let the application connect directly to each database, or do you introduce a new proxy/pooling layer?
  • For hotspots, how do you handle individual tenants that generate far more load than the rest?
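The hotspot question is worth a quick illustration. Even if tenants are spread evenly across shards by count, one heavy tenant can overload the shard it lands on. This Python sketch assumes you have a per-tenant load figure (for example, queries per second) that you'd have to measure yourself, places tenants with a simple `tenant_id % shard_count` rule, and flags shards running well above the mean. The function names and the 2x threshold are hypothetical.

```python
from collections import defaultdict


def shard_loads(tenant_load: dict[int, float], shard_count: int) -> dict[int, float]:
    """Sum per-tenant load onto shards.

    Placement by tenant_id % shard_count is a simplifying assumption.
    """
    loads: dict[int, float] = defaultdict(float)
    for tenant_id, load in tenant_load.items():
        loads[tenant_id % shard_count] += load
    # Report every shard, including idle ones, in shard order.
    return {shard: loads[shard] for shard in range(shard_count)}


def hot_shards(loads: dict[int, float], factor: float = 2.0) -> list[int]:
    """Return shards whose load exceeds `factor` times the mean shard load."""
    mean = sum(loads.values()) / len(loads)
    return [shard for shard, load in loads.items() if load > factor * mean]


# Eight tenants, two per shard, but tenant 3 is a heavy user.
tenant_load = {tenant_id: 1.0 for tenant_id in range(8)}
tenant_load[3] = 100.0
loads = shard_loads(tenant_load, shard_count=4)
print(loads)  # {0: 2.0, 1: 2.0, 2: 2.0, 3: 101.0}
print(hot_shards(loads))  # [3]
```

Moving tenant 7 off shard 3 barely helps here, since almost all the load comes from tenant 3; a single oversized tenant may need a shard to itself, which is exactly the kind of policy you end up writing and maintaining yourself.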

Suddenly that YouTube video of rolling your own sharding has become a yearlong engineering project that pulls some of your best developers off your application to work on database infrastructure rather than the customer-facing features that make your SaaS app more competitive. While the ongoing cost of sharding at the application layer looks better than that of an expensive proprietary database, the price of rolling your own sharded database (combined with the opportunity cost) is much higher than it appears at first glance.

Migrate from relational database to NoSQL

We've heard it before: relational databases don't scale, but NoSQL databases do. Many NoSQL systems were built explicitly with scale in mind, so you'd expect them to scale. But to compete, within a short time, with the robust ecosystem relational databases already had, they had to make major concessions early on. Those concessions were in areas you take for granted when building your app in something like Rails, Django, or Play: with a relational database you get proper transactions by default, and you can add constraints to ensure referential integrity. If you don't need the following relational features, NoSQL may work really well:

  • Constraints and referential integrity
  • Transactional guarantees
  • Ability to easily analyze your data (SQL)

If you started with NoSQL and don't need any of the above, you should be in great shape. If you started with a relational database and now need scale, NoSQL can help only if you (1) don't have the above requirements and (2) can invest in all the application changes needed to untangle your relational model.

Adopt a solution that takes care of sharding for you, at the database layer

Among all these options, sharding is the one that scales out and gives you the most flexibility. With sharding you don't have to be locked in, and you can grow from one to 100,000 tenants. The multi-tenant data model inherent in many B2B SaaS apps helps here, because it gives you a natural dimension on which to distribute data: the tenant id!
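Why the tenant id works so well as the distribution dimension: if every table carries a `tenant_id` column and rows are placed by that column, all of one tenant's rows end up on the same shard, so a join or transaction for a single tenant never has to cross shards. The Python sketch below models that idea with in-memory lists standing in for databases. The table names, columns, and hash-modulo placement are assumptions for illustration and are not how Citus itself assigns shards.

```python
import hashlib

SHARD_COUNT = 4

# Each in-memory "shard" holds its own copy of every table (hypothetical schema).
shards = [{"users": [], "orders": []} for _ in range(SHARD_COUNT)]


def shard_for(tenant_id: int) -> int:
    """Stable hash of the tenant id, reduced to a shard index."""
    digest = hashlib.sha256(str(tenant_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARD_COUNT


def insert(table: str, row: dict) -> None:
    """Route a row by its tenant_id; the rule is the same for every table."""
    shards[shard_for(row["tenant_id"])][table].append(row)


def orders_with_user_names(tenant_id: int) -> list[tuple[str, int]]:
    """Join users to orders for one tenant, touching only that tenant's shard."""
    shard = shards[shard_for(tenant_id)]
    names = {
        u["user_id"]: u["name"] for u in shard["users"] if u["tenant_id"] == tenant_id
    }
    return [
        (names[o["user_id"]], o["amount"])
        for o in shard["orders"]
        if o["tenant_id"] == tenant_id
    ]


insert("users", {"tenant_id": 1, "user_id": 10, "name": "alice"})
insert("users", {"tenant_id": 2, "user_id": 10, "name": "bob"})
insert("orders", {"tenant_id": 1, "user_id": 10, "amount": 30})
insert("orders", {"tenant_id": 2, "user_id": 10, "amount": 75})

print(orders_with_user_names(1))  # [('alice', 30)]
print(orders_with_user_names(2))  # [('bob', 75)]
```

Note that `user_id` 10 exists for both tenants; because every lookup is scoped by `tenant_id`, the join stays correct even if both tenants happen to land on the same shard.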

The big question becomes whether you re-architect your app to shard at the application layer, or use the Citus extension to Postgres to scale out and take care of sharding at the database layer.

We've seen Citus customers who had planned to re-architect their SaaS application and roll their own sharding, because it seemed like the only viable option. As they explored how Citus makes sharding simple, they found they'd be able to:

  1. Stick with an open source foundation preventing lock-in
  2. Remove the need to create (and maintain) heavy in-application sharding logic
  3. Reduce their overall implementation effort by 70% (shipping 6 months earlier)
  4. Save $400k a year on maintenance alone

Scaling out your database so you can grow your SaaS app and your business shouldn't have to compete for the scarce developer resources you need to improve your product. That's where the Citus approach to sharding Postgres, especially for multi-tenant apps, can help.


Former Head of Cloud at Citus Data. Ran product at Heroku Postgres. Countless conference talks on Postgres & Citus. Loves bbq and football.