It’s a funny thing when the topic of database sizes comes up. Calling one small, medium, large, or even huge isn’t as straightforward as you’d think. Classifying the size of a database depends on a number of factors whose characteristics are either “tangible”, things you can measure in an objective manner, or “intangible”, attributes best expressed using the catch-all phrase “it depends”. For example, a 2TB database is, to many people, a large database. On the other hand, a veteran DBA might describe a PostgreSQL database cluster as large only when it enters the realm of petabytes.

Here’s a recap of some of PostgreSQL’s basic capabilities:

  • database size: unlimited
  • number of databases: 4,294,950,911
  • relations per database: 1,431,650,303
  • table size: 32TB
  • rows per table: limited by the number of tuples that can fit onto 4,294,967,295 pages
  • fields per table: 1,600
  • field size: 1GB
  • identifier length: 63 bytes
  • indexes per table: unlimited
  • columns per index: 32
  • partition keys: 32

NB: Despite the physical constraints one faces when creating large numbers of schemas, there is no theoretical limit to the number that can be created in PostgreSQL.
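
When the question is simply “how big is it right now?”, PostgreSQL’s built-in size functions answer it directly. A minimal sketch:

    -- Report the size of every database in the cluster, largest first.
    SELECT datname,
           pg_size_pretty(pg_database_size(datname)) AS size
    FROM   pg_database
    ORDER  BY pg_database_size(datname) DESC;

    -- Report the total size (heap + indexes + TOAST) of the largest tables
    -- in the current database.
    SELECT relname,
           pg_size_pretty(pg_total_relation_size(oid)) AS total_size
    FROM   pg_class
    WHERE  relkind = 'r'
    ORDER  BY pg_total_relation_size(oid) DESC
    LIMIT  10;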

I’ve come to differentiate a small database from a large one using the following observations. And while it is true that some of the points about a large database can apply to a small one, and vice versa, most of the setups out in the wild follow this pattern:

  1. Small databases are often administered by a single person.
  2. Small databases can be managed manually.
  3. Small databases are minimally tuned.
  4. Small databases can typically tolerate production inefficiencies more than large ones.
  5. Large databases are managed using automated tools.
  6. Large databases must be constantly monitored and go through an active tuning life cycle.
  7. Large databases require rapid response to imminent and ongoing production issues to maintain optimal performance.
  8. Large databases are particularly sensitive to technical debt.

Large databases often bring up the following questions and issues:

  • Is the system performance especially sensitive to changes in production load?
  • Is the system performance especially sensitive to minor tuning effects?
  • Is there a large amount of data churn? (A quick way to gauge this is sketched after this list.)
  • Does the database load saturate your hardware’s capabilities?
  • Do the maintenance activities, such as logical backups and repacking tables, take several days or even more than a week?
  • Does your Disaster Recovery Protocol require having a very small Recovery Point Objective (RPO) or Recovery Time Objective (RTO)?
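
Regarding data churn: the tuple-activity counters in the statistics views give a rough first answer. A minimal sketch, with the LIMIT being an arbitrary choice:

    -- Tables with the heaviest write activity since statistics were last reset.
    SELECT relname,
           n_tup_ins  AS inserts,
           n_tup_upd  AS updates,
           n_tup_del  AS deletes,
           n_dead_tup AS dead_tuples
    FROM   pg_stat_user_tables
    ORDER  BY (n_tup_ins + n_tup_upd + n_tup_del) DESC
    LIMIT  10;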

The key difference between a small and a large database is how they are administered:

  1. Whereas small databases are commonly administered by hand, even though it’s not best practice, automation is the industry’s default mode of operation for large databases.
  2. Because circumstances can change quickly, large databases are particularly sensitive to production issues.
  3. Tuning is constantly evolving; while it is true that newly installed architectures are often well-tuned, circumstances change as they age and large databases are especially vulnerable.

Good planning is your friend: for a large database, the goal is to address potential issues by anticipating future conditions, i.e., testing the entire infrastructure before it goes into production.

Scripting your build environment using tools such as Ansible, Puppet, Terraform, etc. mitigates human error when provisioning the underlying infrastructure. It’s important to be able to build in a consistent and repeatable manner.

Once a database is in production, it must be monitored and wired with alerts for the various critical thresholds. Aside from the standard errors, consider configuring your monitoring solution to follow the “Rule Of Three”: select and watch only three metrics that track and alert on a specific “change of state”. This is not to be confused with following a particular issue; rather, it is meant to tell you that something has changed from what is considered normal behavior and that you should pay attention to your system. Depending on your preferences, you may want to watch for known production issues, or, when the system is stable, you might be more interested in trending alerts such as queries whose performance has slipped below a predefined threshold.
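
As an example of such a trending alert, here is a sketch of the kind of check a monitoring job might run, assuming the pg_stat_statements extension is installed and PostgreSQL 13 or later (the column was named mean_time in earlier releases); the 500 ms threshold is an arbitrary placeholder:

    -- Queries whose average execution time has crossed an arbitrary threshold.
    -- Requires: CREATE EXTENSION pg_stat_statements;
    SELECT queryid,
           calls,
           round(mean_exec_time::numeric, 2) AS mean_ms,
           left(query, 60)                   AS query_snippet
    FROM   pg_stat_statements
    WHERE  mean_exec_time > 500        -- milliseconds; tune to taste
    ORDER  BY mean_exec_time DESC
    LIMIT  10;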

In regards to system tuning: while small databases can, after a fashion, perform satisfactorily using the default values, large databases cannot. Configuring initial tuning parameters such as shared_buffers is de rigueur, but you should also monitor the environment in order to trend issues such as bloat and long-term query performance. Remember, the most common problem experienced by an otherwise stable and well-thought-out architecture is table and index bloat. Addressing bloat by tuning the autovacuum characteristics is essential.
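
For instance, a first pass at such tuning might look like the sketch below; the values and the table name are placeholders rather than recommendations, and should be derived from your own workload and hardware:

    -- Raise the initial memory settings from their conservative defaults
    -- (shared_buffers takes effect only after a server restart).
    ALTER SYSTEM SET shared_buffers = '8GB';
    ALTER SYSTEM SET effective_cache_size = '24GB';

    -- Make autovacuum more aggressive on a table known to churn heavily,
    -- so dead tuples are reclaimed before bloat accumulates.
    ALTER TABLE orders SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_analyze_scale_factor = 0.02
    );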

Monitoring, especially before and after maintenance windows, is required because it can catch problems introduced by an update before they become production issues.

Pay close attention to the following regular maintenance activities during the life cycle of your system:

  • Logical backups and misc database redundancies
  • Architectural evolution:
    • application stack updates, upgrades, and rollbacks
    • application/database scaling
  • PostgreSQL server upgrades:
    • minor
    • major
  • Database migrations
  • Hardware upgrades
  • Scaling the cluster by adding and removing server nodes

Maintenance activities such as logical backups and PostgreSQL minor upgrades are performed at regular intervals.

Plan for space utilization requirements of logical dumps and WAL archives.
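
As a starting point for that planning, the current footprint of the write-ahead log directory can be queried directly on PostgreSQL 10 and later; a minimal sketch:

    -- Current size of the write-ahead log directory.
    SELECT count(*)                  AS wal_segments,
           pg_size_pretty(sum(size)) AS wal_size
    FROM   pg_ls_waldir();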

In regards to logical backups: it can be difficult to justify backing up an entire database when doing so takes a week. Differential backups are a potential solution: tables that are updated and deleted frequently can be backed up more often than slower-changing tables, which can be stored without changes for a longer period of time. This approach, however, requires the appropriate architectural design considerations, such as the table partitioning sketched below.
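
A minimal sketch of such a design, using a hypothetical measurements table partitioned by month: older partitions stop changing and can be dumped once and archived, while only the current partition needs frequent backups.

    -- Hypothetical range-partitioned table: only the newest partition churns.
    CREATE TABLE measurements (
        recorded_at timestamptz NOT NULL,
        reading     numeric
    ) PARTITION BY RANGE (recorded_at);

    -- Historical partition: effectively read-only, dump it once and archive it.
    CREATE TABLE measurements_2024_12 PARTITION OF measurements
        FOR VALUES FROM ('2024-12-01') TO ('2025-01-01');

    -- Current partition: changes constantly, back it up frequently
    -- (e.g. pg_dump --table=measurements_2025_01).
    CREATE TABLE measurements_2025_01 PARTITION OF measurements
        FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');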

An alternative to logical backups is to consider Copy On Write (COW), or stacked, file systems such as ZFS and BTRFS. Containerized environments, for example, can leverage snapshots and clones, allowing for near-instant recoveries in a disaster recovery scenario.

Complex operations, such as hardware and database scaling, encompass many sub-activities and can often involve working with several teams at the same time. In this case, maintaining reference documentation is critical. Activities such as these are best tracked and planned in a Kanban, or Scrum, environment.

In regards to Disaster Recovery (DR), consider automating the following operations:

  • Recovery via Logical backups
  • Failover of a PRIMARY to a REPLICA (a minimal promotion call is sketched after this list)
  • Dropping/Building a new REPLICA
  • Point In Time Recovery (PITR): rebuilding a cluster to a point in time
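
As a small illustration of the failover item above, promotion of a REPLICA can be triggered from SQL on PostgreSQL 12 and later; in practice this call is wrapped in whatever automation drives the failover. A minimal sketch:

    -- Run on the REPLICA being promoted; waits up to 60 seconds for completion.
    SELECT pg_promote(wait => true, wait_seconds => 60);

    -- Verify the node has left recovery and now accepts writes (returns false).
    SELECT pg_is_in_recovery();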

As an aside regarding PITR: instead of rebuilding an entire database cluster from scratch to a particular point in time, one can create a STANDBY host that replicates on a delay and can be rolled forward to a particular point in time or promoted in its current state. Refer to the run-time parameter recovery_min_apply_delay for more information.
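
A minimal sketch of how such a delayed STANDBY might be configured, with the four-hour delay being an arbitrary placeholder:

    -- On the STANDBY: hold back WAL replay by four hours, giving a window
    -- in which a mistake made on the PRIMARY has not yet been applied here.
    ALTER SYSTEM SET recovery_min_apply_delay = '4h';
    SELECT pg_reload_conf();

    -- In an emergency, freeze the STANDBY at its current (delayed) state
    -- while deciding whether to roll it forward further or promote it as-is.
    SELECT pg_wal_replay_pause();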

In conclusion, while small databases can be administered in an ad hoc manner, a large database must always be administered with a more rigorous and conscientious approach. And what you learn from administering a large database can be carried over to administering a small one.
