Achieve high uptime with Flexible Server PostgreSQL HA!

sridrang · Jun 20 2022

Azure Database for PostgreSQL Flexible Server addresses several fundamental requirements for running mission-critical workloads, including security, availability, reliability, scalability, performance, and business continuity and disaster recovery.

This blog focuses on the high availability (HA) aspect of Flexible Server PostgreSQL, including two new HA capabilities:

Ability to deploy the standby in the same zone as the primary (same-zone HA).
Ability to choose the standby AZ for zone-redundant HA.


What is the Flexible Server PostgreSQL high availability architecture?

Flexible Server PostgreSQL deploys a standby server with the same compute and storage configuration as the primary on another physical node within the region. Depending on your HA deployment choice, the standby server is deployed in the same availability zone (AZ) as the primary or in a different AZ. With health monitoring and automatic failover in place, the Flexible Server HA configuration helps maintain high uptime during both planned and unplanned outages.

The Flexible Server HA architecture uses PostgreSQL streaming replication to stream write-ahead log (WAL) records to the standby server in synchronous mode. Application writes and commits are first written to the primary server's WAL, which is streamed to the standby server. The application writes are acknowledged only after the WAL data is persisted on the standby. This provides zero data loss in the event of a failover. See this documentation for more details. Currently, the standby server cannot be used to run read workloads.
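To make the acknowledgment order concrete, here is a toy Python model of synchronous replication. It is not how Flexible Server is implemented: the `Server` and `SyncReplicatedPair` classes are hypothetical, and the WAL is just an in-memory list. It illustrates the property described above: a commit is acknowledged only after the record is persisted on both servers, so after a failover the promoted standby already holds every acknowledged commit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    """A toy server whose WAL is an in-memory, append-only list (hypothetical)."""
    name: str
    wal: List[str] = field(default_factory=list)
    up: bool = True

    def persist(self, record: str) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.wal.append(record)


class SyncReplicatedPair:
    """Primary plus standby; a commit is acknowledged only after the WAL
    record has been persisted on BOTH servers."""

    def __init__(self) -> None:
        self.primary = Server("primary")
        self.standby = Server("standby")

    def commit(self, record: str) -> bool:
        self.primary.persist(record)      # 1. write to the primary's WAL
        try:
            self.standby.persist(record)  # 2. stream to the standby and wait
        except ConnectionError:
            # Simplification: a real PostgreSQL primary would keep waiting
            # for the standby; here we just report "not acknowledged".
            return False
        return True                       # 3. acknowledge to the application

    def failover(self) -> None:
        # Promote the standby and mark the old primary as down.
        self.primary, self.standby = self.standby, self.primary
        self.standby.up = False


pair = SyncReplicatedPair()
acked = [r for r in ("txn-1", "txn-2", "txn-3") if pair.commit(r)]
pair.failover()
# Every acknowledged commit is already in the new primary's WAL.
print(pair.primary.name, pair.primary.wal)  # prints: standby ['txn-1', 'txn-2', 'txn-3']
```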

Flexible Server PostgreSQL uses Premium managed disks (locally redundant storage within the AZ, with three copies of the data) to store the data and logs of each server. With HA configured, you therefore have six copies of the data across the primary and standby servers, which provides high data resiliency and isolation. Periodic data backups (snapshots) are taken from the primary server, and WAL files are continuously archived to backup storage. Both the snapshot and WAL file backups are stored on zone-redundant storage (ZRS) in regions that support AZs; otherwise, they are stored on locally redundant storage (LRS).

For the detailed architecture, steady-state operations, planned and unplanned downtime experience, and HA workflow mechanisms, see the HA documentation.

What HA deployment models are available with Flexible Server PostgreSQL?

Flexible Server PostgreSQL supports two HA deployment models.

Zone-redundant HA
Same-zone HA

1. Zone-redundant HA

You can configure your server in zone-redundant HA mode, in which the primary and standby servers are deployed in different AZs within a region. You can now choose the AZ for your standby server, which gives you more control to co-locate your clients and applications with your databases in both the primary and standby AZs. Zone-redundant HA offers a 99.99% uptime SLA. See here for details.

Figure 1: Diagram of zone-redundant HA architecture

2. Same-zone HA

The other HA deployment model, which we recently introduced, is same-zone HA. With this option, your standby server is automatically provisioned in the same AZ as the primary. This reduces the round-trip latency for replicating write and commit logs, because the traffic stays within the AZ instead of crossing AZs (cross-AZ latency can be up to 2 ms), while still providing compute and storage isolation. This deployment model is also useful for providing redundancy in regions that don't yet support AZs or that have restrictions on deploying zone-redundant HA. Same-zone HA offers a 99.95% uptime SLA. See here for details.

Figure 2: Diagram of same-zone HA architecture
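As a rough, simplified model of why the extra round trip matters: if each commit has to wait for at least one replication round trip to the standby, a single session that issues commits one after another cannot exceed 1 / RTT commits per second. The helper below is hypothetical, and the 0.5 ms intra-AZ round trip in the usage loop is an assumed value for illustration only; the 2 ms figure is the cross-AZ upper bound mentioned above.

```python
def max_serial_commits_per_sec(replication_rtt_ms: float,
                               local_commit_ms: float = 0.0) -> float:
    """Upper bound on commits/second for ONE session that issues commits
    back to back, assuming every commit waits for one replication round trip."""
    per_commit_ms = replication_rtt_ms + local_commit_ms
    if per_commit_ms <= 0:
        raise ValueError("per-commit time must be positive")
    return 1000.0 / per_commit_ms


# 2 ms: upper bound for cross-AZ latency cited in the text.
# 0.5 ms: ASSUMED intra-AZ round trip, for illustration only.
for label, rtt_ms in [("cross-AZ (2 ms)", 2.0), ("same-AZ (0.5 ms, assumed)", 0.5)]:
    print(f"{label}: <= {max_serial_commits_per_sec(rtt_ms):.0f} commits/s per session")
```

These are upper bounds for one serial session only: they ignore local disk and processing time, and concurrent sessions overlap their waits, so aggregate throughput can be much higher.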

How do I deploy, manage, and test HA?

Deploy HA

Flexible Server PostgreSQL provides a click-button experience for deploying an HA configuration. You can also use the Azure CLI, ARM templates, SDKs, or Terraform to deploy your servers. By default, HA is enabled for Memory Optimized SKUs (intended for large production workloads). Once you check the HA box and choose a deployment model, the service deploys the standby server in the same AZ or in a different AZ, depending on your choice.

Figure 3: Create screen experience for enabling HA and choosing the deployment model

In regions where AZs are not supported, same-zone HA is the only HA deployment model available, and you cannot choose the AZs.

Figure 4: Screenshot of same-zone selection
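The deployment rules above can be sketched as a small validation helper. `ha_properties` is hypothetical and is not part of any Azure SDK. Its output keys are modeled on the `highAvailability` block of the ARM `Microsoft.DBforPostgreSQL/flexibleServers` resource (`mode` set to `Disabled`, `SameZone`, or `ZoneRedundant`, plus an optional `standbyAvailabilityZone`) as we understand it; check the API reference for your API version before relying on these names.

```python
from typing import Dict, Optional

HA_MODES = {"Disabled", "SameZone", "ZoneRedundant"}


def ha_properties(mode: str, region_has_azs: bool,
                  primary_zone: Optional[str] = None,
                  standby_zone: Optional[str] = None) -> Dict[str, str]:
    """Validate an HA choice against the rules in this post and return a
    highAvailability-style dict (hypothetical helper, not an Azure SDK call)."""
    if mode not in HA_MODES:
        raise ValueError(f"unknown HA mode: {mode!r}")
    if mode == "Disabled":
        return {"mode": "Disabled"}
    if mode == "SameZone":
        # The standby is placed in the primary's AZ automatically.
        if standby_zone is not None and standby_zone != primary_zone:
            raise ValueError("same-zone HA places the standby in the primary's AZ")
        return {"mode": "SameZone"}
    # Zone-redundant HA from here on.
    if not region_has_azs:
        raise ValueError("only same-zone HA is available in regions without AZs")
    if standby_zone is not None and standby_zone == primary_zone:
        raise ValueError("zone-redundant HA needs the standby in a different AZ")
    props = {"mode": "ZoneRedundant"}
    if standby_zone is not None:
        # Optional: choose the standby AZ explicitly.
        props["standbyAvailabilityZone"] = standby_zone
    return props


print(ha_properties("ZoneRedundant", region_has_azs=True,
                    primary_zone="1", standby_zone="2"))
print(ha_properties("SameZone", region_has_azs=False))
```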

Manage HA

After server creation, whether or not you enabled HA at that time, you can also:

Enable HA
Disable HA
Change the HA deployment model (requires you to first disable HA and then choose a different model)

See the how-to guide on managing your HA server.

Figure 5: Screenshot of HA blade to disable HA
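Because changing the deployment model requires disabling HA first, the allowed changes form a small state machine. The sketch below uses a hypothetical `plan_ha_change` helper and the same mode names as the ARM schema; it returns the sequence of steps needed to move from one HA setting to another.

```python
from typing import List

# Allowed single-step changes: HA can be enabled with either model or
# disabled, but one model cannot be switched directly to the other.
ALLOWED_STEPS = {
    ("Disabled", "SameZone"),
    ("Disabled", "ZoneRedundant"),
    ("SameZone", "Disabled"),
    ("ZoneRedundant", "Disabled"),
}


def plan_ha_change(current: str, target: str) -> List[str]:
    """Return the HA modes to apply, in order, to go from current to target."""
    if current == target:
        return []
    if (current, target) in ALLOWED_STEPS:
        return [target]
    # Changing the deployment model: disable HA first, then enable the new one.
    if (current, "Disabled") in ALLOWED_STEPS and ("Disabled", target) in ALLOWED_STEPS:
        return ["Disabled", target]
    raise ValueError(f"no supported path from {current!r} to {target!r}")


print(plan_ha_change("ZoneRedundant", "SameZone"))  # ['Disabled', 'SameZone']
print(plan_ha_change("Disabled", "ZoneRedundant"))  # ['ZoneRedundant']
```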

Test HA

You can also use the on-demand Forced failover option to test your application's connectivity to the database server, observe application downtime during failovers, and improve your retry mechanism. Forced failover triggers a fault on your primary server and initiates the failover workflow. You can also use the Planned failover option to bring the primary server back to your preferred AZ.

Figure 6: Screenshot to perform on-demand Forced failover
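A common way to ride out a failover is to retry connections with exponential backoff and jitter. The sketch below is a generic, driver-agnostic helper (`connect_with_retry` is hypothetical): you pass in your own connect function and the exception types your driver raises for transient failures (for example, psycopg2 raises `psycopg2.OperationalError` when it cannot connect). The backoff values are illustrative, not tuned recommendations.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, backing off exponentially with
    full jitter between attempts; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # 1 s, 2 s, 4 s, ... capped at max_delay, then randomized so that
            # many clients do not reconnect in lockstep after a failover.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


# Demo with a fake connection that fails twice, as if a failover were in progress.
calls = 0


def flaky_connect() -> str:
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("server is failing over")
    return "connected"


print(connect_with_retry(flaky_connect, sleep=lambda s: None), "after", calls, "attempts")
```

Running a Forced failover while your application uses a helper like this lets you see how many attempts, and how much total wait, a real failover needs in your environment.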

Comparing Zone-redundant HA vs Same-zone HA

Availability feature	Zone-redundant HA	Same-zone HA
Standby server with synchronous replication	Yes	Yes
Server-level protection	Yes	Yes
Storage – 3x redundant copy	Yes	Yes
Compute auto-restart after a failure	Yes	Yes
Reduced downtime during scheduled maintenance with HA	Yes	Yes
AZ-level protection for compute & storage	Yes	No*

*Using the zone-redundant backup (if available in the region), you can do a point-in-time restore to a different AZ within the region.

What are the HA limitations?

See the documentation for the list of limitations when deploying HA with Flexible Server.

What about availability for non-HA servers?

Flexible Server offers robust resiliency and availability for your databases even without HA configured, and you get the following benefits without incurring twice the cost:

Three copies of data on Premium managed disks with auto-repair capabilities.
Backup data on zone-redundant Azure Blob Storage. This provides zone-level resiliency: you can restore your data to another AZ if your server's AZ goes down.
The database server is automatically restarted if it goes down for any reason.
The compute VM is automatically relocated within the AZ in the event of issues such as a node crash.
An uptime SLA of 99.9% is offered for non-HA deployments. See here for details.

However, depending on the outage, you may experience some downtime (a longer recovery time objective, or RTO). For example, if a node crashes, your application experiences downtime until a new VM is provisioned. This may be acceptable for test/dev environments, but for mission-critical workloads that demand high uptime during planned and unplanned outages, we highly recommend deploying with an HA configuration.
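To put the SLA tiers in perspective, the snippet below converts an uptime percentage into the maximum downtime it allows over a period. `max_downtime_minutes` is a hypothetical helper, and it assumes a 30-day month for simplicity; the actual measurement period and what counts as downtime are defined in the Azure SLA, not here.

```python
def max_downtime_minutes(sla_percent: float, period_days: float = 30.0) -> float:
    """Maximum downtime (minutes) that still meets an uptime percentage
    over a period; a 30-day month is assumed by default."""
    period_minutes = period_days * 24 * 60
    return (100.0 - sla_percent) / 100.0 * period_minutes


for label, sla in [("Zone-redundant HA", 99.99),
                   ("Same-zone HA", 99.95),
                   ("Non-HA", 99.9)]:
    print(f"{label}: {sla}% uptime -> up to {max_downtime_minutes(sla):.2f} min per 30 days")
```

For a 30-day month (43,200 minutes), this works out to about 4.32 minutes for 99.99%, 21.6 minutes for 99.95%, and 43.2 minutes for 99.9%.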

How about HA across regions?

Many of you have asked about HA across regions. We treat a regional fault as a disaster recovery (DR) scenario. Geo-redundant backup is currently in preview, and we are also planning to address geo-DR using asynchronous replication in the future.

What are your next steps?

Learn about Flexible Server
You can explore the Flexible Server documentation, which is a great place to roll up your sleeves. Also visit our website to learn more about our Azure Database for PostgreSQL managed service.
If you want to know more about business continuity, you can check the options listed here.

For all your feedback, questions, and feature requests, you can always reach out to us via email at Ask Azure DB for PostgreSQL.
