A journey through the infinite combinations of processes and architectures that PostgreSQL and its open source ecosystem provide within disaster recovery and high availability to help you achieve peace of mind. Starting from zero.
5. FROM 0 TO ~100:
BUSINESS
CONTINUITY WITH
POSTGRESQL
Gabriele Bartolini
Head of Support @ 2ndQuadrant
PgDay.IT 2017, Milan
6. 2ndquadrant.com
@_GBartolini_ #PGDayIT
ABOUT MYSELF
▸ Open Source passionate and programmer since 1995
▸ First time with Postgres in 1997, regular from ~2000
▸ Lean and DevOps practitioner
▸ Co-Founder of ITPUG and PostgreSQL Europe
▸ Entrepreneur, with 2ndQuadrant since 2008
▸ Co-Author of “PostgreSQL Administration Cookbook”
▸ Came up with the name “Barman”
10. 2ndquadrant.com
@_GBartolini_ #PGDayIT
SOME NOTES FOR THIS PRESENTATION
▸ PostgreSQL on Linux
▸ Servers can be either physical or virtual
▸ Storage must be redundant
▸ RAID is required
▸ VOLUME: redundant disk mounted on a system
20. 2ndquadrant.com
@_GBartolini_ #PGDayIT
DEFINING SOME OBJECTIVES
▸ Measure time for pg_restore
▸ RPO = backup frequency
▸ RTO = maximum time of recovery
▸ Provision another server
▸ Configure another server (automated, right?)
▸ Time to restore the last backup (measure it)
26. 2ndquadrant.com
@_GBartolini_ #PGDayIT
POSTGRESQL’S PITR
▸ Part of core (fully open source)
▸ Rebuild a cluster at a point in time
▸ From crash recovery to sync streamrep (physical/logical)
▸ RPO = 0 (zero data loss)
▸ Hot base backup, continuous WAL archiving, Recovery
▸ API
27. 2ndquadrant.com
@_GBartolini_ #PGDayIT
BASIC CONCEPTS
▸ Continuous copy of WAL data (continuous archiving)
▸ Physical base backups
▸ Recovery:
▸ copy base backup to another location
▸ recovery mode (replay of WALs until target)
28. 2ndquadrant.com
@_GBartolini_ #PGDayIT
BARMAN
▸ In this presentation: Barman 2.3
▸ Open Source (GNU GPL 3)
▸ Written in Python
▸ Developed and maintained by 2ndQuadrant
▸ Available at www.pgbarman.org
32. 2ndquadrant.com
@_GBartolini_ #PGDayIT
COPY METHOD
▸ PostgreSQL streaming
▸ Practical/Windows/Docker
▸ Rsync/SSH
▸ Incremental backup and recovery (via hard links)
▸ Parallel backup and recovery
▸ Network compression and bandwidth limitation
33. 2ndquadrant.com
@_GBartolini_ #PGDayIT
WAL SHIPPING METHOD
▸ “archiving”, through “archive_command”:
▸ RPO ~ 16MB of WAL data, or
▸ “archive_timeout”
▸ “streaming”, through streaming replication:
▸ “pg_receivewal” or “pg_receivexlog”
▸ continuous stream, RPO ~ 0
▸ PostgreSQL 9.2+ required
34. 2ndquadrant.com
@_GBartolini_ #PGDayIT
EXAMPLE FROM POSTGRESQL.CONF
archive_mode = on
wal_level = logical
max_wal_senders = 10
max_replication_slots = 10
archive_command = 'rsync -a %p
barman@HOST:/var/lib/barman/ID/incoming'
35. 2ndquadrant.com
@_GBartolini_ #PGDayIT
EXAMPLE FROM BARMAN.CONF
[angus]
description = “Angus Young database"
ssh_command = ssh postgres@angus
conninfo = user=barman-acdc dbname=postgres host=angus
retention_policy = RECOVERY WINDOW OF 6 MONTHS
copy_method = rsync
reuse_backup = link
parallel_jobs = 4
archiver = true
streaming_archiver = true
slot_name = barman_streaming_acdc
36. 2ndquadrant.com
@_GBartolini_ #PGDayIT
RECAP
▸ How do you feel now?
▸ Still: RPO = ∞ and RTO = n/a. Why?
▸ A backup is valid only if you have tested it
▸ Barman reduces backup risks, does not exclude them
▸ Systematic tests (especially custom scripts)
▸ Business risk is very high
42. 2ndquadrant.com
@_GBartolini_ #PGDayIT
EXAMPLE OF RECOVERY SCRIPT
▸ Write a bash script that:
▸ connects to a remote server via SSH
▸ stops the PostgreSQL server
▸ issues a “barman recover” with target “immediate”
▸ starts the PostgreSQL
▸ Set it as post-backup script
43. 2ndquadrant.com
@_GBartolini_ #PGDayIT
SOME FOOD FOR THOUGHT
▸ Outcomes:
▸ Systematically test your backup
▸ Measure your recovery time
▸ Identical server? This is a backup server ready to start
▸ You can use a different data centre
▸ Be creative, PostgreSQL gives you infinite freedom!
44. 2ndquadrant.com
@_GBartolini_ #PGDayIT
RECAP
▸ RPO ~ 0 (your backups work, every time)
▸ RTO = Time of reaction + Recovery time
▸ Example: RPO ~0 and RTO < 1 day
▸ Acceptable or not acceptable?
▸ Entry level architecture for business continuity
▸ Priority now: improve RTO
46. 2ndquadrant.com
@_GBartolini_ #PGDayIT
POSTGRESQL’S REPLICATION
▸ Part of core (fully open source)
▸ One master, multiple standby servers
▸ Evolution of PITR
▸ Standby server is in continuous recovery mode
▸ Hot standby (read-only)
▸ Both streaming (9.0+) and file based pulling of WAL
▸ Cascading from a standby
47. 2ndquadrant.com
@_GBartolini_ #PGDayIT
SYNCHRONOUS REPLICATION
▸ Fine control (from global down to transaction level)
▸ 2-safe replication
▸ COMMIT of a write transactions waits until written on
both the master and a standby (or more from 9.6)
▸ More than a synchronous client is required
▸ Read consistency of a cluster
▸ RPO = 0 (zero data loss)
51. 2ndquadrant.com
@_GBartolini_ #PGDayIT
SWITCHOVER (PLANNED)
▸ Applications are paused (start of downtime)
▸ Shut down the master
▸ Allow the standby to catch up with the master
▸ Promote the standby
▸ Switch virtual IPs
▸ Resume applications (end of downtime)
▸ Reconfigure the former master as standby
54. 2ndquadrant.com
@_GBartolini_ #PGDayIT
RECAP
▸ RPO ~ 0 (your backups work, every time)
▸ RTO = Time of reaction + Time of promotion
▸ Criticality: manual intervention
▸ Reliable monitoring
▸ Trained people (practice & docs!)
55. 2ndquadrant.com
@_GBartolini_ #PGDayIT
MANUAL FAILOVER VS AUTOMATED FAILOVER
▸ Risk management
▸ Split brain nightmare
▸ Automated is built on manual (test!)
▸ Your choice
▸ Very good solution for business continuity
▸ Uptime > 99.99% in a year
58. 2ndquadrant.com
@_GBartolini_ #PGDayIT
SYNCHRONOUS REPLICATION
▸ Primary: Barman
▸ Zero data loss backup
▸ Primary: Standby
▸ Zero data loss cluster (reduce RTO)
▸ Just one configuration line in PostgreSQL
▸ synchronous_standby_names = '1 (ha, barman_receive_wal)'
59. ~100.
TWO POSTGRES SYNC SERVERS
+ ONE BARMAN SERVER
+ ONE RECOVERY SERVER
+ REPMGR (AUTO-FAILOVER)
63. 2ndquadrant.com
@_GBartolini_ #PGDayIT
CONCLUSIONS
▸ Babysteps and KISS
▸ New? Explore and learn
▸ Practice is the only way to mastery (drills)
▸ Plan regular healthy downtimes
▸ Use switchovers to perform PostgreSQL updates
▸ Smart downtimes increase long-term uptime
65. 2ndquadrant.com
@_GBartolini_ #PGDayIT
LICENCE
Attribution 4.0 International (CC BY 4.0)
You are free to:
▸ Share — copy and redistribute the material in any medium or
format
▸ Adapt — remix, transform, and build upon the material for any
purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow
the license terms.