In modern database infrastructures, high availability (HA) is critical for ensuring that systems remain operational during node failures, maintenance, or other disruptions. PostgreSQL, while powerful, lacks built-in tools for easily achieving HA. This is where Patroni comes in as a PostgreSQL HA solution that manages replication and failover. Combined with ETCD as the distributed key-value store and HA Proxy as the load balancer, you can build a resilient, highly available PostgreSQL cluster.
This blog will walk through setting up a highly available PostgreSQL cluster using:
Patroni is an open-source tool designed to manage PostgreSQL high-availability (HA) clusters. It provides an automated and reliable solution for database failover and replication, ensuring that your PostgreSQL setup remains resilient and available in case of node failures. Patroni helps coordinate leader election, monitor cluster health, and promote standby nodes to the primary role when necessary. Its ease of configuration and integration with existing infrastructure makes it a popular choice for managing high-availability PostgreSQL environments.
ETCD is a distributed key-value store that plays a crucial role in achieving high availability (HA) for PostgreSQL clusters. As a reliable coordination service, ETCD is often used to store configuration data and coordinate cluster nodes in systems like Patroni. In a high-availability PostgreSQL setup, ETCD helps manage leader elections, monitor cluster health, and ensure data consistency across multiple database instances.
By providing a consensus-based mechanism, ETCD ensures that only one PostgreSQL node acts as the primary (leader) at any given time, while others function as replicas. In the event of a primary node failure, ETCD facilitates the automatic promotion of a replica to the primary role, minimizing downtime and ensuring continuous availability. Its simplicity, speed, and strong consistency make it a popular choice for managing distributed PostgreSQL clusters.
HA Proxy is a high-performance, open-source load balancer and proxy server that plays a key role in managing high availability (HA) in PostgreSQL clusters. In a high-availability setup, HAProxy is often used to route client connections to the correct PostgreSQL node, ensuring that read and write requests are directed appropriately.
HA Proxy helps manage traffic between the application and the database cluster by automatically directing connections to the primary node for write operations and to replica nodes for read operations. If the primary PostgreSQL instance fails, HA Proxy, in combination with tools like Patroni, can seamlessly reroute traffic to a newly promoted primary, ensuring minimal downtime.
With its fast and reliable performance, HA Proxy is widely used in PostgreSQL HA setups to balance load, handle failover, and provide a single-entry point for clients, thus improving both availability and scalability.
Keepalived is an open-source routing software that enhances the availability of services by implementing high-availability features for systems like HAProxy, Keepalived enables multiple HAProxy instances to share a virtual IP address, ensuring that clients can always reach the load balancer, even if one instance fails.
In a typical setup, Keepalived monitors the health of the HAProxy nodes. If the primary HAProxy instance becomes unresponsive, Keepalived automatically transfers the virtual IP address to a standby instance, allowing continuous access to the backend services without manual intervention. This seamless failover capability enhances the resilience and uptime of applications, making Keepalived an essential component in high-availability architectures.
The architecture includes:
The ETCD cluster will store the state of the Patroni cluster and help with leader election.
On each node, modify the /etc/default/ETCD or /etc/ETCD/ETCD.conf file (depending on your OS).
On ETCD1, the configuration looks like this:
ETCD_NAME=node1
ETCD_DATA_DIR="/var/lib/ETCD"
ETCD_LISTEN_PEER_URLS="http://<node1-ip>:2380"
ETCD_LISTEN_CLIENT_URLS="http://<node1-ip>:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://<node1-ip>:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://<node1-ip>:2379"
ETCD_INITIAL_CLUSTER="node1=http://<node1-ip>:2380,node2=http://<node2-ip>:2380,node3=http://<node3-ip>:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="ETCD-cluster-1"
Repeat the configuration on ETCD2 and ETCD3 by changing the ETCD_NAME and IPs accordingly.
Each PostgreSQL node running Patroni needs a configuration file. Below is an example configuration for the first Patroni node. On both Patroni nodes, create a Patroni configuration file /etc/patroni.yml:
scope: pg_cluster
namespace: /service/
name: patroni1
restapi:
listen: 0.0.0.0:8008
connect_address: <patroni1_name>:8008
ETCD:
host: <ETCD_cluster_address>:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
postgresql:
use_pg_rewind:true
parameters:
wal_level: replica
hot_standby: "on"
wal_keep_segments: 8
max_wal_senders: 5
max_replication_slots: 5
synchronous_commit: "off"
initdb:
- encoding: UTF8
- data-checksums
pg_hba:
- host replication replicator 0.0.0.0/0 md5
- host all all 0.0.0.0/0 md5
users:
admin:
password: admin_pass
options:
- createrole
- createdb
postgresql:
listen: 0.0.0.0:5432
connect_address: <patroni1_name>:5432
data_dir: /var/lib/postgresql/12/main
bin_dir: /usr/lib/postgresql/12/bin
authentication:
replication:
username: replicator
password: repl_pass
superuser:
username: postgres
password: postgres_pass
parameters:
unix_socket_directories: '/var/run/postgresql'
On the second node, adjust name, connect_address, and any other node-specific details.
sudo systemctl start patroni
Step 3 Setup HA Proxy
Configure HAProxy (2 Nodes)
On both HAProxy nodes, edit /etc/haproxy/haproxy.cfg:
global
log /dev/log local0
log /dev/log local1 notice
maxconn 2000
user haproxy
group haproxy
defaults
log global
option redispatch
option tcplog
retries 3
timeout connect 10s
timeout client 1m
timeout server 1m
frontend postgresql
bind *:5432
default_backend postgresql-backend
backend postgresql-backend
balance roundrobin
option tcp-check
server postgresql1<patroni-node1-ip>:5432 check port 8008
server postgresql2 <patroni-node2-ip>:5432 check port 8008
Start HA PROXY
Step 4: Setup Keepalived (2 Nodes)
Install Keepalived
On both HAProxy nodes, edit /etc/keepalived/keepalived.conf to define a virtual IP (VIP) for failover.
Node1:
vrrp_script chk_haproxy {
script "pidof haproxy"
interval 2
weight 2
}
vrrp_instance VI_1 {
interface eth0
state MASTER
virtual_router_id 51
priority 101
advert_int 1
authentication {
auth_type PASS
auth_pass password123
}
virtual_ipaddress {
192.168.1.100
}
track_script {
chk_haproxy
}
}
Node 2:
vrrp_instance VI_1 {
interface eth0
state BACKUP
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass password123
}
virtual_ipaddress {
192.168.1.100
}
track_script {
chk_haproxy
}
}
Start and enable Keepalived:
Check Cluster with patronictl
patronictl list
+ Cluster: postgres-cluster (6958757123490168240) --------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------+-------------+---------+---------+----+-----------+
| patroni1 | <patroni1_ip> | Leader | running | 3 | |
| patroni2 | <patroni2_ip> | Replica | running | 3 | 0 |
+-----------+-------------+---------+---------+----+-----------+
This confirms that the PostgreSQL cluster is running with one leader and one replica.
Setting up a highly available PostgreSQL cluster with ETCD, Patroni, HAProxy, and Keepalived ensures that your database remains online, even in the event of multiple node failures. This HA setup distributes responsibilities across various layers:
By following this guide, you can create a robust and resilient PostgreSQL environment capable of handling node failures with minimal downtime.