
Building a High Availability PostgreSQL Cluster
Achieving resilient PostgreSQL with etcd, Patroni, HAProxy, and Keepalived. Setting up a seamless, high-availability cluster with automated failover and load balancing.
Oct 21, 2024 | 7 min read

In modern database infrastructures, high availability (HA) is critical for keeping systems operational through node failures, maintenance windows, and other disruptions. PostgreSQL, while powerful, does not ship with a turnkey HA solution. This is where Patroni comes in: a PostgreSQL HA tool that manages replication and failover. Combined with etcd as the distributed key-value store and HAProxy as the load balancer, it lets you build a resilient, highly available PostgreSQL cluster.

This blog will walk through setting up a highly available PostgreSQL cluster using:

  • 3 etcd nodes (for distributed consensus)
  • 2 Patroni nodes (for PostgreSQL replication and failover)
  • 2 HAProxy nodes (for load balancing)
  • Keepalived (for VIP failover)

What is Patroni?

Patroni is an open-source tool for managing PostgreSQL high-availability (HA) clusters. It provides an automated, reliable solution for database failover and replication, ensuring that your PostgreSQL setup remains resilient and available when nodes fail. Patroni coordinates leader election, monitors cluster health, and promotes a standby node to the primary role when necessary. Its ease of configuration and integration with existing infrastructure make it a popular choice for managing highly available PostgreSQL environments.

What is etcd?

etcd is a distributed key-value store that plays a crucial role in achieving high availability for PostgreSQL clusters. As a reliable coordination service, etcd is used to store configuration data and coordinate cluster nodes in systems like Patroni. In a high-availability PostgreSQL setup, etcd underpins leader elections, cluster-health tracking, and data consistency across multiple database instances.

By providing a consensus-based mechanism, etcd ensures that only one PostgreSQL node acts as the primary (leader) at any given time, while the others function as replicas. If the primary node fails, etcd facilitates the automatic promotion of a replica to the primary role, minimizing downtime and ensuring continuous availability. Its simplicity, speed, and strong consistency make it a popular choice for coordinating distributed PostgreSQL clusters.

What is HAProxy?

HAProxy is a high-performance, open-source load balancer and proxy server that plays a key role in PostgreSQL HA setups. It routes client connections to the correct PostgreSQL node, ensuring that read and write requests are directed appropriately.

HAProxy manages traffic between the application and the database cluster by directing connections to the primary node for write operations and, where configured, to replica nodes for reads. If the primary PostgreSQL instance fails, HAProxy, in combination with tools like Patroni, seamlessly reroutes traffic to the newly promoted primary, keeping downtime minimal.

With its fast and reliable performance, HAProxy is widely used in PostgreSQL HA setups to balance load, handle failover, and provide a single entry point for clients, improving both availability and scalability.

What is Keepalived?

Keepalived is open-source routing software that adds high availability to services such as HAProxy. It enables multiple HAProxy instances to share a virtual IP address (VIP), ensuring that clients can always reach the load balancer, even if one instance fails.

In a typical setup, Keepalived monitors the health of the HAProxy nodes. If the primary HAProxy instance becomes unresponsive, Keepalived automatically transfers the virtual IP address to a standby instance, allowing continuous access to the backend services without manual intervention. This seamless failover capability enhances the resilience and uptime of applications, making Keepalived an essential component in high-availability architectures.

Architecture Overview

[Architecture diagram: clients connect to a Keepalived-managed VIP in front of two HAProxy nodes, which route traffic to the two-node Patroni/PostgreSQL cluster, coordinated by a three-node etcd cluster.]

The architecture includes:

  • 3 etcd nodes for distributed configuration management and failover coordination.
  • 2 Patroni nodes running PostgreSQL. Patroni ensures that one node acts as the leader while the other is a follower (replica).
  • 2 HAProxy nodes that forward database traffic to the active Patroni leader, providing load balancing and failover.
  • Keepalived running on the HAProxy nodes to manage the VIP. If one HAProxy node fails, Keepalived automatically shifts the VIP to the surviving node.

Step-by-Step Demonstration of the HA Setup

Step 1: Setting Up a 3-Node etcd Cluster

The etcd cluster stores the state of the Patroni cluster and supports leader election.

Install etcd on all three etcd nodes:

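On Debian/Ubuntu hosts, for example, the distribution package can be installed with apt (on newer releases the package is split into etcd-server and etcd-client):

sudo apt update
sudo apt install -y etcd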

Configure etcd on all three nodes:

On each node, modify the /etc/default/etcd or /etc/etcd/etcd.conf file (depending on your OS).

On the first node (node1), the configuration looks like this:

ETCD_NAME=node1
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="http://<node1-ip>:2380"
ETCD_LISTEN_CLIENT_URLS="http://<node1-ip>:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://<node1-ip>:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://<node1-ip>:2379"
ETCD_INITIAL_CLUSTER="node1=http://<node1-ip>:2380,node2=http://<node2-ip>:2380,node3=http://<node3-ip>:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"

Repeat the configuration on the other two nodes (node2 and node3), changing ETCD_NAME and the IPs accordingly.

Start and enable etcd:

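On systemd-based hosts, for example:

sudo systemctl enable etcd
sudo systemctl start etcd

You can then confirm that all three members have joined (flags and endpoints may differ across etcdctl versions):

etcdctl member list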

Step 2: Installing and Configuring Patroni

Install Patroni on both servers:

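Patroni is commonly installed with pip alongside PostgreSQL itself. A sketch for Debian/Ubuntu, assuming PostgreSQL 12 (to match the paths used below) is available from your configured repositories:

sudo apt install -y postgresql-12 python3-pip
sudo pip3 install 'patroni[etcd]' psycopg2-binary
# Patroni manages the PostgreSQL instance itself, so stop the distribution's default service:
sudo systemctl stop postgresql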

Configure Patroni (2 Nodes)

Each PostgreSQL node running Patroni needs a configuration file. Below is an example configuration for the first Patroni node. On both Patroni nodes, create a Patroni configuration file /etc/patroni.yml:

scope: pg_cluster
namespace: /service/
name: patroni1

restapi:
  listen: 0.0.0.0:8008
  connect_address: <patroni1_name>:8008

etcd:
  host: <etcd_cluster_address>:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_segments: 8
        max_wal_senders: 5
        max_replication_slots: 5
        synchronous_commit: "off"
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 0.0.0.0/0 md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin_pass
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: <patroni1_name>:5432
  data_dir: /var/lib/postgresql/12/main
  bin_dir: /usr/lib/postgresql/12/bin
  authentication:
    replication:
      username: replicator
      password: repl_pass
    superuser:
      username: postgres
      password: postgres_pass
  parameters:
    unix_socket_directories: '/var/run/postgresql'

On the second node, adjust name, connect_address, and any other node-specific details.

Start Patroni on both nodes:

sudo systemctl start patroni
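A pip installation does not ship a systemd unit, so the command above assumes you have created one. A minimal /etc/systemd/system/patroni.service sketch (the ExecStart path is an assumption; verify it with which patroni):

[Unit]
Description=Patroni PostgreSQL HA manager
After=network.target

[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni.yml
KillMode=process
Restart=on-failure

[Install]
WantedBy=multi-user.target

After creating the file, run sudo systemctl daemon-reload and sudo systemctl enable patroni on both nodes.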

Step 3: Setting Up HAProxy

Install HAProxy

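For example, on Debian/Ubuntu:

sudo apt update
sudo apt install -y haproxy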

Configure HAProxy (2 Nodes)
On both HAProxy nodes, edit /etc/haproxy/haproxy.cfg:

global
    log /dev/log local0
    log /dev/log local1 notice
    maxconn 2000
    user haproxy
    group haproxy

defaults
    mode tcp
    log global
    option redispatch
    option tcplog
    retries 3
    timeout connect 10s
    timeout client 1m
    timeout server 1m

frontend postgresql
    bind *:5432
    default_backend postgresql-backend

backend postgresql-backend
    balance roundrobin
    # Patroni's REST API (port 8008) returns 200 only on the current leader,
    # so this health check steers all connections to the primary.
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server postgresql1 <patroni-node1-ip>:5432 check port 8008
    server postgresql2 <patroni-node2-ip>:5432 check port 8008

Start HAProxy

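Validate the configuration, then enable and restart the service:

sudo haproxy -c -f /etc/haproxy/haproxy.cfg
sudo systemctl enable haproxy
sudo systemctl restart haproxy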

Step 4: Setting Up Keepalived (2 Nodes)

Install Keepalived

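For example, on Debian/Ubuntu:

sudo apt install -y keepalived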

Configure Keepalived

On both HAProxy nodes, edit /etc/keepalived/keepalived.conf to define a virtual IP (VIP) for failover.
Node 1:

vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass password123
    }
    virtual_ipaddress {
        192.168.1.100
    }
    track_script {
        chk_haproxy
    }
}

Node 2:

# The vrrp_script block must be defined on this node too,
# otherwise track_script would reference an undefined script.
vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    interface eth0
    state BACKUP
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass password123
    }
    virtual_ipaddress {
        192.168.1.100
    }
    track_script {
        chk_haproxy
    }
}

Start and enable Keepalived:

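On systemd-based hosts:

sudo systemctl enable keepalived
sudo systemctl start keepalived

The VIP (192.168.1.100) should now appear on the MASTER node's interface:

ip addr show eth0
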
Step 5: Check the Cluster with patronictl

patronictl -c /etc/patroni.yml list

+ Cluster: pg_cluster (6958757123490168240) -----+----+-----------+
| Member   | Host          | Role    | State   | TL | Lag in MB |
+----------+---------------+---------+---------+----+-----------+
| patroni1 | <patroni1_ip> | Leader  | running |  3 |           |
| patroni2 | <patroni2_ip> | Replica | running |  3 |         0 |
+----------+---------------+---------+---------+----+-----------+

This confirms that the PostgreSQL cluster is running with one leader and one replica.
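To verify failover end to end, you can trigger a controlled switchover, or stop Patroni on the leader and watch the replica take over. A sketch (patronictl prompts interactively for the candidate and for confirmation):

patronictl -c /etc/patroni.yml switchover pg_cluster
# or, to simulate a failure, on the current leader:
sudo systemctl stop patroni

Re-run patronictl -c /etc/patroni.yml list afterwards; the surviving node should now show as Leader.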

Conclusion

Setting up a highly available PostgreSQL cluster with etcd, Patroni, HAProxy, and Keepalived keeps your database online even when individual nodes fail. This HA setup distributes responsibilities across the layers:

  • etcd provides distributed consensus for Patroni.
  • Patroni manages PostgreSQL replication and failover.
  • HAProxy gives clients a single endpoint and routes traffic to the current primary.
  • Keepalived handles failover at the load-balancing layer with a floating VIP.

By following this guide, you can create a robust and resilient PostgreSQL environment capable of handling node failures with minimal downtime.

Written by

Biswajit Mukhopadhyay
Head - Data and AI