
TL;DR
A Docker-based project for running a highly available RabbitMQ cluster: https://github.com/ypereirareis/docker-rabbitmq-ha-cluster
RabbitMQ cluster
A cluster is composed of at least two RabbitMQ nodes, and these nodes must be able to communicate with each other. I strongly advise you to always run an odd number of nodes, so that when the cluster splits one side can still hold a majority.
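One way to see why an odd number of nodes helps is plain majority arithmetic. The sketch below is only an illustration (the function names are made up for this post, they are not part of RabbitMQ): it shows that going from 3 to 4 nodes does not let you lose more nodes while keeping a strict majority, and that an even cluster can split into two halves where neither side has one.

```python
def majority(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a strict majority remains."""
    return nodes - majority(nodes)


for n in range(2, 6):
    print(f"{n} nodes: majority={majority(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

With these definitions a 4-node cluster tolerates exactly as many lost nodes as a 3-node one, which is why the extra even node buys little.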
Load Balancing with HAProxy

When working with a cluster, the goal is to have a highly available service. So we need to dispatch requests across every running node of the cluster, and avoid sending requests to failing nodes.
One way to achieve this is to use HAProxy, a lightweight and reliable tool for reverse proxying and load balancing. HAProxy handles TCP connections out of the box and works well with the AMQP protocol. With NGINX you need the stream module, which is not included in every build, to proxy AMQP connections.
The HAProxy service SHOULD NOT run on a node of the RabbitMQ cluster: if that node fails, the load balancer fails with it, and you lose the ability to balance requests across the remaining nodes.
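To make the idea of "avoid sending requests to failing nodes" concrete, here is a small Python model of round-robin balancing with health checks. It is a simplified sketch of the behaviour described by the `check ... rise 2 fall 3` parameters used in the configuration example, not HAProxy's actual implementation; the class names and the `rmq1`..`rmq3` server names are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    rise: int = 2      # consecutive successful checks needed to come back UP
    fall: int = 3      # consecutive failed checks needed to go DOWN
    up: bool = True
    _streak: int = 0   # checks in a row that contradict the current state

    def record_check(self, success: bool) -> None:
        """Feed one health-check result into the UP/DOWN state machine."""
        if success == self.up:
            self._streak = 0          # result agrees with the state: reset
            return
        self._streak += 1
        threshold = self.rise if success else self.fall
        if self._streak >= threshold:
            self.up = success
            self._streak = 0


class RoundRobin:
    def __init__(self, servers: List[Server]) -> None:
        self.servers = servers
        self._next = 0

    def pick(self) -> Optional[Server]:
        """Return the next UP server, or None if every server is DOWN."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server.up:
                return server
        return None


pool = [Server("rmq1"), Server("rmq2"), Server("rmq3")]
balancer = RoundRobin(pool)
for _ in range(3):                    # three failed checks in a row on rmq2
    pool[1].record_check(False)
print([balancer.pick().name for _ in range(4)])
```

Once `rmq2` has failed three checks in a row it is skipped, and traffic alternates between the two remaining servers until two successful checks bring it back.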
Configuration example
global
  log 127.0.0.1 local0
  log 127.0.0.1 local1 notice
  log-send-hostname
  maxconn 4096
  pidfile /var/run/haproxy.pid
  user haproxy
  group haproxy
  daemon
  stats socket /var/run/haproxy.stats level admin
  ssl-default-bind-options no-sslv3
  ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:AES256-GCM-SHA384:AES256-SHA256:AES256-SHA:DHE-DSS-AES128-SHA:DES-CBC3-SHA
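defaults
  balance roundrobin
  log global
  # TCP is the default mode because AMQP is not HTTP
  mode tcp
  option redispatch
  # httplog and forwardfor only apply to HTTP-mode sections such as stats;
  # TCP-mode sections ignore forwardfor and fall back to tcplog
  option httplog
  option dontlognull
  option forwardfor
  # timeouts are in milliseconds when no unit is given
  timeout connect 5000
  timeout client 50000
  timeout server 50000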
listen stats
  bind :1936
  mode http
  stats enable
  timeout connect 10s
  timeout client 1m
  timeout server 1m
  stats hide-version
  stats realm Haproxy\ Statistics
  stats uri /
  stats auth stats:stats
listen port_5672
  bind :5672
  mode tcp
  server rmq_rmq3_1 rmq_rmq3_1:5672 check inter 2000 rise 2 fall 3
  server rmq_rmq2_1 rmq_rmq2_1:5672 check inter 2000 rise 2 fall 3
  server rmq_rmq1_1 rmq_rmq1_1:5672 check inter 2000 rise 2 fall 3
listen port_15672
  bind :15672
  mode tcp
  server rmq_rmq1_1 rmq_rmq1_1:15672 check inter 2000 rise 2 fall 3
frontend default_port_80
  bind :80
  # reqadd was removed in HAProxy 2.1; http-request add-header replaces it
  http-request add-header X-Forwarded-Proto http
  maxconn 4096
  default_backend default_service
backend default_service
  server rmq_rmq1_1 rmq_rmq1_1:25672 check inter 2000 rise 2 fall 3
  server rmq_rmq1_1 rmq_rmq1_1:4369 check inter 2000 rise 2 fall 3
  server rmq_rmq1_1 rmq_rmq1_1:9100 check inter 2000 rise 2 fall 3
  server rmq_rmq1_1 rmq_rmq1_1:9101 check inter 2000 rise 2 fall 3
  server rmq_rmq1_1 rmq_rmq1_1:9102 check inter 2000 rise 2 fall 3
  server rmq_rmq1_1 rmq_rmq1_1:9103 check inter 2000 rise 2 fall 3
  server rmq_rmq1_1 rmq_rmq1_1:9104 check inter 2000 rise 2 fall 3
  server rmq_rmq1_1 rmq_rmq1_1:9105 check inter 2000 rise 2 fall 3
  server rmq_rmq2_1 rmq_rmq2_1:15672 check inter 2000 rise 2 fall 3
  server rmq_rmq2_1 rmq_rmq2_1:25672 check inter 2000 rise 2 fall 3
  server rmq_rmq2_1 rmq_rmq2_1:4369 check inter 2000 rise 2 fall 3
  server rmq_rmq2_1 rmq_rmq2_1:9100 check inter 2000 rise 2 fall 3
  server rmq_rmq2_1 rmq_rmq2_1:9101 check inter 2000 rise 2 fall 3
  server rmq_rmq2_1 rmq_rmq2_1:9102 check inter 2000 rise 2 fall 3
  server rmq_rmq2_1 rmq_rmq2_1:9103 check inter 2000 rise 2 fall 3
  server rmq_rmq2_1 rmq_rmq2_1:9104 check inter 2000 rise 2 fall 3
  server rmq_rmq2_1 rmq_rmq2_1:9105 check inter 2000 rise 2 fall 3
  server rmq_rmq3_1 rmq_rmq3_1:15672 check inter 2000 rise 2 fall 3
  server rmq_rmq3_1 rmq_rmq3_1:25672 check inter 2000 rise 2 fall 3
  server rmq_rmq3_1 rmq_rmq3_1:4369 check inter 2000 rise 2 fall 3
  server rmq_rmq3_1 rmq_rmq3_1:9100 check inter 2000 rise 2 fall 3
  server rmq_rmq3_1 rmq_rmq3_1:9101 check inter 2000 rise 2 fall 3
  server rmq_rmq3_1 rmq_rmq3_1:9102 check inter 2000 rise 2 fall 3
  server rmq_rmq3_1 rmq_rmq3_1:9103 check inter 2000 rise 2 fall 3
  server rmq_rmq3_1 rmq_rmq3_1:9104 check inter 2000 rise 2 fall 3
  server rmq_rmq3_1 rmq_rmq3_1:9105 check inter 2000 rise 2 fall 3
High availability
Nodes
To have high availability you need more than one node, and of course you need load balancing. Three nodes is a good start for a simple RabbitMQ cluster, plus one more host for HAProxy.

Exchanges, Queues and messages mirroring and persistence
For a high level of resiliency, it is important to mirror everything we can to the other nodes of the cluster.
We need to configure a few things:
- durable exchanges (they survive a broker restart)
- durable, mirrored queues
- persistent messages (written to disk) published to those mirrored queues
Consumers and producers can then connect to one (or more) nodes of the cluster.
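The three settings above can be sketched in Python as follows. This is a minimal sketch under several assumptions: it uses the third-party `pika` client (imported lazily, so it must be installed before calling `publish_persistent`), the `orders` exchange/queue names and the `ha-all` policy name are invented for the example, and the `ha-mode` policy only applies to RabbitMQ versions that still support classic queue mirroring (it was removed in RabbitMQ 4.0).

```python
import json
from typing import List


def mirror_policy_command(name: str = "ha-all", pattern: str = "^") -> List[str]:
    """Build a `rabbitmqctl set_policy` call mirroring matching queues on all nodes."""
    definition = {"ha-mode": "all", "ha-sync-mode": "automatic"}
    return ["rabbitmqctl", "set_policy", "--apply-to", "queues",
            name, pattern, json.dumps(definition)]


def publish_persistent(host: str, body: bytes) -> None:
    """Declare a durable exchange and queue, then publish one persistent message."""
    import pika  # third-party client, assumed installed (pip install pika)

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    # durable=True: the exchange and queue definitions survive a broker restart
    channel.exchange_declare(exchange="orders", exchange_type="direct", durable=True)
    channel.queue_declare(queue="orders", durable=True)
    channel.queue_bind(queue="orders", exchange="orders", routing_key="orders")
    channel.basic_publish(
        exchange="orders",
        routing_key="orders",
        body=body,
        # delivery_mode=2 marks the message as persistent (written to disk)
        properties=pika.BasicProperties(delivery_mode=2),
    )
    connection.close()


# Print the policy command; run it on any node of the cluster.
print(" ".join(mirror_policy_command()))
```

Mirroring is set by a policy on the broker side rather than by the client, which is why the queue declaration itself only asks for durability.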
Network partition
Sometimes a node becomes unreachable from the others (because of a network failure, for instance) while it is still running and still receiving and delivering messages. After a short time the node drifts out of sync with the rest of the cluster, and a network partition appears: messages held by the isolated node are missing from the other nodes, and vice versa.
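One way to notice a partition is to ask the management plugin which peers each node has lost contact with. The sketch below assumes the management plugin is enabled and that `GET /api/nodes` returns a `partitions` list per node; the helper names, the credentials you pass in, and the sample payload are all made up for illustration.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List


def fetch_nodes(base_url: str, user: str, password: str) -> List[Dict[str, Any]]:
    """GET /api/nodes from the RabbitMQ management plugin (HTTP basic auth)."""
    request = urllib.request.Request(f"{base_url}/api/nodes")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def partitioned_nodes(nodes: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each node that reports a partition to the peers it cannot reach."""
    return {n["name"]: n["partitions"] for n in nodes if n.get("partitions")}


# Hypothetical payload: rmq3 has been cut off from rmq1 and rmq2.
sample = [
    {"name": "rabbit@rmq1", "running": True, "partitions": ["rabbit@rmq3"]},
    {"name": "rabbit@rmq2", "running": True, "partitions": ["rabbit@rmq3"]},
    {"name": "rabbit@rmq3", "running": True,
     "partitions": ["rabbit@rmq1", "rabbit@rmq2"]},
]
print(partitioned_nodes(sample))
```

Against a live cluster you would call something like `partitioned_nodes(fetch_nodes("http://localhost:15672", user, password))`; an empty result means no node currently reports a partition.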
Test and benchmark
If you want to experiment with these scenarios on a RabbitMQ cluster, I created a Docker-based project for that:
https://github.com/ypereirareis/docker-rabbitmq-ha-cluster
You can experiment:
- Load Balancing
- Node failure
- Network partition
- Message persistence
- Message NO ACK and retries
- Exchanges and queues durability and mirroring
- Polling VS pulling
- Swarrot / SwarrotBundle
- RabbitMqBundle
