Posted to commits@activemq.apache.org by cl...@apache.org on 2020/10/30 12:58:05 UTC
[activemq-artemis] branch master updated: ARTEMIS-1730 Adding Restart Sequence of brokers on doc

This is an automated email from the ASF dual-hosted git repository.

clebertsuconic pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/activemq-artemis.git


The following commit(s) were added to refs/heads/master by this push:
     new 6772314  ARTEMIS-1730 Adding Restart Sequence of brokers on doc
     new 1014db4  This closes #2972
6772314 is described below

commit 6772314488c7ff8baa9b5fe160790dc8de588e66
Author: Shrikant Chavan <sh...@redhat.com>
AuthorDate: Wed Feb 5 12:02:32 2020 +0800

    ARTEMIS-1730 Adding Restart Sequence of brokers on doc
---
 docs/user-manual/en/.persistence.md.swp | Bin 0 -> 16384 bytes
 docs/user-manual/en/SUMMARY.md          |   1 +
 docs/user-manual/en/restart-sequence.md |  78 ++++++++++++++++++++++++++++++++
 3 files changed, 79 insertions(+)

diff --git a/docs/user-manual/en/.persistence.md.swp b/docs/user-manual/en/.persistence.md.swp
new file mode 100644
index 0000000..7439333
Binary files /dev/null and b/docs/user-manual/en/.persistence.md.swp differ
diff --git a/docs/user-manual/en/SUMMARY.md b/docs/user-manual/en/SUMMARY.md
index 537c784..b5f2dd3 100644
--- a/docs/user-manual/en/SUMMARY.md
+++ b/docs/user-manual/en/SUMMARY.md
@@ -81,3 +81,4 @@
 * [Unit Testing](unit-testing.md)
 * [Troubleshooting and Performance Tuning](perf-tuning.md)
 * [Configuration Reference](configuration-index.md)
+* [Restart Sequence](restart-sequence.md)
diff --git a/docs/user-manual/en/restart-sequence.md b/docs/user-manual/en/restart-sequence.md
new file mode 100644
index 0000000..708aa8b
--- /dev/null
+++ b/docs/user-manual/en/restart-sequence.md
@@ -0,0 +1,78 @@
+# Restart Sequence
+
+Apache ActiveMQ Artemis ships with two architectures for providing HA features:
+the master and slave brokers can be configured using either network replication
+or shared storage. This document describes the restart sequences to follow for
+the brokers under various circumstances while client applications are
+connected to them.
+
+## Restarting 1 broker at a time
+When restarting the brokers one at a time at regular intervals, it is not
+necessary to follow any particular sequence. Just make sure that at least
+one broker in the master/slave pair is live to take over the connections from
+the client applications.
+
+#### Note on restarting
+> When restarting the brokers while client applications are connected,
+make sure that at least one broker is always live to serve the connected
+clients.
+
+## Completely shutting down the brokers and starting them again
+If you need to completely shut down the brokers and
+start them again, follow this procedure:
+
+1. Shut down all the slave brokers.
+2. Shut down all the master brokers.
+3. Start all the master brokers.
+4. Start all the slave brokers.
+
+This sequence is particularly important in the case of network replication, for
+the following reason.
+If the master broker is shut down first, the slave broker becomes live and accepts
+all the client connections. When the slave broker is then stopped, the clients
+remember the slave as their last live connection. When the master and slave
+brokers are started again, the clients keep trying to reconnect to that last connection,
+i.e. the slave, and cannot connect until the client applications are restarted.
+To avoid the hassle of restarting the client applications, follow the sequence
+given above.
+
+## Split-brain situation
+The following procedure helps the cluster recover from a split-brain situation
+and gets the client connections automatically reconnected to the cluster.
+With this sequence, client applications do not need to be restarted in order to
+connect to the brokers.
+
+During a split-brain situation, both the master and slave brokers are live and
+no replication is happening from the master broker to the slave.
+
+In such a situation, some client applications may be connected to the master
+broker and others to the slave broker. Suppose we now restart the brokers and
+the cluster is properly formed.
+
+In that case, the clients that were connected to the master broker during the split-brain situation
+automatically reconnect to the cluster and resume processing messages. But the clients that were
+connected to the slave broker keep trying to reconnect to it without success. This happens
+because the slave broker has restarted in backup mode.
+
+As a result, not all the clients reconnect to the brokers and function properly.
+
+To avoid this, follow the sequence below:
+1. Stop the slave broker.
+2. Start the slave broker. Observe the logs for the message "Waiting for the master".
+3. Stop the master broker.
+4. Start the master broker.
+   Observe the master broker logs for "Server is live".
+   Observe the slave broker logs for "backup announced".
+5. Stop the master broker again. Wait until the slave broker becomes live. Observe that all the
+   clients are connected to the slave broker.
+6. Start the master broker. This time, all the connections will be switched to the master broker again.
+
+#### Note on delta message loss on the slave broker
+
+> During a split-brain situation, messages are produced on the slave broker since it is live.
+While resolving the split-brain situation, if there are delta messages that were not produced
+on the slave broker, those messages cannot be recovered automatically. Manual intervention is
+required to retrieve them, and sometimes it is almost impossible to recover them.
+> The sequence above helps re-form the cluster that was broken by the split brain
+and gets all the client applications automatically reconnected to the cluster without any need for
+the client applications to be restarted.
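
The two ordered procedures documented in restart-sequence.md above (the full shutdown and restart, and the six-step split-brain recovery) can be sketched as plain data, which may help when turning them into a runbook or an automation script. This is only an illustrative sketch and not part of the commit: the function names, the `(name, role)` tuples, and the broker names are hypothetical, and the code only computes the order of actions; it does not call any Artemis command or API.

```python
from typing import List, Tuple

# Hypothetical inventory entry: (broker name, role), where role is
# "master" or "slave". Names are illustrative only.
Broker = Tuple[str, str]


def full_restart_plan(brokers: List[Broker]) -> List[Tuple[str, str]]:
    """Return (action, broker) pairs in the documented order:
    stop slaves, stop masters, start masters, start slaves."""
    masters = [name for name, role in brokers if role == "master"]
    slaves = [name for name, role in brokers if role == "slave"]
    plan = [("stop", name) for name in slaves]
    plan += [("stop", name) for name in masters]
    plan += [("start", name) for name in masters]
    plan += [("start", name) for name in slaves]
    return plan


def split_brain_recovery_plan(master: str, slave: str) -> List[Tuple[str, str, str]]:
    """Return the six documented split-brain steps as
    (action, broker, what to observe before moving on)."""
    return [
        ("stop", slave, ""),
        ("start", slave, 'slave log: "Waiting for the master"'),
        ("stop", master, ""),
        ("start", master,
         'master log: "Server is live"; slave log: "backup announced"'),
        ("stop", master, "slave becomes live; clients connect to the slave"),
        ("start", master, "connections switch back to the master"),
    ]


if __name__ == "__main__":
    # Example inventory with made-up broker names.
    inventory = [("broker-a", "master"), ("broker-b", "slave")]
    for action, name in full_restart_plan(inventory):
        print(action, name)
    for action, name, observe in split_brain_recovery_plan("broker-a", "broker-b"):
        print(action, name, observe)
```

Keeping the plan as data separates the ordering, which the document prescribes, from how each broker is actually stopped or started, which depends on the installation.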