Posted to commits@qpid.apache.org by ac...@apache.org on 2014/07/10 18:23:08 UTC
svn commit: r1609495 - /qpid/trunk/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml

Author: aconway
Date: Thu Jul 10 16:23:08 2014
New Revision: 1609495

URL: http://svn.apache.org/r1609495
Log:
NO-JIRA: [C++ broker book] HA chapter: minor cleanup.

Modified:
    qpid/trunk/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml

Modified: qpid/trunk/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml?rev=1609495&r1=1609494&r2=1609495&view=diff
==============================================================================
--- qpid/trunk/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml (original)
+++ qpid/trunk/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml Thu Jul 10 16:23:08 2014
@@ -112,13 +112,21 @@ under the License.
 	message is consumed and acknowledged by a regular client before it has
 	been replicated to a backup, then it doesn't need to be replicated.
       </para>
-      <variablelist>
+      <variablelist id="ha-broker-states">
 	<title>HA Broker States</title>
 	<varlistentry>
+	  <term>Stand-alone</term>
+	  <listitem>
+	    <para>
+	      Broker is not part of an HA cluster.
+	    </para>
+	  </listitem>
+	</varlistentry>
+	<varlistentry>
 	  <term>Joining</term>
 	  <listitem>
 	    <para>
-	      Initial state of a new broker that has not yet connected to the primary.
+	      Newly started broker, not yet connected to any existing primary.
 	    </para>
 	  </listitem>
 	</varlistentry>
@@ -126,8 +134,8 @@ under the License.
 	  <term>Catch-up</term>
 	  <listitem>
 	    <para>
-	      A backup broker that is connected to the primary and catching up
-	      on queues and messages.
+	      A backup broker that is connected to the primary and downloading
+	      existing state (queues, messages, etc.).
 	    </para>
 	  </listitem>
 	</varlistentry>
@@ -144,7 +152,8 @@ under the License.
 	  <term>Recovering</term>
 	  <listitem>
 	    <para>
-	      The newly-promoted primary, waiting for backups to connect and catch up.
+	      Newly-promoted primary, waiting for backups to connect and catch up.
+	      Clients can connect but they are stalled until the primary is active.
 	    </para>
 	  </listitem>
 	</varlistentry>
@@ -222,7 +231,7 @@ under the License.
     <note>
       <para>
 	Incorrect security settings are a common cause of problems when
-	getting started, see <xref linkend="ha-security"/>.	
+	getting started, see <xref linkend="ha-security"/>.
       </para>
     </note>
     <table frame="all" id="ha-broker-options">
@@ -1049,24 +1058,18 @@ link-heartbeat-interval=5
     <section id="ha-troubleshoot-total-cluster-failure">
       <title>Total cluster failure</title>
       <para>
+	Note: for definitions of the broker states <firstterm>joining</firstterm>,
+	<firstterm>catch-up</firstterm>, <firstterm>ready</firstterm>,
+	<firstterm>recovering</firstterm> and <firstterm>active</firstterm> see
+	<xref linkend="ha-broker-states"/>.
+      </para>
+      <para>
 	The cluster can only guarantee availability as long as there is at
 	least one active primary broker or ready backup broker left alive.
 	If all the brokers fail simultaneously, the cluster will fail and
 	non-persistent data will be lost.
       </para>
       <para>
-	To explain this better, note that brokers are in one of 4 states:
-	- standalone: not part of a HA cluster - joining: newly started
-	backup, not yet joined to the cluster. - catch-up: backup has
-	connected to the primary and is downloading queues, messages etc.
-	- ready: backup is connected and actively replicating from
-	primary, it is ready to take over. - recovering: newly-promoted to
-	primary, waiting for backups to catch up before serving clients.
-	Only a single primary broker can be recovering at a time. -
-	active: serving clients, only a single primary broker can be
-	active at a time.
-      </para>
-      <para>
 	While there is an active primary broker, clients can get service.
 	If the active primary fails, one of the &quot;ready&quot; backup
 	brokers will take over, recover and become active. Note a backup
@@ -1097,27 +1100,43 @@ link-heartbeat-interval=5
 	  this:
 	</para>
 	<programlisting>
-Service Name                   Owner (Last)                   State         
-------- ----                   ----- ------                   -----         
-service:mrg33-qpidd-service    20.0.10.33                     started       
-service:mrg34-qpidd-service    20.0.10.34                     started       
-service:mrg35-qpidd-service    20.0.10.35                     started       
-service:qpidd-primary-service  (20.0.10.33)                   stopped       
+Service Name                   Owner (Last)                   State
+------- ----                   ----- ------                   -----
+service:mrg33-qpidd-service    20.0.10.33                     started
+service:mrg34-qpidd-service    20.0.10.34                     started
+service:mrg35-qpidd-service    20.0.10.35                     started
+service:qpidd-primary-service  (20.0.10.33)                   stopped
 	</programlisting>
 	<para>
 	  Eventually all brokers become stuck in &quot;joining&quot; mode,
-	  as shown by qpid-ha status --all.
+	  as shown by: <literal>qpid-ha status --all</literal>
 	</para>
 	<para>
 	  At this point you need to restart the cluster in one of the
-	  following ways: Restart the entire cluster: - In
-	  luci:<replaceable>your-cluster</replaceable>:Nodes click reboot to restart the entire
-	  cluster. - OR stop and restart the cluster with ccs --stopall;
-	  ccs --startall Restart just the Qpid services: - In
-	  luci:<replaceable>your-cluster</replaceable>:Service Groups - select all the qpidd (not
-	  primary) services, click restart - select the qpidd-primary
-	  service, click restart - OR stop the primary and qpidd services
-	  with clusvcadm, then restart (primary last)
+	  following ways:
+	  <orderedlist>
+	    <listitem><para>
+	      Restart the entire cluster:
+	      In <literal>luci:<replaceable>your-cluster</replaceable>:Nodes</literal>
+	      click reboot to restart the entire cluster
+	    </para></listitem>
+	    <listitem><para>
+	      Stop and restart the cluster with
+	      <literal>ccs --stopall; ccs --startall</literal>
+	    </para></listitem>
+	    <listitem><para>
+	      Restart just the Qpid services: In <literal>luci:<replaceable>your-cluster</replaceable>:Service Groups</literal>
+	      <orderedlist>
+		<listitem><para>Select all the qpidd (not qpidd-primary) services, click restart</para></listitem>
+		<listitem><para>Select the qpidd-primary service, click restart</para></listitem>
+	      </orderedlist>
+	    </para></listitem>
+	    <listitem><para>
+	      Stop the <literal>qpidd-primary</literal> and
+	      <literal>qpidd</literal> services with <literal>clusvcadm</literal>,
+	      then restart (qpidd-primary last)
+	    </para></listitem>
+	  </orderedlist>
 	</para>
       </section>
       <section id="ha-troubleshoot-the-cluster-reboots">


