Posted to commits@qpid.apache.org by ro...@apache.org on 2012/08/12 21:03:53 UTC

svn commit: r1372179 [4/18] - in /qpid/site/docs/books/0.18: ./ AMQP-Messaging-Broker-CPP-Book/ AMQP-Messaging-Broker-CPP-Book/html/ AMQP-Messaging-Broker-CPP-Book/html/css/ AMQP-Messaging-Broker-CPP-Book/html/images/ AMQP-Messaging-Broker-CPP-Book/pdf...

Added: qpid/site/docs/books/0.18/AMQP-Messaging-Broker-CPP-Book/html/chap-Messaging_User_Guide-Active_Active_Cluster.html
URL: http://svn.apache.org/viewvc/qpid/site/docs/books/0.18/AMQP-Messaging-Broker-CPP-Book/html/chap-Messaging_User_Guide-Active_Active_Cluster.html?rev=1372179&view=auto
==============================================================================
--- qpid/site/docs/books/0.18/AMQP-Messaging-Broker-CPP-Book/html/chap-Messaging_User_Guide-Active_Active_Cluster.html (added)
+++ qpid/site/docs/books/0.18/AMQP-Messaging-Broker-CPP-Book/html/chap-Messaging_User_Guide-Active_Active_Cluster.html Sun Aug 12 19:03:49 2012
@@ -0,0 +1,248 @@
+<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>1.8. Active-active Messaging Clusters</title><link rel="stylesheet" href="css/style.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"><link rel="start" href="index.html" title="AMQP Messaging Broker (Implemented in C++)"><link rel="up" href="ch01.html" title="Chapter 1.  Running the AMQP Messaging Broker"><link rel="prev" href="queue-state-replication.html" title="1.7.  Queue State Replication"><link rel="next" href="producer-flow-control.html" title="1.9.  Producer Flow Control"></head><body><div class="container" bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><DIV class="header"><DIV class="logo"><H1>Apache Qpid™</H1><H2>Open Source AMQP Messaging</H2></DIV></DIV><DIV class="menu_box"><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Apache Qpid</H3><UL><LI><A href="http://qpid.apache.org/ind
 ex.html">Home</A></LI><LI><A href="http://qpid.apache.org/download.html">Download</A></LI><LI><A href="http://qpid.apache.org/getting_started.html">Getting Started</A></LI><LI><A href="http://www.apache.org/licenses/">License</A></LI><LI><A href="https://cwiki.apache.org/qpid/faq.html">FAQ</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Documentation</H3><UL><LI><A href="http://qpid.apache.org/documentation.html#doc-release">0.14 Release</A></LI><LI><A href="http://qpid.apache.org/documentation.html#doc-trunk">Trunk</A></LI><LI><A href="http://qpid.apache.org/documentation.html#doc-archives">Archive</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Community</H3><UL><LI><A href="http://qpid.apache.org/getting_involved.html">Getting Involved</A></LI><LI><A href="http://qpid.apache.org/source_repository.html">Source Repository</A></LI><LI><A h
 ref="http://qpid.apache.org/mailing_lists.html">Mailing Lists</A></LI><LI><A href="https://cwiki.apache.org/qpid/">Wiki</A></LI><LI><A href="https://issues.apache.org/jira/browse/qpid">Issue Reporting</A></LI><LI><A href="http://qpid.apache.org/people.html">People</A></LI><LI><A href="http://qpid.apache.org/acknowledgements.html">Acknowledgements</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Developers</H3><UL><LI><A href="https://cwiki.apache.org/qpid/building.html">Building Qpid</A></LI><LI><A href="https://cwiki.apache.org/qpid/developer-pages.html">Developer Pages</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>About AMQP</H3><UL><LI><A href="http://qpid.apache.org/amqp.html">What is AMQP?</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>About Apache</H3><UL><LI><A h
 ref="http://www.apache.org">Home</A></LI><LI><A href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</A></LI><LI><A href="http://www.apache.org/foundation/thanks.html">Thanks</A></LI><LI><A href="http://www.apache.org/security/">Security</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV></DIV><div class="main_text_area"><div class="main_text_area_top"></div><div class="main_text_area_body"><DIV class="breadcrumbs"><span class="breadcrumb-link"><a href="index.html">AMQP Messaging Broker (Implemented in C++)</a></span> &gt; <span class="breadcrumb-link"><a href="ch01.html">
+      Running the AMQP Messaging Broker
+    </a></span> &gt; <span class="breadcrumb-node">Active-active Messaging Clusters</span></DIV><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="chap-Messaging_User_Guide-Active_Active_Cluster"></a>1.8. Active-active Messaging Clusters</h2></div></div></div><p>
+    Active-active Messaging Clusters provide fault tolerance by ensuring that every broker in a <em class="firstterm">cluster</em> has the same queues, exchanges, messages, and bindings, and allowing a client to <em class="firstterm">fail over</em> to a new broker and continue without any loss of messages if the current broker fails or becomes unavailable. <em class="firstterm">Active-active</em> refers to the fact that all brokers in the cluster can actively serve clients.  Because all brokers are automatically kept in a consistent state, clients can connect to and use any broker in a cluster. Any number of messaging brokers can be run as one <em class="firstterm">cluster</em>, and brokers can be added to or removed from a cluster while it is in use.
+  </p><p>
+    High Availability Messaging Clusters are implemented using the <a class="ulink" href="http://www.openais.org/" target="_top">OpenAIS Cluster Framework</a>.
+  </p><p>
+    An OpenAIS daemon runs on every machine in the cluster, and these daemons communicate using multicast on a particular address. Every qpidd process in a cluster joins a named group that is automatically synchronized using OpenAIS Closed Process Groups (CPG) — the qpidd processes multicast events to the named group, and CPG ensures that each qpidd process receives all the events in the same sequence. All members get an identical sequence of events, so they can all update their state consistently.
+  </p><p>
+    Two messaging brokers are in the same cluster if
+    </p><div class="orderedlist"><ol type="1"><li><p>
+	  They run on hosts in the same OpenAIS cluster; that is, OpenAIS is configured with the same mcastaddr, mcastport and bindnetaddr, and
+	</p></li><li><p>
+	  They use the same cluster name.
+	</p></li></ol></div><p>
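+    As a sketch, a minimal OpenAIS/Corosync <code class="filename">totem</code> configuration shared by all hosts in the cluster might look like the following (the multicast address, port, and network address are illustrative and must match your environment):
+  </p><pre class="programlisting">
+totem {
+    version: 2
+    secauth: off
+    interface {
+        ringnumber: 0
+        bindnetaddr: 192.168.1.0
+        mcastaddr: 226.94.1.1
+        mcastport: 5405
+    }
+}
+  </pre><p>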
+
+  </p><p>
+    High Availability Clustering has a cost: in order to allow each broker in a cluster to continue the work of any other broker, a cluster must replicate state for all brokers in the cluster. Because of this, the brokers in a cluster should normally be on a LAN; there should be fast and reliable connections between brokers. Even on a LAN, using multiple brokers in a cluster is somewhat slower than using a single broker without clustering. This may be counter-intuitive for people who are used to clustering in the context of High Performance Computing or High Throughput Computing, where clustering increases performance or throughput.
+  </p><p>
+    High Availability Messaging Clusters should be used together with Red Hat Clustering Services (RHCS); without RHCS, clusters are vulnerable to the "split-brain" condition, in which a network failure splits the cluster into two sub-clusters that cannot communicate with each other. See the documentation on the <span class="command"><strong>--cluster-cman</strong></span> option for details on running using RHCS with High Availability Messaging Clusters. See the <a class="ulink" href="http://sources.redhat.com/cluster/wiki" target="_top">CMAN Wiki</a> for more detail on CMAN and split-brain conditions. Use the <span class="command"><strong>--cluster-cman</strong></span> option to enable RHCS when starting the broker.
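+  </p><p>
+    For example, on a host that is already a member of a configured RHCS/CMAN cluster, the broker can be started with quorum protection enabled as follows (the cluster name is illustrative):
+  </p><pre class="screen">$ qpidd --cluster-name="local_test_cluster" --cluster-cman
+  </pre><p>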
+  </p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sect-Messaging_User_Guide-High_Availability_Messaging_Clusters-Starting_a_Broker_in_a_Cluster"></a>1.8.1. Starting a Broker in a Cluster</h3></div></div></div><p>
+      Clustering is implemented using the <code class="filename">cluster.so</code> module, which is loaded by default when you start a broker. To run brokers in a cluster, make sure they all use the same OpenAIS mcastaddr, mcastport, and bindnetaddr. All brokers in a cluster must also have the same cluster name — specify the cluster name in <code class="filename">qpidd.conf</code>:
+    </p><pre class="screen">cluster-name="local_test_cluster"
+    </pre><p>
+      On RHEL6, you must create the file <code class="filename">/etc/corosync/uidgid.d/qpidd</code> to tell Corosync the name of the user running the broker. By default, the user is qpidd:
+    </p><pre class="programlisting">
+uidgid {
+  uid: qpidd
+  gid: qpidd
+}
+    </pre><p>
+      On RHEL5, the primary group for the process running qpidd must be the ais group. If you are running qpidd as a service, it is run as the <span class="command"><strong>qpidd</strong></span> user, which is already in the ais group. If you are running the broker from the command line, you must ensure that the primary group for the user running qpidd is ais. You can set the primary group using <span class="command"><strong>newgrp</strong></span>:
+    </p><pre class="screen">$ newgrp ais
+    </pre><p>
+      You can then run the broker from the command line, specifying the cluster name as an option.
+    </p><pre class="screen">[jonathan@localhost]$ qpidd --cluster-name="local_test_cluster"
+    </pre><p>
+      All brokers in a cluster must have identical configuration, with a few exceptions noted below. They must load the same set of plug-ins, and have matching configuration files and command line arguments. They should also have identical ACL files and SASL databases if these are used. If one broker uses persistence, all must use persistence — a mix of transient and persistent brokers is not allowed. Differences in configuration can cause brokers to exit the cluster. For instance, if different ACL settings allow a client to access a queue on broker A but not on broker B, then publishing to the queue will succeed on A and fail on B, so B will exit the cluster to prevent inconsistency.
+    </p><p>
+      The following settings can differ for brokers on a given cluster:
+    </p><div class="itemizedlist"><ul><li><p>
+	  logging options
+	</p></li><li><p>
+	  cluster-url — if set, it will be different for each broker.
+	</p></li><li><p>
+	  port — brokers can listen on different ports.
+	</p></li></ul></div><p>
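+      For example, two members of the same cluster can be started on one machine by giving each broker its own port and data directory (the paths and ports shown are illustrative):
+    </p><pre class="screen">$ qpidd --cluster-name="local_test_cluster" --port 5672 --data-dir /var/lib/qpidd-a
+$ qpidd --cluster-name="local_test_cluster" --port 5673 --data-dir /var/lib/qpidd-b
+    </pre><p>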
+      The qpid log contains entries that record significant clustering events, e.g. when a broker becomes a member of a cluster, the membership of a cluster is changed, or an old journal is moved out of the way. For instance, the following message states that a broker has been added to a cluster as the first node:
+    </p><pre class="screen">
+      2009-07-09 18:13:41 info 127.0.0.1:1410(READY) member update: 127.0.0.1:1410(member)
+      2009-07-09 18:13:41 notice 127.0.0.1:1410(READY) first in cluster
+    </pre><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
+	If you are using SELinux, the qpidd process and OpenAIS must have the same SELinux context, or else SELinux must be set to permissive mode. If both qpidd and OpenAIS are run as services, they have the same SELinux context. If both OpenAIS and qpidd are run as user processes, they have the same SELinux context. If one is run as a service, and the other is run as a user process, they have different SELinux contexts.
+      </p></div><p>
+      The following options are available for clustering:
+    </p><div class="table"><a name="tabl-Messaging_User_Guide-Starting_a_Broker_in_a_Cluster-Options_for_High_Availability_Messaging_Cluster"></a><p class="title"><b>Table 1.9. Options for High Availability Messaging Cluster</b></p><div class="table-contents"><table summary="Options for High Availability Messaging Cluster" border="1"><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th colspan="2" align="center">
+	      Options for High Availability Messaging Cluster
+	    </th></tr></thead><tbody><tr><td align="left">
+	      <span class="command"><strong>--cluster-name <em class="replaceable"><code>NAME</code></em></strong></span>
+	    </td><td align="left">
+	      Name of the Messaging Cluster to join. A Messaging Cluster consists of all brokers started with the same cluster-name and openais configuration.
+	    </td></tr><tr><td align="left">
+	      <span class="command"><strong>--cluster-size <em class="replaceable"><code>N</code></em></strong></span>
+	    </td><td align="left">
+	      Wait for at least N initial members before completing cluster initialization and serving clients. Use this option in a persistent cluster so that all brokers can exchange the status of their persistent stores and perform consistency checks before serving clients.
+	    </td></tr><tr><td align="left">
+	      <span class="command"><strong>--cluster-url <em class="replaceable"><code>URL</code></em></strong></span>
+	    </td><td align="left">
+	      An AMQP URL containing the local address that the broker advertizes to clients for fail-over connections. This is different for each host. By default, all local addresses for the broker are advertized. You only need to set this if
+	      <div class="orderedlist"><ol type="1"><li><p>
+		    Your host has more than one active network interface, and
+		  </p></li><li><p>
+		    You want to restrict client fail-over to a specific interface or interfaces.
+		  </p></li></ol></div>
+	      <p>Each broker in the cluster is specified using the following form:</p>
+
+	      <pre class="programlisting">url = ["amqp:"][ user ["/" password] "@" ] protocol_addr
+	      ("," protocol_addr)*
+	      protocol_addr = tcp_addr / rdma_addr / ssl_addr / ...
+	      tcp_addr = ["tcp:"] host [":" port]
+	      rdma_addr = "rdma:" host [":" port]
+	      ssl_addr = "ssl:" host [":" port]</pre>
+
+	      <p>In most cases, only one address is advertized, but more than one address can be specified in if the machine running the broker has more than one network interface card, and you want to allow clients to connect using multiple network interfaces. Use a comma delimiter (",") to separate brokers in the URL. Examples:</p>
+	      <div class="itemizedlist"><ul><li><p>
+		    <span class="command"><strong>amqp:tcp:192.168.1.103:5672</strong></span> advertizes a single address to clients for failover.
+		  </p></li><li><p>
+		    <span class="command"><strong>amqp:tcp:192.168.1.103:5672,tcp:192.168.1.105:5672</strong></span> advertizes two different addresses to clients for failover, on two different network interfaces.
+		  </p></li></ul></div>
+
+	    </td></tr><tr><td align="left">
+	      <span class="command"><strong>--cluster-cman</strong></span>
+	    </td><td align="left">
+	      <p>
+		CMAN protects against the "split-brain" condition, in which a network failure splits the cluster into two sub-clusters that cannot communicate with each other. When "split-brain" occurs, each of the sub-clusters can access shared resources without knowledge of the other sub-cluster, resulting in corrupted cluster integrity.
+	      </p>
+	      <p>
+		To avoid "split-brain", CMAN uses the notion of a "quorum". If more than half the cluster nodes are active, the cluster has quorum and can act. If half (or fewer) nodes are active, the cluster does not have quorum, and all cluster activity is stopped. There are other ways to define the quorum for particular use cases (e.g. a cluster of only 2 members); see the <a class="ulink" href="http://sources.redhat.com/cluster/wiki" target="_top">CMAN Wiki</a>
+		for more detail.
+	      </p>
+	      <p>
+		When enabled, the broker will wait until it belongs to a quorate cluster before accepting client connections. It continually monitors the quorum status and shuts down immediately if the node it runs on loses touch with the quorum.
+	      </p>
+
+	    </td></tr><tr><td align="left">
+	      --cluster-username
+	    </td><td align="left">
+	      SASL username for connections between brokers.
+	    </td></tr><tr><td align="left">
+	      --cluster-password
+	    </td><td align="left">
+	      SASL password for connections between brokers.
+	    </td></tr><tr><td align="left">
+	      --cluster-mechanism
+	    </td><td align="left">
+	      SASL authentication mechanism for connections between brokers.
+	    </td></tr></tbody></table></div></div><br class="table-break"><p>
+      If a broker is unable to establish a connection to another broker in the cluster, the log will contain SASL errors, e.g.:
+    </p><pre class="screen">2009-aug-04 10:17:37 info SASL: Authentication failed: SASL(-13): user not found: Password verification failed
+    </pre><p>
+      You can set the SASL user name and password used to connect to other brokers using the <span class="command"><strong>cluster-username</strong></span> and <span class="command"><strong>cluster-password</strong></span> properties when you start the broker. In most environments, it is easiest to create an account with the same user name and password on each broker in the cluster, and use these as the <span class="command"><strong>cluster-username</strong></span> and <span class="command"><strong>cluster-password</strong></span>. You can also set the SASL mode using <span class="command"><strong>cluster-mechanism</strong></span>. Remember that any mechanism you enable for broker-to-broker communication can also be used by a client, so do not enable <span class="command"><strong>cluster-mechanism=ANONYMOUS</strong></span> in a secure environment.
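+    </p><p>
+      For example, the broker-to-broker credentials can be set in <code class="filename">qpidd.conf</code> (the user name, password, and mechanism shown here are illustrative):
+    </p><pre class="screen">cluster-username=cluster_user
+cluster-password=secret
+cluster-mechanism=DIGEST-MD5
+    </pre><p>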
+    </p><p>
+      Once the cluster is running, run <span class="command"><strong>qpid-cluster</strong></span> to make sure that the brokers are running as one cluster. See the following section for details.
+    </p><p>
+      If the cluster is correctly configured, queues and messages are replicated to all brokers in the cluster, so an easy way to test the cluster is to run a program that routes messages to a queue on one broker, then to a different broker in the same cluster and read the messages to make sure they have been replicated. The <span class="command"><strong>drain</strong></span> and <span class="command"><strong>spout</strong></span> programs can be used for this test.
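+    </p><p>
+      As a sketch of such a test, send a message through one broker and read it back through another (the host names, port, and queue name are illustrative):
+    </p><pre class="screen">$ spout -b brokerA:5672 'test-queue; {create: always}'
+$ drain -b brokerB:5672 test-queue
+    </pre><p>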
+    </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sect-Messaging_User_Guide-High_Availability_Messaging_Clusters-qpid_cluster"></a>1.8.2. qpid-cluster</h3></div></div></div><p>
+      <span class="command"><strong>qpid-cluster</strong></span> is a command-line utility that allows you to view information on a cluster and its brokers, disconnect a client connection, shut down a broker in a cluster, or shut down the entire cluster. You can see the options using the <span class="command"><strong>--help</strong></span> option:
+    </p><pre class="screen">$ ./qpid-cluster --help
+    </pre><pre class="screen">Usage:  qpid-cluster [OPTIONS] [broker-addr]
+
+    broker-addr is in the form:   [username/password@] hostname | ip-address [:&lt;port&gt;]
+    ex:  localhost, 10.1.1.7:10000, broker-host:10000, guest/guest@localhost
+
+    Options:
+    -C [--all-connections]  View client connections to all cluster members
+    -c [--connections] ID   View client connections to specified member
+    -d [--del-connection] HOST:PORT
+    Disconnect a client connection
+    -s [--stop] ID          Stop one member of the cluster by its ID
+    -k [--all-stop]         Shut down the whole cluster
+    -f [--force]            Suppress the 'are-you-sure?' prompt
+    -n [--numeric]          Don't resolve names
+    </pre><p>
+      Let's connect to a cluster and display basic information about the cluster and its brokers. When you connect to the cluster using <span class="command"><strong>qpid-cluster</strong></span>, you can use the host and port for any broker in the cluster. For instance, if a broker in the cluster is running on <code class="filename">localhost</code> on port 6664, you can start <span class="command"><strong>qpid-cluster</strong></span> like this:
+    </p><pre class="screen">
+      $ qpid-cluster localhost:6664
+    </pre><p>
+      Here is the output:
+    </p><pre class="screen">
+      Cluster Name: local_test_cluster
+      Cluster Status: ACTIVE
+      Cluster Size: 3
+      Members: ID=127.0.0.1:13143 URL=amqp:tcp:192.168.1.101:6664,tcp:192.168.122.1:6664,tcp:10.16.10.62:6664
+      : ID=127.0.0.1:13167 URL=amqp:tcp:192.168.1.101:6665,tcp:192.168.122.1:6665,tcp:10.16.10.62:6665
+      : ID=127.0.0.1:13192 URL=amqp:tcp:192.168.1.101:6666,tcp:192.168.122.1:6666,tcp:10.16.10.62:6666
+    </pre><p>
+      The ID for each broker in the cluster is given on the left. For instance, the ID for the first broker in the cluster is <span class="command"><strong>127.0.0.1:13143</strong></span>. The URL in the output is the broker's advertized address. Let's use the ID to shut the broker down using the <span class="command"><strong>--stop</strong></span> command:
+    </p><pre class="screen">$ ./qpid-cluster localhost:6664 --stop 127.0.0.1:13143
+    </pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sect-Messaging_User_Guide-High_Availability_Messaging_Clusters-Failover_in_Clients"></a>1.8.3. Failover in Clients</h3></div></div></div><p>
+      If a client is connected to a broker, the connection fails if the broker crashes or is killed. If heartbeat is enabled for the connection, a connection also fails if the broker hangs, the machine the broker is running on fails, or the network connection to the broker is lost — the connection fails no later than twice the heartbeat interval.
+    </p><p>
+      When a client's connection to a broker fails, any sent messages that have been acknowledged to the sender will have been replicated to all brokers in the cluster, any received messages that have not yet been acknowledged by the receiving client are requeued on all brokers, and the client API notifies the application of the failure by throwing an exception.
+    </p><p>
+      Clients can be configured to automatically reconnect to another broker when they receive such an exception. Any messages that have been sent by the client, but not yet acknowledged as delivered, are resent. Any messages that have been read by the client, but not acknowledged, are delivered to the client.
+    </p><p>
+      TCP is slow to detect connection failures. A client can configure a connection to use a heartbeat to detect connection failure, and can specify a time interval for the heartbeat. If heartbeats are in use, failures will be detected no later than twice the heartbeat interval. The Java JMS client enables heartbeat by default. See the sections on Failover in Java JMS Clients and Failover in C++ Clients for the code to enable heartbeat.
+    </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="sect-Messaging_User_Guide-Failover_in_Clients-Failover_in_Java_JMS_Clients"></a>1.8.3.1. Failover in Java JMS Clients</h4></div></div></div><p>
+	In Java JMS clients, client failover is handled automatically if it is enabled in the connection. Any messages that have been sent by the client, but not yet acknowledged as delivered, are resent. Any messages that have been read by the client, but not acknowledged, are sent to the client.
+      </p><p>
+	You can configure a connection to use failover using the <span class="command"><strong>failover</strong></span> property:
+      </p><pre class="screen">
+	connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672'&amp;failover='failover_exchange'
+      </pre><p>
+	This property can take three values:
+      </p><div class="variablelist"><a name="vari-Messaging_User_Guide-Failover_in_Java_JMS_Clients-Failover_Modes"></a><p class="title"><b>Failover Modes</b></p><dl><dt><span class="term">failover_exchange</span></dt><dd><p>
+	      If the connection fails, fail over to any other broker in the cluster.
+	    </p></dd><dt><span class="term">roundrobin</span></dt><dd><p>
+	      If the connection fails, fail over to one of the brokers specified in the <span class="command"><strong>brokerlist</strong></span>.
+	    </p></dd><dt><span class="term">singlebroker</span></dt><dd><p>
+	      Failover is not supported; the connection is to a single broker only.
+	    </p></dd></dl></div><p>
+	In a Connection URL, heartbeat is set using the <span class="command"><strong>idle_timeout</strong></span> property, which is an integer corresponding to the heartbeat period in seconds. For instance, the following line from a JNDI properties file sets the heartbeat time out to 3 seconds:
+      </p><pre class="screen">
+	connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672',idle_timeout=3
+      </pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="sect-Messaging_User_Guide-Failover_in_Clients-Failover_and_the_Qpid_Messaging_API"></a>1.8.3.2. Failover and the Qpid Messaging API</h4></div></div></div><p>
+	The Qpid Messaging API also supports automatic reconnection in the event a connection fails. Senders can also be configured to replay any in-doubt messages (i.e. messages which were sent but not acknowledged by the broker). See "Connection Options" and "Sender Capacity and Replay" in <em class="citetitle">Programming in Apache Qpid</em> for details.
+      </p><p>
+	In C++ and python clients, heartbeats are disabled by default. You can enable them by specifying a heartbeat interval (in seconds) for the connection via the 'heartbeat' option.
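+      </p><p>
+	For example, with the command-line tools that accept connection options, a heartbeat interval can be requested like this (the interval and queue name are illustrative):
+      </p><pre class="screen">$ qpid-receive -b localhost:5672 --connection-options '{heartbeat: 2}' -a my-queue
+      </pre><p>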
+      </p><p>
+	See "Cluster Failover" in <em class="citetitle">Programming in Apache Qpid</em> for details on how to keep the client aware of cluster membership.
+      </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sect-Messaging_User_Guide-High_Availability_Messaging_Clusters-Error_handling_in_Clusters"></a>1.8.4. Error handling in Clusters</h3></div></div></div><p>
+      If a broker crashes or is killed, or if a broker machine failure, broker connection failure, or broker hang is detected, the other brokers in the cluster are notified that it is no longer a member of the cluster. If a new broker joins the cluster, it synchronizes with an active broker to obtain the current cluster state; if this synchronization fails, the new broker exits the cluster and aborts.
+    </p><p>
+      If a broker becomes extremely busy and stops responding, it stops accepting incoming work. All other brokers continue processing, and the non-responsive node caches all AIS traffic. When it resumes, the broker processes all cached AIS events, then accepts further incoming work.
+    </p><p>
+      Broker hangs are only detected if the watchdog plugin is loaded and the <span class="command"><strong>--watchdog-interval</strong></span> option is set. The watchdog plug-in kills the qpidd broker process if it becomes stuck for longer than the watchdog interval. In some cases, e.g. certain phases of error resolution, it is possible for a stuck process to hang other cluster members that are waiting for it to send a message. Using the watchdog, the stuck process is terminated and removed from the cluster, allowing other members to continue and clients of the stuck process to fail over to other members.
+    </p><p>
+      Redundancy can also be achieved directly in the AIS network by specifying more than one network interface in the AIS configuration file. This causes Totem to use a redundant ring protocol, which makes failure of a single network transparent.
+    </p><p>
+      Redundancy can be achieved at the operating system level by using NIC bonding, which combines multiple network ports into a single group, effectively aggregating the bandwidth of multiple interfaces into a single connection. This provides both network load balancing and fault tolerance.
+    </p><p>
+      If any broker encounters an error, the brokers compare notes to see if they all received the same error. If not, the broker removes itself from the cluster and shuts itself down to ensure that all brokers in the cluster have consistent state. For instance, a broker may run out of disk space; if this happens, the broker shuts itself down. Examining the broker's log can help determine the error and suggest ways to prevent it from occurring in the future.
+    </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sect-Messaging_User_Guide-High_Availability_Messaging_Clusters-Persistence_in_High_Availability_Message_Clusters"></a>1.8.5. Persistence in High Availability Message Clusters</h3></div></div></div><p>
+      Persistence and clustering are two different ways to provide reliability. Most systems that use a cluster do not enable persistence, but you can do so if you want to ensure that messages are not lost even if the last broker in a cluster fails. A cluster must have either all transient or all persistent members; mixed clusters are not allowed. Each broker in a persistent cluster has its own independent replica of the cluster's state in its store.
+    </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="sect-Messaging_User_Guide-Persistence_in_High_Availability_Message_Clusters-Clean_and_Dirty_Stores"></a>1.8.5.1. Clean and Dirty Stores</h4></div></div></div><p>
+	When a broker is an active member of a cluster, its store is marked "dirty" because it may be out of date compared to other brokers in the cluster. If a broker leaves a running cluster because it is stopped, it crashes, or its host crashes, its store continues to be marked "dirty".
+      </p><p>
+	If the cluster is reduced to a single broker, its store is marked "clean" since it is the only broker making updates. If the cluster is shut down with the command <code class="literal">qpid-cluster -k</code> then all the stores are marked clean.
+      </p><p>
+	When a cluster is initially formed, brokers with clean stores read from their stores. Brokers with dirty stores, or brokers that join after the cluster is running, discard their old stores and initialize a new store with an update from one of the running brokers. The <span class="command"><strong>--truncate</strong></span> option can be used to force a broker to discard all existing stores even if they are clean. (A dirty store is discarded regardless.)
+      </p><p>
+	Discarded stores are copied to a backup directory. The active store is in &lt;data-dir&gt;/rhm. Backup stores are in &lt;data-dir&gt;/_cluster.bak.&lt;nnnn&gt;/rhm, where &lt;nnnn&gt; is a 4-digit number; a higher number means a more recent backup.
+      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="sect-Messaging_User_Guide-Persistence_in_High_Availability_Message_Clusters-Starting_a_persistent_cluster"></a>1.8.5.2. Starting a persistent cluster</h4></div></div></div><p>
+	When starting a persistent cluster broker, set the cluster-size option to the number of brokers in the cluster. This allows the brokers to wait until the entire cluster is running so that they can synchronize their stored state.
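+      </p><p>
+	For example, each broker in a three-node persistent cluster could be started as follows:
+      </p><pre class="screen">$ qpidd --cluster-name="local_test_cluster" --cluster-size=3
+      </pre><p>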
+      </p><p>
+	The cluster can start if:
+      </p><p>
+	</p><div class="itemizedlist"><ul><li><p>
+	      all members have empty stores, or
+	    </p></li><li><p>
+	      at least one member has a clean store
+	    </p></li></ul></div><p>
+
+      </p><p>
+	All members of the new cluster will be initialized with the state from a clean store.
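The start-up condition above can be stated as a simple predicate. This is a sketch for clarity; the broker's actual recovery logic is internal.

```python
def cluster_can_start(stores):
    """stores: one of 'empty', 'clean' or 'dirty' per broker.
    A persistent cluster can start if all stores are empty,
    or if at least one store is clean."""
    return (all(s == "empty" for s in stores)
            or any(s == "clean" for s in stores))
```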
+      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="sect-Messaging_User_Guide-Persistence_in_High_Availability_Message_Clusters-Stopping_a_persistent_cluster"></a>1.8.5.3. Stopping a persistent cluster</h4></div></div></div><p>
+	To cleanly shut down a persistent cluster use the command <span class="command"><strong>qpid-cluster -k</strong></span>. This causes all brokers to synchronize their state and mark their stores as "clean" so they can be used when the cluster restarts.
+      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="sect-Messaging_User_Guide-Persistence_in_High_Availability_Message_Clusters-Starting_a_persistent_cluster_with_no_clean_store"></a>1.8.5.4. Starting a persistent cluster with no clean store</h4></div></div></div><p>
+	If the cluster has previously had a total failure and there are no clean stores then the brokers will fail to start with the log message <code class="literal">Cannot recover, no clean store.</code> If this happens you can start the cluster by marking one of the stores "clean" as follows:
+      </p><div class="procedure"><ol type="1"><li><p>
+	    Move the latest store backup into place in the broker's data directory. Backups end in a 4-digit number; the latest backup is the one with the highest number.
+	  </p><pre class="screen">
+	    cd &lt;data-dir&gt;
+	    mv rhm rhm.bak
+	    cp -a _cluster.bak.&lt;nnnn&gt;/rhm .
+	  </pre></li><li><p>
+	    Mark the store as clean:
+	    </p><pre class="screen">qpid-cluster-store -c &lt;data-dir&gt;</pre><p>
+
+	  </p></li></ol></div><p>
+	Now you can start the cluster; all members will be initialized from the store you marked as clean.
+      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="sect-Messaging_User_Guide-Persistence_in_High_Availability_Message_Clusters-Isolated_failures_in_a_persistent_cluster"></a>1.8.5.5. Isolated failures in a persistent cluster</h4></div></div></div><p>
+	A broker in a persistent cluster may encounter errors that other brokers in the cluster do not; if this happens, the broker shuts itself down to avoid making the cluster state inconsistent. For example, a disk failure on one node will result in that node shutting down. Running out of storage capacity can also cause a node to shut down, because the brokers may not run out of storage at exactly the same point, even if they have similar storage configurations. To avoid unnecessary broker shutdowns, make sure the queue policy size of each durable queue is less than the capacity of the journal for the queue.
+      </p></div></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="queue-state-replication.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="ch01.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="producer-flow-control.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.7. 
+    Queue State Replication
+   </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> 1.9. 
+    Producer Flow Control
+  </td></tr></table></div><div class="main_text_area_bottom"></div></div></div></body></html>

Added: qpid/site/docs/books/0.18/AMQP-Messaging-Broker-CPP-Book/html/chap-Messaging_User_Guide-Active_Passive_Cluster.html
URL: http://svn.apache.org/viewvc/qpid/site/docs/books/0.18/AMQP-Messaging-Broker-CPP-Book/html/chap-Messaging_User_Guide-Active_Passive_Cluster.html?rev=1372179&view=auto
==============================================================================
--- qpid/site/docs/books/0.18/AMQP-Messaging-Broker-CPP-Book/html/chap-Messaging_User_Guide-Active_Passive_Cluster.html (added)
+++ qpid/site/docs/books/0.18/AMQP-Messaging-Broker-CPP-Book/html/chap-Messaging_User_Guide-Active_Passive_Cluster.html Sun Aug 12 19:03:49 2012
@@ -0,0 +1,520 @@
+<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>1.13. Active-passive Messaging Clusters</title><link rel="stylesheet" href="css/style.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"><link rel="start" href="index.html" title="AMQP Messaging Broker (Implemented in C++)"><link rel="up" href="ch01.html" title="Chapter 1.  Running the AMQP Messaging Broker"><link rel="prev" href="Using-message-groups.html" title="1.12.  Using Message Groups"><link rel="next" href="ch01s14.html" title="1.14. Queue Replication with the HA module"></head><body><div class="container" bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><DIV class="header"><DIV class="logo"><H1>Apache Qpid™</H1><H2>Open Source AMQP Messaging</H2></DIV></DIV><DIV class="menu_box"><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Apache Qpid</H3><UL><LI><A href="http://qpid.apache.org/index
 .html">Home</A></LI><LI><A href="http://qpid.apache.org/download.html">Download</A></LI><LI><A href="http://qpid.apache.org/getting_started.html">Getting Started</A></LI><LI><A href="http://www.apache.org/licenses/">License</A></LI><LI><A href="https://cwiki.apache.org/qpid/faq.html">FAQ</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Documentation</H3><UL><LI><A href="http://qpid.apache.org/documentation.html#doc-release">0.14 Release</A></LI><LI><A href="http://qpid.apache.org/documentation.html#doc-trunk">Trunk</A></LI><LI><A href="http://qpid.apache.org/documentation.html#doc-archives">Archive</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Community</H3><UL><LI><A href="http://qpid.apache.org/getting_involved.html">Getting Involved</A></LI><LI><A href="http://qpid.apache.org/source_repository.html">Source Repository</A></LI><LI><A hre
 f="http://qpid.apache.org/mailing_lists.html">Mailing Lists</A></LI><LI><A href="https://cwiki.apache.org/qpid/">Wiki</A></LI><LI><A href="https://issues.apache.org/jira/browse/qpid">Issue Reporting</A></LI><LI><A href="http://qpid.apache.org/people.html">People</A></LI><LI><A href="http://qpid.apache.org/acknowledgements.html">Acknowledgements</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Developers</H3><UL><LI><A href="https://cwiki.apache.org/qpid/building.html">Building Qpid</A></LI><LI><A href="https://cwiki.apache.org/qpid/developer-pages.html">Developer Pages</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>About AMQP</H3><UL><LI><A href="http://qpid.apache.org/amqp.html">What is AMQP?</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>About Apache</H3><UL><LI><A hre
 f="http://www.apache.org">Home</A></LI><LI><A href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</A></LI><LI><A href="http://www.apache.org/foundation/thanks.html">Thanks</A></LI><LI><A href="http://www.apache.org/security/">Security</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV></DIV><div class="main_text_area"><div class="main_text_area_top"></div><div class="main_text_area_body"><DIV class="breadcrumbs"><span class="breadcrumb-link"><a href="index.html">AMQP Messaging Broker (Implemented in C++)</a></span> &gt; <span class="breadcrumb-link"><a href="ch01.html">
+      Running the AMQP Messaging Broker
+    </a></span> &gt; <span class="breadcrumb-node">Active-passive Messaging Clusters</span></DIV><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="chap-Messaging_User_Guide-Active_Passive_Cluster"></a>1.13. Active-passive Messaging Clusters</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2536314"></a>1.13.1. Overview</h3></div></div></div><p>
+
+      The High Availability (HA) module provides
+      <em class="firstterm">active-passive</em>, <em class="firstterm">hot-standby</em>
+      messaging clusters that provide fault-tolerant message delivery.
+    </p><p>
+      In an active-passive cluster only one broker, known as the
+      <em class="firstterm">primary</em>, is active and serving clients at a time. The other
+      brokers are standing by as <em class="firstterm">backups</em>. Changes on the primary
+      are replicated to all the backups so they are always up-to-date or "hot". Backup
+      brokers reject client connection attempts, to enforce the requirement that clients
+      only connect to the primary.
+    </p><p>
+      If the primary fails, one of the backups is promoted to take over as the new
+      primary. Clients fail-over to the new primary automatically. If there are multiple
+      backups, the other backups also fail-over to become backups of the new primary.
+    </p><p>
+      This approach relies on an external <em class="firstterm">cluster resource manager</em>
+      to detect failures, choose the new primary and handle network partitions. <a class="ulink" href="https://fedorahosted.org/cluster/wiki/RGManager" target="_top">Rgmanager</a> is supported
+      initially, but others may be supported in the future.
+    </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2532321"></a>1.13.1.1. Avoiding message loss</h4></div></div></div><p>
+	In order to avoid message loss, the primary broker <span class="emphasis"><em>delays
+	acknowledgment</em></span> of messages received from clients until the
+	message has been replicated to and acknowledged by all of the back-up
+	brokers.
+      </p><p>
+	Clients buffer unacknowledged messages and re-send them in the event of
+	a fail-over.
+	<sup>[<a name="id2547364" href="#ftn.id2547364" class="footnote">1</a>]</sup>
+	If the primary crashes before a message is replicated to
+	all the backups, the client will re-send the message when it fails over
+	to the new primary.
+      </p><p>
+	Note that this means it is possible for messages to be
+	<span class="emphasis"><em>duplicated</em></span>. In the event of a failure it is
+	possible for a message to be both received by the backup that becomes
+	the new primary <span class="emphasis"><em>and</em></span> re-sent by the client.
+      </p><p>
+	When a new primary is promoted after a fail-over it is initially in
+	"recovering" mode. In this mode, it delays acknowledgment of messages
+	on behalf of all the backups that were connected to the previous
+	primary. This protects those messages against a failure of the new
+	primary until the backups have a chance to connect and catch up.
+      </p><div class="variablelist"><p class="title"><b>Status of a HA broker</b></p><dl><dt><span class="term">Joining</span></dt><dd><p>
+	      Initial status of a new broker that has not yet connected to the primary.
+	    </p></dd><dt><span class="term">Catch-up</span></dt><dd><p>
+	      A backup broker that is connected to the primary and catching up
+	      on queues and messages.
+	    </p></dd><dt><span class="term">Ready</span></dt><dd><p>
+	      A backup broker that is fully caught-up and ready to take over as
+	      primary.
+	    </p></dd><dt><span class="term">Recovering</span></dt><dd><p>
+	      The newly-promoted primary, waiting for backups to connect and catch up.
+	    </p></dd><dt><span class="term">Active</span></dt><dd><p>
+	      The active primary broker with all backups connected and caught-up.
+	    </p></dd></dl></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2544560"></a>1.13.1.2. Replacing the old cluster module</h4></div></div></div><p>
+	The High Availability (HA) module replaces the previous
+	<em class="firstterm">active-active</em> cluster module.  The new active-passive
+	approach has several advantages compared to the existing active-active cluster
+	module.
+	</p><div class="itemizedlist"><ul><li>
+	    It does not depend directly on openais or corosync, and it does not use
+	    multicast, which simplifies deployment.
+	  </li><li>
+	    It is more portable: in environments that don't support corosync, it can be
+	    integrated with a resource manager available in that environment.
+	  </li><li>
+	    Replication to a <em class="firstterm">disaster recovery</em> site can be handled as
+	    simply another node in the cluster; it does not require a separate replication
+	    mechanism.
+	  </li><li>
+	    It can take advantage of features provided by the resource manager, for example
+	    virtual IP addresses.
+	  </li><li>
+	    Improved performance and scalability due to better use of multiple CPUs.
+	  </li></ul></div><p>
+      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2546236"></a>1.13.1.3. Limitations</h4></div></div></div><p>
+	There are a number of known limitations in the current preview implementation. These
+	will be fixed in the production versions.
+      </p><div class="itemizedlist"><ul><li>
+	  Transactional changes to queue state are not replicated atomically. If the primary crashes
+	  during a transaction, it is possible that the backup could contain only part of the
+	  changes introduced by a transaction.
+	</li><li>
+	  Not yet integrated with the persistent store.  A persistent broker must have its
+	  store erased before joining an existing cluster.  If the entire cluster fails,
+	  there are no tools to help identify the most recent store. In the future a
+	  persistent broker will be able to use its stored messages to avoid downloading
+	  messages from the primary when joining a cluster.
+	</li><li>
+	  Configuration changes (creating or deleting queues, exchanges and bindings) are
+	  replicated asynchronously. Management tools used to make changes will consider
+	  the change complete when it is complete on the primary, even though it may not yet be
+	  replicated to all the backups.
+	</li><li>
+	  Deletions made immediately after a failure (before all the backups are ready)
+	  may be lost on a backup. Queues, exchanges or bindings that were deleted on the
+	  primary could re-appear if that backup is promoted to primary on a subsequent
+	  failure.
+	</li><li>
+	  Federated links <span class="emphasis"><em>from</em></span> the primary will be lost in fail-over;
+	  they will not be re-connected to the new primary. Federation links
+	  <span class="emphasis"><em>to</em></span> the primary can fail over.
+	</li></ul></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2500902"></a>1.13.2. Virtual IP Addresses</h3></div></div></div><p>
+      Some resource managers (including <span class="command"><strong>rgmanager</strong></span>) support
+      <em class="firstterm">virtual IP addresses</em>. A virtual IP address is an IP
+      address that can be relocated to any of the nodes in a cluster.  The
+      resource manager associates this address with the primary node in the
+      cluster, and relocates it to the new primary when there is a failure. This
+      simplifies configuration as you can publish a single IP address rather
+      than a list.
+    </p><p>
+      A virtual IP address can be used by clients and backup brokers to connect
+      to the primary. The following sections will explain how to configure
+      virtual IP addresses for clients or brokers.
+    </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2520592"></a>1.13.3. Configuring the Brokers</h3></div></div></div><p>
+      The broker must load the <code class="filename">ha</code> module; it is loaded by
+      default. The following broker options are available for the HA module.
+    </p><div class="table"><a name="ha-broker-options"></a><p class="title"><b>Table 1.18. Broker Options for High Availability Messaging Cluster</b></p><div class="table-contents"><table summary="Broker Options for High Availability Messaging Cluster" border="1"><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th colspan="2" align="center">
+	      Options for High Availability Messaging Cluster
+	    </th></tr></thead><tbody><tr><td align="left">
+	      <code class="literal">ha-cluster <em class="replaceable"><code>yes|no</code></em></code>
+	    </td><td align="left">
+	      Set to "yes" to have the broker join a cluster.
+	    </td></tr><tr><td align="left">
+	      <code class="literal">ha-brokers-url <em class="replaceable"><code>URL</code></em></code>
+	    </td><td align="left">
+	      <p>
+		The URL
+		<sup>[<a name="id2507029" href="#ftn.id2507029" class="footnote">a</a>]</sup>
+		used by cluster brokers to connect to each other. The URL can
+		contain a list of all the broker addresses or it can contain a single
+		virtual IP address.  If a list is used, it is comma-separated, for example:
+		<code class="literal">amqp:node1.example.com,node2.example.com,node3.example.com</code>
+	      </p>
+	    </td></tr><tr><td align="left"><code class="literal">ha-public-url <em class="replaceable"><code>URL</code></em></code> </td><td align="left">
+	      <p>
+		The URL that is advertised to clients. This defaults to the
+		<code class="literal">ha-brokers-url</code> URL above, and has the same format.  A
+		virtual IP address is recommended for the public URL as it simplifies
+		deployment and hides changes to the cluster membership from clients.
+	      </p>
+	      <p>
+		This option allows you to put client traffic on a different network from
+		broker traffic, which is recommended.
+	      </p>
+	    </td></tr><tr><td align="left"><code class="literal">ha-replicate </code><em class="replaceable"><code>VALUE</code></em></td><td align="left">
+	      <p>
+		Specifies whether queues and exchanges are replicated by default.
+		<em class="replaceable"><code>VALUE</code></em> is one of: <code class="literal">none</code>,
+		<code class="literal">configuration</code>, <code class="literal">all</code>.
+		For details see <a class="xref" href="chap-Messaging_User_Guide-Active_Passive_Cluster.html#ha-creating-replicated" title="1.13.7. Creating replicated queues and exchanges">Section 1.13.7, “Creating replicated queues and exchanges”</a>.
+	      </p>
+	    </td></tr><tr><td align="left">
+	      <p><code class="literal">ha-username <em class="replaceable"><code>USER</code></em></code></p>
+	      <p><code class="literal">ha-password <em class="replaceable"><code>PASS</code></em></code></p>
+	      <p><code class="literal">ha-mechanism <em class="replaceable"><code>MECH</code></em></code></p>
+	    </td><td align="left">
+	      Authentication settings used by HA brokers to connect to each other.
+	      If you are using authorization
+	      (<a class="xref" href="chap-Messaging_User_Guide-Security.html#sect-Messaging_User_Guide-Security-Authorization" title="1.5.2. Authorization">Section 1.5.2, “Authorization”</a>)
+	      then this user must have all permissions.
+	    </td></tr><tr><td align="left"><code class="literal">ha-backup-timeout <em class="replaceable"><code>SECONDS</code></em></code> </td><td align="left">
+	      <p>
+		Maximum time that a recovering primary will wait for an expected
+		backup to connect and become ready.
+	      </p>
+	    </td></tr><tr><td align="left"><code class="literal">link-maintenance-interval <em class="replaceable"><code>SECONDS</code></em></code></td><td align="left">
+	      <p>
+		Interval for the broker to check link health and re-connect links if need
+		be. If you want brokers to fail over quickly you can set this to a
+		fraction of a second, for example: 0.1.
+	      </p>
+	    </td></tr></tbody><tbody class="footnotes"><tr><td colspan="2"><div class="footnote"><p><sup>[<a name="ftn.id2507029" href="#id2507029" class="para">a</a>] </sup>
+		  The full format of the URL is given by this grammar:
+		  <pre class="programlisting">
+url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
+addr = tcp_addr / rdma_addr / ssl_addr / ...
+tcp_addr = ["tcp:"] host [":" port]
+rdma_addr = "rdma:" host [":" port]
+ssl_addr = "ssl:" host [":" port]
+		  </pre>
+		  </p></div></td></tr></tbody></table></div></div><br class="table-break"><p>
+      To configure a HA cluster you must set at least <code class="literal">ha-cluster</code> and
+      <code class="literal">ha-brokers-url</code>.
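As a rough illustration, the address grammar from the footnote in the table above can be turned into a small parser. This is a sketch only: the function name is hypothetical, the default port of 5672 is an assumption (the standard AMQP port), and the broker uses its own parser.

```python
def parse_broker_url(url):
    """Parse ["amqp:"][user["/"password]"@"]addr("," addr)* into a
    list of (protocol, host, port) tuples."""
    if url.startswith("amqp:"):
        url = url[len("amqp:"):]
    if "@" in url:
        _credentials, url = url.split("@", 1)  # user["/"password] ignored here
    addresses = []
    for addr in url.split(","):
        protocol = "tcp"                       # tcp is the default scheme
        for scheme in ("tcp", "rdma", "ssl"):
            if addr.startswith(scheme + ":"):
                protocol, addr = scheme, addr[len(scheme) + 1:]
                break
        host, _, port = addr.partition(":")
        addresses.append((protocol, host, int(port) if port else 5672))
    return addresses
```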
+    </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2530932"></a>1.13.4. The Cluster Resource Manager</h3></div></div></div><p>
+      Broker fail-over is managed by a <em class="firstterm">cluster resource
+      manager</em>.  An integration with <a class="ulink" href="https://fedorahosted.org/cluster/wiki/RGManager" target="_top">rgmanager</a> is
+      provided, but it is possible to integrate with other resource managers.
+    </p><p>
+      The resource manager is responsible for starting the <span class="command"><strong>qpidd</strong></span> broker
+      on each node in the cluster. The resource manager then <em class="firstterm">promotes</em>
+      one of the brokers to be the primary. The other brokers connect to the primary as
+      backups, using the URL provided in the <code class="literal">ha-brokers-url</code> configuration
+      option.
+    </p><p>
+      Once connected, the backup brokers synchronize their state with the
+      primary.  When a backup is synchronized, or "hot", it is ready to take
+      over if the primary fails.  Backup brokers continually receive updates
+      from the primary in order to stay synchronized.
+    </p><p>
+      If the primary fails, backup brokers go into fail-over mode. The resource
+      manager must detect the failure and promote one of the backups to be the
+      new primary.  The other backups connect to the new primary and synchronize
+      their state with it.
+    </p><p>
+      The resource manager is also responsible for protecting the cluster from
+      <em class="firstterm">split-brain</em> conditions resulting from a network partition.  A
+      network partition divides a cluster into two sub-groups which cannot see each other.
+      Usually a <em class="firstterm">quorum</em> voting algorithm is used that disables nodes
+      in the inquorate sub-group.
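A simple majority-vote quorum check, the heart of such algorithms, looks like this. It is a sketch; cman and rgmanager implement the real quorum machinery.

```python
def has_quorum(votes_present, total_votes):
    """A partition may stay active only if it holds a strict majority
    of the cluster's votes; this prevents both halves of a network
    partition from running as primary at once."""
    return 2 * votes_present > total_votes
```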
+    </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2505266"></a>1.13.5. Configuring <span class="command"><strong>rgmanager</strong></span> as resource manager</h3></div></div></div><p>
+      This section assumes that you are already familiar with setting up and configuring
+      clustered services using <span class="command"><strong>cman</strong></span> and
+      <span class="command"><strong>rgmanager</strong></span>. It will show you how to configure an active-passive,
+      hot-standby <span class="command"><strong>qpidd</strong></span> HA cluster with <span class="command"><strong>rgmanager</strong></span>.
+    </p><p>
+      You must provide a <code class="literal">cluster.conf</code> file to configure
+      <span class="command"><strong>cman</strong></span> and <span class="command"><strong>rgmanager</strong></span>.  Here is
+      an example <code class="literal">cluster.conf</code> file for a cluster of 3 nodes named
+      node1, node2 and node3. We will go through the configuration step-by-step.
+    </p><pre class="programlisting">
+      
+&lt;?xml version="1.0"?&gt;
+&lt;!--
+This is an example of a cluster.conf file to run qpidd HA under rgmanager.
+This example assumes a 3 node cluster, with nodes named node1, node2 and node3.
+
+NOTE: fencing is not shown, you must configure fencing appropriately for your cluster.
+--&gt;
+
+&lt;cluster name="qpid-test" config_version="18"&gt;
+  &lt;!-- The cluster has 3 nodes. Each has a unique nodeid and one vote
+       for quorum. --&gt;
+  &lt;clusternodes&gt;
+    &lt;clusternode name="node1.example.com" nodeid="1"/&gt;
+    &lt;clusternode name="node2.example.com" nodeid="2"/&gt;
+    &lt;clusternode name="node3.example.com" nodeid="3"/&gt;
+  &lt;/clusternodes&gt;
+  &lt;!-- Resource Manager configuration. --&gt;
+  &lt;rm&gt;
+    &lt;!--
+	There is a failoverdomain for each node containing just that node.
+	This lets us stipulate that the qpidd service should always run on each node.
+    --&gt;
+    &lt;failoverdomains&gt;
+      &lt;failoverdomain name="node1-domain" restricted="1"&gt;
+	&lt;failoverdomainnode name="node1.example.com"/&gt;
+      &lt;/failoverdomain&gt;
+      &lt;failoverdomain name="node2-domain" restricted="1"&gt;
+	&lt;failoverdomainnode name="node2.example.com"/&gt;
+      &lt;/failoverdomain&gt;
+      &lt;failoverdomain name="node3-domain" restricted="1"&gt;
+	&lt;failoverdomainnode name="node3.example.com"/&gt;
+      &lt;/failoverdomain&gt;
+    &lt;/failoverdomains&gt;
+
+    &lt;resources&gt;
+      &lt;!-- This script starts a qpidd broker acting as a backup. --&gt;
+      &lt;script file="/etc/init.d/qpidd" name="qpidd"/&gt;
+
+      &lt;!-- This script promotes the qpidd broker on this node to primary. --&gt;
+      &lt;script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/&gt;
+
+      &lt;!-- This is a virtual IP address for broker replication traffic. --&gt;
+      &lt;ip address="20.0.10.200" monitor_link="1"/&gt;
+
+      &lt;!-- This is a virtual IP address on a separate network for client traffic. --&gt;
+      &lt;ip address="20.0.20.200" monitor_link="1"/&gt;
+    &lt;/resources&gt;
+
+    &lt;!-- There is a qpidd service on each node, it should be restarted if it fails. --&gt;
+    &lt;service name="node1-qpidd-service" domain="node1-domain" recovery="restart"&gt;
+      &lt;script ref="qpidd"/&gt;
+    &lt;/service&gt;
+    &lt;service name="node2-qpidd-service" domain="node2-domain" recovery="restart"&gt;
+      &lt;script ref="qpidd"/&gt;
+    &lt;/service&gt;
+    &lt;service name="node3-qpidd-service" domain="node3-domain"  recovery="restart"&gt;
+      &lt;script ref="qpidd"/&gt;
+    &lt;/service&gt;
+
+    &lt;!-- There should always be a single qpidd-primary service, it can run on any node. --&gt;
+    &lt;service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate"&gt;
+      &lt;script ref="qpidd-primary"/&gt;
+      &lt;!-- The primary has the IP addresses for brokers and clients to connect. --&gt;
+      &lt;ip ref="20.0.10.200"/&gt;
+      &lt;ip ref="20.0.20.200"/&gt;
+    &lt;/service&gt;
+  &lt;/rm&gt;
+&lt;/cluster&gt;
+      
+    </pre><p>
+      There is a <code class="literal">failoverdomain</code> for each node containing just that
+      one node.  This lets us stipulate that the qpidd service should always run on all
+      nodes.
+    </p><p>
+      The <code class="literal">resources</code> section defines the <span class="command"><strong>qpidd</strong></span>
+      script used to start the <span class="command"><strong>qpidd</strong></span> service. It also defines the
+      <span class="command"><strong>qpidd-primary</strong></span> script, which does not
+      actually start a new service; rather, it promotes the existing
+      <span class="command"><strong>qpidd</strong></span> broker to primary status.
+    </p><p>
+      The <code class="literal">resources</code> section also defines a pair of virtual IP
+      addresses on different sub-nets. One will be used for broker-to-broker
+      communication, the other for client-to-broker.
+    </p><p>
+      To take advantage of the virtual IP addresses, <code class="filename">qpidd.conf</code>
+      should contain these  lines:
+    </p><pre class="programlisting">
+      ha-cluster=yes
+      ha-brokers-url=20.0.10.200
+      ha-public-url=20.0.20.200
+    </pre><p>
+      This configuration specifies that backup brokers will use 20.0.10.200
+      to connect to the primary and that 20.0.20.200 will be advertised to clients.
+      Clients should connect to 20.0.20.200.
+    </p><p>
+      The <code class="literal">service</code> section defines 3 <code class="literal">qpidd</code>
+      services, one for each node. Each service is in a restricted fail-over
+      domain containing just that node, and has the <code class="literal">restart</code>
+      recovery policy. The effect of this is that rgmanager will run
+      <span class="command"><strong>qpidd</strong></span> on each node, restarting if it fails.
+    </p><p>
+      There is a single <code class="literal">qpidd-primary-service</code> using the
+      <span class="command"><strong>qpidd-primary</strong></span> script which is not restricted to a
+      domain and has the <code class="literal">relocate</code> recovery policy. This means
+      rgmanager will start <span class="command"><strong>qpidd-primary</strong></span> on one of the nodes
+      when the cluster starts and will relocate it to another node if the
+      original node fails. Running the <code class="literal">qpidd-primary</code> script
+      does not start a new broker process, it promotes the existing broker to
+      become the primary.
+    </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2530382"></a>1.13.6. Broker Administration Tools</h3></div></div></div><p>
+      Normally, clients are not allowed to connect to a backup broker. However,
+      management tools are allowed to connect to backup brokers. If you use
+      these tools you <span class="emphasis"><em>must not</em></span> add or remove messages from
+      replicated queues, nor create or delete replicated queues or exchanges, as
+      this will disrupt the replication process and may cause message loss.
+    </p><p>
+      <span class="command"><strong>qpid-ha</strong></span> allows you to view and change HA configuration settings.
+    </p><p>
+      The tools <span class="command"><strong>qpid-config</strong></span>, <span class="command"><strong>qpid-route</strong></span> and
+      <span class="command"><strong>qpid-stat</strong></span> will connect to a backup if you pass the flag <span class="command"><strong>--ha-admin</strong></span> on the
+      command line.
+    </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="ha-creating-replicated"></a>1.13.7. Creating replicated queues and exchanges</h3></div></div></div><p>
+      By default, queues and exchanges are not replicated automatically. You can change
+      the default behavior by setting the <code class="literal">ha-replicate</code> configuration
+      option. It has one of the following values:
+      </p><div class="itemizedlist"><ul><li><em class="firstterm">all</em>: Replicate everything automatically: queues,
+	  exchanges, bindings and messages.
+	</li><li><em class="firstterm">configuration</em>: Replicate the existence of queues,
+	  exchanges and bindings, but don't replicate messages.
+	</li><li><em class="firstterm">none</em>: Don't replicate anything; this is the default.
+	</li></ul></div><p>
+    </p><p>
+      You can override the default for a particular queue or exchange by passing the
+      argument <code class="literal">qpid.replicate</code> when creating the queue or exchange. It
+      takes the same values as <code class="literal">ha-replicate</code>.
+    </p><p>
+      Bindings are automatically replicated if the queue and exchange being bound both
+      have replication <code class="literal">all</code> or <code class="literal">configuration</code>; they
+      are not replicated otherwise.
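The rule above can be sketched as a small predicate (illustrative Python only, not part of any Qpid API):

```python
def binding_is_replicated(queue_level, exchange_level):
    """A binding replicates only when both the queue and the exchange
    replicate at least their configuration."""
    replicated = {"all", "configuration"}
    return queue_level in replicated and exchange_level in replicated

print(binding_is_replicated("all", "configuration"))  # True
print(binding_is_replicated("all", "none"))           # False
```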
+    </p><p>
+      You can create replicated queues and exchanges with the
+      <span class="command"><strong>qpid-config</strong></span> management tool like this:
+    </p><pre class="programlisting">
+      qpid-config add queue myqueue --replicate all
+    </pre><p>
+      To create replicated queues and exchanges via the client API, add a
+      <code class="literal">node</code> entry to the address like this:
+    </p><pre class="programlisting">
+      "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
+    </pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2505676"></a>1.13.8. Client Connection and Fail-over</h3></div></div></div><p>
+      Clients can only connect to the primary broker. Backup brokers
+      automatically reject any connection attempt by a client.
+    </p><p>
+      Clients are configured with the URL for the cluster (details below for
+      each type of client). There are two possibilities:
+      </p><div class="itemizedlist"><ul><li>
+	  The URL contains multiple addresses, one for each broker in the cluster.
+	</li><li>
+	  The URL contains a single <em class="firstterm">virtual IP address</em>
+	  that is assigned to the primary broker by the resource manager.
+	  <sup>[<a name="id2537542" href="#ftn.id2537542" class="footnote">2</a>]</sup></li></ul></div><p>
+      In the first case, clients will repeatedly re-try each address in the URL
+      until they successfully connect to the primary. In the second case the
+      resource manager will assign the virtual IP address to the primary broker,
+      so clients only need to re-try on a single address.
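The retry behaviour in the first case can be sketched as follows. This is illustrative only; real Qpid clients perform this loop internally when reconnect is enabled, and the <code class="literal">connect</code> callable here is a hypothetical stand-in:

```python
import time

def connect_to_primary(urls, connect, delay=1.0, max_attempts=10):
    """Try each cluster URL in turn until one accepts the connection.

    `connect` is a caller-supplied callable that raises ConnectionError
    when it reaches a backup (backups reject clients) or a dead node.
    """
    for attempt in range(max_attempts):
        for url in urls:
            try:
                return connect(url)
            except ConnectionError:
                continue  # backup or unreachable node: try the next URL
        time.sleep(delay)  # no primary found this sweep; wait and retry
    raise ConnectionError("no primary broker found")
```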
+    </p><p>
+      When the primary broker fails, clients re-try all known cluster addresses
+      until they connect to the new primary.  The client re-sends any messages
+      that were previously sent but not acknowledged by the broker at the time
+      of the failure.  Similarly, messages that have been sent by the broker, but
+      not acknowledged by the client, are re-queued.
+    </p><p>
+      TCP can be slow to detect connection failures. A client can configure a
+      connection to use a <em class="firstterm">heartbeat</em> to detect connection
+      failure, and can specify a time interval for the heartbeat. If heartbeats
+      are in use, failures will be detected no later than twice the heartbeat
+      interval. The following sections explain how to enable heartbeat in each
+      client.
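As a worked example of the detection bound above:

```python
def worst_case_detection(heartbeat_interval_s):
    """Failures are detected no later than twice the heartbeat interval."""
    return 2 * heartbeat_interval_s

# e.g. a 10-second heartbeat bounds failure detection at 20 seconds
print(worst_case_detection(10))  # 20
```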
+    </p><p>
+      See "Cluster Failover" in <em class="citetitle">Programming in Apache
+      Qpid</em> for details on how to keep the client aware of cluster
+      membership.
+    </p><p>
+      Suppose your cluster has 3 nodes: <code class="literal">node1</code>,
+      <code class="literal">node2</code> and <code class="literal">node3</code> all using the
+      default AMQP port, and you are not using a virtual IP address. To connect
+      a client you need to specify the address(es) and set the
+      <code class="literal">reconnect</code> property to <code class="literal">true</code>. The
+      following sub-sections show how to connect each type of client.
+    </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2520032"></a>1.13.8.1. C++ clients</h4></div></div></div><p>
+	With the C++ client, you specify multiple cluster addresses in a single URL.
+	<sup>[<a name="id2519334" href="#ftn.id2519334" class="footnote">3</a>]</sup>
+	You also need to set the connection option
+	<code class="literal">reconnect</code> to true.  For example:
+      </p><pre class="programlisting">
+	qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}");
+      </pre><p>
+	Heartbeats are disabled by default. You can enable them by specifying a
+	heartbeat interval (in seconds) for the connection via the
+	<code class="literal">heartbeat</code> option. For example:
+	</p><pre class="programlisting">
+	  qpid::messaging::Connection c("node1,node2,node3","{reconnect:true,heartbeat:10}");
+	</pre><p>
+      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2545291"></a>1.13.8.2. Python clients</h4></div></div></div><p>
+	With the Python client, you specify <code class="literal">reconnect=True</code>
+	and a list of <em class="replaceable"><code>host:port</code></em> addresses as
+	<code class="literal">reconnect_urls</code> when calling
+	<code class="literal">Connection.establish</code> or
+	<code class="literal">Connection.open</code>:
+      </p><pre class="programlisting">
+	connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"])
+      </pre><p>
+	Heartbeats are disabled by default. You can
+	enable them by specifying a heartbeat interval (in seconds) for the
+	connection via the <code class="literal">heartbeat</code> option. For example:
+      </p><pre class="programlisting">
+	connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"], heartbeat=10)
+      </pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2502155"></a>1.13.8.3. Java JMS Clients</h4></div></div></div><p>
+	In Java JMS clients, client fail-over is handled automatically if it is
+	enabled in the connection.  You can configure a connection to use
+	fail-over using the <span class="command"><strong>failover</strong></span> property:
+      </p><pre class="screen">
+	connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672'&amp;failover='failover_exchange'
+      </pre><p>
+	This property can take three values:
+      </p><div class="variablelist"><p class="title"><b>Fail-over Modes</b></p><dl><dt><span class="term">failover_exchange</span></dt><dd><p>
+	      If the connection fails, fail over to any other broker in the cluster.
+	    </p></dd><dt><span class="term">roundrobin</span></dt><dd><p>
+	      If the connection fails, fail over to one of the brokers specified in the <span class="command"><strong>brokerlist</strong></span>.
+	    </p></dd><dt><span class="term">singlebroker</span></dt><dd><p>
+	      Fail-over is not supported; the connection is to a single broker only.
+	    </p></dd></dl></div><p>
+	In a Connection URL, heartbeat is set using the <span class="command"><strong>idle_timeout</strong></span> property, which is an integer corresponding to the heartbeat period in seconds. For instance, the following line from a JNDI properties file sets the heartbeat time out to 3 seconds:
+      </p><pre class="screen">
+	connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672',idle_timeout=3
+      </pre></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2553767"></a>1.13.9. Security</h3></div></div></div><p>
+      You can secure your cluster using the authentication and authorization features
+      described in <a class="xref" href="chap-Messaging_User_Guide-Security.html" title="1.5. Security">Section 1.5, “Security”</a>.
+    </p><p>
+      Backup brokers connect to the primary broker and subscribe for management
+      events and queue contents. You can specify the identity used to connect
+      to the primary with the following options:
+    </p><div class="table"><a name="ha-broker-security-options"></a><p class="title"><b>Table 1.19. Security options for High Availability Messaging Cluster</b></p><div class="table-contents"><table summary="Security options for High Availability Messaging Cluster" border="1"><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th colspan="2" align="center">
+	      Security options for High Availability Messaging Cluster
+	    </th></tr></thead><tbody><tr><td align="left">
+	      <p><code class="literal">ha-username <em class="replaceable"><code>USER</code></em></code></p>
+	      <p><code class="literal">ha-password <em class="replaceable"><code>PASS</code></em></code></p>
+	      <p><code class="literal">ha-mechanism <em class="replaceable"><code>MECH</code></em></code></p>
+	    </td><td align="left">
+	      Authentication settings used by HA brokers to connect to each other.
+	      If you are using authorization
+	      (<a class="xref" href="chap-Messaging_User_Guide-Security.html#sect-Messaging_User_Guide-Security-Authorization" title="1.5.2. Authorization">Section 1.5.2, “Authorization”</a>)
+	      then this user must have all permissions.
+	    </td></tr></tbody></table></div></div><br class="table-break"><p>
+      This identity is also used to authorize actions taken on the backup broker to replicate
+      from the primary, for example to create queues or exchanges.
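For example, the corresponding broker configuration might look like this (a sketch with illustrative values; PLAIN is just one possible SASL mechanism):

```
# Broker configuration file (illustrative values)
ha-username=ha-admin-user
ha-password=secret
ha-mechanism=PLAIN
```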
+    </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2532345"></a>1.13.10. Integrating with other Cluster Resource Managers</h3></div></div></div><p>
+      To integrate with a different resource manager you must configure it to:
+      </p><div class="itemizedlist"><ul><li>Start a qpidd process on each node of the cluster.</li><li>Restart qpidd if it crashes.</li><li>Promote exactly one of the brokers to primary.</li><li>Detect a failure and promote a new primary.</li></ul></div><p>
+    </p><p>
+      The <span class="command"><strong>qpid-ha</strong></span> command allows you to check if a broker is primary,
+      and to promote a backup to primary.
+    </p><p>
+      To test if a broker is the primary:
+      </p><pre class="programlisting">
+	qpid-ha -b <em class="replaceable"><code>broker-address</code></em> status --expect=primary
+      </pre><p>
+      This command will return 0 if the broker at <em class="replaceable"><code>broker-address</code></em>
+      is the primary, and non-zero otherwise.
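A resource manager script can branch on that exit status. In this sketch <code class="literal">qpid-ha</code> is stubbed with a shell function so the idiom can be shown without a running broker; in a real deployment the actual tool is on the PATH and the stub must be removed:

```shell
# Stub standing in for the real tool: pretend this broker is primary.
qpid-ha() { return 0; }

if qpid-ha -b node1 status --expect=primary; then
  echo "node1 is primary"
else
  echo "node1 is a backup"
fi
```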
+    </p><p>
+      To promote a broker to primary:
+      </p><pre class="programlisting">
+	qpid-ha -b <em class="replaceable"><code>broker-address</code></em> promote
+      </pre><p>
+    </p><p>
+      <span class="command"><strong>qpid-ha --help</strong></span> gives information on other commands and options available.
+      You can also use <span class="command"><strong>qpid-ha</strong></span> to manually examine and promote brokers. This
+      can be useful for testing failover scenarios without having to set up a full resource manager,
+      or to simulate a cluster on a single node. For deployment, a resource manager is required.
+    </p></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a name="ftn.id2547364" href="#id2547364" class="para">1</a>] </sup>
+	  Clients must use "at-least-once" reliability to enable re-send of unacknowledged
+	  messages. This is the default behavior, no options need be set to enable it. For
+	  details of client addressing options see "Using the Qpid Messaging API"
+	  in <em class="citetitle">Programming in Apache Qpid</em>
+	  </p></div><div class="footnote"><p><sup>[<a name="ftn.id2537542" href="#id2537542" class="para">2</a>] </sup>Only if the resource manager supports virtual IP addresses</p></div><div class="footnote"><p><sup>[<a name="ftn.id2519334" href="#id2519334" class="para">3</a>] </sup>
+	    The full grammar for the URL is:
+	  </p><pre class="programlisting">
+	    url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
+	    addr = tcp_addr / rdma_addr / ssl_addr / ...
+	    tcp_addr = ["tcp:"] host [":" port]
+	    rdma_addr = "rdma:" host [":" port]
+	    ssl_addr = "ssl:" host [":" port]
+	  </pre></div></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="Using-message-groups.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="ch01.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="ch01s14.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.12. 
+    Using Message Groups
+   </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> 1.14. Queue Replication with the HA module</td></tr></table></div><div class="main_text_area_bottom"></div></div></div></body></html>


