Posted to commits@qpid.apache.org by ro...@apache.org on 2012/08/12 21:19:53 UTC

svn commit: r1372183 [8/19] - in /qpid/site/docs/books/trunk_new: ./ AMQP-Messaging-Broker-CPP-Book/ AMQP-Messaging-Broker-CPP-Book/html/ AMQP-Messaging-Broker-CPP-Book/html/css/ AMQP-Messaging-Broker-CPP-Book/html/images/ AMQP-Messaging-Broker-CPP-Boo...

Added: qpid/site/docs/books/trunk_new/AMQP-Messaging-Broker-Java-Book/html/High-Availability.html
URL: http://svn.apache.org/viewvc/qpid/site/docs/books/trunk_new/AMQP-Messaging-Broker-Java-Book/html/High-Availability.html?rev=1372183&view=auto
==============================================================================
--- qpid/site/docs/books/trunk_new/AMQP-Messaging-Broker-Java-Book/html/High-Availability.html (added)
+++ qpid/site/docs/books/trunk_new/AMQP-Messaging-Broker-Java-Book/html/High-Availability.html Sun Aug 12 19:19:49 2012
@@ -0,0 +1,438 @@
+<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>1.6. High Availability</title><link rel="stylesheet" href="css/style.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"><link rel="start" href="index.html" title="AMQP Messaging Broker (Implemented in Java)"><link rel="up" href="Java-General-User-Guides.html" title="Chapter 1. General User Guides"><link rel="prev" href="Java-Broker-Configuration-Guide.html" title="1.5. Broker Configuration Guide"><link rel="next" href="Qpid-Java-Broker-HowTos.html" title="Chapter 2. How Tos"></head><body><div class="container" bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><DIV class="header"><DIV class="logo"><H1>Apache Qpid™</H1><H2>Open Source AMQP Messaging</H2></DIV></DIV><DIV class="menu_box"><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Apache Qpid</H3><UL><LI><A href="http://qpid.apache.org/index.
 html">Home</A></LI><LI><A href="http://qpid.apache.org/download.html">Download</A></LI><LI><A href="http://qpid.apache.org/getting_started.html">Getting Started</A></LI><LI><A href="http://www.apache.org/licenses/">License</A></LI><LI><A href="https://cwiki.apache.org/qpid/faq.html">FAQ</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Documentation</H3><UL><LI><A href="http://qpid.apache.org/documentation.html#doc-release">0.14 Release</A></LI><LI><A href="http://qpid.apache.org/documentation.html#doc-trunk">Trunk</A></LI><LI><A href="http://qpid.apache.org/documentation.html#doc-archives">Archive</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Community</H3><UL><LI><A href="http://qpid.apache.org/getting_involved.html">Getting Involved</A></LI><LI><A href="http://qpid.apache.org/source_repository.html">Source Repository</A></LI><LI><A href
 ="http://qpid.apache.org/mailing_lists.html">Mailing Lists</A></LI><LI><A href="https://cwiki.apache.org/qpid/">Wiki</A></LI><LI><A href="https://issues.apache.org/jira/browse/qpid">Issue Reporting</A></LI><LI><A href="http://qpid.apache.org/people.html">People</A></LI><LI><A href="http://qpid.apache.org/acknowledgements.html">Acknowledgements</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Developers</H3><UL><LI><A href="https://cwiki.apache.org/qpid/building.html">Building Qpid</A></LI><LI><A href="https://cwiki.apache.org/qpid/developer-pages.html">Developer Pages</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>About AMQP</H3><UL><LI><A href="http://qpid.apache.org/amqp.html">What is AMQP?</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>About Apache</H3><UL><LI><A href
 ="http://www.apache.org">Home</A></LI><LI><A href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</A></LI><LI><A href="http://www.apache.org/foundation/thanks.html">Thanks</A></LI><LI><A href="http://www.apache.org/security/">Security</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV></DIV><div class="main_text_area"><div class="main_text_area_top"></div><div class="main_text_area_body"><DIV class="breadcrumbs"><span class="breadcrumb-link"><a href="index.html">AMQP Messaging Broker (Implemented in Java)</a></span> &gt; <span class="breadcrumb-link"><a href="Java-General-User-Guides.html">General User Guides</a></span> &gt; <span class="breadcrumb-node">High Availability</span></DIV><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="High-Availability"></a>1.6. High Availability</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HAGeneralIntroduction"><
 /a>1.6.1. General Introduction</h3></div></div></div><p>The term High Availability (HA) usually refers to having a number of instances of a service such as a Message Broker
+      available so that should a service unexpectedly fail, or need to be shut down for maintenance, users may quickly connect
+      to another instance and continue their work with minimal interruption. HA is one way to make an overall system more resilient
+      by eliminating a single point of failure from a system.</p><p>HA offerings are usually categorised as <span class="bold"><strong>Active/Active</strong></span> or <span class="bold"><strong>Active/Passive</strong></span>.
+      An Active/Active system is one where all nodes within the cluster are usually available for use by clients all of the time.  In an
+      Active/Passive system, only one node within the cluster is available for use by clients at any one time, whilst the others are in
+      some kind of standby state, waiting to step in quickly in the event the active node becomes unavailable.
+    </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HAOfferingsOfJavaBroker"></a>1.6.2. HA offerings of the Java Broker</h3></div></div></div><p>The Java Broker's HA offering became available at release <span class="bold"><strong>0.18</strong></span>.  HA is provided by way of the HA
+      features built into the <a class="ulink" href="http://www.oracle.com/technetwork/products/berkeleydb/overview/index-093405.html" target="_top">Java Edition of the Berkeley Database (BDB JE)</a> and as such
+      is currently only available to Java Broker users who use the optional BDB JE based persistence store. This
+      <span class="bold"><strong>optional</strong></span> store requires the use of BDB JE which is licensed under the Sleepycat Licence, which is
+      not compatible with the Apache Licence and thus BDB JE is not distributed with Qpid. Users who elect to use this optional store for
+      the broker have to provide this dependency.</p><p>HA in the Java Broker provides an <span class="bold"><strong>Active/Passive</strong></span> mode of operation with Virtual hosts being
+      the unit of replication.  The Active node (referred to as the <span class="bold"><strong>Master</strong></span>) accepts all work from all the clients.
+       The Passive nodes (referred to as <span class="bold"><strong>Replicas</strong></span>) are unavailable for work: the only task they must perform is
+       to remain in sync with the Master node by consuming a replication stream containing all data and state.</p><p>If the Master node fails, a Replica node is elected to become the new Master node.  All clients automatically fail over
+      <sup>[<a name="id2494999" href="#ftn.id2494999" class="footnote">1</a>]</sup> to the new Master and continue their work.</p><p>The Java Broker HA solution is incompatible with the HA solution offered by the CPP Broker.  It is not possible to co-locate Java and CPP
+       Brokers within the same cluster.</p><p>HA is not currently available for those using the <span class="bold"><strong>Derby Store</strong></span> or <span class="bold"><strong>Memory
+      Message Store</strong></span>.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HATwoNodeCluster"></a>1.6.3. Two Node Cluster</h3></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2495041"></a>1.6.3.1. Overview</h4></div></div></div><p>In this HA solution, a cluster is formed with two nodes. One node serves as
+        <span class="bold"><strong>master</strong></span> and the other is a <span class="bold"><strong>replica</strong></span>.
+      </p><p>All data and state required for the operation of the virtual host is automatically sent from the
+        master to the replica. This is called the replication stream. The master virtual host confirms each
+        message is on the replica before the client transaction completes. The exact way the client waits
+        for the master and replica is governed by the <a class="link" href="High-Availability.html#HADurabilityGuarantee" title="1.6.6. Durability Guarantees">durability</a>
+        configuration, which is discussed later. In this way, the replica remains ready to take over the
+        role of the master if the master becomes unavailable.
+      </p><p>An inherent limitation of two-node clusters is that
+        the replica node cannot make itself master automatically in the event of master failure.  This
+        is because the replica has no way to distinguish between a network partition (with potentially
+        the master still alive on the other side of the partition) and the case of genuine master failure.
+        (If the replica were to elect itself as master, the cluster would run the risk of a
+        <a class="ulink" href="http://en.wikipedia.org/wiki/Split-brain_(computing)" target="_top">split-brain</a> scenario).
+        In the event of a master failure, a third party must designate the replica as primary.  This process
+        is described in more detail later.
+      </p><p>Clients connect to the cluster using a <a class="link" href="High-Availability.html#HAClientFailover" title="1.6.7. Client failover configuration">failover url</a>.
+        This allows the client to maintain a connection to the master in a way that is transparent
+        to the client application.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2496149"></a>1.6.3.2. Depictions of cluster operation</h4></div></div></div><p>In this section, the operation of the cluster is depicted through a series of figures
+        supported by explanatory text.</p><div class="figure"><a name="id2496163"></a><p class="title"><b>Figure 1.1. Key for figures</b></p><div class="figure-contents"><div class="mediaobject"><img src="images/HA-2N-Key.png" alt="Key to figures"></div></div></div><br class="figure-break"><div class="section" lang="en"><div class="titlepage"><div><div><h5 class="title"><a name="HATwoNodeNormalOperation"></a>Normal Operation</h5></div></div></div><p>The figure below illustrates normal operation.  Clients connecting to the cluster by way
+	  of the failover URL achieve a connection to the master. As clients perform work (message
+	  production, consumption, queue creation etc), the master additionally sends this data to the
+	  replica over the network.</p><div class="figure"><a name="id2496211"></a><p class="title"><b>Figure 1.2. Normal operation of a two-node cluster</b></p><div class="figure-contents"><div class="mediaobject"><img src="images/HA-2N-Normal.png" alt="Normal operation"></div></div></div><br class="figure-break"></div><div class="section" lang="en"><div class="titlepage"><div><div><h5 class="title"><a name="HATwoNodeMasterFailure"></a>Master Failure and Recovery</h5></div></div></div><p>The figure below illustrates a sequence of events whereby the master suffers a failure
+	  and the replica is made the master to allow the clients to continue to work. Later the
+	  old master is repaired and comes back on-line in replica role.</p><p>The item numbers in this list apply to the numbered boxes in the figure below.</p><div class="orderedlist"><ol type="1"><li><p>System operating normally</p></li><li><p>Master suffers a failure and disconnects all clients. Replica realises that it is no
+	      longer in contact with master. Clients begin to try to reconnect to the cluster, although these
+	      connection attempts will fail at this point.</p></li><li><p>A third-party (an operator, a script or a combination of the two) verifies that the master has truly
+           failed <span class="bold"><strong>and is no longer running</strong></span>. If it has truly failed, the decision is made
+           to designate the replica as primary, allowing it to assume the role of master despite the other node being down.
+           This primary designation is performed using <a class="link" href="High-Availability.html#HAJMXAPI" title="1.6.8. Qpid JMX API for HA">JMX</a>.</p></li><li><p>Client connections to the new master succeed and the <span class="bold"><strong>service is restored
+	      </strong></span>, albeit without a replica.</p></li><li><p>The old master is repaired and brought back on-line.  It automatically rejoins the cluster
+	       in the <span class="bold"><strong>replica</strong></span> role.</p></li></ol></div><div class="figure"><a name="id2496339"></a><p class="title"><b>Figure 1.3. Failure of master and recovery sequence</b></p><div class="figure-contents"><div class="mediaobject"><img src="images/HA-2N-MasterFail.png" alt="Failure of master and subsequent recovery sequence"></div></div></div><br class="figure-break"></div><div class="section" lang="en"><div class="titlepage"><div><div><h5 class="title"><a name="HATwoNodeReplicaFailure"></a>Replica Failure and Recovery</h5></div></div></div><p>The figure that follows illustrates a sequence of events whereby the replica suffers a failure
+	   leaving the master to continue processing alone.  Later the replica is repaired and is restarted.
+	   It rejoins the cluster so that it is once again ready to take over in the event of master failure.</p><p>The behavior of the replica failure case is governed by the <code class="varname">designatedPrimary</code>
+        configuration item. If set true on the master, the master will continue to operate solo without outside
+        intervention when the replica fails. If false, a third-party must designate the master as primary in order
+        for it to continue solo.</p><p>The item numbers in this list apply to the numbered boxes in the figure below. This example assumes
+	   that <code class="varname">designatedPrimary</code> is true on the original master node.</p><div class="orderedlist"><ol type="1"><li><p>System operating normally</p></li><li><p>Replica suffers a failure. Master realises that the replica is no longer in contact, but as
+	      <code class="varname">designatedPrimary</code> is true, master continues processing solo and thus client
+	      connections are uninterrupted by the loss of the replica. System continues operating normally, albeit
+          with a single node.</p></li><li><p>Replica is repaired.</p></li><li><p>After catching up with missed work, replica is once again ready to take over in the event of master failure.</p></li></ol></div><div class="figure"><a name="id2498978"></a><p class="title"><b>Figure 1.4. Failure of replica and subsequent recovery sequence</b></p><div class="figure-contents"><div class="mediaobject"><img src="images/HA-2N-ReplicaFail.png" alt="Failure of replica and subsequent recovery sequence"></div></div></div><br class="figure-break"></div><div class="section" lang="en"><div class="titlepage"><div><div><h5 class="title"><a name="HATwoNodeNetworkPartition"></a>Network Partition and Recovery</h5></div></div></div><p>The figure below illustrates the sequence of events that would occur if the network between
+	  master and replica were to suffer a partition, and the nodes were out of contact with one another.</p><p>As with <a class="link" href="High-Availability.html#HATwoNodeReplicaFailure" title="Replica Failure and Recovery">Replica Failure and Recovery</a>, the
+	  behaviour is governed by the <code class="varname">designatedPrimary</code> configuration item.
+	  The master will continue solo only if <code class="varname">designatedPrimary</code> is true on the master.</p><p>The item numbers in this list apply to the numbered boxes in the figure below. This example assumes
+	   that <code class="varname">designatedPrimary</code> is true on the original master node.</p><div class="orderedlist"><ol type="1"><li><p>System operating normally</p></li><li><p>Network suffers a failure. Master realises that the replica is no longer in contact, but as
+	      <code class="varname">designatedPrimary</code> is true, master continues processing solo and thus client
+	      connections are uninterrupted by the network partition between master and replica.</p></li><li><p>Network is repaired.</p></li><li><p>After catching up with missed work, replica is once again ready to take over in the event of master failure.
+	    System operating normally again.</p></li></ol></div><div class="figure"><a name="id2499099"></a><p class="title"><b>Figure 1.5. Partition of the network separating master and replica</b></p><div class="figure-contents"><div class="mediaobject"><img src="images/HA-2N-NetworkPartition.png" alt="Network Partition and Recovery"></div></div></div><br class="figure-break"></div><div class="section" lang="en"><div class="titlepage"><div><div><h5 class="title"><a name="HATwoNodeSplitBrain"></a>Split Brain</h5></div></div></div><p>A <a class="ulink" href="http://en.wikipedia.org/wiki/Split-brain_(computing)" target="_top">split-brain</a>
+          is a situation where the two node cluster has two masters. BDB normally strives to prevent
+	  this situation arising by preventing two nodes in a cluster being master at the same time.
+	  However, if the network suffers a partition, and the third-party intervenes incorrectly
+	  and makes the replica a second master, a split-brain will be formed and both masters will
+	  proceed to perform work  <span class="bold"><strong>independently</strong></span> of one another.</p><p>There is no automatic recovery from a split-brain.</p><p>Manual intervention will be required to choose which store will be retained as master
+	  and which will be discarded.  Manual intervention will be required to identify and repeat the
+          lost business transactions.</p><p>The item numbers in this list apply to the numbered boxes in the figure below.</p><div class="orderedlist"><ol type="1"><li><p>System operating normally</p></li><li><p>Network suffers a failure. Master realises that the replica is no longer in contact, but as
+	      <code class="varname">designatedPrimary</code> is true, master continues processing solo.  Client
+	      connections are uninterrupted by the network partition.</p><p>A third-party <span class="bold"><strong>erroneously</strong></span> designates the replica as primary while the
+            original master continues running (now solo).</p></li><li><p>As the nodes cannot see one another, both behave as masters. Clients may perform work against
+	      both master nodes.</p></li></ol></div><div class="figure"><a name="id2499232"></a><p class="title"><b>Figure 1.6. Split Brain</b></p><div class="figure-contents"><div class="mediaobject"><img src="images/HA-2N-SplitBrain.png" alt="Split Brain"></div></div></div><br class="figure-break"></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HAMultiNodeCluster"></a>1.6.4. Multi Node Cluster</h3></div></div></div><p>Multi node clusters, that is clusters where the number of nodes is three or more, are not yet
+         ready for use.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HAConfiguration"></a>1.6.5. Configuring a Virtual Host to be a node</h3></div></div></div><p>To configure a virtualhost as a cluster node, configure the virtualhost.xml in the following manner:</p><p>
+      </p><pre class="programlisting">
+&lt;virtualhost&gt;
+  &lt;name&gt;myvhost&lt;/name&gt;
+  &lt;myvhost&gt;
+    &lt;store&gt;
+      &lt;class&gt;org.apache.qpid.server.store.berkeleydb.BDBHAMessageStore&lt;/class&gt;
+      &lt;environment-path&gt;${work}/bdbhastore&lt;/environment-path&gt;
+      &lt;highAvailability&gt;
+        &lt;groupName&gt;myclustername&lt;/groupName&gt;
+        &lt;nodeName&gt;mynode1&lt;/nodeName&gt;
+        &lt;nodeHostPort&gt;node1host:port&lt;/nodeHostPort&gt;
+        &lt;helperHostPort&gt;node1host:port&lt;/helperHostPort&gt;
+        &lt;durability&gt;NO_SYNC\,NO_SYNC\,SIMPLE_MAJORITY&lt;/durability&gt;
+        &lt;coalescingSync&gt;true|false&lt;/coalescingSync&gt;
+        &lt;designatedPrimary&gt;true|false&lt;/designatedPrimary&gt;
+      &lt;/highAvailability&gt;
+    &lt;/store&gt;
+    ...
+ &lt;/myvhost&gt;
+&lt;/virtualhost&gt;</pre><p>
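+      </p><p>As an illustrative sketch only, the <code class="varname">highAvailability</code> elements of the two nodes of a cluster might differ as shown below.
+      The host names <code class="varname">node1host</code> and <code class="varname">node2host</code>, the port <code class="varname">5001</code>, and the node and
+      group names are assumptions made for this example, not values required by the broker. The individual elements are explained in the remainder of this section.</p><pre class="programlisting">
+&lt;!-- Node 1, normally the master (hypothetical host name and port) --&gt;
+&lt;highAvailability&gt;
+  &lt;groupName&gt;myclustername&lt;/groupName&gt;
+  &lt;nodeName&gt;mynode1&lt;/nodeName&gt;
+  &lt;nodeHostPort&gt;node1host:5001&lt;/nodeHostPort&gt;
+  &lt;!-- first node: the helper address is its own nodeHostPort --&gt;
+  &lt;helperHostPort&gt;node1host:5001&lt;/helperHostPort&gt;
+  &lt;durability&gt;NO_SYNC\,NO_SYNC\,SIMPLE_MAJORITY&lt;/durability&gt;
+  &lt;coalescingSync&gt;true&lt;/coalescingSync&gt;
+  &lt;designatedPrimary&gt;true&lt;/designatedPrimary&gt;
+&lt;/highAvailability&gt;
+
+&lt;!-- Node 2, normally the replica (hypothetical host name and port) --&gt;
+&lt;highAvailability&gt;
+  &lt;groupName&gt;myclustername&lt;/groupName&gt;
+  &lt;nodeName&gt;mynode2&lt;/nodeName&gt;
+  &lt;nodeHostPort&gt;node2host:5001&lt;/nodeHostPort&gt;
+  &lt;!-- second node: the helper address is the nodeHostPort of the first node --&gt;
+  &lt;helperHostPort&gt;node1host:5001&lt;/helperHostPort&gt;
+  &lt;durability&gt;NO_SYNC\,NO_SYNC\,SIMPLE_MAJORITY&lt;/durability&gt;
+  &lt;coalescingSync&gt;true&lt;/coalescingSync&gt;
+  &lt;designatedPrimary&gt;false&lt;/designatedPrimary&gt;
+&lt;/highAvailability&gt;</pre><p>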
+    </p><p>The <code class="varname">groupName</code> is the logical name of the cluster.  All nodes within the
+      cluster must use the same <code class="varname">groupName</code> in order to be considered part of the cluster.</p><p>The <code class="varname">nodeName</code> is the logical name of the node.  All nodes within the cluster must have a
+      unique name.  It is recommended that the node name should be chosen from a different nomenclature from that of
+      the servers on which they are hosted, in case the need arises to move a node to a new server in the future.</p><p>The <code class="varname">nodeHostPort</code> is the hostname and port number used by this node to communicate with the
+      other nodes in the cluster. For the hostname, an IP address, hostname or fully qualified hostname may be used.
+      For the port number, any free port can be used.  It is important that this address is stable over time, as BDB
+      records and uses this address internally.</p><p>The <code class="varname">helperHostPort</code> is the hostname and port number that new nodes use to discover other
+      nodes within the cluster when they are newly introduced to the cluster.  When configuring the first node, set the
+      <code class="varname">helperHostPort</code> to its own <code class="varname">nodeHostPort</code>.  For the second and subsequent nodes,
+      set their <code class="varname">helperHostPort</code> to that of the first node.</p><p><code class="varname">durability</code> controls the <a class="link" href="High-Availability.html#HADurabilityGuarantee" title="1.6.6. Durability Guarantees">durability</a>
+      guarantees made by the cluster. It is important that all nodes use the same value for this property. The default value is
+      NO_SYNC\,NO_SYNC\,SIMPLE_MAJORITY. Owing to the internal use of Apache Commons Config, it is currently necessary
+      to escape the commas within the durability string.</p><p><code class="varname">coalescingSync</code> controls the <a class="link" href="High-Availability.html#HADurabilityGuarantee_CoalescingSync" title="1.6.6.2. Coalescing-sync">coalescing-sync</a>
+      mode of Qpid. It is important that all nodes use the same value. If omitted, it defaults to true.</p><p>The <code class="varname">designatedPrimary</code> is applicable only to the <a class="link" href="High-Availability.html#HATwoNodeCluster" title="1.6.3. Two Node Cluster">two-node
+     case.</a>  It governs the behaviour of a node when the other node fails or becomes uncontactable.  If true,
+     the node will be designated as primary at startup and will be able to continue operating as a single node master.
+     If false, the node will transition to an unavailable state until a third-party manually designates the node as
+     primary or the other node is restored. It is suggested that the node that normally fulfils the role of master is
+     set true in config file and the node that is normally replica is set false.  Be aware that setting both nodes to
+     true will lead to a failure to start up, as both cannot be designated at the point of contact. Designating both
+     nodes as primary at runtime (using the JMX interface) will lead to a <a class="link" href="High-Availability.html#HATwoNodeSplitBrain" title="Split Brain">split-brain</a>
+     in the case of network partition and must be avoided.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>Domain names are preferable to IP addresses in <code class="varname">helperHostPort</code> and <code class="varname">nodeHostPort</code>,
+     because IP addresses tend to change more often than domain names.
+     If a server's IP address changes but its domain name remains the same, a cluster configured with domain names
+     can continue working as normal. If IP addresses are used and they change over time,
+     then the Qpid <a class="link" href="High-Availability.html#HAJMXAPI" title="1.6.8. Qpid JMX API for HA">JMX API for HA</a> can be used to change the addresses or remove the nodes from the cluster.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="HAConfiguration_BDBEnvVars"></a>1.6.5.1. Passing BDB environment and replication configuration options</h4></div></div></div><p>It is possible to pass BDB <a class="ulink" href="http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/EnvironmentConfig.html" target="_top">
+         environment</a> and <a class="ulink" href="http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/rep/ReplicationConfig.html" target="_top">
+         replication</a> configuration options from the virtualhost.xml. Environment configuration options are passed using
+         the <code class="varname">envConfig</code> element, and replication config using <code class="varname">repConfig</code>.</p><p>For example, to override the BDB environment configuration options <code class="varname">je.cleaner.threads</code> and
+        <code class="varname">je.txn.timeout</code></p><pre class="programlisting">
+         ...
+      &lt;/highAvailability&gt;
+      &lt;envConfig&gt;
+        &lt;name&gt;je.cleaner.threads&lt;/name&gt;
+        &lt;value&gt;2&lt;/value&gt;
+      &lt;/envConfig&gt;
+      &lt;envConfig&gt;
+        &lt;name&gt;je.txn.timeout&lt;/name&gt;
+        &lt;value&gt;15 min&lt;/value&gt;
+      &lt;/envConfig&gt;
+      ...
+    &lt;/store&gt;</pre><p>And to override the BDB replication configuration option <code class="varname">je.rep.insufficientReplicasTimeout</code>:</p><pre class="programlisting">
+         ...
+      &lt;/highAvailability&gt;
+      ...
+      &lt;repConfig&gt;
+        &lt;name&gt;je.rep.insufficientReplicasTimeout&lt;/name&gt;
+        &lt;value&gt;2&lt;/value&gt;
+      &lt;/repConfig&gt;
+      &lt;envConfig&gt;
+        &lt;name&gt;je.txn.timeout&lt;/name&gt;
+        &lt;value&gt;10 s&lt;/value&gt;
+      &lt;/envConfig&gt;
+      ...
+    &lt;/store&gt;</pre></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HADurabilityGuarantee"></a>1.6.6. Durability Guarantees</h3></div></div></div><p>The term <a class="ulink" href="http://en.wikipedia.org/wiki/ACID#Durability" target="_top">durability</a> is used to mean that once a
+      transaction is committed, it remains committed regardless of subsequent failures. A highly durable system is one where
+      loss of a committed transaction is extremely unlikely, whereas with a less durable system loss of a transaction is likely
+      in a greater number of scenarios.  Typically, the more highly durable a system the slower and more costly it will be.</p><p>Qpid exposes all the
+      <a class="ulink" href="http://oracle.com/cd/E17277_02/html/ReplicationGuide/txn-management.html#durabilitycontrols" target="_top">durability controls</a>
+      offered by BDB JE HA and a Qpid specific optimisation called <span class="bold"><strong>coalescing-sync</strong></span> which defaults
+      to enabled.</p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="HADurabilityGuarantee_BDBControls"></a>1.6.6.1. BDB Durability Controls</h4></div></div></div><p>BDB expresses durability as a triplet with the following form:</p><pre class="programlisting">&lt;master sync policy&gt;,&lt;replica sync policy&gt;,&lt;replica acknowledgement policy&gt;</pre><p>The sync policies control whether the committing thread waits for the successful completion of the
+        write, or the write and sync before continuing. The master sync policy and replica sync policy need not be the same.</p><p>For master and replica sync policies, the available values are:
+        <a class="ulink" href="http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/Durability.SyncPolicy.html#SYNC" target="_top">SYNC</a>,
+        <a class="ulink" href="http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/Durability.SyncPolicy.html#WRITE_NO_SYNC" target="_top">WRITE_NO_SYNC</a>,
+        <a class="ulink" href="http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/Durability.SyncPolicy.html#NO_SYNC" target="_top">NO_SYNC</a>. SYNC
+        offers the highest durability whereas NO_SYNC offers the lowest.</p><p>Note: the combination of a master sync policy of SYNC and <a class="link" href="High-Availability.html#HADurabilityGuarantee_CoalescingSync" title="1.6.6.2. Coalescing-sync">coalescing-sync</a>
+        true would result in poor performance with no corresponding increase in durability guarantee.  It should not be used.</p><p>The acknowledgement policy defines whether, when a master commits a transaction, it also waits for the replica(s) to
+         commit the same transaction before continuing.  For the two-node case, ALL and SIMPLE_MAJORITY are equivalent.</p><p>For acknowledgement policy, the available values are:
+         <a class="ulink" href="http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/Durability.ReplicaAckPolicy.html#ALL" target="_top">ALL</a>,
+         <a class="ulink" href="http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/Durability.ReplicaAckPolicy.html#SIMPLE_MAJORITY" target="_top">SIMPLE_MAJORITY</a>,
+         <a class="ulink" href="http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/Durability.ReplicaAckPolicy.html#NONE" target="_top">NONE</a>.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="HADurabilityGuarantee_CoalescingSync"></a>1.6.6.2. Coalescing-sync</h4></div></div></div><p>If enabled (the default) Qpid works to reduce the number of separate
+        <a class="ulink" href="http://oracle.com/javase/6/docs/api/java/io/FileDescriptor.html#sync()" target="_top">file-system sync</a> operations
+        performed by the <span class="bold"><strong>master</strong></span> on the underlying storage device thus improving performance.  It does
+        this by coalescing separate sync operations arising from different client commit operations occurring at approximately the same time.
+        It does this in a manner that does not reduce the ACID guarantees of the system.</p><p>Coalescing-sync has no effect on the behaviour of the replicas.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="HADurabilityGuarantee_Default"></a>1.6.6.3. Default</h4></div></div></div><p>The default durability guarantee is <code class="constant">NO_SYNC, NO_SYNC, SIMPLE_MAJORITY</code> with coalescing-sync enabled. The effect
+         of this combination is described in the table below. It offers a good compromise between durability guarantee and performance
+         with writes being guaranteed on the master and the additional guarantee that a majority of replicas have received the
+         transaction.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="HADurabilityGuarantee_Examples"></a>1.6.6.4. Examples</h4></div></div></div><p>Here are some examples illustrating the effects of the durability and coalescing-sync settings.</p><p>
+        </p><div class="table"><a name="id2499802"></a><p class="title"><b>Table 1.2. Effect of different durability guarantees</b></p><div class="table-contents"><table summary="Effect of different durability guarantees" border="1"><colgroup><col><col><col><col></colgroup><thead><tr><th> </th><th>Durability</th><th>Coalescing-sync</th><th>Description</th></tr></thead><tbody><tr><td>1</td><td>NO_SYNC, NO_SYNC, SIMPLE_MAJORITY</td><td>true</td><td>Before the commit returns to the client, the transaction will be written/sync'd to the Master's disk (effect of
+                   coalescing-sync) and a majority of the replica(s) will have acknowledged the <span class="bold"><strong>receipt</strong></span>
+                   of the transaction.  The replicas will write and sync the transaction to their disk at a point in the future governed by
+                   <a class="ulink" href="http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/rep/ReplicationMutableConfig.html#LOG_FLUSH_TASK_INTERVAL" target="_top">ReplicationMutableConfig#LOG_FLUSH_TASK_INTERVAL</a>.
+                </td></tr><tr><td>2</td><td>NO_SYNC, WRITE_NO_SYNC, SIMPLE_MAJORITY</td><td>true</td><td>Before the commit returns to the client, the transaction will be written/sync'd to the Master's disk (effect of
+                  coalescing-sync) and a majority of the replica(s) will have acknowledged the <span class="bold"><strong>write</strong></span> of
+                  the transaction to their disk.  The replicas will sync the transaction to disk at a point in the future with an upper bound governed by
+                  ReplicationMutableConfig#LOG_FLUSH_TASK_INTERVAL.</td></tr><tr><td>3</td><td>NO_SYNC, NO_SYNC, NONE</td><td>false</td><td>After the commit returns to the client, the transaction is guaranteed neither to have been written to the disk of the master
+                   nor to have been received by any of the replicas. The master and replicas will write and sync the transaction to their disk at a point
+                   in the future with an upper bound governed by ReplicationMutableConfig#LOG_FLUSH_TASK_INTERVAL. This offers the weakest durability guarantee.</td></tr></tbody></table></div></div><p><br class="table-break">
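+      </p><p>The acknowledgement policies used in the table above can also be read as simple arithmetic. The sketch below is illustrative only: <code class="classname">ReplicaAckSketch</code> and its <code class="classname">AckPolicy</code> enum are hypothetical names, not part of Qpid or BDB JE, and the sketch assumes that a simple majority is counted over the whole group with the master included.</p>

```java
/** Sketch only: relates the acknowledgement policy names to a replica count. */
final class ReplicaAckSketch {

    /** Hypothetical enum mirroring the three policy names used in this section. */
    enum AckPolicy { ALL, SIMPLE_MAJORITY, NONE }

    /**
     * Number of replicas that must acknowledge before the master's commit returns.
     * groupSize counts every node in the group, including the master.
     * Assumption: a simple majority is (groupSize / 2 + 1) nodes, one of which is the master.
     */
    static int requiredReplicaAcks(AckPolicy policy, int groupSize) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be at least 1");
        }
        switch (policy) {
            case ALL:
                return groupSize - 1;   // every node except the master
            case SIMPLE_MAJORITY:
                return groupSize / 2;   // majority of the group, minus the master itself
            default:
                return 0;               // NONE: the master does not wait for any replica
        }
    }

    public static void main(String[] args) {
        for (int groupSize = 2; groupSize <= 5; groupSize++) {
            for (AckPolicy policy : AckPolicy.values()) {
                System.out.println(groupSize + " nodes, " + policy + ": "
                        + requiredReplicaAcks(policy, groupSize) + " replica ack(s)");
            }
        }
    }
}
```

+      <p>Under these assumptions a two-node group needs one replica acknowledgement for both ALL and SIMPLE_MAJORITY, which matches the statement that the two policies behave the same in the two-node case.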
+      </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HAClientFailover"></a>1.6.7. Client failover configuration</h3></div></div></div><p>Details of the Qpid connection URL format can be found in the section
+        <a class="ulink" href="../../Programming-In-Apache-Qpid/html/QpidJNDI.html" target="_top">Connection URLs</a>
+        of the book <a class="ulink" href="../../Programming-In-Apache-Qpid/html/" target="_top">Programming In Apache Qpid</a>.</p><p>The failover policy option in the connection URL for the HA Cluster should be set to <span class="emphasis"><em>roundrobin</em></span>.
+      The Master broker should be listed first in the <span class="emphasis"><em>brokerlist</em></span> URL option.
+      The <span class="emphasis"><em>connectdelay</em></span> option in each broker URL should be set to
+      a value greater than 1000 milliseconds. If clients should reconnect automatically after a
+      master-to-replica failover, <code class="varname">cyclecount</code> should be tuned so that the retry period is longer than
+      the expected length of time to perform the failover.</p><div class="example"><a name="id2500000"></a><p class="title"><b>Example 1.1. Example of connection URL for the HA Cluster</b></p><div class="example-contents">
+amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672?connectdelay='2000'&amp;retries='3';tcp://localhost:5671?connectdelay='2000'&amp;retries='3';tcp://localhost:5673?connectdelay='2000'&amp;retries='3''&amp;failover='roundrobin?cyclecount='30''
+        </div></div><br class="example-break"></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HAJMXAPI"></a>1.6.8. Qpid JMX API for HA</h3></div></div></div><p>Qpid exposes the BDB HA store information via its JMX interface and provides APIs to remove a Node from
+     the group, update a Node IP address, and assign a Node as the designated primary.</p><p>An instance of the <code class="classname">BDBHAMessageStore</code> MBean is instantiated by the broker for each virtualhost using the HA store.</p><p>A reference to this MBean can be obtained via the JMX API using an ObjectName like <span class="emphasis"><em>org.apache.qpid:type=BDBHAMessageStore,name=&lt;virtualhost name&gt;</em></span>
+                 where &lt;virtualhost name&gt; is the name of a specific virtualhost on the broker.</p><table border="1" id="id2500050">Mbean BDBHAMessageStore attributes<thead><tr>
+          <td>Name</td>
+          <td>Type</td>
+          <td>Accessibility</td>
+          <td>Description</td>
+        </tr></thead><tbody><tr>
+          <td>GroupName</td>
+          <td>String</td>
+          <td>Read only</td>
+          <td>Name identifying the group</td>
+        </tr><tr>
+          <td>NodeName</td>
+          <td>String</td>
+          <td>Read only</td>
+          <td>Unique name identifying the node within the group</td>
+        </tr><tr>
+          <td>NodeHostPort</td>
+          <td>String</td>
+          <td>Read only</td>
+          <td>Host/port used to replicate data between this node and others in the group</td>
+        </tr><tr>
+          <td>HelperHostPort</td>
+          <td>String</td>
+          <td>Read only</td>
+          <td>Host/port used to allow a new node to discover other group members</td>
+        </tr><tr>
+          <td>NodeState</td>
+          <td>String</td>
+          <td>Read only</td>
+          <td>Current state of the node</td>
+        </tr><tr>
+          <td>ReplicationPolicy</td>
+          <td>String</td>
+          <td>Read only</td>
+          <td>Node replication durability</td>
+        </tr><tr id="JMXDesignatedPrimary">
+          <td>DesignatedPrimary</td>
+          <td>boolean</td>
+          <td>Read/Write</td>
+          <td>Designated primary flag. Applicable to the two node case.</td>
+        </tr><tr>
+          <td>CoalescingSync</td>
+          <td>boolean</td>
+          <td>Read only</td>
+          <td>Coalescing sync flag. Applicable to the master sync policies NO_SYNC and WRITE_NO_SYNC only.</td>
+        </tr><tr>
+          <td>AllNodesInGroup</td>
+          <td>TabularData</td>
+          <td>Read only</td>
+          <td>Get all nodes within the group, regardless of whether currently attached or not</td>
+        </tr></tbody></table><table border="1" id="id2500268">Mbean BDBHAMessageStore operations<thead><tr>
+          <td>Operation</td>
+          <td>Parameters</td>
+          <td>Returns</td>
+          <td>Description</td>
+        </tr></thead><tbody><tr>
+          <td>removeNodeFromGroup</td>
+          <td>
+            <p><span class="emphasis"><em>nodeName</em></span>, name of node, string</p>
+          </td>
+          <td>void</td>
+          <td>Remove an existing node from the group</td>
+        </tr><tr>
+          <td>updateAddress</td>
+          <td>
+            <div class="itemizedlist"><ul><li><p><span class="emphasis"><em>nodeName</em></span>, name of node, string</p></li><li><p><span class="emphasis"><em>newHostName</em></span>, new host name, string</p></li><li><p><span class="emphasis"><em>newPort</em></span>, new port number, int</p></li></ul></div>
+          </td>
+          <td>void</td>
+          <td>Update the address of another node. The node must be in a STOPPED state.</td>
+        </tr></tbody></table><div class="figure"><a name="id2500389"></a><p class="title"><b>Figure 1.7. BDBHAMessageStore view from jconsole.</b></p><div class="figure-contents"><div><img src="images/HA-BDBHAMessageStore-MBean-jconsole.png" alt="BDBHAMessageStore view from jconsole."></div></div></div><br class="figure-break"><div class="example"><a name="id2500403"></a><p class="title"><b>Example 1.2. Example of java code to get the node state value</b></p><div class="example-contents"><pre class="programlisting">
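+import java.util.HashMap;
+import java.util.Map;
+import javax.management.MBeanServerConnection;
+import javax.management.ObjectName;
+import javax.management.remote.JMXConnector;
+import javax.management.remote.JMXConnectorFactory;
+import javax.management.remote.JMXServiceURL;
+
+Map&lt;String, Object&gt; environment = new HashMap&lt;String, Object&gt;();
+
+// credentials: user name and password
+environment.put(JMXConnector.CREDENTIALS, new String[] {"admin","admin"});
+JMXServiceURL url =  new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9001/jmxrmi");
+JMXConnector jmxConnector = JMXConnectorFactory.connect(url, environment);
+MBeanServerConnection mbsc =  jmxConnector.getMBeanServerConnection();
+
+// the name part of the ObjectName is the virtualhost name ("test" here)
+ObjectName storeObjectName = new ObjectName("org.apache.qpid:type=BDBHAMessageStore,name=test");
+String state = (String)mbsc.getAttribute(storeObjectName, "NodeState");
+
+System.out.println("Node state:" + state);
+jmxConnector.close();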
+        </pre><p>Example system output:</p><pre class="screen">Node state:MASTER</pre></div></div><br class="example-break"></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="BDB-HA-Monitoring-cluster"></a>1.6.9. Monitoring cluster</h3></div></div></div><p>In order to discover potential issues with the HA Cluster early, all nodes in the Cluster should be monitored on a regular basis
+    using the following techniques:</p><div class="itemizedlist"><ul><li><p>Scraping the broker log files for WARN or ERROR entries and for operational log entries such as:</p><div class="itemizedlist"><ul><li><p><span class="emphasis"><em>MST-1007 :</em></span> Store Passivated. It can indicate that the Master virtual host has gone down.</p></li><li><p><span class="emphasis"><em>MST-1006 :</em></span> Recovery Complete. It can indicate that a former Replica virtual host is up and has become the Master.</p></li></ul></div></li><li><p>Checking disk space usage and system load using system tools.</p></li><li><p>Checking the Berkeley HA node status using the <a class="ulink" href="http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/rep/util/DbPing.html" target="_top"><code class="classname">DbPing</code></a> utility.</p><div class="example"><a name="id2500512"></a><p class="title"><b>Example 1.3. Using <code class="classname">DbPing</code> utility for monitoring HA nodes.</b></p><div class="example-contents"><span class="command"><strong>
+java -jar je-5.0.48.jar DbPing -groupName TestClusterGroup -nodeName Node-5001 -nodeHost localhost:5001 -socketTimeout 10000
+</strong></span><pre class="screen">
+Current state of node: Node-5001 from group: TestClusterGroup
+  Current state: MASTER
+  Current master: Node-5001
+  Current JE version: 5.0.48
+  Current log version: 8
+  Current transaction end (abort or commit) VLSN: 165
+  Current master transaction end (abort or commit) VLSN: 0
+  Current active feeders on node: 0
+  Current system load average: 0.35
+</pre></div></div><br class="example-break"><p>In the example above, the <code class="classname">DbPing</code> utility requested the status of the Cluster node named
+            <span class="emphasis"><em>Node-5001</em></span> in replication group <span class="emphasis"><em>TestClusterGroup</em></span>, running on <span class="emphasis"><em>localhost:5001</em></span>.
+            The state of the node was written to standard output.
+            </p></li><li><p>Using Qpid broker JMX interfaces.</p><p>The <code class="classname">BDBHAMessageStore</code> MBean can be used to request the following node information:</p><div class="itemizedlist"><ul><li><p><span class="emphasis"><em>NodeState</em></span> indicates whether the node is a Master or Replica.</p></li><li><p><span class="emphasis"><em>Durability</em></span> replication durability.</p></li><li><p><span class="emphasis"><em>DesignatedPrimary</em></span> indicates whether the Master node is the designated primary.</p></li><li><p><span class="emphasis"><em>GroupName</em></span> replication group name.</p></li><li><p><span class="emphasis"><em>NodeName</em></span> node name.</p></li><li><p><span class="emphasis"><em>NodeHostPort</em></span> node host and port.</p></li><li><p><span class="emphasis"><em>HelperHostPort</em></span> helper host and port.</p></li><li><p><span class="emphasis"><em>AllNodesInGroup</em></span> list of all nodes in the replication group including their names, hosts and ports.</p></li></ul></div><p>For more details about the <code class="classname">BDBHAMessageStore</code> MBean please refer to section <a class="link" href="High-Availability.html#HAJMXAPI" title="1.6.8. Qpid JMX API for HA">Qpid JMX API for HA</a>.</p></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HADiskSpace"></a>1.6.10. Disk space requirements</h3></div></div></div><p>Disk space is a critical resource for the HA Qpid broker.</p><p>When a Replica goes down (or falls behind the Master in a two-node cluster where the Master is designated primary)
+    and the Master continues running, the non-replicated store files are kept on the Master's disk for the period of time
+    specified by the <span class="emphasis"><em>je.rep.repStreamTimeout</em></span> JE setting, so that this data can be replicated later
+    when the Replica is back. The broker sets this to 1 hour by default. The setting can be overridden as described in
+    <a class="xref" href="High-Availability.html#HAConfiguration_BDBEnvVars" title="1.6.5.1. Passing BDB environment and replication configuration options">Section 1.6.5.1, “Passing BDB environment and replication configuration options”</a>.</p><p>Depending on the application's publishing/consuming rates and message sizes,
+    the disk might fill up during this period because of the preserved logs.
+    Make sure to allocate enough disk space to prevent this from happening.
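+    </p><p>As a rough way to size the disk, the retained data can be approximated as publish rate multiplied by message size multiplied by the retention period. The sketch below is only an illustration: the class name <code class="classname">RetainedLogEstimate</code> and the sample rate and message size are hypothetical, and it counts message payload only, so the real figure is likely to be higher once JE log record overhead is included.</p>

```java
/** Sketch only: back-of-envelope estimate of data retained on the Master for an absent Replica. */
final class RetainedLogEstimate {

    /**
     * Payload bytes published during the retention period.
     * Assumption: a constant publish rate and a fixed message size; log overhead is ignored.
     */
    static long retainedBytes(long messagesPerSecond, long bytesPerMessage, long retentionSeconds) {
        return Math.multiplyExact(Math.multiplyExact(messagesPerSecond, bytesPerMessage), retentionSeconds);
    }

    public static void main(String[] args) {
        long retentionSeconds = 3600;   // the 1 hour default mentioned above
        long messagesPerSecond = 1000;  // hypothetical publish rate
        long bytesPerMessage = 1024;    // hypothetical message size

        long bytes = retainedBytes(messagesPerSecond, bytesPerMessage, retentionSeconds);
        // 1000 * 1024 * 3600 = 3,686,400,000 bytes, about 3.4 GiB
        System.out.printf("Approximate retained payload: %d bytes (%.1f GiB)%n",
                bytes, bytes / (1024.0 * 1024.0 * 1024.0));
    }
}
```

+    <p>Repeating the calculation with the peak publish rate and the configured timeout gives a starting point for the free space to reserve on each node that may become Master.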
+    </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="BDB-HA-Network-Requirements"></a>1.6.11. Network Requirements</h3></div></div></div><p>The HA Cluster performance depends on the network bandwidth, its use by existing traffic, and quality of service.</p><p>In order to achieve the best performance it is recommended to use a separate network infrastructure for the Qpid HA Nodes
+     which might include installation of dedicated network hardware on Broker hosts, assigning a higher priority to replication ports,
+     installing a cluster in a separate network not impacted by any other traffic.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="BDB-HA-Security"></a>1.6.12. Security</h3></div></div></div><p>At the moment the Berkeley replication API supports only the TCP/IP protocol for transferring replication data between the Master and Replicas.</p><p>As a result, the replicated data is unprotected and can be intercepted by anyone with access to the replication network.</p><p>Also, anyone who can access this network can introduce a new node and therefore receive a copy of the data.</p><p>In order to reduce the security risks, it is recommended to run the entire HA cluster in a separate network protected from general access.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="BDB-HA-Backup"></a>1.6.13. Backups</h3></div></div></div><p>In order to protect the entire cluster from a catastrophe that might destroy all cluster nodes,
+    backups of the Master store should be taken on a regular basis.</p><p>The Qpid Broker distribution includes the "hot" backup utility <span class="emphasis"><em>backup.sh</em></span>, which can be found in the broker bin folder.
+         This utility can perform the backup while the broker is running.</p><p>The <span class="emphasis"><em>backup.sh</em></span> script invokes <code class="classname">org.apache.qpid.server.store.berkeleydb.BDBBackup</code> to do the job.</p><p>You can also run this class from the command line, as in the example below:</p><div class="example"><a name="id2500817"></a><p class="title"><b>Example 1.4. Performing store backup by using <code class="classname">BDBBackup</code> class directly</b></p><div class="example-contents"><span class="command"><strong>
+        java -cp qpid-bdbstore-0.18.jar org.apache.qpid.server.store.berkeleydb.BDBBackup -fromdir path/to/store/folder -todir path/to/backup/folder</strong></span></div></div><br class="example-break"><p>In the example above, the BDBBackup utility is called from qpid-bdbstore-0.18.jar to back up the store at <span class="emphasis"><em>path/to/store/folder</em></span> and copy the store logs into <span class="emphasis"><em>path/to/backup/folder</em></span>.</p><p>Linux and Unix users can take advantage of the <span class="emphasis"><em>backup.sh</em></span> bash script by running it in a similar way.</p><div class="example"><a name="id2500855"></a><p class="title"><b>Example 1.5. Performing store backup by using <code class="classname">backup.sh</code> bash script</b></p><div class="example-contents"><span class="command"><strong>backup.sh -fromdir path/to/store/folder -todir path/to/backup/folder</strong></span></div></div><br class="example-break"><div class="note" style
 ="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>Do not forget to ensure that the Master store is being backed up, in the event the Node elected Master changes during
+      the lifecycle of the cluster.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HAMigrationFromNonHA"></a>1.6.14. Migration of a non-HA store to HA</h3></div></div></div><p>Non HA stores starting from schema version 4 (0.14 Qpid release) can be automatically converted into HA store on broker startup if replication is first enabled with the <a class="ulink" href="http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/rep/util/DbEnableReplication.html" target="_top"><code class="classname">DbEnableReplication</code></a> utility from the BDB JE jar.</p><p>DbEnableReplication converts a non HA store into an HA store and can be used as follows:</p><div class="example"><a name="id2500911"></a><p class="title"><b>Example 1.6. Enabling replication</b></p><div class="example-contents"><span class="command"><strong>
+java -jar je-5.0.48.jar DbEnableReplication -h /path/to/store -groupName MyReplicationGroup -nodeName MyNode1 -nodeHostPort  localhost:5001
+        </strong></span></div></div><br class="example-break"><p>In the example above, the JE jar of version 5.0.48 is used to convert the store at <span class="emphasis"><em>/path/to/store</em></span> into an HA store with replication group name <span class="emphasis"><em>MyReplicationGroup</em></span> and node name <span class="emphasis"><em>MyNode1</em></span>, running on host <span class="emphasis"><em>localhost</em></span> and port <span class="emphasis"><em>5001</em></span>.</p><p>After running DbEnableReplication and updating the virtual host store configuration to use an HA message store, as in the example below,
+    the store schema will be upgraded to the most recent version on broker startup and the broker can be used as normal.</p><div class="example"><a name="id2500954"></a><p class="title"><b>Example 1.7. Example of XML configuration for HA message store</b></p><div class="example-contents"><pre class="programlisting">
+&lt;store&gt;
+    &lt;class&gt;org.apache.qpid.server.store.berkeleydb.BDBHAMessageStore&lt;/class&gt;
+    &lt;environment-path&gt;/path/to/store&lt;/environment-path&gt;
+    &lt;highAvailability&gt;
+        &lt;groupName&gt;MyReplicationGroup&lt;/groupName&gt;
+        &lt;nodeName&gt;MyNode1&lt;/nodeName&gt;
+        &lt;nodeHostPort&gt;localhost:5001&lt;/nodeHostPort&gt;
+        &lt;helperHostPort&gt;localhost:5001&lt;/helperHostPort&gt;
+    &lt;/highAvailability&gt;
+&lt;/store&gt;</pre></div></div><br class="example-break"><p>The Replica nodes can be started with empty stores. The data will be automatically copied from Master to Replica on Replica start-up.
+      This will take a period of time determined by the size of the Master's store and the network bandwidth between the nodes.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>Due to existing caveats in Berkeley JE with copying of data from the Master to a Replica, it is recommended to restart the Master node after the store schema upgrade has finished and before starting the Replica nodes.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HADisasterRecovery"></a>1.6.15. Disaster Recovery</h3></div></div></div><p>This section describes the steps required to restore an HA broker cluster from backup.</p><p>Detailed instructions on how to perform a backup of a replicated environment can be found <a class="link" href="High-Availability.html#BDB-HA-Backup" title="1.6.13. Backups">here</a>.</p><p>At this point we assume that backups are collected on a regular basis from the Master node.</p><p>Replication configuration of a cluster is stored internally in the HA message store.
+    This information includes the IP addresses of the nodes.
+    When the HA message store needs to be restored on a different host with a different IP address,
+    the cluster replication configuration should be reset.</p><p>Oracle provides a command line utility <a class="ulink" href="http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/rep/util/DbResetRepGroup.html" target="_top"><code class="classname">DbResetRepGroup</code></a>
+    to reset the members of a replication group and replace the group with a new group consisting of a single new member
+    as described by the arguments supplied to the utility.</p><p>The cluster can be restored with the following steps:</p><div class="itemizedlist"><ul><li><p>Copy the log files from the backup into the store folder.</p></li><li><p>Use <code class="classname">DbResetRepGroup</code> to reset the existing environment, as in the example below.</p><div class="example"><a name="id2501072"></a><p class="title"><b>Example 1.8. Resetting of replication group with <code class="classname">DbResetRepGroup</code></b></p><div class="example-contents"><span class="command"><strong>
+java -cp je-5.0.48.jar com.sleepycat.je.rep.util.DbResetRepGroup -h ha-work/Node-5001/bdbstore -groupName TestClusterGroup -nodeName Node-5001 -nodeHostPort localhost:5001</strong></span></div></div><br class="example-break"><p>In the example above, the <code class="classname">DbResetRepGroup</code> utility from Berkeley JE version 5.0.48 is used to reset the store
+            at location <span class="emphasis"><em>ha-work/Node-5001/bdbstore</em></span> and set the replication group to <span class="emphasis"><em>TestClusterGroup</em></span>
+            with a single node <span class="emphasis"><em>Node-5001</em></span> running at <span class="emphasis"><em>localhost:5001</em></span>.</p></li><li><p>Start a broker with the HA store configured with the same values that were passed to the <code class="classname">DbResetRepGroup</code> utility.</p></li><li><p>Start the replica nodes with the same replication group and a helper host port pointing to the new master. The store content will be copied from the Master to the Replicas when they start up.</p></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HAPerformance"></a>1.6.16. Performance</h3></div></div></div><p>The aim of this section is not to provide exact performance metrics relating to HA, as this depends heavily on the test
+    environment, but rather to show the impact of HA on Qpid broker performance in comparison with the non-HA case.</p><p>To test the impact of HA on broker performance, a special test script was written using the Qpid performance test framework.
+    The script opened a number of connections to the Qpid broker, created producers and consumers on separate connections,
+    and published test messages with concurrent producers into a test queue and consumed them with concurrent consumers.
+    The table below shows the number of producers/consumers used in the tests.
+    The overall throughput was collected for each configuration.
+    </p><table border="1" id="id2501163">Number of producers/consumers in performance tests<thead><tr>
+          <th>Test</th>
+          <th>Number of producers</th>
+          <th>Number of consumers</th>
+        </tr></thead><tbody><tr>
+          <td>1</td>
+          <td>1</td>
+          <td>1</td>
+        </tr><tr>
+          <td>2</td>
+          <td>2</td>
+          <td>2</td>
+        </tr><tr>
+          <td>3</td>
+          <td>4</td>
+          <td>4</td>
+        </tr><tr>
+          <td>4</td>
+          <td>8</td>
+          <td>8</td>
+        </tr><tr>
+          <td>5</td>
+          <td>16</td>
+          <td>16</td>
+        </tr><tr>
+          <td>6</td>
+          <td>32</td>
+          <td>32</td>
+        </tr><tr>
+          <td>7</td>
+          <td>64</td>
+          <td>64</td>
+        </tr></tbody></table><p>The test was run against the following Qpid Broker configurations:</p><div class="itemizedlist"><ul><li><p>Non HA Broker</p></li><li><p>HA 2 Nodes Cluster with durability <span class="emphasis"><em>SYNC,SYNC,ALL</em></span></p></li><li><p>HA 2 Nodes Cluster with durability <span class="emphasis"><em>WRITE_NO_SYNC,WRITE_NO_SYNC,ALL</em></span></p></li><li><p>HA 2 Nodes Cluster with durability <span class="emphasis"><em>WRITE_NO_SYNC,WRITE_NO_SYNC,ALL</em></span> and <span class="emphasis"><em>coalescing-sync</em></span> Qpid mode</p></li><li><p>HA 2 Nodes Cluster with durability <span class="emphasis"><em>WRITE_NO_SYNC,NO_SYNC,ALL</em></span> and <span class="emphasis"><em>coalescing-sync</em></span> Qpid mode</p></li><li><p>HA 2 Nodes Cluster with durability <span class="emphasis"><em>NO_SYNC,NO_SYNC,ALL</em></span> and <span class="emphasis"><em>coalescing-sync</em></span> Qpid option</p></li></ul></div><p>The environment used in testing consisted of 2 servers with 4 CPU cores (2x Intel(r) Xeon(R) CPU 5150@2.66GHz), 4GB of RAM
+        and running under OS Red Hat Enterprise Linux AS release 4 (Nahant Update 4). Network bandwidth was 1Gbit.
+    </p><p>We ran the Master node on the first server and the Replica and clients (both consumers and producers) on the second server.</p><p>In the non-HA case the Qpid Broker was run on the first server and the clients were run on the second server.</p><p>The table below contains the test results we measured in this environment for different Broker configurations.</p><p>Each result is represented by the throughput value in KB/second and the difference in % between the HA configuration and the non-HA case for the same number of clients.</p><table border="1" id="id2501415">Performance Comparison<thead><tr>
+          <td>Test/Broker</td>
+          <td>No HA</td>
+          <td>SYNC, SYNC, ALL</td>
+          <td>WRITE_NO_SYNC, WRITE_NO_SYNC, ALL</td>
+          <td>WRITE_NO_SYNC, WRITE_NO_SYNC, ALL - coalescing-sync</td>
+          <td>WRITE_NO_SYNC, NO_SYNC,ALL - coalescing-sync</td>
+          <td>NO_SYNC, NO_SYNC, ALL - coalescing-sync</td>
+        </tr></thead><tbody><tr>
+          <td>1 (1/1)</td>
+          <td>0.0%</td>
+          <td>-61.4%</td>
+          <td>117.0%</td>
+          <td>-16.02%</td>
+          <td>-9.58%</td>
+          <td>-25.47%</td>
+        </tr><tr>
+          <td>2 (2/2)</td>
+          <td>0.0%</td>
+          <td>-75.43%</td>
+          <td>67.87%</td>
+          <td>-66.6%</td>
+          <td>-69.02%</td>
+          <td>-30.43%</td>
+        </tr><tr>
+          <td>3 (4/4)</td>
+          <td>0.0%</td>
+          <td>-84.89%</td>
+          <td>24.19%</td>
+          <td>-71.02%</td>
+          <td>-69.37%</td>
+          <td>-43.67%</td>
+        </tr><tr>
+          <td>4 (8/8)</td>
+          <td>0.0%</td>
+          <td>-91.17%</td>
+          <td>-22.97%</td>
+          <td>-82.32%</td>
+          <td>-83.42%</td>
+          <td>-55.5%</td>
+        </tr><tr>
+          <td>5 (16/16)</td>
+          <td>0.0%</td>
+          <td>-91.16%</td>
+          <td>-21.42%</td>
+          <td>-86.6%</td>
+          <td>-86.37%</td>
+          <td>-46.99%</td>
+        </tr><tr>
+          <td>6 (32/32)</td>
+          <td>0.0%</td>
+          <td>-94.83%</td>
+          <td>-51.51%</td>
+          <td>-92.15%</td>
+          <td>-92.02%</td>
+          <td>-57.59%</td>
+        </tr><tr>
+          <td>7 (64/64)</td>
+          <td>0.0%</td>
+          <td>-94.2%</td>
+          <td>-41.84%</td>
+          <td>-89.55%</td>
+          <td>-89.55%</td>
+          <td>-50.54%</td>
+        </tr></tbody></table><p>The figure below depicts the graphs for the performance test results.</p><div class="figure"><a name="id2501679"></a><p class="title"><b>Figure 1.8. Test results</b></p><div class="figure-contents"><div><img src="images/HA-perftests-results.png" alt="Test results"></div></div></div><br class="figure-break"><p>When using durability <span class="emphasis"><em>SYNC,SYNC,ALL</em></span> (without coalescing-sync) the performance drops significantly (by 62-95%) in comparison with the non-HA broker.</p><p>When using durability <span class="emphasis"><em>WRITE_NO_SYNC,WRITE_NO_SYNC,ALL</em></span> (without coalescing-sync) the performance drops by only half, but with a loss of durability guarantee, so this is not recommended.</p><p>In order to achieve better performance with HA, the Qpid Broker provides a special mode called <a class="link" href="High-Availability.html#HADurabilityGuarantee_CoalescingSync" title="1.6.6.2. Coalescing-sync">coalescing-sync</a>.
+    With this mode enabled, the Qpid broker batches concurrent transaction commits and syncs the transaction data to the Master's disk in one go.
+    As a result, the HA performance drops by only 25-60% for durability <span class="emphasis"><em>NO_SYNC,NO_SYNC,ALL</em></span> and by 10-90% for <span class="emphasis"><em>WRITE_NO_SYNC,WRITE_NO_SYNC,ALL</em></span>.</p></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a name="ftn.id2494999" href="#id2494999" class="para">1</a>] </sup>The automatic failover feature is available only for AMQP connections from the Java client.  Management connections (JMX)
+        do not currently offer this feature.</p></div></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="Java-Broker-Configuration-Guide.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="Java-General-User-Guides.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="Qpid-Java-Broker-HowTos.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.5. Broker Configuration Guide  </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 2. How Tos</td></tr></table></div><div class="main_text_area_bottom"></div></div></div></body></html>

Added: qpid/site/docs/books/trunk_new/AMQP-Messaging-Broker-Java-Book/html/How-to-Tune-M3-Java-Broker-Performance.html
URL: http://svn.apache.org/viewvc/qpid/site/docs/books/trunk_new/AMQP-Messaging-Broker-Java-Book/html/How-to-Tune-M3-Java-Broker-Performance.html?rev=1372183&view=auto
==============================================================================
--- qpid/site/docs/books/trunk_new/AMQP-Messaging-Broker-Java-Book/html/How-to-Tune-M3-Java-Broker-Performance.html (added)
+++ qpid/site/docs/books/trunk_new/AMQP-Messaging-Broker-Java-Book/html/How-to-Tune-M3-Java-Broker-Performance.html Sun Aug 12 19:19:49 2012
@@ -0,0 +1,104 @@
+<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>2.8.  How to Tune M3 Java Broker Performance</title><link rel="stylesheet" href="css/style.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"><link rel="start" href="index.html" title="AMQP Messaging Broker (Implemented in Java)"><link rel="up" href="Qpid-Java-Broker-HowTos.html" title="Chapter 2. How Tos"><link rel="prev" href="Java-Broker-Debug-Logging.html" title="2.7.  Debug using log4j"><link rel="next" href="Qpid-Java-Build-HowTo.html" title="2.9.  Qpid Java Build How To"></head><body><div class="container" bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><DIV class="header"><DIV class="logo"><H1>Apache Qpid™</H1><H2>Open Source AMQP Messaging</H2></DIV></DIV><DIV class="menu_box"><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Apache Qpid</H3><UL><LI><A href="http://qpid.apache.org/index.
 html">Home</A></LI><LI><A href="http://qpid.apache.org/download.html">Download</A></LI><LI><A href="http://qpid.apache.org/getting_started.html">Getting Started</A></LI><LI><A href="http://www.apache.org/licenses/">License</A></LI><LI><A href="https://cwiki.apache.org/qpid/faq.html">FAQ</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Documentation</H3><UL><LI><A href="http://qpid.apache.org/documentation.html#doc-release">0.14 Release</A></LI><LI><A href="http://qpid.apache.org/documentation.html#doc-trunk">Trunk</A></LI><LI><A href="http://qpid.apache.org/documentation.html#doc-archives">Archive</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Community</H3><UL><LI><A href="http://qpid.apache.org/getting_involved.html">Getting Involved</A></LI><LI><A href="http://qpid.apache.org/source_repository.html">Source Repository</A></LI><LI><A href
 ="http://qpid.apache.org/mailing_lists.html">Mailing Lists</A></LI><LI><A href="https://cwiki.apache.org/qpid/">Wiki</A></LI><LI><A href="https://issues.apache.org/jira/browse/qpid">Issue Reporting</A></LI><LI><A href="http://qpid.apache.org/people.html">People</A></LI><LI><A href="http://qpid.apache.org/acknowledgements.html">Acknowledgements</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>Developers</H3><UL><LI><A href="https://cwiki.apache.org/qpid/building.html">Building Qpid</A></LI><LI><A href="https://cwiki.apache.org/qpid/developer-pages.html">Developer Pages</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>About AMQP</H3><UL><LI><A href="http://qpid.apache.org/amqp.html">What is AMQP?</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV><DIV class="menu_box_top"></DIV><DIV class="menu_box_body"><H3>About Apache</H3><UL><LI><A href
 ="http://www.apache.org">Home</A></LI><LI><A href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</A></LI><LI><A href="http://www.apache.org/foundation/thanks.html">Thanks</A></LI><LI><A href="http://www.apache.org/security/">Security</A></LI></UL></DIV><DIV class="menu_box_bottom"></DIV></DIV><div class="main_text_area"><div class="main_text_area_top"></div><div class="main_text_area_body"><DIV class="breadcrumbs"><span class="breadcrumb-link"><a href="index.html">AMQP Messaging Broker (Implemented in Java)</a></span> &gt; <span class="breadcrumb-link"><a href="Qpid-Java-Broker-HowTos.html">How Tos</a></span> &gt; <span class="breadcrumb-node">
+      How to Tune M3 Java Broker Performance
+    </span></DIV><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="How-to-Tune-M3-Java-Broker-Performance"></a>2.8. 
+      How to Tune M3 Java Broker Performance
+    </h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HowtoTuneM3JavaBrokerPerformance-ProblemStatement"></a>2.8.1. 
+            Problem
+            Statement
+          </h3></div></div></div><p>
+            During destructive testing of the Qpid M3 Java Broker, we tested
+            some tuning techniques and deployment changes to improve the
+            broker's capacity to maintain high levels of throughput,
+            particularly when the consumer is slower than the producer
+            (i.e. a growing backlog).
+          </p><p>
+            This page details the results of the tuning and deployment
+            changes trialled.
+          </p><p>
+            The successful tuning changes are applicable for any deployment
+            expecting to see bursts of high volume throughput (1000s of
+            persistent messages in large batches). Any user wishing to use
+            these options <span class="emphasis"><em>must test them thoroughly in their own
+            environment with representative volumes</em></span>.
+          </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HowtoTuneM3JavaBrokerPerformance-SuccessfulTuningOptions"></a>2.8.2. 
+            Successful
+            Tuning Options
+          </h3></div></div></div><p>
+            The key scenario targeted by these changes is a broker under
+            heavy load (processing a large batch of persistent messages)
+            that performs slowly when it fills up with an influx of
+            high-volume transient messages queued behind the persistent
+            backlog. However, the changes suggested are equally applicable
+            to general heavy load scenarios.
+          </p><p>
+            The easiest way to address this is to separate the streams of
+            messages, so that each stream can be processed independently
+            and a backlog does not build up behind a particular slow
+            consumer.
+          </p><p>
+            These strategies have been successfully tested to mitigate this
+            problem:
+          </p><div class="table"><a name="id2496809"></a><p class="title"><b>Table 2.7. </b></p><div class="table-contents"><table summary="" border="1"><colgroup><col><col></colgroup><tbody><tr><td>
+                  Strategy
+                </td><td>
+                  Result
+                </td></tr><tr><td>
+                  Separate connections to one broker for separate streams of
+                  messages.
+                </td><td>
+                  Messages processed successfully, no problems experienced
+                </td></tr><tr><td>
+                  Separate brokers for transient and persistent messages.
+                </td><td>
+                  Messages processed successfully, no problems experienced
+                </td></tr></tbody></table></div></div><br class="table-break"><p>
+            <span class="emphasis"><em>Separate Connections</em></span>
+            Using separate connections effectively means that the two streams
+            of data are not processed via the same buffer, so the broker
+            receives and processes the transient messages while it is still
+            processing the persistent messages. Any build-up of unprocessed
+            data is therefore minimal and transitory.
+          </p><p>
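+            As a rough illustration of why a shared buffer hurts, the
+            following toy model computes when each transient message would
+            finish if one worker drained a single FIFO buffer in order,
+            compared with the transient stream having a buffer of its own.
+            It is only a sketch and not broker code: the class name, the
+            method name and the per-message costs are hypothetical.
+          </p><pre class="programlisting">
```java
import java.util.Arrays;

class StreamSeparationSketch {

    /** Completion time of each message when one worker drains a FIFO buffer in order. */
    static long[] completionTimes(long[] costs) {
        long[] done = new long[costs.length];
        long clock = 0;
        for (int i = 0; i != costs.length; i++) {
            clock += costs[i];   // each message waits for everything queued before it
            done[i] = clock;
        }
        return done;
    }

    public static void main(String[] args) {
        // Assumed costs in milliseconds: persistent messages are slow, transient ones fast.
        long[] persistent = {50, 50, 50, 50};
        long[] transientMsgs = {1, 1, 1};

        // Shared buffer: the transient messages are queued behind the persistent backlog.
        long[] shared = new long[persistent.length + transientMsgs.length];
        System.arraycopy(persistent, 0, shared, 0, persistent.length);
        System.arraycopy(transientMsgs, 0, shared, persistent.length, transientMsgs.length);
        long[] sharedDone = completionTimes(shared);
        long[] sharedTransient =
                Arrays.copyOfRange(sharedDone, persistent.length, sharedDone.length);

        // Separate buffers: the transient stream is drained independently.
        long[] separateTransient = completionTimes(transientMsgs);

        System.out.println("shared buffer:    " + Arrays.toString(sharedTransient));
        System.out.println("separate buffers: " + Arrays.toString(separateTransient));
    }
}
```
+</pre><p>
+            With these assumed costs, the transient messages finish at 201,
+            202 and 203 ms when they share the buffer, but at 1, 2 and 3 ms
+            when they have a buffer of their own.
+          </p><p>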
+            <span class="emphasis"><em>Separate Brokers</em></span>
+            Using separate brokers may mean more work, both in changing
+            client connection details and from an operational perspective.
+            However, it is certainly the most clear-cut way of isolating the
+            two streams of messages and the heaps impacted.
+          </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="HowtoTuneM3JavaBrokerPerformance-Additionaltuning"></a>2.8.2.1. 
+            Additional
+            tuning
+          </h4></div></div></div><p>
+            It is worth testing whether changing the size of the Qpid
+            read/write thread pool improves performance (e.g. by setting
+            JAVA_OPTS="-Damqj.read_write_pool_size=32" before running
+            qpid-server). By default this is equal to the number of CPU
+            cores, but a higher number may show better performance with some
+            workloads.
+          </p><p>
+            It is also important to give the Qpid broker plenty of memory:
+            for any serious application, a -Xmx of at least 3 GB. If you are
+            deploying on a 64-bit platform, a larger heap is definitely
+            worth testing. We will be testing tuning options around a larger
+            heap shortly.
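+          </p><p>
+            One way to confirm which values a JVM actually sees is to run a
+            small check with the same options. The sketch below is not
+            broker code: the class and method names are hypothetical, and it
+            simply assumes, as described above, that the pool size falls
+            back to the number of CPU cores when the property is not set.
+          </p><pre class="programlisting">
```java
class PoolSizeSketch {

    /** Property name taken from the JAVA_OPTS example in the text. */
    static final String POOL_SIZE_PROPERTY = "amqj.read_write_pool_size";

    /** Returns the configured pool size, or the number of CPU cores if unset. */
    static int effectivePoolSize() {
        int cores = Runtime.getRuntime().availableProcessors();
        return Integer.getInteger(POOL_SIZE_PROPERTY, cores);
    }

    public static void main(String[] args) {
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("CPU cores: " + Runtime.getRuntime().availableProcessors());
        System.out.println(POOL_SIZE_PROPERTY + ": " + effectivePoolSize());
        System.out.println("max heap (MiB): " + maxHeapMiB);
    }
}
```
+</pre><p>
+            Running it with, for example, java
+            -Damqj.read_write_pool_size=32 -Xmx3g PoolSizeSketch should
+            report a pool size of 32 and a maximum heap close to 3 GB. The
+            exact heap figure reported varies by JVM.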
+          </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="HowtoTuneM3JavaBrokerPerformance-NextSteps"></a>2.8.3. 
+            Next
+            Steps
+          </h3></div></div></div><p>
+            These two options have been tested using a Qpid test case, and
+            demonstrated that for a test case with a profile of persistent
+            heavy load followed by constant transient high load traffic they
+            provide significant improvement.
+          </p><p>
+            However, the deploying project <span class="emphasis"><em>must</em></span> complete their own
+            testing, using the same destructive test cases, representative
+            message paradigms &amp; volumes, in order to verify the proposed
+            mitigation options.
+          </p><p>
+            The deploying project should then choose the option most
+            applicable for their deployment and perform BAU testing before
+            any implementation into a production or pilot environment.
+          </p></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="Java-Broker-Debug-Logging.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="Qpid-Java-Broker-HowTos.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="Qpid-Java-Build-HowTo.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">2.7. 
+      Debug using log4j
+     </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> 2.9. 
+      Qpid Java Build How To
+    </td></tr></table></div><div class="main_text_area_bottom"></div></div></div></body></html>


