Posted to commits@qpid.apache.org by ac...@apache.org on 2012/03/05 22:31:59 UTC

svn commit: r1297234 [1/3] - in /qpid/trunk/qpid: cpp/design_docs/ cpp/src/qpid/ha/ cpp/src/tests/ doc/book/src/ tools/src/py/

Author: aconway
Date: Mon Mar  5 21:31:58 2012
New Revision: 1297234

URL: http://svn.apache.org/viewvc?rev=1297234&view=rev
Log:
QPID-3603: Initial documentation for the new HA plug-in.

Added:
    qpid/trunk/qpid/doc/book/src/Active-Active-Cluster.xml   (with props)
    qpid/trunk/qpid/doc/book/src/Active-Passive-Cluster.xml   (with props)
    qpid/trunk/qpid/doc/book/src/HA-Queue-Replication.xml   (with props)
Removed:
    qpid/trunk/qpid/doc/book/src/Starting-a-cluster.xml
Modified:
    qpid/trunk/qpid/cpp/design_docs/new-ha-design.txt
    qpid/trunk/qpid/cpp/src/qpid/ha/BrokerReplicator.cpp
    qpid/trunk/qpid/cpp/src/tests/ha_tests.py
    qpid/trunk/qpid/doc/book/src/AMQP-Messaging-Broker-CPP-Book.xml
    qpid/trunk/qpid/doc/book/src/AMQP-Messaging-Broker-CPP.xml
    qpid/trunk/qpid/doc/book/src/Managing-CPP-Broker.xml
    qpid/trunk/qpid/doc/book/src/Programming-In-Apache-Qpid.xml
    qpid/trunk/qpid/doc/book/src/schemas.xml
    qpid/trunk/qpid/tools/src/py/qpid-config
    qpid/trunk/qpid/tools/src/py/qpid-ha

Modified: qpid/trunk/qpid/cpp/design_docs/new-ha-design.txt
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/cpp/design_docs/new-ha-design.txt?rev=1297234&r1=1297233&r2=1297234&view=diff
==============================================================================
--- qpid/trunk/qpid/cpp/design_docs/new-ha-design.txt (original)
+++ qpid/trunk/qpid/cpp/design_docs/new-ha-design.txt Mon Mar  5 21:31:58 2012
@@ -257,20 +257,24 @@ Broker startup with store:
   - When connecting as backup, check UUID matches primary, shut down if not.
 - Empty: start ok, no UUID check with primary.
 
-** Current Limitations
+* Current Limitations
 
 (In no particular order at present)
 
 For message replication:
 
-LM1 - The re-synchronisation does not handle the case where a newly elected
-primary is *behind* one of the other backups. To address this I propose
-a new event for restting the sequence that the new primary would send
-out on detecting that a replicating browser is ahead of it, requesting
-that the replica revert back to a particular sequence number. The
-replica on receiving this event would then discard (i.e. dequeue) all
-the messages ahead of that sequence number and reset the counter to
-correctly sequence any subsequently delivered messages.
+LM1a - On failover, backups delete their queues and download the full queue state from the
+primary.  There was code to use messages already on the backup for re-synchronisation, but it
+was removed in early development (r1214490) to simplify the logic while getting basic
+replication working. It needs to be re-introduced.
+
+LM1b - This re-synchronisation does not handle the case where a newly elected primary is *behind*
+one of the other backups. To address this I propose a new event for resetting the sequence
+that the new primary would send out on detecting that a replicating browser is ahead of
+it, requesting that the replica revert to a particular sequence number. The replica
+on receiving this event would then discard (i.e. dequeue) all the messages ahead of that
+sequence number and reset the counter to correctly sequence any subsequently delivered
+messages.
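+
+The following is an illustrative sketch only (Python; the names are invented here and this
+is not actual broker code) of how a backup might apply such a sequence-reset event:
+
+  def apply_sequence_reset(backup_messages, reset_to):
+      # backup_messages: dict of sequence number -> message currently held by the backup.
+      # reset_to: highest sequence number the newly elected primary actually holds.
+      # Discard (dequeue) everything ahead of the primary's position and reset the
+      # counter so subsequently delivered messages are sequenced correctly.
+      survivors = {seq: msg for seq, msg in backup_messages.items() if seq <= reset_to}
+      next_sequence = reset_to + 1
+      return survivors, next_sequence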
 
 LM2 - There is a need to handle wrap-around of the message sequence to avoid
 confusing the resynchronisation where a replica has been disconnected
@@ -349,6 +353,12 @@ LC6 - The events and query responses are
       It is not possible to miss a create event and yet not to have
       the object in question in the query response however.
 
+LC7 Federated links from the primary will be lost in failover; they will not be re-connected on
+the new primary. Federation links to the primary can fail over.
+
+LC8 Only plain FIFO queues can be replicated. LVQs and ring queues are not yet supported.
+
+LC9 The "last man standing" feature of the old cluster is not available.
 
 * Benefits compared to previous cluster implementation.
 
@@ -359,58 +369,3 @@ LC6 - The events and query responses are
 - Can take advantage of resource manager features, e.g. virtual IP addresses.
 - Fewer inconsistent errors (store failures) that can be handled without killing brokers.
 - Improved performance
-* User Documentation Notes
-
-Notes to seed initial user documentation. Loosely tracking the implementation,
-some points mentioned in the doc may not be implemented yet.
-
-** High Availability Overview
-
-HA is implemented using a 'hot standby' approach. Clients are directed
-to a single "primary" broker. The primary executes client requests and
-also replicates them to one or more "backup" brokers. If the primary
-fails, one of the backups takes over the role of primary carrying on
-from where the primary left off. Clients will fail over to the new
-primary automatically and continue their work.
-
-TODO: at least once, deduplication.
-
-** Enabling replication on the client.
-
-To enable replication set the qpid.replicate argument when creating a
-queue or exchange.
-
-This can have one of 3 values
-- none: the object is not replicated
-- configuration: queues, exchanges and bindings are replicated but messages are not.
-- messages: configuration and messages are replicated.
-
-TODO: examples
-TODO: more options for default value of qpid.replicate
-
-A HA client connection has multiple addresses, one for each broker. If
-the it fails to connect to an address, or the connection breaks,
-it will automatically fail-over to another address.
-
-Only the primary broker accepts connections, the backup brokers
-redirect connection attempts to the primary. If the primary fails, one
-of the backups is promoted to primary and clients fail-over to the new
-primary.
-
-TODO: using multiple-address connections, examples c++, python, java.
-
-TODO: dynamic cluster addressing?
-
-TODO: need de-duplication.
-
-** Enabling replication on the broker.
-
-Network topology: backup links, separate client/broker networks.
-Describe failover mechanisms.
-- Client view: URLs, failover, exclusion & discovery.
-- Broker view: similar.
-Role of rmganager
-
-** Configuring rgmanager
-
-** Configuring qpidd

Modified: qpid/trunk/qpid/cpp/src/qpid/ha/BrokerReplicator.cpp
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/cpp/src/qpid/ha/BrokerReplicator.cpp?rev=1297234&r1=1297233&r2=1297234&view=diff
==============================================================================
--- qpid/trunk/qpid/cpp/src/qpid/ha/BrokerReplicator.cpp (original)
+++ qpid/trunk/qpid/cpp/src/qpid/ha/BrokerReplicator.cpp Mon Mar  5 21:31:58 2012
@@ -113,15 +113,15 @@ template <class T> bool match(Variant::M
     return T::match(schema[CLASS_NAME], schema[PACKAGE_NAME]);
 }
 
-enum ReplicateLevel { RL_NONE=0, RL_CONFIGURATION, RL_MESSAGES };
+enum ReplicateLevel { RL_NONE=0, RL_CONFIGURATION, RL_ALL };
 const string S_NONE="none";
 const string S_CONFIGURATION="configuration";
-const string S_MESSAGES="messages";
+const string S_ALL="all";
 
 ReplicateLevel replicateLevel(const string& level) {
     if (level == S_NONE) return RL_NONE;
     if (level == S_CONFIGURATION) return RL_CONFIGURATION;
-    if (level == S_MESSAGES) return RL_MESSAGES;
+    if (level == S_ALL) return RL_ALL;
     throw Exception("Invalid value for "+QPID_REPLICATE+": "+level);
 }
 
@@ -491,7 +491,7 @@ void BrokerReplicator::doResponseBind(Va
 }
 
 void BrokerReplicator::startQueueReplicator(const boost::shared_ptr<Queue>& queue) {
-    if (replicateLevel(queue->getSettings()) == RL_MESSAGES) {
+    if (replicateLevel(queue->getSettings()) == RL_ALL) {
         boost::shared_ptr<QueueReplicator> qr(new QueueReplicator(queue, link));
         broker.getExchanges().registerExchange(qr);
         qr->activate();

Modified: qpid/trunk/qpid/cpp/src/tests/ha_tests.py
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/cpp/src/tests/ha_tests.py?rev=1297234&r1=1297233&r2=1297234&view=diff
==============================================================================
--- qpid/trunk/qpid/cpp/src/tests/ha_tests.py (original)
+++ qpid/trunk/qpid/cpp/src/tests/ha_tests.py Mon Mar  5 21:31:58 2012
@@ -58,11 +58,11 @@ class HaBroker(Broker):
 
     def config_replicate(self, from_broker, queue):
         assert os.system(
-            "%s/qpid-config --broker=%s add queue --replicate-from %s %s"%(self.commands, self.host_port(), from_broker, queue)) == 0
+            "%s/qpid-config --broker=%s add queue --start-replica %s %s"%(self.commands, self.host_port(), from_broker, queue)) == 0
 
     def config_declare(self, queue, replication):
         assert os.system(
-            "%s/qpid-config --broker=%s add queue %s --replication %s"%(self.commands, self.host_port(), queue, replication)) == 0
+            "%s/qpid-config --broker=%s add queue %s --replicate %s"%(self.commands, self.host_port(), queue, replication)) == 0
 
 class HaCluster(object):
     _cluster_count = 0
@@ -98,7 +98,7 @@ class HaCluster(object):
     def __iter__(self): return self._brokers.__iter__()
 
 
-def qr_node(value="messages"): return "node:{x-declare:{arguments:{'qpid.replicate':%s}}}" % value
+def qr_node(value="all"): return "node:{x-declare:{arguments:{'qpid.replicate':%s}}}" % value
 
 class ShortTests(BrokerTest):
     """Short HA functionality tests."""
@@ -148,18 +148,18 @@ class ShortTests(BrokerTest):
             return"%s;{create:always,node:{type:topic,x-declare:{arguments:{'qpid.replicate':%s}, type:'fanout'},x-bindings:[{exchange:'%s',queue:'%s'}]}}"%(name, replicate, name, bindq)
         def setup(p, prefix, primary):
             """Create config, send messages on the primary p"""
-            s = p.sender(queue(prefix+"q1", "messages"))
+            s = p.sender(queue(prefix+"q1", "all"))
             for m in ["a", "b", "1"]: s.send(Message(m))
             # Test replication of dequeue
             self.assertEqual(p.receiver(prefix+"q1").fetch(timeout=0).content, "a")
             p.acknowledge()
             p.sender(queue(prefix+"q2", "configuration")).send(Message("2"))
             p.sender(queue(prefix+"q3", "none")).send(Message("3"))
-            p.sender(exchange(prefix+"e1", "messages", prefix+"q1")).send(Message("4"))
-            p.sender(exchange(prefix+"e2", "messages", prefix+"q2")).send(Message("5"))
+            p.sender(exchange(prefix+"e1", "all", prefix+"q1")).send(Message("4"))
+            p.sender(exchange(prefix+"e2", "all", prefix+"q2")).send(Message("5"))
             # Test  unbind
-            p.sender(queue(prefix+"q4", "messages")).send(Message("6"))
-            s3 = p.sender(exchange(prefix+"e4", "messages", prefix+"q4"))
+            p.sender(queue(prefix+"q4", "all")).send(Message("6"))
+            s3 = p.sender(exchange(prefix+"e4", "all", prefix+"q4"))
             s3.send(Message("7"))
             # Use old connection to unbind
             us = primary.connect_old().session(str(uuid4()))
@@ -204,7 +204,7 @@ class ShortTests(BrokerTest):
         verify(b, "1", p)
         verify(b, "2", p)
         # Test a series of messages, enqueue all then dequeue all.
-        s = p.sender(queue("foo","messages"))
+        s = p.sender(queue("foo","all"))
         self.wait(b, "foo")
         msgs = [str(i) for i in range(10)]
         for m in msgs: s.send(Message(m))
@@ -232,7 +232,7 @@ class ShortTests(BrokerTest):
         primary = HaBroker(self, name="primary")
         primary.promote()
         p = primary.connect().session()
-        s = p.sender(queue("q","messages"))
+        s = p.sender(queue("q","all"))
         for m in [str(i) for i in range(0,10)]: s.send(m)
         s.sync()
         backup1 = HaBroker(self, name="backup1", broker_url=primary.host_port())
@@ -260,14 +260,14 @@ class ShortTests(BrokerTest):
         sender = self.popen(
             ["qpid-send",
              "--broker", primary.host_port(),
-             "--address", "q;{create:always,%s}"%(qr_node("messages")),
+             "--address", "q;{create:always,%s}"%(qr_node("all")),
              "--messages=1000",
              "--content-string=x"
              ])
         receiver = self.popen(
             ["qpid-receive",
              "--broker", primary.host_port(),
-             "--address", "q;{create:always,%s}"%(qr_node("messages")),
+             "--address", "q;{create:always,%s}"%(qr_node("all")),
              "--messages=990",
              "--timeout=10"
              ])
@@ -352,7 +352,7 @@ class ShortTests(BrokerTest):
     def test_qpid_config_replication(self):
         """Set up replication via qpid-config"""
         brokers = HaCluster(self,2)
-        brokers[0].config_declare("q","messages")
+        brokers[0].config_declare("q","all")
         brokers[0].connect().session().sender("q").send("foo")
         self.assert_browse_backup(brokers[1], "q", ["foo"])
 
@@ -389,8 +389,8 @@ class ShortTests(BrokerTest):
         cluster = HaCluster(self, 2)
         primary = cluster[0]
         pc = cluster.connect(0)
-        ps = pc.session().sender("q;{create:always,%s}"%qr_node("messages"))
-        pr = pc.session().receiver("q;{create:always,%s}"%qr_node("messages"))
+        ps = pc.session().sender("q;{create:always,%s}"%qr_node("all"))
+        pr = pc.session().receiver("q;{create:always,%s}"%qr_node("all"))
         backup = HaBroker(self, name="backup", ha_cluster=False, args=["--log-enable=debug+"])
         br = backup.connect().session().receiver("q;{create:always}")
         backup.replicate(cluster.url, "q")
@@ -410,7 +410,7 @@ class ShortTests(BrokerTest):
         primary  = HaBroker(self, name="primary")
         primary.promote()
         backup = HaBroker(self, name="backup", broker_url=primary.host_port())
-        s = primary.connect().session().sender("lvq; {create:always, node:{x-declare:{arguments:{'qpid.last_value_queue_key':lvq-key, 'qpid.replicate':messages}}}}")
+        s = primary.connect().session().sender("lvq; {create:always, node:{x-declare:{arguments:{'qpid.last_value_queue_key':lvq-key, 'qpid.replicate':all}}}}")
         def send(key,value): s.send(Message(content=value,properties={"lvq-key":key}))
         for kv in [("a","a-1"),("b","b-1"),("a","a-2"),("a","a-3"),("c","c-1"),("c","c-2")]:
             send(*kv)
@@ -426,7 +426,7 @@ class ShortTests(BrokerTest):
         primary  = HaBroker(self, name="primary")
         primary.promote()
         backup = HaBroker(self, name="backup", broker_url=primary.host_port())
-        s = primary.connect().session().sender("q; {create:always, node:{x-declare:{arguments:{'qpid.policy_type':ring, 'qpid.max_count':5, 'qpid.replicate':messages}}}}")
+        s = primary.connect().session().sender("q; {create:always, node:{x-declare:{arguments:{'qpid.policy_type':ring, 'qpid.max_count':5, 'qpid.replicate':all}}}}")
         for i in range(10): s.send(Message(str(i)))
         self.assert_browse_backup(backup, "q", [str(i) for i in range(5,10)])
 
@@ -435,7 +435,7 @@ class ShortTests(BrokerTest):
         primary  = HaBroker(self, name="primary")
         primary.promote()
         backup = HaBroker(self, name="backup", broker_url=primary.host_port())
-        s = primary.connect().session().sender("q; {create:always, node:{x-declare:{arguments:{'qpid.policy_type':reject, 'qpid.max_count':5, 'qpid.replicate':messages}}}}")
+        s = primary.connect().session().sender("q; {create:always, node:{x-declare:{arguments:{'qpid.policy_type':reject, 'qpid.max_count':5, 'qpid.replicate':all}}}}")
         try:
             for i in range(10): s.send(Message(str(i)), sync=False)
         except qpid.messaging.exceptions.TargetCapacityExceeded: pass
@@ -447,7 +447,7 @@ class ShortTests(BrokerTest):
         primary.promote()
         backup = HaBroker(self, name="backup", broker_url=primary.host_port())
         session = primary.connect().session()
-        s = session.sender("priority-queue; {create:always, node:{x-declare:{arguments:{'qpid.priorities':10, 'qpid.replicate':messages}}}}")
+        s = session.sender("priority-queue; {create:always, node:{x-declare:{arguments:{'qpid.priorities':10, 'qpid.replicate':all}}}}")
         priorities = [8,9,5,1,2,2,3,4,9,7,8,9,9,2]
         for p in priorities: s.send(Message(priority=p))
         # Can't use browse_backup as browser sees messages in delivery order not priority.
@@ -466,7 +466,7 @@ class ShortTests(BrokerTest):
         priorities = [4,5,3,7,8,8,2,8,2,8,8,16,6,6,6,6,6,6,8,3,5,8,3,5,5,3,3,8,8,3,7,3,7,7,7,8,8,8,2,3]
         limits={7:0,6:4,5:3,4:2,3:2,2:2,1:2}
         limit_policy = ",".join(["'qpid.fairshare':5"] + ["'qpid.fairshare-%s':%s"%(i[0],i[1]) for i in limits.iteritems()])
-        s = session.sender("priority-queue; {create:always, node:{x-declare:{arguments:{'qpid.priorities':%s, %s, 'qpid.replicate':messages}}}}"%(levels,limit_policy))
+        s = session.sender("priority-queue; {create:always, node:{x-declare:{arguments:{'qpid.priorities':%s, %s, 'qpid.replicate':all}}}}"%(levels,limit_policy))
         messages = [Message(content=str(uuid4()), priority = p) for p in priorities]
         for m in messages: s.send(m)
         self.wait_backup(backup, s.target)
@@ -480,7 +480,7 @@ class ShortTests(BrokerTest):
         primary  = HaBroker(self, name="primary")
         primary.promote()
         backup = HaBroker(self, name="backup", broker_url=primary.host_port())
-        s = primary.connect().session().sender("q; {create:always, node:{x-declare:{arguments:{'qpid.policy_type':ring, 'qpid.max_count':5, 'qpid.priorities':10, 'qpid.replicate':messages}}}}")
+        s = primary.connect().session().sender("q; {create:always, node:{x-declare:{arguments:{'qpid.policy_type':ring, 'qpid.max_count':5, 'qpid.priorities':10, 'qpid.replicate':all}}}}")
         priorities = [8,9,5,1,2,2,3,4,9,7,8,9,9,2]
         for p in priorities: s.send(Message(priority=p))
         # FIXME aconway 2012-02-22: there is a bug in priority ring queues that allows a low

Modified: qpid/trunk/qpid/doc/book/src/AMQP-Messaging-Broker-CPP-Book.xml
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/doc/book/src/AMQP-Messaging-Broker-CPP-Book.xml?rev=1297234&r1=1297233&r2=1297234&view=diff
==============================================================================
--- qpid/trunk/qpid/doc/book/src/AMQP-Messaging-Broker-CPP-Book.xml (original)
+++ qpid/trunk/qpid/doc/book/src/AMQP-Messaging-Broker-CPP-Book.xml Mon Mar  5 21:31:58 2012
@@ -20,7 +20,7 @@
 
 -->
 
-<book>
+<book xmlns:xi="http://www.w3.org/2001/XInclude">
   <title>AMQP Messaging Broker (Implemented in C++)</title>
   <preface>
     <title>Introduction</title>
@@ -46,21 +46,20 @@
       Running the AMQP Messaging Broker
     </title>
 
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Running-CPP-Broker.xml"/>
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Cheat-Sheet-for-configuring-Queue-Options.xml"/>
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Cheat-Sheet-for-configuring-Exchange-Options.xml"/>
-
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Using-Broker-Federation.xml"/>
-
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Security.xml"/>
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="LVQ.xml"/>
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="queue-state-replication.xml"/>
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Starting-a-cluster.xml"/>
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="producer-flow-control.xml"/>
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="AMQP-Compatibility.xml"/>
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Qpid-Interoperability-Documentation.xml"/>
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Using-message-groups.xml"/>
-
+    <xi:include href="Running-CPP-Broker.xml"/>
+    <xi:include href="Cheat-Sheet-for-configuring-Queue-Options.xml"/>
+    <xi:include href="Cheat-Sheet-for-configuring-Exchange-Options.xml"/>
+    <xi:include href="Using-Broker-Federation.xml"/>
+    <xi:include href="Security.xml"/>
+    <xi:include href="LVQ.xml"/>
+    <xi:include href="queue-state-replication.xml"/>
+    <xi:include href="Active-Active-Cluster.xml"/>
+    <xi:include href="producer-flow-control.xml"/>
+    <xi:include href="AMQP-Compatibility.xml"/>
+    <xi:include href="Qpid-Interoperability-Documentation.xml"/>
+    <xi:include href="Using-message-groups.xml"/>
+    <xi:include href="Active-Passive-Cluster.xml"/>
+    <xi:include href="HA-Queue-Replication.xml"/>
 </chapter>
 
 
@@ -69,8 +68,8 @@
       Managing the AMQP Messaging Broker
     </title>
 
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Managing-CPP-Broker.xml"/>
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Qpid-Management-Framework.xml"/>
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="QMF-Python-Console-Tutorial.xml"/>
+    <xi:include href="Managing-CPP-Broker.xml"/>
+    <xi:include href="Qpid-Management-Framework.xml"/>
+    <xi:include href="QMF-Python-Console-Tutorial.xml"/>
 </chapter>
 </book>

Modified: qpid/trunk/qpid/doc/book/src/AMQP-Messaging-Broker-CPP.xml
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/doc/book/src/AMQP-Messaging-Broker-CPP.xml?rev=1297234&r1=1297233&r2=1297234&view=diff
==============================================================================
--- qpid/trunk/qpid/doc/book/src/AMQP-Messaging-Broker-CPP.xml (original)
+++ qpid/trunk/qpid/doc/book/src/AMQP-Messaging-Broker-CPP.xml Mon Mar  5 21:31:58 2012
@@ -49,7 +49,7 @@
     <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="SSL.xml"/>
     <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="LVQ.xml"/>
     <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="queue-state-replication.xml"/>
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Starting-a-cluster.xml"/>
+    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Active-Active-Cluster.xml"/>
     <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ACL.xml"/>
     <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="producer-flow-control.xml"/>
     <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Using-message-groups.xml"/>

Added: qpid/trunk/qpid/doc/book/src/Active-Active-Cluster.xml
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/doc/book/src/Active-Active-Cluster.xml?rev=1297234&view=auto
==============================================================================
--- qpid/trunk/qpid/doc/book/src/Active-Active-Cluster.xml (added)
+++ qpid/trunk/qpid/doc/book/src/Active-Active-Cluster.xml Mon Mar  5 21:31:58 2012
@@ -0,0 +1,561 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!--
+
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+-->
+
+<section id="chap-Messaging_User_Guide-Active_Active_Cluster">
+  <title>Active-active Messaging Clusters</title>
+  <para>
+    Active-active Messaging Clusters provide fault tolerance by ensuring that every broker in a <firstterm>cluster</firstterm> has the same queues, exchanges, messages, and bindings, and allowing a client to <firstterm>fail over</firstterm> to a new broker and continue without any loss of messages if the current broker fails or becomes unavailable. <firstterm>Active-active</firstterm> refers to the fact that all brokers in the cluster can actively serve clients.  Because all brokers are automatically kept in a consistent state, clients can connect to and use any broker in a cluster. Any number of messaging brokers can be run as one <firstterm>cluster</firstterm>, and brokers can be added to or removed from a cluster while it is in use.
+  </para>
+  <para>
+    High Availability Messaging Clusters are implemented using the <ulink url="http://www.openais.org/">OpenAIS Cluster Framework</ulink>.
+  </para>
+  <para>
+    An OpenAIS daemon runs on every machine in the cluster, and these daemons communicate using multicast on a particular address. Every qpidd process in a cluster joins a named group that is automatically synchronized using OpenAIS Closed Process Groups (CPG) — the qpidd processes multicast events to the named group, and CPG ensures that each qpidd process receives all the events in the same sequence. All members get an identical sequence of events, so they can all update their state consistently.
+  </para>
+  <para>
+    Two messaging brokers are in the same cluster if
+    <orderedlist>
+      <listitem>
+	<para>
+	  They run on hosts in the same OpenAIS cluster; that is, OpenAIS is configured with the same mcastaddr, mcastport and bindnetaddr, and
+	</para>
+
+      </listitem>
+      <listitem>
+	<para>
+	  They use the same cluster name.
+	</para>
+
+      </listitem>
+
+    </orderedlist>
+
+  </para>
+  <para>
+    High Availability Clustering has a cost: in order to allow each broker in a cluster to continue the work of any other broker, a cluster must replicate state for all brokers in the cluster. Because of this, the brokers in a cluster should normally be on a LAN; there should be fast and reliable connections between brokers. Even on a LAN, using multiple brokers in a cluster is somewhat slower than using a single broker without clustering. This may be counter-intuitive for people who are used to clustering in the context of High Performance Computing or High Throughput Computing, where clustering increases performance or throughput.
+  </para>
+
+  <para>
+    High Availability Messaging Clusters should be used together with Red Hat Clustering Services (RHCS); without RHCS, clusters are vulnerable to the &#34;split-brain&#34; condition, in which a network failure splits the cluster into two sub-clusters that cannot communicate with each other. See the documentation on the <command>--cluster-cman</command> option for details on using RHCS with High Availability Messaging Clusters. See the <ulink url="http://sources.redhat.com/cluster/wiki">CMAN Wiki</ulink> for more detail on CMAN and split-brain conditions. Use the <command>--cluster-cman</command> option to enable RHCS when starting the broker.
+  </para>
+  <section id="sect-Messaging_User_Guide-High_Availability_Messaging_Clusters-Starting_a_Broker_in_a_Cluster">
+    <title>Starting a Broker in a Cluster</title>
+    <para>
+      Clustering is implemented using the <filename>cluster.so</filename> module, which is loaded by default when you start a broker. To run brokers in a cluster, make sure they all use the same OpenAIS mcastaddr, mcastport, and bindnetaddr. All brokers in a cluster must also have the same cluster name — specify the cluster name in <filename>qpidd.conf</filename>:
+    </para>
+
+    <screen>cluster-name=&#34;local_test_cluster&#34;
+    </screen>
+    <para>
+      On RHEL6, you must create the file <filename>/etc/corosync/uidgid.d/qpidd</filename> to tell Corosync the name of the user running the broker. By default, the user is qpidd:
+    </para>
+
+    <programlisting>
+      uidgid {
+      uid: qpidd
+      gid: qpidd
+      }
+    </programlisting>
+    <para>
+      On RHEL5, the primary group for the process running qpidd must be the ais group. If you are running qpidd as a service, it is run as the <command>qpidd</command> user, which is already in the ais group. If you are running the broker from the command line, you must ensure that the primary group for the user running qpidd is ais. You can set the primary group using <command>newgrp</command>:
+    </para>
+
+    <screen>$ newgrp ais
+    </screen>
+    <para>
+      You can then run the broker from the command line, specifying the cluster name as an option.
+    </para>
+
+    <screen>[jonathan@localhost]$ qpidd --cluster-name=&#34;local_test_cluster&#34;
+    </screen>
+    <para>
+      All brokers in a cluster must have identical configuration, with a few exceptions noted below. They must load the same set of plug-ins, and have matching configuration files and command line arguments. They should also have identical ACL files and SASL databases if these are used. If one broker uses persistence, all must use persistence — a mix of transient and persistent brokers is not allowed. Differences in configuration can cause brokers to exit the cluster. For instance, if different ACL settings allow a client to access a queue on broker A but not on broker B, then publishing to the queue will succeed on A and fail on B, so B will exit the cluster to prevent inconsistency.
+    </para>
+    <para>
+      The following settings can differ for brokers on a given cluster:
+    </para>
+    <itemizedlist>
+      <listitem>
+	<para>
+	  logging options
+	</para>
+
+      </listitem>
+      <listitem>
+	<para>
+	  cluster-url — if set, it will be different for each broker.
+	</para>
+
+      </listitem>
+      <listitem>
+	<para>
+	  port — brokers can listen on different ports.
+	</para>
+
+      </listitem>
+
+    </itemizedlist>
+    <para>
+      The qpid log contains entries that record significant clustering events, e.g. when a broker becomes a member of a cluster, the membership of a cluster is changed, or an old journal is moved out of the way. For instance, the following message states that a broker has been added to a cluster as the first node:
+    </para>
+
+    <screen>
+      2009-07-09 18:13:41 info 127.0.0.1:1410(READY) member update: 127.0.0.1:1410(member)
+      2009-07-09 18:13:41 notice 127.0.0.1:1410(READY) first in cluster
+    </screen>
+    <note>
+      <para>
+	If you are using SELinux, the qpidd process and OpenAIS must have the same SELinux context, or else SELinux must be set to permissive mode. If both qpidd and OpenAIS are run as services, they have the same SELinux context. If both OpenAIS and qpidd are run as user processes, they have the same SELinux context. If one is run as a service, and the other is run as a user process, they have different SELinux contexts.
+      </para>
+
+    </note>
+    <para>
+      The following options are available for clustering:
+    </para>
+    <table frame="all" id="tabl-Messaging_User_Guide-Starting_a_Broker_in_a_Cluster-Options_for_High_Availability_Messaging_Cluster">
+      <title>Options for High Availability Messaging Cluster</title>
+      <tgroup align="left" cols="2" colsep="1" rowsep="1">
+	<colspec colname="c1" colwidth="1*"></colspec>
+	<colspec colname="c2" colwidth="4*"></colspec>
+	<thead>
+	  <row>
+	    <entry align="center" nameend="c2" namest="c1">
+	      Options for High Availability Messaging Cluster
+	    </entry>
+
+	  </row>
+
+	</thead>
+	<tbody>
+	  <row>
+	    <entry>
+	      <command>--cluster-name <replaceable>NAME</replaceable></command>
+	    </entry>
+	    <entry>
+	      Name of the Messaging Cluster to join. A Messaging Cluster consists of all brokers started with the same cluster-name and openais configuration.
+	    </entry>
+
+	  </row>
+	  <row>
+	    <entry>
+	      <command>--cluster-size <replaceable>N</replaceable></command>
+	    </entry>
+	    <entry>
+	      Wait for at least N initial members before completing cluster initialization and serving clients. Use this option in a persistent cluster so that all brokers can exchange the status of their persistent stores and do consistency checks before serving clients.
+	    </entry>
+
+	  </row>
+	  <row>
+	    <entry>
+	      <command>--cluster-url <replaceable>URL</replaceable></command>
+	    </entry>
+	    <entry>
+	      An AMQP URL containing the local address that the broker advertizes to clients for fail-over connections. This is different for each host. By default, all local addresses for the broker are advertized. You only need to set this if
+	      <orderedlist>
+		<listitem>
+		  <para>
+		    Your host has more than one active network interface, and
+		  </para>
+
+		</listitem>
+		<listitem>
+		  <para>
+		    You want to restrict client fail-over to a specific interface or interfaces.
+		  </para>
+
+		</listitem>
+
+	      </orderedlist>
+	      <para>Each broker in the cluster is specified using the following form:</para>
+
+	      <programlisting>url = [&#34;amqp:&#34;][ user [&#34;/&#34; password] &#34;@&#34; ] protocol_addr
+	      (&#34;,&#34; protocol_addr)*
+	      protocol_addr = tcp_addr / rdma_addr / ssl_addr / ...
+	      tcp_addr = [&#34;tcp:&#34;] host [&#34;:&#34; port]
+	      rdma_addr = &#34;rdma:&#34; host [&#34;:&#34; port]
+	      ssl_addr = &#34;ssl:&#34; host [&#34;:&#34; port]</programlisting>
+
+	      <para>In most cases, only one address is advertized, but more than one address can be specified if the machine running the broker has more than one network interface card, and you want to allow clients to connect using multiple network interfaces. Use a comma delimiter (&#34;,&#34;) to separate addresses in the URL. Examples:</para>
+	      <itemizedlist>
+		<listitem>
+		  <para>
+		    <command>amqp:tcp:192.168.1.103:5672</command> advertizes a single address to clients for failover.
+		  </para>
+
+		</listitem>
+		<listitem>
+		  <para>
+		    <command>amqp:tcp:192.168.1.103:5672,tcp:192.168.1.105:5672</command> advertizes two different addresses to clients for failover, on two different network interfaces.
+		  </para>
+
+		</listitem>
+
+	      </itemizedlist>
+
+	    </entry>
+
+	  </row>
+	  <row>
+	    <entry>
+	      <command>--cluster-cman</command>
+	    </entry>
+	    <entry>
+	      <para>
+		CMAN protects against the &#34;split-brain&#34; condition, in which a network failure splits the cluster into two sub-clusters that cannot communicate with each other. When &#34;split-brain&#34; occurs, each of the sub-clusters can access shared resources without knowledge of the other sub-cluster, resulting in corrupted cluster integrity.
+	      </para>
+	      <para>
+		To avoid &#34;split-brain&#34;, CMAN uses the notion of a &#34;quorum&#34;. If more than half the cluster nodes are active, the cluster has quorum and can act. If half (or fewer) nodes are active, the cluster does not have quorum, and all cluster activity is stopped. There are other ways to define the quorum for particular use cases (e.g. a cluster of only 2 members); see the <ulink url="http://sources.redhat.com/cluster/wiki">CMAN Wiki</ulink>
+		for more detail.
+	      </para>
+	      <para>
+		When enabled, the broker will wait until it belongs to a quorate cluster before accepting client connections. It continually monitors the quorum status and shuts down immediately if the node it runs on loses touch with the quorum.
+	      </para>
+
+	    </entry>
+
+	  </row>
+	  <row>
+	    <entry>
+	      --cluster-username
+	    </entry>
+	    <entry>
+	      SASL username for connections between brokers.
+	    </entry>
+
+	  </row>
+	  <row>
+	    <entry>
+	      --cluster-password
+	    </entry>
+	    <entry>
+	      SASL password for connections between brokers.
+	    </entry>
+
+	  </row>
+	  <row>
+	    <entry>
+	      --cluster-mechanism
+	    </entry>
+	    <entry>
+	      SASL authentication mechanism for connections between brokers
+	    </entry>
+
+	  </row>
+
+	</tbody>
+
+      </tgroup>
+
+    </table>
+    <para>
+      If a broker is unable to establish a connection to another broker in the cluster, the log will contain SASL errors, e.g.:
+    </para>
+
+    <screen>2009-aug-04 10:17:37 info SASL: Authentication failed: SASL(-13): user not found: Password verification failed
+    </screen>
+    <para>
+      You can set the SASL user name and password used to connect to other brokers using the <command>cluster-username</command> and <command>cluster-password</command> properties when you start the broker. In most environments, it is easiest to create an account with the same user name and password on each broker in the cluster, and use these as the <command>cluster-username</command> and <command>cluster-password</command>. You can also set the SASL mechanism using <command>cluster-mechanism</command>. Remember that any mechanism you enable for broker-to-broker communication can also be used by a client, so do not enable <command>cluster-mechanism=ANONYMOUS</command> in a secure environment.
+    </para>
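+    <para>
+      For example, each broker in the cluster could be started with matching credentials
+      (the user name, password and mechanism below are illustrative only):
+    </para>
+
+    <screen>$ qpidd --cluster-name=local_test_cluster --cluster-username=cluster_user --cluster-password=secret --cluster-mechanism=DIGEST-MD5
+    </screen>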
+    <para>
+      Once the cluster is running, run <command>qpid-cluster</command> to make sure that the brokers are running as one cluster. See the following section for details.
+    </para>
+    <para>
+      If the cluster is correctly configured, queues and messages are replicated to all brokers in the cluster, so an easy way to test the cluster is to route messages to a queue on one broker, then connect to a different broker in the same cluster and read the messages to make sure they have been replicated. The <command>spout</command> and <command>drain</command> programs can be used for this test.
+    </para>
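+    <para>
+      For example, assuming one broker of the cluster listens on port 5672 and another on
+      port 5673 of the local host (the addresses and queue name are illustrative, and the
+      exact spout/drain option syntax may vary between releases), send a message through
+      the first broker and read it back through the second:
+    </para>
+
+    <screen>$ spout -b localhost:5672 &#39;test-queue; {create: always}&#39;
+$ drain -b localhost:5673 &#39;test-queue&#39;
+    </screen>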
+
+  </section>
+
+  <section id="sect-Messaging_User_Guide-High_Availability_Messaging_Clusters-qpid_cluster">
+    <title>qpid-cluster</title>
+    <para>
+      <command>qpid-cluster</command> is a command-line utility that allows you to view information on a cluster and its brokers, disconnect a client connection, shut down a broker in a cluster, or shut down the entire cluster. You can see the options using the <command>--help</command> option:
+    </para>
+
+    <screen>$ ./qpid-cluster --help
+    </screen>
+
+    <screen>Usage:  qpid-cluster [OPTIONS] [broker-addr]
+
+    broker-addr is in the form:   [username/password@] hostname | ip-address [:&#60;port&#62;]
+    ex:  localhost, 10.1.1.7:10000, broker-host:10000, guest/guest@localhost
+
+    Options:
+    -C [--all-connections]  View client connections to all cluster members
+    -c [--connections] ID   View client connections to specified member
+    -d [--del-connection] HOST:PORT
+    Disconnect a client connection
+    -s [--stop] ID          Stop one member of the cluster by its ID
+    -k [--all-stop]         Shut down the whole cluster
+    -f [--force]            Suppress the &#39;are-you-sure?&#39; prompt
+    -n [--numeric]          Don&#39;t resolve names
+    </screen>
+    <para>
+      Let&#39;s connect to a cluster and display basic information about the cluster and its brokers. When you connect to the cluster using <command>qpid-cluster</command>, you can use the host and port for any broker in the cluster. For instance, if a broker in the cluster is running on <filename>localhost</filename> on port 6664, you can start <command>qpid-cluster</command> like this:
+    </para>
+
+    <screen>
+      $ qpid-cluster localhost:6664
+    </screen>
+    <para>
+      Here is the output:
+    </para>
+
+    <screen>
+      Cluster Name: local_test_cluster
+      Cluster Status: ACTIVE
+      Cluster Size: 3
+      Members: ID=127.0.0.1:13143 URL=amqp:tcp:192.168.1.101:6664,tcp:192.168.122.1:6664,tcp:10.16.10.62:6664
+      : ID=127.0.0.1:13167 URL=amqp:tcp:192.168.1.101:6665,tcp:192.168.122.1:6665,tcp:10.16.10.62:6665
+      : ID=127.0.0.1:13192 URL=amqp:tcp:192.168.1.101:6666,tcp:192.168.122.1:6666,tcp:10.16.10.62:6666
+    </screen>
+    <para>
+      The ID for each broker in the cluster is given on the left. For instance, the ID for the first broker in the cluster is <command>127.0.0.1:13143</command>. The URL in the output is the broker&#39;s advertized address. Let&#39;s use the ID to shut the broker down using the <command>--stop</command> command:
+    </para>
+
+    <screen>$ ./qpid-cluster localhost:6664 --stop 127.0.0.1:13143
+    </screen>
+
+  </section>
+
+  <section id="sect-Messaging_User_Guide-High_Availability_Messaging_Clusters-Failover_in_Clients">
+    <title>Failover in Clients</title>
+    <para>
+      If a client is connected to a broker, the connection fails if the broker crashes or is killed. If heartbeat is enabled for the connection, a connection also fails if the broker hangs, the machine the broker is running on fails, or the network connection to the broker is lost — the connection fails no later than twice the heartbeat interval.
+    </para>
+    <para>
+      When a client&#39;s connection to a broker fails, any sent messages that have been acknowledged to the sender will have been replicated to all brokers in the cluster, any received messages that have not yet been acknowledged by the receiving client are requeued to all brokers, and the client API notifies the application of the failure by throwing an exception.
+    </para>
+    <para>
+      Clients can be configured to automatically reconnect to another broker when they receive such an exception. Any messages that have been sent by the client, but not yet acknowledged as delivered, are resent. Any messages that have been read by the client, but not acknowledged, are delivered to the client.
+    </para>
+    <para>
+      TCP is slow to detect connection failures. A client can configure a connection to use a heartbeat to detect connection failure, and can specify a time interval for the heartbeat. If heartbeats are in use, failures will be detected no later than twice the heartbeat interval. The Java JMS client enables heartbeat by default. See the sections on Failover in Java JMS Clients and Failover in C++ Clients for the code to enable heartbeat.
+    </para>
+    <section id="sect-Messaging_User_Guide-Failover_in_Clients-Failover_in_Java_JMS_Clients">
+      <title>Failover in Java JMS Clients</title>
+      <para>
+	In Java JMS clients, client failover is handled automatically if it is enabled in the connection. Any messages that have been sent by the client, but not yet acknowledged as delivered, are resent. Any messages that have been read by the client, but not acknowledged, are sent to the client.
+      </para>
+      <para>
+	You can configure a connection to use failover using the <command>failover</command> property:
+      </para>
+
+      <screen>
+	connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist=&#39;tcp://localhost:5672&#39;&amp;failover=&#39;failover_exchange&#39;
+      </screen>
+      <para>
+	This property can take three values:
+      </para>
+      <variablelist id="vari-Messaging_User_Guide-Failover_in_Java_JMS_Clients-Failover_Modes">
+	<title>Failover Modes</title>
+	<varlistentry>
+	  <term>failover_exchange</term>
+	  <listitem>
+	    <para>
+	      If the connection fails, fail over to any other broker in the cluster.
+	    </para>
+
+	  </listitem>
+
+	</varlistentry>
+	<varlistentry>
+	  <term>roundrobin</term>
+	  <listitem>
+	    <para>
+	      If the connection fails, fail over to one of the brokers specified in the <command>brokerlist</command>.
+	    </para>
+
+	  </listitem>
+
+	</varlistentry>
+	<varlistentry>
+	  <term>singlebroker</term>
+	  <listitem>
+	    <para>
+	      Failover is not supported; the connection is to a single broker only.
+	    </para>
+
+	  </listitem>
+
+	</varlistentry>
+
+      </variablelist>
+      <para>
+	In a Connection URL, heartbeat is set using the <command>idle_timeout</command> property, which is an integer corresponding to the heartbeat period in seconds. For instance, the following line from a JNDI properties file sets the heartbeat time out to 3 seconds:
+      </para>
+
+      <screen>
+	connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist=&#39;tcp://localhost:5672&#39;,idle_timeout=3
+      </screen>
+
+    </section>
+
+    <section id="sect-Messaging_User_Guide-Failover_in_Clients-Failover_and_the_Qpid_Messaging_API">
+      <title>Failover and the Qpid Messaging API</title>
+      <para>
+	The Qpid Messaging API also supports automatic reconnection in the event a connection fails. Senders can also be configured to replay any in-doubt messages (i.e. messages which were sent but not acknowledged by the broker). See &#34;Connection Options&#34; and &#34;Sender Capacity and Replay&#34; in <citetitle>Programming in Apache Qpid</citetitle> for details.
+      </para>
+      <para>
+	In C++ and Python clients, heartbeats are disabled by default. You can enable them by specifying a heartbeat interval (in seconds) for the connection via the &#39;heartbeat&#39; option.
+      </para>
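+      <para>
+	For example, in Python (a minimal sketch; the broker address and the 2-second
+	interval are placeholders):
+      </para>
+
+      <programlisting>
+from qpid.messaging import Connection
+
+# Ask the broker for a heartbeat every 2 seconds and reconnect automatically on failure.
+connection = Connection("localhost:5672", heartbeat=2, reconnect=True)
+connection.open()
+try:
+    session = connection.session()
+    # ... send and receive messages as usual ...
+finally:
+    connection.close()
+      </programlisting>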
+      <para>
+	See &#34;Cluster Failover&#34; in <citetitle>Programming in Apache Qpid</citetitle> for details on how to keep the client aware of cluster membership.
+      </para>
+
+    </section>
+
+
+  </section>
+
+  <section id="sect-Messaging_User_Guide-High_Availability_Messaging_Clusters-Error_handling_in_Clusters">
+    <title>Error handling in Clusters</title>
+    <para>
+      If a broker crashes or is killed, or if a broker machine failure, broker connection failure, or broker hang is detected, the other brokers in the cluster are notified that it is no longer a member of the cluster. If a new broker joins the cluster, it synchronizes with an active broker to obtain the current cluster state; if this synchronization fails, the new broker exits the cluster and aborts.
+    </para>
+    <para>
+      If a broker becomes extremely busy and stops responding, it stops accepting incoming work. All other brokers continue processing, and the non-responsive node caches all AIS traffic. When it resumes, the broker processes all cached AIS events, then accepts further incoming work. <!--               If a broker is non-responsive for too long, it is assumed to be hanging, and treated as described in the previous paragraph.               -->
+    </para>
+    <para>
+      Broker hangs are only detected if the watchdog plugin is loaded and the <command>--watchdog-interval</command> option is set. The watchdog plug-in kills the qpidd broker process if it becomes stuck for longer than the watchdog interval. In some cases, e.g. certain phases of error resolution, it is possible for a stuck process to hang other cluster members that are waiting for it to send a message. Using the watchdog, the stuck process is terminated and removed from the cluster, allowing other members to continue and clients of the stuck process to fail over to other members.
+    </para>
+    <para>
+      Redundancy can also be achieved directly in the AIS network by specifying more than one network interface in the AIS configuration file. This causes Totem to use a redundant ring protocol, which makes failure of a single network transparent.
+    </para>
+    <para>
+      Redundancy can be achieved at the operating system level by using NIC bonding, which combines multiple network ports into a single group, effectively aggregating the bandwidth of multiple interfaces into a single connection. This provides both network load balancing and fault tolerance.
+    </para>
+    <para>
+      If any broker encounters an error, the brokers compare notes to see if they all received the same error. If not, the broker removes itself from the cluster and shuts itself down to ensure that all brokers in the cluster have consistent state. For instance, a broker may run out of disk space; if this happens, the broker shuts itself down. Examining the broker&#39;s log can help determine the error and suggest ways to prevent it from occurring in the future.
+    </para>
+    <!--                "Bad case" for cluster matrix - things we will fix, or things users may encounter long term?                -->
+  </section>
+
+  <section id="sect-Messaging_User_Guide-High_Availability_Messaging_Clusters-Persistence_in_High_Availability_Message_Clusters">
+    <title>Persistence in High Availability Message Clusters</title>
+    <para>
+      Persistence and clustering are two different ways to provide reliability. Most systems that use a cluster do not enable persistence, but you can do so if you want to ensure that messages are not lost even if the last broker in a cluster fails. A cluster must have all transient or all persistent members; mixed clusters are not allowed. Each broker in a persistent cluster has its own independent replica of the cluster&#39;s state in its store.
+    </para>
+    <section id="sect-Messaging_User_Guide-Persistence_in_High_Availability_Message_Clusters-Clean_and_Dirty_Stores">
+      <title>Clean and Dirty Stores</title>
+      <para>
+	When a broker is an active member of a cluster, its store is marked &#34;dirty&#34; because it may be out of date compared to other brokers in the cluster. If a broker leaves a running cluster because it is stopped, crashes, or its host crashes, its store continues to be marked &#34;dirty&#34;.
+      </para>
+      <para>
+	If the cluster is reduced to a single broker, its store is marked &#34;clean&#34; since it is the only broker making updates. If the cluster is shut down with the command <literal>qpid-cluster -k</literal> then all the stores are marked clean.
+      </para>
+      <para>
+	When a cluster is initially formed, brokers with clean stores read from their stores. Brokers with dirty stores, or brokers that join after the cluster is running, discard their old stores and initialize a new store with an update from one of the running brokers. The <command>--truncate</command> option can be used to force a broker to discard all existing stores even if they are clean. (A dirty store is discarded regardless.)
+      </para>
+      <para>
+	Discarded stores are copied to a backup directory. The active store is in &#60;data-dir&#62;/rhm. Backup stores are in &#60;data-dir&#62;/_cluster.bak.&#60;nnnn&#62;/rhm, where &#60;nnnn&#62; is a 4-digit number. A higher number means a more recent backup.
+      </para>
+
+    </section>
+
+    <section id="sect-Messaging_User_Guide-Persistence_in_High_Availability_Message_Clusters-Starting_a_persistent_cluster">
+      <title>Starting a persistent cluster</title>
+      <para>
+	When starting a persistent cluster broker, set the cluster-size option to the number of brokers in the cluster. This allows the brokers to wait until the entire cluster is running so that they can synchronize their stored state.
+      </para>
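+      <para>
+	For example, each broker in a three-broker persistent cluster could be started with
+	(the cluster name is illustrative):
+      </para>
+
+      <screen>$ qpidd --cluster-name=local_test_cluster --cluster-size=3
+      </screen>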
+      <para>
+	The cluster can start if:
+      </para>
+      <para>
+	<itemizedlist>
+	  <listitem>
+	    <para>
+	      all members have empty stores, or
+	    </para>
+
+	  </listitem>
+	  <listitem>
+	    <para>
+	      at least one member has a clean store
+	    </para>
+
+	  </listitem>
+
+	</itemizedlist>
+
+      </para>
+      <para>
+	All members of the new cluster will be initialized with the state from a clean store.
+      </para>
+
+    </section>
+
+    <section id="sect-Messaging_User_Guide-Persistence_in_High_Availability_Message_Clusters-Stopping_a_persistent_cluster">
+      <title>Stopping a persistent cluster</title>
+      <para>
+	To cleanly shut down a persistent cluster use the command <command>qpid-cluster -k</command>. This causes all brokers to synchronize their state and mark their stores as &#34;clean&#34; so they can be used when the cluster restarts.
+      </para>
+
+    </section>
+
+    <section id="sect-Messaging_User_Guide-Persistence_in_High_Availability_Message_Clusters-Starting_a_persistent_cluster_with_no_clean_store">
+      <title>Starting a persistent cluster with no clean store</title>
+      <para>
+	If the cluster has previously had a total failure and there are no clean stores then the brokers will fail to start with the log message <literal>Cannot recover, no clean store.</literal> If this happens you can start the cluster by marking one of the stores &#34;clean&#34; as follows:
+      </para>
+      <procedure>
+	<step>
+	  <para>
+	    Move the latest store backup into place in the broker&#39;s data-directory. The backups end in a 4-digit number; the latest backup has the highest number.
+	  </para>
+
+	  <screen>
+	    cd &#60;data-dir&#62;
+	    mv rhm rhm.bak
+	    cp -a _cluster.bak.&#60;nnnn&#62;/rhm .
+	  </screen>
+
+	</step>
+	<step>
+	  <para>
+	    Mark the store as clean:
+	    <screen>qpid-cluster-store -c &#60;data-dir&#62;</screen>
+
+	  </para>
+
+	</step>
+
+      </procedure>
+
+      <para>
+	Now you can start the cluster; all members will be initialized from the store you marked as clean.
+      </para>
+
+    </section>
+
+    <section id="sect-Messaging_User_Guide-Persistence_in_High_Availability_Message_Clusters-Isolated_failures_in_a_persistent_cluster">
+      <title>Isolated failures in a persistent cluster</title>
+      <para>
+	A broker in a persistent cluster may encounter errors that other brokers in the cluster do not; if this happens, the broker shuts itself down to avoid making the cluster state inconsistent. For example, a disk failure on one node will result in that node shutting down. Running out of storage capacity can also cause a node to shut down, because the brokers may not run out of storage at exactly the same point, even if they have similar storage configurations. To avoid unnecessary broker shutdowns, make sure the queue policy size of each durable queue is less than the capacity of the journal for the queue.
+      </para>
+
+    </section>
+
+
+  </section>
+
+
+</section>

Propchange: qpid/trunk/qpid/doc/book/src/Active-Active-Cluster.xml
------------------------------------------------------------------------------
    svn:eol-style = native

Propchange: qpid/trunk/qpid/doc/book/src/Active-Active-Cluster.xml
------------------------------------------------------------------------------
    svn:keywords = Rev Date

Propchange: qpid/trunk/qpid/doc/book/src/Active-Active-Cluster.xml
------------------------------------------------------------------------------
    svn:mime-type = text/xml

Added: qpid/trunk/qpid/doc/book/src/Active-Passive-Cluster.xml
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/doc/book/src/Active-Passive-Cluster.xml?rev=1297234&view=auto
==============================================================================
--- qpid/trunk/qpid/doc/book/src/Active-Passive-Cluster.xml (added)
+++ qpid/trunk/qpid/doc/book/src/Active-Passive-Cluster.xml Mon Mar  5 21:31:58 2012
@@ -0,0 +1,361 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!--
+
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+-->
+
+<section id="chap-Messaging_User_Guide-Active_Passive_Cluster">
+
+  <title>Active-passive Messaging Clusters (Preview)</title>
+
+  <section>
+    <title>Overview</title>
+    <para>
+      This release provides a preview of a new module for High Availability (HA). The new
+      module is not yet complete or ready for production use; it is being made available so
+      that users can experiment with the new approach and provide feedback early in the
+      development process.  Feedback should go to <ulink
+      url="mailto:user@qpid.apache.org">user@qpid.apache.org</ulink>.
+    </para>
+    <para>
+      The old cluster module takes an <firstterm>active-active</firstterm> approach,
+      i.e. all the brokers in a cluster are able to handle client requests
+      simultaneously. The new HA module takes an <firstterm>active-passive</firstterm>,
+      <firstterm>hot-standby</firstterm> approach.
+    </para>
+    <para>
+      In an active-passive cluster, only one broker, known as the
+      <firstterm>primary</firstterm>, is active and serving clients at a time. The other
+      brokers are standing by as <firstterm>backups</firstterm>. Changes on the primary
+      are immediately replicated to all the backups so they are always up-to-date or
+      "hot".  If the primary fails, one of the backups is promoted to be the new
+      primary. Clients fail-over to the new primary automatically. If there are multiple
+      backups, the backups also fail-over to become backups of the new primary.
+    </para>
+    <para>
+      The new approach depends on an external <firstterm>cluster resource
+      manager</firstterm> to detect failure of the primary and choose the new primary. The
+      first supported resource manager will be <ulink
+      url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink>, but it will
+      be possible to add integration with other resource managers in the future. The
+      preview version is not integrated with any resource manager; you can use the
+      <command>qpid-ha</command> tool to simulate the actions of a resource manager or do
+      your own integration.
+    </para>
+    <section>
+      <title>Why the new approach?</title>
+      The new active-passive approach has several advantages compared to the
+      existing active-active cluster module.
+      <itemizedlist>
+	<listitem>
+	  It does not depend directly on openais or corosync. It does not use multicast
+	  which simplifies deployment.
+	</listitem>
+	<listitem>
+	  It is more portable: in environments that don't support corosync, it can be
+	  integrated with a resource manager available in that environment.
+	</listitem>
+	<listitem>
+	  Replication to a <firstterm>disaster recovery</firstterm> site can be handled as
+	  simply another node in the cluster, it does not require a separate replication
+	  mechanism.
+	</listitem>
+	<listitem>
+	  It can take advantage of features provided by the resource manager, for example
+	  virtual IP addresses.
+	</listitem>
+	<listitem>
+	  Improved performance and scalability due to better use of multiple CPUs.
+	</listitem>
+      </itemizedlist>
+    </section>
+    <section>
+
+      <title>Limitations</title>
+
+      <para>
+	There are a number of known limitations in the current preview implementation. These
+	will be fixed in the production version.
+      </para>
+
+      <itemizedlist>
+	<listitem>
+	  Transactional changes to queue state are not replicated atomically. If the
+	  primary crashes during a transaction, it is possible that the backup could
+	  contain only part of the changes introduced by a transaction.
+	</listitem>
+	<listitem>
+	  During a fail-over one backup is promoted to primary and any other backups switch to
+	  the new primary. Messages sent to the new primary before all the backups have
+	  switched could be lost if the new primary itself fails before the switch-over is
+	  complete.
+	</listitem>
+	<listitem>
+	  When used with a persistent store: if the entire cluster fails, there are no tools
+	  to help identify the most recent store.
+	</listitem>
+	<listitem>
+	  Acknowledgments are confirmed to clients before the message has been dequeued
+	  from the replicas, or from the local store if that store is asynchronous.
+	</listitem>
+	<listitem>
+	  A persistent broker must have its store erased before joining an existing cluster.
+	  In the production version a persistent broker will be able to load its store and
+	  avoid downloading from the primary messages that are already in its store.
+	</listitem>
+	<listitem>
+	  Configuration changes (creating or deleting queues, exchanges and bindings) are
+	  replicated asynchronously. Management tools used to make changes will consider the
+	  change complete when it is complete on the primary; it may not yet be replicated
+	  to all the backups.
+	</listitem>
+	<listitem>
+	  Deletions made immediately after a failure (before all the backups are ready) may
+	  be lost on a backup. Queues, exchanges or bindings that were deleted on the primary could
+	  re-appear if that backup is promoted to primary on a subsequent failure.
+	</listitem>
+	<listitem>
+	  Better control is needed over which queues/exchanges are replicated and which are not.
+	</listitem>
+	<listitem>
+	  There are some known issues affecting performance, both the throughput of
+	  replication and the time taken for backups to fail-over. Performance will improve
+	  in the production version.
+	</listitem>
+	<listitem>
+	  Federated links from the primary will be lost in fail-over; they will not be
+	  re-connected on the new primary. Federation links to the primary can fail over.
+	</listitem>
+	<listitem>
+	  Only plain FIFO queues can be replicated. LVQ and ring queues are not yet supported.
+	</listitem>
+      </itemizedlist>
+    </section>
+  </section>
+
+
+  <section>
+    <title>Configuring the Brokers</title>
+    <para>
+      The broker must load the <filename>ha</filename> module; it is loaded by default
+      when you start a broker. The following broker options are available for the HA module.
+    </para>
+    <table frame="all" id="ha-broker-options">
+      <title>Options for High Availability Messaging Cluster</title>
+      <tgroup align="left" cols="2" colsep="1" rowsep="1">
+	<colspec colname="c1" colwidth="1*"/>
+	<colspec colname="c2" colwidth="4*"/>
+	<thead>
+	  <row>
+	    <entry align="center" nameend="c2" namest="c1">
+	      Options for High Availability Messaging Cluster
+	    </entry>
+	  </row>
+	</thead>
+	<tbody>
+	  <row>
+	    <entry>
+	      <command>--ha-cluster <replaceable>yes|no</replaceable></command>
+	    </entry>
+	    <entry>
+	      Set to "yes" to have the broker join a cluster.
+	    </entry>
+	  </row>
+	  <row>
+	    <entry>
+	      <command>--ha-brokers <replaceable>URL</replaceable></command>
+	    </entry>
+	    <entry>
+	      URL used by brokers to connect to each other. The URL lists the addresses of
+	      all the brokers in the cluster
+	      <footnote>
+		<para>
+		  If the resource manager supports virtual IP addresses then the URL can
+		  contain just the single virtual IP.
+		</para>
+	      </footnote>
+	      in the following form:
+	      <programlisting>
+		url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
+		addr = tcp_addr / rdma_addr / ssl_addr / ...
+		tcp_addr = ["tcp:"] host [":" port]
+		rdma_addr = "rdma:" host [":" port]
+		ssl_addr = "ssl:" host [":" port]
+	      </programlisting>
+	    </entry>
+	  </row>
+	  <row>
+	    <entry> <command>--ha-public-brokers <replaceable>URL</replaceable></command> </entry>
+	    <entry>
+	      URL used by clients to connect to the brokers in the same format as
+	      <command>--ha-brokers</command> above. Use this option if you want client
+	      traffic on a different network from broker replication traffic. If this
+	      option is not set, clients will use the same URL as brokers.
+	    </entry>
+	  </row>
+	  <row>
+	    <entry>
+	      <para><command>--ha-username <replaceable>USER</replaceable></command></para>
+	      <para><command>--ha-password <replaceable>PASS</replaceable></command></para>
+	      <para><command>--ha-mechanism <replaceable>MECH</replaceable></command></para>
+	    </entry>
+	    <entry>
+	      Brokers use <replaceable>USER</replaceable>,
+	      <replaceable>PASS</replaceable>, <replaceable>MECH</replaceable> to
+	      authenticate when connecting to each other.
+	    </entry>
+	  </row>
+	</tbody>
+      </tgroup>
+    </table>
+    <para>
+      To configure a cluster you must set at least <command>ha-cluster</command> and <command>ha-brokers</command>.
+    </para>
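+    <para>
+      For example, each broker in a three-node cluster might be started along these
+      lines, using the C++ broker daemon <command>qpidd</command> (a minimal sketch: the
+      host names <replaceable>node1</replaceable>, <replaceable>node2</replaceable> and
+      <replaceable>node3</replaceable> are placeholders, and any other broker options are
+      omitted):
+      <programlisting>
+	qpidd --ha-cluster yes --ha-brokers node1,node2,node3
+      </programlisting>
+    </para>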
+  </section>
+
+
+  <section>
+    <title>Creating replicated queues and exchanges</title>
+    <para>
+      To create a replicated queue or exchange, pass the argument
+      <command>qpid.replicate</command> when creating the queue or exchange. It should
+      have one of the following three values:
+      <itemizedlist>
+	<listitem>
+	  <firstterm>all</firstterm>: Replicate the queue or exchange, messages and bindings.
+	</listitem>
+	<listitem>
+	  <firstterm>configuration</firstterm>: Replicate the existence of the queue or
+	  exchange and bindings but don't replicate messages.
+	</listitem>
+	<listitem>
+	  <firstterm>none</firstterm>: Don't replicate; this is the default.
+	</listitem>
+      </itemizedlist>
+    </para>
+    Bindings are automatically replicated if the queue and exchange being bound both have a
+    replication argument of <command>all</command> or <command>configuration</command>; they are
+    not replicated otherwise.
+
+    You can create replicated queues and exchanges with the <command>qpid-config</command>
+    management tool like this:
+    <programlisting>
+      qpid-config add queue myqueue --replicate all
+    </programlisting>
+
+    To create replicated queues and exchanges via the client API, add a <command>node</command> entry to the address like this:
+    <programlisting>
+      "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
+    </programlisting>
+  </section>
+
+
+
+  <section>
+    <title>Client Fail-over</title>
+    <para>
+      Clients can only connect to the single primary broker. All other brokers in the
+      cluster are backups, and they automatically reject any attempt by a client to
+      connect.
+    </para>
+    <para>
+      Clients are configured with the addresses of all of the brokers in the cluster.
+      <footnote>
+	<para>
+	  If the resource manager supports virtual IP addresses then the clients
+	  can be configured with a single virtual IP address.
+	</para>
+      </footnote>
+      When the client tries to connect initially, it will try all of its addresses until it
+      successfully connects to the primary. If the primary fails, clients will try to
+      re-connect to all the known brokers until they find the new primary.
+    </para>
+    <para>
+      Suppose your cluster has 3 nodes: <command>node1</command>, <command>node2</command> and <command>node3</command> all using the default AMQP port.
+    </para>
+    <para>
+      With the C++ client, you specify all the cluster addresses in a single URL, for example:
+      <programlisting>
+	qpid::messaging::Connection c("node1,node2,node3");
+      </programlisting>
+    </para>
+    <para>
+      With the python client, you specify <command>reconnect=True</command> and a list of <replaceable>host:port</replaceable> addresses as <command>reconnect_urls</command> when calling <command>establish</command> or <command>open</command>, for example:
+      <programlisting>
+	connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"])
+      </programlisting>
+    </para>
+  </section>
+
+  <section>
+    <title>Broker fail-over</title>
+    <para>
+      Broker fail-over is managed by a <firstterm>cluster resource
+      manager</firstterm>. The initial preview version of HA is not integrated with a
+      resource manager; the production version will be integrated with <ulink
+      url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> and it may
+      be integrated with other resource managers in the future.
+    </para>
+    <para>
+      The resource manager is responsible for ensuring that exactly one broker is acting
+      as primary at all times. It selects the initial primary broker when the
+      cluster is started, detects failure of the primary, and chooses the backup to
+      promote as the new primary.
+    </para>
+    <para>
+      You can simulate the actions of a resource manager, or indeed do your own
+      integration with a resource manager using the <command>qpid-ha</command> tool.  The
+      command
+      <programlisting>
+	qpid-ha promote -b <replaceable>host</replaceable>:<replaceable>port</replaceable>
+      </programlisting>
+      will promote the broker listening on
+      <replaceable>host</replaceable>:<replaceable>port</replaceable> to be the primary.
+      You should only promote a broker to primary when there is no other primary in the
+      cluster. The brokers will not detect multiple primaries, they rely on the resource
+      cluster. The brokers will not detect multiple primaries; they rely on the resource
+    </para>
+    <para>
+      A clustered broker always starts initially in <firstterm>discovery</firstterm>
+      mode. It uses the addresses configured in the <command>ha-brokers</command>
+      configuration option and tries to connect to each in turn until it finds the
+      primary. The resource manager is responsible for choosing one of the backups to
+      promote as the initial primary.
+    </para>
+    <para>
+      If the primary fails, all the backups are disconnected and return to discovery mode.
+      The resource manager chooses one to promote as the new primary. The other backups
+      will eventually discover the new primary and reconnect.
+    </para>
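+    <para>
+      As a sketch of how you might check this by hand (the host name below is a
+      placeholder), the <command>ready</command> and <command>query</command> commands of
+      the <command>qpid-ha</command> tool can be run against an individual broker:
+      <programlisting>
+	qpid-ha ready -b node2:5672
+	qpid-ha query -b node2:5672
+      </programlisting>
+      The first command tests whether the backup on <replaceable>node2</replaceable> is
+      ready; the second prints its HA configuration settings.
+    </para>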
+  </section>
+  <section>
+    <title>Broker Administration</title>
+    <para>
+      You can connect to a backup broker with the administrative tool
+      <command>qpid-ha</command>. You can also connect with the tools
+      <command>qpid-config</command>, <command>qpid-route</command> and
+      <command>qpid-stat</command> if you pass the flag <command>--ha-admin</command> on the
+      command line.  If you do connect to a backup, you should not modify any of the
+      replicated queues, as this will disrupt the replication and may result in
+      message loss.
+    </para>
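+    <para>
+      For example, assuming a backup broker on the placeholder host
+      <replaceable>node2</replaceable>, you could list its queues with a command along
+      these lines (the exact flag placement is a sketch, not a definitive invocation):
+      <programlisting>
+	qpid-config --broker=node2 --ha-admin queues
+      </programlisting>
+    </para>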
+  </section>
+</section>
+<!-- LocalWords:  scalability rgmanager multicast RGManager mailto LVQ
+-->

Propchange: qpid/trunk/qpid/doc/book/src/Active-Passive-Cluster.xml
------------------------------------------------------------------------------
    svn:eol-style = native

Propchange: qpid/trunk/qpid/doc/book/src/Active-Passive-Cluster.xml
------------------------------------------------------------------------------
    svn:keywords = Rev Date

Propchange: qpid/trunk/qpid/doc/book/src/Active-Passive-Cluster.xml
------------------------------------------------------------------------------
    svn:mime-type = text/xml

Added: qpid/trunk/qpid/doc/book/src/HA-Queue-Replication.xml
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/doc/book/src/HA-Queue-Replication.xml?rev=1297234&view=auto
==============================================================================
--- qpid/trunk/qpid/doc/book/src/HA-Queue-Replication.xml (added)
+++ qpid/trunk/qpid/doc/book/src/HA-Queue-Replication.xml Mon Mar  5 21:31:58 2012
@@ -0,0 +1,54 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!--
+
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+-->
+
+<section>
+  <title>Queue Replication with the HA module</title>
+  <para>
+    As well as support for an active-passive cluster, the <filename>ha</filename> module
+    also allows you to replicate individual queues. The <firstterm>original</firstterm>
+    queue is used as normal.  The <firstterm>replica</firstterm> queue is updated
+    automatically as messages are added to or removed from the original queue.
+  </para>
+  <para>
+    To create a replica you need the HA module to be loaded on both the original and replica
+    brokers. Note that it is not safe to modify the replica queue other than via the
+    automatic updates from the original. Adding or removing messages on the replica queue
+    will make replication inconsistent and may cause message loss. The HA module does
+    <emphasis>not</emphasis> enforce restricted access to the replica queue (as it does in
+    the case of a cluster) so it is up to the application to ensure the replica is not used
+    until it has been disconnected from the original.
+  </para>
+  <para>
+    Suppose that <command>myqueue</command> is a queue on <command>node1</command> and
+    we want to create a replica of <command>myqueue</command> on <command>node2</command>
+    (where both brokers are using the default AMQP port). This is accomplished by the command:
+    <programlisting>
+      qpid-config --broker=node2 add queue --start-replica node1 myqueue
+    </programlisting>
+  </para>
+  <para>
+    If <command>myqueue</command> already exists on the replica broker, you can start replication from the original queue like this:
+    <programlisting>
+      qpid-ha replicate -b node2 node1 myqueue
+    </programlisting>
+  </para>
+</section>

Propchange: qpid/trunk/qpid/doc/book/src/HA-Queue-Replication.xml
------------------------------------------------------------------------------
    svn:eol-style = native

Propchange: qpid/trunk/qpid/doc/book/src/HA-Queue-Replication.xml
------------------------------------------------------------------------------
    svn:keywords = Rev Date

Propchange: qpid/trunk/qpid/doc/book/src/HA-Queue-Replication.xml
------------------------------------------------------------------------------
    svn:mime-type = text/xml

Modified: qpid/trunk/qpid/doc/book/src/Managing-CPP-Broker.xml
URL: http://svn.apache.org/viewvc/qpid/trunk/qpid/doc/book/src/Managing-CPP-Broker.xml?rev=1297234&r1=1297233&r2=1297234&view=diff
==============================================================================
--- qpid/trunk/qpid/doc/book/src/Managing-CPP-Broker.xml (original)
+++ qpid/trunk/qpid/doc/book/src/Managing-CPP-Broker.xml Mon Mar  5 21:31:58 2012
@@ -1,6 +1,6 @@
 <?xml version="1.0" encoding="utf-8"?>
 <!--
- 
+
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
@@ -8,16 +8,16 @@
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at
- 
+
    http://www.apache.org/licenses/LICENSE-2.0
- 
+
  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
- 
+
 -->
 
 <section id="section-Managing-CPP-Broker">
@@ -38,6 +38,8 @@
             </para></listitem>
             <listitem><para>qpid-printevents - used to receive and print QMF events
             </para></listitem>
+            <listitem><para>qpid-ha - used to interact with the High Availability module
+            </para></listitem>
           </itemizedlist>
 
 	  <section role="h3" id="MgmtC-2B-2B-Usingqpidconfig"><title>
@@ -119,10 +121,10 @@ Total Exchanges: 6
 $ qpid-config queues
 Queue Name                                  Attributes
 =================================================================
-pub_start                                  
-pub_done                                   
-sub_ready                                  
-sub_done                                   
+pub_start
+pub_done
+sub_ready
+sub_done
 perftest0                                   --durable
 reply-dhcp-100-18-254.bos.redhat.com.20713  auto-del excl
 topic-dhcp-100-18-254.bos.redhat.com.20713  auto-del excl
@@ -459,4 +461,25 @@ Options:
             You get the idea... have fun!
           </para>
 <!--h3--></section>
+<section>
+  <title>Using qpid-ha</title>
+  <para>This utility lets you monitor and control the clustering behavior provided by the HA module.
+  </para>
+  <programlisting>
+    <![CDATA[
+qpid-ha --help
+usage: qpid-ha <command> [<arguments>]
+
+Commands are:
+
+  ready        Test if a backup broker is ready.
+  query        Print HA configuration settings.
+  set          Set HA configuration settings.
+  promote      Promote broker from backup to primary.
+  replicate    Set up replication from <queue> on <remote-broker> to <queue> on the current broker.
+
+For help with a command type: qpid-ha <command> --help
+]]>
+  </programlisting>
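+  <para>
+    For example, to print the HA configuration settings of a broker (the address below is
+    a placeholder):
+  </para>
+  <programlisting>
+    qpid-ha query -b localhost:5672
+  </programlisting>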
+</section>
 </section>



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@qpid.apache.org
For additional commands, e-mail: commits-help@qpid.apache.org