Posted to commits@storm.apache.org by kn...@apache.org on 2015/11/23 22:08:08 UTC

[29/37] storm git commit: Adding a bit to docs.

Adding a bit to docs.


Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/387232c6
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/387232c6
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/387232c6

Branch: refs/heads/master
Commit: 387232c68ae88c317a1607af20a0ad2a21ee62cf
Parents: ee5265d
Author: Kyle Nusbaum <Ky...@gmail.com>
Authored: Thu Nov 19 11:00:47 2015 -0600
Committer: Kyle Nusbaum <Ky...@gmail.com>
Committed: Thu Nov 19 11:00:47 2015 -0600

----------------------------------------------------------------------
 docs/documentation/Pacemaker.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/387232c6/docs/documentation/Pacemaker.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Pacemaker.md b/docs/documentation/Pacemaker.md
index e877541..f82f23d 100644
--- a/docs/documentation/Pacemaker.md
+++ b/docs/documentation/Pacemaker.md
@@ -87,3 +87,22 @@ PacemakerServer {
 
  - The client's user principal in the `PacemakerClient` section on the Nimbus host must match the `nimbus.daemon.user` storm cluster config value.
 - The client's `serviceName` value must match the server's user principal in the `PacemakerServer` section on the Pacemaker host, as in the sketch below.
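+
+To make both rules concrete, here is a hypothetical pair of Kerberos JAAS sections that would satisfy them, assuming `nimbus.daemon.user` is set to `nimbus`. The keytab paths, realm, and principal names are placeholders, not defaults: the client's `serviceName` matches the server's principal, and the client's principal matches `nimbus.daemon.user`.
+
+```
+PacemakerClient {
+    com.sun.security.auth.module.Krb5LoginModule required
+    useKeyTab=true
+    keyTab="/etc/keytabs/nimbus.keytab"
+    storeKey=true
+    useTicketCache=false
+    serviceName="pacemaker"
+    principal="nimbus@MY.COMPANY.COM";
+};
+
+PacemakerServer {
+    com.sun.security.auth.module.Krb5LoginModule required
+    useKeyTab=true
+    keyTab="/etc/keytabs/pacemaker.keytab"
+    storeKey=true
+    useTicketCache=false
+    principal="pacemaker@MY.COMPANY.COM";
+};
+```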
+
+
+### Fault Tolerance
+
+Pacemaker currently runs as a single daemon instance, which makes it a potential single point of failure.
+
+If Nimbus loses its connection to Pacemaker, whether through a crash or a network partition, the workers will continue to run and Nimbus will repeatedly attempt to reconnect. Nimbus functionality will be disrupted, but the topologies themselves will continue to run.
+If the cluster is partitioned with Nimbus and Pacemaker on one side and some workers on the other, those workers will be unable to heartbeat and Nimbus will reschedule their tasks elsewhere. This is probably the desired behavior anyway.
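+
+The reconnect behavior described above amounts to a retry loop around the heartbeat write. Here is a purely illustrative Java sketch of that behavior, not Storm's actual client code; `HeartbeatClient`, its methods, and the intervals are all made-up names and assumed values:
+
+```java
+import java.io.IOException;
+
+// Hypothetical interface standing in for the heartbeat client.
+interface HeartbeatClient {
+    void sendHeartbeat(byte[] hb) throws IOException;
+    void reconnect();
+}
+
+class HeartbeatLoop {
+    static final long HEARTBEAT_INTERVAL_MS = 10_000; // assumed interval
+    static final long RETRY_INTERVAL_MS = 1_000;      // assumed backoff
+
+    // The worker keeps running and keeps retrying; only Nimbus-visible
+    // heartbeat state is disrupted while Pacemaker is unreachable.
+    static void run(HeartbeatClient client, byte[] hb) throws InterruptedException {
+        while (true) {
+            try {
+                client.sendHeartbeat(hb);        // normal case
+            } catch (IOException e) {
+                Thread.sleep(RETRY_INTERVAL_MS); // Pacemaker down or partitioned
+                client.reconnect();              // repeatedly attempt to reconnect
+                continue;
+            }
+            Thread.sleep(HEARTBEAT_INTERVAL_MS);
+        }
+    }
+}
+```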
+
+
+### ZooKeeper Comparison
+Compared to ZooKeeper, Pacemaker uses less CPU, less memory, and, of course, no disk for the same load, because it avoids the overhead of maintaining consistency between nodes.
+On gigabit networking, the theoretical limit is about 6000 nodes; the practical limit is more likely around 2000-3000 nodes. Neither limit has been tested yet.
+On a 270-supervisor cluster, fully scheduled with topologies, Pacemaker used about 70% of one core and nearly 1 GiB of RAM on a machine with four `Intel(R) Xeon(R) CPU E5530 @ 2.40GHz` processors and 24 GiB of RAM.
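+
+As a sanity check on the gigabit figure: if each node pushed on the order of 20 KB/s of heartbeat traffic (an assumed number, purely for illustration), a 1 Gbit/s link (~125 MB/s) would saturate at roughly 125,000 KB/s / 20 KB/s ≈ 6,250 nodes, the same order of magnitude as the theoretical limit above.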
+
+
+There is an easy route to HA for Pacemaker: because heartbeats require no consistency between nodes, Pacemaker should be able to scale horizontally without overhead, whereas adding nodes to a ZooKeeper ensemble yields diminishing returns.
+
+In short, a single Pacemaker node should be able to handle many times the load that a ZooKeeper cluster can, and future HA work allowing horizontal scaling will increase that even further.
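+
+To illustrate why horizontal scaling should carry no coordination overhead: since heartbeat data needs no consistency between nodes, clients could simply shard heartbeats across servers deterministically. A purely hypothetical Java sketch, not the planned HA design; all names are made up:
+
+```java
+import java.util.List;
+
+// Hypothetical sharding helper: pick one Pacemaker server per heartbeat
+// path. Any deterministic assignment spreads load evenly with no
+// coordination between servers, because no server needs the others' data.
+class PacemakerShard {
+    static String serverFor(String heartbeatPath, List<String> servers) {
+        int idx = Math.floorMod(heartbeatPath.hashCode(), servers.size());
+        return servers.get(idx);
+    }
+}
+```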