Posted to user@storm.apache.org by Sheldon White <sh...@comcast.net> on 2014/11/19 17:49:17 UTC

Topology periodically stalls until manually restarted

We've recently implemented a Storm topology that pulls data from one DB, performs some simple transformations and populates another DB. The spout and bolt code is all pretty simple and doesn't retain any data (other than a DB connection pool). This topology is running through about 400 million DB records.
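
For context, the wiring is roughly as follows; the spout/bolt class names and the parallelism/worker counts below are simplified placeholders for our real classes, which just page rows out of the source DB and write transformed rows into the target DB:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class DbCopyTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Spout pages through the source DB and emits one tuple per row.
        builder.setSpout("source-db-spout", new SourceDbSpout(), 4);

        // Bolt applies the simple transformations and writes each row to the
        // target DB; its only state is a connection pool built in prepare().
        builder.setBolt("transform-write-bolt", new TransformWriteBolt(), 8)
               .shuffleGrouping("source-db-spout");

        Config conf = new Config();
        conf.setNumWorkers(8);

        // Topology name is passed on the command line when we submit.
        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    }
}
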
We have a cluster of 8 VMs running Storm (0.9.3) and a single VM running ZooKeeper (3.4.5). The ZooKeeper node never shows much load.
We are periodically seeing behavior where ZooKeeper and the Storm topology seem to "lose track" of each other. The supervisor.log files show a loop of process restarts:

2014-11-17 00:54:58 b.s.util [INFO] Error when trying to kill 5109. Process is probably already dead.
2014-11-17 00:54:59 b.s.util [INFO] Error when trying to kill 5109. Process is probably already dead.
2014-11-17 00:54:59 b.s.d.supervisor [INFO] Shut down de55f9bf-d630-4e11-bc8c-9bbaa9899212:d6d231cc-2f2c-4bad-924e-570c6357d97d
2014-11-17 00:54:59 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "swhite-test-2014-11-14-13-58-2
1-3-1416002313", :executors ([5 5] [37 37] [13 13] [21 21] [29 29])} for this supervisor de55f9bf-d630-4e11-bc8c-9bbaa9899212 on port 6702 with id 59b3b5d9-bd9c-4da7-82
e4-38172f7dc232
2014-11-17 00:54:59 b.s.d.supervisor [INFO] Launching worker with command: 'java' '-server' '-Xmx2048m' '-XX:+UseConcMarkSweepGC' '-XX:+UseParNewGC' '-XX:+UseConcMarkSw
eepGC' '-XX:NewSize=128m' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:-CMSConcurrentMTEnabled' '-Djava.net.preferIPv4Stack=true' '-Djava.library.path=/opt/storm-local/
supervisor/stormdist/swhite-test-2014-11-14-13-58-21-3-1416002313/resources/Linux-amd64:/opt/storm-local/supervisor/stormdist/swhite-test-2014-11-14-13-58-21-3-14160023
13/resources:/usr/local/lib:/opt/local/lib:/usr/lib' '-Dlogfile.name=worker-6702.log' '-Dstorm.home=/opt/apache-storm-0.9.3-rc1' '-Dstorm.log.dir=/opt/apache-storm-0.9.
3-rc1/logs' '-Dlogback.configurationFile=/opt/apache-storm-0.9.3-rc1/logback/cluster.xml' '-Dstorm.id=swhite-test-2014-11-14-13-58-21-3-1416002313' '-Dworker.id=59b3b5d
9-bd9c-4da7-82e4-38172f7dc232' '-Dworker.port=6702' '-cp' '/opt/apache-storm-0.9.3-rc1/lib/math.numeric-tower-0.0.1.jar:/opt/apache-storm-0.9.3-rc1/lib/slf4j-api-1.7.5.
jar:/opt/apache-storm-0.9.3-rc1/lib/ring-devel-0.3.11.jar:/opt/apache-storm-0.9.3-rc1/lib/disruptor-3.2.1.jar:/opt/apache-storm-0.9.3-rc1/lib/jetty-6.1.26.jar:/opt/apac
he-storm-0.9.3-rc1/lib/logback-core-1.0.13.jar:/opt/apache-storm-0.9.3-rc1/lib/ring-jetty-adapter-0.3.11.jar:/opt/apache-storm-0.9.3-rc1/lib/carbonite-1.4.0.jar:/opt/ap
ache-storm-0.9.3-rc1/lib/compojure-1.1.3.jar:/opt/apache-storm-0.9.3-rc1/lib/log4j-over-slf4j-1.6.6.jar:/opt/apache-storm-0.9.3-rc1/lib/chill-java-0.3.5.jar:/opt/apache
-storm-0.9.3-rc1/lib/logback-classic-1.0.13.jar:/opt/apache-storm-0.9.3-rc1/lib/minlog-1.2.jar:/opt/apache-storm-0.9.3-rc1/lib/kryo-2.21.jar:/opt/apache-storm-0.9.3-rc1
/lib/tools.macro-0.1.0.jar:/opt/apache-storm-0.9.3-rc1/lib/commons-exec-1.1.jar:/opt/apache-storm-0.9.3-rc1/lib/commons-io-2.4.jar:/opt/apache-storm-0.9.3-rc1/lib/cloju
re-1.5.1.jar:/opt/apache-storm-0.9.3-rc1/lib/ring-servlet-0.3.11.jar:/opt/apache-storm-0.9.3-rc1/lib/storm-core-0.9.3-rc1.jar:/opt/apache-storm-0.9.3-rc1/lib/core.incub
ator-0.1.0.jar:/opt/apache-storm-0.9.3-rc1/lib/asm-4.0.jar:/opt/apache-storm-0.9.3-rc1/lib/jetty-util-6.1.26.jar:/opt/apache-storm-0.9.3-rc1/lib/commons-lang-2.5.jar:/o
pt/apache-storm-0.9.3-rc1/lib/commons-logging-1.1.3.jar:/opt/apache-storm-0.9.3-rc1/lib/servlet-api-2.5.jar:/opt/apache-storm-0.9.3-rc1/lib/clj-time-0.4.1.jar:/opt/apac
he-storm-0.9.3-rc1/lib/objenesis-1.2.jar:/opt/apache-storm-0.9.3-rc1/lib/json-simple-1.1.jar:/opt/apache-storm-0.9.3-rc1/lib/jgrapht-core-0.9.0.jar:/opt/apache-storm-0.
9.3-rc1/lib/commons-fileupload-1.2.1.jar:/opt/apache-storm-0.9.3-rc1/lib/ring-core-1.1.5.jar:/opt/apache-storm-0.9.3-rc1/lib/snakeyaml-1.11.jar:/opt/apache-storm-0.9.3-
rc1/lib/commons-codec-1.6.jar:/opt/apache-storm-0.9.3-rc1/lib/joda-time-2.0.jar:/opt/apache-storm-0.9.3-rc1/lib/clout-1.0.1.jar:/opt/apache-storm-0.9.3-rc1/lib/jline-2.
11.jar:/opt/apache-storm-0.9.3-rc1/lib/hiccup-0.3.6.jar:/opt/apache-storm-0.9.3-rc1/lib/tools.cli-0.2.4.jar:/opt/apache-storm-0.9.3-rc1/lib/reflectasm-1.07-shaded.jar:/
opt/apache-storm-0.9.3-rc1/lib/tools.logging-0.2.3.jar:/opt/apache-storm-0.9.3-rc1/lib/clj-stacktrace-0.2.2.jar:/opt/apache-storm-0.9.3-rc1/conf:/opt/storm-local/superv
isor/stormdist/swhite-test-2014-11-14-13-58-21-3-1416002313/stormjar.jar' 'backtype.storm.daemon.worker' 'swhite-test-2014-11-14-13-58-21-3-1416002313' 'de55f9bf-d630-4
e11-bc8c-9bbaa9899212' '6702' '59b3b5d9-bd9c-4da7-82e4-38172f7dc232'
2014-11-17 00:54:59 b.s.d.supervisor [INFO] 59b3b5d9-bd9c-4da7-82e4-38172f7dc232 still hasn't started
2014-11-17 00:54:59 b.s.d.supervisor [INFO] 59b3b5d9-bd9c-4da7-82e4-38172f7dc232 still hasn't started
2014-11-17 00:55:00 b.s.d.supervisor [INFO] 59b3b5d9-bd9c-4da7-82e4-38172f7dc232 still hasn't started
2014-11-17 00:55:00 b.s.d.supervisor [INFO] 59b3b5d9-bd9c-4da7-82e4-38172f7dc232 still hasn't started
2014-11-17 00:55:01 b.s.d.supervisor [INFO] 59b3b5d9-bd9c-4da7-82e4-38172f7dc232 still hasn't started
2014-11-17 00:55:01 b.s.d.supervisor [INFO] 59b3b5d9-bd9c-4da7-82e4-38172f7dc232 still hasn't started
2014-11-17 00:55:02 b.s.d.supervisor [INFO] Shutting down and clearing state for id 59b3b5d9-bd9c-4da7-82e4-38172f7dc232. Current supervisor time: 1416185702. State: :d
isallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1416185701, :storm-id "swhite-test-2014-11-14-13-58-21-3-1416002313", :executors #{[-1 -1
]}, :port 6702}
2014-11-17 00:55:02 b.s.d.supervisor [INFO] Shutting down de55f9bf-d630-4e11-bc8c-9bbaa9899212:59b3b5d9-bd9c-4da7-82e4-38172f7dc232
2014-11-17 00:55:03 b.s.util [INFO] Error when trying to kill 5174. Process is probably already dead.
...

Also seeing this in the worker logs when things are stalled out:
...
/opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:20 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [220]
/opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:21 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [221]
/opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:22 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [222]
/opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:23 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [223]
/opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:24 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [224]
/opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:24 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [225]
/opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:25 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [226]
/opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:26 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [227]
/opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:27 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [228]
...

We've tried increasing the storm.zookeeper timeouts to 5 minutes:

storm.zookeeper.session.timeout = 300000
storm.zookeeper.connection.timeout = 300000

and this allows the cluster to run much more stably, with good topology performance. But it still occasionally seems to stall out, and a manual rebalancing request is required to get it rolling again. All VMs have fixed DNS entries and there don't seem to be any network issues at all.
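
For reference, in storm.yaml syntax the change is just:

storm.zookeeper.session.timeout: 300000
storm.zookeeper.connection.timeout: 300000
# supervisor.worker.timeout.secs and nimbus.task.timeout.secs are the related
# worker-heartbeat timeouts; they are not part of this change.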

This behavior is pretty much identical to something reported in August: http://mail-archives.apache.org/mod_mbox/storm-user/201408.mbox/%3CFF975F8738DEE84AABF1BF95A4248E6B6A8F5393%40mvexe1.integral.com%3E
but that thread never really got resolved.

Question 1: Is there any reason to think we'd need additional ZooKeeper servers? The one server we're using doesn't show any significant load.
Question 2: Is it normal to manually rebalance a running topology? It would probably work, but it seems like something's not right.
Question 3: What other clues should I be looking for?

Thanks in advance

-sheldon white-
sheldonwhite@comcast.net

Re: Topology periodically stalls until manually restarted

Posted by William Oberman <ob...@civicscience.com>.
I don't know what caused your initial failure (for me it's a bolt that wraps PHP, and the PHP OOMs periodically), but I also had Netty get jammed up trying to recover from the failure in 0.9.3-rc1, with similar logging.
For me, patching with https://github.com/apache/storm/pull/268 (STORM-329) fixed it.

will


> We've recently implemented a Storm topology that pulls data from one DB,
> performs some simple transformations and populates another DB. The spout
> and bolt code is all pretty simple and doesn't retain any data (other than
> a DB connection pool). This topology is running through about 400 million
> DB records.
> We have a cluster of 8 VMs running storm (0.9.3) and a single VM running
> zookeeper (3.4.5). The zookeeper node never shows much load.
> We are periodically seeing behavior where the zookeeper and the storm
> topology seem to "lose track" of each other. The supervisor.log files shows
> a loop of process restarts:
>
> 2014-11-17 00:54:58 b.s.util [INFO] Error when trying to kill 5109.
> Process is probably already dead.
> 2014-11-17 00:54:59 b.s.util [INFO] Error when trying to kill 5109.
> Process is probably already dead.
> 2014-11-17 00:54:59 b.s.d.supervisor [INFO] Shut down
> de55f9bf-d630-4e11-bc8c-9bbaa9899212:d6d231cc-2f2c-4bad-924e-570c6357d97d
> 2014-11-17 00:54:59 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "swhite-test-2014-11-14-13-58-2
> 1-3-1416002313", :executors ([5 5] [37 37] [13 13] [21 21] [29 29])} for
> this supervisor de55f9bf-d630-4e11-bc8c-9bbaa9899212 on port 6702 with id
> 59b3b5d9-bd9c-4da7-82
> e4-38172f7dc232
> 2014-11-17 00:54:59 b.s.d.supervisor [INFO] Launching worker with command:
> 'java' '-server' '-Xmx2048m' '-XX:+UseConcMarkSweepGC' '-XX:+UseParNewGC'
> '-XX:+UseConcMarkSw
> eepGC' '-XX:NewSize=128m' '-XX:CMSInitiatingOccupancyFraction=70'
> '-XX:-CMSConcurrentMTEnabled' '-Djava.net.preferIPv4Stack=true'
> '-Djava.library.path=/opt/storm-local/
>
> supervisor/stormdist/swhite-test-2014-11-14-13-58-21-3-1416002313/resources/Linux-amd64:/opt/storm-local/supervisor/stormdist/swhite-test-2014-11-14-13-58-21-3-14160023
> 13/resources:/usr/local/lib:/opt/local/lib:/usr/lib'
> '-Dlogfile.name=worker-6702.log' '-Dstorm.home=/opt/apache-storm-0.9.3-rc1'
> '-Dstorm.log.dir=/opt/apache-storm-0.9.
> 3-rc1/logs'
> '-Dlogback.configurationFile=/opt/apache-storm-0.9.3-rc1/logback/cluster.xml'
> '-Dstorm.id=swhite-test-2014-11-14-13-58-21-3-1416002313'
> '-Dworker.id=59b3b5d
> 9-bd9c-4da7-82e4-38172f7dc232' '-Dworker.port=6702' '-cp'
> '/opt/apache-storm-0.9.3-rc1/lib/math.numeric-tower-0.0.1.jar:/opt/apache-storm-0.9.3-rc1/lib/slf4j-api-1.7.5.
>
> jar:/opt/apache-storm-0.9.3-rc1/lib/ring-devel-0.3.11.jar:/opt/apache-storm-0.9.3-rc1/lib/disruptor-3.2.1.jar:/opt/apache-storm-0.9.3-rc1/lib/jetty-6.1.26.jar:/opt/apac
>
> he-storm-0.9.3-rc1/lib/logback-core-1.0.13.jar:/opt/apache-storm-0.9.3-rc1/lib/ring-jetty-adapter-0.3.11.jar:/opt/apache-storm-0.9.3-rc1/lib/carbonite-1.4.0.jar:/opt/ap
>
> ache-storm-0.9.3-rc1/lib/compojure-1.1.3.jar:/opt/apache-storm-0.9.3-rc1/lib/log4j-over-slf4j-1.6.6.jar:/opt/apache-storm-0.9.3-rc1/lib/chill-java-0.3.5.jar:/opt/apache
>
> -storm-0.9.3-rc1/lib/logback-classic-1.0.13.jar:/opt/apache-storm-0.9.3-rc1/lib/minlog-1.2.jar:/opt/apache-storm-0.9.3-rc1/lib/kryo-2.21.jar:/opt/apache-storm-0.9.3-rc1
>
> /lib/tools.macro-0.1.0.jar:/opt/apache-storm-0.9.3-rc1/lib/commons-exec-1.1.jar:/opt/apache-storm-0.9.3-rc1/lib/commons-io-2.4.jar:/opt/apache-storm-0.9.3-rc1/lib/cloju
>
> re-1.5.1.jar:/opt/apache-storm-0.9.3-rc1/lib/ring-servlet-0.3.11.jar:/opt/apache-storm-0.9.3-rc1/lib/storm-core-0.9.3-rc1.jar:/opt/apache-storm-0.9.3-rc1/lib/core.incub
>
> ator-0.1.0.jar:/opt/apache-storm-0.9.3-rc1/lib/asm-4.0.jar:/opt/apache-storm-0.9.3-rc1/lib/jetty-util-6.1.26.jar:/opt/apache-storm-0.9.3-rc1/lib/commons-lang-2.5.jar:/o
>
> pt/apache-storm-0.9.3-rc1/lib/commons-logging-1.1.3.jar:/opt/apache-storm-0.9.3-rc1/lib/servlet-api-2.5.jar:/opt/apache-storm-0.9.3-rc1/lib/clj-time-0.4.1.jar:/opt/apac
>
> he-storm-0.9.3-rc1/lib/objenesis-1.2.jar:/opt/apache-storm-0.9.3-rc1/lib/json-simple-1.1.jar:/opt/apache-storm-0.9.3-rc1/lib/jgrapht-core-0.9.0.jar:/opt/apache-storm-0.
>
> 9.3-rc1/lib/commons-fileupload-1.2.1.jar:/opt/apache-storm-0.9.3-rc1/lib/ring-core-1.1.5.jar:/opt/apache-storm-0.9.3-rc1/lib/snakeyaml-1.11.jar:/opt/apache-storm-0.9.3-
>
> rc1/lib/commons-codec-1.6.jar:/opt/apache-storm-0.9.3-rc1/lib/joda-time-2.0.jar:/opt/apache-storm-0.9.3-rc1/lib/clout-1.0.1.jar:/opt/apache-storm-0.9.3-rc1/lib/jline-2.
>
> 11.jar:/opt/apache-storm-0.9.3-rc1/lib/hiccup-0.3.6.jar:/opt/apache-storm-0.9.3-rc1/lib/tools.cli-0.2.4.jar:/opt/apache-storm-0.9.3-rc1/lib/reflectasm-1.07-shaded.jar:/
>
> opt/apache-storm-0.9.3-rc1/lib/tools.logging-0.2.3.jar:/opt/apache-storm-0.9.3-rc1/lib/clj-stacktrace-0.2.2.jar:/opt/apache-storm-0.9.3-rc1/conf:/opt/storm-local/superv
> isor/stormdist/swhite-test-2014-11-14-13-58-21-3-1416002313/stormjar.jar'
> 'backtype.storm.daemon.worker'
> 'swhite-test-2014-11-14-13-58-21-3-1416002313' 'de55f9bf-d630-4
> e11-bc8c-9bbaa9899212' '6702' '59b3b5d9-bd9c-4da7-82e4-38172f7dc232'
> 2014-11-17 00:54:59 b.s.d.supervisor [INFO]
> 59b3b5d9-bd9c-4da7-82e4-38172f7dc232 still hasn't started
> 2014-11-17 00:54:59 b.s.d.supervisor [INFO]
> 59b3b5d9-bd9c-4da7-82e4-38172f7dc232 still hasn't started
> 2014-11-17 00:55:00 b.s.d.supervisor [INFO]
> 59b3b5d9-bd9c-4da7-82e4-38172f7dc232 still hasn't started
> 2014-11-17 00:55:00 b.s.d.supervisor [INFO]
> 59b3b5d9-bd9c-4da7-82e4-38172f7dc232 still hasn't started
> 2014-11-17 00:55:01 b.s.d.supervisor [INFO]
> 59b3b5d9-bd9c-4da7-82e4-38172f7dc232 still hasn't started
> 2014-11-17 00:55:01 b.s.d.supervisor [INFO]
> 59b3b5d9-bd9c-4da7-82e4-38172f7dc232 still hasn't started
> 2014-11-17 00:55:02 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id 59b3b5d9-bd9c-4da7-82e4-38172f7dc232. Current supervisor time:
> 1416185702. State: :d
> isallowed, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1416185701,
> :storm-id "swhite-test-2014-11-14-13-58-21-3-1416002313", :executors #{[-1
> -1
> ]}, :port 6702}
> 2014-11-17 00:55:02 b.s.d.supervisor [INFO] Shutting down
> de55f9bf-d630-4e11-bc8c-9bbaa9899212:59b3b5d9-bd9c-4da7-82e4-38172f7dc232
> 2014-11-17 00:55:03 b.s.util [INFO] Error when trying to kill 5174.
> Process is probably already dead.
> ...
>
> Also seeing this in the worker logs when things are stalled out:
> ...
> /opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:20
> b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [220]
> /opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:21
> b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [221]
> /opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:22
> b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [222]
> /opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:23
> b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [223]
> /opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:24
> b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [224]
> /opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:24
> b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [225]
> /opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:25
> b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [226]
> /opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:26
> b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [227]
> /opt/apache-storm-0.9.3-rc1/logs/worker-6703.log.9:2014-11-17 00:49:27
> b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-storm-ap08.dev.g2/10.10.122.17:6703... [228]
> ...
>
> We've tried increasing the storm.zookeeper timeouts to 5 minutes:
>
> storm.zookeeper.session.timeout = 300000
> storm.zookeeper.connection.timeout = 300000
>
> and this allows the cluster to run much more stably, with good topology
> performance. But it still occasionally seems to stall out, and a manual
> rebalancing request is required to get it rolling. All VMs have fixed DNS
> entries and there doesn't seem to be any network issues at all.
>
> This behavior is pretty much identical to something reported in August:
> http://mail-archives.apache.org/mod_mbox/storm-user/201408.mbox/%3CFF975F8738DEE84AABF1BF95A4248E6B6A8F5393%40mvexe1.integral.com%3E
> but this thread didn't really get resolved.
>
> Question 1: Is there any reason to think we'd need additional zookeeper
> servers? The one server we're using doen't show any significant load.
> Question 2: Is it normal to manually rebalance a running topology? It
> would probably work but seems like something's not right.
> Question 3: What other clues should I be looking for?
>
> Thanks in advance
>
> -sheldon white-
> sheldonwhite@comcast.net
>



--