Posted to user@storm.apache.org by "Hart, James W." <jw...@seic.com> on 2016/07/15 13:36:46 UTC

General question about how many topologies can run in a smallish cluster.

I have a 5-VM cluster of 16 GB, 8-core machines, and 3 of the machines are worker nodes.  Can anyone give input on how many topologies can or should be run on a cluster this size?  We are currently running 40 topologies in this dev cluster and are having a lot of stability and topology startup issues.  A few of these topologies run bursty workloads, but mostly they are doing nothing.
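
For a rough sense of scale, here is a back-of-the-envelope sketch of the arithmetic behind the question. It assumes every topology asks for 3 workers and that workers spread evenly over the 3 worker nodes; the 2 GB reserved for the OS and the supervisor is a guess, and the class and method names are made up for illustration.

```java
public class ClusterSizing {
    // Worker JVMs each worker node would host if workers spread evenly (rounded up).
    public static int workersPerNode(int topologies, int workersPerTopology, int workerNodes) {
        int totalWorkers = topologies * workersPerTopology;
        return (totalWorkers + workerNodes - 1) / workerNodes;
    }

    // RAM in MB left for each worker JVM after reserving some for the OS and supervisor.
    public static long memoryPerWorkerMb(long nodeRamMb, long reservedMb, int workersOnNode) {
        return (nodeRamMb - reservedMb) / workersOnNode;
    }

    public static void main(String[] args) {
        // 40 topologies, 3 workers each (assumed), 3 worker nodes.
        int perNode = workersPerNode(40, 3, 3);
        // 16 GB per node, 2 GB reserved (a guess, not a measured value).
        long mbEach = memoryPerWorkerMb(16 * 1024, 2 * 1024, perNode);
        System.out.println(perNode + " worker JVMs per node, about " + mbEach + " MB each");
    }
}
```

Under those assumptions each node would host 40 worker JVMs on 8 cores, with roughly 358 MB of RAM apiece, which is why I am asking for a sanity check.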

I'm looking for a sanity check, as we are having severe stability issues (nimbus crashing, supervisors crashing, topologies failing to start up).  Our topologies are failing to start up because the 3 workers launch with too big a time lag between them (~3 minutes), and by the time the 2nd and 3rd start up, the first has given up making netty connections to the others.  Once our topologies fail to connect, they give up.

We have tuned the retry params that the worker instances use to connect so that they retry more slowly, and now they are connecting, but they still hang after connecting.  By hanging I mean the PIDs are still alive, but the logs stop and zero Kafka messages flow into even the 1 good worker topology that is still logging.  We are wondering if the retries are filling up stdout/stderr and if that is causing the thread to block.
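
To make the retry tuning concrete, here is a minimal sketch of the kind of check involved: does the retry window outlast the ~3 minute gap between worker launches? The keys are the netty retry settings as I understand them from Storm's storm.yaml; the values are illustrative only, not our actual settings or Storm's defaults. The estimate assumes each sleep between attempts is capped at max_wait_ms, which may not match Storm's real backoff exactly, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NettyRetryWindow {
    // Illustrative values only; not our real settings and not Storm's defaults.
    public static Map<String, Object> exampleConf() {
        Map<String, Object> conf = new LinkedHashMap<>();
        conf.put("storm.messaging.netty.max_retries", 300);
        conf.put("storm.messaging.netty.min_wait_ms", 100);
        conf.put("storm.messaging.netty.max_wait_ms", 1000);
        return conf;
    }

    // Upper bound on total sleep time, assuming every sleep is capped at max_wait_ms.
    public static long maxSleepWindowMs(Map<String, Object> conf) {
        long retries = ((Number) conf.get("storm.messaging.netty.max_retries")).longValue();
        long maxWaitMs = ((Number) conf.get("storm.messaging.netty.max_wait_ms")).longValue();
        return retries * maxWaitMs;
    }

    // True if the retry window could outlast the given worker launch lag.
    public static boolean covers(Map<String, Object> conf, long launchLagMs) {
        return maxSleepWindowMs(conf) >= launchLagMs;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = exampleConf();
        long lagMs = 3 * 60 * 1000; // the ~3 minute lag between worker launches
        System.out.println("window=" + maxSleepWindowMs(conf) + " ms, covers lag: " + covers(conf, lagMs));
    }
}
```

This is an upper bound only: with exponential backoff the early sleeps are shorter than max_wait_ms, so the real window is smaller than the product suggests.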

Any input and help would be appreciated.