Posted to dev@storm.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2015/06/24 20:53:04 UTC

[jira] [Created] (STORM-909) Automatic Black Listing of bad nodes

Robert Joseph Evans created STORM-909:
-----------------------------------------

             Summary: Automatic Black Listing of bad nodes
                 Key: STORM-909
                 URL: https://issues.apache.org/jira/browse/STORM-909
             Project: Apache Storm
          Issue Type: Improvement
            Reporter: Robert Joseph Evans


We should be able to detect and monitor the failure rate of workers on nodes, and from it come up with a few different probabilities.  How likely is it that this worker will fail on this particular node in the next n minutes?  How likely is it that all workers will fail on this particular node in the next n minutes?  How likely is it that this worker will fail on any node in the next n minutes?
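
One way the three probabilities could be estimated is sketched below. This is only an illustration, not a proposed Storm design: the FailureTracker class and its method names are hypothetical, failures are assumed to arrive as a Poisson process whose rate is the failure count over a sliding window, and workers on a node are assumed to fail independently. That last assumption is crude, since a bad node tends to produce correlated failures.

```python
import math
from collections import defaultdict


class FailureTracker:
    """Hypothetical helper (not a Storm API) that turns observed worker
    failures into rough failure probabilities for the next n minutes."""

    def __init__(self, window_minutes):
        # Length of the sliding window used to estimate the failure rate.
        self.window_minutes = window_minutes
        # (worker, node) -> list of failure timestamps, in minutes.
        self.failures = defaultdict(list)

    def record_failure(self, worker, node, t_min):
        self.failures[(worker, node)].append(t_min)

    def _count(self, now_min, worker=None, node=None):
        # Count failures inside the window, optionally filtered by
        # worker and/or node.
        start = now_min - self.window_minutes
        total = 0
        for (w, nd), times in self.failures.items():
            if worker is not None and w != worker:
                continue
            if node is not None and nd != node:
                continue
            total += sum(1 for t in times if start < t <= now_min)
        return total

    def _prob(self, count, n_min):
        # Poisson assumption: P(at least one failure in n minutes)
        # = 1 - exp(-rate * n), with rate in failures per minute.
        rate = count / self.window_minutes
        return 1.0 - math.exp(-rate * n_min)

    def p_worker_on_node(self, worker, node, now_min, n_min):
        """This worker fails on this particular node."""
        return self._prob(self._count(now_min, worker, node), n_min)

    def p_worker_on_any_node(self, worker, now_min, n_min):
        """This worker fails on any node."""
        return self._prob(self._count(now_min, worker=worker), n_min)

    def p_all_workers_on_node(self, workers, node, now_min, n_min):
        """All listed workers fail on this node (independence assumed)."""
        if not workers:
            return 0.0
        p = 1.0
        for w in workers:
            p *= self.p_worker_on_node(w, node, now_min, n_min)
        return p
```

For example, six failures of one worker on one node within a 60-minute window give an estimated rate of 0.1 failures per minute, so the estimated chance of at least one failure in the next 10 minutes is 1 - exp(-1).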

With these we should be able to detect bad nodes and blacklist them, and ideally trigger external systems that can take action to try to fix the nodes.  We should also be able to detect topologies that have bugs: in the common case warn about them, and in the worst case stop trying to run them.
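
A minimal sketch of how such decisions might be made from already-estimated probabilities follows. The function names and every threshold value are hypothetical placeholders chosen for illustration; the issue does not specify any of them.

```python
def nodes_to_blacklist(node_failure_prob, blacklist_threshold=0.9):
    """Return the nodes whose estimated probability of failing all
    workers meets the (hypothetical) blacklist threshold.

    node_failure_prob maps a node name to a probability in [0, 1].
    """
    return sorted(
        node
        for node, p in node_failure_prob.items()
        if p >= blacklist_threshold
    )


def topology_action(p_fail_on_any_node, warn_threshold=0.5, stop_threshold=0.95):
    """Decide what to do with a topology whose workers are likely to
    fail regardless of node, which suggests a bug in the topology
    rather than a bad node. Thresholds are illustrative assumptions.
    """
    if p_fail_on_any_node >= stop_threshold:
        return "stop"   # worst case: stop trying to run it
    if p_fail_on_any_node >= warn_threshold:
        return "warn"   # common case: issue a warning
    return "ok"


# Example with made-up probabilities.
print(nodes_to_blacklist({"nodeA": 0.97, "nodeB": 0.10}))
print(topology_action(0.6))
```

Keeping the node decision separate from the topology decision mirrors the distinction drawn above: failures concentrated on one node point at the node, while failures that follow a worker across nodes point at the topology.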


