Posted to dev@storm.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/09/30 03:14:04 UTC

[jira] [Updated] (STORM-909) Automatic Black Listing of bad nodes

     [ https://issues.apache.org/jira/browse/STORM-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Kellogg updated STORM-909:
-------------------------------
    Component/s: storm-core

> Automatic Black Listing of bad nodes
> ------------------------------------
>
>                 Key: STORM-909
>                 URL: https://issues.apache.org/jira/browse/STORM-909
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: Robert Joseph Evans
>            Assignee: Chuanlei Ni
>
> We should be able to detect and monitor the failure rate of workers on nodes, and derive a few different probabilities from it.  How likely is it that this worker will fail on this particular node in the next n minutes?  How likely is it that all workers will fail on this particular node in the next n minutes?  How likely is it that this worker will fail on any node in the next n minutes?
> With these we should be able to detect bad nodes and blacklist them, and ideally trigger external systems that can take action to try to fix the nodes.  We should also be able to detect topologies that have bugs and, in the common case, warn about them, and in the worst case, stop trying to run them.
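
One way to make the probabilities above concrete is sketched below. This is only an illustration under stated assumptions, not anything from Storm itself: the FailureTracker class, its method names, and the blacklist_candidates helper are all hypothetical, and worker failures are assumed to follow a Poisson process, so that the chance of at least one failure in the next n minutes is 1 - exp(-rate * n). The node-level figure here pools failures from every worker on the node, which is a simplification of "all workers will fail on this node".

```python
import math


class FailureTracker:
    """Hypothetical helper (not a Storm API): keeps recent worker failures
    and estimates failure probabilities from the observed failure rate."""

    def __init__(self, window_minutes):
        # Length of the sliding observation window, in minutes.
        self.window_minutes = window_minutes
        # Each entry is (timestamp_in_minutes, worker_id, node_id).
        self.failures = []

    def record_failure(self, t, worker, node):
        self.failures.append((t, worker, node))

    def _rate(self, now, worker, node):
        """Failures per minute inside the window, filtered by worker and/or node.
        Passing None for worker or node means 'any'."""
        start = now - self.window_minutes
        count = sum(
            1
            for (t, w, n) in self.failures
            if start < t <= now
            and (worker is None or w == worker)
            and (node is None or n == node)
        )
        return count / self.window_minutes

    def p_fail(self, now, horizon, worker=None, node=None):
        """Probability of at least one matching failure in the next `horizon`
        minutes, ASSUMING failures arrive as a Poisson process."""
        rate = self._rate(now, worker, node)
        return 1.0 - math.exp(-rate * horizon)


def blacklist_candidates(tracker, now, horizon, nodes, threshold):
    """Nodes whose pooled failure probability meets the (assumed) threshold."""
    return [n for n in nodes if tracker.p_fail(now, horizon, node=n) >= threshold]
```

Calling p_fail with both worker and node gives the first probability in the description, with only node gives a pooled node-level estimate, and with only worker gives the estimate for that worker on any node. The threshold and window length are tuning knobs that would need to be chosen from real cluster data.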



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)