Posted to dev@storm.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/10/09 02:33:27 UTC

[jira] [Updated] (STORM-37) Auto-deactivate topologies that are continuously erroring

     [ https://issues.apache.org/jira/browse/STORM-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Kellogg updated STORM-37:
------------------------------
    Component/s: storm-core

> Auto-deactivate topologies that are continuously erroring
> ---------------------------------------------------------
>
>                 Key: STORM-37
>                 URL: https://issues.apache.org/jira/browse/STORM-37
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: James Xu
>            Priority: Minor
>
> There exists a bad interaction between the isolation scheduler, how Mesos does resource offers (in storm-mesos), and continuously erroring topologies. The effect is that no non-isolated topologies can run because the isolation scheduler needs to kill non-isolated topologies to free up resources for isolated topologies in the next scheduling iteration, and continuously does so because the isolated topology always errors.
> A nice fix for this would be for Nimbus to automatically deactivate topologies that are continuously erroring. It should count worker failures over the last Y minutes and put the topology into a "DEACTIVATED_ERRORED" state if the count reaches a threshold X.
> This would also be good for non-Mesos clusters in order to avoid the cost of continuous JVM startups from erroring topologies.
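
The proposal above amounts to a sliding-window failure counter per topology. A minimal sketch of that idea follows, assuming hypothetical names (`WorkerFailureTracker`, `next_state`, the "ACTIVE" state string) that are not part of Storm or Nimbus, and assuming "too many" means reaching the threshold X within a window of Y minutes (given here in seconds). It illustrates only the counting logic, not how Nimbus would detect worker failures.

```python
from collections import deque


class WorkerFailureTracker:
    """Sliding-window count of worker failures for one topology (hypothetical helper)."""

    def __init__(self, max_failures, window_secs):
        self.max_failures = max_failures  # "X" in the issue description
        self.window_secs = window_secs    # "Y minutes", expressed in seconds
        self._failures = deque()          # failure timestamps, oldest first

    def record_failure(self, now):
        """Record one worker failure observed at time `now` (seconds)."""
        self._failures.append(now)
        self._evict(now)

    def should_deactivate(self, now):
        """True if at least `max_failures` failures fall inside the window."""
        self._evict(now)
        return len(self._failures) >= self.max_failures

    def _evict(self, now):
        # Drop timestamps that have aged out of the window.
        cutoff = now - self.window_secs
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()


def next_state(current_state, tracker, now):
    """Return the topology state after checking the failure window.

    Only "DEACTIVATED_ERRORED" comes from the issue text; "ACTIVE" is an
    assumed name for the normal running state.
    """
    if current_state == "ACTIVE" and tracker.should_deactivate(now):
        return "DEACTIVATED_ERRORED"
    return current_state


if __name__ == "__main__":
    tracker = WorkerFailureTracker(max_failures=3, window_secs=600)
    for t in (0, 100, 200):
        tracker.record_failure(t)
    print(next_state("ACTIVE", tracker, now=200))
```

Keeping the timestamps in a deque makes eviction cheap, and a topology whose failures stop accumulating falls back under the threshold on its own once old entries age out.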



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)