Posted to dev@storm.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/10/09 02:33:27 UTC
[jira] [Updated] (STORM-37) Auto-deactivate topologies that are continuously erroring
[ https://issues.apache.org/jira/browse/STORM-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rick Kellogg updated STORM-37:
------------------------------
Component/s: storm-core
> Auto-deactivate topologies that are continuously erroring
> ---------------------------------------------------------
>
> Key: STORM-37
> URL: https://issues.apache.org/jira/browse/STORM-37
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-core
> Reporter: James Xu
> Priority: Minor
>
> There is a bad interaction among the isolation scheduler, the way Mesos makes resource offers (in storm-mesos), and continuously erroring topologies. The effect is that no non-isolated topologies can run: the isolation scheduler must kill non-isolated topologies to free up resources for isolated topologies in the next scheduling iteration, and it keeps doing so because the isolated topology always errors.
> A nice fix for this would be for Nimbus to automatically deactivate topologies that are continuously erroring. It should count worker failures over the last Y minutes and put the topology into the "DEACTIVATED_ERRORED" state if the count exceeds a threshold X.
> This would also benefit non-Mesos clusters by avoiding the cost of continuous JVM startups from erroring topologies.
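
The "X worker failures in the last Y minutes" check described above could be sketched as a sliding-window counter. This is only an illustration in Python, not Storm or Nimbus code: the class name `WorkerFailureWindow`, its methods, and the threshold values are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```python
from collections import deque


class WorkerFailureWindow:
    """Hypothetical per-topology counter of worker failures in a sliding window."""

    def __init__(self, max_failures, window_secs):
        self.max_failures = max_failures  # "X" in the issue description
        self.window_secs = window_secs    # "Y minutes", expressed in seconds
        self._failures = deque()          # failure timestamps, oldest first

    def record_failure(self, now):
        """Record one worker failure observed at time `now` (seconds)."""
        self._failures.append(now)

    def should_deactivate(self, now):
        """Return True if more than max_failures occurred within the window."""
        cutoff = now - self.window_secs
        # Drop failures that have aged out of the window.
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()
        return len(self._failures) > self.max_failures


# Example: tolerate at most 3 failures per 10 minutes (assumed values).
window = WorkerFailureWindow(max_failures=3, window_secs=600)
for t in (0, 100, 200, 300):
    window.record_failure(t)
if window.should_deactivate(now=300):
    print("topology would move to DEACTIVATED_ERRORED")
```

In a real scheduler the state transition itself, and where the failure events come from, would depend on Nimbus internals that this sketch does not model.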
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)