Posted to dev@storm.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/09/29 05:05:04 UTC

[jira] [Updated] (STORM-1072) Nimbus gives incomplete cluster data to scheduler (hides dead worker slots)

     [ https://issues.apache.org/jira/browse/STORM-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Kellogg updated STORM-1072:
--------------------------------
    Component/s: storm-core

> Nimbus gives incomplete cluster data to scheduler (hides dead worker slots)
> ---------------------------------------------------------------------------
>
>                 Key: STORM-1072
>                 URL: https://issues.apache.org/jira/browse/STORM-1072
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.11.0
>            Reporter: Derek Dagit
>
> 1. Describe observed behavior.
> Slots that have been assigned, but whose workers have not yet sent a heartbeat, are treated as "dead" slots and are left out of the cluster summary data that is passed to the scheduler.
> Relevant nimbus code: https://github.com/apache/storm/blob/8dd9e6e213210009968f39483cb69f271b2e8415/storm-core/src/clj/backtype/storm/daemon/nimbus.clj#L527
> For topologies whose payload is very large, this can produce scheduler results that never quite converge, because some of the slots are missing on each call to schedule().
> 2. What is the expected behavior?
> Nimbus may be too smart here: it seems better to give the full cluster information to the scheduler and let the scheduler make the appropriate decision about how to handle workers that are not yet up.
> 3. Outline the steps to reproduce the problem.
> Either launch a topology with a very large jar file that takes minutes to download, or simulate one by adding a sleep to the supervisor code just after the jar is downloaded. This causes a significant delay before the worker is up and sending heartbeats. On each scheduling run during that delay, such slots are not even presented to the scheduler logic.
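
The difference between the observed and the expected behavior can be illustrated with a small toy model. This is only a sketch and is not Storm code: the names Slot, scheduler_view and hide_silent are hypothetical, and the real logic lives in nimbus.clj at the link above. It assumes that a slot is hidden exactly when it is assigned and its worker has not heartbeated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """Hypothetical stand-in for a worker slot (supervisor node + port)."""
    node: str
    port: int


def scheduler_view(all_slots, assigned, heartbeating, hide_silent):
    """Return the slots a scheduler would see on one scheduling run.

    hide_silent=True models the reported behavior: assigned slots whose
    worker has not yet heartbeated are dropped from the view.
    hide_silent=False models the proposed behavior: every slot is passed
    through, and the scheduler decides how to treat silent workers.
    """
    if not hide_silent:
        return list(all_slots)
    return [s for s in all_slots if s not in assigned or s in heartbeating]


slots = [Slot("sup1", 6700), Slot("sup1", 6701), Slot("sup2", 6700)]
assigned = {slots[0], slots[1]}
heartbeating = {slots[0]}  # the worker on sup1:6701 is still downloading the jar

filtered = scheduler_view(slots, assigned, heartbeating, hide_silent=True)
full = scheduler_view(slots, assigned, heartbeating, hide_silent=False)

print("filtered view:", filtered)
print("full view:    ", full)
```

In the filtered view, sup1:6701 is absent while its worker is still starting, so a scheduler working from that view cannot tell that the slot exists or that work has already been assigned to it. In the full view, all three slots are present and the decision is left to the scheduler.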



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)