Posted to dev@storm.apache.org by "Derek Dagit (JIRA)" <ji...@apache.org> on 2014/12/10 17:58:12 UTC

[jira] [Commented] (STORM-589) Suboptimal default worker hb timeouts for nimbus & supervisor

    [ https://issues.apache.org/jira/browse/STORM-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241377#comment-14241377 ] 

Derek Dagit commented on STORM-589:
-----------------------------------

There may be a reason I have not thought of why the current defaults are good rather than suboptimal.  Please comment if that is the case.

> Suboptimal default worker hb timeouts for nimbus & supervisor
> -------------------------------------------------------------
>
>                 Key: STORM-589
>                 URL: https://issues.apache.org/jira/browse/STORM-589
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.9.2-incubating
>            Reporter: Derek Dagit
>            Priority: Minor
>
> Both worker heartbeat timeouts for nimbus and supervisor are set to 30 seconds by default:
> https://github.com/apache/storm/blob/3bbdc166bda7fb1a39b6906eda40da9bc83d5d4c/conf/defaults.yaml#L58
> https://github.com/apache/storm/blob/3bbdc166bda7fb1a39b6906eda40da9bc83d5d4c/conf/defaults.yaml#L118
> This means that the moment at which a worker dies, relative to its most recent heartbeats, determines whether the supervisor relaunches it or nimbus reassigns it.
> If the supervisor heartbeat is found to have timed out first, the worker is relaunched.  If the nimbus heartbeat is found to have timed out first, the worker is rescheduled.
> We may want the nimbus timeout to be larger than the supervisor timeout, to give the supervisor a chance to relaunch the worker before nimbus reassigns it.
> As always, users administering clusters are encouraged to set these as needed.
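
The race described in the issue can be illustrated with a small sketch. It is a simplified model, not Storm code: the function name and its parameters are hypothetical, it assumes the worker writes one heartbeat that the supervisor reads and a separate one that nimbus reads, and it ignores how often each daemon actually polls. The two timeouts are assumed to correspond to the settings linked in the issue (believed to be `supervisor.worker.timeout.secs` and `nimbus.task.timeout.secs`).

```python
def first_to_time_out(last_supervisor_hb, last_nimbus_hb,
                      supervisor_timeout=30, nimbus_timeout=30):
    """Return which daemon sees a dead worker's heartbeat expire first.

    Hypothetical model: times are in seconds, and each argument is the
    time of the last heartbeat the worker wrote before it died.
    """
    # Each daemon considers the worker dead once its heartbeat is older
    # than that daemon's timeout.
    supervisor_expiry = last_supervisor_hb + supervisor_timeout
    nimbus_expiry = last_nimbus_hb + nimbus_timeout

    if supervisor_expiry < nimbus_expiry:
        return "supervisor relaunches"
    if nimbus_expiry < supervisor_expiry:
        return "nimbus reassigns"
    return "tie"


# Equal timeouts: the outcome depends only on which heartbeat is older.
print(first_to_time_out(last_supervisor_hb=10, last_nimbus_hb=8))
print(first_to_time_out(last_supervisor_hb=8, last_nimbus_hb=10))

# A larger nimbus timeout lets the supervisor act first in both cases.
print(first_to_time_out(10, 8, nimbus_timeout=60))
print(first_to_time_out(8, 10, nimbus_timeout=60))
```

Under this model, any nimbus timeout that exceeds the supervisor timeout by more than the largest possible gap between the two heartbeats would always give the supervisor the first chance to relaunch. The value 60 above is only an example, not a recommendation from the issue.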


