Posted to dev@reef.apache.org by "Markus Weimer (JIRA)" <ji...@apache.org> on 2015/08/05 17:27:04 UTC

[jira] [Commented] (REEF-560) Add a configurable timeout for driver to recover evaluators on restart

    [ https://issues.apache.org/jira/browse/REEF-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658369#comment-14658369 ] 

Markus Weimer commented on REEF-560:
------------------------------------

{quote}
There should be a configurable timeout such that we allow the driver to be idle for the time being and wait for active tasks to report back to it.
{quote}

Don't we know how many events we need to generate, and can't we therefore decide when we are done? The last event fired to the user before the clock can go idle is the restart completion event, right? If the app does nothing in response to it, it seems OK to let the driver exit.
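
To make the counting idea concrete, here is a minimal sketch, assuming the driver knows the IDs of the evaluators it expects from the previous run. The class and method names are hypothetical and are not part of the REEF API; the point is only that restart can be declared complete once every expected evaluator has either reported back or been declared failed.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker for illustration; not a REEF class. */
final class RestartCompletionTracker {
  // Evaluators from the previous run that have not been accounted for yet.
  private final Set<String> pending;

  RestartCompletionTracker(final Set<String> expectedEvaluatorIds) {
    this.pending = new HashSet<>(expectedEvaluatorIds);
  }

  /** An expected evaluator reported back to the restarted driver. */
  synchronized void onEvaluatorRecovered(final String evaluatorId) {
    pending.remove(evaluatorId);
  }

  /** Recovery of an expected evaluator failed (a failed-evaluator event would fire here). */
  synchronized void onEvaluatorFailed(final String evaluatorId) {
    pending.remove(evaluatorId);
  }

  /** True once one event has been generated for every expected evaluator. */
  synchronized boolean isRestartComplete() {
    return pending.isEmpty();
  }
}
```

Under this assumption the restart completion event could be fired the first time isRestartComplete() returns true, and the idle check would only apply after that point.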

> Add a configurable timeout for driver to recover evaluators on restart
> ----------------------------------------------------------------------
>
>                 Key: REEF-560
>                 URL: https://issues.apache.org/jira/browse/REEF-560
>             Project: REEF
>          Issue Type: Sub-task
>          Components: REEF Driver
>            Reporter: Andrew Chung
>
> Currently on restart, if we fail to recover an evaluator, we generate a FailedEvaluatorEvent. If the user does not request new evaluators on restart and no active evaluator has reported back to the driver before we fire the FailedEvaluatorEvent, the clock will detect that it is idle and terminate the AM. There should be a configurable timeout such that we allow the driver to be idle for the time being and wait for active tasks to report back to it.
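
The timeout proposed in the description could look roughly like the sketch below. It is an illustration under stated assumptions, not REEF code: RestartIdleGuard, its constructor parameters, and mayReportIdle() are hypothetical names, and the clock is injected as a LongSupplier only so that the behavior can be exercised without waiting. The guard keeps the driver from being reported as idle until either all expected evaluators are accounted for or the configured timeout has elapsed.

```java
import java.util.function.LongSupplier;

/** Hypothetical idle guard for illustration; not a REEF class. */
final class RestartIdleGuard {
  private final long restartStartMillis;
  private final long timeoutMillis;
  private final LongSupplier clock;
  private int pendingEvaluators;

  RestartIdleGuard(final int expectedEvaluators,
                   final long timeoutMillis,
                   final LongSupplier clock) {
    this.pendingEvaluators = expectedEvaluators;
    this.timeoutMillis = timeoutMillis;
    this.clock = clock;
    this.restartStartMillis = clock.getAsLong();
  }

  /** Call once per expected evaluator that reported back or was declared failed. */
  synchronized void onEvaluatorAccountedFor() {
    if (pendingEvaluators > 0) {
      pendingEvaluators--;
    }
  }

  /** The driver may be treated as idle only after recovery finished or timed out. */
  synchronized boolean mayReportIdle() {
    final boolean timedOut = clock.getAsLong() - restartStartMillis >= timeoutMillis;
    return pendingEvaluators == 0 || timedOut;
  }
}
```

In production use the clock argument would typically be System::currentTimeMillis, and the timeout value would come from whatever configuration mechanism the driver already uses.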


