Posted to dev@reef.apache.org by "Andrew Chung (JIRA)" <ji...@apache.org> on 2015/08/17 20:45:46 UTC

[jira] [Commented] (REEF-345) Complete implementation for YARN AM HA

    [ https://issues.apache.org/jira/browse/REEF-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700006#comment-14700006 ] 

Andrew Chung commented on REEF-345:
-----------------------------------

I've hit an issue while writing an end-to-end restart test. Currently, if our driver fails with an exception, we actively close the outstanding evaluators. This defeats YARN's option to keep containers across application attempts: even if YARN restarts the application and the driver restart handler is called, all the evaluators will already be dead by then. [~markus.weimer], what do you think we should do here? Should we have an option to leave evaluators running on unexpected driver failure?
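
To make the question concrete, here is a rough sketch of what such an option could look like. It is only an illustration: `DriverShutdownSketch`, `ExitCause`, `Evaluator`, and the `preserveOnFailure` flag are hypothetical names, not existing REEF classes or configuration, and the real change would also need YARN's keep-containers setting enabled at submission time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the driver's shutdown decision; not REEF's actual API. */
final class DriverShutdownSketch {

  /** Why the driver is going down. */
  enum ExitCause { NORMAL_COMPLETION, UNEXPECTED_FAILURE }

  /** Stand-in for an evaluator running in a YARN container. */
  static final class Evaluator {
    final String id;
    boolean running = true;

    Evaluator(final String id) {
      this.id = id;
    }

    void close() {
      running = false;
    }
  }

  /**
   * Closes all evaluators, unless the driver failed unexpectedly and the
   * (assumed) preserveOnFailure option is set. In that case the evaluators
   * are left running so a restarted driver can pick them up again.
   *
   * @return the evaluators that were left running
   */
  static List<Evaluator> shutdown(final List<Evaluator> evaluators,
                                  final ExitCause cause,
                                  final boolean preserveOnFailure) {
    final boolean keep = preserveOnFailure && cause == ExitCause.UNEXPECTED_FAILURE;
    final List<Evaluator> survivors = new ArrayList<>();
    for (final Evaluator evaluator : evaluators) {
      if (keep) {
        survivors.add(evaluator);
      } else {
        evaluator.close();
      }
    }
    return survivors;
  }
}
```

With the flag off, this matches today's behavior (everything is closed on failure). With it on, only an unexpected failure leaves evaluators alive; a normal completion still closes them.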

> Complete implementation for YARN AM HA
> --------------------------------------
>
>                 Key: REEF-345
>                 URL: https://issues.apache.org/jira/browse/REEF-345
>             Project: REEF
>          Issue Type: New Feature
>          Components: REEF Driver, REEF.NET Driver
>         Environment: YARN, HDInsight
>            Reporter: Andrew Chung
>            Assignee: Andrew Chung
>
> The implementation logic for AM HA on YARN is incomplete.


