Posted to yarn-issues@hadoop.apache.org by "Jian He (JIRA)" <ji...@apache.org> on 2014/01/03 02:55:50 UTC

[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits

    [ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861093#comment-13861093 ] 

Jian He commented on YARN-1490:
-------------------------------

- Create a field in AppSubmissionContext to indicate whether to clean up the containers on AM failure.
- When a new attempt recovers the failed attempt’s scheduler info, copy over the data structures (liveContainers etc.) inside SchedulerApplicationAttempt.
- Similarly, when a new attempt recovers the failed RMAppAttempt’s info, copy over the needed data structures (finished containers etc.) inside RMAppAttempt.
- The failed attempt is changed to keep receiving container events and recording finished containers; the new attempt is created with references to the previous attempt’s objects.
- The appAttempt data structures inside the schedulers are removed; SchedulerApplication.getCurrentAppAttempt is the only way to retrieve the current attempt.
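
The points above can be sketched roughly as follows. This is only an illustration of the idea, not the code in the patch: every class here (SketchSubmissionContext, SketchAttempt, SketchApplication) and the keepContainersOnAmFailure field are hypothetical stand-ins, and containers are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the submission context; the field name is an assumption.
class SketchSubmissionContext {
    final boolean keepContainersOnAmFailure;

    SketchSubmissionContext(boolean keepContainersOnAmFailure) {
        this.keepContainersOnAmFailure = keepContainersOnAmFailure;
    }
}

// Hypothetical stand-in for an attempt's container bookkeeping.
class SketchAttempt {
    final int attemptId;
    final Map<String, String> liveContainers;   // container id -> node
    final List<String> finishedContainers;

    SketchAttempt(int attemptId) {
        this(attemptId, new HashMap<String, String>(), new ArrayList<String>());
    }

    // Shares the given collections by reference instead of copying them.
    SketchAttempt(int attemptId, Map<String, String> live, List<String> finished) {
        this.attemptId = attemptId;
        this.liveContainers = live;
        this.finishedContainers = finished;
    }

    // A container event: move the container from live to finished.
    void containerFinished(String containerId) {
        if (liveContainers.remove(containerId) != null) {
            finishedContainers.add(containerId);
        }
    }
}

// Hypothetical stand-in for the per-application scheduler record.
class SketchApplication {
    private final SketchSubmissionContext context;
    private SketchAttempt currentAttempt;
    private int nextAttemptId = 1;

    SketchApplication(SketchSubmissionContext context) {
        this.context = context;
        this.currentAttempt = new SketchAttempt(nextAttemptId++);
    }

    // Single access point for the current attempt.
    SketchAttempt getCurrentAppAttempt() {
        return currentAttempt;
    }

    // Called when the AM exits; returns the failed attempt.
    SketchAttempt onAmFailure() {
        SketchAttempt failed = currentAttempt;
        if (context.keepContainersOnAmFailure) {
            // The new attempt references the previous attempt's objects.
            currentAttempt = new SketchAttempt(
                nextAttemptId++, failed.liveContainers, failed.finishedContainers);
        } else {
            // Stands in for killing all containers of the failed attempt.
            failed.liveContainers.clear();
            currentAttempt = new SketchAttempt(nextAttemptId++);
        }
        return failed;
    }
}
```

Because the collections are shared by reference in this sketch, a container event delivered to the failed attempt is also visible through the current attempt, which is the behaviour the fourth point describes.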

> RM should optionally not kill all containers when an ApplicationMaster exits
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1490
>                 URL: https://issues.apache.org/jira/browse/YARN-1490
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Jian He
>         Attachments: YARN-1490.1.patch
>
>
> This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers, some may not want to. This should be an option.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)