Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2014/04/15 17:32:15 UTC

[jira] [Updated] (YARN-1354) Recover applications upon nodemanager restart

     [ https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-1354:
-----------------------------

    Attachment: YARN-1354-v1.patch

This patch persists applications to a leveldb state store when recovery is enabled.  It also addresses YARN-1355, because application ACLs are persisted as part of the application details.
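
To illustrate the idea of persisting applications only when recovery is enabled, here is a minimal sketch.  It is not taken from YARN-1354-v1.patch: the class name, the method names, the "apps/" key layout, and the tab-separated value encoding are all hypothetical, and a plain sorted map stands in for the leveldb database so the example stays self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the actual patch.
class AppStateStoreSketch {
    // Assumed key layout: one entry per application under a common prefix.
    static final String KEY_PREFIX = "apps/";

    // Stand-in for the leveldb database. It is passed in so the same
    // "on-disk" state can be handed to a new store after a simulated restart.
    private final Map<String, String> db;
    private final boolean recoveryEnabled;

    AppStateStoreSketch(Map<String, String> db, boolean recoveryEnabled) {
        this.db = db;
        this.recoveryEnabled = recoveryEnabled;
    }

    // Persist the app details. The ACLs are stored in the same record,
    // which is why persisting the app also covers the ACLs.
    void storeApplication(String appId, String user, String acls) {
        if (recoveryEnabled) {
            db.put(KEY_PREFIX + appId, user + "\t" + acls);
        }
    }

    // Forget an application once it has finished on this node.
    void removeApplication(String appId) {
        if (recoveryEnabled) {
            db.remove(KEY_PREFIX + appId);
        }
    }

    // After a restart, rebuild the active applications as appId -> {user, acls}.
    Map<String, String[]> recoverApplications() {
        Map<String, String[]> apps = new LinkedHashMap<>();
        if (!recoveryEnabled) {
            return apps;
        }
        for (Map.Entry<String, String> e : db.entrySet()) {
            if (e.getKey().startsWith(KEY_PREFIX)) {
                String appId = e.getKey().substring(KEY_PREFIX.length());
                apps.put(appId, e.getValue().split("\t", 2));
            }
        }
        return apps;
    }
}
```

In this sketch a restart is simulated by constructing a second store over the same map and calling recoverApplications(); with recovery disabled, nothing is written and nothing is recovered.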

The review for MAPREDUCE-5652 noted a potential issue: application completion events can be lost if they arrive while the NM is going down.  One way to mitigate that would be for the NM to send its list of active applications to the RM when it registers.  The RM could then tell the NM about any finished applications, either in the registration response or on the next NM heartbeat.  This initial patch does not address that yet, as I wanted to keep the patch size manageable and get some initial feedback.  After the feedback we can decide whether to address that corner case as part of this change or in a follow-up JIRA.
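
The mitigation described above could look roughly like the following sketch.  It is only an illustration of the exchange, not code from the patch or from the RM/NM protocol: the class and method names are hypothetical, and it assumes that any application the NM reports but the RM no longer considers running should be treated as finished.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the registration-time reconciliation.
class RegistrationSketch {
    // RM side: the NM reports its active apps when it registers. The RM
    // answers with the subset it no longer considers running (assumption:
    // "not running at the RM" means "finished").
    static List<String> finishedAppsFor(List<String> nmActiveApps,
                                        Set<String> rmRunningApps) {
        List<String> finished = new ArrayList<>();
        for (String appId : nmActiveApps) {
            if (!rmRunningApps.contains(appId)) {
                finished.add(appId);
            }
        }
        return finished;
    }

    // NM side: drop every app the RM reported as finished, whether the list
    // arrived in the registration response or on a later heartbeat.
    static Set<String> applyFinishedApps(Set<String> nmActiveApps,
                                         List<String> finishedApps) {
        Set<String> remaining = new HashSet<>(nmActiveApps);
        remaining.removeAll(finishedApps);
        return remaining;
    }
}
```

With this shape, a completion event that was lost while the NM was down is replayed the next time the NM registers, because the recovered application still appears in the NM's active list.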

> Recover applications upon nodemanager restart
> ---------------------------------------------
>
>                 Key: YARN-1354
>                 URL: https://issues.apache.org/jira/browse/YARN-1354
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1354-v1.patch
>
>
> The set of active applications in the nodemanager context needs to be recovered for work-preserving nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.2#6252)