Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2014/08/12 03:54:13 UTC

[jira] [Comment Edited] (YARN-1337) Recover containers upon nodemanager restart

    [ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093640#comment-14093640 ] 

Jason Lowe edited comment on YARN-1337 at 8/12/14 1:54 AM:
-----------------------------------------------------------

Thanks for taking another look, Junping.

bq. Better to add javadoc for newly added (or moved from private) public methods.

I documented all of the NodeStatusUpdater methods and also the NMStateStoreService public methods that didn't already have javadocs.

bq. volatile is unnecessary as it was using AtomicBoolean already.

Fixed.
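
For readers following the volatile point, here is a minimal sketch of why the extra modifier adds nothing. It is not taken from the patch: the class name StopFlagDemo and the field name stopped are hypothetical. It assumes the field reference is never reassigned, in which case final is enough for the reference and AtomicBoolean.get()/set() already have the memory effects of a volatile read/write.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; names are not from the YARN-1337 patch.
public class StopFlagDemo {
    // No 'volatile' needed: the reference never changes (final), and the
    // AtomicBoolean itself provides volatile read/write semantics.
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    public void stop() {
        stopped.set(true);      // same memory effects as a volatile write
    }

    public boolean isStopped() {
        return stopped.get();   // same memory effects as a volatile read
    }

    public static void main(String[] args) throws InterruptedException {
        StopFlagDemo demo = new StopFlagDemo();
        // The worker spins until another thread flips the flag.
        Thread worker = new Thread(() -> {
            while (!demo.isStopped()) {
                Thread.yield();
            }
        });
        worker.start();
        demo.stop();            // the worker observes this update and exits
        worker.join();
        System.out.println("stopped=" + demo.isStopped());
    }
}
```

Marking the field volatile would only matter if the reference itself were swapped for a different AtomicBoolean at runtime, which this sketch assumes does not happen.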


was (Author: jlowe):
Thanks for taking another look, Junping.

.bq Better to add javadoc for newly added (or moved from private) public methods.

I documented all of the NodeStatusUpdater methods and also the NMStateStoreService public methods that didn't already have javadocs.

.bq volatile is unnecessary as it was using AtomicBoolean already.

Fixed.

> Recover containers upon nodemanager restart
> -------------------------------------------
>
>                 Key: YARN-1337
>                 URL: https://issues.apache.org/jira/browse/YARN-1337
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1337-v1.patch, YARN-1337-v2.patch, YARN-1337-v3.patch
>
>
> To support work-preserving NM restart, we need to recover the state the containers were in when the nodemanager went down.  This includes informing the RM of containers that exited in the interim, a strategy for handling the exit codes from those containers, and a way to reacquire the active containers and determine their exit codes when they terminate.  The state of finished containers also needs to be recovered.


