Posted to yarn-issues@hadoop.apache.org by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org> on 2013/04/10 23:05:16 UTC

[jira] [Commented] (YARN-495) Change NM behavior of reboot to resync

    [ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628245#comment-13628245 ] 

Vinod Kumar Vavilapalli commented on YARN-495:
----------------------------------------------

Looked at the latest patch. It mostly looks good, except for one thing: we aren't stopping any startContainers() calls that are in flight, which may cause the newly added code (which waits for all containers to be cleaned up) to hang.
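
To make the hang concern concrete, here is a minimal sketch of one way to gate new container starts before waiting for cleanup. It is not taken from the patch: ContainerStartGate and every method on it are hypothetical names, and the timeout is an assumption added so that the wait cannot block forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch only; not the NodeManager's actual code or API. */
final class ContainerStartGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition noneLive = lock.newCondition();
    private boolean blockNewStarts = false; // set while a resync is in progress
    private int liveContainers = 0;         // containers admitted and not yet cleaned up

    /** Call at the top of a start request. Returns false while starts are blocked. */
    boolean tryStartContainer() {
        lock.lock();
        try {
            if (blockNewStarts) {
                return false;
            }
            liveContainers++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Call once a container has been fully cleaned up. */
    void containerFinished() {
        lock.lock();
        try {
            if (liveContainers > 0 && --liveContainers == 0) {
                noneLive.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    /**
     * Blocks new starts first, then waits for live containers to drain.
     * Returns true if all containers finished, false on timeout or interrupt.
     */
    boolean blockStartsAndAwaitCleanup(long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            blockNewStarts = true;
            while (liveContainers > 0) {
                if (nanos <= 0L) {
                    return false;
                }
                nanos = noneLive.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Call after the resync completes so that starts are admitted again. */
    void unblockStarts() {
        lock.lock();
        try {
            blockNewStarts = false;
        } finally {
            lock.unlock();
        }
    }
}
```

Because the flag is set under the same lock that admits starts, no new container can slip in after the wait begins, so the count being waited on can only go down.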

A few patch-related comments:
 - NodeManager.cleanupContainers(): The log message needs to be fixed so that it doesn't mention only shutdown.
 - NodeStatusUpdaterImpl
    -- getNodeStatus(): It is much more than a getter; rename it to something like getNodeStatusAndUpdateContainersInContext()? Also add it to the NodeStatusUpdater interface?
    -- The conditional if(isStopped) isn't really needed; the thread dies on a resync anyway. Remove?
    -- rebootNodeStatusUpdater() can be protected?
 - TestNodeManagerReboot: The test shouldn't be using the resync APIs anymore, given the change in semantics. Can you change it to use stop() and start() directly?
 - TestNodeManagerShutDown: rename getRegCount() to getNMRegistrationCount()
 - TestNodeStatusUpdater.getNodeManager(): The reboot-related artifacts can be cleaned up from this.
                
> Change NM behavior of reboot to resync
> --------------------------------------
>
>                 Key: YARN-495
>                 URL: https://issues.apache.org/jira/browse/YARN-495
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-495.1.patch, YARN-495.2.patch, YARN-495.3.patch
>
>
> When a reboot command is sent from RM, the node manager doesn't clean up the containers while it's stopping.
