Posted to yarn-issues@hadoop.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2014/08/06 00:33:13 UTC
[jira] [Commented] (YARN-1337) Recover containers upon nodemanager restart

    [ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086909#comment-14086909 ] 

Hadoop QA commented on YARN-1337:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12659958/YARN-1337-v1.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 7 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

                  org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
                  org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync
                  org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown
                  org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot
                  org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
                  org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.TestPBLocalizerRPC
                  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication
                  org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
                  org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
                  org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
                  org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
                  org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
                  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
                  org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
                  org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
                  org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
                  org.apache.hadoop.yarn.server.resourcemanager.TestRMHA
                  org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4524//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4524//console

This message is automatically generated.

> Recover containers upon nodemanager restart
> -------------------------------------------
>
>                 Key: YARN-1337
>                 URL: https://issues.apache.org/jira/browse/YARN-1337
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1337-v1.patch
>
>
> To support work-preserving NM restart we need to recover the state the containers were in when the nodemanager went down.  This includes informing the RM of containers that have exited in the interim, a strategy for dealing with the exit codes from those containers, and a way to reacquire the active containers and determine their exit codes when they terminate.  The state of finished containers also needs to be recovered.
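
As a rough illustration of the recovery described in the issue summary above, here is a minimal Java sketch. It is not taken from YARN-1337-v1.patch and does not use YARN's actual classes: RecoveredStatus, RecoveredContainer, RecoveryPlan and the UNKNOWN_EXIT_CODE placeholder are all hypothetical names chosen for the example. It assumes the nodemanager persisted a status, and where available an exit code, for each container before it went down, and it only sorts the recovered records into containers to reacquire and containers whose completion should be reported to the RM.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical status persisted before the restart; not YARN's actual types.
enum RecoveredStatus { LAUNCHED, COMPLETED }

// Hypothetical record read back from a persistent store after the restart.
final class RecoveredContainer {
    final String containerId;
    final RecoveredStatus status;
    final Integer exitCode; // null when no exit code was recorded

    RecoveredContainer(String containerId, RecoveredStatus status, Integer exitCode) {
        this.containerId = containerId;
        this.status = status;
        this.exitCode = exitCode;
    }
}

// Sorts recovered containers into the two groups named in the issue summary.
final class RecoveryPlan {
    // Placeholder value (an assumption) used when a finished container has no recorded exit code.
    static final int UNKNOWN_EXIT_CODE = -1000;

    // Containers that were running at shutdown and must be reacquired and monitored.
    final List<String> toReacquire = new ArrayList<>();
    // Finished containers and the exit codes to report to the RM.
    final Map<String, Integer> toReportToRM = new LinkedHashMap<>();

    static RecoveryPlan from(List<RecoveredContainer> recovered) {
        RecoveryPlan plan = new RecoveryPlan();
        for (RecoveredContainer c : recovered) {
            if (c.status == RecoveredStatus.COMPLETED) {
                int code = (c.exitCode != null) ? c.exitCode : UNKNOWN_EXIT_CODE;
                plan.toReportToRM.put(c.containerId, code);
            } else {
                // Still marked as launched: it may be running or may have exited
                // in the interim, so it has to be reacquired to learn its exit code.
                plan.toReacquire.add(c.containerId);
            }
        }
        return plan;
    }
}
```

How a lost exit code should be handled is exactly the open strategy question the summary raises, so the placeholder here should be read as one possible choice rather than the approach taken in the patch.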



--
This message was sent by Atlassian JIRA
(v6.2#6252)