You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2021/11/11 18:09:40 UTC

[GitHub] [hadoop] goiri commented on a change in pull request #3632: YARN-10822. Containers going from New to Scheduled transition for kil…

goiri commented on a change in pull request #3632:
URL: https://github.com/apache/hadoop/pull/3632#discussion_r747713762



##########
File path: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
##########
@@ -1187,10 +1187,10 @@ public ContainerState transition(ContainerImpl container,
       if (container.recoveredStatus == RecoveredContainerStatus.COMPLETED) {
         container.sendFinishedEvents();
         return ContainerState.DONE;
-      } else if (container.recoveredStatus == RecoveredContainerStatus.QUEUED) {
-        return ContainerState.SCHEDULED;
-      } else if (container.recoveredAsKilled &&
-          container.recoveredStatus == RecoveredContainerStatus.REQUESTED) {
+      } else if (container.recoveredAsKilled && (

Review comment:
       This condition is getting a little obfuscated.
   Probably we should have a static method with this condition and explaining what's the sequence.

##########
File path: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
##########
@@ -682,7 +683,49 @@ public void testContainerCleanupOnShutdown() throws Exception {
     verify(cm, never()).handle(isA(CMgrCompletedAppsEvent.class));
   }
 
-  private void commonLaunchContainer(ApplicationId appId, ContainerId cid,
+  @Test
+  public void testKilledContainerInQueuedStateRecovery() throws Exception {
+    conf.setBoolean(YarnConfiguration.NM_RECOVERY_ENABLED, true);
+    conf.setBoolean(YarnConfiguration.NM_RECOVERY_SUPERVISED, true);
+    NMStateStoreService stateStore = new NMMemoryStateStoreService();
+    stateStore.init(conf);
+    stateStore.start();
+    context = createContext(conf, stateStore);
+    ContainerManagerImpl cm = createContainerManager(context, delSrvc);
+    ((NMContext) context).setContainerManager(cm);
+    cm.init(conf);
+    cm.start();
+
+    // add an application by starting a container
+    ApplicationId appId = ApplicationId.newInstance(0, 0);
+    ApplicationAttemptId attemptId =
+        ApplicationAttemptId.newInstance(appId, 1);
+    ContainerId cid = ContainerId.newContainerId(attemptId, 1);
+    createStartContainerRequest(appId, cid, cm);
+
+    Application app = context.getApplications().get(appId);
+    assertNotNull(app);
+
+    stateStore.storeContainerKilled(cid);
+    // restart and verify container scheduler has recovered correctly
+    cm.stop();
+    context = createContext(conf, stateStore);
+    cm = createContainerManager(context, delSrvc);
+    ((NMContext) context).setContainerManager(cm);
+    cm.init(conf);
+    cm.start();
+    assertEquals(1, context.getApplications().size());

Review comment:
       Can we assert 0 earlier too?

##########
File path: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
##########
@@ -682,7 +683,49 @@ public void testContainerCleanupOnShutdown() throws Exception {
     verify(cm, never()).handle(isA(CMgrCompletedAppsEvent.class));
   }
 
-  private void commonLaunchContainer(ApplicationId appId, ContainerId cid,
+  @Test
+  public void testKilledContainerInQueuedStateRecovery() throws Exception {
+    conf.setBoolean(YarnConfiguration.NM_RECOVERY_ENABLED, true);
+    conf.setBoolean(YarnConfiguration.NM_RECOVERY_SUPERVISED, true);
+    NMStateStoreService stateStore = new NMMemoryStateStoreService();
+    stateStore.init(conf);
+    stateStore.start();
+    context = createContext(conf, stateStore);
+    ContainerManagerImpl cm = createContainerManager(context, delSrvc);
+    ((NMContext) context).setContainerManager(cm);
+    cm.init(conf);
+    cm.start();
+
+    // add an application by starting a container
+    ApplicationId appId = ApplicationId.newInstance(0, 0);
+    ApplicationAttemptId attemptId =
+        ApplicationAttemptId.newInstance(appId, 1);
+    ContainerId cid = ContainerId.newContainerId(attemptId, 1);
+    createStartContainerRequest(appId, cid, cm);
+
+    Application app = context.getApplications().get(appId);
+    assertNotNull(app);
+
+    stateStore.storeContainerKilled(cid);
+    // restart and verify container scheduler has recovered correctly
+    cm.stop();
+    context = createContext(conf, stateStore);
+    cm = createContainerManager(context, delSrvc);
+    ((NMContext) context).setContainerManager(cm);
+    cm.init(conf);
+    cm.start();
+    assertEquals(1, context.getApplications().size());
+
+    ConcurrentMap<ContainerId, Container> containers = context.getContainers();
+    Container c = containers.get(cid);
+    assertEquals(ContainerState.DONE, c.getContainerState());
+    app = context.getApplications().get(appId);
+    Thread.sleep(1000);

Review comment:
       Can we wait for something specific instead of sleeping blindly?

##########
File path: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
##########
@@ -168,6 +168,7 @@ public synchronized void storeContainer(ContainerId containerId,
       int version, long startTime, StartContainerRequest startRequest) {
     RecoveredContainerState rcs = new RecoveredContainerState(containerId);
     rcs.startRequest = startRequest;
+    rcs.status = RecoveredContainerStatus.REQUESTED;

Review comment:
       Where does this come from?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org