You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/08/10 13:34:13 UTC
[GitHub] [ozone] umamaheswararao commented on a diff in pull request #3668: HDDS-7022. EC: Open EC container are not closed when SCM container was already closed.

umamaheswararao commented on code in PR #3668:
URL: https://github.com/apache/ozone/pull/3668#discussion_r942457298


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -401,6 +407,16 @@ protected ContainerHealthResult processContainer(ContainerInfo containerInfo,
       return new ContainerHealthResult.HealthyResult(containerInfo);
     }
 
+    if (containerInfo.getState() == HddsProtos.LifeCycleState.CLOSED) {
+      List<ContainerReplica> unhealthyReplicas = replicas.stream()
+          .filter(r -> !compareState(containerInfo.getState(), r.getState()))
+          .collect(Collectors.toList());
+
+      if (unhealthyReplicas.size() > 0) {
+        handleUnhealthyReplicas(containerInfo, unhealthyReplicas);
+      }

Review Comment:
   In Legacy Replication Manager, we are returning after sending close command. Should we return too? There is no point of proceeding with the current replica states I think. UnderReplication processing may work though, but overreplication may fail still until that replica closed. Did you think about this?
   ```
   **/*
            * If the container is in CLOSING state, the replicas can either
            * be in OPEN or in CLOSING state. In both of this cases
            * we have to resend close container command to the datanodes.
            */
           if (state == LifeCycleState.CLOSING) {
             for (ContainerReplica replica: replicas) {
               if (replica.getState() != State.UNHEALTHY) {
                 sendCloseCommand(
                     container, replica.getDatanodeDetails(), false);
               }
             }
             return;
           }**
   ```



##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/TestECContainerRecovery.java:
##########
@@ -213,35 +216,56 @@ public void testContainerRecoveryOverReplicationProcessing()
     // increased.
     cluster.restartHddsDatanode(pipeline.getFirstNode(), true);
     // Check container is over replicated.
-    waitForContainerCount(6, containerID, scm);
+    waitForContainerCount(6, container.containerID(), scm);
+    // Wait for all the replicas to be closed.
+    container = scm.getContainerInfo(container.getContainerID());
+    waitForDNContainerState(container, scm);
 
     // Resume RM and wait the over replicated replica deleted.
     scm.getReplicationManager().start();
-    waitForContainerCount(5, containerID, scm);
+    waitForContainerCount(5, container.containerID(), scm);
+  }
+
+  private void waitForDNContainerState(ContainerInfo container,

Review Comment:
   This method should be renamed. We are not waiting for any specific DN state.
   And this test does not make much difference as you are anyway waiting until unhealthy replicas 0?
   That means, DN already reported it closed state.
   
   What we need to do it, we just need check scm state is closed or not, but not the replicas state.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org