You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2023/01/03 21:22:41 UTC

[GitHub] [ozone] sodonnel commented on a diff in pull request #4118: HDDS-7666. EC: Unrecoverable EC containers with some remaining replicas may block decommissioning

sodonnel commented on code in PR #4118:
URL: https://github.com/apache/ozone/pull/4118#discussion_r1060966088


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ECContainerReplicaCount.java:
##########
@@ -421,6 +422,43 @@ public boolean isSufficientlyReplicated(boolean includePendingAdd) {
         >= repConfig.getData() + remainingMaintenanceRedundancy;
   }
 
+  /**
+   * If we are checking a container for sufficient replication for "offline",
+   * ie decommission or maintenance, then it is not really a requirement that
+   * all replicas for the container are present. Instead, we can ensure the
+   * replica on the node going offline has a copy elsewhere on another
+   * IN_SERVICE node, and if so that replica is sufficiently replicated.
+   * @param datanode The datanode being checked to go offline.
+   * @return True if the container is sufficiently replicated or if this replica
+   *         on the passed node is present elsewhere on an IN_SERVICE node.
+   */
+  @Override
+  public boolean isSufficientlyReplicatedForOffline(DatanodeDetails datanode) {
+    boolean sufficientlyReplicated = isSufficientlyReplicated(false);
+    if (sufficientlyReplicated) {
+      return true;
+    }
+    // If it is not sufficiently replicated (ie the container has all replicas)
+    // then we need to check if the replica that is on this node is available
+    // on another ONLINE node, ie in the healthy set. This means we avoid
+    // blocking decommission or maintenance caused by un-recoverable EC
+    // containers.
+    ContainerReplica thisReplica = null;
+    for (ContainerReplica r : replicas) {
+      if (r.getDatanodeDetails().equals(datanode)) {
+        thisReplica = r;
+        break;
+      }
+    }
+    if (thisReplica == null) {
+      // From the set of replicas, none are on the passed datanode.
+      // This should not happen in practice but if it does we cannot indicate
+      // the container is sufficiently replicated.
+      return false;
+    }
+    return healthyIndexes.containsKey(thisReplica.getReplicaIndex());
+  }

Review Comment:
   The intention is for this method to be passed a decommissioning / maintenance node. If that is the case, its replicas will not be in healthy replicas, but the code does not enforce that.
   
   Perhaps we could also check there is a replica index for the passed in node in the decommission / maintenance indexes and return false if not. I will add that in.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org