You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2023/01/02 14:27:50 UTC

[GitHub] [ozone] adoroszlai commented on a diff in pull request #4118: HDDS-7666. EC: Unrecoverable EC containers with some remaining replicas may block decommissioning

adoroszlai commented on code in PR #4118:
URL: https://github.com/apache/ozone/pull/4118#discussion_r1060064203


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ECContainerReplicaCount.java:
##########
@@ -421,6 +422,43 @@ public boolean isSufficientlyReplicated(boolean includePendingAdd) {
         >= repConfig.getData() + remainingMaintenanceRedundancy;
   }
 
+  /**
+   * If we are checking a container for sufficient replication for "offline",
+   * ie decommission or maintenance, then it is not really a requirement that
+   * all replicas for the container are present. Instead, we can ensure the
+   * replica on the node going offline has a copy elsewhere on another
+   * IN_SERVICE node, and if so that replica is sufficiently replicated.
+   * @param datanode The datanode being checked to go offline.
+   * @return True if the container is sufficiently replicated or if this replica
+   *         on the passed node is present elsewhere on an IN_SERVICE node.
+   */
+  @Override
+  public boolean isSufficientlyReplicatedForOffline(DatanodeDetails datanode) {
+    boolean sufficientlyReplicated = isSufficientlyReplicated(false);
+    if (sufficientlyReplicated) {
+      return true;
+    }
+    // If it is not sufficiently replicated (ie the container has all replicas)
+    // then we need to check if the replica that is on this node is available
+    // on another ONLINE node, ie in the healthy set. This means we avoid
+    // blocking decommission or maintenance caused by un-recoverable EC
+    // containers.
+    ContainerReplica thisReplica = null;
+    for (ContainerReplica r : replicas) {
+      if (r.getDatanodeDetails().equals(datanode)) {
+        thisReplica = r;
+        break;
+      }
+    }
+    if (thisReplica == null) {
+      // From the set of replicas, none are on the passed datanode.
+      // This should not happen in practice but if it does we cannot indicate
+      // the container is sufficiently replicated.
+      return false;
+    }
+    return healthyIndexes.containsKey(thisReplica.getReplicaIndex());
+  }

Review Comment:
   This returns true even if called for the datanode which has the online replica.  Should we guard against that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org