Posted to issues@ozone.apache.org by "siddhantsangwan (via GitHub)" <gi...@apache.org> on 2023/07/20 13:13:57 UTC

[GitHub] [ozone] siddhantsangwan opened a new pull request, #5097: EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

siddhantsangwan opened a new pull request, #5097:
URL: https://github.com/apache/ozone/pull/5097

   ## What changes were proposed in this pull request?
   
   Simply put, the problem here is that EC under replication handling is blocked in cases where the Rack Scatter policy cannot return new Datanodes because they don't satisfy the policy.
   
   Please check the jira for one such scenario, and this [document](https://docs.google.com/document/d/1ebuSwJZkw4wMWWCHinDvRCfNbeFD4kcHMyIN6Q6wD9g/edit?usp=sharing) for a detailed (but roughly documented) reproduction of some cases.
   
   Changes proposed: Allow the rack scatter policy to fall back and return the required number of datanodes even if the policy can't be satisfied, while still following the policy as closely as possible. Because `fallback` is true by default, the policy will now try to return nodes in situations where it would previously have thrown an `SCMException`. The invariant that this policy tries to return the requested number of nodes, and fails if it can't, still holds.
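   
   As a rough sketch of the shape of this change in `chooseDatanodesInternal` (an illustration of the idea, not the verbatim patch - see the diffs in the review comments below):
   ```
       if (racks.size() < additionalRacksRequired) {
         // Strict behaviour: not enough unused racks, fail as before.
         if (!fallback) {
           throw new SCMException(reason, FAILED_TO_FIND_SUITABLE_NODE);
         }
         // Fallback: keep going and try to return nodesRequired nodes anyway,
         // still spreading them across as many racks as possible.
       }
   ```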
   
   The normal definition of "maximum replicas a rack can host" (called `maxReplicasPerRack` in the code) in the rack scatter policy is:
   ```
       return numReplicas / numberOfRacks
               + Math.min(numReplicas % numberOfRacks, 1);
   ```
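   
   For intuition, here's that formula as a small helper with a few worked values (EC rs-3-2, so `numReplicas` is 5; the numbers are only illustrative):
   ```
       static int maxReplicasPerRack(int numReplicas, int numberOfRacks) {
         return numReplicas / numberOfRacks
                 + Math.min(numReplicas % numberOfRacks, 1);
       }
       // maxReplicasPerRack(5, 5) == 1  (EC rs-3-2 spread over 5 racks)
       // maxReplicasPerRack(5, 3) == 2
       // maxReplicasPerRack(5, 2) == 3
   ```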
   
   When falling back, the definition of `maxReplicasPerRack` in `chooseDatanodesInternal` is changed to the replication factor, so that one rack can host all replicas of this container if needed. The policy still tries to spread the replicas across as many racks as possible - this is done by the existing algorithm in `chooseNodesFromRacks`, which iterates over each rack and chooses at most one node per rack in a pass, so the chosen nodes end up spread across as many racks as possible:
   
   ```
         for (Node rack : toChooseRacks) {
           if (rack == null) {
             // TODO: need to recheck why null coming here.
             continue;
           }
           Node node = chooseNode(rack.getNetworkFullPath(), unavailableNodes,
                   metadataSizeRequired, dataSizeRequired);
           if (node != null) {
             chosenNodes.add((DatanodeDetails) node);
             rackCntMap.merge(rack, 1, Math::addExact);
             mutableFavoredNodes.remove(node);
             unavailableNodes.add(node);
             nodesRequired--;
             if (nodesRequired == 0) {
               break;
             }
           } else {
             // Store the skipped racks to check them first in next outer loop
             skippedRacks.add(rack);
           }
         }
   ```
   The number of times this algorithm runs, i.e. the number of times we will consider a rack for choosing a node, is controlled by the argument `maxOuterLoopIterations`. Previously, this argument was `1` in one case and `Integer.MAX_VALUE` in another. Now it's the minimum of "maximum replicas per rack" and "the number of nodes we require" when not falling back, and just "the number of nodes we require" when falling back.
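   
   As a sketch (using the variable names from the diffs below), the new bound for the first pass works out to:
   ```
       // Capped by both how many nodes we still need and how many replicas a
       // rack may hold.
       int maxOuterLoopIterations = Math.min(nodesRequired, maxReplicasPerRack);
       // When falling back, maxReplicasPerRack is raised to the replication
       // factor, so the bound effectively becomes nodesRequired.
   ```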
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-9011
   
   ## How was this patch tested?
   
   Changed some unit tests and added some new ones.
   Keeping this as a draft while waiting for CI and doing some manual testing in a cluster.




[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1273147550


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -290,24 +314,45 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
       unavailableNodes.addAll(excludedNodes);
     }
 
+    LOG.debug("Available racks excluding racks with used nodes: {}.", racks);
     if (racks.size() < additionalRacksRequired) {
       String reason = "Number of existing racks: " + racks.size()
               + "is less than additional required number of racks to choose: "
               + additionalRacksRequired + " do not match.";
       LOG.warn("Placement policy cannot choose the enough racks. {}"
                       + "Total number of Required Racks: {} Used Racks Count:" +
-                      " {}, Required Nodes count: {}",
+                      " {}, Required Nodes count: {}, fallback: {}.",
               reason, numberOfRacksRequired, usedRacksCntMap.size(),
-              nodesRequired);
-      throw new SCMException(reason,
-              FAILED_TO_FIND_SUITABLE_NODE);
+              nodesRequired, fallback);
+
+      /*
+      The number of racks which aren't already used by existing replicas
+      ("racks") and hence are available for new replicas are less than the
+      additional number of racks that replicas need to be on
+      ("additionalRacksRequired").
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            FAILED_TO_FIND_SUITABLE_NODE);
+      }
     }
 
+    /*
+    If fallback is false, and we reach here, then nodesRequired is greater
+    than or equal to additionalRacksRequired. So we need to ask for
+    nodesRequired number of nodes. If fallback is true, then nodesRequired
+    could be less than additionalRacksRequired. We still need to ask for
+    nodesRequired number of more nodes, even if that won't satisfy the
+    requirement of being on additionalRacksRequired more racks.
+
+    So, in both cases, we want to ask for nodesRequired number of nodes.
+    */
+    int maxOuterLoopIterations = Math.min(nodesRequired, maxReplicasPerRack);
     Set<DatanodeDetails> chosenNodes = new LinkedHashSet<>(
         chooseNodesFromRacks(racks, unavailableNodes,
-            mutableFavoredNodes, additionalRacksRequired,
-            metadataSizeRequired, dataSizeRequired, 1,
-            usedRacksCntMap, maxReplicasPerRack));
+            mutableFavoredNodes, nodesRequired, metadataSizeRequired,
+            dataSizeRequired, maxOuterLoopIterations, usedRacksCntMap,

Review Comment:
   > check if the misreplication count is going down or remaining same in the case fallback is enabled
   
   I believe we're already doing that here:
   ```
       List<DatanodeDetails> newPlacement =
           new ArrayList<>(usedNodes.size() + result.size());
       newPlacement.addAll(usedNodes);
       newPlacement.addAll(chosenNodes);
       ContainerPlacementStatus placementStatus =
           validateContainerPlacement(newPlacement, requiredReplicationFactor);
       if (!placementStatus.isPolicySatisfied()) {
         ContainerPlacementStatus initialPlacementStatus =
             validateContainerPlacement(usedNodes, requiredReplicationFactor);
         if (initialPlacementStatus.misReplicationCount()
                 < placementStatus.misReplicationCount()) {
           String errorMsg = "ContainerPlacementPolicy not met. Misreplication" +
                   " Reason: " + placementStatus.misReplicatedReason() +
                   " Initial Used nodes mis-replication Count: " +
                   initialPlacementStatus.misReplicationCount() +
                   " Used nodes + Chosen nodes mis-replication Count: " +
                   placementStatus.misReplicationCount();
           throw new SCMException(errorMsg, FAILED_TO_FIND_SUITABLE_NODE);
         }
       }
   ```
   
   > We shouldn't change this parameter.
   
   My understanding is that we should look for `additionalRacksRequired` if fallback is false, and for `nodesRequired` if fallback is true. Now, if fallback is false, then `nodesRequired` must be greater than or equal to `additionalRacksRequired`, or we'll throw an exception earlier in the method:
   ```
    if (nodesRequired < additionalRacksRequired) {
         String reason = "Required nodes size: " + nodesRequired
                 + " is less than required number of racks to choose: "
                 + additionalRacksRequired + ".";
         LOG.warn("Placement policy cannot choose the enough racks. {}"
                         + "Total number of Required Racks: {} Used Racks Count:" +
                         " {}, Required Nodes count: {}, fallback: {}.",
                 reason, numberOfRacksRequired, usedRacksCntMap.size(),
                 nodesRequired, fallback);
   
         /*
         The number of additional Datanodes ("nodesRequired") that were asked for
         are less than the additional number of racks ("additionalRacksRequired")
         that container replicas need to be on. This means that returning
         nodesRequired number of nodes will not satisfy the rack scatter policy.
         If fallback is false, throw an exception. Else, continue to find
         nodesRequired number of nodes.
          */
         if (!fallback) {
           throw new SCMException(reason,
               FAILED_TO_FIND_SUITABLE_NODE);
         }
       }
   ```
   
   So this means that for both values of fallback, the argument can just be `nodesRequired`. In any case, `chooseNodesFromRacks` should still try to spread the replicas across as many racks as possible, with excluded and used racks sorted to the end of the list. What do you think?





[GitHub] [ozone] swamirishi commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "swamirishi (via GitHub)" <gi...@apache.org>.
swamirishi commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1272886418


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -290,24 +314,45 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
       unavailableNodes.addAll(excludedNodes);
     }
 
+    LOG.debug("Available racks excluding racks with used nodes: {}.", racks);
     if (racks.size() < additionalRacksRequired) {
       String reason = "Number of existing racks: " + racks.size()
               + "is less than additional required number of racks to choose: "
               + additionalRacksRequired + " do not match.";
       LOG.warn("Placement policy cannot choose the enough racks. {}"
                       + "Total number of Required Racks: {} Used Racks Count:" +
-                      " {}, Required Nodes count: {}",
+                      " {}, Required Nodes count: {}, fallback: {}.",
               reason, numberOfRacksRequired, usedRacksCntMap.size(),
-              nodesRequired);
-      throw new SCMException(reason,
-              FAILED_TO_FIND_SUITABLE_NODE);
+              nodesRequired, fallback);
+
+      /*
+      The number of racks which aren't already used by existing replicas
+      ("racks") and hence are available for new replicas are less than the
+      additional number of racks that replicas need to be on
+      ("additionalRacksRequired").
+      */
+      if (!fallback) {

Review Comment:
   I meant we should recalculate max replicas per rack based on the number of available racks and not based on the topology configuration, since there could be cases where some racks are excluded completely.
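   
   Concretely, something along these lines (an illustration of the idea only, not committed code; how "available racks" is counted here is an assumption):
   ```
       // Derive the per-rack cap from racks that can actually take a replica,
       // instead of from the full topology.
       int availableRacks = racks.size() + usedRacksCntMap.size();
       int maxReplicasPerRack = numReplicas / availableRacks
               + Math.min(numReplicas % availableRacks, 1);
   ```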





[GitHub] [ozone] siddhantsangwan commented on pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on PR #5097:
URL: https://github.com/apache/ozone/pull/5097#issuecomment-1653893604

   Update: In an offline discussion, we decided to keep the scope of this PR minimal and to go ahead with this suggestion:
   
   > From what I understand, all we have to change is make additionalRacksRequired = Math.min(nodesRequired, numberOfRacksRequired - usedRacksCntMap.size()); and remove the condition check nodesRequired < additionalRacksRequired altogether. We wouldn't need a fallback flag then. This particular issue can only happen when the under replication handler is calling this function asking for fewer nodes than the misreplication count. The RackScatter placement policy is ensuring that the misreplication count goes down if an additional node is chosen.
   
   This means situation 2 (from the linked doc) gets solved, and situation 4 remains unsolved (can be tackled in a future jira). Since situation 2 seems more critical, we restricted this PR to solving that problem. There will also be no regression in situation 1.
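   
   To illustrate the agreed change with assumed numbers (EC rs-3-2 with replicas currently on only 3 racks, and the handler asking for a single replacement node):
   ```
       int nodesRequired = 1;
       int numberOfRacksRequired = 5;  // EC rs-3-2
       int usedRacks = 3;              // usedRacksCntMap.size()
       // Old behaviour: nodesRequired (1) < additionalRacksRequired (2), so
       // the request failed outright.
       int additionalRacksRequired =
           Math.min(nodesRequired, numberOfRacksRequired - usedRacks); // == 1
       // New behaviour: ask for 1 node on 1 additional rack, so under
       // replication handling is no longer blocked.
   ```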
   
   @swamirishi and @ashishkumar50 please take another look!




[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1274392685


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -268,20 +276,36 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
     int numberOfRacksRequired = getRequiredRackCount(requiredReplicationFactor);
     int additionalRacksRequired =

Review Comment:
   > From what I understand, all we have to change is make additionalRacksRequired = Math.min(nodesRequired, numberOfRacksRequired - usedRacksCntMap.size()); and remove the condition check nodesRequired < additionalRacksRequired altogether.
   
   That will work for this particular case, yes. However, simply doing that will not work in the first case, which is:
   ```
     /**
      * Scenario:
      * rack0 -> node0
      * rack1 -> node1
      * rack2 -> node2
      * rack3 -> node3
      * rack4 -> node4, node5
      * <p>
      * node0, node1, node2, node4 are used nodes. node3 is excluded node.
      * Expectation is that node5 should be chosen.
   ```
   
   So, we'll still need to "fall back and relax the policy" somehow, which is why I decided to use the existing `fallback` flag.
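   
   Counting it out for that scenario (my arithmetic, for illustration):
   ```
       // numberOfRacksRequired = 5, used racks = 4 -> additionalRacksRequired = 1.
       // The only unused rack is rack3, and its only node (node3) is excluded,
       // so the strict path finds no eligible rack and throws. Falling back,
       // the second pass over used racks can pick node5 on rack4 - but rack4
       // would then host 2 replicas, which is why maxReplicasPerRack also has
       // to be relaxed to the replication factor.
   ```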





[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1270595354


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -290,24 +314,45 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
       unavailableNodes.addAll(excludedNodes);
     }
 
+    LOG.debug("Available racks excluding racks with used nodes: {}.", racks);
     if (racks.size() < additionalRacksRequired) {
       String reason = "Number of existing racks: " + racks.size()
               + "is less than additional required number of racks to choose: "
               + additionalRacksRequired + " do not match.";
       LOG.warn("Placement policy cannot choose the enough racks. {}"
                       + "Total number of Required Racks: {} Used Racks Count:" +
-                      " {}, Required Nodes count: {}",
+                      " {}, Required Nodes count: {}, fallback: {}.",
               reason, numberOfRacksRequired, usedRacksCntMap.size(),
-              nodesRequired);
-      throw new SCMException(reason,
-              FAILED_TO_FIND_SUITABLE_NODE);
+              nodesRequired, fallback);
+
+      /*
+      The number of racks which aren't already used by existing replicas
+      ("racks") and hence are available for new replicas are less than the
+      additional number of racks that replicas need to be on
+      ("additionalRacksRequired").
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            FAILED_TO_FIND_SUITABLE_NODE);
+      }
     }
 
+    /*
+    If fallback is false, and we reach here, then nodesRequired is greater
+    than or equal to additionalRacksRequired. So we need to ask for
+    nodesRequired number of nodes. If fallback is true, then nodesRequired
+    could be less than additionalRacksRequired. We still need to ask for
+    nodesRequired number of more nodes, even if that won't satisfy the
+    requirement of being on additionalRacksRequired more racks.
+
+    So, in both cases, we want to ask for nodesRequired number of nodes.
+    */
+    int maxOuterLoopIterations = Math.min(nodesRequired, maxReplicasPerRack);
     Set<DatanodeDetails> chosenNodes = new LinkedHashSet<>(
         chooseNodesFromRacks(racks, unavailableNodes,
-            mutableFavoredNodes, additionalRacksRequired,
-            metadataSizeRequired, dataSizeRequired, 1,
-            usedRacksCntMap, maxReplicasPerRack));
+            mutableFavoredNodes, nodesRequired, metadataSizeRequired,
+            dataSizeRequired, maxOuterLoopIterations, usedRacksCntMap,

Review Comment:
   I changed this because it's possible to have a situation where `nodesRequired` is less than `additionalRacksRequired` (check the third case in my linked document). If we set it to `nodesRequired`, the policy still prefers finding a new rack rather than a rack that already has a used node.





[GitHub] [ozone] swamirishi commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "swamirishi (via GitHub)" <gi...@apache.org>.
swamirishi commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1270294967


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -317,31 +362,48 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
                       "available racks. {} Available racks count: {},"
                       + " Excluded nodes count: {}, UsedNodes count: {}",
               reason, racks.size(), excludedNodesCount, usedNodesCount);
-      throw new SCMException(reason,
-              SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+
+      /*
+      Couldn't choose enough racks to satisfy the rack scatter policy. If
+      fallback is not allowed, then throw an exception.
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+      }
     }
 
     if (chosenNodes.size() < nodesRequired) {
+      /* If fallback is allowed, adjust maxReplicasToMove such that
+      replication factor number of replicas are allowed per rack.
+      */
+      if (fallback) {
+        maxReplicasPerRack = requiredReplicationFactor;
+      }
       racks.addAll(usedRacksCntMap.keySet());
       racks = sortRackWithExcludedNodes(racks, excludedNodes, usedRacksCntMap);
       racks.addAll(usedRacksCntMap.keySet());
+      LOG.debug("Available racks considering racks with used and exclude " +
+              "nodes: {}.", racks);
       chosenNodes.addAll(chooseNodesFromRacks(racks, unavailableNodes,
               mutableFavoredNodes, nodesRequired - chosenNodes.size(),
               metadataSizeRequired, dataSizeRequired,
-              Integer.MAX_VALUE, usedRacksCntMap, maxReplicasPerRack));
+              nodesRequired - chosenNodes.size(),

Review Comment:
   Agreed





[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1270603305


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -290,24 +314,45 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
       unavailableNodes.addAll(excludedNodes);
     }
 
+    LOG.debug("Available racks excluding racks with used nodes: {}.", racks);
     if (racks.size() < additionalRacksRequired) {
       String reason = "Number of existing racks: " + racks.size()
               + "is less than additional required number of racks to choose: "
               + additionalRacksRequired + " do not match.";
       LOG.warn("Placement policy cannot choose the enough racks. {}"
                       + "Total number of Required Racks: {} Used Racks Count:" +
-                      " {}, Required Nodes count: {}",
+                      " {}, Required Nodes count: {}, fallback: {}.",
               reason, numberOfRacksRequired, usedRacksCntMap.size(),
-              nodesRequired);
-      throw new SCMException(reason,
-              FAILED_TO_FIND_SUITABLE_NODE);
+              nodesRequired, fallback);
+
+      /*
+      The number of racks which aren't already used by existing replicas
+      ("racks") and hence are available for new replicas are less than the
+      additional number of racks that replicas need to be on
+      ("additionalRacksRequired").
+      */
+      if (!fallback) {

Review Comment:
   I don't fully understand what you're suggesting. But the general idea here is that it's possible to fail some of the conditions above this, such as `nodesRequired < additionalRacksRequired`, and still get a new DN without having to increase `maxReplicasPerRack`.
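   
   For example, with assumed numbers (EC rs-3-2, replicas bunched on 3 racks, one replacement needed):
   ```
       int nodesRequired = 1;
       int additionalRacksRequired = 2;  // 5 required racks - 3 used racks
       // nodesRequired < additionalRacksRequired trips the strict check, yet a
       // single healthy node on an unused 4th rack would reduce misreplication
       // without any rack exceeding maxReplicasPerRack.
   ```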





[GitHub] [ozone] siddhantsangwan commented on pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on PR #5097:
URL: https://github.com/apache/ozone/pull/5097#issuecomment-1647285858

   This PR is ready for review. I've updated `maxOuterLoopIterations` according to the comments.




[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1270590129


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -290,24 +314,45 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
       unavailableNodes.addAll(excludedNodes);
     }
 
+    LOG.debug("Available racks excluding racks with used nodes: {}.", racks);
     if (racks.size() < additionalRacksRequired) {

Review Comment:
   I haven't seen it happen yet.





[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1270275073


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -317,31 +362,48 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
                       "available racks. {} Available racks count: {},"
                       + " Excluded nodes count: {}, UsedNodes count: {}",
               reason, racks.size(), excludedNodesCount, usedNodesCount);
-      throw new SCMException(reason,
-              SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+
+      /*
+      Couldn't choose enough racks to satisfy the rack scatter policy. If
+      fallback is not allowed, then throw an exception.
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+      }
     }
 
     if (chosenNodes.size() < nodesRequired) {
+      /* If fallback is allowed, adjust maxReplicasToMove such that
+      replication factor number of replicas are allowed per rack.
+      */
+      if (fallback) {
+        maxReplicasPerRack = requiredReplicationFactor;
+      }
       racks.addAll(usedRacksCntMap.keySet());
       racks = sortRackWithExcludedNodes(racks, excludedNodes, usedRacksCntMap);
       racks.addAll(usedRacksCntMap.keySet());
+      LOG.debug("Available racks considering racks with used and exclude " +
+              "nodes: {}.", racks);
       chosenNodes.addAll(chooseNodesFromRacks(racks, unavailableNodes,
               mutableFavoredNodes, nodesRequired - chosenNodes.size(),
               metadataSizeRequired, dataSizeRequired,
-              Integer.MAX_VALUE, usedRacksCntMap, maxReplicasPerRack));
+              nodesRequired - chosenNodes.size(),

Review Comment:
   Good point. We can set `maxOuterLoopIterations` to a higher value. It looks like the number of iterations is bounded by both `maxOuterLoopIterations` and `OUTER_LOOP_MAX_RETRY`. The latter is a constant whose value is currently `3`, so we'd need to increase that too. Any suggestions for what their values should be?





[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1277108723


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -268,20 +276,36 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
     int numberOfRacksRequired = getRequiredRackCount(requiredReplicationFactor);
     int additionalRacksRequired =

Review Comment:
   > Ideally this can happen only in a small cluster.
   
   Yes, all of these are edge cases in small cluster situations.
   
   > Shouldn't we ideally wait for the unhealthy replica to get deleted and create the replica ? That should be fine right?
   
   Yes, that logic already exists in `ECUnderReplicationHandler`. This change as it stands should not interfere with that.





[GitHub] [ozone] siddhantsangwan commented on pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on PR #5097:
URL: https://github.com/apache/ozone/pull/5097#issuecomment-1647713964

   Note: While this PR is meant to improve EC under replication handling, I'm trying to determine its impact on mis replication handling.




[GitHub] [ozone] ashishkumar50 commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "ashishkumar50 (via GitHub)" <gi...@apache.org>.
ashishkumar50 commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1270821521


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -317,31 +362,48 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
                       "available racks. {} Available racks count: {},"
                       + " Excluded nodes count: {}, UsedNodes count: {}",
               reason, racks.size(), excludedNodesCount, usedNodesCount);
-      throw new SCMException(reason,
-              SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+
+      /*
+      Couldn't choose enough racks to satisfy the rack scatter policy. If
+      fallback is not allowed, then throw an exception.
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+      }
     }
 
     if (chosenNodes.size() < nodesRequired) {
+      /* If fallback is allowed, adjust maxReplicasToMove such that
+      replication factor number of replicas are allowed per rack.
+      */
+      if (fallback) {
+        maxReplicasPerRack = requiredReplicationFactor;
+      }
       racks.addAll(usedRacksCntMap.keySet());
       racks = sortRackWithExcludedNodes(racks, excludedNodes, usedRacksCntMap);
       racks.addAll(usedRacksCntMap.keySet());
+      LOG.debug("Available racks considering racks with used and exclude " +
+              "nodes: {}.", racks);
       chosenNodes.addAll(chooseNodesFromRacks(racks, unavailableNodes,
               mutableFavoredNodes, nodesRequired - chosenNodes.size(),
               metadataSizeRequired, dataSizeRequired,
-              Integer.MAX_VALUE, usedRacksCntMap, maxReplicasPerRack));
+              nodesRequired - chosenNodes.size(),

Review Comment:
   At line 352 it should be max replicas per rack. If we increase this iteration count beyond max replicas per rack, the same rack can end up hosting multiple replicas, because each outer-loop pass starts over with all the racks; a rack is only removed from consideration within the same pass.
   If we want to optimize, we can remove racks which have already reached the max replicas per rack, and then run the outer loop any number of times, like INT_MAX.





[GitHub] [ozone] swamirishi commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "swamirishi (via GitHub)" <gi...@apache.org>.
swamirishi commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1273600156


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -268,20 +276,36 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
     int numberOfRacksRequired = getRequiredRackCount(requiredReplicationFactor);
     int additionalRacksRequired =

Review Comment:
   From what I understand, all we have to change is make `additionalRacksRequired = Math.min(nodesRequired, numberOfRacksRequired - usedRacksCntMap.size());` and remove the condition check `nodesRequired < additionalRacksRequired` altogether. We wouldn't need a fallback flag then. This particular issue can only happen when the under replication handler is calling this function asking for fewer nodes than the misreplication count. The RackScatter placement policy is ensuring that the misreplication count goes down if an additional node is chosen.





[GitHub] [ozone] swamirishi commented on pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "swamirishi (via GitHub)" <gi...@apache.org>.
swamirishi commented on PR #5097:
URL: https://github.com/apache/ozone/pull/5097#issuecomment-1648265281

   > > I'm trying to determine its impact on mis replication handling.
   > 
   > Mis replication handling works in two stages. The first stage is `MisReplicationHandler` using placement policy's `replicasToCopyToFixMisreplication` to ask for replicas that need to be copied to other racks. These target racks are selected using `SCMContainerPlacementRackScatter`. Once they're copied, the container becomes over replicated. Then `ECOverReplicationHandler` will delete replicas using the policy's `replicasToRemoveToFixOverreplication` API.
   > 
   > As far as I understand, there should be no negative impact on mis replication handling because:
   > 
   > 1. `SCMContainerPlacementRackScatter` will not return targets that make mis replication worse.
   > 2. `replicasToCopyToFixMisreplication` will only select replicas to copy if it's possible to improve mis replication.
   > 
   > @swamirishi I'd appreciate your feedback on this since you're very familiar with this area.

   The first priority for the replication manager is to fix unhealthy replicas, which is done by the underreplication handler. The next priority in line is fixing over replication, and mis-replication has the least priority. So in the case of unhealthy replicas, misreplication shouldn't really matter. But the `chooseDatanode` function is used by both the misreplication handler and the underreplication handler.




[GitHub] [ozone] siddhantsangwan commented on pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on PR #5097:
URL: https://github.com/apache/ozone/pull/5097#issuecomment-1655097635

   Thanks @swamirishi for the review!




[GitHub] [ozone] swamirishi commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "swamirishi (via GitHub)" <gi...@apache.org>.
swamirishi commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1270299430


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -317,31 +362,48 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
                       "available racks. {} Available racks count: {},"
                       + " Excluded nodes count: {}, UsedNodes count: {}",
               reason, racks.size(), excludedNodesCount, usedNodesCount);
-      throw new SCMException(reason,
-              SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+
+      /*
+      Couldn't choose enough racks to satisfy the rack scatter policy. If
+      fallback is not allowed, then throw an exception.
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+      }
     }
 
     if (chosenNodes.size() < nodesRequired) {
+      /* If fallback is allowed, adjust maxReplicasToMove such that
+      replication factor number of replicas are allowed per rack.
+      */
+      if (fallback) {
+        maxReplicasPerRack = requiredReplicationFactor;
+      }
       racks.addAll(usedRacksCntMap.keySet());
       racks = sortRackWithExcludedNodes(racks, excludedNodes, usedRacksCntMap);
       racks.addAll(usedRacksCntMap.keySet());
+      LOG.debug("Available racks considering racks with used and exclude " +
+              "nodes: {}.", racks);
       chosenNodes.addAll(chooseNodesFromRacks(racks, unavailableNodes,
               mutableFavoredNodes, nodesRequired - chosenNodes.size(),
               metadataSizeRequired, dataSizeRequired,
-              Integer.MAX_VALUE, usedRacksCntMap, maxReplicasPerRack));
+              nodesRequired - chosenNodes.size(),

Review Comment:
   OUTER_LOOP_MAX_RETRY is there for when we don't find any nodes in any rack even after going through all the racks OUTER_LOOP_MAX_RETRY times. I believe we can make it configurable, but the default value seems to be fine. Even if we find one node in an iteration, the retry count is reset.
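   
   Roughly, the interplay between the two bounds looks like this (a reconstruction from this discussion, not the actual method body):
   ```
       int retryCount = 0;
       while (maxOuterLoopIterations-- > 0 && nodesRequired > 0) {
         int before = chosenNodes.size();
         // ... one pass over toChooseRacks, at most one node per rack ...
         if (chosenNodes.size() == before) {
           // A full pass found nothing.
           if (++retryCount >= OUTER_LOOP_MAX_RETRY) { // currently 3
             break;
           }
         } else {
           retryCount = 0; // found at least one node, reset the retry budget
         }
       }
   ```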





[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1271788527


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -317,31 +362,48 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
                       "available racks. {} Available racks count: {},"
                       + " Excluded nodes count: {}, UsedNodes count: {}",
               reason, racks.size(), excludedNodesCount, usedNodesCount);
-      throw new SCMException(reason,
-              SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+
+      /*
+      Couldn't choose enough racks to satisfy the rack scatter policy. If
+      fallback is not allowed, then throw an exception.
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+      }
     }
 
     if (chosenNodes.size() < nodesRequired) {
+      /* If fallback is allowed, adjust maxReplicasToMove such that
+      replication factor number of replicas are allowed per rack.
+      */
+      if (fallback) {
+        maxReplicasPerRack = requiredReplicationFactor;
+      }
       racks.addAll(usedRacksCntMap.keySet());
       racks = sortRackWithExcludedNodes(racks, excludedNodes, usedRacksCntMap);
       racks.addAll(usedRacksCntMap.keySet());
+      LOG.debug("Available racks considering racks with used and exclude " +
+              "nodes: {}.", racks);
       chosenNodes.addAll(chooseNodesFromRacks(racks, unavailableNodes,
               mutableFavoredNodes, nodesRequired - chosenNodes.size(),
               metadataSizeRequired, dataSizeRequired,
-              Integer.MAX_VALUE, usedRacksCntMap, maxReplicasPerRack));
+              nodesRequired - chosenNodes.size(),

Review Comment:
   > If we want to optimize, we can remove racks which has already reached max replica
   
   I agree, and I think we're already doing that.
   At line 70, we update the count of used nodes per rack:
   ```
   rackCntMap.merge(rack, 1, Math::addExact);
   ```
   And at line 126, we filter racks according to this map:
   ```
         for (Node rack : racks) {
           if (rackCntMap.getOrDefault(rack, 0) < maxReplicasPerRack) {
             toChooseRacks.add(rack);
           }
         }
   
   ```





[GitHub] [ozone] ashishkumar50 commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "ashishkumar50 (via GitHub)" <gi...@apache.org>.
ashishkumar50 commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1269591382


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -317,31 +362,48 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
                       "available racks. {} Available racks count: {},"
                       + " Excluded nodes count: {}, UsedNodes count: {}",
               reason, racks.size(), excludedNodesCount, usedNodesCount);
-      throw new SCMException(reason,
-              SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+
+      /*
+      Couldn't choose enough racks to satisfy the rack scatter policy. If
+      fallback is not allowed, then throw an exception.
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+      }
     }
 
     if (chosenNodes.size() < nodesRequired) {
+      /* If fallback is allowed, adjust maxReplicasToMove such that
+      replication factor number of replicas are allowed per rack.
+      */
+      if (fallback) {
+        maxReplicasPerRack = requiredReplicationFactor;
+      }
       racks.addAll(usedRacksCntMap.keySet());
       racks = sortRackWithExcludedNodes(racks, excludedNodes, usedRacksCntMap);
       racks.addAll(usedRacksCntMap.keySet());
+      LOG.debug("Available racks considering racks with used and exclude " +
+              "nodes: {}.", racks);
       chosenNodes.addAll(chooseNodesFromRacks(racks, unavailableNodes,
               mutableFavoredNodes, nodesRequired - chosenNodes.size(),
               metadataSizeRequired, dataSizeRequired,
-              Integer.MAX_VALUE, usedRacksCntMap, maxReplicasPerRack));
+              nodesRequired - chosenNodes.size(),

Review Comment:
   Changing `maxOuterLoopIterations` to the exact number of nodes required will decrease the probability of getting the exact number of nodes required.
   [chooseRandom](https://github.com/apache/ozone/blob/cd31fd57efa4f86d45b45aa8ad9e7e2eac1ddf3b/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java#L394) will return any random node. There is a chance that the random node is not valid ([isValidNode](https://github.com/apache/ozone/blob/cd31fd57efa4f86d45b45aa8ad9e7e2eac1ddf3b/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java#L405)).
   After a certain number of retries (currently 5) it will return a null node back to `chooseNodesFromRacks`.
   Assuming this happens for all the racks, at the end of the loop we decrement `maxOuterLoopIterations` without having chosen any node. So the probability of getting the required nodes keeps decreasing.
   I think `maxOuterLoopIterations` should not be bound to the number of nodes required.





[GitHub] [ozone] siddhantsangwan commented on pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on PR #5097:
URL: https://github.com/apache/ozone/pull/5097#issuecomment-1648004475

   > I'm trying to determine its impact on mis replication handling.
   
   Mis replication handling works in two stages. The first stage is `MisReplicationHandler` using placement policy's `replicasToCopyToFixMisreplication` to ask for replicas that need to be copied to other racks. These target racks are selected using `SCMContainerPlacementRackScatter`. Once they're copied, the container becomes over replicated. Then `ECOverReplicationHandler` will delete replicas using the policy's `replicasToRemoveToFixOverreplication` API.
   
   As far as I understand, there should be no negative impact on mis replication handling because:
   1. `SCMContainerPlacementRackScatter` will not return targets that make mis replication worse.
   2. `replicasToCopyToFixMisreplication` will only select replicas to copy if it's possible to improve mis replication.
   
   @swamirishi I'd appreciate your feedback on this since you're very familiar with this area.




[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1270606342


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -317,31 +362,48 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
                       "available racks. {} Available racks count: {},"
                       + " Excluded nodes count: {}, UsedNodes count: {}",
               reason, racks.size(), excludedNodesCount, usedNodesCount);
-      throw new SCMException(reason,
-              SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+
+      /*
+      Couldn't choose enough racks to satisfy the rack scatter policy. If
+      fallback is not allowed, then throw an exception.
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+      }
     }
 
     if (chosenNodes.size() < nodesRequired) {
+      /* If fallback is allowed, adjust maxReplicasToMove such that
+      replication factor number of replicas are allowed per rack.
+      */
+      if (fallback) {
+        maxReplicasPerRack = requiredReplicationFactor;
+      }
       racks.addAll(usedRacksCntMap.keySet());
       racks = sortRackWithExcludedNodes(racks, excludedNodes, usedRacksCntMap);
       racks.addAll(usedRacksCntMap.keySet());
+      LOG.debug("Available racks considering racks with used and exclude " +
+              "nodes: {}.", racks);
       chosenNodes.addAll(chooseNodesFromRacks(racks, unavailableNodes,
               mutableFavoredNodes, nodesRequired - chosenNodes.size(),
               metadataSizeRequired, dataSizeRequired,
-              Integer.MAX_VALUE, usedRacksCntMap, maxReplicasPerRack));
+              nodesRequired - chosenNodes.size(),

Review Comment:
   Ok, the consensus is that we should keep `maxOuterLoopIterations` as `Integer.MAX_VALUE`. I wonder if we should then change this argument to `Integer.MAX_VALUE` when we call `chooseNodesFromRacks` at line 352, too?





[GitHub] [ozone] swamirishi commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "swamirishi (via GitHub)" <gi...@apache.org>.
swamirishi commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1273611127


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -268,20 +276,36 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
     int numberOfRacksRequired = getRequiredRackCount(requiredReplicationFactor);
     int additionalRacksRequired =

Review Comment:
   In that case, on the misreplication handler side, we should only copy a number of replicas equal to the change in the misreplication count. Does this make sense?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] swamirishi commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "swamirishi (via GitHub)" <gi...@apache.org>.
swamirishi commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1270282206


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -290,24 +314,45 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
       unavailableNodes.addAll(excludedNodes);
     }
 
+    LOG.debug("Available racks excluding racks with used nodes: {}.", racks);
     if (racks.size() < additionalRacksRequired) {

Review Comment:
   Curious to know when this condition would be satisfied.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -290,24 +314,45 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
       unavailableNodes.addAll(excludedNodes);
     }
 
+    LOG.debug("Available racks excluding racks with used nodes: {}.", racks);
     if (racks.size() < additionalRacksRequired) {
       String reason = "Number of existing racks: " + racks.size()
               + "is less than additional required number of racks to choose: "
               + additionalRacksRequired + " do not match.";
       LOG.warn("Placement policy cannot choose the enough racks. {}"
                       + "Total number of Required Racks: {} Used Racks Count:" +
-                      " {}, Required Nodes count: {}",
+                      " {}, Required Nodes count: {}, fallback: {}.",
               reason, numberOfRacksRequired, usedRacksCntMap.size(),
-              nodesRequired);
-      throw new SCMException(reason,
-              FAILED_TO_FIND_SUITABLE_NODE);
+              nodesRequired, fallback);
+
+      /*
+      The number of racks which aren't already used by existing replicas
+      ("racks") and hence are available for new replicas are less than the
+      additional number of racks that replicas need to be on
+      ("additionalRacksRequired").
+      */
+      if (!fallback) {

Review Comment:
   If we have fallback enabled, shouldn't we just change the list of available racks to match the total number of racks required, and change the `maxReplicasPerRack` value instead? It would be simpler, right?



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -317,31 +362,48 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
                       "available racks. {} Available racks count: {},"
                       + " Excluded nodes count: {}, UsedNodes count: {}",
               reason, racks.size(), excludedNodesCount, usedNodesCount);
-      throw new SCMException(reason,
-              SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+
+      /*
+      Couldn't choose enough racks to satisfy the rack scatter policy. If
+      fallback is not allowed, then throw an exception.
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+      }
     }
 
     if (chosenNodes.size() < nodesRequired) {
+      /* If fallback is allowed, adjust maxReplicasPerRack such that
+      replication factor number of replicas are allowed per rack.
+      */
+      if (fallback) {
+        maxReplicasPerRack = requiredReplicationFactor;
+      }
       racks.addAll(usedRacksCntMap.keySet());
       racks = sortRackWithExcludedNodes(racks, excludedNodes, usedRacksCntMap);
       racks.addAll(usedRacksCntMap.keySet());
+      LOG.debug("Available racks considering racks with used and exclude " +
+              "nodes: {}.", racks);
       chosenNodes.addAll(chooseNodesFromRacks(racks, unavailableNodes,
               mutableFavoredNodes, nodesRequired - chosenNodes.size(),
               metadataSizeRequired, dataSizeRequired,
-              Integer.MAX_VALUE, usedRacksCntMap, maxReplicasPerRack));
+              nodesRequired - chosenNodes.size(),

Review Comment:
   Agreed



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -290,24 +314,45 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
       unavailableNodes.addAll(excludedNodes);
     }
 
+    LOG.debug("Available racks excluding racks with used nodes: {}.", racks);
     if (racks.size() < additionalRacksRequired) {
       String reason = "Number of existing racks: " + racks.size()
               + "is less than additional required number of racks to choose: "
               + additionalRacksRequired + " do not match.";
       LOG.warn("Placement policy cannot choose the enough racks. {}"
                       + "Total number of Required Racks: {} Used Racks Count:" +
-                      " {}, Required Nodes count: {}",
+                      " {}, Required Nodes count: {}, fallback: {}.",
               reason, numberOfRacksRequired, usedRacksCntMap.size(),
-              nodesRequired);
-      throw new SCMException(reason,
-              FAILED_TO_FIND_SUITABLE_NODE);
+              nodesRequired, fallback);
+
+      /*
+      The number of racks which aren't already used by existing replicas
+      ("racks") and hence are available for new replicas are less than the
+      additional number of racks that replicas need to be on
+      ("additionalRacksRequired").
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            FAILED_TO_FIND_SUITABLE_NODE);
+      }
     }
 
+    /*
+    If fallback is false, and we reach here, then nodesRequired is greater
+    than or equal to additionalRacksRequired. So we need to ask for
+    nodesRequired number of nodes. If fallback is true, then nodesRequired
+    could be less than additionalRacksRequired. We still need to ask for
+    nodesRequired number of more nodes, even if that won't satisfy the
+    requirement of being on additionalRacksRequired more racks.
+
+    So, in both cases, we want to ask for nodesRequired number of nodes.
+    */
+    int maxOuterLoopIterations = Math.min(nodesRequired, maxReplicasPerRack);
     Set<DatanodeDetails> chosenNodes = new LinkedHashSet<>(
         chooseNodesFromRacks(racks, unavailableNodes,
-            mutableFavoredNodes, additionalRacksRequired,
-            metadataSizeRequired, dataSizeRequired, 1,
-            usedRacksCntMap, maxReplicasPerRack));
+            mutableFavoredNodes, nodesRequired, metadataSizeRequired,
+            dataSizeRequired, maxOuterLoopIterations, usedRacksCntMap,

Review Comment:
   We shouldn't change the `numberOfNodesRequired` argument from `additionalRacksRequired` to `nodesRequired`, since this function call is meant to spread placement across as many racks as possible and to choose racks only from the unused racks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] siddhantsangwan commented on pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on PR #5097:
URL: https://github.com/apache/ozone/pull/5097#issuecomment-1655108347

   Thanks everyone for the reviews and discussions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] swamirishi commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "swamirishi (via GitHub)" <gi...@apache.org>.
swamirishi commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1277086898


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -268,20 +276,36 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
     int numberOfRacksRequired = getRequiredRackCount(requiredReplicationFactor);
     int additionalRacksRequired =

Review Comment:
   I had another question. Ideally this can happen only in a small cluster. Shouldn't we just wait for the unhealthy replica to get deleted and then create the new replica? That should be fine, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ashishkumar50 commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "ashishkumar50 (via GitHub)" <gi...@apache.org>.
ashishkumar50 commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1271841166


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -317,31 +362,48 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
                       "available racks. {} Available racks count: {},"
                       + " Excluded nodes count: {}, UsedNodes count: {}",
               reason, racks.size(), excludedNodesCount, usedNodesCount);
-      throw new SCMException(reason,
-              SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+
+      /*
+      Couldn't choose enough racks to satisfy the rack scatter policy. If
+      fallback is not allowed, then throw an exception.
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+      }
     }
 
     if (chosenNodes.size() < nodesRequired) {
+      /* If fallback is allowed, adjust maxReplicasPerRack such that
+      replication factor number of replicas are allowed per rack.
+      */
+      if (fallback) {
+        maxReplicasPerRack = requiredReplicationFactor;
+      }
       racks.addAll(usedRacksCntMap.keySet());
       racks = sortRackWithExcludedNodes(racks, excludedNodes, usedRacksCntMap);
       racks.addAll(usedRacksCntMap.keySet());
+      LOG.debug("Available racks considering racks with used and exclude " +
+              "nodes: {}.", racks);
       chosenNodes.addAll(chooseNodesFromRacks(racks, unavailableNodes,
               mutableFavoredNodes, nodesRequired - chosenNodes.size(),
               metadataSizeRequired, dataSizeRequired,
-              Integer.MAX_VALUE, usedRacksCntMap, maxReplicasPerRack));
+              nodesRequired - chosenNodes.size(),

Review Comment:
   Looks fine now. Earlier it was 1 for the first call; now it's `maxReplicasPerRack`, which is better than 1 and doesn't have any negative impact. Yes, we can optimize further later in the new Jira you created, HDDS-9054.
   Please update the PR description for `maxOuterLoopIterations` as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ashishkumar50 commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "ashishkumar50 (via GitHub)" <gi...@apache.org>.
ashishkumar50 commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1270451384


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -317,31 +362,48 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
                       "available racks. {} Available racks count: {},"
                       + " Excluded nodes count: {}, UsedNodes count: {}",
               reason, racks.size(), excludedNodesCount, usedNodesCount);
-      throw new SCMException(reason,
-              SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+
+      /*
+      Couldn't choose enough racks to satisfy the rack scatter policy. If
+      fallback is not allowed, then throw an exception.
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+      }
     }
 
     if (chosenNodes.size() < nodesRequired) {
+      /* If fallback is allowed, adjust maxReplicasPerRack such that
+      replication factor number of replicas are allowed per rack.
+      */
+      if (fallback) {
+        maxReplicasPerRack = requiredReplicationFactor;
+      }
       racks.addAll(usedRacksCntMap.keySet());
       racks = sortRackWithExcludedNodes(racks, excludedNodes, usedRacksCntMap);
       racks.addAll(usedRacksCntMap.keySet());
+      LOG.debug("Available racks considering racks with used and exclude " +
+              "nodes: {}.", racks);
       chosenNodes.addAll(chooseNodesFromRacks(racks, unavailableNodes,
               mutableFavoredNodes, nodesRequired - chosenNodes.size(),
               metadataSizeRequired, dataSizeRequired,
-              Integer.MAX_VALUE, usedRacksCntMap, maxReplicasPerRack));
+              nodesRequired - chosenNodes.size(),

Review Comment:
   Keeping `Integer.MAX_VALUE` doesn't harm anything here, as `OUTER_LOOP_MAX_RETRY` already controls the exit condition in case we really cannot find enough nodes.
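
   To illustrate (a hedged, self-contained sketch with made-up names, not the actual `chooseNodesFromRacks` code): even with `maxOuterLoopIterations` set to `Integer.MAX_VALUE`, a retry cap like `OUTER_LOOP_MAX_RETRY` still terminates the loop once passes stop making progress.
   ```
   import java.util.function.BooleanSupplier;

   final class OuterLoopSketch {
     // Hedged sketch: the loop exits when either cap is hit, so an
     // Integer.MAX_VALUE iteration cap alone does not make it unbounded.
     static int choose(int nodesRequired, int maxOuterLoopIterations,
         int maxRetry, BooleanSupplier chooseOneNode) {
       int chosen = 0;
       int retries = 0;
       for (int i = 0; i < maxOuterLoopIterations && chosen < nodesRequired;
           i++) {
         if (chooseOneNode.getAsBoolean()) {
           chosen++; // found a node on this pass
         } else if (++retries >= maxRetry) {
           break; // no node found after maxRetry failed passes; give up
         }
       }
       return chosen;
     }
   }
   ```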



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1271868980


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -317,31 +362,48 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
                       "available racks. {} Available racks count: {},"
                       + " Excluded nodes count: {}, UsedNodes count: {}",
               reason, racks.size(), excludedNodesCount, usedNodesCount);
-      throw new SCMException(reason,
-              SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+
+      /*
+      Couldn't choose enough racks to satisfy the rack scatter policy. If
+      fallback is not allowed, then throw an exception.
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+      }
     }
 
     if (chosenNodes.size() < nodesRequired) {
+      /* If fallback is allowed, adjust maxReplicasPerRack such that
+      replication factor number of replicas are allowed per rack.
+      */
+      if (fallback) {
+        maxReplicasPerRack = requiredReplicationFactor;
+      }
       racks.addAll(usedRacksCntMap.keySet());
       racks = sortRackWithExcludedNodes(racks, excludedNodes, usedRacksCntMap);
       racks.addAll(usedRacksCntMap.keySet());
+      LOG.debug("Available racks considering racks with used and exclude " +
+              "nodes: {}.", racks);
       chosenNodes.addAll(chooseNodesFromRacks(racks, unavailableNodes,
               mutableFavoredNodes, nodesRequired - chosenNodes.size(),
               metadataSizeRequired, dataSizeRequired,
-              Integer.MAX_VALUE, usedRacksCntMap, maxReplicasPerRack));
+              nodesRequired - chosenNodes.size(),

Review Comment:
   Thanks @ashishkumar50 for your review. I've now updated the PR description as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] siddhantsangwan commented on pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on PR #5097:
URL: https://github.com/apache/ozone/pull/5097#issuecomment-1651588527

   I retested after applying this PR in the ozone-topology docker environment and have updated the testing [document](https://docs.google.com/document/d/1ebuSwJZkw4wMWWCHinDvRCfNbeFD4kcHMyIN6Q6wD9g/edit?usp=sharing).
   
   > Conclusion:
   > Without patch HDDS-9011, under replication + mis replication is fixed in Situation 1. It fails in Situations 2 and 4. It will probably work in Situation 3 (not tested).
   > 
   > With patch HDDS-9011, under replication is fixed in ALL the above scenarios. However, it performs worse in Situation 1: it's able to fix under replication, but mis replication is blocked.
   
   
   So there's an impact on mis replication in at least one case, after all. I've explained the impact in Situation 1 in the doc. Otherwise, this patch works well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1274971464


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -290,24 +314,45 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
       unavailableNodes.addAll(excludedNodes);
     }
 
+    LOG.debug("Available racks excluding racks with used nodes: {}.", racks);
     if (racks.size() < additionalRacksRequired) {
       String reason = "Number of existing racks: " + racks.size()
               + "is less than additional required number of racks to choose: "
               + additionalRacksRequired + " do not match.";
       LOG.warn("Placement policy cannot choose the enough racks. {}"
                       + "Total number of Required Racks: {} Used Racks Count:" +
-                      " {}, Required Nodes count: {}",
+                      " {}, Required Nodes count: {}, fallback: {}.",
               reason, numberOfRacksRequired, usedRacksCntMap.size(),
-              nodesRequired);
-      throw new SCMException(reason,
-              FAILED_TO_FIND_SUITABLE_NODE);
+              nodesRequired, fallback);
+
+      /*
+      The number of racks which aren't already used by existing replicas
+      ("racks") and hence are available for new replicas are less than the
+      additional number of racks that replicas need to be on
+      ("additionalRacksRequired").
+      */
+      if (!fallback) {

Review Comment:
   If you think this PR works as expected and what you're suggesting is an improvement that can be made separately, let's do that in https://issues.apache.org/jira/browse/HDDS-9054?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] swamirishi commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "swamirishi (via GitHub)" <gi...@apache.org>.
swamirishi commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1272887920


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -290,24 +314,45 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
       unavailableNodes.addAll(excludedNodes);
     }
 
+    LOG.debug("Available racks excluding racks with used nodes: {}.", racks);
     if (racks.size() < additionalRacksRequired) {
       String reason = "Number of existing racks: " + racks.size()
               + "is less than additional required number of racks to choose: "
               + additionalRacksRequired + " do not match.";
       LOG.warn("Placement policy cannot choose the enough racks. {}"
                       + "Total number of Required Racks: {} Used Racks Count:" +
-                      " {}, Required Nodes count: {}",
+                      " {}, Required Nodes count: {}, fallback: {}.",
               reason, numberOfRacksRequired, usedRacksCntMap.size(),
-              nodesRequired);
-      throw new SCMException(reason,
-              FAILED_TO_FIND_SUITABLE_NODE);
+              nodesRequired, fallback);
+
+      /*
+      The number of racks which aren't already used by existing replicas
+      ("racks") and hence are available for new replicas are less than the
+      additional number of racks that replicas need to be on
+      ("additionalRacksRequired").
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            FAILED_TO_FIND_SUITABLE_NODE);
+      }
     }
 
+    /*
+    If fallback is false, and we reach here, then nodesRequired is greater
+    than or equal to additionalRacksRequired. So we need to ask for
+    nodesRequired number of nodes. If fallback is true, then nodesRequired
+    could be less than additionalRacksRequired. We still need to ask for
+    nodesRequired number of more nodes, even if that won't satisfy the
+    requirement of being on additionalRacksRequired more racks.
+
+    So, in both cases, we want to ask for nodesRequired number of nodes.
+    */
+    int maxOuterLoopIterations = Math.min(nodesRequired, maxReplicasPerRack);
     Set<DatanodeDetails> chosenNodes = new LinkedHashSet<>(
         chooseNodesFromRacks(racks, unavailableNodes,
-            mutableFavoredNodes, additionalRacksRequired,
-            metadataSizeRequired, dataSizeRequired, 1,
-            usedRacksCntMap, maxReplicasPerRack));
+            mutableFavoredNodes, nodesRequired, metadataSizeRequired,
+            dataSizeRequired, maxOuterLoopIterations, usedRacksCntMap,

Review Comment:
   I believe in that case we should just check whether the misreplication count is going down or staying the same when fallback is enabled. We shouldn't change this parameter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] swamirishi commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "swamirishi (via GitHub)" <gi...@apache.org>.
swamirishi commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1277078712


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -268,20 +276,36 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
     int numberOfRacksRequired = getRequiredRackCount(requiredReplicationFactor);
     int additionalRacksRequired =

Review Comment:
   Yeah, this does make sense when the rack itself is not available. Thanks for pointing out such a scenario, @siddhantsangwan.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sumitagrawl merged pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "sumitagrawl (via GitHub)" <gi...@apache.org>.
sumitagrawl merged PR #5097:
URL: https://github.com/apache/ozone/pull/5097


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] siddhantsangwan commented on pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on PR #5097:
URL: https://github.com/apache/ozone/pull/5097#issuecomment-1649233078

   > The first priority for the replication manager is to fix unhealthy replicas, which is done by the under replication handler. The next priority in line is fixing over replication, and mis-replication has the least priority.
   
   Yes, I agree.
   
   > But the chooseDatanode function is used by both misreplication handler & underreplication handler.
   
   Yes, in my view the changes introduced in this PR should not have a negative impact on mis replication handling, like I said before. But I might be missing some edge cases that you're aware of.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1273147550


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -290,24 +314,45 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
       unavailableNodes.addAll(excludedNodes);
     }
 
+    LOG.debug("Available racks excluding racks with used nodes: {}.", racks);
     if (racks.size() < additionalRacksRequired) {
       String reason = "Number of existing racks: " + racks.size()
               + "is less than additional required number of racks to choose: "
               + additionalRacksRequired + " do not match.";
       LOG.warn("Placement policy cannot choose the enough racks. {}"
                       + "Total number of Required Racks: {} Used Racks Count:" +
-                      " {}, Required Nodes count: {}",
+                      " {}, Required Nodes count: {}, fallback: {}.",
               reason, numberOfRacksRequired, usedRacksCntMap.size(),
-              nodesRequired);
-      throw new SCMException(reason,
-              FAILED_TO_FIND_SUITABLE_NODE);
+              nodesRequired, fallback);
+
+      /*
+      The number of racks which aren't already used by existing replicas
+      ("racks") and hence are available for new replicas are less than the
+      additional number of racks that replicas need to be on
+      ("additionalRacksRequired").
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            FAILED_TO_FIND_SUITABLE_NODE);
+      }
     }
 
+    /*
+    If fallback is false, and we reach here, then nodesRequired is greater
+    than or equal to additionalRacksRequired. So we need to ask for
+    nodesRequired number of nodes. If fallback is true, then nodesRequired
+    could be less than additionalRacksRequired. We still need to ask for
+    nodesRequired number of more nodes, even if that won't satisfy the
+    requirement of being on additionalRacksRequired more racks.
+
+    So, in both cases, we want to ask for nodesRequired number of nodes.
+    */
+    int maxOuterLoopIterations = Math.min(nodesRequired, maxReplicasPerRack);
     Set<DatanodeDetails> chosenNodes = new LinkedHashSet<>(
         chooseNodesFromRacks(racks, unavailableNodes,
-            mutableFavoredNodes, additionalRacksRequired,
-            metadataSizeRequired, dataSizeRequired, 1,
-            usedRacksCntMap, maxReplicasPerRack));
+            mutableFavoredNodes, nodesRequired, metadataSizeRequired,
+            dataSizeRequired, maxOuterLoopIterations, usedRacksCntMap,

Review Comment:
   > check whether the misreplication count is going down or staying the same when fallback is enabled
   
   I believe we're already doing that here:
   ```
       List<DatanodeDetails> newPlacement =
           new ArrayList<>(usedNodes.size() + result.size());
       newPlacement.addAll(usedNodes);
       newPlacement.addAll(chosenNodes);
       ContainerPlacementStatus placementStatus =
           validateContainerPlacement(newPlacement, requiredReplicationFactor);
       if (!placementStatus.isPolicySatisfied()) {
         ContainerPlacementStatus initialPlacementStatus =
             validateContainerPlacement(usedNodes, requiredReplicationFactor);
         if (initialPlacementStatus.misReplicationCount()
                 < placementStatus.misReplicationCount()) {
           String errorMsg = "ContainerPlacementPolicy not met. Misreplication" +
                   " Reason: " + placementStatus.misReplicatedReason() +
                   " Initial Used nodes mis-replication Count: " +
                   initialPlacementStatus.misReplicationCount() +
                   " Used nodes + Chosen nodes mis-replication Count: " +
                   placementStatus.misReplicationCount();
           throw new SCMException(errorMsg, FAILED_TO_FIND_SUITABLE_NODE);
         }
       }
   ```
   
   > We shouldn't change this parameter.
   
   My understanding is that we should look for `additionalRacksRequired` nodes if fallback is false, and for `nodesRequired` nodes if fallback is true. Now, if fallback is false, then `nodesRequired` must be greater than or equal to `additionalRacksRequired`, or we'll throw an exception earlier in the method:
   ```    
   if (nodesRequired < additionalRacksRequired) {
         String reason = "Required nodes size: " + nodesRequired
                 + " is less than required number of racks to choose: "
                 + additionalRacksRequired + ".";
         LOG.warn("Placement policy cannot choose the enough racks. {}"
                         + "Total number of Required Racks: {} Used Racks Count:" +
                         " {}, Required Nodes count: {}, fallback: {}.",
                 reason, numberOfRacksRequired, usedRacksCntMap.size(),
                 nodesRequired, fallback);
   
         /*
         The number of additional Datanodes ("nodesRequired") that were asked for
         are less than the additional number of racks ("additionalRacksRequired")
         that container replicas need to be on. This means that returning
         nodesRequired number of nodes will not satisfy the rack scatter policy.
         If fallback is false, throw an exception. Else, continue to find
         nodesRequired number of nodes.
          */
         if (!fallback) {
           throw new SCMException(reason,
               FAILED_TO_FIND_SUITABLE_NODE);
         }
       }
   ```
   
   So this means that for both values of fallback, the argument can just be `nodesRequired`. Anyway, `chooseNodesFromRacks` should still try to spread the chosen nodes across as many racks as possible, with excluded and used racks sorted to the end of the list. What do you think?
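
   To make that invariant concrete, here's a hedged sketch with made-up numbers and a made-up helper (not code from the PR):
   ```
   // Hedged sketch: with fallback off, surviving the check implies
   // nodesRequired >= additionalRacksRequired, so asking for nodesRequired
   // nodes is safe in both modes.
   static int nodesToAskFor(int nodesRequired, int additionalRacksRequired,
       boolean fallback) {
     if (nodesRequired < additionalRacksRequired && !fallback) {
       // stands in for SCMException(FAILED_TO_FIND_SUITABLE_NODE)
       throw new IllegalStateException("FAILED_TO_FIND_SUITABLE_NODE");
     }
     // e.g. fallback = true, nodesRequired = 2, additionalRacksRequired = 3:
     // we still ask for 2 nodes even though 3 more racks were wanted.
     return nodesRequired;
   }
   ```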



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1271798140


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -317,31 +362,48 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
                       "available racks. {} Available racks count: {},"
                       + " Excluded nodes count: {}, UsedNodes count: {}",
               reason, racks.size(), excludedNodesCount, usedNodesCount);
-      throw new SCMException(reason,
-              SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+
+      /*
+      Couldn't choose enough racks to satisfy the rack scatter policy. If
+      fallback is not allowed, then throw an exception.
+      */
+      if (!fallback) {
+        throw new SCMException(reason,
+            SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
+      }
     }
 
     if (chosenNodes.size() < nodesRequired) {
+      /* If fallback is allowed, adjust maxReplicasPerRack such that
+      replication factor number of replicas are allowed per rack.
+      */
+      if (fallback) {
+        maxReplicasPerRack = requiredReplicationFactor;
+      }
       racks.addAll(usedRacksCntMap.keySet());
       racks = sortRackWithExcludedNodes(racks, excludedNodes, usedRacksCntMap);
       racks.addAll(usedRacksCntMap.keySet());
+      LOG.debug("Available racks considering racks with used and exclude " +
+              "nodes: {}.", racks);
       chosenNodes.addAll(chooseNodesFromRacks(racks, unavailableNodes,
               mutableFavoredNodes, nodesRequired - chosenNodes.size(),
               metadataSizeRequired, dataSizeRequired,
-              Integer.MAX_VALUE, usedRacksCntMap, maxReplicasPerRack));
+              nodesRequired - chosenNodes.size(),

Review Comment:
   I've decided to make it `maxReplicasPerRack` at line 352 for the first call, and `Integer.MAX_VALUE` for the second call. If we want to increase it for the first call, we can do that in this separate Jira: https://issues.apache.org/jira/browse/HDDS-9054.
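
   A hedged sketch of that decision (the helper is made up; the real call sites pass many more arguments):
   ```
   // Hedged sketch: the first pass over unused racks is bounded by
   // maxReplicasPerRack iterations; the second pass, which also considers
   // used and excluded racks, is effectively unbounded and relies on
   // OUTER_LOOP_MAX_RETRY for termination.
   static int maxOuterLoopIterations(boolean firstPass, int maxReplicasPerRack) {
     return firstPass ? maxReplicasPerRack : Integer.MAX_VALUE;
   }
   ```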



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1270603305


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -290,24 +314,45 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
       unavailableNodes.addAll(excludedNodes);
     }
 
+    LOG.debug("Available racks excluding racks with used nodes: {}.", racks);
     if (racks.size() < additionalRacksRequired) {
       String reason = "Number of existing racks: " + racks.size()
               + "is less than additional required number of racks to choose: "
               + additionalRacksRequired + " do not match.";
       LOG.warn("Placement policy cannot choose the enough racks. {}"
                       + "Total number of Required Racks: {} Used Racks Count:" +
-                      " {}, Required Nodes count: {}",
+                      " {}, Required Nodes count: {}, fallback: {}.",
               reason, numberOfRacksRequired, usedRacksCntMap.size(),
-              nodesRequired);
-      throw new SCMException(reason,
-              FAILED_TO_FIND_SUITABLE_NODE);
+              nodesRequired, fallback);
+
+      /*
+      The number of racks which aren't already used by existing replicas
+      ("racks") and hence are available for new replicas are less than the
+      additional number of racks that replicas need to be on
+      ("additionalRacksRequired").
+      */
+      if (!fallback) {

Review Comment:
   I don't really understand what you're suggesting - can you please rephrase that?
   
   The general idea here is that it's possible to fail some of the conditions above this, such as `nodesRequired < additionalRacksRequired`, and still get a new DN without having to increase `maxReplicasPerRack`.
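
   For example (hypothetical numbers, a minimal sketch of that idea; the helper is made up):
   ```
   // One unhealthy replica is being replaced while the policy would like
   // two more racks. The strict check fails, yet the single replacement
   // node, placed on any unused rack, never exceeds maxReplicasPerRack.
   static boolean strictCheckFails() {
     int nodesRequired = 1;           // one replacement DN needed
     int additionalRacksRequired = 2; // extra racks the policy would like
     return nodesRequired < additionalRacksRequired; // true: strict mode throws
   }
   ```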



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "umamaheswararao (via GitHub)" <gi...@apache.org>.
umamaheswararao commented on PR #5097:
URL: https://github.com/apache/ozone/pull/5097#issuecomment-1648773186

   @swamirishi could you please take a look at this PR?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] swamirishi commented on a diff in pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "swamirishi (via GitHub)" <gi...@apache.org>.
swamirishi commented on code in PR #5097:
URL: https://github.com/apache/ozone/pull/5097#discussion_r1272886534


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -290,24 +314,45 @@ protected List<DatanodeDetails> chooseDatanodesInternal(
       unavailableNodes.addAll(excludedNodes);
     }
 
+    LOG.debug("Available racks excluding racks with used nodes: {}.", racks);
     if (racks.size() < additionalRacksRequired) {
       String reason = "Number of existing racks: " + racks.size()
               + "is less than additional required number of racks to choose: "
               + additionalRacksRequired + " do not match.";
       LOG.warn("Placement policy cannot choose the enough racks. {}"
                       + "Total number of Required Racks: {} Used Racks Count:" +
-                      " {}, Required Nodes count: {}",
+                      " {}, Required Nodes count: {}, fallback: {}.",
               reason, numberOfRacksRequired, usedRacksCntMap.size(),
-              nodesRequired);
-      throw new SCMException(reason,
-              FAILED_TO_FIND_SUITABLE_NODE);
+              nodesRequired, fallback);
+
+      /*
+      The number of racks which aren't already used by existing replicas
+      ("racks") and hence are available for new replicas are less than the
+      additional number of racks that replicas need to be on
+      ("additionalRacksRequired").
+      */
+      if (!fallback) {

Review Comment:
   This should happen in the case where `racks.size() < additionalRacksRequired`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] siddhantsangwan commented on pull request #5097: HDDS-9011. EC: Unhealthy replica not replaced with a healthy replica in a rack-scatter environment

Posted by "siddhantsangwan (via GitHub)" <gi...@apache.org>.
siddhantsangwan commented on PR #5097:
URL: https://github.com/apache/ozone/pull/5097#issuecomment-1643912270

   @swamirishi can you also please review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org