You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/07/19 01:45:50 UTC

[GitHub] [ozone] swamirishi opened a new pull request, #3606: HDDS-7015: Failures while choosing nodes for creating container

swamirishi opened a new pull request, #3606:
URL: https://github.com/apache/ozone/pull/3606

   ## What changes were proposed in this pull request?
   When the object of the Class DataNodeDetails is built from protobuf only UUID of the datanode is added which is used for the hashcode. Thus not passing any information about the topology. While excluding datanodes the object is built from protobuf. NetworkTopology removes all nodes from the list which does not fall under the scope while selecting a random node. Default scope value is "/default-rack/" which won't match the required scope. Thus pass the proper object of DatanodeDetails while trying to get the random node from NetworkTopology should fix this.
   
   ## What is the link to the Apache JIRA
   https://jira.cloudera.com/browse/CDPD-42577
   
   ## How was this patch tested?
   Stage Test. Writing a unit test would be flaky.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r925543664


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/SCMCommonPlacementPolicy.java:
##########
@@ -126,7 +127,36 @@ public ConfigurationSource getConf() {
    * @throws SCMException SCM exception.
    */
   @Override
-  public List<DatanodeDetails> chooseDatanodes(
+  public final List<DatanodeDetails> chooseDatanodes(
+          List<DatanodeDetails> excludedNodes,
+          List<DatanodeDetails> favoredNodes,
+          int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+          throws SCMException {
+//    This method calls the chooseDatanodeInternal after fixing
+//    the excludeList to get the DatanodeDetails from the node manager.
+//    Network Topology requires this info to choose a random node.
+    return chooseDatanodesInternal(
+            Objects.isNull(excludedNodes)
+                    ? excludedNodes : excludedNodes.stream()
+                    .map(node -> nodeManager
+                            .getNodeByUuid(node.getUuidString()))

Review Comment:
   There are a few tests failing, and I think its because the MockNodeManager is not setup correctly to return the nodes when the MockNodeManager.getNodeByUuid() is called. However it does highlight a problem, in that if the excluded node is not found here, a null will end up in the excludedNodes array. I think we need to filter out any that map to null.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r925577072


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementCapacity.java:
##########
@@ -98,7 +98,7 @@ public SCMContainerPlacementCapacity(final NodeManager nodeManager,
    * @throws SCMException  SCMException
    */
   @Override
-  public List<DatanodeDetails> chooseDatanodes(
+  protected List<DatanodeDetails> chooseDatanodesInternal(

Review Comment:
   At line 105, I think you need to change `super.chooseDatanodes` to `super.chooseDatanodesInternal()`.
   
   Also to get the test to pass, I needed to add this to TestSCMContainerPlacementCapacity:
   
   ```
       when(mockNodeManager.getNodeByUuid(anyString())).thenAnswer(
           invocation -> {
             String uuid = invocation.getArgument(0);
             return datanodes.stream().filter(
                 datanode -> datanode.getUuid().toString().equals(uuid)).findFirst()
                     .orElse(null);
           });
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] swamirishi commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
swamirishi commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r925927519


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/SCMCommonPlacementPolicy.java:
##########
@@ -126,7 +127,36 @@ public ConfigurationSource getConf() {
    * @throws SCMException SCM exception.
    */
   @Override
-  public List<DatanodeDetails> chooseDatanodes(
+  public final List<DatanodeDetails> chooseDatanodes(
+          List<DatanodeDetails> excludedNodes,
+          List<DatanodeDetails> favoredNodes,
+          int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+          throws SCMException {
+//    This method calls the chooseDatanodeInternal after fixing

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on PR #3606:
URL: https://github.com/apache/ozone/pull/3606#issuecomment-1188720619

   @swamirishi there are some checkstyle violations, please check:
   
   https://github.com/apache/ozone/runs/7401117903?check_suite_focus=true#step:6:8


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel merged pull request #3606: HDDS-7015. Ensure excluded nodes have network location when used in placement policies

Posted by GitBox <gi...@apache.org>.
sodonnel merged PR #3606:
URL: https://github.com/apache/ozone/pull/3606


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r926012564


##########
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestSCMContainerPlacementCapacity.java:
##########
@@ -127,7 +137,7 @@ public void chooseDatanodes() throws SCMException {
 
       //when
       List<DatanodeDetails> datanodeDetails = scmContainerPlacementRandom
-          .chooseDatanodes(existingNodes, null, 1, 15, 15);
+          .chooseDatanodesInternal(existingNodes, null, 1, 15, 15);

Review Comment:
   Should we be calling the public interface, ie chooseDatanodes in the tests, rather than the Internal method?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r924175217


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -143,10 +147,12 @@ public List<DatanodeDetails> chooseDatanodes(
 
     List<Node> toChooseRacks = new LinkedList<>();
     Set<DatanodeDetails> chosenNodes = new LinkedHashSet<>();
-    List<Node> unavailableNodes = new ArrayList<>();
     Set<Node> skippedRacks = new HashSet<>();
     if (excludedNodes != null) {
-      unavailableNodes.addAll(excludedNodes);
+      unavailableNodes.addAll(
+              excludedNodes.stream()
+                      .filter(node->!unavailableNodes.contains(node))
+                      .collect(Collectors.toList()));
     }

Review Comment:
   Oh, I see.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r924332915


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -107,8 +107,12 @@ public List<DatanodeDetails> chooseDatanodes(
     int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
     List<Node> availableNodes = networkTopology.getNodes(
         networkTopology.getMaxLevel());
+    List<Node> unavailableNodes = new ArrayList<>();
     int totalNodesCount = availableNodes.size();
     if (excludedNodes != null) {
+      unavailableNodes.addAll(
+              availableNodes.stream().filter(excludedNodes::contains)

Review Comment:
   Why do we need to ensure the excluded node is in the available nodes? Can we not just use the excluded nodes as it is, without having to iterate over the available nodes?
   
   The available nodes list could be quite large, maybe over 1000 entries. That would mean it needs scan the excluded list for each of these. Lets say the excluded list contains 10 entries, that would mean 10k comparisons in the worst case.
   
   I think I don't understand how this fixes the issue, as the original code just had `unavailableNodes.addAll(excludedNodes);`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
sodonnel commented on PR #3606:
URL: https://github.com/apache/ozone/pull/3606#issuecomment-1188976178

   While the change here will work, I do wonder if the same problem exists in other placement policies. I also wonder if a similar problem could exist with favoredNodes, although we don't seem to use it right now, at least in the EC flow.
   
   Would it be better to provide a new method on SCMNodeManager (and its interface) to lookup the network location for a node that can be used within the Common Placement Policy, as it already has a reference to NodeManager. NodeManager already has the logic to resolve a host when it registers (see the register method).
   
   Then in SCMCommonPlacementPolicy, change the method:
   
   ```
   @Override
     public List<DatanodeDetails> chooseDatanodes(
         List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
         int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
         throws SCMException {
   ```
   
   To look like:
   
   ```
   
     public List<DatanodeDetails> final chooseDatanodes(
         List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
         int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
         throws SCMException {
         fixedFavoredNodes = setNetworkLocation(favoredNodes);
         fixedExcludedNodes = setNetworkLocation(excludedNodes);
         chooseDatanodesInternal(...);
     }
     
     protected List<DatanodeDetails> chooseDatanodesInternal(....) {
        // existing logic the original chooseDatanode
        // Sub-classes must override here to provide their own implementation of 
        // chooseDatanodes, ensuring the favored / excluded node lists always have their
        // network location resolved.
     }
         
   ```
   
   This way, each implementation must override the protected method and the common code will always get executed. It would be even better if we could make chooseDatanodesInternal abstract so they must provide an implementation, but I am not sure we can do that with the way things are structured, as we have Interface -> CommonImpl -> Other impls.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] swamirishi commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
swamirishi commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r924145356


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -143,10 +147,12 @@ public List<DatanodeDetails> chooseDatanodes(
 
     List<Node> toChooseRacks = new LinkedList<>();
     Set<DatanodeDetails> chosenNodes = new LinkedHashSet<>();
-    List<Node> unavailableNodes = new ArrayList<>();
     Set<Node> skippedRacks = new HashSet<>();
     if (excludedNodes != null) {
-      unavailableNodes.addAll(excludedNodes);
+      unavailableNodes.addAll(
+              excludedNodes.stream()
+                      .filter(node->!unavailableNodes.contains(node))
+                      .collect(Collectors.toList()));
     }

Review Comment:
   I am just adding the nodes which might not have come as part of the available nodes list.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] swamirishi commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
swamirishi commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r926077221


##########
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestSCMContainerPlacementRackAware.java:
##########
@@ -107,24 +107,23 @@ private void setup(int datanodeCount) {
       DatanodeDetails datanodeDetails =
           MockDatanodeDetails.createDatanodeDetails(
           hostname + i, rack + (i / NODE_PER_RACK));
+      datanodes.add(datanodeDetails);
+      cluster.add(datanodeDetails);
       DatanodeInfo datanodeInfo = new DatanodeInfo(
-          datanodeDetails, NodeStatus.inServiceHealthy(),
-          UpgradeUtils.defaultLayoutVersionProto());
+              datanodeDetails, NodeStatus.inServiceHealthy(),

Review Comment:
   Yeah might have been a typo. This testcase was failing was debugging it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r924106015


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -143,10 +147,12 @@ public List<DatanodeDetails> chooseDatanodes(
 
     List<Node> toChooseRacks = new LinkedList<>();
     Set<DatanodeDetails> chosenNodes = new LinkedHashSet<>();
-    List<Node> unavailableNodes = new ArrayList<>();
     Set<Node> skippedRacks = new HashSet<>();
     if (excludedNodes != null) {
-      unavailableNodes.addAll(excludedNodes);
+      unavailableNodes.addAll(
+              excludedNodes.stream()
+                      .filter(node->!unavailableNodes.contains(node))
+                      .collect(Collectors.toList()));
     }

Review Comment:
   Since neither `excludedNodes`, nor `unavailableNodes` changes after filling `unavailableNodes` initially, is this second `addAll` call necessary?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] swamirishi commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
swamirishi commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r924832321


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -107,8 +107,12 @@ public List<DatanodeDetails> chooseDatanodes(
     int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
     List<Node> availableNodes = networkTopology.getNodes(
         networkTopology.getMaxLevel());
+    List<Node> unavailableNodes = new ArrayList<>();
     int totalNodesCount = availableNodes.size();
     if (excludedNodes != null) {
+      unavailableNodes.addAll(
+              availableNodes.stream().filter(excludedNodes::contains)

Review Comment:
   The excludeList is passed through ExcludeListProto which basically has an array of strings in protobuf message. This is particularly being changed only on allocate block flow. Changing the message could make it not compatible with older clients which would be an issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] swamirishi commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
swamirishi commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r924994691


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/SCMCommonPlacementPolicy.java:
##########
@@ -127,6 +126,28 @@ public ConfigurationSource getConf() {
    */
   @Override
   public List<DatanodeDetails> chooseDatanodes(
+          List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+          int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+          throws SCMException {
+    return chooseDatanodesInternal(
+            excludedNodes.stream()
+                    .map(node->nodeManager.getNodeByUuid(node.getUuidString()))
+                    .collect(Collectors.toList()),
+            favoredNodes, nodesRequired, metadataSizeRequired, dataSizeRequired);

Review Comment:
   Exclude List comes as part of the ExcludeList Protobuf. Favoured nodes are passed nowhere & is passed as an empty list



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] swamirishi commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
swamirishi commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r925957440


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/SCMCommonPlacementPolicy.java:
##########
@@ -126,7 +127,36 @@ public ConfigurationSource getConf() {
    * @throws SCMException SCM exception.
    */
   @Override
-  public List<DatanodeDetails> chooseDatanodes(
+  public final List<DatanodeDetails> chooseDatanodes(
+          List<DatanodeDetails> excludedNodes,
+          List<DatanodeDetails> favoredNodes,
+          int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+          throws SCMException {
+//    This method calls the chooseDatanodeInternal after fixing
+//    the excludeList to get the DatanodeDetails from the node manager.
+//    Network Topology requires this info to choose a random node.
+    return chooseDatanodesInternal(
+            Objects.isNull(excludedNodes)
+                    ? excludedNodes : excludedNodes.stream()
+                    .map(node -> nodeManager
+                            .getNodeByUuid(node.getUuidString()))

Review Comment:
   Instead of Filtering out. It is better to pass the object. It totally depends on the implementation if it is gonna use NetworkTopology to choose a node.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r924913484


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -107,8 +107,12 @@ public List<DatanodeDetails> chooseDatanodes(
     int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
     List<Node> availableNodes = networkTopology.getNodes(
         networkTopology.getMaxLevel());
+    List<Node> unavailableNodes = new ArrayList<>();
     int totalNodesCount = availableNodes.size();
     if (excludedNodes != null) {
+      unavailableNodes.addAll(
+              availableNodes.stream().filter(excludedNodes::contains)

Review Comment:
   See my later comment about a suggested way forward. I think the solution here will leave other bugs still open in the other placement policies.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r925546946


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/SCMCommonPlacementPolicy.java:
##########
@@ -126,7 +127,36 @@ public ConfigurationSource getConf() {
    * @throws SCMException SCM exception.
    */
   @Override
-  public List<DatanodeDetails> chooseDatanodes(
+  public final List<DatanodeDetails> chooseDatanodes(
+          List<DatanodeDetails> excludedNodes,
+          List<DatanodeDetails> favoredNodes,
+          int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+          throws SCMException {
+//    This method calls the chooseDatanodeInternal after fixing

Review Comment:
   Expand this comment a bit, and also mention this Jira in it please. Eg:
   
   > The nodes in the exclude list are created from the ExcludeList passed from the client. It contains only the DN UUID to exclude and does not include the network location. In order for the node choosing code to work correctly, it needs to know the network location (rack) the node is associated with. Therefore we lookup each excluded node in the NodeManager by its UUID, and if found, use the datanodeDetails instance stored inside NodeManager. If no node is found in NM, then it means SCM does know know about this node, so there is no point in passing it into the downstream methods and it can be filtered out. See HDDS-7015.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r924975687


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/SCMCommonPlacementPolicy.java:
##########
@@ -127,6 +126,28 @@ public ConfigurationSource getConf() {
    */
   @Override
   public List<DatanodeDetails> chooseDatanodes(
+          List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+          int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+          throws SCMException {
+    return chooseDatanodesInternal(
+            excludedNodes.stream()
+                    .map(node->nodeManager.getNodeByUuid(node.getUuidString()))
+                    .collect(Collectors.toList()),
+            favoredNodes, nodesRequired, metadataSizeRequired, dataSizeRequired);

Review Comment:
   Do we need to do something similar for favoredNodes as they are probably passed from client to server over protobuf too. Not sure if they end up in the topology class or not though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r924974013


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/SCMCommonPlacementPolicy.java:
##########
@@ -127,6 +126,28 @@ public ConfigurationSource getConf() {
    */
   @Override
   public List<DatanodeDetails> chooseDatanodes(
+          List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+          int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+          throws SCMException {
+    return chooseDatanodesInternal(
+            excludedNodes.stream()

Review Comment:
   Probably need a comment here explaining why we are doing the transformation on the excludedNodes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r924372689


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java:
##########
@@ -107,8 +107,12 @@ public List<DatanodeDetails> chooseDatanodes(
     int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
     List<Node> availableNodes = networkTopology.getNodes(
         networkTopology.getMaxLevel());
+    List<Node> unavailableNodes = new ArrayList<>();
     int totalNodesCount = availableNodes.size();
     if (excludedNodes != null) {
+      unavailableNodes.addAll(
+              availableNodes.stream().filter(excludedNodes::contains)

Review Comment:
   I understand now what is going on here - the network location is not set inside the DatanodeDetails and therefore it doesn't work correctly in the topology code. So you are looking up the real datanode details if they are present in the available nodes to get a DNDetails object that is fully populated.
   
   I immediately wonder how the other placement policies behave with excluded nodes? We need to check if they have the same problem and fix them too.
   
   However I am not sure if this fix is the best - should we not ensure that the excluded nodes passed in have the location setup correctly inside the DatanodeDetails? Or check for the nodes in NodeManager when inflating the protobuf?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r925577072


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementCapacity.java:
##########
@@ -98,7 +98,7 @@ public SCMContainerPlacementCapacity(final NodeManager nodeManager,
    * @throws SCMException  SCMException
    */
   @Override
-  public List<DatanodeDetails> chooseDatanodes(
+  protected List<DatanodeDetails> chooseDatanodesInternal(

Review Comment:
   At line 105, I think you need to change `super.chooseDatanodes` to `super.chooseDatanodesInternal()`.
   
   Also to get the test to pass, I needed to add this to TestSCMContainerPlacementCapacity:
   
   ```
       when(mockNodeManager.getNodeByUuid(anyString())).thenAnswer(
           invocation -> {
             String uuid = invocation.getArgument(0);
             return datanodes.stream().filter(
                 datanode -> datanode.getUuid().toString().equals(uuid)).findFirst()
                     .orElse(null);
           });
   ```
   
   I suspect the other test classes for the other policies need something similar.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r926014962


##########
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestSCMContainerPlacementRackAware.java:
##########
@@ -107,24 +107,23 @@ private void setup(int datanodeCount) {
       DatanodeDetails datanodeDetails =
           MockDatanodeDetails.createDatanodeDetails(
           hostname + i, rack + (i / NODE_PER_RACK));
+      datanodes.add(datanodeDetails);
+      cluster.add(datanodeDetails);
       DatanodeInfo datanodeInfo = new DatanodeInfo(
-          datanodeDetails, NodeStatus.inServiceHealthy(),
-          UpgradeUtils.defaultLayoutVersionProto());
+              datanodeDetails, NodeStatus.inServiceHealthy(),

Review Comment:
   Can you check the indentation on these lines - it looks wrong, although checkstyle has passed it. I think most of the lines changed from here to the bottom of the PR are all indented more than they need to be.
   
   BTW, you can run checkstyle locally to see if the style is correct if you change the indentation here. From the root of the repo run:
   
   ```
   ./hadoop-ozone/dev-support/checks/checkstyle.sh
   ```
   
   Its the same in TestSCMContainerPlacementRackScatter.java too.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3606: HDDS-7015. Failures while choosing nodes for creating container

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3606:
URL: https://github.com/apache/ozone/pull/3606#discussion_r924973597


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/SCMCommonPlacementPolicy.java:
##########
@@ -127,6 +126,28 @@ public ConfigurationSource getConf() {
    */
   @Override
   public List<DatanodeDetails> chooseDatanodes(
+          List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,

Review Comment:
   I think we need to make this method final so it is not mistakenly overridden in a child class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org