You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/08/04 08:59:35 UTC

[GitHub] [ozone] symious opened a new pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

symious opened a new pull request #2497:
URL: https://github.com/apache/ozone/pull/2497


   ## What changes were proposed in this pull request?
   
   One strategy for EC is to spread its replicas across as many racks as possible in the cluster to allow for multiple racks going down without affecting availability. This Jira is to implement an "As Many Racks As Possible" placement policy. While it is intended for EC, it could also be used by Ratis containers.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5326
   
   ## How was this patch tested?
   
   unit test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689581477



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);

Review comment:
       This will be executed on a retry pass. I have a couple of questions:
   
   1. Do we need to remove all the racks in chosenNodes here?
   
   2. When do we expect the loop to run again for a retry? Is this related to fallback, or do we just retry a few times if we don't find enough nodes on unique racks?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689531833



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRack.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRack.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      List<Node> chosenRacksInForLoop = new ArrayList<>();
+      for (Node rack : toChooseRack) {
+        if (((InnerNode)rack).getNodes(2).size() > 0) {
+          Node affinityNode = ((InnerNode)rack).getNodes(2).get(0);
+          Node node = chooseNode(unavailableNodes, affinityNode,
+              metadataSizeRequired, dataSizeRequired);
+          if (node != null) {
+            chosenNodes.add(node);
+            mutableFavoredNodes.remove(node);
+            unavailableNodes.add(node);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        chosenRacksInForLoop.add(rack);
+      }
+      toChooseRack.removeAll(chosenRacksInForLoop);
+      if (chosenListSize == chosenNodes.size()) {
+        retryCount++;
+      } else {
+        retryCount = 0;
+      }
+    }
+
+    return Arrays.asList(chosenNodes.toArray(new DatanodeDetails[0]));
+  }
+
+  @Override
+  public DatanodeDetails chooseNode(List<DatanodeDetails> healthyNodes) {
+    return null;
+  }
+
+  /**
+   * Choose a datanode which meets the requirements. If there is no node which
+   * meets all the requirements, there is fallback chosen process depending on
+   * whether fallback is allowed when this class is instantiated.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to excluded. Can be null.
+   * @param affinityNode - the chosen nodes should be on the same rack as
+   *                    affinityNode. Can be null.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of chosen datanodes.
+   * @throws SCMException  SCMException
+   */
+  private Node chooseNode(List<Node> excludedNodes, Node affinityNode,
+      long metadataSizeRequired, long dataSizeRequired) throws SCMException {
+    int ancestorGen = RACK_LEVEL;
+    int maxRetry = MAX_RETRY;
+    List<String> excludedNodesForCapacity = null;
+    while(true) {
+      metrics.incrDatanodeChooseAttemptCount();
+      Node node = networkTopology.chooseRandom(NetConstants.ROOT,
+          excludedNodesForCapacity, excludedNodes, affinityNode, ancestorGen);
+      if (node == null) {
+        // cannot find the node which meets all constrains
+        LOG.warn("Failed to find the datanode for container. excludedNodes:" +
+            (excludedNodes == null ? "" : excludedNodes.toString()) +
+            ", affinityNode:" +
+            (affinityNode == null ? "" : affinityNode.getNetworkFullPath()));
+        return null;
+      }
+
+      DatanodeDetails datanodeDetails = (DatanodeDetails)node;
+      if (isValidNode(datanodeDetails, metadataSizeRequired,
+          dataSizeRequired)) {
+        return node;
+      }
+
+      maxRetry--;
+      if (maxRetry == 0) {
+        // avoid the infinite loop
+        String errMsg = "No satisfied datanode to meet the space constrains. "
+            + "metadatadata size required: " + metadataSizeRequired +
+            " data size required: " + dataSizeRequired;
+        LOG.info(errMsg);
+        return null;
+      }
+      if (excludedNodesForCapacity == null) {
+        excludedNodesForCapacity = new ArrayList<>();
+      }
+      excludedNodesForCapacity.add(node.getNetworkFullPath());
+    }
+  }
+
+  /**
+   * This default implementation handles rack aware policies and non rack
+   * aware policies. If a future placement policy needs to check more than racks
+   * to validate the policy (eg node groups, HDFS like upgrade domain) this
+   * method should be overridden in the sub class.
+   * This method requires that subclasses which implement rack aware policies
+   * override the default method getRequiredRackCount and getNetworkTopology.
+   * @param dns List of datanodes holding a replica of the container
+   * @param replicas The expected number of replicas
+   * @return ContainerPlacementStatus indicating if the placement policy is
+   *         met or not. Not this only considers the rack count and not the
+   *         number of replicas.
+   */
+  @Override
+  public ContainerPlacementStatus validateContainerPlacement(
+      List<DatanodeDetails> dns, int replicas) {
+    int requiredRacks = replicas;
+    if (networkTopology == null || replicas == 1 || requiredRacks == 1) {
+      return super.validateContainerPlacement(dns, replicas);
+    }
+    // We have a network topology so calculate if it is satisfied or not.
+    int numRacks = 1;
+    final int maxLevel = networkTopology.getMaxLevel();
+    // The leaf nodes are all at max level, so the number of nodes at
+    // leafLevel - 1 is the rack count
+    numRacks = networkTopology.getNumOfNodes(maxLevel - 1);
+    final long currentRackCount = dns.stream()
+        .map(d -> networkTopology.getAncestor(d, 1))
+        .distinct()
+        .count();
+
+    if (replicas > numRacks) {
+      requiredRacks = numRacks;
+    }
+    return new ContainerPlacementStatusDefault(
+        (int)currentRackCount, requiredRacks, numRacks);
+  }
+
+  private Node getRackOfDatanodeDetails(DatanodeDetails datanodeDetails) {
+    String location = datanodeDetails.getNetworkLocation();
+    return networkTopology.getNode(location);
+  }
+
+  private List<Node> sortRackWithExcludedNodes(List<Node> racks,
+      List<DatanodeDetails> excludedNodes) {
+    Set<Node> lessPreferedRacks = excludedNodes.stream()
+        .map(node -> networkTopology.getAncestor(node, RACK_LEVEL))
+        .collect(Collectors.toSet());
+    List <Node> result = new ArrayList<>();
+    for (Node rack : racks) {
+      if (!lessPreferedRacks.contains(rack)) {
+        result.add(rack);
+      }
+    }
+    result.addAll(lessPreferedRacks);
+    return result;
+  }
+
+  private List<Node> getAllRacks() {
+    int rackLevel = networkTopology.getMaxLevel()-1;
+    return networkTopology.getNodes(rackLevel);
+  }
+
+  private boolean isValidNode(DatanodeDetails datanodeDetails,

Review comment:
       The logic here is very similar to that in `SCMContainerPlacementRackAware.chooseNode()` - do you think we could add this method (`isValidNode()`) to `SCMCommonPlacmentPolicy` and also refactor the class above to use it too?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689535562



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRack.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRack.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      List<Node> chosenRacksInForLoop = new ArrayList<>();
+      for (Node rack : toChooseRack) {
+        if (((InnerNode)rack).getNodes(2).size() > 0) {
+          Node affinityNode = ((InnerNode)rack).getNodes(2).get(0);
+          Node node = chooseNode(unavailableNodes, affinityNode,
+              metadataSizeRequired, dataSizeRequired);
+          if (node != null) {
+            chosenNodes.add(node);
+            mutableFavoredNodes.remove(node);
+            unavailableNodes.add(node);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        chosenRacksInForLoop.add(rack);
+      }
+      toChooseRack.removeAll(chosenRacksInForLoop);
+      if (chosenListSize == chosenNodes.size()) {
+        retryCount++;
+      } else {
+        retryCount = 0;
+      }
+    }
+
+    return Arrays.asList(chosenNodes.toArray(new DatanodeDetails[0]));
+  }
+
+  @Override
+  public DatanodeDetails chooseNode(List<DatanodeDetails> healthyNodes) {
+    return null;
+  }
+
+  /**
+   * Choose a datanode which meets the requirements. If there is no node which
+   * meets all the requirements, there is fallback chosen process depending on
+   * whether fallback is allowed when this class is instantiated.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to excluded. Can be null.
+   * @param affinityNode - the chosen nodes should be on the same rack as
+   *                    affinityNode. Can be null.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of chosen datanodes.
+   * @throws SCMException  SCMException
+   */
+  private Node chooseNode(List<Node> excludedNodes, Node affinityNode,
+      long metadataSizeRequired, long dataSizeRequired) throws SCMException {
+    int ancestorGen = RACK_LEVEL;
+    int maxRetry = MAX_RETRY;
+    List<String> excludedNodesForCapacity = null;
+    while(true) {
+      metrics.incrDatanodeChooseAttemptCount();
+      Node node = networkTopology.chooseRandom(NetConstants.ROOT,
+          excludedNodesForCapacity, excludedNodes, affinityNode, ancestorGen);
+      if (node == null) {
+        // cannot find the node which meets all constrains
+        LOG.warn("Failed to find the datanode for container. excludedNodes:" +
+            (excludedNodes == null ? "" : excludedNodes.toString()) +
+            ", affinityNode:" +
+            (affinityNode == null ? "" : affinityNode.getNetworkFullPath()));
+        return null;
+      }
+
+      DatanodeDetails datanodeDetails = (DatanodeDetails)node;
+      if (isValidNode(datanodeDetails, metadataSizeRequired,
+          dataSizeRequired)) {
+        return node;
+      }
+
+      maxRetry--;
+      if (maxRetry == 0) {
+        // avoid the infinite loop
+        String errMsg = "No satisfied datanode to meet the space constrains. "
+            + "metadatadata size required: " + metadataSizeRequired +
+            " data size required: " + dataSizeRequired;
+        LOG.info(errMsg);
+        return null;
+      }
+      if (excludedNodesForCapacity == null) {
+        excludedNodesForCapacity = new ArrayList<>();
+      }
+      excludedNodesForCapacity.add(node.getNetworkFullPath());
+    }
+  }
+
+  /**
+   * This default implementation handles rack aware policies and non rack
+   * aware policies. If a future placement policy needs to check more than racks
+   * to validate the policy (eg node groups, HDFS like upgrade domain) this
+   * method should be overridden in the sub class.
+   * This method requires that subclasses which implement rack aware policies
+   * override the default method getRequiredRackCount and getNetworkTopology.
+   * @param dns List of datanodes holding a replica of the container
+   * @param replicas The expected number of replicas
+   * @return ContainerPlacementStatus indicating if the placement policy is
+   *         met or not. Not this only considers the rack count and not the
+   *         number of replicas.
+   */
+  @Override
+  public ContainerPlacementStatus validateContainerPlacement(
+      List<DatanodeDetails> dns, int replicas) {
+    int requiredRacks = replicas;
+    if (networkTopology == null || replicas == 1 || requiredRacks == 1) {
+      return super.validateContainerPlacement(dns, replicas);
+    }
+    // We have a network topology so calculate if it is satisfied or not.
+    int numRacks = 1;
+    final int maxLevel = networkTopology.getMaxLevel();
+    // The leaf nodes are all at max level, so the number of nodes at
+    // leafLevel - 1 is the rack count
+    numRacks = networkTopology.getNumOfNodes(maxLevel - 1);
+    final long currentRackCount = dns.stream()
+        .map(d -> networkTopology.getAncestor(d, 1))
+        .distinct()
+        .count();
+
+    if (replicas > numRacks) {
+      requiredRacks = numRacks;
+    }
+    return new ContainerPlacementStatusDefault(
+        (int)currentRackCount, requiredRacks, numRacks);
+  }
+
+  private Node getRackOfDatanodeDetails(DatanodeDetails datanodeDetails) {
+    String location = datanodeDetails.getNetworkLocation();
+    return networkTopology.getNode(location);
+  }
+
+  private List<Node> sortRackWithExcludedNodes(List<Node> racks,

Review comment:
       Is this method creating a list of all racks, but ensuring the rack associated with any excluded nodes are the at the end of the list? It would be worth a javadoc comment explaining what it is doing I think, as it took me a minute or two to figure it out :-) 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#issuecomment-902659666


   @sodonnel Added shuffle for favoredNodes, choose skipped racks first for even distribution. Could you help to check?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r696381971



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -205,8 +206,16 @@ public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
         retryCount = 0;
       }
     }
-
-    return Arrays.asList(chosenNodes.toArray(new DatanodeDetails[0]));
+    List<DatanodeDetails> results = Arrays.asList(
+        chosenNodes.toArray(new DatanodeDetails[0]));
+    ContainerPlacementStatus placementStatus =
+        validateContainerPlacement(results, nodesRequired);
+    if (!placementStatus.isPolicySatisfied()) {
+      LOG.warn("ContainerPlacementPolicy not met, currentRacks is " +
+          placementStatus.actualPlacementCount() + ", desired racks is " +
+          placementStatus.expectedPlacementCount() + ".");

Review comment:
       Sure.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#issuecomment-901129410


   @sodonnel Sure, thanks for the commits: )


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r693907069



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,332 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  // OUTER_LOOP is to avoid infinity choose on all racks
+  private static final int OUTER_LOOP_MAX_RETRY= 3;
+  // INNER_LOOP is to choose node in each rack
+  private static final int INNER_LOOP_MAX_RETRY= 5;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+      Collections.shuffle(mutableFavoredNodes);
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRacks = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    Set<Node> skippedRacks = new HashSet<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > OUTER_LOOP_MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      // Refill toChooseRacks, we put skippedRacks in front of toChooseRacks
+      // for a even distribution
+      if (toChooseRacks.size() == 0) {
+        toChooseRacks.addAll(racks);
+        if (!skippedRacks.isEmpty()) {
+          toChooseRacks.removeAll(skippedRacks);
+          toChooseRacks.addAll(0, skippedRacks);
+          skippedRacks.clear();
+        }
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRacks.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRacks.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      List<Node> chosenRacksInForLoop = new ArrayList<>();
+      for (Node rack : toChooseRacks) {
+        List<Node> nodes = ((InnerNode)rack).getNodes(2);

Review comment:
       Here we are getting the list of nodes on a rack to do two things:
   
   1. Check there are more than zero nodes on the rack.
   2. Get the affinity node.
   
   We later call `NetworkTopology.chooseRandom(...)` with the affinity node.
   
   Could we just set the scope to the rack we are working on, and set affinityNode to null and let NetworkTopology give us a node or null? This would save allocating a list for all the nodes on a rack for each rack we check.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689583420



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();

Review comment:
       I wonder if getAllRacks() needs to shuffle the available racks. With the code as it is now and ignoring favoredNodes - I think we will allocate a node from the first rack in this list first each time. If there are 20 racks, and have EC 6-3, will just the first 9 racks get used if we do no shuffle?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel merged pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel merged pull request #2497:
URL: https://github.com/apache/ozone/pull/2497


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689584980



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {

Review comment:
       Should we shuffle mutableFavoredNodes when it is created? If the client keeps passing the same list of favoredNodes, then we will always place the first replicate on the first one in this list. It would probably be better to shuffle them.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r692148970



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {

Review comment:
       I am not sure if the order is for a preference either. I suspect favoredNodes is rarely used so its hard to know for sure how it was intended to be used.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689555189



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRack.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRack.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      List<Node> chosenRacksInForLoop = new ArrayList<>();
+      for (Node rack : toChooseRack) {
+        if (((InnerNode)rack).getNodes(2).size() > 0) {

Review comment:
       Calling `(InnerNode)rack).getNodes(2)` twice in a row here, which will form a new list each time. We should probably make a single call and store the result in a variable and then would only create a single list?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#issuecomment-899957097


   @sodonnel Thanks for the suggestions. 
   I will first add two other tickets, one for NetworkTopology, the other for SCMPlacementPolicy refactor.
   Then I will update this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689699159



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);

Review comment:
       What I meant is, do we need to remove all the racks already used in choosenNodes from toChooseRack after we all all the racks to it, otherwise we might use the same rack twice? There are two potential problems here - one is that we did not find all the nodes on the first pass over all racks, perhaps as there were not enough nodes with free space selected randomly from a certain rack. However if we tried again, we might get some good nodes and obtain the correct number of racks.
   
   The other problem is that there might not be enough racks to meet the goal, and therefore we cannot remove them from the list.
   
   I **think** we might see some corner cases where we don't select enough racks even though we should be able too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#issuecomment-906250399


   Thanks for the changes. I will commit this after we get a green CI run. I just re-triggered the jobs as acceptance failed on the last run.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#issuecomment-901129410


   @sodonnel Sure, thanks for the commits: )


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r693912812



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,332 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  // OUTER_LOOP is to avoid infinity choose on all racks
+  private static final int OUTER_LOOP_MAX_RETRY= 3;
+  // INNER_LOOP is to choose node in each rack
+  private static final int INNER_LOOP_MAX_RETRY= 5;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+      Collections.shuffle(mutableFavoredNodes);
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRacks = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    Set<Node> skippedRacks = new HashSet<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > OUTER_LOOP_MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      // Refill toChooseRacks, we put skippedRacks in front of toChooseRacks
+      // for a even distribution
+      if (toChooseRacks.size() == 0) {
+        toChooseRacks.addAll(racks);
+        if (!skippedRacks.isEmpty()) {
+          toChooseRacks.removeAll(skippedRacks);
+          toChooseRacks.addAll(0, skippedRacks);
+          skippedRacks.clear();
+        }
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRacks.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRacks.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      List<Node> chosenRacksInForLoop = new ArrayList<>();
+      for (Node rack : toChooseRacks) {
+        List<Node> nodes = ((InnerNode)rack).getNodes(2);
+        if (nodes.size() > 0) {
+          Node affinityNode = nodes.get(0);
+          Node node = chooseNode(unavailableNodes, affinityNode,
+              metadataSizeRequired, dataSizeRequired);
+          if (node != null) {
+            chosenNodes.add(node);
+            mutableFavoredNodes.remove(node);
+            unavailableNodes.add(node);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        } else {
+          // Store the skipped racks to check them first in next outer loop
+          skippedRacks.add(rack);
+        }
+        chosenRacksInForLoop.add(rack);
+      }
+      toChooseRacks.removeAll(chosenRacksInForLoop);
+      // If chosenNodes not changed, increase the retryCount
+      if (chosenListSize == chosenNodes.size()) {
+        retryCount++;
+      } else {
+        // If chosenNodes changed, reset the retryCount
+        retryCount = 0;
+      }
+    }
+

Review comment:
       I wonder if we should log a warning if we return a list of nodes where the number of racks is less than the value returned from the `requiredRacks()` method? That would indicate the cluster has enough racks, but for some reason, we have not been able to use enough racks to meet the requirement.
   
   We might need to keep a set of racks select as each node is picked to do this efficiently.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#issuecomment-901395097


   I have merged master from today into the EC branch. Please rebase this one now and we can continue the good work on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r696354247



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,324 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  // OUTER_LOOP is to avoid infinity choose on all racks
+  private static final int OUTER_LOOP_MAX_RETRY= 3;
+  // INNER_LOOP is to choose node in each rack
+  private static final int INNER_LOOP_MAX_RETRY= 5;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+      Collections.shuffle(mutableFavoredNodes);
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRacks = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    Set<Node> skippedRacks = new HashSet<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > OUTER_LOOP_MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      // Refill toChooseRacks, we put skippedRacks in front of toChooseRacks
+      // for a even distribution
+      toChooseRacks.addAll(racks);
+      if (!skippedRacks.isEmpty()) {
+        toChooseRacks.removeAll(skippedRacks);
+        toChooseRacks.addAll(0, skippedRacks);
+        skippedRacks.clear();
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRacks.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRacks.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      for (Node rack : toChooseRacks) {
+        Node node = chooseNode(rack.getNetworkFullPath(), unavailableNodes,
+            metadataSizeRequired, dataSizeRequired);
+        if (node != null) {
+          chosenNodes.add(node);
+          mutableFavoredNodes.remove(node);
+          unavailableNodes.add(node);
+          nodesRequired--;
+          if (nodesRequired == 0) {
+            break;
+          }
+        } else {
+          // Store the skipped racks to check them first in next outer loop
+          skippedRacks.add(rack);
+        }
+      }
+      // Clear toChooseRacks for this loop
+      toChooseRacks.clear();
+
+      // If chosenNodes not changed, increase the retryCount
+      if (chosenListSize == chosenNodes.size()) {
+        retryCount++;
+      } else {
+        // If chosenNodes changed, reset the retryCount
+        retryCount = 0;
+      }
+    }
+    List<DatanodeDetails> results = Arrays.asList(

Review comment:
       One other thing to check which I just noticed. Could we make ChosenNodes a `List<DatanodeDetails>` rather than `List<Node>` to avoid having to copy it into a new array at the end here? Would that work with just a simple change to the definition?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689690973



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRack.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRack.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      List<Node> chosenRacksInForLoop = new ArrayList<>();
+      for (Node rack : toChooseRack) {
+        if (((InnerNode)rack).getNodes(2).size() > 0) {
+          Node affinityNode = ((InnerNode)rack).getNodes(2).get(0);
+          Node node = chooseNode(unavailableNodes, affinityNode,
+              metadataSizeRequired, dataSizeRequired);
+          if (node != null) {
+            chosenNodes.add(node);
+            mutableFavoredNodes.remove(node);
+            unavailableNodes.add(node);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        chosenRacksInForLoop.add(rack);
+      }
+      toChooseRack.removeAll(chosenRacksInForLoop);
+      if (chosenListSize == chosenNodes.size()) {
+        retryCount++;
+      } else {
+        retryCount = 0;
+      }
+    }
+
+    return Arrays.asList(chosenNodes.toArray(new DatanodeDetails[0]));
+  }
+
+  @Override
+  public DatanodeDetails chooseNode(List<DatanodeDetails> healthyNodes) {
+    return null;
+  }
+
+  /**
+   * Choose a datanode which meets the requirements. If there is no node which
+   * meets all the requirements, there is fallback chosen process depending on
+   * whether fallback is allowed when this class is instantiated.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to excluded. Can be null.
+   * @param affinityNode - the chosen nodes should be on the same rack as
+   *                    affinityNode. Can be null.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of chosen datanodes.
+   * @throws SCMException  SCMException
+   */
+  private Node chooseNode(List<Node> excludedNodes, Node affinityNode,
+      long metadataSizeRequired, long dataSizeRequired) throws SCMException {
+    int ancestorGen = RACK_LEVEL;
+    int maxRetry = MAX_RETRY;
+    List<String> excludedNodesForCapacity = null;
+    while(true) {
+      metrics.incrDatanodeChooseAttemptCount();
+      Node node = networkTopology.chooseRandom(NetConstants.ROOT,
+          excludedNodesForCapacity, excludedNodes, affinityNode, ancestorGen);
+      if (node == null) {
+        // cannot find the node which meets all constrains
+        LOG.warn("Failed to find the datanode for container. excludedNodes:" +
+            (excludedNodes == null ? "" : excludedNodes.toString()) +
+            ", affinityNode:" +
+            (affinityNode == null ? "" : affinityNode.getNetworkFullPath()));
+        return null;
+      }
+
+      DatanodeDetails datanodeDetails = (DatanodeDetails)node;
+      if (isValidNode(datanodeDetails, metadataSizeRequired,
+          dataSizeRequired)) {
+        return node;
+      }
+
+      maxRetry--;
+      if (maxRetry == 0) {
+        // avoid the infinite loop
+        String errMsg = "No satisfied datanode to meet the space constrains. "
+            + "metadatadata size required: " + metadataSizeRequired +
+            " data size required: " + dataSizeRequired;
+        LOG.info(errMsg);
+        return null;
+      }
+      if (excludedNodesForCapacity == null) {
+        excludedNodesForCapacity = new ArrayList<>();
+      }
+      excludedNodesForCapacity.add(node.getNetworkFullPath());
+    }
+  }
+
+  /**
+   * This default implementation handles rack aware policies and non rack
+   * aware policies. If a future placement policy needs to check more than racks
+   * to validate the policy (eg node groups, HDFS like upgrade domain) this
+   * method should be overridden in the sub class.
+   * This method requires that subclasses which implement rack aware policies
+   * override the default method getRequiredRackCount and getNetworkTopology.
+   * @param dns List of datanodes holding a replica of the container
+   * @param replicas The expected number of replicas
+   * @return ContainerPlacementStatus indicating if the placement policy is
+   *         met or not. Not this only considers the rack count and not the
+   *         number of replicas.
+   */
+  @Override

Review comment:
       One other suggestion here, which I should have mentioned before. It would be a good idea to make this change on master branch first, and then we can merge master to the EC branch. That way, we will avoid potential conflicts in the future if someone makes some further changes related to this method. Same for the `isValidNode()` suggestion. We could make these two changes in a single PR against master. We have this Jira HDDS-5351 where we have done similar things already when developing EC.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#issuecomment-906121283


   @sodonnel Thanks for your reviews. Have added a warning for the cases, could you help to check?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689598145



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRack.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRack.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      List<Node> chosenRacksInForLoop = new ArrayList<>();
+      for (Node rack : toChooseRack) {
+        if (((InnerNode)rack).getNodes(2).size() > 0) {
+          Node affinityNode = ((InnerNode)rack).getNodes(2).get(0);
+          Node node = chooseNode(unavailableNodes, affinityNode,
+              metadataSizeRequired, dataSizeRequired);
+          if (node != null) {
+            chosenNodes.add(node);
+            mutableFavoredNodes.remove(node);
+            unavailableNodes.add(node);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        chosenRacksInForLoop.add(rack);
+      }
+      toChooseRack.removeAll(chosenRacksInForLoop);
+      if (chosenListSize == chosenNodes.size()) {
+        retryCount++;
+      } else {
+        retryCount = 0;
+      }
+    }
+
+    return Arrays.asList(chosenNodes.toArray(new DatanodeDetails[0]));
+  }
+
+  @Override
+  public DatanodeDetails chooseNode(List<DatanodeDetails> healthyNodes) {
+    return null;
+  }
+
+  /**
+   * Choose a datanode which meets the requirements. If there is no node which
+   * meets all the requirements, there is fallback chosen process depending on
+   * whether fallback is allowed when this class is instantiated.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to excluded. Can be null.
+   * @param affinityNode - the chosen nodes should be on the same rack as
+   *                    affinityNode. Can be null.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of chosen datanodes.
+   * @throws SCMException  SCMException
+   */
+  private Node chooseNode(List<Node> excludedNodes, Node affinityNode,
+      long metadataSizeRequired, long dataSizeRequired) throws SCMException {
+    int ancestorGen = RACK_LEVEL;
+    int maxRetry = MAX_RETRY;
+    List<String> excludedNodesForCapacity = null;
+    while(true) {
+      metrics.incrDatanodeChooseAttemptCount();
+      Node node = networkTopology.chooseRandom(NetConstants.ROOT,
+          excludedNodesForCapacity, excludedNodes, affinityNode, ancestorGen);
+      if (node == null) {
+        // cannot find the node which meets all constrains
+        LOG.warn("Failed to find the datanode for container. excludedNodes:" +
+            (excludedNodes == null ? "" : excludedNodes.toString()) +
+            ", affinityNode:" +
+            (affinityNode == null ? "" : affinityNode.getNetworkFullPath()));
+        return null;
+      }
+
+      DatanodeDetails datanodeDetails = (DatanodeDetails)node;
+      if (isValidNode(datanodeDetails, metadataSizeRequired,
+          dataSizeRequired)) {
+        return node;
+      }
+
+      maxRetry--;
+      if (maxRetry == 0) {
+        // avoid the infinite loop
+        String errMsg = "No satisfied datanode to meet the space constrains. "
+            + "metadatadata size required: " + metadataSizeRequired +
+            " data size required: " + dataSizeRequired;
+        LOG.info(errMsg);
+        return null;
+      }
+      if (excludedNodesForCapacity == null) {

Review comment:
       Should a node which is fails the `isValidNode()` check, be added to the excludedNode list too? This means that node will not be retried again if this rack is considered again in a retry?
   
   Lets say we have a rack with 10 nodes, but 8 are over capacity. 2 are good. There is a fairly good chance that in 3 retries, we will not find either of the good nodes. Then the outer loop in `chooseDatanodes` may consider this rack again, and it could fall into the same problem. However if we add the nodes to the exclude list, that list will be passed back into this method on a retry in the outer loop and we have a better chance of finding the nodes with capacity on the second or third pass.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689617353



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);

Review comment:
       > Do we need to remove all the racks in chosenNodes here?
   
   I think we don't need to remove the content of chosenNodes. The size of chosenNodes could be used the check if it's a retry or not.
   
   > When do we expect the loop to run again for a retry? Is this related to fallback, or do we just retry a few times if we don't find enough nodes on unique racks?
   
   If the size of chosenNodes doesn't change, we treat it as a retry, and if the size doesn't change for RETRY_MAX times, then we return with exception.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r693945140



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,332 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  // OUTER_LOOP is to avoid infinity choose on all racks
+  private static final int OUTER_LOOP_MAX_RETRY= 3;
+  // INNER_LOOP is to choose node in each rack
+  private static final int INNER_LOOP_MAX_RETRY= 5;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+      Collections.shuffle(mutableFavoredNodes);
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRacks = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    Set<Node> skippedRacks = new HashSet<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > OUTER_LOOP_MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      // Refill toChooseRacks, we put skippedRacks in front of toChooseRacks
+      // for a even distribution
+      if (toChooseRacks.size() == 0) {

Review comment:
       We only refill `toChooseRacks` if `toChooseRacks.size() == 0` and we only remove a value from it when we successfully pick a node from a rack. So after pass 1 in my example, there will be 6 nodes left in `toChooseRacks` and the same 6 in skipped, and we will just keep retrying the same skipped list. Here is where we remove from toChooseRacks:
   
   ```
   if (nodes.size() > 0) {
             Node affinityNode = nodes.get(0);
             Node node = chooseNode(unavailableNodes, affinityNode,
                 metadataSizeRequired, dataSizeRequired);
             if (node != null) {
               chosenNodes.add(node);
               mutableFavoredNodes.remove(node);
               unavailableNodes.add(node);
               nodesRequired--;
               if (nodesRequired == 0) {
                 break;
               }
             }
           } else {
             // Store the skipped racks to check them first in next outer loop
             skippedRacks.add(rack);
           }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#issuecomment-899571682


   @symious Thanks for this change. I have a few comments and questions inline through the code.
   
   One question - lets say we have a cluster with 3 racks and EC-3-2 needs 5 racks. Will the logic here handle that, and allocate something like below?
   
   Rack1 - replica_1, replica_4
   Rack2 - replica_2, replica_5
   Rack3 - replica_3


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#issuecomment-894173093


   @sodonnel @umamaheswararao  Could you help to review this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r693931115



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,332 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  // OUTER_LOOP is to avoid infinity choose on all racks
+  private static final int OUTER_LOOP_MAX_RETRY= 3;
+  // INNER_LOOP is to choose node in each rack
+  private static final int INNER_LOOP_MAX_RETRY= 5;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+      Collections.shuffle(mutableFavoredNodes);
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRacks = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    Set<Node> skippedRacks = new HashSet<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > OUTER_LOOP_MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      // Refill toChooseRacks, we put skippedRacks in front of toChooseRacks
+      // for a even distribution
+      if (toChooseRacks.size() == 0) {

Review comment:
       I think that is the current implementation?
   First pass: 4 racks chosen, 6 racks added to skipped Racks.
   Second pass: 6 racks in skipped racks added in front of toChooseRacks, but the first  6 racks still got no result, so we choose the 5 rack, which will be same as one of the first 4 racks chosen in first pass.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689525798



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRack.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRack.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      List<Node> chosenRacksInForLoop = new ArrayList<>();
+      for (Node rack : toChooseRack) {
+        if (((InnerNode)rack).getNodes(2).size() > 0) {
+          Node affinityNode = ((InnerNode)rack).getNodes(2).get(0);
+          Node node = chooseNode(unavailableNodes, affinityNode,
+              metadataSizeRequired, dataSizeRequired);
+          if (node != null) {
+            chosenNodes.add(node);
+            mutableFavoredNodes.remove(node);
+            unavailableNodes.add(node);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        chosenRacksInForLoop.add(rack);
+      }
+      toChooseRack.removeAll(chosenRacksInForLoop);
+      if (chosenListSize == chosenNodes.size()) {
+        retryCount++;
+      } else {
+        retryCount = 0;
+      }
+    }
+
+    return Arrays.asList(chosenNodes.toArray(new DatanodeDetails[0]));
+  }
+
+  @Override
+  public DatanodeDetails chooseNode(List<DatanodeDetails> healthyNodes) {
+    return null;
+  }
+
+  /**
+   * Choose a datanode which meets the requirements. If there is no node which
+   * meets all the requirements, there is fallback chosen process depending on
+   * whether fallback is allowed when this class is instantiated.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to excluded. Can be null.
+   * @param affinityNode - the chosen nodes should be on the same rack as
+   *                    affinityNode. Can be null.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of chosen datanodes.
+   * @throws SCMException  SCMException
+   */
+  private Node chooseNode(List<Node> excludedNodes, Node affinityNode,
+      long metadataSizeRequired, long dataSizeRequired) throws SCMException {
+    int ancestorGen = RACK_LEVEL;
+    int maxRetry = MAX_RETRY;
+    List<String> excludedNodesForCapacity = null;
+    while(true) {
+      metrics.incrDatanodeChooseAttemptCount();
+      Node node = networkTopology.chooseRandom(NetConstants.ROOT,
+          excludedNodesForCapacity, excludedNodes, affinityNode, ancestorGen);
+      if (node == null) {
+        // cannot find the node which meets all constrains
+        LOG.warn("Failed to find the datanode for container. excludedNodes:" +
+            (excludedNodes == null ? "" : excludedNodes.toString()) +
+            ", affinityNode:" +
+            (affinityNode == null ? "" : affinityNode.getNetworkFullPath()));
+        return null;
+      }
+
+      DatanodeDetails datanodeDetails = (DatanodeDetails)node;
+      if (isValidNode(datanodeDetails, metadataSizeRequired,
+          dataSizeRequired)) {
+        return node;
+      }
+
+      maxRetry--;
+      if (maxRetry == 0) {
+        // avoid the infinite loop
+        String errMsg = "No satisfied datanode to meet the space constrains. "
+            + "metadatadata size required: " + metadataSizeRequired +
+            " data size required: " + dataSizeRequired;
+        LOG.info(errMsg);
+        return null;
+      }
+      if (excludedNodesForCapacity == null) {
+        excludedNodesForCapacity = new ArrayList<>();
+      }
+      excludedNodesForCapacity.add(node.getNetworkFullPath());
+    }
+  }
+
+  /**
+   * This default implementation handles rack aware policies and non rack
+   * aware policies. If a future placement policy needs to check more than racks
+   * to validate the policy (eg node groups, HDFS like upgrade domain) this
+   * method should be overridden in the sub class.
+   * This method requires that subclasses which implement rack aware policies
+   * override the default method getRequiredRackCount and getNetworkTopology.
+   * @param dns List of datanodes holding a replica of the container
+   * @param replicas The expected number of replicas
+   * @return ContainerPlacementStatus indicating if the placement policy is
+   *         met or not. Not this only considers the rack count and not the
+   *         number of replicas.
+   */
+  @Override

Review comment:
       This method is almost a duplicate of the one in SCMCommonPlacementPolicy. My intention of that implementation was that the method getRequiredRackCount() should be overridden in the sub-classes, and hopefully then this method does not need to be overridden. However getRequiredRackCount() does not work for EC, as we need to pass the number of replicas to it.
   
   I think it would be better, if we changed `SCMCommonPlacementPolicy.getRequiredRackCount()` to accept a parameter `int numReplicas`, and then the sub-classes can ignore or use that. That means in this class, the implementation would be:
   
   ```
   @Override
   protected int getRequiredRackCount(int expectedReplicas) {
       return expectedReplicas;
     }
   ```
   
   Do you think that makes sense?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r696386799



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,324 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  // OUTER_LOOP is to avoid infinity choose on all racks
+  private static final int OUTER_LOOP_MAX_RETRY= 3;
+  // INNER_LOOP is to choose node in each rack
+  private static final int INNER_LOOP_MAX_RETRY= 5;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+      Collections.shuffle(mutableFavoredNodes);
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRacks = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    Set<Node> skippedRacks = new HashSet<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > OUTER_LOOP_MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      // Refill toChooseRacks, we put skippedRacks in front of toChooseRacks
+      // for a even distribution
+      toChooseRacks.addAll(racks);
+      if (!skippedRacks.isEmpty()) {
+        toChooseRacks.removeAll(skippedRacks);
+        toChooseRacks.addAll(0, skippedRacks);
+        skippedRacks.clear();
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRacks.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRacks.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      for (Node rack : toChooseRacks) {
+        Node node = chooseNode(rack.getNetworkFullPath(), unavailableNodes,
+            metadataSizeRequired, dataSizeRequired);
+        if (node != null) {
+          chosenNodes.add(node);
+          mutableFavoredNodes.remove(node);
+          unavailableNodes.add(node);
+          nodesRequired--;
+          if (nodesRequired == 0) {
+            break;
+          }
+        } else {
+          // Store the skipped racks to check them first in next outer loop
+          skippedRacks.add(rack);
+        }
+      }
+      // Clear toChooseRacks for this loop
+      toChooseRacks.clear();
+
+      // If chosenNodes not changed, increase the retryCount
+      if (chosenListSize == chosenNodes.size()) {
+        retryCount++;
+      } else {
+        // If chosenNodes changed, reset the retryCount
+        retryCount = 0;
+      }
+    }
+    List<DatanodeDetails> results = Arrays.asList(

Review comment:
       Updated, the code does look cleaner.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#issuecomment-901128256


   I have committed the two pre-requisites to master. Now I need to merge master to the EC branch before you can rebase.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r693907069



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,332 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  // OUTER_LOOP is to avoid infinity choose on all racks
+  private static final int OUTER_LOOP_MAX_RETRY= 3;
+  // INNER_LOOP is to choose node in each rack
+  private static final int INNER_LOOP_MAX_RETRY= 5;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+      Collections.shuffle(mutableFavoredNodes);
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRacks = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    Set<Node> skippedRacks = new HashSet<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > OUTER_LOOP_MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      // Refill toChooseRacks, we put skippedRacks in front of toChooseRacks
+      // for a even distribution
+      if (toChooseRacks.size() == 0) {
+        toChooseRacks.addAll(racks);
+        if (!skippedRacks.isEmpty()) {
+          toChooseRacks.removeAll(skippedRacks);
+          toChooseRacks.addAll(0, skippedRacks);
+          skippedRacks.clear();
+        }
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRacks.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRacks.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      List<Node> chosenRacksInForLoop = new ArrayList<>();
+      for (Node rack : toChooseRacks) {
+        List<Node> nodes = ((InnerNode)rack).getNodes(2);

Review comment:
       Here we are getting the list of nodes on a rack to do two things:
   
   1. Check there are more than zero nodes on the rack.
   2. Get the affinity node.
   
   We later call `NetworkTopology.chooseRandom(...)` which the affinity node.
   
   Could we just set the scope to the rack we are working on, and set affinityNode to null and let NetworkTopology give us a node or null? This would save allocating a list for all the nodes on a rack for each rack we check.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r693927095



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,332 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  // OUTER_LOOP is to avoid infinity choose on all racks
+  private static final int OUTER_LOOP_MAX_RETRY= 3;
+  // INNER_LOOP is to choose node in each rack
+  private static final int INNER_LOOP_MAX_RETRY= 5;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+      Collections.shuffle(mutableFavoredNodes);
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRacks = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    Set<Node> skippedRacks = new HashSet<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > OUTER_LOOP_MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      // Refill toChooseRacks, we put skippedRacks in front of toChooseRacks
+      // for a even distribution
+      if (toChooseRacks.size() == 0) {

Review comment:
       The only way for toChooseRacks.size() to equal zero, is if we don't have enough racks in the cluster to meet the goal.
   
   Lets say we have 10 racks and EC 3-2, so we need 5 nodes and ideally 5 racks. 6 racks are "bad" for some reason - they have zero nodes or all nodes are full.
   
   On the first pass of the outerloop, we will pick 4 nodes form 4 racks, and skip 6 racks.
   
   On the next pass, we will try the same skipped 6 again, as that is all which is left in `toChooseRacks`.
   
   On the next pass it will be the same, and eventually we will error.
   
   I think we need to refill toChooseRacks on each pass of the outloop, with the skipped racks first. That way:
   
   On the first pass we will get 4 nodes on 4 racks. 6 skipped racks
   
   On the next pass, we will give the skipped racks another try first, but they will all still fail, then we will pick a node from a remaining good rack. We have not met the goal, but we have fallen back as not enough good racks are available.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689982243



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRack.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRack.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      List<Node> chosenRacksInForLoop = new ArrayList<>();
+      for (Node rack : toChooseRack) {
+        if (((InnerNode)rack).getNodes(2).size() > 0) {
+          Node affinityNode = ((InnerNode)rack).getNodes(2).get(0);
+          Node node = chooseNode(unavailableNodes, affinityNode,
+              metadataSizeRequired, dataSizeRequired);
+          if (node != null) {
+            chosenNodes.add(node);
+            mutableFavoredNodes.remove(node);
+            unavailableNodes.add(node);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        chosenRacksInForLoop.add(rack);
+      }
+      toChooseRack.removeAll(chosenRacksInForLoop);
+      if (chosenListSize == chosenNodes.size()) {
+        retryCount++;
+      } else {
+        retryCount = 0;
+      }
+    }
+
+    return Arrays.asList(chosenNodes.toArray(new DatanodeDetails[0]));
+  }
+
+  @Override
+  public DatanodeDetails chooseNode(List<DatanodeDetails> healthyNodes) {
+    return null;
+  }
+
+  /**
+   * Choose a datanode which meets the requirements. If there is no node which
+   * meets all the requirements, there is fallback chosen process depending on
+   * whether fallback is allowed when this class is instantiated.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to excluded. Can be null.
+   * @param affinityNode - the chosen nodes should be on the same rack as
+   *                    affinityNode. Can be null.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of chosen datanodes.
+   * @throws SCMException  SCMException
+   */
+  private Node chooseNode(List<Node> excludedNodes, Node affinityNode,
+      long metadataSizeRequired, long dataSizeRequired) throws SCMException {
+    int ancestorGen = RACK_LEVEL;
+    int maxRetry = MAX_RETRY;
+    List<String> excludedNodesForCapacity = null;
+    while(true) {
+      metrics.incrDatanodeChooseAttemptCount();
+      Node node = networkTopology.chooseRandom(NetConstants.ROOT,
+          excludedNodesForCapacity, excludedNodes, affinityNode, ancestorGen);
+      if (node == null) {
+        // cannot find the node which meets all constrains
+        LOG.warn("Failed to find the datanode for container. excludedNodes:" +
+            (excludedNodes == null ? "" : excludedNodes.toString()) +
+            ", affinityNode:" +
+            (affinityNode == null ? "" : affinityNode.getNetworkFullPath()));
+        return null;
+      }
+
+      DatanodeDetails datanodeDetails = (DatanodeDetails)node;
+      if (isValidNode(datanodeDetails, metadataSizeRequired,
+          dataSizeRequired)) {
+        return node;
+      }
+
+      maxRetry--;
+      if (maxRetry == 0) {
+        // avoid the infinite loop
+        String errMsg = "No satisfied datanode to meet the space constrains. "
+            + "metadatadata size required: " + metadataSizeRequired +
+            " data size required: " + dataSizeRequired;
+        LOG.info(errMsg);
+        return null;
+      }
+      if (excludedNodesForCapacity == null) {
+        excludedNodesForCapacity = new ArrayList<>();
+      }
+      excludedNodesForCapacity.add(node.getNetworkFullPath());
+    }
+  }
+
+  /**
+   * This default implementation handles rack aware policies and non rack
+   * aware policies. If a future placement policy needs to check more than racks
+   * to validate the policy (eg node groups, HDFS like upgrade domain) this
+   * method should be overridden in the sub class.
+   * This method requires that subclasses which implement rack aware policies
+   * override the default method getRequiredRackCount and getNetworkTopology.
+   * @param dns List of datanodes holding a replica of the container
+   * @param replicas The expected number of replicas
+   * @return ContainerPlacementStatus indicating if the placement policy is
+   *         met or not. Not this only considers the rack count and not the
+   *         number of replicas.
+   */
+  @Override

Review comment:
       Sure, I will raise another ticket to add the changes to master.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689998900



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);

Review comment:
       I think if all nodes in each rack are available, the outer loop would be used mostly for situations like "10 replicas and 2 racks" (numReplicas > numRacks), so that for each loop we can get 2 nodes.
   
   For the first problem, I was thinking we can add a list to store the racks not used in this loop, and put them at the head of toChooseRack so that they will be easier chosen.
   Another thing is we can enlarge the RETRY_COUNT of inner loop, so that the first problem is more likely to be eliminated.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#issuecomment-899577613


   @sodonnel Thanks for the detailed review. Will solve each question in the next update.
   
   > One question - lets say we have a cluster with 3 racks and EC-3-2 needs 5 racks. Will the logic here handle that, and allocate something like below?
   Rack1 - replica_1, replica_4
   Rack2 - replica_2, replica_5
   Rack3 - replica_3
   
   The result of the current implementation would be so. I think a shuffle would be good then the result should be different.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689621506



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {

Review comment:
       I'm not sure if there is a priority in the favoredNodes, say the favoredNodes is like [MostFavoredNode, NormalFavoredNode, lessFavoredButStillFavoredNode].




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r693949553



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,332 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  // OUTER_LOOP is to avoid infinity choose on all racks
+  private static final int OUTER_LOOP_MAX_RETRY= 3;
+  // INNER_LOOP is to choose node in each rack
+  private static final int INNER_LOOP_MAX_RETRY= 5;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+      Collections.shuffle(mutableFavoredNodes);
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRacks = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    Set<Node> skippedRacks = new HashSet<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > OUTER_LOOP_MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      // Refill toChooseRacks, we put skippedRacks in front of toChooseRacks
+      // for a even distribution
+      if (toChooseRacks.size() == 0) {

Review comment:
       Sorry. I missed where you always add to `chosenRacksInForLoop`.
   
   In that case, can we avoid adding to `chosenRacksInForLoop`, and also avoid the line `toChooseRacks.removeAll(chosenRacksInForLoop);` and just refill the `toChooseRacks` always on each pass?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689982886



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRack.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRack.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      List<Node> chosenRacksInForLoop = new ArrayList<>();
+      for (Node rack : toChooseRack) {
+        if (((InnerNode)rack).getNodes(2).size() > 0) {

Review comment:
       Updated with a variable to store the result.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#issuecomment-901128256


   I have committed the two pre-requisites to master. Now I need to merge master to the EC branch before you can rebase.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r696351712



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -205,8 +206,16 @@ public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
         retryCount = 0;
       }
     }
-
-    return Arrays.asList(chosenNodes.toArray(new DatanodeDetails[0]));
+    List<DatanodeDetails> results = Arrays.asList(
+        chosenNodes.toArray(new DatanodeDetails[0]));
+    ContainerPlacementStatus placementStatus =
+        validateContainerPlacement(results, nodesRequired);
+    if (!placementStatus.isPolicySatisfied()) {
+      LOG.warn("ContainerPlacementPolicy not met, currentRacks is " +
+          placementStatus.actualPlacementCount() + ", desired racks is " +
+          placementStatus.expectedPlacementCount() + ".");

Review comment:
       Looks good, but can you change the log to use placeholders rather than concatenation:
   
   ```
         LOG.warn("ContainerPlacementPolicy not met, currentRacks is {}, desired racks is {},", placementStatus.actualPlacementCount(), placementStatus.expectedPlacementCount());
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r693929025



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,332 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  // OUTER_LOOP is to avoid infinity choose on all racks
+  private static final int OUTER_LOOP_MAX_RETRY= 3;
+  // INNER_LOOP is to choose node in each rack
+  private static final int INNER_LOOP_MAX_RETRY= 5;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+      Collections.shuffle(mutableFavoredNodes);
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRacks = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    Set<Node> skippedRacks = new HashSet<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > OUTER_LOOP_MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      // Refill toChooseRacks, we put skippedRacks in front of toChooseRacks
+      // for a even distribution
+      if (toChooseRacks.size() == 0) {
+        toChooseRacks.addAll(racks);
+        if (!skippedRacks.isEmpty()) {
+          toChooseRacks.removeAll(skippedRacks);
+          toChooseRacks.addAll(0, skippedRacks);
+          skippedRacks.clear();
+        }
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRacks.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRacks.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      List<Node> chosenRacksInForLoop = new ArrayList<>();
+      for (Node rack : toChooseRacks) {
+        List<Node> nodes = ((InnerNode)rack).getNodes(2);
+        if (nodes.size() > 0) {
+          Node affinityNode = nodes.get(0);
+          Node node = chooseNode(unavailableNodes, affinityNode,
+              metadataSizeRequired, dataSizeRequired);
+          if (node != null) {
+            chosenNodes.add(node);
+            mutableFavoredNodes.remove(node);
+            unavailableNodes.add(node);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        } else {
+          // Store the skipped racks to check them first in next outer loop
+          skippedRacks.add(rack);
+        }
+        chosenRacksInForLoop.add(rack);
+      }
+      toChooseRacks.removeAll(chosenRacksInForLoop);

Review comment:
       If we refill toChooseRacks on each pass, we can avoid this removeAll line.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2497: HDDS-5326. EC: Create a new as many racks as possible placement policy for EC

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2497:
URL: https://github.com/apache/ozone/pull/2497#discussion_r689982502



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java
##########
@@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.scm.container.placement.algorithms;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.scm.ContainerPlacementStatus;
+import org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.net.InnerNode;
+import org.apache.hadoop.hdds.scm.net.NetConstants;
+import org.apache.hadoop.hdds.scm.net.NetworkTopology;
+import org.apache.hadoop.hdds.scm.net.Node;
+import org.apache.hadoop.hdds.scm.node.DatanodeInfo;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Container placement policy that scatter datanodes on different racks
+ * , together with the space to satisfy the size constraints.
+ * <p>
+ * This placement policy will try to distribute datanodes on as many racks as
+ * possible.
+ * <p>
+ * This implementation applies to network topology like "/rack/node". Don't
+ * recommend to use this if the network topology has more layers.
+ * <p>
+ */
+public final class SCMContainerPlacementRackScatter
+    extends SCMCommonPlacementPolicy {
+  @VisibleForTesting
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMContainerPlacementRackScatter.class);
+  private final NetworkTopology networkTopology;
+  private static final int RACK_LEVEL = 1;
+  private static final int MAX_RETRY= 3;
+  private final SCMContainerPlacementMetrics metrics;
+
+  /**
+   * Constructs a Container Placement with rack awareness.
+   *
+   * @param nodeManager Node Manager
+   * @param conf Configuration
+   */
+  public SCMContainerPlacementRackScatter(final NodeManager nodeManager,
+      final ConfigurationSource conf, final NetworkTopology networkTopology,
+      boolean fallback, final SCMContainerPlacementMetrics metrics) {
+    super(nodeManager, conf);
+    this.networkTopology = networkTopology;
+    this.metrics = metrics;
+  }
+
+  /**
+   * Called by SCM to choose datanodes.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to exclude.
+   * @param favoredNodes - list of nodes preferred. This is a hint to the
+   *                     allocator, whether the favored nodes will be used
+   *                     depends on whether the nodes meets the allocator's
+   *                     requirement.
+   * @param nodesRequired - number of datanodes required.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of datanodes.
+   * @throws SCMException  SCMException
+   */
+  @Override
+  public List<DatanodeDetails> chooseDatanodes(
+      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes,
+      int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
+      throws SCMException {
+    Preconditions.checkArgument(nodesRequired > 0);
+    metrics.incrDatanodeRequestCount(nodesRequired);
+    int datanodeCount = networkTopology.getNumOfLeafNode(NetConstants.ROOT);
+    int excludedNodesCount = excludedNodes == null ? 0 : excludedNodes.size();
+    if (datanodeCount < nodesRequired + excludedNodesCount) {
+      throw new SCMException("No enough datanodes to choose. " +
+          "TotalNode = " + datanodeCount +
+          " RequiredNode = " + nodesRequired +
+          " ExcludedNode = " + excludedNodesCount, null);
+    }
+
+    List<DatanodeDetails> mutableFavoredNodes = new ArrayList<>();
+    if (favoredNodes != null) {
+      // Generate mutableFavoredNodes, only stores valid favoredNodes
+      for (DatanodeDetails datanodeDetails : favoredNodes) {
+        if (isValidNode(datanodeDetails, metadataSizeRequired,
+            dataSizeRequired)) {
+          mutableFavoredNodes.add(datanodeDetails);
+        }
+      }
+    }
+    if (excludedNodes != null) {
+      mutableFavoredNodes.removeAll(excludedNodes);
+    }
+
+    // For excluded nodes, we sort their racks at rear
+    List<Node> racks = getAllRacks();
+    if (excludedNodes != null && excludedNodes.size() > 0) {
+      racks = sortRackWithExcludedNodes(racks, excludedNodes);
+    }
+
+    List<Node> toChooseRack = new LinkedList<>(racks);
+    List<Node> chosenNodes = new ArrayList<>();
+    List<Node> unavailableNodes = new ArrayList<>();
+    if (excludedNodes != null) {
+      unavailableNodes.addAll(excludedNodes);
+    }
+
+    // If the result doesn't change after retryCount, we return with exception
+    int retryCount = 0;
+    while (nodesRequired > 0) {
+      if (retryCount > MAX_RETRY) {
+        throw new SCMException("No satisfied datanode to meet the" +
+            " excludedNodes and affinityNode constrains.", null);
+      }
+      int chosenListSize = chosenNodes.size();
+
+      if (toChooseRack.size() == 0) {
+        toChooseRack.addAll(racks);
+      }
+
+      if (mutableFavoredNodes.size() > 0) {
+        List<Node> chosenFavoredNodesInForLoop = new ArrayList<>();
+        for (DatanodeDetails favoredNode : mutableFavoredNodes) {
+          Node curRack = getRackOfDatanodeDetails(favoredNode);
+          if (toChooseRack.contains(curRack)) {
+            chosenNodes.add(favoredNode);
+            toChooseRack.remove(curRack);
+            chosenFavoredNodesInForLoop.add(favoredNode);
+            unavailableNodes.add(favoredNode);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        mutableFavoredNodes.removeAll(chosenFavoredNodesInForLoop);
+      }
+
+      // If satisfied by favored nodes, return then.
+      if (nodesRequired == 0) {
+        break;
+      }
+
+      List<Node> chosenRacksInForLoop = new ArrayList<>();
+      for (Node rack : toChooseRack) {
+        if (((InnerNode)rack).getNodes(2).size() > 0) {
+          Node affinityNode = ((InnerNode)rack).getNodes(2).get(0);
+          Node node = chooseNode(unavailableNodes, affinityNode,
+              metadataSizeRequired, dataSizeRequired);
+          if (node != null) {
+            chosenNodes.add(node);
+            mutableFavoredNodes.remove(node);
+            unavailableNodes.add(node);
+            nodesRequired--;
+            if (nodesRequired == 0) {
+              break;
+            }
+          }
+        }
+        chosenRacksInForLoop.add(rack);
+      }
+      toChooseRack.removeAll(chosenRacksInForLoop);
+      if (chosenListSize == chosenNodes.size()) {
+        retryCount++;
+      } else {
+        retryCount = 0;
+      }
+    }
+
+    return Arrays.asList(chosenNodes.toArray(new DatanodeDetails[0]));
+  }
+
+  @Override
+  public DatanodeDetails chooseNode(List<DatanodeDetails> healthyNodes) {
+    return null;
+  }
+
+  /**
+   * Choose a datanode which meets the requirements. If there is no node which
+   * meets all the requirements, there is fallback chosen process depending on
+   * whether fallback is allowed when this class is instantiated.
+   *
+   *
+   * @param excludedNodes - list of the datanodes to excluded. Can be null.
+   * @param affinityNode - the chosen nodes should be on the same rack as
+   *                    affinityNode. Can be null.
+   * @param dataSizeRequired - size required for the container.
+   * @param metadataSizeRequired - size required for Ratis metadata.
+   * @return List of chosen datanodes.
+   * @throws SCMException  SCMException
+   */
+  private Node chooseNode(List<Node> excludedNodes, Node affinityNode,
+      long metadataSizeRequired, long dataSizeRequired) throws SCMException {
+    int ancestorGen = RACK_LEVEL;
+    int maxRetry = MAX_RETRY;
+    List<String> excludedNodesForCapacity = null;
+    while(true) {
+      metrics.incrDatanodeChooseAttemptCount();
+      Node node = networkTopology.chooseRandom(NetConstants.ROOT,
+          excludedNodesForCapacity, excludedNodes, affinityNode, ancestorGen);
+      if (node == null) {
+        // cannot find the node which meets all constrains
+        LOG.warn("Failed to find the datanode for container. excludedNodes:" +
+            (excludedNodes == null ? "" : excludedNodes.toString()) +
+            ", affinityNode:" +
+            (affinityNode == null ? "" : affinityNode.getNetworkFullPath()));
+        return null;
+      }
+
+      DatanodeDetails datanodeDetails = (DatanodeDetails)node;
+      if (isValidNode(datanodeDetails, metadataSizeRequired,
+          dataSizeRequired)) {
+        return node;
+      }
+
+      maxRetry--;
+      if (maxRetry == 0) {
+        // avoid the infinite loop
+        String errMsg = "No satisfied datanode to meet the space constrains. "
+            + "metadatadata size required: " + metadataSizeRequired +
+            " data size required: " + dataSizeRequired;
+        LOG.info(errMsg);
+        return null;
+      }
+      if (excludedNodesForCapacity == null) {
+        excludedNodesForCapacity = new ArrayList<>();
+      }
+      excludedNodesForCapacity.add(node.getNetworkFullPath());
+    }
+  }
+
+  /**
+   * This default implementation handles rack aware policies and non rack
+   * aware policies. If a future placement policy needs to check more than racks
+   * to validate the policy (eg node groups, HDFS like upgrade domain) this
+   * method should be overridden in the sub class.
+   * This method requires that subclasses which implement rack aware policies
+   * override the default method getRequiredRackCount and getNetworkTopology.
+   * @param dns List of datanodes holding a replica of the container
+   * @param replicas The expected number of replicas
+   * @return ContainerPlacementStatus indicating if the placement policy is
+   *         met or not. Not this only considers the rack count and not the
+   *         number of replicas.
+   */
+  @Override
+  public ContainerPlacementStatus validateContainerPlacement(
+      List<DatanodeDetails> dns, int replicas) {
+    int requiredRacks = replicas;
+    if (networkTopology == null || replicas == 1 || requiredRacks == 1) {
+      return super.validateContainerPlacement(dns, replicas);
+    }
+    // We have a network topology so calculate if it is satisfied or not.
+    int numRacks = 1;
+    final int maxLevel = networkTopology.getMaxLevel();
+    // The leaf nodes are all at max level, so the number of nodes at
+    // leafLevel - 1 is the rack count
+    numRacks = networkTopology.getNumOfNodes(maxLevel - 1);
+    final long currentRackCount = dns.stream()
+        .map(d -> networkTopology.getAncestor(d, 1))
+        .distinct()
+        .count();
+
+    if (replicas > numRacks) {
+      requiredRacks = numRacks;
+    }
+    return new ContainerPlacementStatusDefault(
+        (int)currentRackCount, requiredRacks, numRacks);
+  }
+
+  private Node getRackOfDatanodeDetails(DatanodeDetails datanodeDetails) {
+    String location = datanodeDetails.getNetworkLocation();
+    return networkTopology.getNode(location);
+  }
+
+  private List<Node> sortRackWithExcludedNodes(List<Node> racks,
+      List<DatanodeDetails> excludedNodes) {
+    Set<Node> lessPreferedRacks = excludedNodes.stream()
+        .map(node -> networkTopology.getAncestor(node, RACK_LEVEL))
+        .collect(Collectors.toSet());
+    List <Node> result = new ArrayList<>();
+    for (Node rack : racks) {
+      if (!lessPreferedRacks.contains(rack)) {
+        result.add(rack);
+      }
+    }
+    result.addAll(lessPreferedRacks);
+    return result;
+  }
+
+  private List<Node> getAllRacks() {
+    int rackLevel = networkTopology.getMaxLevel()-1;
+    return networkTopology.getNodes(rackLevel);
+  }
+
+  private boolean isValidNode(DatanodeDetails datanodeDetails,

Review comment:
       Will add these changes to master.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org