Posted to hdfs-dev@hadoop.apache.org by "Ming Ma (JIRA)" <ji...@apache.org> on 2016/03/24 18:05:25 UTC

[jira] [Created] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance

Ming Ma created HDFS-10206:
------------------------------

             Summary: getBlockLocations might not sort datanodes properly by distance
                 Key: HDFS-10206
                 URL: https://issues.apache.org/jira/browse/HDFS-10206
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Ming Ma


If the DFSClient machine is not a datanode but shares a rack with some of the datanodes holding the requested HDFS block, {{DatanodeManager#sortLocatedBlocks}} might not put the local-rack datanodes at the beginning of the sorted list. That is because the function doesn't call {{networktopology.add(client);}} to set the client node's parent node, which {{networktopology.sortByDistance}} requires in order to compute the distance between two nodes in the same topology tree.
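To illustrate the failure mode, here is a minimal, simplified model. It is not the actual Hadoop code: {{SimpleNode}}, {{add}} and this version of {{isOnSameRack}} are hypothetical stand-ins, and the sketch assumes that the same-rack check compares parent references. Under that assumption, a client that was never added to the topology has no parent, so it is treated as off rack even when it physically shares a rack with a datanode.

```java
class RackParentSketch {
  // Hypothetical stand-in for a topology node; not Hadoop's Node interface.
  static final class SimpleNode {
    final String name;
    SimpleNode parent; // the rack; stays null until the node is added to the topology
    SimpleNode(String name) { this.name = name; }
  }

  // Assumed equivalent of adding a node to the topology: it sets the parent (rack).
  static void add(SimpleNode rack, SimpleNode node) {
    node.parent = rack;
  }

  // Simplified same-rack check: both nodes must share the same non-null parent.
  static boolean isOnSameRack(SimpleNode a, SimpleNode b) {
    return a.parent != null && a.parent == b.parent;
  }

  // Same three-level weighting as the getWeight quoted in this ticket.
  static int getWeight(SimpleNode reader, SimpleNode node) {
    int weight = 2; // off rack by default
    if (reader != null) {
      if (reader == node) {
        weight = 0; // local
      } else if (isOnSameRack(reader, node)) {
        weight = 1; // same rack
      }
    }
    return weight;
  }

  public static void main(String[] args) {
    SimpleNode rack = new SimpleNode("/rack1");
    SimpleNode datanode = new SimpleNode("dn1");
    SimpleNode client = new SimpleNode("client");
    add(rack, datanode);
    System.out.println("before add: " + getWeight(client, datanode));
    add(rack, client);
    System.out.println("after add: " + getWeight(client, datanode));
  }
}
```

In this model the client only gets the same-rack weight once its parent has been set, which is the step the description above says is missing.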

Another issue with {{networktopology.sortByDistance}} is that it only distinguishes the local rack from remote racks; it doesn't support a general distance calculation that tells how remote a rack is.

{noformat}
NetworkTopology.java
  protected int getWeight(Node reader, Node node) {
    // 0 is local, 1 is same rack, 2 is off rack
    // Start off by initializing to off rack
    int weight = 2;
    if (reader != null) {
      if (reader.equals(node)) {
        weight = 0;
      } else if (isOnSameRack(reader, node)) {
        weight = 1;
      }
    }
    return weight;
  }
{noformat}
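As a rough sketch of what a more general weighting could look like (not a proposed patch), the example below counts the hops from each node up to their closest common ancestor. It assumes each node is identified by a full path string such as {{/d1/r1/host1}}; the class and method names are hypothetical and none of this is the existing NetworkTopology API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TopologyDistanceSketch {
  // Hypothetical helper: number of hops between two nodes, given absolute
  // paths like "/d1/r1/host1". Each node contributes its depth below the
  // closest common ancestor.
  static int distance(String a, String b) {
    String[] pa = a.substring(1).split("/");
    String[] pb = b.substring(1).split("/");
    int common = 0;
    while (common < pa.length && common < pb.length
        && pa[common].equals(pb[common])) {
      common++;
    }
    return (pa.length - common) + (pb.length - common);
  }

  // Returns a copy of nodes ordered by increasing distance from the reader.
  // List.sort is stable, so equally distant nodes keep their input order.
  static List<String> sortByDistance(String reader, List<String> nodes) {
    List<String> sorted = new ArrayList<>(nodes);
    sorted.sort(Comparator.comparingInt((String n) -> distance(reader, n)));
    return sorted;
  }

  public static void main(String[] args) {
    List<String> replicas = Arrays.asList("/d2/r3/h4", "/d1/r2/h3", "/d1/r1/h2");
    System.out.println(sortByDistance("/d1/r1/client", replicas));
  }
}
```

With this kind of weighting, a replica on another rack in the same data center sorts ahead of a replica in a different data center, which the three-level weight above cannot express.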

HDFS-10203 has suggested moving the sorting from the namenode to DFSClient to address another issue. Regardless of where we do the sorting, we still need to fix the issues outlined here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)