Posted to hdfs-dev@hadoop.apache.org by "He Tianyi (JIRA)" <ji...@apache.org> on 2016/04/14 10:53:25 UTC

[jira] [Created] (HDFS-10290) Move getBlocks calls to DataNode in Balancer

He Tianyi created HDFS-10290:
--------------------------------

             Summary: Move getBlocks calls to DataNode in Balancer
                 Key: HDFS-10290
                 URL: https://issues.apache.org/jira/browse/HDFS-10290
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: balancer & mover
    Affects Versions: 2.6.0
            Reporter: He Tianyi


In the current implementation, the Balancer asks the NameNode for a list of blocks on a specific DataNode. This adds to the NameNode's workload, and it actually made the NameNode flap once the average number of blocks per DataNode reached 1,000,000 (NameNode heap size: 192GB; CPU: 2 x Xeon E5-2630).

Recently I investigated whether the {{getBlocks}} invocations from the Balancer can be handled by DataNodes instead, and it turned out to be practical.
The only pitfall: since a DataNode has no information about the other locations of each block it holds, some block moves may fail (the target node may already have a replica of that particular block).
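
To illustrate the pitfall, here is a minimal, self-contained sketch of the idea. It is not Balancer or DataNode code: the class {{DataNodeGetBlocksSketch}}, the {{Node}} type, and the {{tryMove}} and {{balance}} helpers are all hypothetical names made up for this example. It assumes the Balancer simply skips a block whose move is rejected and tries the next candidate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these types exist in HDFS.
class DataNodeGetBlocksSketch {

    // Stand-in for a DataNode: it knows its own replicas and nothing else.
    static final class Node {
        final String name;
        final Set<Long> blockIds = new TreeSet<>();

        Node(String name, long... ids) {
            this.name = name;
            for (long id : ids) {
                blockIds.add(id);
            }
        }

        // Answers a getBlocks-style query from local state. The result carries
        // no information about where other replicas of each block live.
        List<Long> getBlocks() {
            return new ArrayList<>(blockIds);
        }
    }

    // A move fails when the target already holds a replica of the block.
    static boolean tryMove(long blockId, Node source, Node target) {
        if (target.blockIds.contains(blockId)) {
            return false;
        }
        source.blockIds.remove(blockId);
        target.blockIds.add(blockId);
        return true;
    }

    // Assumed policy: on a failed move, skip the block and try the next one.
    static List<Long> balance(Node source, Node target, int maxMoves) {
        List<Long> moved = new ArrayList<>();
        for (long blockId : source.getBlocks()) {
            if (moved.size() >= maxMoves) {
                break;
            }
            if (tryMove(blockId, source, target)) {
                moved.add(blockId);
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        Node source = new Node("dn1", 1L, 2L, 3L);
        Node target = new Node("dn2", 2L, 4L);
        // Block 2 is rejected because dn2 already has a replica of it.
        System.out.println(balance(source, target, 2)); // prints [1, 3]
    }
}
```

In this sketch the cost of a failed move is one wasted attempt per colliding block, which is the trade-off against sparing the NameNode the {{getBlocks}} calls.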

I think this may be beneficial for large clusters.

Any suggestions or comments?
Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)