Posted to common-issues@hadoop.apache.org by "Yogesh Darji (JIRA)" <ji...@apache.org> on 2017/04/25 20:32:04 UTC
[jira] [Commented] (HADOOP-1652) Rebalance data blocks when new data nodes added or data nodes become full

    [ https://issues.apache.org/jira/browse/HADOOP-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983566#comment-15983566 ] 

Yogesh Darji commented on HADOOP-1652:
--------------------------------------

Guys, I am using Hadoop 2.7.3 with a replication factor of 1. DN1 has used 1.13GB and DN2 has used 1.89GB, and both have a capacity of 17.72GB. Can you please tell me why the cluster does not rebalance when I run

hdfs balancer -threshold 10

with the default 10% threshold? I have been breaking my head over this all day, but it didn't seem to work. I tried a threshold value of 2 and it worked; now
DN1: 1.51GB
DN2: 1.51GB

Can someone please help me with some maths here? What's happening with the 10% threshold? Why isn't it working?

Thank you so much in advance.
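
For the maths: as far as the balancer documentation describes it, the threshold is in percentage points of disk capacity, and a cluster counts as balanced when every datanode's utilization (used / capacity) is within the threshold of the overall cluster utilization. The sketch below only reproduces that arithmetic with the numbers from this comment. It is not balancer code, and the helper names utilization_gaps and is_balanced are made up for illustration.

```python
# Worked arithmetic for the numbers in this comment (all values in GB).
# Assumption: a datanode is "balanced" when its utilization is within
# `threshold` percentage points of the cluster-wide utilization.

def utilization_gaps(used, capacity):
    """Return (cluster utilization %, {node: node % minus cluster %})."""
    cluster_pct = 100.0 * sum(used.values()) / sum(capacity.values())
    gaps = {
        node: 100.0 * used[node] / capacity[node] - cluster_pct
        for node in used
    }
    return cluster_pct, gaps


def is_balanced(used, capacity, threshold):
    """True when every node is within `threshold` points of the cluster."""
    _, gaps = utilization_gaps(used, capacity)
    return all(abs(gap) <= threshold for gap in gaps.values())


capacity = {"DN1": 17.72, "DN2": 17.72}
before = {"DN1": 1.13, "DN2": 1.89}
after = {"DN1": 1.51, "DN2": 1.51}

cluster_pct, gaps = utilization_gaps(before, capacity)
print(f"cluster utilization: {cluster_pct:.2f}%")
for node, gap in gaps.items():
    print(f"{node}: {gap:+.2f} points from the cluster average")

for threshold in (10, 2):
    print(f"threshold {threshold}: balanced = "
          f"{is_balanced(before, capacity, threshold)}")
```

Under that rule the cluster sits at about 8.5% utilization, with DN1 about 2.1 points below and DN2 about 2.1 points above it. Both are well inside 10 points, so with -threshold 10 there is nothing to move. With -threshold 2 both nodes fall outside the band, which matches blocks being moved until each node held 1.51GB.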

> Rebalance data blocks when new data nodes added or data nodes become full
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-1652
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1652
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 0.13.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.16.0
>
>         Attachments: balancer1.patch, balancer2.patch, balancer3.patch, balancer4.patch, balancer5.patch, balancer6.patch, balancer7.patch, balancer8.patch, BalancerAdminGuide1.pdf, BalancerAdminGuide.pdf, Balancer.html, balancer.patch, BalancerUserGuide2.pdf, RebalanceDesign4.pdf, RebalanceDesign5.pdf, RebalanceDesign6.pdf
>
>
> When a new data node joins an HDFS cluster, it does not hold much data. So any map task assigned to the machine most likely does not read local data, thus increasing the use of network bandwidth. On the other hand, when some data nodes become full, new data blocks are placed on only non-full data nodes, thus reducing their read parallelism.
> This jira aims to find an approach to redistribute data blocks when imbalance occurs in the cluster.  A solution should meet the following requirements:
> 1. It maintains data availability guarantees in the sense that rebalancing does not reduce the number of replicas that a block has or the number of racks that the block resides on.
> 2. An administrator should be able to invoke and interrupt rebalancing from a command line.
> 3. Rebalancing should be throttled so that rebalancing does not cause a namenode to be too busy to serve any incoming request or saturate the network.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)