Posted to user@hadoop.apache.org by Mingliang Liu <ml...@hortonworks.com> on 2016/08/11 19:05:28 UTC

Re: HDFS Balancer - Recommended Threshold Value

This email thread should go to user@, which is for end-user questions and discussion, instead of hdfs-dev@.

My 2 cents:
> The original design of the Balancer intentionally makes it run slowly so that balancing activity won't affect normal cluster activity and running jobs.
The maximum size of data that the Balancer will move between a chosen datanode pair is 10GB. However, this is not configurable in the 2.4 stack. Please refer to https://community.hortonworks.com/articles/43849/hdfs-balancer-2-configurations-cli-options.html
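
For reference, in the 2.4 source this cap is a hard-coded constant in Balancer.java; the sketch below is roughly how it looks there (exact modifiers and comment may differ), and later releases make it configurable (see the article above for the relevant keys and CLI options):

    // Rough sketch of the hard-coded cap in 2.4's Balancer.java (not exact source)
    private static final long MAX_SIZE_TO_MOVE = 10L * 1024 * 1024 * 1024; // 10 GB per chosen datanode pair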

Thanks.

L

On Aug 11, 2016, at 3:21 AM, Senthil Kumar <se...@gmail.com> wrote:

Thanks Lars for your quick response!

Here is my Cluster Utilization..
DFS Used% : 74.39%
DFS Remaining% : 25.60%


Block Pool Used% : 74.39%
DataNodes usage : Min 1.25%, Median 99.72%, Max 99.99%, stdev 22.53%
Hadoop Version : 2.4.1

Let's take an example :

Cluster  Live Nodes           :    1000
Capacity Used 95-99%      :      700
Capacity Used 90 -95 %    :       50
Capacity Used  < 90 %     :      250

I'm looking for an option to quickly balance data from the nodes in the
90-95% category onto the < 90% category of nodes. I know there are options like
-include & -exclude, but they are not helping me (or am I not using them
effectively? Please advise how to use these options properly if I want
to balance my cluster as described above).

Is there an option like --force-balance (taking two additional inputs such as
force-balance-source-hosts(file) & force-balance-dest-hosts(file))? That
way I believe we could achieve balancing in an urgency mode when 90% of the
nodes are hitting 99% disk usage, or when the median is 95% and above. Please
add your thoughts here.


Here is the code that constructs the network topology by categorizing nodes as
over-utilized, average-utilized, and under-utilized. Sometimes I see
nodes with 70% usage also come under over-utilized (tried with
threshold 10 - 30). Correct me if anything is wrong in my understanding.

https://github.com/apache/hadoop/tree/release-2.4.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer

    /* create network topology and all data node lists:
     * overloaded, above-average, below-average, and underloaded
     * we alternates the accessing of the given datanodes array either by
     * an increasing order or a decreasing order.
     */
    long overLoadedBytes = 0L, underLoadedBytes = 0L;
    for (DatanodeInfo datanode : DFSUtil.shuffle(datanodes)) {
      if (datanode.isDecommissioned() || datanode.isDecommissionInProgress()) {
        continue; // ignore decommissioning or decommissioned nodes
      }
      cluster.add(datanode);
      BalancerDatanode datanodeS;
      final double avg = policy.getAvgUtilization();
      if (policy.getUtilization(datanode) > avg) {
        datanodeS = new Source(datanode, policy, threshold);
        if (isAboveAvgUtilized(datanodeS)) {
          this.aboveAvgUtilizedDatanodes.add((Source)datanodeS);
        } else {
          assert(isOverUtilized(datanodeS)) :
            datanodeS.getDisplayName()+ "is not an overUtilized node";
          this.overUtilizedDatanodes.add((Source)datanodeS);
          overLoadedBytes += (long)((datanodeS.utilization-avg
              -threshold)*datanodeS.datanode.getCapacity()/100.0);
        }
      } else {
        datanodeS = new BalancerDatanode(datanode, policy, threshold);
        if (isBelowOrEqualAvgUtilized(datanodeS)) {
          this.belowAvgUtilizedDatanodes.add(datanodeS);
        } else {
          assert isUnderUtilized(datanodeS) : "isUnderUtilized("
              + datanodeS.getDisplayName() + ")=" + isUnderUtilized(datanodeS)
              + ", utilization=" + datanodeS.utilization;
          this.underUtilizedDatanodes.add(datanodeS);
          underLoadedBytes += (long)((avg-threshold
              -datanodeS.utilization)*datanodeS.datanode.getCapacity()/100.0);
        }
      }
      datanodeMap.put(datanode.getDatanodeUuid(), datanodeS);
    }
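
For context, my reading of the helper methods in the same file is that the four buckets are cut purely by the cluster average utilization plus/minus the threshold, roughly like the sketch below (not the exact source; names as they appear in 2.4.1):

    // Sketch of the classification predicates used above (approximate)
    boolean isOverUtilized(BalancerDatanode dn) {            // utilization > avg + threshold
      return dn.utilization > policy.getAvgUtilization() + threshold;
    }
    boolean isAboveAvgUtilized(BalancerDatanode dn) {        // avg < utilization <= avg + threshold
      final double avg = policy.getAvgUtilization();
      return dn.utilization > avg && dn.utilization <= avg + threshold;
    }
    boolean isBelowOrEqualAvgUtilized(BalancerDatanode dn) { // avg - threshold <= utilization <= avg
      final double avg = policy.getAvgUtilization();
      return dn.utilization >= avg - threshold && dn.utilization <= avg;
    }
    boolean isUnderUtilized(BalancerDatanode dn) {           // utilization < avg - threshold
      return dn.utilization < policy.getAvgUtilization() - threshold;
    }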


Could someone help me understand the balancing policy and which
parameters I should use to balance the cluster (i.e. bring down the median)?

--Senthil

On Wed, Aug 10, 2016 at 8:21 PM, Lars Francke <la...@gmail.com>
wrote:

Hi Senthil,

I'm not sure I fully understand.

If you're using a threshold of 30, that means there is a range of 60% that
the balancer would consider to be okay.

Example: say the used space divided by the total available space in the
cluster is 80%. Then, with a 30% threshold, the balancer would try to bring
all nodes within the range of 50-100% utilisation.
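
To make that concrete, the band the balancer aims for is just the cluster average plus/minus the threshold (clamped to 0-100%). A rough illustration of the arithmetic (not the balancer's actual API):

    double avgUtilization = 80.0;  // cluster used space / capacity, in percent
    double threshold      = 30.0;  // the value you pass with -threshold
    double lower = Math.max(avgUtilization - threshold, 0.0);    // 50%
    double upper = Math.min(avgUtilization + threshold, 100.0);  // 100%
    // Nodes whose utilisation already lies within [lower, upper] are considered balanced.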

The default threshold is 10% and that's still a fairly large range,
especially on clusters that are almost at capacity. So a threshold of 5 or
even lower might work for you.

What is your utilisation in the cluster (used space / available space)?

Cheers,
Lars

On Wed, Aug 10, 2016 at 3:27 PM, Senthil Kumar <se...@gmail.com>
wrote:

Hi Team, we are running a big cluster (3000 nodes), and many times we
see the median utilization increase to 99.99% (on 80% of the DNs). The Balancer
is running all the time in the cluster, but the median is still not coming
down, i.e. below 90%.

Here is how I start the balancer:
/apache/hadoop/sbin/start-balancer.sh
-Ddfs.balance.bandwidthPerSec=104857600  -threshold 30

What is the recommended value for the threshold? Is there any way to pass a
param to move blocks only from over-utilized (98-100%) nodes to under-utilized
ones?


Pls advise!




Regards,
Senthil