Posted to common-user@hadoop.apache.org by Usman Waheed <us...@opera.com> on 2009/04/27 10:36:29 UTC

Balancing datanodes - Running hadoop 0.18.3

Hi,
I sent an email yesterday asking how to balance the 
cluster after setting the replication level to 2. I have 4 datanodes and 
one namenode in my setup.
Using the -R switch with -setrep did the trick, but one of my nodes 
became under-utilized. I then ran hadoop balancer, and it helped, but 
only up to a point.

Datanode 4, listed below, is now at almost 5%, but when I try to balance 
the cluster again using the "hadoop balancer" command, it says that the 
cluster is already balanced, which it isn't.
Is there an alternative way, or will Datanode 4 pick up more blocks 
over time?

Any clues?

Thanks,
Usman

Name: 1
State          : In Service
Total raw bytes: 293778976768 (273.6 GB)
Remaining raw bytes: 222235858599 (206.97 GB)
Used raw bytes: 48140136448 (44.83 GB)
% used: 16.39%
Last contact: Mon Apr 27 08:34:46 UTC 2009


Name: 2
State          : In Service
Total raw bytes: 293778976768 (273.6 GB)
Remaining raw bytes: 231235100994 (215.35 GB)
Used raw bytes: 40704245760 (37.91 GB)
% used: 13.86%
Last contact: Mon Apr 27 08:34:45 UTC 2009


Name: 3
State          : In Service
Total raw bytes: 293778976768 (273.6 GB)
Remaining raw bytes: 211936026161 (197.38 GB)
Used raw bytes: 59591700480 (55.5 GB)
% used: 20.28%
Last contact: Mon Apr 27 08:34:45 UTC 2009


Name: 4
State          : In Service
Total raw bytes: 293778976768 (273.6 GB)
Remaining raw bytes: 258876991693 (241.1 GB)
Used raw bytes: 12142653440 (11.31 GB)
% used: 4.13%
Last contact: Mon Apr 27 08:34:46 UTC 2009


Re: Balancing datanodes - Running hadoop 0.18.3

Posted by Usman Waheed <us...@opera.com>.
Hi Tamir,

Thanks for the info, makes sense now :).

Cheers,
Usman


Re: Balancing datanodes - Running hadoop 0.18.3

Posted by Tamir Kamara <ta...@gmail.com>.
Hi,

The balancer compares each node's utilization with the average utilization
of all the nodes in the cluster, which in your case is about 13%. Only nodes
that are more than 10 percentage points above or below that average are
rebalanced. Node 4 doesn't count as under-utilized because 13 - 10 = 3,
which is less than its 4%. You can use a different threshold than the default
of 10% (hadoop balancer -threshold 5). Read more here:
http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Rebalancer
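
To make the arithmetic concrete, here is a rough Python sketch of the
threshold check described above, using the "% used" figures from the report
in this thread. It is a simplified model, not the balancer's actual code: the
classify function and its names are made up for illustration, and it assumes
all nodes have the same capacity, so a plain mean matches the cluster-wide
utilization.

```python
# "% used" per datanode, copied from the report in this thread.
used_pct = {"1": 16.39, "2": 13.86, "3": 20.28, "4": 4.13}


def classify(used_pct, threshold):
    """Return (average, nodes lying outside average +/- threshold).

    Hypothetical helper for illustration; it only models the threshold
    comparison, not how the balancer actually moves blocks.
    """
    # Assumption: equal capacity on every node, so the mean of the
    # per-node percentages equals the cluster-wide utilization.
    avg = sum(used_pct.values()) / len(used_pct)
    outside = {
        name: ("over" if pct > avg else "under")
        for name, pct in used_pct.items()
        if abs(pct - avg) > threshold
    }
    return avg, outside


# Compare the default threshold (10) with the suggested one (5).
for threshold in (10, 5):
    avg, outside = classify(used_pct, threshold)
    print(f"threshold={threshold}: average={avg:.1f}%, outside band: {outside}")
```

With the default threshold of 10, no node falls outside the band, which
matches the "already balanced" message. With a threshold of 5, nodes 3 and 4
fall outside it, so the balancer would have work to do.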

Tamir

