
Posted to common-user@hadoop.apache.org by "David J. O'Dell" <do...@videoegg.com> on 2008/07/18 18:56:14 UTC

Timeouts when running balancer

I'm trying to rebalance my cluster because I've added two more nodes.
When I run the balancer with the default threshold, I see timeouts in
the logs:

2008-07-18 09:50:46,636 INFO org.apache.hadoop.dfs.Balancer: Decided to move block -8432927406854991437 with a length of 128 MB bytes from 10.11.6.234:50010 to 10.11.6.235:50010 using proxy source 10.11.6.234:50010
2008-07-18 09:50:46,636 INFO org.apache.hadoop.dfs.Balancer: Starting Block mover for -8432927406854991437 from 10.11.6.234:50010 to 10.11.6.235:50010
2008-07-18 09:52:46,826 WARN org.apache.hadoop.dfs.Balancer: Timeout moving block -8432927406854991437 from 10.11.6.234:50010 to 10.11.6.235:50010 through 10.11.6.234:50010
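
As a quick sanity check on the log, here is a minimal Python sketch (standard library only) that measures the gap between the "Starting Block mover" and "Timeout" entries. It only parses the two timestamps quoted in the log; it makes no assumption about how the Balancer picks its timeout internally.

```python
from datetime import datetime

# log4j-style timestamp as printed in the balancer log, e.g. "2008-07-18 09:50:46,636"
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two balancer log timestamps (the ",636" part is milliseconds)."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()

# Timestamps copied from the "Starting Block mover" and "Timeout" lines.
started = "2008-07-18 09:50:46,636"
timed_out = "2008-07-18 09:52:46,826"
print(f"block mover gave up after {elapsed_seconds(started, timed_out):.3f} s")
```

The mover gave up roughly two minutes after it started.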

I read in the balancer guide
(http://issues.apache.org/jira/secure/attachment/12370966/BalancerUserGuide2)
that the default transfer rate is 1 MB/sec.
I tried increasing it to 1 GB/sec, but I'm still seeing the timeouts.
All of the nodes have gigE NICs and are on the same switch.
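
A back-of-the-envelope calculation, sketched in Python under stated assumptions: "MB" means 1,048,576 bytes (the default for dfs.balance.bandwidthPerSec is 1048576 bytes/sec), and a single block move gets the whole per-datanode allowance. Under those assumptions a 128 MB block needs at least 128 seconds at the default rate, which is longer than the roughly 120 seconds between "Starting" and "Timeout" in the log. This is a hypothetical model only: it ignores concurrent moves sharing the throttle and cannot tell which setting was active when the log was captured.

```python
MB = 1024 * 1024  # assumption: bytes per "MB" as the 1048576-byte default counts them

BLOCK_BYTES = 128 * MB   # block size reported in the balancer log
WINDOW_SECONDS = 120.19  # Starting -> Timeout gap observed in the log

def seconds_to_move(block_bytes: int, bandwidth_bytes_per_sec: int) -> float:
    """Lower bound on copy time if one transfer is throttled to the given rate."""
    return block_bytes / bandwidth_bytes_per_sec

for label, rate in [("1 MB/sec (default)", 1 * MB), ("10 MB/sec", 10 * MB)]:
    t = seconds_to_move(BLOCK_BYTES, rate)
    verdict = "fits within" if t < WINDOW_SECONDS else "exceeds"
    print(f"{label}: {t:.1f} s, {verdict} the ~120 s window")
```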


-- 
David O'Dell
Director, Operations
e: dodell@videoegg.com
t:  (415) 738-5152
180 Townsend St., Third Floor
San Francisco, CA 94107

Re: Timeouts when running balancer

Posted by "David J. O'Dell" <do...@videoegg.com>.

You are correct.
The default 1 MB/sec is too low, and 1 GB/sec is too high.
I changed it to 10 MB/sec and it's humming along.
Thanks.
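
For reference, a minimal Python sketch that renders the 10 MB/sec setting as a hadoop-site.xml property. The value is given in bytes per second, so 10 MB/sec becomes 10 x 1,048,576 = 10485760. The description text is illustrative only. In Hadoop releases of this era the datanode appears to read this property at startup, so the datanodes will probably need a restart to pick up a new value.

```python
import xml.etree.ElementTree as ET

MB = 1024 * 1024  # dfs.balance.bandwidthPerSec is expressed in bytes per second

def bandwidth_property(mb_per_sec: int) -> str:
    """Render a hadoop-site.xml <property> element for the balancer throttle."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.balance.bandwidthPerSec"
    ET.SubElement(prop, "value").text = str(mb_per_sec * MB)
    # Illustrative description; not copied from Hadoop's own defaults file.
    ET.SubElement(prop, "description").text = (
        f"Max bytes/sec each datanode may use for balancing ({mb_per_sec} MB/sec)"
    )
    return ET.tostring(prop, encoding="unicode")

print(bandwidth_property(10))
```

Paste the printed element inside the <configuration> element of hadoop-site.xml on each datanode.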


Re: Timeouts when running balancer

Posted by Taeho Kang <tk...@gmail.com>.

By setting "dfs.balance.bandwidthPerSec" to 1 GB/sec, you allow each datanode
to use up to 1 GB/sec for block balancing. That is too high: gigabit
Ethernet carries at most about 125 MB/sec (1 Gbit/sec divided by 8 bits
per byte), so it can't handle that much data per second.
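
To make this concrete, a small Python sketch: a throttle set above what the NIC can carry does not raise throughput, because the link is the real ceiling. Assumptions: 1 GB/sec is taken as 1024 x 1,048,576 bytes/sec, and gigabit Ethernet is modeled at its raw line rate of 125,000,000 bytes/sec with protocol overhead ignored, so real throughput will be somewhat lower.

```python
MB = 1024 * 1024
# Raw gigabit line rate: 1,000,000,000 bits/sec / 8 = 125,000,000 bytes/sec (no protocol overhead).
GIGE_BYTES_PER_SEC = 1_000_000_000 // 8

def effective_rate(throttle_bytes_per_sec: int,
                   link_bytes_per_sec: int = GIGE_BYTES_PER_SEC) -> int:
    """Throughput is capped by whichever is lower: the configured throttle or the link."""
    return min(throttle_bytes_per_sec, link_bytes_per_sec)

throttle = 1024 * MB  # the 1 GB/sec setting that was tried
rate = effective_rate(throttle)
print(f"configured {throttle:,} B/s, link allows at most {rate:,} B/s "
      f"({throttle / GIGE_BYTES_PER_SEC:.1f}x the link capacity)")
```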

Timeouts probably mean your network is saturated. Were you running a big
MapReduce job that was moving lots of data between nodes at the time?

Try setting it to 10 to 30 MB/sec and see what happens.
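
To see what that range means in practice, the Python sketch below computes the fraction of a gigabit link that one datanode's balancing traffic could occupy at 10, 20 and 30 MB/sec. It assumes 1 MB = 1,048,576 bytes and the raw 125,000,000 bytes/sec line rate. It says nothing about how much other traffic, such as a MapReduce job, is competing for the same link.

```python
MB = 1024 * 1024
GIGE_BYTES_PER_SEC = 125_000_000  # 1 Gbit/sec raw line rate, ignoring protocol overhead

def link_share(throttle_mb_per_sec: int) -> float:
    """Fraction of a gigabit link one datanode's balancing could use at full throttle."""
    return throttle_mb_per_sec * MB / GIGE_BYTES_PER_SEC

for mbps in (10, 20, 30):
    print(f"{mbps} MB/sec -> {link_share(mbps):.0%} of a GigE link")
```

Even at the top of the suggested range, balancing would take only about a quarter of the link, which leaves room for other traffic.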
