Posted to common-user@hadoop.apache.org by stephen mulcahy <st...@deri.org> on 2010/01/22 12:57:18 UTC

Slowdown with Hadoop Sort benchmark when using Jumbo frames?

Hi,

I've been running some tests on some new hardware we have acquired.

As a baseline, I ran the Hadoop sort[1] with 10GB and 100GB of data. As 
an experiment, I ran it on 4 systems (1 configured as master+slave and 3 
as slaves) - first with an MTU of 1500 and then with an MTU of 9000.

I was somewhat surprised at the results of enabling Jumbo frames - it 
resulted in a slowdown. In the case of the write operations, the 
slowdown was about 5%. In the case of the 10GB sort, the slowdown was 
around 6% and in the case of the 100GB sort, the slowdown was nearly 20%.

Has anyone else done any testing of Hadoop with Jumbo frames? If so, 
have you seen similar results or is this a characteristic of my 
systems/network? Is there an obvious reason why a larger MTU would 
result in a slowdown in Hadoop?

Thanks for your thoughts,

-stephen

[1] http://wiki.apache.org/hadoop/Sort
-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

Re: Slowdown with Hadoop Sort benchmark when using Jumbo frames?

Posted by Edward Capriolo <ed...@gmail.com>.
A couple of people are working on some Hadoop DFS optimizations
called radfs. I heard mention of some limitations where things were not
optimized for jumbo frames. The conversation went quickly over my
head.


Re: Slowdown with Hadoop Sort benchmark when using Jumbo frames?

Posted by "David B. Ritch" <da...@gmail.com>.
Have you looked at packet loss and retransmission rates?  When frames
are larger, a lost bit requires retransmission of more data.  It may be
that this happens under heavy load, such as when you're doing a large
transfer, but has less impact when testing individual file transfers.
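
As a rough illustration of the point above, here is a minimal Python sketch of how frame size affects the chance that a frame is damaged. It assumes independent bit errors, which is an idealisation, and the bit error rate is a hypothetical value chosen only for illustration, not something measured on the cluster in this thread.

```python
def frame_error_probability(frame_bytes, bit_error_rate):
    """Probability that at least one bit in a frame is corrupted.

    Assumes bit errors are independent, which real links only approximate.
    """
    bits = frame_bytes * 8
    return 1.0 - (1.0 - bit_error_rate) ** bits


# Hypothetical bit error rate, for illustration only.
ASSUMED_BER = 1e-9

for mtu in (1500, 9000):
    p = frame_error_probability(mtu, ASSUMED_BER)
    # A damaged frame costs a retransmission of roughly `mtu` bytes.
    print(f"MTU {mtu}: P(frame damaged) = {p:.3e}, bytes resent per loss = {mtu}")
```

Under this simple model a 9000-byte frame is both more likely to be hit and more expensive to resend, so any loss that does occur costs more with jumbo frames. It says nothing about how TCP reacts to the loss, which can matter more in practice.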

David

On 1/29/2010 3:52 AM, stephen mulcahy wrote:
> Allen Wittenauer wrote:
>> We're working on a patch that monkeys with the TCP buffers because we're
>> seeing slow downs with big transfers as well.  It might be related...
>
> Kewl. Happy to test it out if our cluster hasn't moved into production
> at that stage. Just drop me a mail.
>
> -stephen
>


Re: Slowdown with Hadoop Sort benchmark when using Jumbo frames?

Posted by stephen mulcahy <st...@deri.org>.
Allen Wittenauer wrote:
> We're working on a patch that monkeys with the TCP buffers because we're
> seeing slow downs with big transfers as well.  It might be related...

Kewl. Happy to test it out if our cluster hasn't moved into production 
at that stage. Just drop me a mail.

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

Re: Slowdown with Hadoop Sort benchmark when using Jumbo frames?

Posted by Allen Wittenauer <aw...@linkedin.com>.
We're working on a patch that monkeys with the TCP buffers because we're
seeing slow downs with big transfers as well.  It might be related...


On 1/28/10 9:25 AM, "stephen mulcahy" <st...@deri.org> wrote:

> Jay Booth wrote:
>> Did you set io.file.buffer.size (or whatever the property is) to a large
>> value?
> 
> Just re-ran the benchmark with that bumped to 65536 (as proposed in
> http://www.cloudera.com/blog/tag/configuration/). The benchmark is still
> slower with jumbo frames than without (but the difference was reduced a
> little by bumping io.file.buffer.size).
> 
>> Also, even if you do set that, I'm not 100% sure that it will lead to use of
>> jumbo frames; I'd have to take a look through the code to be sure.
>> Another issue is that you may lose any networking performance gains as a
>> result of increased buffer allocation costs and garbage collection.
> 
> I guess there must be something going on anyways - as there is clearly a
> performance drop-off, which surprised me, as lots of apps benefit
> significantly from jumbo frames.
> 
> Thanks for the feedback,
> 
> -stephen


Re: Slowdown with Hadoop Sort benchmark when using Jumbo frames?

Posted by stephen mulcahy <st...@deri.org>.
Jay Booth wrote:
> Did you set io.file.buffer.size (or whatever the property is) to a large
> value?

Just re-ran the benchmark with that bumped to 65536 (as proposed in 
http://www.cloudera.com/blog/tag/configuration/). The benchmark is still 
slower with jumbo frames than without (but the difference was reduced a 
little by bumping io.file.buffer.size).

> Also, even if you do set that, I'm not 100% sure that it will lead to use of
> jumbo frames; I'd have to take a look through the code to be sure.
> Another issue is that you may lose any networking performance gains as a
> result of increased buffer allocation costs and garbage collection.

I guess there must be something going on anyways - as there is clearly a 
performance drop-off, which surprised me, as lots of apps benefit 
significantly from jumbo frames.

Thanks for the feedback,

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

Re: Slowdown with Hadoop Sort benchmark when using Jumbo frames?

Posted by Jay Booth <ja...@gmail.com>.
Did you set io.file.buffer.size (or whatever the property is) to a large
value?  If you didn't, then Hadoop will explicitly create packets at the
default size, and you'll see zero utilization of the jumbo frames.  (Although
that would argue for flat performance, no?  Do jumbo frames have increased
overhead-per-packet that could come into play with smaller packets here?)
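
To put rough numbers on the overhead-per-packet question, here is a minimal Python sketch of wire efficiency for one buffer's worth of data at each MTU. It assumes standard Ethernet framing without VLAN tags and IPv4/TCP headers without options. It also treats each buffer as if it were sent on its own, which is a simplification, since TCP is a byte stream and may coalesce writes. The 65536 buffer size is the one tried in this thread; 4096 is used here as an assumed default.

```python
import math

# Per-frame Ethernet cost on the wire: preamble + SFD (8), header (14),
# frame check sequence (4), inter-frame gap (12). No VLAN tag assumed.
ETHERNET_OVERHEAD = 38
# IPv4 header (20) + TCP header (20), no options assumed.
IP_TCP_HEADERS = 40


def frames_needed(payload_bytes, mtu):
    """Number of frames needed to carry the payload at the given MTU."""
    max_segment = mtu - IP_TCP_HEADERS
    return math.ceil(payload_bytes / max_segment)


def wire_efficiency(payload_bytes, mtu):
    """Fraction of bytes on the wire that are application payload."""
    frames = frames_needed(payload_bytes, mtu)
    wire_bytes = payload_bytes + frames * (IP_TCP_HEADERS + ETHERNET_OVERHEAD)
    return payload_bytes / wire_bytes


for buffer_size in (4096, 65536):
    for mtu in (1500, 9000):
        print(
            f"buffer {buffer_size:>5}, MTU {mtu}: "
            f"{frames_needed(buffer_size, mtu)} frames, "
            f"efficiency {wire_efficiency(buffer_size, mtu):.4f}"
        )
```

In this model the per-frame overhead is fixed, so a larger MTU never lowers wire efficiency, even for small buffers. If that holds, the slowdown would have to come from somewhere other than framing overhead, such as buffer handling on the hosts.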

Also, even if you do set that, I'm not 100% sure that it will lead to use of
jumbo frames; I'd have to take a look through the code to be sure.
Another issue is that you may lose any networking performance gains as a
result of increased buffer allocation costs and garbage collection.

Ed, I threw out the radfs experiment; I'm working on HDFS-918 now -- that's
what I was mentioning the other night as being possibly more conducive to
jumbo frames due to less buffer allocation on BlockSender instantiation.
But I haven't benchmarked yet, so that's all very much "in theory".

On Tue, Jan 26, 2010 at 1:12 PM, stephen mulcahy
<st...@deri.org> wrote:

> Eli Collins wrote:
>
>>> I was somewhat surprised at the results of enabling Jumbo frames - it
>>> resulted in a slowdown. In the case of the write operations, the slow down
>>> was about 5%. In the case of 10GB sort, the slowdown was around 6% and in
>>> the case of the 100GB sort, the slowdown was nearly 20%.
>>>
>>
>> Are normal (non-hadoop) file transfers faster between the same hosts
>> with jumbo frames enabled? ie have you ruled out host/configuration
>> issues?
>>
>
> Yes - I did some testing on two different clusters we have - both with and
> without jumbo frames and in both cases, I saw a small but noticeable
> increase in the overall bandwidth when moving from an MTU of 1500 to an MTU
> of 9000.
>
> Has anyone else tested Hadoop performance with Jumbo frames? Are you seeing
> something different to what we're seeing?
>
> Thanks,
>
> -stephen
>
>
> --
> Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
> NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
> http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com
>

Re: Slowdown with Hadoop Sort benchmark when using Jumbo frames?

Posted by stephen mulcahy <st...@deri.org>.
Eli Collins wrote:
>> I was somewhat surprised at the results of enabling Jumbo frames - it
>> resulted in a slowdown. In the case of the write operations, the slow down
>> was about 5%. In the case of 10GB sort, the slowdown was around 6% and in
>> the case of the 100GB sort, the slowdown was nearly 20%.
> 
> Are normal (non-hadoop) file transfers faster between the same hosts
> with jumbo frames enabled? ie have you ruled out host/configuration
> issues?

Yes - I did some testing on two different clusters we have - both with 
and without jumbo frames and in both cases, I saw a small but noticeable 
increase in the overall bandwidth when moving from an MTU of 1500 to an 
MTU of 9000.

Has anyone else tested Hadoop performance with Jumbo frames? Are you 
seeing something different to what we're seeing?

Thanks,

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

Re: Slowdown with Hadoop Sort benchmark when using Jumbo frames?

Posted by Eli Collins <el...@cloudera.com>.
On Fri, Jan 22, 2010 at 3:57 AM, stephen mulcahy
<st...@deri.org> wrote:
> Hi,
>
> I've been running some tests on some new hardware we have acquired.
>
> As a baseline, I ran the Hadoop sort[1] with 10GB and 100GB of data. As an
> experiment, I ran it on 4 systems (1 configured as master+slave and 3 as
> slaves) - first with an MTU of 1500 and then with an MTU of 9000.
>
> I was somewhat surprised at the results of enabling Jumbo frames - it
> resulted in a slowdown. In the case of the write operations, the slow down
> was about 5%. In the case of 10GB sort, the slowdown was around 6% and in
> the case of the 100GB sort, the slowdown was nearly 20%.

Are normal (non-hadoop) file transfers faster between the same hosts
with jumbo frames enabled? ie have you ruled out host/configuration
issues?

Often a slowdown with jumbo frames means some part of the network
can't support them and the packets are getting fragmented as a result.
You can use a tool like tshark to see if that's happening.
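
As a complement to a packet capture, one common quick check is to ping with the Don't Fragment bit set and a payload sized to fill the MTU exactly. The sketch below is a minimal Python wrapper for that, assuming the Linux iputils `ping` (its `-M do`, `-s` and `-c` options) and IPv4. The function names are made up for illustration, and the host name in the usage note is hypothetical.

```python
import subprocess

IPV4_HEADER = 20  # bytes, no options assumed
ICMP_HEADER = 8   # bytes


def ping_payload_for_mtu(mtu):
    """ICMP payload size that makes the IP packet exactly `mtu` bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def build_df_ping(host, mtu, count=3):
    """Build a Linux iputils ping command with Don't Fragment set."""
    return [
        "ping",
        "-M", "do",                       # prohibit fragmentation
        "-c", str(count),                 # number of echo requests
        "-s", str(ping_payload_for_mtu(mtu)),
        host,
    ]


def path_supports_mtu(host, mtu):
    """Return True if full-size unfragmented pings to `host` succeed.

    A failure can also mean the host is simply unreachable, so compare
    against a run with a small MTU value such as 1500.
    """
    result = subprocess.run(
        build_df_ping(host, mtu), capture_output=True, text=True
    )
    return result.returncode == 0
```

For example, `path_supports_mtu("slave1", 9000)` returning False while `path_supports_mtu("slave1", 1500)` returns True would suggest that something between the two hosts is not passing 9000-byte frames.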

Thanks,
Eli