Posted to yarn-dev@hadoop.apache.org by sam liu <li...@gmail.com> on 2013/06/07 04:11:29 UTC

Why Yarn has worse performance for terasort, than MRv1?

Hi Experts,

We are considering whether to adopt Yarn in the near future, so I
ran teragen/terasort on both Yarn and MRv1 for comparison.

My environment is a three-node cluster, and each node has similar hardware:
2 CPUs (4 cores), 32 mem. The Yarn and MRv1 clusters were both set up on the
same environment. To be fair, I did not tune the configuration of either one
and used the default configuration values.

Before testing, I expected Yarn to be much better than MRv1 when both use
the default configuration, because Yarn is a better framework than MRv1.
However, the test results show a mixed picture:

MRv1: Hadoop-1.1.1
Yarn: Hadoop-2.0.4

(A) Teragen: generate 10 GB data:
- MRv1: 193 sec
- Yarn: 69 sec
*Yarn is 2.8 times faster than MRv1*

(B) Terasort: sort 10 GB data:
- MRv1: 451 sec
- Yarn: 1136 sec
*Yarn is 2.5 times slower than MRv1*
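
As a quick check of the two ratios above, the arithmetic can be reproduced
with a few lines of Python. This is only a sketch: the timings are copied
from (A) and (B), while the names `timings` and `ratio` are made up for
illustration.

```python
# Wall-clock times in seconds, copied from results (A) and (B) above.
timings = {
    "teragen": {"MRv1": 193, "Yarn": 69},
    "terasort": {"MRv1": 451, "Yarn": 1136},
}


def ratio(slower: float, faster: float) -> float:
    """Return how many times longer the slower run took."""
    return slower / faster


# Teragen: MRv1 took longer, so divide the MRv1 time by the Yarn time.
teragen_ratio = ratio(timings["teragen"]["MRv1"], timings["teragen"]["Yarn"])
# Terasort: Yarn took longer, so divide the Yarn time by the MRv1 time.
terasort_ratio = ratio(timings["terasort"]["Yarn"], timings["terasort"]["MRv1"])

print(f"teragen:  Yarn took 1/{teragen_ratio:.1f} of the MRv1 time")
print(f"terasort: Yarn took {terasort_ratio:.1f}x the MRv1 time")
```

Both figures come from a single run per system, so they say nothing about
run-to-run variance.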

After a quick analysis, I think the direct cause might be that Yarn is much
faster than MRv1 in the map phase, but much slower in the reduce phase.

Here I have two questions:
*- Why do my tests show Yarn performing worse than MRv1 for terasort?*
*- What is the strategy for tuning Yarn performance? Are there any materials on this?*

Thanks!
-- 

Sam Liu

Re: Why Yarn has worse performance for terasort, than MRv1?

Posted by Robert Evans <ev...@yahoo-inc.com>.
It is rather difficult to tell without looking at the details of your config
etc.  For our benchmarks on a 350 node cluster running 0.23.3, terasort was
about 5% faster than on 1.0.2.   How many map/reduce tasks were launched for
each?  How long did the various phases (map/reduce/shuffle) take?  Did you
flush the file system caches in between runs?  How many runs did you do on
each system?

--Bobby
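
The last two questions (flushing caches between runs, and doing several runs
per system) could be handled with a small harness along these lines. This is
a rough sketch under stated assumptions, not a tested benchmark tool: it
assumes Linux nodes and root access for writing /proc/sys/vm/drop_caches, the
helper names are invented for illustration, and `cmd` stands for whatever
command launches the terasort job on your cluster. Note that dropping caches
this way only affects the local machine, so it would have to be run on every
node.

```python
import statistics
import subprocess
import time


def drop_page_cache() -> None:
    """Flush dirty pages, then drop the local Linux page cache (needs root)."""
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def time_command(cmd: list[str]) -> float:
    """Run cmd to completion and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start


def summarize(durations: list[float]) -> dict[str, float]:
    """Return mean, sample standard deviation, min and max of the run times."""
    return {
        "mean": statistics.mean(durations),
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "min": min(durations),
        "max": max(durations),
    }


def benchmark(cmd: list[str], runs: int = 3, flush: bool = True) -> dict[str, float]:
    """Time cmd several times, optionally dropping the page cache before each run."""
    durations = []
    for _ in range(runs):
        if flush:
            drop_page_cache()
        durations.append(time_command(cmd))
    return summarize(durations)
```

Comparing the mean and standard deviation for each system gives a fairer
picture than a single run per system.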
