
Posted to mapreduce-dev@hadoop.apache.org by Todd Lipcon <to...@apache.org> on 2014/09/06 01:58:04 UTC

[VOTE] Merge branch MAPREDUCE-2841 to trunk

Hi all,

As I've reported recently [1], work on the MAPREDUCE-2841 branch has
progressed well and the development team working on it feels that it is
ready to be merged into trunk.

For those not familiar with the JIRA (it's a bit lengthy to read from start
to finish!), the goal of this work is to build a native implementation of
the map-side sort code. The native implementation's primary advantage is
its speed: for example, terasort is 30% faster on a wall-clock basis and
60% faster on a resource consumption basis. For clusters that make heavy
use of MapReduce, this is a substantial improvement to their efficiency.
Users may enable the feature by switching a single configuration flag, and
it will fall back to the original implementation in cases where the native
code doesn't support the configured features/types.
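
To illustrate the enable-and-fall-back behaviour described above, here is a
minimal Python sketch. It is not the Hadoop code: the class names, the
`use_native` flag, the `key_type` field, and the set of supported key types
are all hypothetical stand-ins chosen for illustration. It only shows the
general pattern of trying the native path when the flag is on and reverting
to the original implementation when the configuration is unsupported.

```python
# Hypothetical sketch: none of these names come from the Hadoop code base.

class UnsupportedConfig(Exception):
    """Raised by a collector that cannot handle the job's configuration."""


class JavaCollector:
    """Stand-in for the original (default) map-side sort implementation."""
    name = "java"

    def init(self, job):
        # The original implementation accepts every configuration.
        return self


class NativeCollector:
    """Stand-in for the native implementation, which supports fewer types."""
    name = "native"
    # Assumed set of supported key types, for illustration only.
    SUPPORTED_KEY_TYPES = {"Text", "BytesWritable", "LongWritable"}

    def init(self, job):
        if job["key_type"] not in self.SUPPORTED_KEY_TYPES:
            raise UnsupportedConfig(job["key_type"])
        return self


def choose_collector(job):
    """Pick a collector: native only if the flag is on and the job is supported."""
    candidates = []
    if job.get("use_native", False):   # off by default
        candidates.append(NativeCollector)
    candidates.append(JavaCollector)   # always present as the fallback
    for cls in candidates:
        try:
            return cls().init(job)
        except UnsupportedConfig:
            continue                   # try the next candidate
    raise RuntimeError("no usable collector")
```

With the flag unset, `choose_collector({"key_type": "Text"})` returns the
original collector; with `use_native` set and an unsupported key type, it
also ends up on the original collector rather than failing the job.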

The new work is entirely pluggable and off-by-default to mitigate risk. The
merge patch itself does not modify even a single line of existing code: all
necessary plug-points have already been committed to trunk for some time.

Though we do not yet have a full +1 precommit Jenkins run on the JIRA,
there are only a few small nits to fix before merge, so I figured that we
could start the vote in parallel. Of course we will not merge until it has
a positive precommit run.

Though this branch is a new contribution to the Apache repository, it
represents work done over several years by a large community of developers
including the following:

Binglin Chang
Yang Dong
Sean Zhong
Manu Zhang
Zhongliang Zhu
Vincent Wang
Yan Dong
Cheng Lian
Xusen Yin
Fangqin Dai
Jiang Weihua
Gansha Wu
Avik Dey

The vote will run for 7 days, ending Friday 9/12 EOD PST.

I'll start the voting with my own +1.

-Todd

[1]
http://search-hadoop.com/m/09oay13EwlV/native+task+progress&subj=Native+task+branch+progress

Re: [VOTE] Merge branch MAPREDUCE-2841 to trunk

Posted by Todd Lipcon <to...@cloudera.com>.

With four committer +1s, this vote passes.

I'll take care of merging this to trunk either later tonight or over the
weekend.

Thanks to all of the contributors who helped with this work!

-Todd

Re: [VOTE] Merge branch MAPREDUCE-2841 to trunk

Posted by Karthik Kambatla <ka...@cloudera.com>.

+1

I skimmed over the initial import, but looked at the follow-up patches more
closely. There is very little change to the existing code, most (all?) of
which is already committed to trunk. I ran wordcount with the default
collector and the native collector on a single-node setup; the latter
takes ~10% less wall-clock time. I haven't verified the CPU usage myself.



On Thu, Sep 11, 2014 at 5:00 AM, Devaraj K <de...@apache.org> wrote:

> +1
>
> Good performance improvement. Nice work…
>
>
>
> On Sat, Sep 6, 2014 at 6:05 AM, Chris Douglas <cd...@apache.org> wrote:
>
> > +1
> >
> > The change to the existing code is very limited and the perf is
> > impressive. -C
> >
> --
>
>
> Thanks
> Devaraj K
>

Re: [VOTE] Merge branch MAPREDUCE-2841 to trunk

Posted by Devaraj K <de...@apache.org>.

+1

Good performance improvement. Nice work…



On Sat, Sep 6, 2014 at 6:05 AM, Chris Douglas <cd...@apache.org> wrote:

> +1
>
> The change to the existing code is very limited and the perf is
> impressive. -C
>



-- 


Thanks
Devaraj K

Re: [VOTE] Merge branch MAPREDUCE-2841 to trunk

Posted by Chris Douglas <cd...@apache.org>.

+1

The change to the existing code is very limited and the perf is impressive. -C
