Posted to mapreduce-dev@hadoop.apache.org by Anty <an...@gmail.com> on 2012/02/16 16:01:15 UTC

Optimized Hadoop

Hi guys,
We have just released an optimized Hadoop. If you are interested, please
refer to https://github.com/hanborq/hadoop

-- 
Best Regards
Anty Rao

Re: Optimized Hadoop

Posted by Todd Lipcon <to...@cloudera.com>.

On Thu, Feb 16, 2012 at 8:25 PM, Schubert Zhang <zs...@gmail.com> wrote:
> 1) it should be sort-avoidance.

Right - that's a nice improvement; looking forward to getting that into
trunk at some point.

> 2) work pool (like Tenzing)
>

Looking at the code, it seems you only support the default task
executor. Do you have plans to support run-as-user through the Linux
task-controller? It's a requirement for secure environments, but it
makes the worker pool model a little tougher since you can't share a
JVM across users.
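
To make the constraint concrete, here is a minimal sketch of a worker pool
that keeps idle workers separated by user. It is not taken from the Hanborq
or Apache Hadoop code; the names UserWorkerPool, Worker, acquire and release
are hypothetical, and a Worker here is only a stand-in for a long-lived task
JVM that would really be a child process running as that user.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: idle workers are pooled per user and never shared across users. */
class UserWorkerPool {

    /** Stand-in for a reusable task JVM (assumption: one OS user per worker). */
    static final class Worker {
        final int id;
        final String owner;

        Worker(int id, String owner) {
            this.id = id;
            this.owner = owner;
        }
    }

    // Idle workers, keyed by the user they run as.
    private final Map<String, Deque<Worker>> idleByUser = new HashMap<>();
    private int nextId = 0;

    /** Reuse an idle worker owned by this user, or start a new one if none is idle. */
    synchronized Worker acquire(String user) {
        Deque<Worker> idle = idleByUser.get(user);
        if (idle != null && !idle.isEmpty()) {
            return idle.pop();
        }
        return new Worker(nextId++, user);
    }

    /** Return a worker to the idle queue of the user who owns it. */
    synchronized void release(Worker worker) {
        idleByUser.computeIfAbsent(worker.owner, u -> new ArrayDeque<>()).push(worker);
    }

    public static void main(String[] args) {
        UserWorkerPool pool = new UserWorkerPool();
        Worker a = pool.acquire("alice");
        pool.release(a);
        Worker b = pool.acquire("bob");     // alice's idle worker is not handed to bob
        Worker a2 = pool.acquire("alice");  // alice gets her own worker back
        System.out.println(a2 == a);        // prints "true"
        System.out.println(b.owner);        // prints "bob"
    }
}
```

The cost of this separation is that an idle worker owned by one user cannot
absorb a task from another user, so a pool on a multi-user cluster would
likely hold more idle JVMs than a pool that ignores ownership. The sketch
does not address class unloading and reloading.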

Also, how do class unloading and reloading interact with this model?

> Sorry ,the adaptive heartbeat code is not in this github code, we are
> discussing it.
>
>
>
> On Fri, Feb 17, 2012 at 11:00 AM, Anty <an...@gmail.com> wrote:
>>
>> Hi: Todd
>>
>> yes, the rewritten shuffle in actual a backport of the shuffle from MR2 .
>> We mainly add the following two features:
>> 1) shuffle avoidance
>> 2) work pool
>>
>>
>> On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon <to...@cloudera.com> wrote:
>>>
>>> Hey Schubert,
>>>
>>> Looking at the code on github, it looks like your rewritten shuffle is
>>> in fact just a backport of the shuffle from MR2. I didn't look closely
>>> - are there any distinguishing factors?
>>> Also, the OOB heartbeat and adaptive heartbeat code seems to be the
>>> same as what's in 1.0?
>>>
>>> -Todd
>>>
>>> On Thu, Feb 16, 2012 at 9:44 AM, Schubert Zhang <zs...@gmail.com>
>>> wrote:
>>> > Here is the presentation to describe our job,
>>> >
>>> > http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
>>> > Wellcome to give your advises.
>>> > It's just a little step, and we are continue to do more improvements,
>>> > thanks
>>> > for your help.
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, Feb 16, 2012 at 11:01 PM, Anty <an...@gmail.com> wrote:
>>> >>
>>> >> Hi: Guys
>>> >>        We just deliver a optimized hadoop , if you are interested, Pls
>>> >> refer to https://github.com/hanborq/hadoop
>>> >>
>>> >> --
>>> >> Best Regards
>>> >> Anty Rao
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>
>>
>>
>>
>> --
>> Best Regards
>> Anty Rao
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Optimized Hadoop

Posted by Sharad Agarwal <sh...@apache.org>.

I have some work pool related thoughts for MRv2 which are captured here:

https://issues.apache.org/jira/browse/MAPREDUCE-3315

Please feel free to add your inputs and contributions.

Sharad


>
>
> On Fri, Feb 17, 2012 at 9:55 AM, Schubert Zhang <zs...@gmail.com> wrote:
>
>> 1) it should be sort-avoidance.
>> 2) work pool (like Tenzing)
>>
>> Sorry ,the adaptive heartbeat code is not in this github code, we are
>> discussing it.
>>
>>
>>
>

Re: Optimized Hadoop

Posted by Schubert Zhang <zs...@gmail.com>.

1) It should be sort-avoidance.
2) work pool (like Tenzing)

Sorry, the adaptive heartbeat code is not in the code on GitHub; we are
discussing it.


On Fri, Feb 17, 2012 at 11:00 AM, Anty <an...@gmail.com> wrote:

> Hi: Todd
>
> yes, the rewritten shuffle in actual a backport of the shuffle from MR2 .
> We mainly add the following two features:
> 1) shuffle avoidance
> 2) work pool
>
>
> On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon <to...@cloudera.com> wrote:
>
>> Hey Schubert,
>>
>> Looking at the code on github, it looks like your rewritten shuffle is
>> in fact just a backport of the shuffle from MR2. I didn't look closely
>> - are there any distinguishing factors?
>> Also, the OOB heartbeat and adaptive heartbeat code seems to be the
>> same as what's in 1.0?
>>
>> -Todd
>>
>> On Thu, Feb 16, 2012 at 9:44 AM, Schubert Zhang <zs...@gmail.com>
>> wrote:
>> > Here is the presentation to describe our job,
>> >
>> http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
>> > Wellcome to give your advises.
>> > It's just a little step, and we are continue to do more improvements,
>> thanks
>> > for your help.
>> >
>> >
>> >
>> >
>> > On Thu, Feb 16, 2012 at 11:01 PM, Anty <an...@gmail.com> wrote:
>> >>
>> >> Hi: Guys
>> >>        We just deliver a optimized hadoop , if you are interested, Pls
>> >> refer to https://github.com/hanborq/hadoop
>> >>
>> >> --
>> >> Best Regards
>> >> Anty Rao
>> >
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Best Regards
> Anty Rao
>

Re: Optimized Hadoop

Posted by Anty <an...@gmail.com>.

Hi Todd,

Yes, the rewritten shuffle is actually a backport of the shuffle from MR2.
We mainly added the following two features:
1) shuffle avoidance
2) work pool

On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon <to...@cloudera.com> wrote:

> Hey Schubert,
>
> Looking at the code on github, it looks like your rewritten shuffle is
> in fact just a backport of the shuffle from MR2. I didn't look closely
> - are there any distinguishing factors?
> Also, the OOB heartbeat and adaptive heartbeat code seems to be the
> same as what's in 1.0?
>
> -Todd
>
> On Thu, Feb 16, 2012 at 9:44 AM, Schubert Zhang <zs...@gmail.com> wrote:
> > Here is the presentation to describe our job,
> >
> http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
> > Wellcome to give your advises.
> > It's just a little step, and we are continue to do more improvements,
> thanks
> > for your help.
> >
> >
> >
> >
> > On Thu, Feb 16, 2012 at 11:01 PM, Anty <an...@gmail.com> wrote:
> >>
> >> Hi: Guys
> >>        We just deliver a optimized hadoop , if you are interested, Pls
> >> refer to https://github.com/hanborq/hadoop
> >>
> >> --
> >> Best Regards
> >> Anty Rao
> >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Best Regards
Anty Rao

Re: Optimized Hadoop

Posted by Schubert Zhang <zs...@gmail.com>.

@Todd,
Yes, in our first code tag we intentionally stayed away from the security and
user-control features.
This is because in our existing production deployments in the enterprise
field, these features are always turned off. I think this is mainly because
of the different business models of Hanborq and others.

But we do plan to be fully compatible with Apache and Cloudera in
the future.

For the worker-pool implementation, it is true that we will continue to improve
our solution.

Schubert Zhang

> Looking at the code, it seems you only support the default task
> executor. Do you have plans to support run-as-user through the linux
> task-controller? It's a requirement for secure environments. But, it
> makes the worker pool model a little tougher since you can't share a
> JVM cross-user.



On Wed, Feb 22, 2012 at 7:34 PM, Dieter Plaetinck <
dieter.plaetinck@intec.ugent.be> wrote:

> Great work folks! Very interesting.
>
> PS: did you notice if you google for "hanborq" or HDH it's very hard to
> find your website, hanborq.com ?
>
> Dieter
>
> On Tue, 21 Feb 2012 02:17:31 +0800
> Schubert Zhang <zs...@gmail.com> wrote:
>
> > We just update the slides of this improvements:
> >
> http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
> >
> > Updates:
> > (1) modified some describes to make things more clear and accuracy.
> > (2) add some benchmarks to make sense.
> >
> > On Sat, Feb 18, 2012 at 11:12 PM, Anty <an...@gmail.com> wrote:
> >
> > >
> > >
> > > On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon <to...@cloudera.com>
> wrote:
> > >
> > >> Hey Schubert,
> > >>
> > >> Looking at the code on github, it looks like your rewritten shuffle is
> > >> in fact just a backport of the shuffle from MR2. I didn't look closely
> > >>
> > >
> > > additionally, the rewritten shuffle in MR2 has some bugs, which harm
> the
> > > overall performance, for which I have already file a jira to report
> this,
> > > with a patch available.
> > > MAPREDUCE-3685 <https://issues.apache.org/jira/browse/MAPREDUCE-3685>
> > >
> > >
> > >
> > >> - are there any distinguishing factors?
> > >> Also, the OOB heartbeat and adaptive heartbeat code seems to be the
> > >> same as what's in 1.0?
> > >>
> > >> -Todd
> > >>
> > >> On Thu, Feb 16, 2012 at 9:44 AM, Schubert Zhang <zs...@gmail.com>
> > >> wrote:
> > >> > Here is the presentation to describe our job,
> > >> >
> > >>
> http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
> > >> > Wellcome to give your advises.
> > >> > It's just a little step, and we are continue to do more
> improvements,
> > >> thanks
> > >> > for your help.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Thu, Feb 16, 2012 at 11:01 PM, Anty <an...@gmail.com> wrote:
> > >> >>
> > >> >> Hi: Guys
> > >> >>        We just deliver a optimized hadoop , if you are interested,
> Pls
> > >> >> refer to https://github.com/hanborq/hadoop
> > >> >>
> > >> >> --
> > >> >> Best Regards
> > >> >> Anty Rao
> > >> >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Todd Lipcon
> > >> Software Engineer, Cloudera
> > >>
> > >
> > >
> > >
> > > --
> > > Best Regards
> > > Anty Rao
> > >
>
>

Re: Optimized Hadoop

Posted by Schubert Zhang <zs...@gmail.com>.

Thanks Dieter, any comments are welcome.

Hehe, Hanborq Inc. is a small, low-profile company, even though we have
been in the Hadoop ecosystem for 4+ years. In fact, we have been working hard,
busy solving big data problems for big enterprises in China. Of
course, we have also been looking for our business model.

I think our home page (www.hanborq.com) is very simple and ungainly
right now; it seems we should get someone who is good at websites. :-)
But if you Google "Hanborq Hadoop" or "Hanborq MapReduce", you may find what you
want.

Thanks

On Wed, Feb 22, 2012 at 7:34 PM, Dieter Plaetinck <
dieter.plaetinck@intec.ugent.be> wrote:

> Great work folks! Very interesting.
>
> PS: did you notice if you google for "hanborq" or HDH it's very hard to
> find your website, hanborq.com ?
>
> Dieter
>
> On Tue, 21 Feb 2012 02:17:31 +0800
> Schubert Zhang <zs...@gmail.com> wrote:
>
> > We just update the slides of this improvements:
> >
> http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
> >
> > Updates:
> > (1) modified some describes to make things more clear and accuracy.
> > (2) add some benchmarks to make sense.
> >
> > On Sat, Feb 18, 2012 at 11:12 PM, Anty <an...@gmail.com> wrote:
> >
> > >
> > >
> > > On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon <to...@cloudera.com>
> wrote:
> > >
> > >> Hey Schubert,
> > >>
> > >> Looking at the code on github, it looks like your rewritten shuffle is
> > >> in fact just a backport of the shuffle from MR2. I didn't look closely
> > >>
> > >
> > > additionally, the rewritten shuffle in MR2 has some bugs, which harm
> the
> > > overall performance, for which I have already file a jira to report
> this,
> > > with a patch available.
> > > MAPREDUCE-3685 <https://issues.apache.org/jira/browse/MAPREDUCE-3685>
> > >
> > >
> > >
> > >> - are there any distinguishing factors?
> > >> Also, the OOB heartbeat and adaptive heartbeat code seems to be the
> > >> same as what's in 1.0?
> > >>
> > >> -Todd
> > >>
> > >> On Thu, Feb 16, 2012 at 9:44 AM, Schubert Zhang <zs...@gmail.com>
> > >> wrote:
> > >> > Here is the presentation to describe our job,
> > >> >
> > >>
> http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
> > >> > Wellcome to give your advises.
> > >> > It's just a little step, and we are continue to do more
> improvements,
> > >> thanks
> > >> > for your help.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Thu, Feb 16, 2012 at 11:01 PM, Anty <an...@gmail.com> wrote:
> > >> >>
> > >> >> Hi: Guys
> > >> >>        We just deliver a optimized hadoop , if you are interested,
> Pls
> > >> >> refer to https://github.com/hanborq/hadoop
> > >> >>
> > >> >> --
> > >> >> Best Regards
> > >> >> Anty Rao
> > >> >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Todd Lipcon
> > >> Software Engineer, Cloudera
> > >>
> > >
> > >
> > >
> > > --
> > > Best Regards
> > > Anty Rao
> > >
>
>

Re: Optimized Hadoop

Posted by Dieter Plaetinck <di...@intec.ugent.be>.

Great work folks! Very interesting.

PS: did you notice that if you google for "hanborq" or "HDH", it's very hard to find your website, hanborq.com?

Dieter

On Tue, 21 Feb 2012 02:17:31 +0800
Schubert Zhang <zs...@gmail.com> wrote:

> We just update the slides of this improvements:
> http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
> 
> Updates:
> (1) modified some describes to make things more clear and accuracy.
> (2) add some benchmarks to make sense.
> 
> On Sat, Feb 18, 2012 at 11:12 PM, Anty <an...@gmail.com> wrote:
> 
> >
> >
> > On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon <to...@cloudera.com> wrote:
> >
> >> Hey Schubert,
> >>
> >> Looking at the code on github, it looks like your rewritten shuffle is
> >> in fact just a backport of the shuffle from MR2. I didn't look closely
> >>
> >
> > additionally, the rewritten shuffle in MR2 has some bugs, which harm the
> > overall performance, for which I have already file a jira to report this,
> > with a patch available.
> > MAPREDUCE-3685 <https://issues.apache.org/jira/browse/MAPREDUCE-3685>
> >
> >
> >
> >> - are there any distinguishing factors?
> >> Also, the OOB heartbeat and adaptive heartbeat code seems to be the
> >> same as what's in 1.0?
> >>
> >> -Todd
> >>
> >> On Thu, Feb 16, 2012 at 9:44 AM, Schubert Zhang <zs...@gmail.com>
> >> wrote:
> >> > Here is the presentation to describe our job,
> >> >
> >> http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
> >> > Wellcome to give your advises.
> >> > It's just a little step, and we are continue to do more improvements,
> >> thanks
> >> > for your help.
> >> >
> >> >
> >> >
> >> >
> >> > On Thu, Feb 16, 2012 at 11:01 PM, Anty <an...@gmail.com> wrote:
> >> >>
> >> >> Hi: Guys
> >> >>        We just deliver a optimized hadoop , if you are interested, Pls
> >> >> refer to https://github.com/hanborq/hadoop
> >> >>
> >> >> --
> >> >> Best Regards
> >> >> Anty Rao
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
> >
> >
> > --
> > Best Regards
> > Anty Rao
> >

Re: Optimized Hadoop

Posted by Schubert Zhang <zs...@gmail.com>.

We just updated the slides describing these improvements:
http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a

Updates:
(1) Revised some descriptions to make things clearer and more accurate.
(2) Added some benchmarks to support the descriptions.

On Sat, Feb 18, 2012 at 11:12 PM, Anty <an...@gmail.com> wrote:

>
>
> On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon <to...@cloudera.com> wrote:
>
>> Hey Schubert,
>>
>> Looking at the code on github, it looks like your rewritten shuffle is
>> in fact just a backport of the shuffle from MR2. I didn't look closely
>>
>
> additionally, the rewritten shuffle in MR2 has some bugs, which harm the
> overall performance, for which I have already file a jira to report this,
> with a patch available.
> MAPREDUCE-3685 <https://issues.apache.org/jira/browse/MAPREDUCE-3685>
>
>
>
>> - are there any distinguishing factors?
>> Also, the OOB heartbeat and adaptive heartbeat code seems to be the
>> same as what's in 1.0?
>>
>> -Todd
>>
>> On Thu, Feb 16, 2012 at 9:44 AM, Schubert Zhang <zs...@gmail.com>
>> wrote:
>> > Here is the presentation to describe our job,
>> >
>> http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
>> > Wellcome to give your advises.
>> > It's just a little step, and we are continue to do more improvements,
>> thanks
>> > for your help.
>> >
>> >
>> >
>> >
>> > On Thu, Feb 16, 2012 at 11:01 PM, Anty <an...@gmail.com> wrote:
>> >>
>> >> Hi: Guys
>> >>        We just deliver a optimized hadoop , if you are interested, Pls
>> >> refer to https://github.com/hanborq/hadoop
>> >>
>> >> --
>> >> Best Regards
>> >> Anty Rao
>> >
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Best Regards
> Anty Rao
>

Re: Optimized Hadoop

Posted by Anty <an...@gmail.com>.

On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon <to...@cloudera.com> wrote:

> Hey Schubert,
>
> Looking at the code on github, it looks like your rewritten shuffle is
> in fact just a backport of the shuffle from MR2. I didn't look closely
>

Additionally, the rewritten shuffle in MR2 has some bugs that harm overall
performance. I have already filed a JIRA to report this, with a patch
available:
MAPREDUCE-3685 <https://issues.apache.org/jira/browse/MAPREDUCE-3685>



> - are there any distinguishing factors?
> Also, the OOB heartbeat and adaptive heartbeat code seems to be the
> same as what's in 1.0?
>
> -Todd
>
> On Thu, Feb 16, 2012 at 9:44 AM, Schubert Zhang <zs...@gmail.com> wrote:
> > Here is the presentation to describe our job,
> >
> http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
> > Wellcome to give your advises.
> > It's just a little step, and we are continue to do more improvements,
> thanks
> > for your help.
> >
> >
> >
> >
> > On Thu, Feb 16, 2012 at 11:01 PM, Anty <an...@gmail.com> wrote:
> >>
> >> Hi: Guys
> >>        We just deliver a optimized hadoop , if you are interested, Pls
> >> refer to https://github.com/hanborq/hadoop
> >>
> >> --
> >> Best Regards
> >> Anty Rao
> >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Best Regards
Anty Rao

Re: Optimized Hadoop

Posted by Anty <an...@gmail.com>.

Hi Todd,

Yes, the rewritten shuffle is actually a backport of the shuffle from MR2.
We mainly added the following two features:
1) shuffle avoidance
2) work pool
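
As an illustration of the work pool item above (not taken from the hanborq
code), here is a minimal Python sketch of a pool that reuses idle workers. It
assumes the constraint raised elsewhere in this thread, that a worker JVM
cannot be shared across users. Worker, WorkerPool, acquire, and release are
hypothetical names; a real pool would manage task JVM processes rather than
in-memory objects.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from itertools import count


@dataclass
class Worker:
    """Stand-in for a long-lived task JVM owned by exactly one user."""
    worker_id: int
    user: str
    tasks_run: int = 0


class WorkerPool:
    """Hypothetical pool: idle workers are reused, but only by their owner."""

    def __init__(self):
        self._idle = defaultdict(deque)  # user -> idle workers owned by that user
        self._ids = count(1)

    def acquire(self, user):
        # Reuse an idle worker belonging to the same user, else start a new one.
        if self._idle[user]:
            return self._idle[user].popleft()
        return Worker(next(self._ids), user)

    def release(self, worker):
        # Record the finished task and park the worker under its owner.
        worker.tasks_run += 1
        self._idle[worker.user].append(worker)


pool = WorkerPool()
w1 = pool.acquire("alice")
pool.release(w1)
w2 = pool.acquire("alice")  # same object as w1: no new worker is started
pool.release(w2)
w3 = pool.acquire("bob")    # alice's worker is idle, but bob gets a fresh one
```

Keying the idle queues by user is the simplest way to honor the cross-user
restriction; the cost is that a cluster with many distinct users sees less
reuse than one where a single user submits most jobs.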

On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon <to...@cloudera.com> wrote:

> Hey Schubert,
>
> Looking at the code on github, it looks like your rewritten shuffle is
> in fact just a backport of the shuffle from MR2. I didn't look closely
> - are there any distinguishing factors?
> Also, the OOB heartbeat and adaptive heartbeat code seems to be the
> same as what's in 1.0?
>
> -Todd
>
> On Thu, Feb 16, 2012 at 9:44 AM, Schubert Zhang <zs...@gmail.com> wrote:
> > Here is the presentation to describe our job,
> >
> http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
> > Wellcome to give your advises.
> > It's just a little step, and we are continue to do more improvements,
> thanks
> > for your help.
> >
> >
> >
> >
> > On Thu, Feb 16, 2012 at 11:01 PM, Anty <an...@gmail.com> wrote:
> >>
> >> Hi: Guys
> >>        We just deliver a optimized hadoop , if you are interested, Pls
> >> refer to https://github.com/hanborq/hadoop
> >>
> >> --
> >> Best Regards
> >> Anty Rao
> >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Best Regards
Anty Rao

Re: Optimized Hadoop

Posted by Todd Lipcon <to...@cloudera.com>.

Hey Schubert,

Looking at the code on github, it looks like your rewritten shuffle is
in fact just a backport of the shuffle from MR2. I didn't look closely
- are there any distinguishing factors?
Also, the OOB heartbeat and adaptive heartbeat code seems to be the
same as what's in 1.0?

-Todd

On Thu, Feb 16, 2012 at 9:44 AM, Schubert Zhang <zs...@gmail.com> wrote:
> Here is the presentation to describe our job,
> http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
> Wellcome to give your advises.
> It's just a little step, and we are continue to do more improvements, thanks
> for your help.
>
>
>
>
> On Thu, Feb 16, 2012 at 11:01 PM, Anty <an...@gmail.com> wrote:
>>
>> Hi: Guys
>>        We just deliver a optimized hadoop , if you are interested, Pls
>> refer to https://github.com/hanborq/hadoop
>>
>> --
>> Best Regards
>> Anty Rao
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Optimized Hadoop

Posted by Schubert Zhang <zs...@gmail.com>.

Here is the presentation describing our work:
http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
Your advice is welcome.
It's just a small step, and we are continuing to make more improvements.
Thanks for your help.

On Thu, Feb 16, 2012 at 11:01 PM, Anty <an...@gmail.com> wrote:

> Hi: Guys
>        We just deliver a optimized hadoop , if you are interested, Pls
> refer to https://github.com/hanborq/hadoop
>
> --
> Best Regards
> Anty Rao
>

Re: Optimized Hadoop

Posted by brisk <my...@gmail.com>.

Hi Arun,

Just curious, could you please give a quick hint about why 0.23.1 performs
much better in the shuffle phase? Any significant changes from the prior version?

Thanks,
Ethan

On Thu, Feb 16, 2012 at 9:38 AM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Interesting, thanks for sharing.
>
> Are you planning on contributing any of these back to Apache Hadoop?
>
> Also, hadoop-0.23.1 significantly outperforms hadoop-1 (previously
> hadoop-0.20.xxx) on both framework (map/reduce task launch) and runtime
> (sort, shuffle etc.). Also makes it much easier to implement Distributed
> Worker Pool with the newer MRv2 (YARN) architecture. We would love to have
> you involved and to contribute back.
>
> Arun
>
> On Feb 16, 2012, at 7:01 AM, Anty wrote:
>
> > Hi: Guys
> >       We just deliver a optimized hadoop , if you are interested, Pls
> > refer to https://github.com/hanborq/hadoop
> >
> > --
> > Best Regards
> > Anty Rao
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: Optimized Hadoop

Posted by Arun C Murthy <ac...@hortonworks.com>.

Interesting, thanks for sharing.

Are you planning on contributing any of these back to Apache Hadoop?

Also, hadoop-0.23.1 significantly outperforms hadoop-1 (previously hadoop-0.20.xxx) on both framework (map/reduce task launch) and runtime (sort, shuffle etc.). The newer MRv2 (YARN) architecture also makes it much easier to implement a Distributed Worker Pool. We would love to have you involved and to contribute back.

Arun

On Feb 16, 2012, at 7:01 AM, Anty wrote:

> Hi: Guys
>       We just deliver a optimized hadoop , if you are interested, Pls
> refer to https://github.com/hanborq/hadoop
> 
> -- 
> Best Regards
> Anty Rao

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
