Posted to common-user@hadoop.apache.org by Aaron Kimball <aa...@cloudera.com> on 2009/08/25 02:43:52 UTC

Re: Hadoop streaming: How is data distributed from mappers to reducers?

Yes. It works just like Java-based MapReduce in that regard.
- Aaron

On Sun, Aug 23, 2009 at 5:09 AM, Nipun Saggar <ni...@gmail.com> wrote:

> Hi all,
>
> I have recently started using Hadoop streaming. From the documentation, I
> understand that by default, each line output from a mapper up to the first
> tab becomes the key and the rest of the line is the value. I wanted to know
> whether there is a shuffling (sorting) phase between the mapper and the
> reducer. More specifically, would it be correct to assume that output from
> all mappers with the same key will go to the same reducer?
>
> Thanks,
> Nipun
>
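
For readers following the thread, here is a minimal Python sketch of the behaviour being asked about: splitting each mapper output line at the first tab, then sorting and grouping by key so that all values for a key arrive together. It only imitates the shuffle in memory and is not Hadoop code; the function names `split_key_value` and `simulate_shuffle` are made up for illustration.

```python
from itertools import groupby


def split_key_value(line, sep="\t"):
    """Split one mapper output line at the first separator.

    Text before the first tab is the key, the rest is the value.
    A line with no tab becomes a key with an empty value.
    """
    key, _, value = line.rstrip("\n").partition(sep)
    return key, value


def simulate_shuffle(mapper_lines):
    """Sort records by key and group them, imitating what a reducer receives."""
    records = sorted(
        (split_key_value(line) for line in mapper_lines),
        key=lambda kv: kv[0],
    )
    return [
        (key, [value for _, value in group])
        for key, group in groupby(records, key=lambda kv: kv[0])
    ]


if __name__ == "__main__":
    lines = ["apple\t1\n", "pear\t1\n", "apple\t1\n"]
    for key, values in simulate_shuffle(lines):
        print(key, values)
```

In a real streaming job the reducer script reads the sorted lines from stdin and has to detect the key boundaries itself; the grouping above only shows the end result.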

RE: Hadoop streaming: How is data distributed from mappers to reducers?

Posted by Amogh Vasekar <am...@yahoo-inc.com>.

Hadoop will make sure that every <k,v> pair with the same key ends up at the same reducer and is consumed in a single reduce instance.
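
The guarantee comes from the partitioner. Hadoop's default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so a given key always maps to the same reducer number, no matter which mapper emitted it. Below is a rough Python sketch of that idea. It assumes zlib.crc32 as a stand-in for Java's hashCode(), so the partition numbers will not match a real cluster, and the names `partition` and `route` are hypothetical.

```python
import zlib


def partition(key, num_reducers):
    """Pick a reducer index for a key.

    Assumption: crc32 stands in for Java's key.hashCode(); it is used here
    only because it is deterministic across Python processes.
    """
    h = zlib.crc32(key.encode("utf-8"))
    # Mask to a non-negative 31-bit value, then take the modulus,
    # mirroring the shape of the HashPartitioner formula.
    return (h & 0x7FFFFFFF) % num_reducers


def route(records, num_reducers):
    """Distribute (key, value) records into one bucket per reducer."""
    buckets = {i: [] for i in range(num_reducers)}
    for key, value in records:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


if __name__ == "__main__":
    # Records as they might come from two different mappers.
    records = [("apple", "1"), ("pear", "1"), ("apple", "1"), ("fig", "1")]
    for reducer, items in route(records, 3).items():
        print(reducer, items)
```

Because the reducer index depends only on the key and the reducer count, emitting the same key several times from one mapper, or from several mappers, cannot split it across reducers.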

-----Original Message-----
From: Nipun Saggar [mailto:nipun.saggar@gmail.com] 
Sent: Tuesday, August 25, 2009 10:41 AM
To: common-user@hadoop.apache.org
Subject: Re: Hadoop streaming: How is data distributed from mappers to reducers?

Does that mean that if the same key is emitted more than once from a
mapper, the key-value pairs for that key will not necessarily go to the
same reducer?

-Nipun

Re: Hadoop streaming: How is data distributed from mappers to reducers?

Posted by Nipun Saggar <ni...@gmail.com>.

Does that mean that if the same key is emitted more than once from a
mapper, the key-value pairs for that key will not necessarily go to the
same reducer?

-Nipun
