Posted to user@pig.apache.org by Mark <st...@gmail.com> on 2013/03/27 19:46:18 UTC

Sorting/Partitioning of Pig output

I understand that in the traditional map/reduce paradigm all records with the same key get sent to the same reducer, sorted, but in Pig there is no such thing as a "key".  I'm curious to know how Pig decides which reducer to send its output to.

So when creating a custom StoreFunc, is there any guarantee on the ordering of Tuples that come into putNext?

And another, even more basic question: do StoreFuncs operate in the Map phase or the Reduce phase?

Thanks



Re: Sorting/Partitioning of Pig output

Posted by Yen SYU <ye...@gmail.com>.
My understanding is that a StoreFunc in Pig plays a role similar to that of
an OutputFormat in pure Java MapReduce jobs.

On Wed, Mar 27, 2013 at 4:41 PM, Jonathan Coveney <jc...@gmail.com> wrote:

> As far as when the StoreFunc runs, it depends on whether the job is
> map-only or map/reduce. It runs in the last phase. Generally this is the
> reduce phase.
>
> As far as how Pig knows where to send its output, there are keys in Pig.
> Basically, a reduce phase is necessary any time you have a group, join, or
> sort. In the case of a group or join, the key is the group key or the join
> key, respectively. In the case of a sort it is more complicated.

Re: Sorting/Partitioning of Pig output

Posted by Jonathan Coveney <jc...@gmail.com>.
As far as when the StoreFunc runs, it depends on whether the job is
map-only or map/reduce. It runs in the last phase. Generally this is the
reduce phase.

As far as how Pig knows where to send its output, there are keys in Pig.
Basically, a reduce phase is necessary any time you have a group, join, or
sort. In the case of a group or join, the key is the group key or the join
key, respectively. In the case of a sort it is more complicated.
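
To make the two cases above concrete, here is a minimal sketch in Python
(not Pig or Hadoop code). It assumes that group/join keys are routed by
hashing the key modulo the number of reducers, which is how Hadoop's default
partitioner behaves, and that a sort is routed by comparing each key against
a sorted list of boundary keys chosen beforehand (for example from a sample
of the input). The function names, the boundary values, and the use of
zlib.crc32 as the hash are all illustrative assumptions, not Pig's API.

```python
import bisect
import zlib


def hash_partition(key, num_reducers):
    """Route a group/join key to a reducer by hashing it.

    zlib.crc32 is only a stand-in for the key's real hash code; it is used
    here because it gives the same value on every run.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def range_partition(key, boundaries):
    """Route a sort key to a reducer by comparing it with sorted boundaries.

    Keys less than or equal to boundaries[0] go to reducer 0, keys up to
    boundaries[1] go to reducer 1, and so on. Reading the reducer outputs
    in reducer order then gives a globally sorted result.
    """
    return bisect.bisect_left(boundaries, key)


if __name__ == "__main__":
    keys = ["apple", "kiwi", "zebra", "apple", "fig"]

    # Group/join: equal keys always land on the same reducer.
    for k in keys:
        print("hash ", k, "-> reducer", hash_partition(k, 3))

    # Sort: hypothetical boundaries, as if picked from a sample of the data.
    boundaries = ["g", "p"]
    for k in sorted(keys):
        print("range", k, "-> reducer", range_partition(k, boundaries))
```

With hashing, every record that shares a key lands on the same reducer, but
neighbouring keys are scattered. With range partitioning, reducer i only
sees keys that sort before those on reducer i+1, which is why a sort needs
the boundaries to be worked out first.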

