Posted to user@pig.apache.org by Mark <st...@gmail.com> on 2013/03/22 20:39:10 UTC

General Pig store questions

In map/reduce all values for 1 key are guaranteed to go to the same reducer. Is there something analogous to this in Pig? If so, what determines the key when I output a bunch of tuples? 

Re: General Pig store questions

Posted by Prashant Kommireddi <pr...@gmail.com>.
Hi Mark,

It depends on the operation. For example, one might want to aggregate
based on a certain field. In M/R that would be implemented by writing
out a key-value pair from the mapper and implementing the aggregation
function, say Count or Sum, in the reducer based on the key.

To answer your question, you would typically "group by" a certain
field in the tuple, and that field would be the key on which the
reducers operate. For example:

A = load 'input' as (userid, accnt);
B = group A by userid;
C = foreach B generate group, COUNT(A);

In this example the userid field is the key. Generally speaking, it's
equivalent to a context.write(userid, 1) in the map function of plain
MR.
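
To make the correspondence concrete, here is a minimal sketch in plain Python (not Pig or Hadoop code) of what the group/COUNT script does conceptually: a map step emits (userid, 1), a shuffle step routes every pair with the same key to the same reducer, and a reduce step sums per key. The function names, the sample records, and the hash-modulo partitioning are assumptions made for illustration; real Hadoop partitioning is configurable and Pig may plan the job differently (for example, with combiners).

```python
from collections import defaultdict


def map_phase(records):
    """Emit (userid, 1) per input tuple, mirroring context.write(userid, 1)."""
    for userid, _accnt in records:
        yield userid, 1


def shuffle(pairs, num_reducers=2):
    """Route each pair to a reducer by hashing its key.

    Equal keys hash to the same value, so all values for one key
    end up in the same reducer's partition.
    """
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions


def reduce_phase(partitions):
    """Sum the values collected for each key, like COUNT(A) per group."""
    counts = {}
    for partition in partitions:
        for key, values in partition.items():
            counts[key] = sum(values)
    return counts


def group_and_count(records, num_reducers=2):
    """Hypothetical end-to-end helper: map, shuffle, then reduce."""
    return reduce_phase(shuffle(map_phase(records), num_reducers))


if __name__ == "__main__":
    # Made-up sample data in the (userid, accnt) shape used above.
    sample = [("alice", "a1"), ("bob", "b1"), ("alice", "a2")]
    print(group_and_count(sample))
```

The point of the sketch is the shuffle step: the key is whatever field you group by, and no key is ever split across two reducers.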

