Posted to user@pig.apache.org by Kunal Nawale <nk...@gmail.com> on 2011/01/28 21:58:25 UTC

grouping data based on variable number of keys

Hi,
I have a relation R with the schema (a, b, c, d, e).

I need to group the data, but the grouping criterion varies depending on
the input params my Pig script receives.
The input params are group_on_a, group_on_b, group_on_c, and group_on_d,
each of which holds either 'T' or 'F'.


so the group statement could be:
A = GROUP R BY (a,b)  if group_on_a and group_on_b are T and everything else
is F
A = GROUP R BY (a,c) if group_on_a and group_on_c are T and everything else
is F
A = GROUP R BY (a,b,c) if group_on_a, group_on_b, and group_on_c are T and
everything else is F
A = GROUP R BY (a,b,c,d)  and so on.

Is there a way to do this in Pig?
Regards,
-kunal

Re: grouping data based on variable number of keys

Posted by Jonathan Coveney <jc...@gmail.com>.
The only approach I can think of is to feed all of your potential keys,
along with the group_on_* flags, to a UDF. The UDF processes them and builds
a tuple that serves as the new, actual key, and you then group on that tuple.
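
To make the idea concrete, here is a minimal sketch in plain Python of the
logic such a UDF might carry. It is not Pig code and not a tested UDF; the
names make_group_key and group_rows, the sample rows, and the choice to put
None in the slot of every disabled field are all assumptions made for
illustration. In a real script the flags would arrive as script params and
the grouping itself would be done by GROUP.

```python
from collections import defaultdict

# Candidate grouping columns of R (e is never a key in the question).
FIELDS = ("a", "b", "c", "d")


def make_group_key(row, flags):
    """Build the composite key for one row (hypothetical helper).

    Enabled fields contribute their value; disabled fields contribute None,
    so the key always has the same width regardless of the flags.
    """
    return tuple(
        row[f] if flags.get("group_on_" + f) == "T" else None
        for f in FIELDS
    )


def group_rows(rows, flags):
    """Stand-in for GROUP: collect rows under their composite key."""
    groups = defaultdict(list)
    for row in rows:
        groups[make_group_key(row, flags)].append(row)
    return dict(groups)


# Made-up sample data shaped like R = (a, b, c, d, e).
rows = [
    {"a": 1, "b": "x", "c": 10, "d": "p", "e": 0.5},
    {"a": 1, "b": "x", "c": 20, "d": "q", "e": 1.5},
    {"a": 1, "b": "y", "c": 10, "d": "p", "e": 2.5},
]

# Equivalent of GROUP R BY (a, b).
flags = {"group_on_a": "T", "group_on_b": "T",
         "group_on_c": "F", "group_on_d": "F"}

grouped = group_rows(rows, flags)
for key, members in grouped.items():
    print(key, len(members))
```

Keeping the key at a fixed width means the key has one shape no matter which
flags are set. Dropping the disabled fields instead would give a shorter key
but a shape that changes from run to run.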
