Posted to user@pig.apache.org by Kunal Nawale <nk...@gmail.com> on 2011/01/28 21:58:25 UTC
grouping data based on variable number of keys
Hi,
I have a relation R with fields (a, b, c, d, e).
I need to group the data, but the grouping criterion is variable, depending on
which input params my Pig script receives.
My input params are group_on_a, group_on_b, group_on_c, and group_on_d, each of
which contains a value of either 'T' or 'F'.
So the group statement could be:
A = GROUP R BY (a,b) if group_on_a and group_on_b are T and everything else is F
A = GROUP R BY (a,c) if group_on_a and group_on_c are T and everything else is F
A = GROUP R BY (a,b,c) if group_on_a, group_on_b, and group_on_c are T and everything else is F
A = GROUP R BY (a,b,c,d) and so on.
Is there a way I could do this in Pig?
Regards,
-kunal
Re: grouping data based on variable number of keys
Posted by Jonathan Coveney <jc...@gmail.com>.
The only thing I could think of would be to feed all of your potential keys,
along with the group_on_* flags, to a UDF that builds a tuple containing only
the selected fields. That tuple is the new, actual key, and you then group on it.
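
A minimal sketch of the key-building logic such a UDF might implement, written as plain Python. The function name make_group_key and its argument order are hypothetical, and registering it with Pig and declaring its output schema are not shown here. It assumes the flags arrive as the strings 'T' or 'F', as described in the question.

```python
def make_group_key(a, b, c, d, group_on_a, group_on_b, group_on_c, group_on_d):
    """Return a tuple holding only the fields whose flag is 'T'.

    Hypothetical helper: the name and signature are illustrative only.
    """
    fields = (a, b, c, d)
    flags = (group_on_a, group_on_b, group_on_c, group_on_d)
    # Keep a field only when its matching flag is exactly 'T'.
    return tuple(value for value, flag in zip(fields, flags) if flag == 'T')


# Example: group on a and c only.
key = make_group_key("x", "y", "z", "w", "T", "F", "T", "F")
```

In the script itself, the flags would presumably be supplied through Pig's parameter substitution and the UDF's result used as the GROUP BY expression, so that rows with the same selected fields end up in the same group.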