Posted to user@pig.apache.org by Tamir Kamara <ta...@gmail.com> on 2009/03/01 15:35:01 UTC
SUM of an expression
Hi,
I'm trying to generate a sum of an expression like the following:
b = GROUP a by domain;
r = FOREACH b generate group, SUM(a.x+a.y);
This results in an error that DefaultDataBag cannot be cast to Tuple, but
both x and y are tuples (int).
This is easy to get around by generating the inner expression of the sum in
a separate line, but I wonder whether this is something Pig should be
able to do on its own.
Thanks,
Tamir
Re: SUM of an expression
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Tamir Kamara wrote:
> Hi,
>
> I'm trying to generate a sum of an expression like the following:
>
> b = GROUP a by domain;
> r = FOREACH b generate group, SUM(a.x+a.y);
What you need is something like this :
b = GROUP a by domain;
r = FOREACH b {
    X = FOREACH a GENERATE x+y;
    GENERATE group, SUM(X);
}
I don't think Pig supports this nested FOREACH right now,
so you will need to simulate it through a UDF:
b = GROUP a by domain;
r = FOREACH b generate group, CUSTOM_SUM(a);
Within your UDF, for each tuple in the input bag (a), pick 'x' and 'y',
add them, and sum the results over all tuples.
Note: it might be a good idea to make it an algebraic function (so that
combiners get invoked for your script above).
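As a rough illustration of what such a UDF would compute, here is a plain-Python model. It is not the Pig UDF API: the names custom_sum and combine and the sample data are hypothetical, and a bag is modeled as a list of dicts with fields x and y. The point is only that a sum of x + y can be computed on chunks of the bag and the partial results merged, which is the property that lets an algebraic function use combiners.

```python
# Plain-Python model of the suggested UDF logic; this is NOT the Pig UDF API.
# A "bag" is modeled as a list of dicts with fields x and y (hypothetical data).

def custom_sum(bag):
    """Sum x + y over every tuple in one bag (or one chunk of a bag)."""
    return sum(t["x"] + t["y"] for t in bag)

def combine(partials):
    """Merge partial sums, as a combiner could for an algebraic function."""
    return sum(partials)

bag = [{"x": 1, "y": 2}, {"x": 3, "y": 4}, {"x": 5, "y": 6}]

# Whole bag processed at once.
total = custom_sum(bag)

# Same bag split into two chunks, summed separately, then merged.
chunked_total = combine([custom_sum(bag[:2]), custom_sum(bag[2:])])
```

Both routes give the same answer because addition is associative, so it does not matter how the bag is split.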
Regards,
Mridul
Re: SUM of an expression
Posted by Alan Gates <ga...@yahoo-inc.com>.
The issue here is the semantics of a.x and a.y. Once you say "group
a", then the a in "FOREACH b" is a bag. a.x means take the bag a, and
for each tuple project just the field x, and then put the resulting
tuples in a bag. So a.x is a bag of tuples with just the field x.
Pig doesn't know how to add two bags. So, if you change this to:
a1 = foreach a generate domain, x + y as xy;
b = group a1 by domain;
r = foreach b generate group, SUM(a1.xy);
then the right things should happen.
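To make the bag semantics concrete, here is a small plain-Python model (an illustration only, not how Pig is implemented). The sample rows and the use of lists for bags are assumptions. It shows that projecting x out of a grouped bag yields a bag of one-field tuples rather than a number, and that computing x + y per tuple before grouping gives the intended per-domain sum.

```python
from collections import defaultdict

# Hypothetical rows of relation a: (domain, x, y).
a = [("foo.com", 1, 2), ("foo.com", 3, 4), ("bar.org", 5, 6)]

# GROUP a BY domain: each group holds a bag (modeled as a list) of whole tuples.
groups = defaultdict(list)
for row in a:
    groups[row[0]].append(row)

# a.x inside one group is a bag of one-field tuples, not a single number,
# so "a.x + a.y" would be asking to add two bags.
a_x = [(row[1],) for row in groups["foo.com"]]
a_y = [(row[2],) for row in groups["foo.com"]]

# Rewritten script: compute x + y per tuple first, then group and sum.
a1 = [(domain, x + y) for domain, x, y in a]
sums = defaultdict(int)
for domain, xy in a1:
    sums[domain] += xy
r = dict(sums)
```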
Alan.