Posted to user@pig.apache.org by hc busy <hc...@gmail.com> on 2010/03/08 23:43:33 UTC
more bagging fun
Okay, here's the schema of the bag I have:
{group: (a: int,b: chararray,c: chararray,d: int), TABLE: {number1: int, number2: int}}
and I want to do this:
grunt> CALCULATE = FOREACH TABLE_group GENERATE group, SUM(TABLE.number1 / TABLE.number2);
grunt> DUMP CALCULATE;
2010-03-08 14:02:41,055 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1039: Incompatible types in Multiplication Operator left hand side:bag right hand side:bag
It seems useful to be able to calculate an aggregate of some arithmetic
operation applied to each member of a bag. Any suggestions?
Looking at the documentation, it looks like I want to do something like
SUM(TABLE.(number1 / number2))
but that doesn't work either :-(
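To pin down what the failing statement is meant to compute, here is a small Python model (an illustration only, not Pig code): one group's inner bag TABLE is modeled as a list of (number1, number2) tuples with made-up values, and the intended result divides within each tuple and then sums the quotients. Because both fields are declared int, the sketch also shows a truncating variant, since Pig, like Java, performs integer division when both operands are int. The function name `intended_sum` and the sample data are hypothetical.

```python
from typing import List, Tuple

# Hypothetical contents of one group's inner bag TABLE: (number1, number2).
TABLE: List[Tuple[int, int]] = [(10, 4), (9, 3), (1, 2)]


def intended_sum(bag: List[Tuple[int, int]], integer_division: bool = False) -> float:
    """Per group: SUM over tuples of number1 / number2.

    The division happens inside each tuple; only the per-tuple
    quotients are aggregated.
    """
    total = 0.0
    for number1, number2 in bag:
        if integer_division:
            # Java/Pig int division truncates toward zero; int() on the
            # float quotient does the same for small values like these.
            total += int(number1 / number2)
        else:
            total += number1 / number2
    return total
```

With the sample bag, the floating-point version gives 2.5 + 3.0 + 0.5 = 6.0, while the truncating version gives 2 + 3 + 0 = 5, which is worth keeping in mind when both columns are int.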
Re: more bagging fun
Posted by hc busy <hc...@gmail.com>.
An additional thought: we could define UDFs like
ADD(bag{(int,int)}), DIVIDE(bag{(int,int)}), MULTIPLY(bag{(int,int)}),
SQRT(bag{(float)}), ...
That is, vectorize most of the common arithmetic operations. The language
would then have to support this by converting
bag.a + bag.b
to
ADD(bag.(a,b))
I guess there are some difficulties. For instance, how would
SQRT(bag.a)+bag.b
work? Since SQRT(bag.a) returns a bag, how would we convert it into the
correct per-tuple operation? It's almost as if we want to convert an
expression like
SUM(SQRT(bag.a),bag.b)
into a function F such that
SUM(SQRT(bag.a),bag.b) = F(bag.a,bag.b)
and then compute F by iterating over each tuple of the bag:
FOREACH ... GENERATE ..., F(bag.(a,b));
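The idea of turning SQRT(bag.a)+bag.b into a single per-tuple function F can be sketched in Python as follows. This is a hypothetical illustration of the proposal, not an existing Pig feature: `per_tuple` lifts a scalar function of a tuple's fields into a function over a bag of tuples, so the fused F is evaluated once per tuple of bag.(a,b) instead of combining two separately projected bags. The names ADD, DIVIDE, MULTIPLY, SQRT and F mirror the thread; their implementations here are assumptions.

```python
import math
from typing import Callable, Iterable, List, Tuple


def per_tuple(f: Callable[..., float]) -> Callable[[Iterable[Tuple[float, ...]]], List[float]]:
    """Lift a scalar function f(a, b, ...) into a function over a bag.

    The returned function applies f to the fields of each tuple and
    returns a new bag (here, a list) with one result per input tuple.
    """
    def over_bag(bag: Iterable[Tuple[float, ...]]) -> List[float]:
        return [f(*t) for t in bag]
    return over_bag


# Hypothetical vectorized versions of the operations proposed above.
ADD = per_tuple(lambda a, b: a + b)
DIVIDE = per_tuple(lambda a, b: a / b)
MULTIPLY = per_tuple(lambda a, b: a * b)
SQRT = per_tuple(math.sqrt)  # expects a bag of 1-tuples, like bag{(float)}

# The fused function F for the expression SQRT(bag.a) + bag.b: the whole
# expression is evaluated inside each tuple, so the two operands never
# exist as separate bags that would have to be lined up.
F = per_tuple(lambda a, b: math.sqrt(a) + b)

bag = [(4.0, 1.0), (9.0, 2.0), (16.0, 3.0)]  # models bag.(a, b)
```

Here F(bag) yields [3.0, 5.0, 7.0], one value per tuple, and an ordinary aggregate such as a sum can then run over that single bag (15.0 for this data).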
On Wed, Mar 10, 2010 at 9:31 AM, hc busy <hc...@gmail.com> wrote:
> So, pig team, what is the right way to accomplish this?
Re: more bagging fun
Posted by hc busy <hc...@gmail.com>.
So, Pig team, what is the right way to accomplish this?
On Tue, Mar 9, 2010 at 10:50 PM, Mridul Muralidharan <mr...@yahoo-inc.com> wrote:
> TABLE.number1 actually gives you the bag of number1 values found in TABLE -
> but I am never really sure of the semantics in these situations since I am
> slightly nervous that it is impl dependent ... my understanding is, what you
> are attempting should not work, but I could be wrong.
>
> I do know that TABLE.(number1, number2) will consistently project and pair
> up the fields : so to 'fix' this, you can write your own DIVIDE_SUM which
> does something like this :
>
> grunt> CALCULATE= FOREACH TABLE_group GENERATE group,
> DIVIDE_SUM(TABLE.(number1 , number2));
>
> And DIVIDE_SUM udf impl takes in a bag with tuples containing schema
> (numerator, denominator) : and returns :
>
> result == sum ( foreach tuple ( tuple.numerator / tuple.denominator ) );
>
>
> Obviously, this is not as 'elegant' as your initial code and is definitely
> more cumbersome ... so clarifying this behavior with someone from pig team
> will definitely be better before you attempt this.
>
>
> Regards,
> Mridul
Re: more bagging fun
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
On Tuesday 09 March 2010 04:13 AM, hc busy wrote:
> okay. Here's the bag that I have:
>
> {group: (a: int,b: chararray,c: chararray,d: int), TABLE: {number1: int,
> number2:int}}
>
>
>
> and I want to do this
>
> grunt> CALCULATE= FOREACH TABLE_group GENERATE group, SUM(TABLE.number1 /
> TABLE.number2);
TABLE.number1 gives you the bag of number1 values found in TABLE. I am never
really sure of the semantics in these situations, since I am slightly nervous
that they are implementation dependent, but my understanding is that what you
are attempting should not work. I could be wrong, though.
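The projection behavior described here can be modeled in a few lines of Python (a sketch of the semantics, not Pig's implementation; `project` and the sample data are hypothetical). Projecting one field yields a separate bag of one-field tuples, so TABLE.number1 / TABLE.number2 asks for arithmetic between two bags. Since bags are not guaranteed to be ordered, there is no reliable way to line up the elements of two separately projected bags, whereas projecting both fields at once keeps each numerator next to its own denominator.

```python
from typing import Dict, List, Tuple

Bag = List[Dict[str, int]]

# Hypothetical inner bag TABLE for one group; each tuple is a dict of fields.
TABLE: Bag = [
    {"number1": 10, "number2": 4},
    {"number1": 9, "number2": 3},
    {"number1": 1, "number2": 2},
]


def project(bag: Bag, *fields: str) -> List[Tuple[int, ...]]:
    """Model of TABLE.f or TABLE.(f1, f2): a new bag keeping only `fields`."""
    return [tuple(t[f] for f in fields) for t in bag]


# TABLE.number1 and TABLE.number2: two independent bags of 1-tuples.
numerators = project(TABLE, "number1")
denominators = project(TABLE, "number2")

# TABLE.(number1, number2): one bag whose tuples keep the fields paired.
pairs = project(TABLE, "number1", "number2")

# Arithmetic between the two separate bags is undefined in this model too
# (Python raises TypeError for list / list), analogous to ERROR 1039.
try:
    numerators / denominators  # type: ignore[operator]
    bag_division_failed = False
except TypeError:
    bag_division_failed = True
```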
I do know that TABLE.(number1, number2) will consistently project and pair up
the fields, so to 'fix' this you can write your own DIVIDE_SUM UDF and call it
like this:
grunt> CALCULATE = FOREACH TABLE_group GENERATE group, DIVIDE_SUM(TABLE.(number1, number2));
The DIVIDE_SUM UDF implementation takes in a bag of tuples with the schema
(numerator, denominator) and returns:
result = sum( foreach tuple ( tuple.numerator / tuple.denominator ) );
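A minimal Python sketch of the logic DIVIDE_SUM would implement, under stated assumptions: the input bag is modeled as a list of (numerator, denominator) tuples, division is floating-point, and tuples with a null (None) field or a zero denominator are skipped, with None returned when nothing could be summed. That skipping policy is a hypothetical choice for this sketch; the thread does not specify it. In real Pig, such a UDF would typically be written in Java by extending EvalFunc; this only shows the per-tuple loop.

```python
from typing import Iterable, Optional, Tuple


def divide_sum(bag: Iterable[Tuple[Optional[float], Optional[float]]]) -> Optional[float]:
    """Sum numerator / denominator over every tuple in the bag.

    Mirrors: result = sum(foreach tuple (tuple.numerator / tuple.denominator)).
    Hypothetical policy: tuples with a None field or a zero denominator are
    skipped, and None is returned if no tuple contributed to the sum.
    """
    total = 0.0
    seen = False
    for numerator, denominator in bag:
        if numerator is None or denominator is None or denominator == 0:
            continue  # skip tuples that cannot be divided
        total += numerator / denominator
        seen = True
    return total if seen else None
```

For example, divide_sum([(10, 4), (9, 3), (1, 2)]) returns 6.0, and a bag containing only undividable tuples returns None rather than 0.0, so an all-null group stays distinguishable from a genuine zero.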
Obviously, this is not as 'elegant' as your initial code and is definitely
more cumbersome, so it would be better to clarify this behavior with someone
from the Pig team before you attempt it.
Regards,
Mridul
Fwd: more bagging fun
Posted by hc busy <hc...@gmail.com>.
Can I file a bug to fix this?
On Mon, Mar 8, 2010 at 2:43 PM, hc busy <hc...@gmail.com> wrote:
> okay. Here's the bag that I have:
>
> {group: (a: int,b: chararray,c: chararray,d: int), TABLE: {number1: int,
> number2:int}}
>
>
>
> and I want to do this
>
> grunt> CALCULATE= FOREACH TABLE_group GENERATE group, SUM(TABLE.number1 /
> TABLE.number2);
>
> grunt> DUMP CALCULATE;
>
> 2010-03-08 14:02:41,055 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1039: Incompatible types in Multiplication Operator left hand side:bag
> right hand side:bag
>
>
>
> This seems useful that I may want to calculate an agg. of some arithmetic
> operations on member of a bag. Any suggestions?
>
> ... Looking at the documentation it looks like I want to do something like
>
> SUM(TABLE.(number1 / number2))
>
> but that doesn't work either :-(
>
>