Posted to user@pig.apache.org by hc busy <hc...@gmail.com> on 2010/03/08 23:43:33 UTC

more bagging fun

okay. Here's the bag that I have:

 {group: (a: int,b: chararray,c: chararray,d: int), TABLE: {number1: int, number2: int}}



and I want to do this

grunt> CALCULATE = FOREACH TABLE_group GENERATE group, SUM(TABLE.number1 / TABLE.number2);

grunt> DUMP CALCULATE;

2010-03-08 14:02:41,055 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1039: Incompatible types in Multiplication Operator left hand side:bag
right hand side:bag



It seems useful to be able to compute an aggregate over some arithmetic
operation on the members of a bag. Any suggestions?

... Looking at the documentation, it looks like I want to do something like

SUM(TABLE.(number1 / number2))

but that doesn't work either :-(

Re: more bagging fun

Posted by hc busy <hc...@gmail.com>.
An additional thought: we could define UDFs like

ADD(bag{(int,int)}), DIVIDE(bag{(int,int)}), MULTIPLY(bag{(int,int)}),
SQRT(bag{(float)})..

That would basically vectorize most of the common arithmetic operations, but
the language would then have to support it by converting

bag.a + bag.b

to

ADD(bag.(a,b))

I guess there are some difficulties, for instance:

SQRT(bag.a)+bag.b

How would this work? Because SQRT(bag.a) returns a bag, how would we convert
it to the correct per-tuple operation? It's almost as if we want to convert
an expression

SUM(SQRT(bag.a),bag.b)

into a function F such that

SUM(SQRT(bag.a),bag.b) = F(bag.a,bag.b)

and then F is computed by iterating over each tuple of the bag.

FOREACH ... GENERATE ..., F(bag.(a,b));
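
To make the idea concrete, here is a minimal sketch in plain Java, with no Pig classes involved. It assumes the intended result is the sum, over the tuples of the bag, of sqrt(a) + b. The names FusedBagExpression and sumOver are hypothetical, and each tuple is modelled as a double[] holding (a, b). The point is only that the per-tuple expression is applied row by row in a single pass, so no intermediate bag is ever built.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleBinaryOperator;

// Hypothetical stand-in for the fused function F; not part of Pig.
class FusedBagExpression {

    /** Applies a per-tuple expression to every (a, b) row and sums the results in one pass. */
    static double sumOver(List<double[]> bag, DoubleBinaryOperator perTuple) {
        double total = 0.0;
        for (double[] row : bag) {
            // row[0] plays the role of bag.a, row[1] the role of bag.b.
            total += perTuple.applyAsDouble(row[0], row[1]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up bag of (a, b) tuples, for illustration only.
        List<double[]> bag = Arrays.asList(new double[]{4.0, 1.0}, new double[]{9.0, 2.0});

        // Per-tuple form of SQRT(bag.a) + bag.b.
        DoubleBinaryOperator f = (a, b) -> Math.sqrt(a) + b;

        System.out.println(sumOver(bag, f)); // (2 + 1) + (3 + 2) = 8.0
    }
}
```

A real version would have to be generated by the language from the expression itself, which is the hard part described above.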






On Wed, Mar 10, 2010 at 9:31 AM, hc busy <hc...@gmail.com> wrote:

>
> So, pig team, what is the right way to accomplish this?

Re: more bagging fun

Posted by hc busy <hc...@gmail.com>.
So, pig team, what is the right way to accomplish this?

On Tue, Mar 9, 2010 at 10:50 PM, Mridul Muralidharan <mr...@yahoo-inc.com> wrote:

> On Tuesday 09 March 2010 04:13 AM, hc busy wrote:
>
>> okay. Here's the bag that I have:
>>
>>  {group: (a: int,b: chararray,c: chararray,d: int), TABLE: {number1: int,
>> number2:int}}
>>
>>
>>
>> and I want to do this
>>
>> grunt>  CALCULATE= FOREACH TABLE_group GENERATE group, SUM(TABLE.number1 /
>> TABLE.number2);
>>
>
>
> TABLE.number1 actually gives you the bag of number1 values found in TABLE -
> but I am never really sure of the semantics in these situations since I am
> slightly nervous that it is impl dependent ... my understanding is, what you
> are attempting should not work, but I could be wrong.
>
> I do know that TABLE.(number1, number2) will consistently project and pair
> up the fields : so to 'fix' this, you can write your own DIVIDE_SUM which
> does something like this :
>
> grunt>  CALCULATE= FOREACH TABLE_group GENERATE group,
> DIVIDE_SUM(TABLE.(number1 , number2));
>
> And DIVIDE_SUM udf impl takes in a bag with tuples containing schema
> (numerator, denominator) : and returns :
>
> result == sum ( foreach tuple ( tuple.numerator / tuple.denominator ) );
>
>
> Obviously, this is not as 'elegant' as your initial code and is definitely
> more cumbersome ... so clarifying this behavior with someone from pig team
> will definitely be better before you attempt this.
>
>
> Regards,
> Mridul
>
>
>
>> grunt>  DUMP CALCULATE;
>>
>> 2010-03-08 14:02:41,055 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1039: Incompatible types in Multiplication Operator left hand
>> side:bag
>> right hand side:bag
>>
>>
>>
>> This seems useful that I may want to calculate an agg. of some arithmetic
>> operations on member of a bag. Any suggestions?
>>
>> ... Looking at the documentation it looks like I want to do something like
>>
>> SUM(TABLE.(number1 / number2))
>>
>> but that doesn't work either :-(
>>
>
>

Re: more bagging fun

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
On Tuesday 09 March 2010 04:13 AM, hc busy wrote:
> okay. Here's the bag that I have:
>
>   {group: (a: int,b: chararray,c: chararray,d: int), TABLE: {number1: int,
> number2:int}}
>
>
>
> and I want to do this
>
> grunt>  CALCULATE= FOREACH TABLE_group GENERATE group, SUM(TABLE.number1 /
> TABLE.number2);


TABLE.number1 actually gives you the bag of number1 values found in
TABLE, but I am never really sure of the semantics in these situations,
since I am slightly nervous that they are implementation dependent. My
understanding is that what you are attempting should not work, but I
could be wrong.

I do know that TABLE.(number1, number2) will consistently project and
pair up the fields. So, to 'fix' this, you can write your own DIVIDE_SUM
UDF and use it something like this:

grunt> CALCULATE = FOREACH TABLE_group GENERATE group, DIVIDE_SUM(TABLE.(number1, number2));

The DIVIDE_SUM UDF implementation takes in a bag of tuples with schema
(numerator, denominator) and returns:

result == sum ( foreach tuple ( tuple.numerator / tuple.denominator ) );


Obviously, this is not as 'elegant' as your initial code and is
definitely more cumbersome, so clarifying this behavior with someone
from the Pig team would be better before you attempt this.
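
For illustration, the core of such a DIVIDE_SUM could look roughly like the sketch below. It is plain Java so that it runs on its own: it does not use Pig's UDF classes, and a real UDF would need to be wrapped in them. The class name DivideSumSketch, the int[] representation of a (numerator, denominator) tuple, the choice to divide in double rather than int, and the choice to skip zero denominators are all assumptions of this sketch, not something the pseudocode above specifies.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the DIVIDE_SUM logic; not an actual Pig UDF.
class DivideSumSketch {

    /**
     * Sums numerator / denominator over every pair in the "bag".
     * Each int[] holds {numerator, denominator}.
     */
    static double divideSum(List<int[]> bag) {
        double sum = 0.0;
        for (int[] pair : bag) {
            if (pair[1] == 0) {
                continue; // assumption: ignore pairs that would divide by zero
            }
            // Cast first so that 1 / 2 contributes 0.5 instead of 0.
            sum += (double) pair[0] / pair[1];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up (number1, number2) pairs for one group.
        List<int[]> table = Arrays.asList(new int[]{1, 2}, new int[]{3, 4}, new int[]{5, 0});
        System.out.println(divideSum(table)); // 0.5 + 0.75 = 1.25
    }
}
```

If integer division is what you actually want, drop the cast and return a long instead.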


Regards,
Mridul

>
> grunt>  DUMP CALCULATE;
>
> 2010-03-08 14:02:41,055 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1039: Incompatible types in Multiplication Operator left hand side:bag
> right hand side:bag
>
>
>
> This seems useful that I may want to calculate an agg. of some arithmetic
> operations on member of a bag. Any suggestions?
>
> ... Looking at the documentation it looks like I want to do something like
>
> SUM(TABLE.(number1 / number2))
>
> but that doesn't work either :-(


Fwd: more bagging fun

Posted by hc busy <hc...@gmail.com>.
Can I file a bug to fix this?


On Mon, Mar 8, 2010 at 2:43 PM, hc busy <hc...@gmail.com> wrote:

> okay. Here's the bag that I have:
>
>   {group: (a: int,b: chararray,c: chararray,d: int), TABLE: {number1: int,
> number2:int}}
>
>
>
> and I want to do this
>
>  grunt> CALCULATE= FOREACH TABLE_group GENERATE group, SUM(TABLE.number1 /
> TABLE.number2);
>
> grunt> DUMP CALCULATE;
>
> 2010-03-08 14:02:41,055 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1039: Incompatible types in Multiplication Operator left hand side:bag
> right hand side:bag
>
>
>
> This seems useful that I may want to calculate an agg. of some arithmetic
> operations on member of a bag. Any suggestions?
>
> ... Looking at the documentation it looks like I want to do something like
>
> SUM(TABLE.(number1 / number2))
>
> but that doesn't work either :-(
>
>
