Posted to user@pig.apache.org by Mat Kelcey <ma...@gmail.com> on 2010/04/25 12:38:48 UTC

short circuiting the pig ? operator

hi all,

i have a case where i want to avoid a divide by zero

relation2 = foreach relation1 {
 val = (n==0 ? 0 : val/n);
 generate val;
}

the trouble is that the right hand side of the bincond, val/n, is always
evaluated, even for the n==0 case.
i get org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide:
Divide by zero. Converting it to NULL.
and the value in the tuple is null

any advice on how i can easily do this? (i'd prefer not to have to write a udf)

is there something in the piggybank for this? i'm having real trouble
finding the piggybank javadocs online...

cheers,
mat

Re: short circuiting the pig ? operator

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
We do short-circuiting for filter conditions; no reason not to do it for the
?: operator, other than the code not being there.

The foreach thing is more complex.

-D
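
Until the ?: operator short-circuits, one possible workaround is to make the eagerly evaluated branch harmless by guarding the divisor itself. This is only a sketch: it assumes val and n are declared as double in relation1 (the thread does not give the schema), and it relies on the behaviour described in this thread, namely that the false branch may be evaluated even when n == 0.

```pig
-- Assumption: relation1 has fields val:double and n:double.
-- The inner bincond swaps a zero divisor for 1.0, so the Divide operator
-- never sees a zero even if the outer bincond evaluates both branches.
-- The outer bincond still picks 0.0 as the final value when n == 0.
relation2 = FOREACH relation1 GENERATE
    (n == 0 ? 0.0 : val / (n == 0 ? 1.0 : n)) AS val;
```

Rows where n is null are not special-cased here; the comparison yields null for them, so the generated value would be null as well.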

(In reply to Mridul Muralidharan's message of Wed, Apr 28, 2010 at 3:48 AM.)

Re: short circuiting the pig ? operator

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
I am not sure what all the runtime implications of some Pig idioms are
(and I have a feeling they change with the implementation), including
nested foreach.

For example :

B = foreach A {
   X0 = ...
   X = .. work on X0 ...;
   GENERATE X, udf1(X), udf2(X);
}


will cause X0/X to be evaluated multiple times, rather than once with the
value being reused.
Pig simply does something similar to macro substitution (at the plan level,
not the script level).


I am guessing that "condition ? op1 : op2" is something similar: it
evaluates all branches and then checks the condition. Not sure if this is
related to the above, but it could be.


Regards,
Mridul



(In reply to Mat Kelcey's message of Monday 26 April 2010, 05:28 AM.)


Re: short circuiting the pig ? operator

Posted by Mat Kelcey <ma...@gmail.com>.
good point, but to be honest this is a contrived example to illustrate
the problem,
my actual case is quite a bit more complex.

my current approach is something similar though,
partitioning into two relations, one with the zeros, one without, and
handling both separately before unioning

mat
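
A minimal sketch of the partition-and-union approach described above. The input path and the schema (val and n as double) are assumptions for illustration; neither is given in the thread, and the real case is said to be more complex.

```pig
-- Hypothetical input: path and schema are assumptions.
relation1 = LOAD 'input' AS (val:double, n:double);

-- Rows where the division is safe.
nonzeros = FILTER relation1 BY n != 0;
divided  = FOREACH nonzeros GENERATE val / n AS val;

-- Rows where n is zero get the fallback value directly,
-- so val / n is never computed for them.
zeros    = FILTER relation1 BY n == 0;
defaults = FOREACH zeros GENERATE 0.0 AS val;

-- Recombine the two partitions.
relation2 = UNION divided, defaults;
```

Note that rows where n is null match neither filter and are dropped by this sketch; they would need a third branch (for example FILTER ... BY n IS NULL) if they must be kept.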

(In reply to Zaki Rahaman's message of 26 April 2010, 03:03.)

Re: short circuiting the pig ? operator

Posted by Zaki Rahaman <za...@gmail.com>.
Hi Mat,

Why wouldn't you just filter your relation for non-zero values before
dividing?
