Posted to user@pig.apache.org by Mat Kelcey <ma...@gmail.com> on 2010/04/25 12:38:48 UTC
short circuiting the pig ? operator
hi all,
i have a case where i want to avoid a divide by zero case
relation2 = foreach relation1 {
  val = (n==0 ? 0 : val/n);
  generate val;
}
the trouble is that the right hand side of the bincond, val/n, is always
evaluated, even for the n==0 case
i get org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide:
Divide by zero. Converting it to NULL.
and the value in the tuple is null
any advice on how i can easily do this? (i'd prefer not to have to write a udf)
is there something in the piggybank for this? i'm having real trouble
finding the piggybank javadocs online...
cheers,
mat
Re: short circuiting the pig ? operator
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
We do short-circuiting for filter conditions; no reason not to do it for the
?: operator, other than the code not being there.
The foreach thing is more complex.
-D
On Wed, Apr 28, 2010 at 3:48 AM, Mridul Muralidharan <mr...@yahoo-inc.com> wrote:
>
> I am not sure what all the runtime implications of some Pig idioms are
> (and I have a feeling they change with the implementation), including
> nested foreach.
>
> For example:
>
> B = foreach A {
>   X0 = ...;
>   X = .. work on X0 ...;
>   GENERATE X, udf1(X), udf2(X);
> }
>
>
> will cause X0/X to be evaluated multiple times, rather than once with the
> value being reused.
> Pig simply does something similar to macro substitution (at the plan level,
> not the script level).
>
>
> I am guessing that "condition ? op1 : op2" is similar: it evaluates both
> branches and then checks the condition. I am not sure whether this is
> related to the above, but it could be.
>
>
> Regards,
> Mridul
>
Re: short circuiting the pig ? operator
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
I am not sure what all the runtime implications of some Pig idioms are
(and I have a feeling they change with the implementation), including
nested foreach.
For example:
B = foreach A {
  X0 = ...;
  X = .. work on X0 ...;
  GENERATE X, udf1(X), udf2(X);
}
will cause X0/X to be evaluated multiple times, rather than once with the
value being reused.
Pig simply does something similar to macro substitution (at the plan level,
not the script level).
I am guessing that "condition ? op1 : op2" is similar: it evaluates both
branches and then checks the condition. I am not sure whether this is
related to the above, but it could be.
Regards,
Mridul
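One way to probe the macro-substitution guess above is to compute X in its own foreach, so that later expressions read a stored field rather than repeating the expression. The following is only a sketch under stated assumptions: the input path, the field a, the "a * 2.0" step, and the use of the built-ins ABS and ROUND in place of udf1 and udf2 are all placeholders that do not come from this thread. Some Pig versions may merge adjacent foreach statements again, so whether this actually avoids the repeated evaluation has to be checked against the plan that EXPLAIN prints.

```pig
-- Hypothetical input; the path and schema are assumptions for illustration.
A = LOAD 'input.tsv' AS (a:double);

-- Stand-in for the "work on X0" step: compute x once and store it as a field.
A1 = FOREACH A GENERATE a * 2.0 AS x;

-- ABS and ROUND stand in for udf1 and udf2; they read the stored field x.
B = FOREACH A1 GENERATE x, ABS(x), ROUND(x);

-- Print the logical, physical and map-reduce plans to see what is evaluated where.
EXPLAIN B;
```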
On Monday 26 April 2010 05:28 AM, Mat Kelcey wrote:
> good point, but to be honest this is a contrived example to illustrate
> the problem,
> my actual case is quite a bit more complex.
>
> my current approach is something similar though,
> partitioning into two relations, one with the zeros, one without, and
> handling both separately before unioning
>
> mat
Re: short circuiting the pig ? operator
Posted by Mat Kelcey <ma...@gmail.com>.
good point, but to be honest this is a contrived example to illustrate
the problem,
my actual case is quite a bit more complex.
my current approach is something similar though,
partitioning into two relations, one with the zeros, one without, and
handling both separately before unioning
mat
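The partition-and-union approach described above might look roughly like the sketch below. The real script is not shown in this thread, so the input path, the schema, and the aliases zeros, nonzeros, zero_vals and nonzero_vals are assumptions made for illustration.

```pig
-- Hypothetical input; the path and schema are assumptions for illustration.
relation1 = LOAD 'input.tsv' AS (val:double, n:double);

-- Route each row by its divisor. A row whose n is null satisfies neither
-- condition and therefore ends up in neither output relation.
SPLIT relation1 INTO zeros IF n == 0.0, nonzeros IF n != 0.0;

-- The zero-divisor rows get the constant; the division only ever sees n != 0.
zero_vals    = FOREACH zeros    GENERATE 0.0 AS val;
nonzero_vals = FOREACH nonzeros GENERATE val / n AS val;

-- Both branches produce a single double field, so the schemas line up.
relation2 = UNION zero_vals, nonzero_vals;
```

UNION does not preserve the original row order, so any ordering that matters has to be restored afterwards.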
On 26 April 2010 03:03, Zaki Rahaman <za...@gmail.com> wrote:
> Hi Mat,
>
> Why wouldn't you just filter your relation for non zero values before
> dividing?
>
> Sent from my iPhone
Re: short circuiting the pig ? operator
Posted by Zaki Rahaman <za...@gmail.com>.
Hi Mat,
Why wouldn't you just filter your relation for non zero values before
dividing?
Sent from my iPhone
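As a sketch of the filter suggestion above (the input path, the schema, and the alias nonzero are assumptions, not taken from the thread):

```pig
-- Hypothetical input; the path and schema are assumptions for illustration.
relation1 = LOAD 'input.tsv' AS (val:double, n:double);

-- Keep only rows with a non-zero divisor. Rows where n is null are dropped
-- as well, because the comparison does not evaluate to true for them.
nonzero = FILTER relation1 BY n != 0.0;

-- The division now only runs on rows where n != 0.
relation2 = FOREACH nonzero GENERATE val / n AS val;
```

Unlike the bincond in the original question, this drops the n == 0 rows instead of mapping them to 0, so it only fits when those rows are not needed in the output.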
On Apr 25, 2010, at 6:38 AM, Mat Kelcey <ma...@gmail.com> wrote:
> hi all,
>
> i have a case where i want to avoid a divide by zero case
>
> relation2 = foreach relation1 {
> val = (n==0 ? 0 : val/n);
> generate val;
> }
>
> the trouble is the right hand side of the bincond; val/n is always
> evaluated even for the n==0 case
> i get
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide:
> Divide by zero. Converting it to NULL.
> and the value in the tuple is null
>
> any advice on how i can easily do this? (i'd prefer not to have to
> write a udf)
>
> is there something in the piggybank for this? i'm having real troubles
> finding the piggybank javadocs online...
>
> cheers,
> mat