Posted to user@pig.apache.org by Jonathan Coveney <jc...@gmail.com> on 2010/11/29 23:58:51 UTC

How to sum with a conditional in pig?

I realize this may be a lowly question, but I've searched around and
couldn't find anything definitive. I am also quite new to Pig and am trying
to get my head around the pig-esque way of doing things.

I am trying to sum conditionally, and am not sure how to make this work. My
system uses Pig 0.6, if that is relevant.

counted  = foreach grouped generate group, SUM(if limited.number2 is null? 0 : 1);

grouped has the schema {group: chararray,limited: {number1:
chararray,number2: chararray}}

number1 isn't really relevant here; number2 is the field I care about.

The error I get is:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during
parsing. Invalid alias: SUM in {group: chararray,limited: {number1:
chararray,number2: chararray}}

But if I were to simply do SUM(limited.number2) it would work fine.

My goal is to output each group together with the number of non-null
number2 values in that group. I could of course do this in a much more
roundabout way, but I want to know why this or something like it doesn't
work. Reading through the documentation, I see things like this:

D = FOREACH C GENERATE FLATTEN((IsEmpty(A) ? null : A)),
FLATTEN((IsEmpty(B) ? null : B))

which seem to imply that you can work on that level for functions, but maybe
not! Either way, I'd like to understand why it does or doesn't work, and the
better paradigm for thinking about this sort of thing.

Re: How to sum with a conditional in pig?

Posted by Jonathan Coveney <jc...@gmail.com>.
Dmitriy,

I appreciate the help. I tried it without the if statement, however, and I
still get a parser error: invalid alias: SUM

It's quite odd... anyone perhaps have some conditional sum type code in this
vein that they know should work?
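
One way to get a per-group count of non-null number2 values is to filter
inside a nested FOREACH and count what is left, instead of passing a
conditional to SUM. The following is a minimal sketch, assuming the relation
and field names from the question (grouped, limited, number2). The alias
non_null is made up for illustration, and the script has not been checked
against Pig 0.6 specifically.

```pig
-- Sketch only: 'non_null' is a hypothetical alias, not from the original post.
counted = FOREACH grouped {
    -- keep only the tuples in the bag whose number2 is present
    non_null = FILTER limited BY number2 IS NOT NULL;
    -- emit the group key and how many tuples survived the filter
    GENERATE group, COUNT(non_null);
};
```

Two things may explain why the SUM version fails. The documentation example
in the original question wraps each "? :" expression in its own parentheses,
which the SUM attempts do not. Also, limited.number2 is a bag rather than a
single value, so a conditional applied to it is probably not evaluated once
per row even when it does parse.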

2010/11/29 Dmitriy Ryaboy <dv...@gmail.com>

> It should work, but there is a syntax error that's causing the parser to
> get confused. You don't want the "if" in there -- just
>
> counted  = foreach grouped generate group, SUM( limited.number2 is null? 0 : 1);

Re: How to sum with a conditional in pig?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
It should work, but there is a syntax error that's causing the parser to get
confused. You don't want the "if" in there -- just

counted  = foreach grouped generate group, SUM( limited.number2 is null? 0 : 1);
