Posted to general@hadoop.apache.org by Jonathan Coveney <jc...@gmail.com> on 2010/11/30 17:05:46 UTC

Help: frustration with types and whatnot while trying to do a conditional sum

I appreciate any help you can give. I've searched around and haven't found
anything directly related... I've gone through documentation but can't find
a real reason why this doesn't work.

Here is the gist of my code (n1 is arbitrary, just to group by; n2 is either
null or a large integer):

table = LOAD stuff AS (n1:chararray, n2:chararray, other irrelevant stuff);
pared = foreach table generate n1, n2;
grouped = group pared by n1;
counted  = foreach grouped generate group, (double)SUM((IsEmpty(pared.n2) ?
0:1))/(double)COUNT(pared.n2) as ratio:double;
ordered = order counted by ratio desc;
limited = limit ordered 200;
dump limited;

This produces the following error:

ERROR 1045: Could not infer the matching function for
org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an
explicit cast.

If I take out the double parentheses in the counted sum, I get this instead:

ERROR 1000: Error during parsing. Invalid alias: SUM in {group:
chararray,pared: {n1: chararray,n2: chararray}}

I THINK the error is that SUM wants the column of a bag as its input, not
actual integers, so I thought I'd try to make that happen by reshaping the
input into the form it wants.

To get around this, I thought the following might work (changing only these
lines):

pared = foreach beacon_fact generate n1, (IsEmpty(n2) ? 0 : 1) as ooz:int;
grouped = group pared by n1;
counted  = foreach grouped generate group,
(double)SUM(pared.n1)/(double)COUNT(pared.n2) as ratio:double;

But this gives this error:
ERROR 1000: Error during parsing. Invalid alias: n2 in {n1: chararray,ooz:
int}

I have no real clue why this fails. I tried breaking it up into two steps
and it makes no difference.

I'd ideally like to do this without making a UDF, as I feel the base
functionality should support it. Not sure.

Either way, I'd appreciate any help or pointers, as well as any rationale as
to why it does or doesn't work within the pig framework. The whole bag
system is still somewhat counterintuitive.

Thank you for your time
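
One way to read the two errors, sketched as a plain Python model rather than
Pig itself: after a GROUP, each output row carries a bag of the original rows,
so a test such as IsEmpty(pared.n2) applies to the bag as a whole and does not
yield one 0/1 value per row for SUM to add up. The second error message lists
the schema of pared as {n1, ooz}, so n2 is no longer there to COUNT, and the
flag to sum would presumably be ooz rather than n1. The sample rows and the
helper names flag_rows, group_by_key and ratios below are hypothetical, and
None stands in for a null n2.

```python
from collections import defaultdict

# Hypothetical sample rows (n1, n2); None stands in for a null n2.
rows = [("a", "123"), ("a", None), ("b", None), ("b", None), ("c", "9")]


def flag_rows(rows):
    """Per-row step, done BEFORE grouping: 1 if n2 is present, else 0."""
    return [(n1, 0 if n2 is None else 1) for n1, n2 in rows]


def group_by_key(flagged):
    """Model of grouping by n1: each key maps to a 'bag' (list) of rows."""
    bags = defaultdict(list)
    for n1, has_n2 in flagged:
        bags[n1].append((n1, has_n2))
    return dict(bags)


def ratios(bags):
    """Per-group step: sum the flag column of each bag, divide by bag size."""
    return {key: sum(flag for _, flag in bag) / len(bag)
            for key, bag in bags.items()}


result = ratios(group_by_key(flag_rows(rows)))

# Order by ratio, highest first, mirroring the ORDER ... DESC in the post.
for key, ratio in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(key, ratio)
```

The point of the model is only the ordering of the steps: the 0/1 flag is
computed per row first, and the aggregate then runs over a column of the
grouped bag. Whether the same shape works unchanged in a given Pig release is
not something this sketch verifies.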

Re: Help: frustration with types and whatnot while trying to do a conditional sum

Posted by Jonathan Coveney <jc...@gmail.com>.
I am sorry, I got confused. I will do that.

2010/11/30 Owen O'Malley <om...@apache.org>

> Pig has moved to its own mailing lists. Please follow up over there.
> -- Owen

Re: Help: frustration with types and whatnot while trying to do a conditional sum

Posted by Owen O'Malley <om...@apache.org>.
Pig has moved to its own mailing lists. Please follow up over there.
-- Owen

