You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@pig.apache.org by Brian Adams <br...@chacha.com> on 2010/07/06 23:47:11 UTC
Sum Null Column
So I asked a question earlier, but figured it wasn't very clear and thus
less likely to get answered, so here goes.
I have 2 sources with seperate ngrams and counts and after doing a full
out join i get:
Example with numbers:
dog,10,dog,20
cat,5,null/empty,null/empty
null/empty, null/empty,mouse,10
If i want to sum up $1 with $3 the first row would give me
dog,10,dog,20,30
However, when I cat the results out, the second row looks like
cat,5,emtpy,emtpy,empty
And Lastly, the 3rd appears like
empty,empty,mouse,10,emtpy
So if there is not a count in position $1 or $3, then I cannot add them
together.
How do I replace the emtpy with a 0 so I can perform the summation in
the last column?
Desired Output:
dog,10,dog,20,30
cat,5,emtpy,0,5
empty,0,mouse,10,10
Thanks guys.
Re: Sum Null Column
Posted by Brian Adams <br...@chacha.com>.
I thought so. I am trying this:
fullprep = FOREACH joined GENERATE $0 AS smsgram,(long)$1 AS smscount,$2
AS webgram,(long)$3 AS webcount;
summed = FOREACH fullprep GENERATE $0,$1,$2,$3,($1 + $3) as sumcount;
The thing is, if $1 or $3 are empty, then I need to insert a 0 manually,
right?
On Tue, 2010-07-06 at 15:21 -0700, Dmitriy Ryaboy wrote:
> You are just trying to add up $1 and $3? Just use a plus, you don't need
> SUM. SUM is for aggregating over groups.
>
> On Tue, Jul 6, 2010 at 3:11 PM, Brian Adams <br...@chacha.com> wrote:
>
> > The Current line I have is this:
> > fullprep = FOREACH joined GENERATE $0 AS smsgram,$1 AS smscount,$2 AS
> > webgram,$3 AS webcount,SUM((int)$1,(int)$3) AS sumcount;
> >
> > I get the error:
> > ERROR 1045: Could not infer the matching function for
> > org.apache.pig.builtin.SUM as multiple or none of them fit. Please use
> > an explicit cast.
> >
> > I tried the conditional earlier and it didnt seem to work.
> >
> > On Tue, 2010-07-06 at 15:04 -0700, Dmitriy Ryaboy wrote:
> > > you can run a replacement loop if you want:
> > >
> > > zeroes = foreach foo generate ( x is null ? 0 : x) as x;
> > >
> > > or if you are dealing with empties instead of nulls
> > > zeroes = foreach foo generate (IsEmpty(x) ? 0 : x) as x;
> > >
> > > On Tue, Jul 6, 2010 at 2:57 PM, Wilkes, Chris <cw...@gmail.com> wrote:
> > >
> > > > SUM($1,$3) doesn't work? If the column is a non-double then it is
> > > > converted to a 0.
> > > >
> > > >
> > > >
> > > >
> > > > On Jul 6, 2010, at 2:47 PM, Brian Adams wrote:
> > > >
> > > > So I asked a question earlier, but figured it wasn't very clear and
> > thus
> > > >> less likely to get answered, so here goes.
> > > >>
> > > >> I have 2 sources with seperate ngrams and counts and after doing a
> > full
> > > >> out join i get:
> > > >>
> > > >> Example with numbers:
> > > >> dog,10,dog,20
> > > >> cat,5,null/empty,null/empty
> > > >> null/empty, null/empty,mouse,10
> > > >>
> > > >> If i want to sum up $1 with $3 the first row would give me
> > > >> dog,10,dog,20,30
> > > >>
> > > >> However, when I cat the results out, the second row looks like
> > > >> cat,5,emtpy,emtpy,empty
> > > >>
> > > >> And Lastly, the 3rd appears like
> > > >> empty,empty,mouse,10,emtpy
> > > >>
> > > >> So if there is not a count in position $1 or $3, then I cannot add
> > them
> > > >> together.
> > > >>
> > > >> How do I replace the emtpy with a 0 so I can perform the summation in
> > > >> the last column?
> > > >>
> > > >> Desired Output:
> > > >> dog,10,dog,20,30
> > > >> cat,5,emtpy,0,5
> > > >> empty,0,mouse,10,10
> > > >>
> > > >> Thanks guys.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >
> >
Re: Sum Null Column
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
You are just trying to add up $1 and $3? Just use a plus, you don't need
SUM. SUM is for aggregating over groups.
On Tue, Jul 6, 2010 at 3:11 PM, Brian Adams <br...@chacha.com> wrote:
> The Current line I have is this:
> fullprep = FOREACH joined GENERATE $0 AS smsgram,$1 AS smscount,$2 AS
> webgram,$3 AS webcount,SUM((int)$1,(int)$3) AS sumcount;
>
> I get the error:
> ERROR 1045: Could not infer the matching function for
> org.apache.pig.builtin.SUM as multiple or none of them fit. Please use
> an explicit cast.
>
> I tried the conditional earlier and it didnt seem to work.
>
> On Tue, 2010-07-06 at 15:04 -0700, Dmitriy Ryaboy wrote:
> > you can run a replacement loop if you want:
> >
> > zeroes = foreach foo generate ( x is null ? 0 : x) as x;
> >
> > or if you are dealing with empties instead of nulls
> > zeroes = foreach foo generate (IsEmpty(x) ? 0 : x) as x;
> >
> > On Tue, Jul 6, 2010 at 2:57 PM, Wilkes, Chris <cw...@gmail.com> wrote:
> >
> > > SUM($1,$3) doesn't work? If the column is a non-double then it is
> > > converted to a 0.
> > >
> > >
> > >
> > >
> > > On Jul 6, 2010, at 2:47 PM, Brian Adams wrote:
> > >
> > > So I asked a question earlier, but figured it wasn't very clear and
> thus
> > >> less likely to get answered, so here goes.
> > >>
> > >> I have 2 sources with seperate ngrams and counts and after doing a
> full
> > >> out join i get:
> > >>
> > >> Example with numbers:
> > >> dog,10,dog,20
> > >> cat,5,null/empty,null/empty
> > >> null/empty, null/empty,mouse,10
> > >>
> > >> If i want to sum up $1 with $3 the first row would give me
> > >> dog,10,dog,20,30
> > >>
> > >> However, when I cat the results out, the second row looks like
> > >> cat,5,emtpy,emtpy,empty
> > >>
> > >> And Lastly, the 3rd appears like
> > >> empty,empty,mouse,10,emtpy
> > >>
> > >> So if there is not a count in position $1 or $3, then I cannot add
> them
> > >> together.
> > >>
> > >> How do I replace the emtpy with a 0 so I can perform the summation in
> > >> the last column?
> > >>
> > >> Desired Output:
> > >> dog,10,dog,20,30
> > >> cat,5,emtpy,0,5
> > >> empty,0,mouse,10,10
> > >>
> > >> Thanks guys.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >
>
RE: Sum Null Column
Posted by Matthew Smith <Ma...@g2-inc.com>.
This worked for me,
joined = JOIN Bytes1 BY Group1, Bytes2 BY Group2;
end = FOREACH joined GENERATE $0 AS something, $1/$3 AS ratio;
Looks similar to your situation.
-----Original Message-----
From: Brian Adams [mailto:brian.adams@chacha.com]
Sent: Tuesday, July 06, 2010 6:20 PM
To: pig-user@hadoop.apache.org
Subject: RE: Sum Null Column
Matt,
Haven't tried that route yet. I can give that one a go.
On Tue, 2010-07-06 at 18:23 -0400, Matthew Smith wrote:
> Shouldn't you have to do it in another table? Add one more step?
>
> fullprep = FOREACH joined GENERATE $0 AS smsgram,$1 AS smscount,$2 AS
> webgram,$3 AS webcount;
>
> addedStep = FOREACH fullprep $0,$1,$2,$3, SUM(fullprep.$1 + fullprep.$3) as sumCount;
>
> Just my 2c
>
>
> -----Original Message-----
> From: Brian Adams [mailto:brian.adams@chacha.com]
> Sent: Tuesday, July 06, 2010 6:11 PM
> To: pig-user@hadoop.apache.org
> Subject: Re: Sum Null Column
>
> The Current line I have is this:
> fullprep = FOREACH joined GENERATE $0 AS smsgram,$1 AS smscount,$2 AS
> webgram,$3 AS webcount,SUM((int)$1,(int)$3) AS sumcount;
>
> I get the error:
> ERROR 1045: Could not infer the matching function for
> org.apache.pig.builtin.SUM as multiple or none of them fit. Please use
> an explicit cast.
>
> I tried the conditional earlier and it didnt seem to work.
>
> On Tue, 2010-07-06 at 15:04 -0700, Dmitriy Ryaboy wrote:
> > you can run a replacement loop if you want:
> >
> > zeroes = foreach foo generate ( x is null ? 0 : x) as x;
> >
> > or if you are dealing with empties instead of nulls
> > zeroes = foreach foo generate (IsEmpty(x) ? 0 : x) as x;
> >
> > On Tue, Jul 6, 2010 at 2:57 PM, Wilkes, Chris <cw...@gmail.com> wrote:
> >
> > > SUM($1,$3) doesn't work? If the column is a non-double then it is
> > > converted to a 0.
> > >
> > >
> > >
> > >
> > > On Jul 6, 2010, at 2:47 PM, Brian Adams wrote:
> > >
> > > So I asked a question earlier, but figured it wasn't very clear and thus
> > >> less likely to get answered, so here goes.
> > >>
> > >> I have 2 sources with seperate ngrams and counts and after doing a full
> > >> out join i get:
> > >>
> > >> Example with numbers:
> > >> dog,10,dog,20
> > >> cat,5,null/empty,null/empty
> > >> null/empty, null/empty,mouse,10
> > >>
> > >> If i want to sum up $1 with $3 the first row would give me
> > >> dog,10,dog,20,30
> > >>
> > >> However, when I cat the results out, the second row looks like
> > >> cat,5,emtpy,emtpy,empty
> > >>
> > >> And Lastly, the 3rd appears like
> > >> empty,empty,mouse,10,emtpy
> > >>
> > >> So if there is not a count in position $1 or $3, then I cannot add them
> > >> together.
> > >>
> > >> How do I replace the emtpy with a 0 so I can perform the summation in
> > >> the last column?
> > >>
> > >> Desired Output:
> > >> dog,10,dog,20,30
> > >> cat,5,emtpy,0,5
> > >> empty,0,mouse,10,10
> > >>
> > >> Thanks guys.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >
RE: Sum Null Column
Posted by Brian Adams <br...@chacha.com>.
Matt,
Haven't tried that route yet. I can give that one a go.
On Tue, 2010-07-06 at 18:23 -0400, Matthew Smith wrote:
> Shouldn't you have to do it in another table? Add one more step?
>
> fullprep = FOREACH joined GENERATE $0 AS smsgram,$1 AS smscount,$2 AS
> webgram,$3 AS webcount;
>
> addedStep = FOREACH fullprep $0,$1,$2,$3, SUM(fullprep.$1 + fullprep.$3) as sumCount;
>
> Just my 2c
>
>
> -----Original Message-----
> From: Brian Adams [mailto:brian.adams@chacha.com]
> Sent: Tuesday, July 06, 2010 6:11 PM
> To: pig-user@hadoop.apache.org
> Subject: Re: Sum Null Column
>
> The Current line I have is this:
> fullprep = FOREACH joined GENERATE $0 AS smsgram,$1 AS smscount,$2 AS
> webgram,$3 AS webcount,SUM((int)$1,(int)$3) AS sumcount;
>
> I get the error:
> ERROR 1045: Could not infer the matching function for
> org.apache.pig.builtin.SUM as multiple or none of them fit. Please use
> an explicit cast.
>
> I tried the conditional earlier and it didnt seem to work.
>
> On Tue, 2010-07-06 at 15:04 -0700, Dmitriy Ryaboy wrote:
> > you can run a replacement loop if you want:
> >
> > zeroes = foreach foo generate ( x is null ? 0 : x) as x;
> >
> > or if you are dealing with empties instead of nulls
> > zeroes = foreach foo generate (IsEmpty(x) ? 0 : x) as x;
> >
> > On Tue, Jul 6, 2010 at 2:57 PM, Wilkes, Chris <cw...@gmail.com> wrote:
> >
> > > SUM($1,$3) doesn't work? If the column is a non-double then it is
> > > converted to a 0.
> > >
> > >
> > >
> > >
> > > On Jul 6, 2010, at 2:47 PM, Brian Adams wrote:
> > >
> > > So I asked a question earlier, but figured it wasn't very clear and thus
> > >> less likely to get answered, so here goes.
> > >>
> > >> I have 2 sources with seperate ngrams and counts and after doing a full
> > >> out join i get:
> > >>
> > >> Example with numbers:
> > >> dog,10,dog,20
> > >> cat,5,null/empty,null/empty
> > >> null/empty, null/empty,mouse,10
> > >>
> > >> If i want to sum up $1 with $3 the first row would give me
> > >> dog,10,dog,20,30
> > >>
> > >> However, when I cat the results out, the second row looks like
> > >> cat,5,emtpy,emtpy,empty
> > >>
> > >> And Lastly, the 3rd appears like
> > >> empty,empty,mouse,10,emtpy
> > >>
> > >> So if there is not a count in position $1 or $3, then I cannot add them
> > >> together.
> > >>
> > >> How do I replace the emtpy with a 0 so I can perform the summation in
> > >> the last column?
> > >>
> > >> Desired Output:
> > >> dog,10,dog,20,30
> > >> cat,5,emtpy,0,5
> > >> empty,0,mouse,10,10
> > >>
> > >> Thanks guys.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >
RE: Sum Null Column
Posted by Matthew Smith <Ma...@g2-inc.com>.
Shouldn't you have to do it in another table? Add one more step?
fullprep = FOREACH joined GENERATE $0 AS smsgram,$1 AS smscount,$2 AS
webgram,$3 AS webcount;
addedStep = FOREACH fullprep $0,$1,$2,$3, SUM(fullprep.$1 + fullprep.$3) as sumCount;
Just my 2c
-----Original Message-----
From: Brian Adams [mailto:brian.adams@chacha.com]
Sent: Tuesday, July 06, 2010 6:11 PM
To: pig-user@hadoop.apache.org
Subject: Re: Sum Null Column
The Current line I have is this:
fullprep = FOREACH joined GENERATE $0 AS smsgram,$1 AS smscount,$2 AS
webgram,$3 AS webcount,SUM((int)$1,(int)$3) AS sumcount;
I get the error:
ERROR 1045: Could not infer the matching function for
org.apache.pig.builtin.SUM as multiple or none of them fit. Please use
an explicit cast.
I tried the conditional earlier and it didnt seem to work.
On Tue, 2010-07-06 at 15:04 -0700, Dmitriy Ryaboy wrote:
> you can run a replacement loop if you want:
>
> zeroes = foreach foo generate ( x is null ? 0 : x) as x;
>
> or if you are dealing with empties instead of nulls
> zeroes = foreach foo generate (IsEmpty(x) ? 0 : x) as x;
>
> On Tue, Jul 6, 2010 at 2:57 PM, Wilkes, Chris <cw...@gmail.com> wrote:
>
> > SUM($1,$3) doesn't work? If the column is a non-double then it is
> > converted to a 0.
> >
> >
> >
> >
> > On Jul 6, 2010, at 2:47 PM, Brian Adams wrote:
> >
> > So I asked a question earlier, but figured it wasn't very clear and thus
> >> less likely to get answered, so here goes.
> >>
> >> I have 2 sources with seperate ngrams and counts and after doing a full
> >> out join i get:
> >>
> >> Example with numbers:
> >> dog,10,dog,20
> >> cat,5,null/empty,null/empty
> >> null/empty, null/empty,mouse,10
> >>
> >> If i want to sum up $1 with $3 the first row would give me
> >> dog,10,dog,20,30
> >>
> >> However, when I cat the results out, the second row looks like
> >> cat,5,emtpy,emtpy,empty
> >>
> >> And Lastly, the 3rd appears like
> >> empty,empty,mouse,10,emtpy
> >>
> >> So if there is not a count in position $1 or $3, then I cannot add them
> >> together.
> >>
> >> How do I replace the emtpy with a 0 so I can perform the summation in
> >> the last column?
> >>
> >> Desired Output:
> >> dog,10,dog,20,30
> >> cat,5,emtpy,0,5
> >> empty,0,mouse,10,10
> >>
> >> Thanks guys.
> >>
> >>
> >>
> >>
> >>
> >
Re: Sum Null Column
Posted by Brian Adams <br...@chacha.com>.
The Current line I have is this:
fullprep = FOREACH joined GENERATE $0 AS smsgram,$1 AS smscount,$2 AS
webgram,$3 AS webcount,SUM((int)$1,(int)$3) AS sumcount;
I get the error:
ERROR 1045: Could not infer the matching function for
org.apache.pig.builtin.SUM as multiple or none of them fit. Please use
an explicit cast.
I tried the conditional earlier and it didnt seem to work.
On Tue, 2010-07-06 at 15:04 -0700, Dmitriy Ryaboy wrote:
> you can run a replacement loop if you want:
>
> zeroes = foreach foo generate ( x is null ? 0 : x) as x;
>
> or if you are dealing with empties instead of nulls
> zeroes = foreach foo generate (IsEmpty(x) ? 0 : x) as x;
>
> On Tue, Jul 6, 2010 at 2:57 PM, Wilkes, Chris <cw...@gmail.com> wrote:
>
> > SUM($1,$3) doesn't work? If the column is a non-double then it is
> > converted to a 0.
> >
> >
> >
> >
> > On Jul 6, 2010, at 2:47 PM, Brian Adams wrote:
> >
> > So I asked a question earlier, but figured it wasn't very clear and thus
> >> less likely to get answered, so here goes.
> >>
> >> I have 2 sources with seperate ngrams and counts and after doing a full
> >> out join i get:
> >>
> >> Example with numbers:
> >> dog,10,dog,20
> >> cat,5,null/empty,null/empty
> >> null/empty, null/empty,mouse,10
> >>
> >> If i want to sum up $1 with $3 the first row would give me
> >> dog,10,dog,20,30
> >>
> >> However, when I cat the results out, the second row looks like
> >> cat,5,emtpy,emtpy,empty
> >>
> >> And Lastly, the 3rd appears like
> >> empty,empty,mouse,10,emtpy
> >>
> >> So if there is not a count in position $1 or $3, then I cannot add them
> >> together.
> >>
> >> How do I replace the emtpy with a 0 so I can perform the summation in
> >> the last column?
> >>
> >> Desired Output:
> >> dog,10,dog,20,30
> >> cat,5,emtpy,0,5
> >> empty,0,mouse,10,10
> >>
> >> Thanks guys.
> >>
> >>
> >>
> >>
> >>
> >
Re: Sum Null Column
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
you can run a replacement loop if you want:
zeroes = foreach foo generate ( x is null ? 0 : x) as x;
or if you are dealing with empties instead of nulls
zeroes = foreach foo generate (IsEmpty(x) ? 0 : x) as x;
On Tue, Jul 6, 2010 at 2:57 PM, Wilkes, Chris <cw...@gmail.com> wrote:
> SUM($1,$3) doesn't work? If the column is a non-double then it is
> converted to a 0.
>
>
>
>
> On Jul 6, 2010, at 2:47 PM, Brian Adams wrote:
>
> So I asked a question earlier, but figured it wasn't very clear and thus
>> less likely to get answered, so here goes.
>>
>> I have 2 sources with seperate ngrams and counts and after doing a full
>> out join i get:
>>
>> Example with numbers:
>> dog,10,dog,20
>> cat,5,null/empty,null/empty
>> null/empty, null/empty,mouse,10
>>
>> If i want to sum up $1 with $3 the first row would give me
>> dog,10,dog,20,30
>>
>> However, when I cat the results out, the second row looks like
>> cat,5,emtpy,emtpy,empty
>>
>> And Lastly, the 3rd appears like
>> empty,empty,mouse,10,emtpy
>>
>> So if there is not a count in position $1 or $3, then I cannot add them
>> together.
>>
>> How do I replace the emtpy with a 0 so I can perform the summation in
>> the last column?
>>
>> Desired Output:
>> dog,10,dog,20,30
>> cat,5,emtpy,0,5
>> empty,0,mouse,10,10
>>
>> Thanks guys.
>>
>>
>>
>>
>>
>
Re: Sum Null Column
Posted by "Wilkes, Chris" <cw...@gmail.com>.
SUM($1,$3) doesn't work? If the column is a non-double then it is
converted to a 0.
On Jul 6, 2010, at 2:47 PM, Brian Adams wrote:
> So I asked a question earlier, but figured it wasn't very clear and
> thus
> less likely to get answered, so here goes.
>
> I have 2 sources with seperate ngrams and counts and after doing a
> full
> out join i get:
>
> Example with numbers:
> dog,10,dog,20
> cat,5,null/empty,null/empty
> null/empty, null/empty,mouse,10
>
> If i want to sum up $1 with $3 the first row would give me
> dog,10,dog,20,30
>
> However, when I cat the results out, the second row looks like
> cat,5,emtpy,emtpy,empty
>
> And Lastly, the 3rd appears like
> empty,empty,mouse,10,emtpy
>
> So if there is not a count in position $1 or $3, then I cannot add
> them
> together.
>
> How do I replace the emtpy with a 0 so I can perform the summation in
> the last column?
>
> Desired Output:
> dog,10,dog,20,30
> cat,5,emtpy,0,5
> empty,0,mouse,10,10
>
> Thanks guys.
>
>
>
>