Posted to user@pig.apache.org by Preeti Gupta <pr...@gmail.com> on 2013/03/05 00:50:04 UTC

avoiding Group by or filter

Hello,

Can I compute SUM or AVG without using GROUP BY or FILTER?

Re: avoiding Group by or filter

Posted by Jonathan Coveney <jc...@gmail.com>.
There have been a number of explanations of this topic before, so I would
prefer to point at one of them (or ensure we document it better), but
basically all of the aggregation functions we use (SUM, AVG, etc.)
operate on bags. This is actually true in SQL as well (SQL just hides the
"group all"; it is implied). In this case, you are grouping all of the rows
together in order to run the function on them, since you cannot run a
function on a relation, only on a bag. Does that make sense? I know this
is an annoying nuance to understand in Pig...
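
To make the implied "group all" concrete, here is a minimal sketch, assuming the one-column file 'myfile.txt' and the field name A from the question in this thread. The aliases grouped and stats are illustrative names, not anything from the original posts.

```pig
-- Load a single column of doubles (file and field name taken from the question).
dividends = LOAD 'myfile.txt' AS (A:double);
-- GROUP ... ALL collapses the relation into one row: (all, {bag of every tuple}).
grouped = GROUP dividends ALL;
-- dividends.A projects column A out of the bag, so AVG and SUM receive a bag.
stats = FOREACH grouped GENERATE AVG(dividends.A) AS mean, SUM(dividends.A) AS total;
DUMP stats;
```

The GROUP here does not partition anything; it only turns the whole relation into the single bag that the aggregate functions need.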



Re: avoiding Group by or filter

Posted by Eli Finkelshteyn <el...@thebackplane.com>.
Yes. You can use any eval function such as SUM or AVG as long as your data is in the format (item1, …, itemN, {(tup1), …, (tupM)}). See http://pig.apache.org/docs/r0.10.0/func.html#eval-functions for more info.



Re: avoiding Group by or filter

Posted by Preeti Gupta <pr...@gmail.com>.
Because there is nothing to group.
On Mar 5, 2013, at 3:14 AM, Jonathan Coveney <jc...@gmail.com> wrote:

> Why don't you want to group?
> 
> 


Re: avoiding Group by or filter

Posted by Jonathan Coveney <jc...@gmail.com>.
Why don't you want to group?



Re: avoiding Group by or filter

Posted by Preeti Gupta <pr...@gmail.com>.
I want to compute the average of a one-column dataset:
1
2
3
4
5

and I am not able to do it without grouping.

However, I got an average with

avg = foreach (group dividends all) generate AVG(dividends);

But 

avg       = foreach (filter dividends by A>-10000000.0) generate AVG(A);

says to use an explicit cast.

My script is very small:

dividends = load 'myfile.txt' as (A:double);
dump dividends;
--grouped   = filter dividends by A>-10000000.0;
avg       = foreach (filter dividends by A>-10000000.0) generate AVG(A);



<file try.pig, line 5, column 65> Multiple matching functions for org.apache.pig.builtin.AVG with input schema: ({{(bytearray)}}, {{(double)}}). Please use an explicit cast.
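
One likely reading of this error, offered as a sketch and not a definitive diagnosis: FILTER returns a relation with the same kind of rows it was given, so inside the FOREACH the field A is still a single double per row, not a bag, and AVG has no matching signature for it. If the filter is genuinely needed, it can be followed by GROUP ... ALL. The aliases filtered, grouped, and result below are illustrative.

```pig
dividends = LOAD 'myfile.txt' AS (A:double);
-- FILTER keeps or drops rows; it does not build a bag.
filtered = FILTER dividends BY A > -10000000.0;
-- GROUP ... ALL gathers the surviving rows into one bag named after the input alias.
grouped = GROUP filtered ALL;
-- filtered.A projects column A out of that bag, giving AVG a bag of doubles.
result = FOREACH grouped GENERATE AVG(filtered.A);
DUMP result;
```

With the five-row input 1 through 5 shown above, every row passes the filter, so this should print (3.0).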



Re: avoiding Group by or filter

Posted by Prashant Kommireddi <pr...@gmail.com>.
Hi Preeti,

Using FILTER or not depends on your requirements and has nothing to do with
SUM or AVG.

SUM and AVG accept bags as input, so as long as you are able to provide one,
it should be fine. (Though it's very common for users to use GROUP BY to
roll up on a key before using these UDFs.)

For example:

grunt> cat data
1    5
5    8

grunt> A = load 'data';
grunt> B = foreach A generate TOBAG($0, $1) as bagg;
grunt> dump B;
({(1),(5)})
({(5),(8)})

grunt> C = foreach B generate AVG(bagg);
grunt> dump C;
(3.0)
(6.5)
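
Note that TOBAG builds one bag per row, so the averages above are taken across the fields of each row, not down a column. A small variant of the same idea, assuming the two fields are tab-separated and declared as doubles (the field names x and y and the output names row_sum and row_avg are made up for illustration):

```pig
-- Declare both fields as double so no implicit bytearray cast is needed.
A = LOAD 'data' AS (x:double, y:double);
-- One bag per input row, holding that row's two values.
B = FOREACH A GENERATE TOBAG(x, y) AS bagg;
-- SUM and AVG are evaluated once per row, over that row's bag.
C = FOREACH B GENERATE SUM(bagg) AS row_sum, AVG(bagg) AS row_avg;
DUMP C;
```

For the two rows shown, the row sums are 6.0 and 13.0 and the row averages are 3.0 and 6.5. A column-wise average over all rows still needs the rows gathered into one bag, for example with GROUP ... ALL.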

-Prashant

