Posted to user@pig.apache.org by shan s <my...@gmail.com> on 2012/05/15 15:50:14 UTC

How to combine multiple group by

How can I combine multiple group-by operations that are performed on
essentially the same relation?
In the case below, can I do this in a single foreach?

e1 =  load 'emp' using PigStorage() as (empid, school, district, score);

e2 = group e1 by empid;
e3 = foreach e2 generate group, AVG(e1.score) as s;
e4 = order e3 by s desc;
e5 = limit e4 3;
dump e5;

e2 = group e1 by school;
e3 = foreach e2 generate group, AVG(e1.score) as s;
e4 = order e3 by s desc;
e5 = limit e4 3;
dump e5;
Thank You,
Prashant.

Re: How to combine multiple group by

Posted by shan s <my...@gmail.com>.
Thanks Alan.
Yes, I see it now..

One question I have on optimization/projection: let's say I have a
relation with 20 fields.
On Wed, May 16, 2012 at 6:09 AM, Alan Gates <ga...@hortonworks.com> wrote:

> Pig will auto-combine these for you.  In the script example you give Pig
> should already be combining both group bys into a single MR job.  You can
> check this by running explain on it.
>
> Alan.

Re: How to combine multiple group by

Posted by Alan Gates <ga...@hortonworks.com>.
Pig will auto-combine these for you. In the script example you give, Pig should already be combining both group bys into a single MR job. You can check this by running explain on it.

Alan.
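
A rough, untested sketch of how one might check this, assuming the same 'emp' input as in the question. The distinct alias names, the output paths, and the script file name topn.pig are made up for illustration. According to the Pig documentation, multi-query execution applies to scripts run in batch mode and DUMP disables it, so this version uses STORE instead. The ORDER BY steps will typically still add jobs of their own; the point is that the input is read once and the two group bys can share a job.

e1 = load 'emp' using PigStorage() as (empid, school, district, score);

-- top 3 employees by average score
by_emp = group e1 by empid;
emp_avg = foreach by_emp generate group, AVG(e1.score) as s;
emp_ordered = order emp_avg by s desc;
emp_top = limit emp_ordered 3;
store emp_top into 'out/top_emp';      -- hypothetical output path

-- top 3 schools by average score, computed from the same e1
by_school = group e1 by school;
school_avg = foreach by_school generate group, AVG(e1.score) as s;
school_ordered = order school_avg by s desc;
school_top = limit school_ordered 3;
store school_top into 'out/top_school'; -- hypothetical output path

With the script saved as topn.pig, the plan for the whole script can be printed from the grunt shell:

explain -script topn.pig;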

On May 15, 2012, at 3:11 PM, shan s wrote:

> Thanks Bill.
> 
> My objective is to improve performance. So I do want to combine the logic.
> If we were to do this in java, we could do this in single foreach.
> 
> Will the macro help in this regard? Or will it  just act as code generator?


Re: How to combine multiple group by

Posted by shan s <my...@gmail.com>.
Thanks Bill.

My objective is to improve performance, so I do want to combine the logic.
If we were to do this in Java, we could do it in a single foreach.

Will the macro help in this regard? Or will it just act as a code generator?

On Tue, May 15, 2012 at 8:30 PM, Bill Graham <bi...@gmail.com> wrote:

> You can combine multiple relations using the UNION operator. If you're
> trying to combine logic, you can use a macro to do e2-e5 below that takes
> (e1, empid) or (e1, group). See the example here:
>
> http://hortonworks.com/blog/new-apache-pig-features-part-1-macro/

Re: How to combine multiple group by

Posted by Bill Graham <bi...@gmail.com>.
You can combine multiple relations using the UNION operator. If you're
trying to combine logic, you can use a macro to do e2-e5 below that takes
(e1, empid) or (e1, group). See the example here:

http://hortonworks.com/blog/new-apache-pig-features-part-1-macro/
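
For reference, a rough, untested sketch of what such a macro might look like, assuming Pig 0.9 or later (where macros were introduced). The macro name top3_by_avg, its parameter names, and the alias names are invented for illustration and are not taken from the linked post. Macros are expanded inline before the plan is built, so this mainly saves typing and would not by itself change how the jobs are executed.

-- hypothetical macro: top 3 groups of $rel by average score, grouped on $grp_key
define top3_by_avg(rel, grp_key) returns result {
    grouped = group $rel by $grp_key;
    averaged = foreach grouped generate group, AVG($rel.score) as s;
    ordered = order averaged by s desc;
    $result = limit ordered 3;
};

e1 = load 'emp' using PigStorage() as (empid, school, district, score);

emp_top = top3_by_avg(e1, empid);
school_top = top3_by_avg(e1, school);

dump emp_top;
dump school_top;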

On Tue, May 15, 2012 at 6:50 AM, shan s <my...@gmail.com> wrote:

> How can I combine multiple group by that are performed on essentially same
> relation?
> In the case below, can I do this in single foreach?
>
> e1 =  load 'emp' using PigStorage() as (empid, school, district, score);
>
> e2 = group e1 by empid;
> e3 = foreach e2 generate group, AVG(e1.score) as s;
> e4 = order e3 by s desc;
> e5 = limit e4 3;
> dump e5;
>
> e2 = group e1 by school;
> e3 = foreach e2 generate group, AVG(e1.score) as s;
> e4 = order e3 by s desc;
> e5 = limit e4 3;
> dump e5;
> Thank You,
> Prashant.
>



-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgraham@gmail.com going forward.*