Posted to user@pig.apache.org by M G <mg...@gmail.com> on 2013/04/15 22:25:04 UTC
Rank within a group
Is there a way to do RANK within a group in Pig 0.11.1?
In the following sample dataset, I would like to RANK by Income DESC, and
further RANK by Income within each Industry.
Name, Industry, Income
John, Banking, 20,000
Jane, Banking, 35,000
Chen, Real Estate, 30,000
Hari, Real Estate, 22,000
Asha, Technology, 26,000
I tried something like this, but I get a syntax error.
names_by_ind = group names by industry;
rank_by_ind = foreach names_by_ind {
    results = RANK names BY income DESC;
    GENERATE flatten(results);
}
Re: Rank within a group
Posted by M G <mg...@gmail.com>.
Thanks a lot for your response. Much appreciated.
Mythili
On Tue, Apr 16, 2013 at 12:00 PM, Gianmarco De Francisci Morales <
gdfm@apache.org> wrote:
> Hi,
>
> nested RANK is not supported yet, however it is easy to implement as a UDF.
> Just sort the records and assign an increasing counter with the UDF.
> We will probably add support for nested RANK in the next release.
>
>
> Cheers,
>
> --
> Gianmarco
Re: Rank within a group
Posted by Gianmarco De Francisci Morales <gd...@apache.org>.
Hi,
Nested RANK is not supported yet. However, it is easy to implement as a UDF:
sort the records within each group, then have the UDF assign an increasing
counter.
We will probably add support for nested RANK in the next release.
Cheers,
--
Gianmarco
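The approach described above (sort the records, then assign an increasing counter) can be sketched in plain Python. This is only an illustration of the logic, not a Pig UDF and not code from this thread: the function name rank_within_groups and its parameters are hypothetical. It numbers rows consecutively, so tied incomes would get different ranks, which may differ from how Pig's RANK treats ties. Ascending order is the default because that is what the requested output in this thread shows.

```python
def rank_within_groups(records, group_index, sort_index, descending=False):
    """Return (overall_rank, group_rank, *fields) for every record.

    Hypothetical helper: `group_index` and `sort_index` are the positions
    of the grouping field and the ranking field in each input tuple.
    """
    # Step 1: overall rank. Sort all records, then number them from 1.
    ordered = sorted(records, key=lambda r: r[sort_index], reverse=descending)

    # Step 2: rank within each group. The rows are already sorted, so one
    # increasing counter per group gives the position inside that group.
    counters = {}
    result = []
    for overall_rank, row in enumerate(ordered, start=1):
        group = row[group_index]
        counters[group] = counters.get(group, 0) + 1
        result.append((overall_rank, counters[group]) + tuple(row))
    return result


# Sample data from this thread: (name, industry, income).
records = [
    ("John", "Banking", 20000),
    ("Jane", "Banking", 35000),
    ("Chen", "Real Estate", 30000),
    ("Hari", "Real Estate", 22000),
    ("Asha", "Technology", 26000),
]

rows = rank_within_groups(records, group_index=1, sort_index=2)
for row in rows:
    print(row)
```

With the sample data this yields the five tuples requested in the thread, listed in order of overall rank. Passing descending=True ranks the highest income first instead.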
On Mon, Apr 15, 2013 at 11:10 PM, M G <mg...@gmail.com> wrote:
> Hi Johnny Zhang:
>
>
> What I am looking for is overall rank and rank within each group. Sorry if
> I was not clear.
>
> What I am looking to get is something like this.
>
> (1, 1, John, Banking, 20000)
> (5, 2, Jane, Banking, 35000)
> (3, 1, Asha, Technology, 26000)
> (2, 1, Hari, Real Estate, 22000)
> (4, 2, Chen, Real Estate, 30000)
>
> Thanks,
> Mythili
Re: Rank within a group
Posted by M G <mg...@gmail.com>.
Hi Johnny Zhang:
What I am looking for is both an overall rank and a rank within each group.
Sorry if I was not clear.
What I am looking to get is something like this:
(1, 1, John, Banking, 20000)
(5, 2, Jane, Banking, 35000)
(3, 1, Asha, Technology, 26000)
(2, 1, Hari, Real Estate, 22000)
(4, 2, Chen, Real Estate, 30000)
Thanks,
Mythili
On Mon, Apr 15, 2013 at 1:58 PM, Johnny Zhang <xi...@cloudera.com> wrote:
> Hi, M G:
> for input data
> John Banking 20000
> Jane Banking 35000
> Chen Real Estate 30000
> Hari Real Estate 22000
> Asha Technology 26000
>
>
> a = load '/var/lib/jenkins/income' as (name:chararray, industry:chararray, income:int);
> b = rank a by income;
> c = group b by industry;
> d = foreach c generate flatten(b);
> dump d;
>
> output is:
> (1,John,Banking,20000)
> (5,Jane,Banking,35000)
> (3,Asha,Technology,26000)
> (2,Hari,Real Estate,22000)
> (4,Chen,Real Estate,30000)
>
> Johnny
Re: Rank within a group
Posted by Johnny Zhang <xi...@cloudera.com>.
Hi, M G:
For this input data:
John Banking 20000
Jane Banking 35000
Chen Real Estate 30000
Hari Real Estate 22000
Asha Technology 26000
a = load '/var/lib/jenkins/income' as (name:chararray, industry:chararray, income:int);
b = rank a by income;
c = group b by industry;
d = foreach c generate flatten(b);
dump d;
The output is:
(1,John,Banking,20000)
(5,Jane,Banking,35000)
(3,Asha,Technology,26000)
(2,Hari,Real Estate,22000)
(4,Chen,Real Estate,30000)
Johnny
On Mon, Apr 15, 2013 at 1:25 PM, M G <mg...@gmail.com> wrote:
> Is there a way to do RANK within a group in PIG 0.11.1?
>
> In the following sample dataset, I would like to Rank DESC by Income, and
> further RANK by Income for each Industry.
>
> Name Industry Income
>
> John,Banking, 20,000
> Jane, Banking, 35,000
> Chen,Real Estate, 30,000
> Hari, Real Estate, 22,000
> Asha, Technology, 26,000
>
> I tried something like this, but I get syntax error.
>
> names_by_ind = group names by industry;
>
> rank_by_ind = foreach names_by_ind {
> results = RANK names BY income DESC;
> GENERATE flatten(results);
> }
>