Posted to user@pig.apache.org by ugo jardonnet <ug...@gmail.com> on 2011/04/26 15:11:26 UTC
TOP ordering
Hi. I am looking for a way to get the results of TOP in sorted order. Is it
possible?
Example:
A = LOAD 'datatest' USING PigStorage(';') AS (first: chararray, second: int);
D = GROUP A BY first;
topResults = FOREACH D {
    result = TOP(3, 1, A);
    GENERATE flatten(result); -- unordered
};
dump topResults;
best,
Re: TOP ordering
Posted by ugo jardonnet <ug...@gmail.com>.
mmm
In fact, TOP doesn't order its results. I was looking for a way to do this from
Pig.
The problem is that TOP returns a bag, which cannot be ordered. And of course,
after the foreach it's too late.
Re: TOP ordering
Posted by Sven Krasser <kr...@gmail.com>.
At a glance it could be this: The first field in D.A is of type chararray,
but TOP orders based on long.
-Sven
--
http://sites.google.com/site/krasser/
Re: TOP ordering
Posted by ugo jardonnet <ug...@gmail.com>.
2011/4/26 Dmitriy Ryaboy <dv...@gmail.com>
> This may be helpful in understanding what happens when you do a group-by:
> http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
>
Thank you very much.
> Also, are you sure TOP doesn't give you items in order? It's a bag, but the
> implementation is such that flattening it should give you things in proper
> order (I think -- haven't tried).
I took a look at the implementation of TOP. The output bag is built by
iterating over a priority queue,
which is not supposed to "return the elements in any particular order".
Thank you both for clearing things up about foreach.
Re: TOP ordering
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
This may be helpful in understanding what happens when you do a group-by:
http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
Also, are you sure TOP doesn't give you items in order? It's a bag, but the
implementation is such that flattening it should give you things in proper
order (I think -- haven't tried).
D
Re: TOP ordering
Posted by Alan Gates <ga...@yahoo-inc.com>.
A has changed. A outside the foreach is a relation (all the records
you loaded). Inside the foreach A is a bag created by the group by.
So what this does is order the bag A by the second input, and then
take the top 3 records. Actually, given that order by goes from least
to greatest this will give the bottom 3 records. You'll need to
change it to 'srtd = order A by second desc;' to get the top 3.
Alan.
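For reference, the change described above can be sketched as a complete script. This is a minimal sketch that reuses the relation and field names from the original post; the DESCRIBE statement and the comments are additions for illustration and were not part of any message in the thread.

```pig
-- Same load and grouping as in the original post.
A = LOAD 'datatest' USING PigStorage(';') AS (first:chararray, second:int);
D = GROUP A BY first;

-- Added for illustration: prints the schema of D, which has one row per
-- distinct value of 'first', holding a 'group' field and a bag named A.
DESCRIBE D;

topResults = FOREACH D {
    -- Inside the block, A refers to the bag of rows for the current group only.
    srtd = ORDER A BY second DESC;  -- greatest 'second' first
    top3 = LIMIT srtd 3;            -- keep at most three rows per group
    GENERATE FLATTEN(top3);
};
DUMP topResults;
```

Because the nested ORDER and LIMIT run once per row of D, each group gets its own top three. The order in which the groups themselves appear in the output is not something this script controls.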
Re: TOP ordering
Posted by ugo jardonnet <ug...@gmail.com>.
2011/4/26 Alan Gates <ga...@yahoo-inc.com>
> topResults = foreach D {
> srtd = order A by second;
> top3 = limit srtd 3;
> generate flatten(top3);
> };
>
> Alan.
>
Thank you Alan. It works perfectly.
I realize I didn't really understand the mechanism behind foreach.
Reading this piece of code, I would have expected each top3 to be the same.
I suppose A is filtered by D at the beginning of the loop?
Re: TOP ordering
Posted by Alan Gates <ga...@yahoo-inc.com>.
topResults = foreach D {
    srtd = order A by second;
    top3 = limit srtd 3;
    generate flatten(top3);
};
Alan.