Posted to user@pig.apache.org by James Newhaven <ja...@gmail.com> on 2012/05/29 21:25:31 UTC

Losing ordering after using ORDER BY

Hi,

I've noticed that I seem to be losing the ordering of my relation after
passing the result of an ORDER BY to an EVAL function.

For example:

D = FOREACH C GENERATE COUNT($1) as countd;
E = ORDER D BY $0 DESC;
D1 = GROUP E ALL;
D2 = FOREACH D1 GENERATE MyCustomEvalFunc($1);

When inspecting the results in MyCustomEvalFunc I noticed the ordering of
my results isn't the same as relation E (which uses ORDER BY DESC).

Any help appreciated!

Thanks,
James

Re: Losing ordering after using ORDER BY

Posted by James Newhaven <ja...@gmail.com>.
Thanks Jonathan. That worked fine.

James



On 29 May 2012, at 08:43 PM, Jonathan Coveney <jc...@gmail.com> wrote:

> If you do a grouping, the ordering changes. What you want to do is:
>
> D = FOREACH C GENERATE COUNT($1) as countd;
> D1 = GROUP D ALL;
> D2 = FOREACH D1 {
>  ord = ORDER $1 BY $0 desc;
>  GENERATE MyCustomEvalFunc(ord);
> }
>
> Keep in mind that you'll be ordering all of your data on one reducer, but
> this isn't very different from what you were doing, where you were passing
> all of your data to one reducer anyway (which is what GROUP ALL generally
> does). If you have memory issues, this is why.
>

Re: Losing ordering after using ORDER BY

Posted by Jonathan Coveney <jc...@gmail.com>.
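If you do a grouping, the ordering changes: GROUP does not keep the order of
its input, so the ORDER BY you ran before it is lost. What you want to do is
order the bag inside the FOREACH: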

D = FOREACH C GENERATE COUNT($1) as countd;
D1 = GROUP D ALL;
D2 = FOREACH D1 {
  ord = ORDER $1 BY $0 desc;
  GENERATE MyCustomEvalFunc(ord);
}

Keep in mind that you'll be ordering all of your data on one reducer, but
this isn't very different from what you were doing, where you were passing
all of your data to one reducer anyway (which is what GROUP ALL generally
does). If you have memory issues, this is why.
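
One way to check the ordering before involving the UDF might be to emit the sorted bag directly and DUMP it. This is only a sketch: it assumes C is the grouped relation from the earlier part of the script, and it uses the alias names (D, countd) in place of the positional references. Those should be equivalent here, but that is an assumption about the schema.

```pig
-- Sketch only: C is assumed to be the grouped relation defined earlier in the script.
D = FOREACH C GENERATE COUNT($1) AS countd;
D1 = GROUP D ALL;
D2 = FOREACH D1 {
  -- Sort the bag inside the nested block, after the grouping has happened.
  ord = ORDER D BY countd DESC;
  -- Emit the sorted bag itself so its order can be inspected.
  GENERATE ord;
}
DUMP D2;
```

If the dumped bag comes out in descending order, swapping `GENERATE ord;` back to `GENERATE MyCustomEvalFunc(ord);` should hand the UDF the same sorted bag.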
