Posted to user@pig.apache.org by Nikhil <gu...@gmail.com> on 2009/08/13 19:55:21 UTC

Re: Tuple ordering after a group-by

Hi,

This is a follow-up question to the thread "Tuple ordering after a group-by".

Would this - {suppose A has the schema [date, id, num]}

B = GROUP A BY id;
C = FOREACH B {
         A1 = ORDER A BY date ASC;
         GENERATE id, A1.num;
}

guarantee that C always contains num ordered by date, and if I
feed this to a UDF, it will be fed in the same order?

I want to confirm because the PigLatin manual for ORDER says "if you
further process the relation there is no guarantee that the contents
will be processed in the order you originally specified."

Thanks
-nikhil

Re: Tuple ordering after a group-by

Posted by Nikhil <gu...@gmail.com>.
Thanks, Alan.

I have written a UDF to find the "trend" in a time series [number of
sessions/incoming query] - somewhat like trendingtopics.org does.
Since this UDF will be reused for finding trends in some other series [where
data will be ordered by say both date & id2], I want to just feed the
numbers to the UDF [ordered by the time series].

So for now, I'll stick with -

C = FOREACH B {
       A1 = ORDER A BY date ASC;
       GENERATE id, UDF(A1.num);
}

-nikhil
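
To illustrate why the order matters to a UDF like this, here is a minimal Python sketch. The actual UDF is not shown in this thread, so the least-squares slope and the name trend_slope are assumptions chosen purely for illustration; the point is only that the same numbers fed in a different order give a different "trend".

```python
def trend_slope(values):
    """Least-squares slope of values against their positions 0..n-1.

    Hypothetical stand-in for the trend UDF: it assumes the values
    arrive already ordered by the time series.
    """
    n = len(values)
    if n < 2:
        return 0.0  # a single point has no trend
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


rising = trend_slope([1, 2, 3, 4])    # values in date order
reversed_ = trend_slope([4, 3, 2, 1])  # same values, opposite order
```

Because the function only sees positions, not dates, any reordering of the bag before it reaches the UDF would silently change the result.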

On Thu, Aug 13, 2009 at 12:13 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

> What do you want to feed to a UDF?  Usually it would be A1.  Technically
> there is no guarantee that the ordering would persist past the FOREACH, but
> in reality it will.  So
>
> C = FOREACH B {
>        A1 = ORDER A BY date ASC;
>        GENERATE id, UDF(A1);
> }
>
> It is guaranteed that A1 will be passed to the UDF ordered by date.
>
> C = FOREACH B {
>        A1 = ORDER A BY date ASC;
>        GENERATE id, A1.num;
> }
>
> The output of C will be first collected by id (by which I mean all
> instances of a given id will appear contiguously in the stream; there is no
> guarantee of ordering across ids), and within each collection the records
> will be ordered by date.
>
> Alan.

Re: Tuple ordering after a group-by

Posted by Alan Gates <ga...@yahoo-inc.com>.
What do you want to feed to a UDF?  Usually it would be A1.   
Technically there is no guarantee that the ordering would persist past  
the FOREACH, but in reality it will.  So

C = FOREACH B {
	A1 = ORDER A BY date ASC;
	GENERATE id, UDF(A1);
}

It is guaranteed that A1 will be passed to the UDF ordered by date.

C = FOREACH B {
	A1 = ORDER A BY date ASC;
	GENERATE id, A1.num;
}

The output of C will be first collected by id (by which I mean all
instances of a given id will appear contiguously in the stream; there
is no guarantee of ordering across ids), and within each collection the
records will be ordered by date.

Alan.
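
The behaviour described above can be modelled outside Pig with a short Python sketch. It is only an illustration of the semantics (group by id, then sort inside each group by date); the function name group_then_order and the sample rows are made up, and it says nothing about how Pig executes the plan.

```python
from collections import defaultdict
from operator import itemgetter


def group_then_order(rows):
    """Model GROUP A BY id followed by a nested ORDER A BY date ASC.

    rows are (date, id, num) tuples; the result maps each id to its
    num values sorted by date.
    """
    groups = defaultdict(list)
    for date, id_, num in rows:
        groups[id_].append((date, num))
    # Sort inside each group only; nothing here orders the ids themselves.
    return {
        id_: [num for _, num in sorted(items, key=itemgetter(0))]
        for id_, items in groups.items()
    }


# Hypothetical sample data, deliberately out of date order.
rows = [
    ("2009-08-03", "a", 30),
    ("2009-08-01", "b", 5),
    ("2009-08-01", "a", 10),
    ("2009-08-02", "b", 7),
    ("2009-08-02", "a", 20),
]
result = group_then_order(rows)
```

Each id ends up with its values in date order, which is the per-group ordering a UDF called inside the FOREACH would see.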

	