Posted to user@pig.apache.org by elein <el...@varlena.com> on 2010/07/28 23:45:38 UTC

UNION -- Ordered

I've got 
A = FOREACH ...
B = FOREACH ...
C = FOREACH ...
...

X = UNION A, B, C,...

Each of A, B, C, ... is a single tuple.  I want X ordered
by the order specified in the UNION.  The data in A, B, C, ... is not
necessarily in explicit sort order, so ORDER X BY field does not work.  I've tried breaking
the union into pairwise steps: unioning two pieces, then unioning that result with another piece, and so on.
That does not work either.

Does anyone have any ideas how to do this?


elein
elein@varlena.com





Re: UNION -- Ordered

Posted by elein <el...@varlena.com>.
Yes, thank you.  I was trying to avoid adding a sort column.


On Jul 28, 2010, at 6:05 PM, Thejas M Nair wrote:

> As you observed, union does not guarantee the ordering. You will need to project an additional column indicating the order you want, so that you can do an order-by on it.
> 
> -Thejas
> 
> 
> 
> On 7/28/10 2:45 PM, "elein" <el...@varlena.com> wrote:
> 
> 
> 
> I've got
> A = FOREACH ...
> B = FOREACH ...
> C = FOREACH ...
> ...
> 
> X = UNION A, B, C,...
> 
> Each of A, B, C, ... is a single tuple.  I want X ordered
> by the order specified in the UNION.  The data in A, B, C, ... is not
> necessarily in explicit sort order, so ORDER X BY field does not work.  I've tried breaking
> the union into pairwise steps: unioning two pieces, then unioning that result with another piece, and so on.
> That does not work either.
> 
> Does anyone have any ideas how to do this?
> 
> 
> elein
> elein@varlena.com
> 
> 
> 
> 
> 
> 

elein
elein@varlena.com





Re: parallelism level

Posted by Thejas M Nair <te...@yahoo-inc.com>.
Please see http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html .

You can use 'set default_parallel 10' to ask the query to use 10 reducers for
all MR jobs, or specify 'parallel x' in the Pig statement to ask Pig to use
x reducers for that operation (for operations like group, order-by,
and join that usually result in a separate MR job).
-Thejas
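
For illustration, the two settings described above might be combined as in the sketch below. The relation names, field names, and input/output paths are hypothetical and do not come from the original question; only 'set default_parallel' and the PARALLEL clause are taken from the reply.

```pig
-- Script-wide default: reduce-side jobs use 10 reducers unless a statement overrides it.
set default_parallel 10;

-- Hypothetical input; path and schema are assumptions for this sketch.
logs = LOAD 'logs' AS (user:chararray, bytes:long);

-- No PARALLEL clause here, so the script-wide default of 10 applies.
grouped = GROUP logs BY user;
totals  = FOREACH grouped GENERATE group AS user, SUM(logs.bytes) AS total;

-- Per-statement override: this ORDER asks for 4 reducers instead of 10.
sorted = ORDER totals BY total DESC PARALLEL 4;

STORE sorted INTO 'totals_by_user';
```

PARALLEL only affects operators that trigger a reduce phase; map-only steps such as the FOREACH above are not affected by either setting.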



On 7/28/10 7:00 PM, "Gang Luo" <lg...@yahoo.com.cn> wrote:

> Hi all,
> by default the parallelism (number of reducers) of a pig query is 1. How do I
> change this value? If I set the value to 10, does that mean all the MR jobs
> for this query will run with 10 reducers?
> 
> 
> Thanks,
> -Gang
> 
> 
>      
> 



parallelism level

Posted by Gang Luo <lg...@yahoo.com.cn>.
Hi all,
by default the parallelism (number of reducers) of a pig query is 1. How do I change
this value? If I set the value to 10, does that mean all the MR jobs for this
query will run with 10 reducers?


Thanks,
-Gang


      

Re: UNION -- Ordered

Posted by Thejas M Nair <te...@yahoo-inc.com>.
As you observed, union does not guarantee the ordering. You will need to project an additional column indicating the order you want, so that you can do an order-by on it.

-Thejas
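
A minimal sketch of the extra-column approach described above, assuming each branch comes from a FOREACH over some relation. The relation 'src', its path and schema, and the column name 'ord' are hypothetical, since the original FOREACH statements are elided.

```pig
-- Hypothetical source relation; path and schema are assumptions for this sketch.
src = LOAD 'input' AS (f1:chararray, f2:chararray, f3:chararray);

-- Tag each branch with a constant giving its intended position in the result.
A = FOREACH src GENERATE 1 AS ord, f1 AS val;
B = FOREACH src GENERATE 2 AS ord, f2 AS val;
C = FOREACH src GENERATE 3 AS ord, f3 AS val;

-- UNION itself gives no ordering guarantee, so sort on the tag afterwards.
X = UNION A, B, C;
X_sorted = ORDER X BY ord;

DUMP X_sorted;
```

The sketch keeps 'ord' in the output on purpose: Pig only promises the sorted order when the ORDER result is dumped or stored directly, so projecting the column away in a later FOREACH may lose the ordering.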



On 7/28/10 2:45 PM, "elein" <el...@varlena.com> wrote:



I've got
A = FOREACH ...
B = FOREACH ...
C = FOREACH ...
...

X = UNION A, B, C,...

Each of A, B, C, ... is a single tuple.  I want X ordered
by the order specified in the UNION.  The data in A, B, C, ... is not
necessarily in explicit sort order, so ORDER X BY field does not work.  I've tried breaking
the union into pairwise steps: unioning two pieces, then unioning that result with another piece, and so on.
That does not work either.

Does anyone have any ideas how to do this?


elein
elein@varlena.com