Posted to user@pig.apache.org by Adam Silberstein <ad...@trifacta.com> on 2014/06/19 20:56:43 UTC

order guarantees in bags

Hey All,
I have a question about Pig’s guarantees around the order of tuples in bags.  I am trying to decide how paranoid to be about this.

Documentation says that bags are unordered.  But, in practice, I have never seen Pig re-order the tuples in a default data bag and nothing about the current implementation suggests they can get out of order.

Also, if you look at PigAvroStorage or JsonStorage (at least the elephant-bird version), both read in arrays as bags.  Does that mean they implicitly don’t care about maintaining order in arrays?  Or are they counting on the current implementation to keep them in order?

Thanks for any insights on this!
Adam



Re: order guarantees in bags

Posted by Cheolsoo Park <pi...@gmail.com>.
Hi Adam,

Pig doesn't reorder arrays when loading them. But after a group-by, the
order of tuples within each bag on the reducer side is not deterministic.
So, for example, if you do "limit n" in a nested foreach after group-by
without sorting first, you can get different results on every run.
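
A minimal sketch of the difference, assuming a hypothetical tab-separated
input 'visits.tsv' with fields (user, ts, url); the relation and field names
are illustrative only. The first nested foreach takes whatever three tuples
happen to come first in the bag; the second sorts the bag before applying the
limit, so the result no longer depends on arrival order.

```pig
-- Hypothetical input; the file name and schema are assumptions for illustration.
visits  = LOAD 'visits.tsv' AS (user:chararray, ts:long, url:chararray);
grouped = GROUP visits BY user;

-- Order-dependent: the three tuples kept depend on the order in which
-- tuples reached the reducer, which can differ from run to run.
first_any = FOREACH grouped {
    some = LIMIT visits 3;
    GENERATE group AS user, some;
};

-- Order-independent: sort inside the nested block, then limit.
-- The sort keys should identify a unique order; ties on (ts, url)
-- could still be returned in any order.
first_sorted = FOREACH grouped {
    sorted = ORDER visits BY ts ASC, url ASC;
    top    = LIMIT sorted 3;
    GENERATE group AS user, top;
};
```

If the original position of an element in a loaded array matters downstream,
one option is to carry an explicit index field in the data and sort on it
rather than rely on bag order.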

Thanks,
Cheolsoo


