Posted to user@pig.apache.org by Adam Silberstein <ad...@trifacta.com> on 2014/06/19 20:56:43 UTC
order guarantees in bags
Hey All,
I have a question about Pig’s guarantees around the order of tuples in bags. I am trying to decide how paranoid to be about this.
The documentation says that bags are unordered. In practice, though, I have never seen Pig re-order the tuples in a default data bag, and nothing about the current implementation suggests they can get out of order.
Also, if you look at PigAvroStorage or JsonStorage (at least the elephant-bird version), both read in arrays as bags. Does that mean they implicitly don’t care about maintaining order in arrays? Or are they counting on the current implementation to keep them in order?
Thanks for any insights on this!
Adam
Re: order guarantees in bags
Posted by Cheolsoo Park <pi...@gmail.com>.
Hi Adam,
Pig doesn't reorder arrays when loading them. But after a group-by, the
order of tuples within each bag on the reducer side is not deterministic.
So, for example, if you do "limit n" in a nested foreach after a group-by,
you can get different results on every run.
Thanks,
Cheolsoo
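
For readers who hit the "limit n" case described above, one common way to make the result repeatable is to sort the bag explicitly inside the nested foreach before limiting it. The following is only a minimal sketch, not something taken from the thread: the input path 'events.tsv', the relation names, and the fields (user, ts, url) are hypothetical, and it assumes ts is unique within each user, since ties on the sort key could still come back in any order.

```pig
-- Hypothetical input: one tab-separated record per event (user, ts, url).
events = LOAD 'events.tsv' AS (user:chararray, ts:long, url:chararray);

by_user = GROUP events BY user;

-- Without the ORDER, LIMIT would pick whichever tuples happened to arrive
-- first on the reducer. Sorting first pins down which tuples are kept.
first_n = FOREACH by_user {
    sorted = ORDER events BY ts ASC;
    top    = LIMIT sorted 3;
    GENERATE group AS user, top;
};

DUMP first_n;
```

The design choice here is to treat every bag as unordered, as the documentation states, and to impose an order only at the point where the result depends on it.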