Posted to user@pig.apache.org by Kevin Burton <bu...@spinn3r.com> on 2011/08/22 07:29:35 UTC

"Table is not sorted" when using Zebra and USING merge with DISTINCT and GROUP

Both DISTINCT and GROUP cause the result to be ordered.

Why does using a merge cause this to fail?

Specifically, Zebra then thinks the results aren't sorted, when they are.

I think the problem is that Zebra normally writes the sort info into the
table schema on disk, but with DISTINCT and GROUP it isn't written.

I'll have to see what other operations will result in a sorted table and
then implement support for them as well.

I have a DISTINCT operation whose output could then be merge joined to
another table which is already sorted.

Both are rather large files, around 500GB, so avoiding a re-sort would be a
good thing :)

Kevin

-- 

Founder/CEO Spinn3r.com

Location: *San Francisco, CA*
Skype: *burtonator*

Skype-in: *(415) 871-0687*

Re: "Table is not sorted" when using Zebra and USING merge with DISTINCT and GROUP

Posted by Alan Gates <ga...@hortonworks.com>.
Neither group nor distinct produces a total sort order in the output, so Zebra is correct not to record the results as sorted. Given our current implementation of group and distinct, results are sorted within each part file, but not across part files.

Alan.
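
The distinction between per-part-file order and total order can be illustrated with a small simulation. This is only a sketch in Python, not Pig or Zebra code: it assumes integer keys and a simple modulo partitioner standing in for hash partitioning across reducers, and the names partition_and_sort and is_sorted are made up for the example.

```python
def partition_and_sort(keys, num_reducers):
    """Simulate DISTINCT: dedupe keys, hash-partition them, sort each partition."""
    parts = [[] for _ in range(num_reducers)]
    for k in set(keys):                   # DISTINCT keeps each key once
        parts[k % num_reducers].append(k) # assumed modulo "hash" partitioner
    return [sorted(p) for p in parts]     # each reducer emits its keys in order


def is_sorted(seq):
    """True if seq is in non-decreasing order."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


keys = [7, 3, 8, 1, 3, 6, 2, 7, 5, 4]
part_files = partition_and_sort(keys, 3)

# Reading the part files back to back, as a consumer of the table would.
concatenated = [k for part in part_files for k in part]

print(part_files)                             # every part file is sorted
print(all(is_sorted(p) for p in part_files))  # True
print(is_sorted(concatenated))                # False: no total order
```

Each simulated part file is sorted on its own, but their concatenation is not, which is why a merge join that expects one totally ordered input cannot treat such output as a sorted table.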
