
Posted to user@hive.apache.org by Cristian Giha <Cr...@equifax.com> on 2014/08/04 15:14:08 UTC

Hive Sort By Table optimization

Hi,

I am trying to understand how to use the SORTED BY clause effectively on a clustered (bucketed) table. For example, when I create a table, I use the clause

CREATE EXTERNAL TABLE TEST (...)
CLUSTERED BY (id) INTO N BUCKETS
...

I then populate it from another table with SELECT id, ... FROM another_table CLUSTER BY (id). That works fine, but I don't understand what improvement I get when I change the clause to

CLUSTERED BY (id) SORTED BY (id) INTO N BUCKETS

Or maybe sort by a second column as well. Is it only useful for join queries, or does it enable other optimizations too?
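For reference, a fuller version of the setup in question might look like the sketch below. It assumes a Hive 0.13-era installation, where the Hive DDL docs note that CLUSTERED BY / SORTED BY only describe the layout and do not by themselves change how INSERT writes data. The table names test_sorted and other_sorted, the column value/other_value, the bucket count 32 and the LOCATION path are all hypothetical. The join settings shown are the ones Hive documents for sort-merge bucket (SMB) joins, which need both tables bucketed and sorted on the join key with the same number of buckets; exact requirements may vary by version.

```sql
-- Hypothetical example; names, bucket count and path are placeholders.
-- Ask Hive (0.x/1.x) to honour the table's bucket and sort spec on INSERT,
-- instead of relying on CLUSTER BY plus a matching reducer count.
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

CREATE EXTERNAL TABLE test_sorted (
  id    BIGINT,
  value STRING
)
CLUSTERED BY (id) SORTED BY (id ASC) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/tmp/test_sorted';  -- hypothetical path

INSERT OVERWRITE TABLE test_sorted
SELECT id, value FROM another_table;

-- If other_sorted is also CLUSTERED BY (id) SORTED BY (id) INTO 32 BUCKETS,
-- these settings let Hive merge matching sorted buckets directly (SMB join)
-- rather than shuffling and sorting both tables at join time.
SET hive.auto.convert.sortmerge.join=true;
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;

SELECT a.id, a.value, b.other_value
FROM test_sorted a
JOIN other_sorted b ON a.id = b.id;
```

Since CLUSTER BY (id) is DISTRIBUTE BY plus SORT BY on id, the original INSERT already sorts rows within each reducer; declaring SORTED BY is what tells the planner it may rely on that order.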

Regards.

Cristian Giha Sepúlveda