Posted to user@hive.apache.org by Cristian Giha <Cr...@equifax.com> on 2014/08/04 15:14:08 UTC
Hive Sort By Table optimization
Hi,
I am trying to understand how to use the SORTED BY clause effectively on a clustered (bucketed) table. For example, when I create a table, I use the clause:
Create external table TEST(...)
Clustered By (id) Into "N" buckets
...
I then load data from another table with a query like Select id, ... from another_table cluster by (id). That works fine, but I don't understand what improvement I get when I add a SORTED BY clause:
Clustered By (id) Sorted by(ID) Into "N" buckets
Or perhaps sort by a second column instead. Is this only useful for join queries, or does it enable other optimizations?
Regards.
Cristian Giha Sepúlveda