Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2013/03/07 17:56:14 UTC

[jira] [Created] (HIVE-4137) optimize group by followed by joins for bucketed/sorted tables

Namit Jain created HIVE-4137:
--------------------------------

             Summary: optimize group by followed by joins for bucketed/sorted tables
                 Key: HIVE-4137
                 URL: https://issues.apache.org/jira/browse/HIVE-4137
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Namit Jain


Consider the following scenario:

create table T1 (...) clustered by (key) sorted by (key) into 2 buckets;
create table T2 (...) clustered by (key) sorted by (key) into 2 buckets;
create table T3 (...) clustered by (key) sorted by (key) into 2 buckets;

SET hive.enforce.sorting=true;
SET hive.enforce.bucketing=true;


insert overwrite table T3
select ..
from 
(select key, aggr() from T1 group by key) s1
full outer join
(select key, aggr() from T2 group by key) s2
on s1.key=s2.key;

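Ideally, this query can be performed in a single map-only job:
Group By -> SortMerge Join.
Since T1 and T2 are both bucketed and sorted on key, each group by can run on the map side and its output stays sorted on key, so the join can be a sort-merge join in the same mapper.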
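
To illustrate why no shuffle should be needed, here is a minimal Python sketch of the work a single mapper could do on one pair of matching buckets. It is not Hive code and not the proposed implementation. The function names, the (key, value) row shape, and the use of sum as aggr() are all assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter


def stream_group_by(rows, aggr=sum):
    """Aggregate (key, value) rows that are already sorted by key.

    Equal keys are adjacent in a sorted bucket, so each group can be
    emitted as soon as the key changes; no shuffle or hash table is needed.
    aggr=sum is an assumption standing in for the elided aggr().
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, aggr(value for _, value in group)


def sort_merge_full_outer_join(left, right):
    """Full outer join of two (key, value) streams sorted by unique key.

    Emits (key, left_value, right_value), with None for a missing side.
    """
    left, right = iter(left), iter(right)
    l, r = next(left, None), next(right, None)
    while l is not None or r is not None:
        if r is None or (l is not None and l[0] < r[0]):
            yield l[0], l[1], None   # key present only on the left
            l = next(left, None)
        elif l is None or r[0] < l[0]:
            yield r[0], None, r[1]   # key present only on the right
            r = next(right, None)
        else:
            yield l[0], l[1], r[1]   # key present on both sides
            l, r = next(left, None), next(right, None)


def process_bucket(t1_bucket, t2_bucket):
    """Hypothetical per-mapper work: group both buckets, then join them."""
    s1 = stream_group_by(t1_bucket)
    s2 = stream_group_by(t2_bucket)
    return list(sort_merge_full_outer_join(s1, s2))
```

Both steps consume their input in key order and produce output in key order, so they can be chained in one pass over bucket i of T1 and bucket i of T2. For example, process_bucket([(1, 10), (1, 5), (3, 7)], [(1, 2), (2, 4), (2, 6)]) yields rows for keys 1, 2 and 3 in sorted order, with None filling the side that has no match.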


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira