Posted to dev@hive.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2009/01/27 04:12:59 UTC

[jira] Commented: (HIVE-252) Automatically add CLUSTER BY and set the number of reducers if the target table is declared with "CLUSTERED BY (xxx) INTO yyy BUCKETS"

    [ https://issues.apache.org/jira/browse/HIVE-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667571#action_12667571 ] 

Joydeep Sen Sarma commented on HIVE-252:
----------------------------------------

If we are going down this path, it should be generalized: an insert into any clustered table would have to check, prior to the insertion, that the data is clustered by the right key.
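A minimal Python sketch of what "clustered by the right key" means for a single BIGINT clustering column. It assumes each bucket file `i` should hold exactly the rows whose key maps to bucket `i`, and it assumes the bucket is computed as `(hashCode & Integer.MAX_VALUE) % numBuckets`, with the key's hash computed like `java.lang.Long.hashCode()`. That rule is modeled here, not taken from the Hive source, and the function names are hypothetical.

```python
# Hypothetical sketch: find rows that landed in the wrong bucket file of a
# table declared CLUSTERED BY(<bigint col>) INTO n BUCKETS.
# ASSUMPTION: bucket = (hashCode & Integer.MAX_VALUE) % n, where hashCode is
# computed like java.lang.Long.hashCode(); this models the bucketing rule and
# is not copied from Hive.
from typing import List, Sequence, Tuple


def java_long_hash(v: int) -> int:
    """Mimic java.lang.Long.hashCode(): (int)(v ^ (v >>> 32))."""
    u = v & 0xFFFFFFFFFFFFFFFF           # view v as an unsigned 64-bit value
    h = (u ^ (u >> 32)) & 0xFFFFFFFF     # XOR high word into low word, keep 32 bits
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed 32-bit int


def bucket_for(key: int, num_buckets: int) -> int:
    """Bucket index that a row with this key should be written to."""
    return (java_long_hash(key) & 0x7FFFFFFF) % num_buckets


def misplaced_rows(buckets: Sequence[Sequence[int]],
                   num_buckets: int) -> List[Tuple[int, int]]:
    """buckets[i] holds the keys found in bucket file i.
    Returns (file_index, key) for every key that belongs in another bucket."""
    if len(buckets) != num_buckets:
        raise ValueError(f"expected {num_buckets} bucket files, got {len(buckets)}")
    return [(i, k) for i, keys in enumerate(buckets) for k in keys
            if bucket_for(k, num_buckets) != i]
```

Scanning written files like this is only an illustration of the invariant. Since the check is meant to happen prior to the insertion, it would more plausibly be made against the query plan, for example by confirming that the final reduce is distributed by the clustering key with one reducer per bucket.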

The query rewrite may not be obvious in every case where such a violation is detected. It would be a more consistent user experience to 'suggest' the right query where possible.
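A rough Python sketch of "suggest, don't silently rewrite". The helper `suggest_query` is hypothetical. It assumes the simple case from the issue: the statement has no existing CLUSTER/DISTRIBUTE/SORT/ORDER BY clause, the table's clustering column names are visible in the SELECT output, and the reducer count is set through the Hadoop property `mapred.reduce.tasks`. When a distribution or ordering clause is already present, it gives up and returns `None`, because the right rewrite is then not obvious.

```python
# Hypothetical helper: propose (rather than apply) a rewrite of an INSERT into
# a bucketed table so its output is clustered the way the table is declared.
# ASSUMPTIONS: the clustering columns are visible in the SELECT output, and the
# reducer count is controlled by the Hadoop property mapred.reduce.tasks.
import re
from typing import Optional, Sequence

# Crude textual check; it would also match these words inside string literals.
_DIST_CLAUSE = re.compile(r"\b(CLUSTER|DISTRIBUTE|SORT|ORDER)\s+BY\b", re.IGNORECASE)


def suggest_query(insert_sql: str, table_cluster_by: Sequence[str],
                  table_buckets: int) -> Optional[str]:
    """Return a suggested rewrite of insert_sql, or None when the statement
    already has a distribution/ordering clause and no obvious fix exists."""
    if _DIST_CLAUSE.search(insert_sql):
        return None  # leave non-obvious cases to the user
    body = insert_sql.rstrip().rstrip(";")
    return (f"SET mapred.reduce.tasks={table_buckets};\n"
            f"{body}\nCLUSTER BY {', '.join(table_cluster_by)};")
```

Returning a suggestion instead of rewriting keeps the user in control for queries the heuristic cannot safely handle.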

There's also a flip side to this: setting the clustering property on the inserted table/partition when the query is constructed in this manner (which is what Adam was asking for a couple of days back).
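The flip side can be sketched in Python as inferring bucketing metadata from the query that produced the data. `BucketSpec` and `infer_bucket_spec` are hypothetical names. The sketch assumes each reducer writes exactly one output file, and that reducer partitioning uses the same hash as the table's bucketing. Under those assumptions, an insert run with `CLUSTER BY <cols>` and N reducers yields data that is bucketed by `<cols>` into N buckets.

```python
# Hypothetical sketch of the "flip side": derive the clustering property to
# record on a table/partition from the query that wrote it.
# ASSUMPTIONS: one output file per reducer, and the reducer partitioning hash
# matches the table's bucketing hash.
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class BucketSpec:
    cluster_by: Tuple[str, ...]
    num_buckets: int

    def to_ddl(self) -> str:
        """Render in the CLUSTERED BY ... INTO ... BUCKETS form used in CREATE TABLE."""
        return f"CLUSTERED BY({', '.join(self.cluster_by)}) INTO {self.num_buckets} BUCKETS"


def infer_bucket_spec(query_cluster_by: Sequence[str],
                      num_reducers: int) -> Optional[BucketSpec]:
    """Return the bucketing implied by the query, or None if it implies none."""
    if not query_cluster_by or num_reducers < 1:
        return None  # no clustering key or no reducers: nothing to record
    return BucketSpec(tuple(query_cluster_by), num_reducers)
```

For example, `infer_bucket_spec(["a"], 64).to_ddl()` gives `CLUSTERED BY(a) INTO 64 BUCKETS`, matching the declaration of `aaa` in the issue below.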

> Automatically add CLUSTER BY and set the number of reducers if the target table is declared with "CLUSTERED BY (xxx) INTO yyy BUCKETS"
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-252
>                 URL: https://issues.apache.org/jira/browse/HIVE-252
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Zheng Shao
>
> We should automatically add a "CLUSTER BY" clause to the following query and run it with 64 reducers, one per bucket.
> CREATE TABLE aaa (a BIGINT, b INT)
> PARTITIONED BY(ds STRING)
> CLUSTERED BY(a) INTO 64 BUCKETS 
> STORED AS SEQUENCEFILE;
> INSERT OVERWRITE TABLE aaa PARTITION(ds='2009-01-24')
> SELECT a.a, a.b
> FROM training_set a
> WHERE a.ds = '2009-01-24';

-- 
This message is automatically generated by JIRA.