Posted to issues@hive.apache.org by "Deepak Jaiswal (JIRA)" <ji...@apache.org> on 2017/12/07 20:12:00 UTC

[jira] [Resolved] (HIVE-17923) 'cluster by' should not be needed for a bucketed table

     [ https://issues.apache.org/jira/browse/HIVE-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deepak Jaiswal resolved HIVE-17923.
-----------------------------------
    Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/HIVE-18157

> 'cluster by' should not be needed for a bucketed table
> ------------------------------------------------------
>
>                 Key: HIVE-17923
>                 URL: https://issues.apache.org/jira/browse/HIVE-17923
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Eugene Koifman
>            Assignee: Deepak Jaiswal
>            Priority: Blocker
>
> Given
> {noformat}
> CREATE TABLE over10k_orc_bucketed(t tinyint,
>            si smallint,
>            i int,
>            b bigint,
>            f float,
>            d double,
>            bo boolean,
>            s string,
>            ts timestamp,
>            `dec` decimal(4,2),
>            bin binary) CLUSTERED BY(si) INTO 4 BUCKETS STORED AS ORC;
> {noformat}
> {noformat}
> insert into over10k_orc_bucketed select * from over10k
> {noformat}
> produces only 1 data file (bucket 0). It should produce 4, based on the input data.
> {noformat}
> insert into over10k_orc_bucketed select * from over10k cluster by si
> {noformat}
> does the right thing.
> The full script is in acid_vectorization_original.q (HIVE-17458).
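>
> One way to check how many data files an insert wrote is to group rows by Hive's INPUT__FILE__NAME virtual column. The query below is only an illustrative sketch and is not part of the original report or of acid_vectorization_original.q; it assumes the table from the CREATE TABLE statement in this description.
> {noformat}
> -- Count rows per underlying data file; each result row corresponds to one file.
> SELECT INPUT__FILE__NAME, count(*) AS row_count
> FROM over10k_orc_bucketed
> GROUP BY INPUT__FILE__NAME;
> {noformat}
> If the rows were bucketed as declared, this should list up to 4 files, one per non-empty bucket, rather than a single file for bucket 0.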



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)