Posted to issues@hive.apache.org by "Eugene Koifman (JIRA)" <ji...@apache.org> on 2015/09/03 21:13:45 UTC

[jira] [Updated] (HIVE-11719) acid insert with dynamic partitioning doesn't create empty buckets

     [ https://issues.apache.org/jira/browse/HIVE-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-11719:
----------------------------------
    Description: 
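{code:sql}
-- Assumed session setup: ACID must be enabled (DbTxnManager), and an insert
-- whose partition columns are all dynamic needs nonstrict mode.
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE T(a INT, b STRING)
      PARTITIONED BY(ds string)
      CLUSTERED BY(a) INTO 2 BUCKETS
      STORED AS ORC TBLPROPERTIES ('transactional'='true');

insert into T partition (ds) values (1, 'fred', 'today'), (2, 'wilma', 'yesterday');
{code}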

See TestCompactor.dynamicPartitioningUpdate()

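This currently creates only 1 bucket file in each partition instead of 2: each partition receives rows for only one of its buckets, and no empty file is written for the other. This may break bucket-based joins on MR, since they expect every partition to have a full complement of bucket files.
This should not be an issue on Tez.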
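A minimal Java sketch of why each partition ends up with a single file. It assumes the bucket id for a single INT clustering column is (hashCode & Integer.MAX_VALUE) % numBuckets, with the int's hash code being its value; this is modeled on Hive's default bucketing, not copied from FileSinkOperator. BucketAssignmentSketch, bucketFor and bucketsWritten are hypothetical names:

{code:java}
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class BucketAssignmentSketch {

    // Assumption: mirrors default bucketing for one INT column,
    // where the column's hash code is the int value itself.
    static int bucketFor(int a, int numBuckets) {
        return (Integer.hashCode(a) & Integer.MAX_VALUE) % numBuckets;
    }

    // For each partition value, the set of bucket ids that receive at least one row.
    static Map<String, Set<Integer>> bucketsWritten(int[] keys, String[] partitions, int numBuckets) {
        Map<String, Set<Integer>> result = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            result.computeIfAbsent(partitions[i], p -> new TreeSet<>())
                  .add(bucketFor(keys[i], numBuckets));
        }
        return result;
    }

    public static void main(String[] args) {
        // Rows from the repro: (1, 'fred', 'today'), (2, 'wilma', 'yesterday')
        int[] a = {1, 2};
        String[] ds = {"today", "yesterday"};
        for (Map.Entry<String, Set<Integer>> e : bucketsWritten(a, ds, 2).entrySet()) {
            System.out.println("ds=" + e.getKey() + " buckets written: " + e.getValue() + " of 2");
        }
    }
}
{code}

Under these assumptions main prints "ds=today buckets written: [1] of 2" and "ds=yesterday buckets written: [0] of 2"; the missing id in each partition is the bucket file that would have to be created empty.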

See FileSinkOperator.createBucketForFileIdx()

Also, double-check that compaction properly handles empty buckets, i.e., that the resulting delta/base directories have the full complement of bucket files.
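For the compaction check, a minimal Java sketch of a full-complement test over the file names in one delta/base directory. It assumes bucket files are named bucket_ plus a zero-padded 5-digit id (e.g. bucket_00001); BucketCompletenessCheck and missingBuckets are hypothetical names, and the listing in main is made up for illustration:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BucketCompletenessCheck {

    // Assumed naming: "bucket_" followed by a 5-digit, zero-padded bucket id.
    private static final Pattern BUCKET_FILE = Pattern.compile("bucket_(\\d{5})");

    // Returns the bucket ids in [0, numBuckets) that have no matching file.
    // Names that don't match the pattern exactly (e.g. side files) are ignored.
    static List<Integer> missingBuckets(List<String> fileNames, int numBuckets) {
        Set<Integer> present = new HashSet<>();
        for (String name : fileNames) {
            Matcher m = BUCKET_FILE.matcher(name);
            if (m.matches()) {
                present.add(Integer.parseInt(m.group(1)));
            }
        }
        List<Integer> missing = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++) {
            if (!present.contains(b)) {
                missing.add(b);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical listing of one delta directory in partition ds=today.
        List<String> delta = Arrays.asList("bucket_00001");
        System.out.println("missing buckets: " + missingBuckets(delta, 2));
    }
}
{code}

With a listing containing only bucket_00001 and 2 buckets, this reports bucket 0 as missing; an empty result would mean the directory has the full complement.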

  was:
{code:sql}
CREATE TABLE T(a INT, b STRING)
      PARTITIONED BY(ds string)
      CLUSTERED BY(a) INTO 2 BUCKETS
      STORED AS ORC TBLPROPERTIES ('transactional'='true')

insert into T partition (ds) values (1, 'fred', 'today'), (2, 'wilma', 'yesterday')
{code}

See TestCompactor.dynamicPartitioningUpdate()

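This currently creates only 1 bucket file in each partition instead of 2: each partition receives rows for only one of its buckets, and no empty file is written for the other. This may break bucket-based joins on MR, since they expect every partition to have a full complement of bucket files.
This should not be an issue on Tez.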

See FileSinkOperator.createBucketForFileIdx()


> acid insert with dynamic partitioning doesn't create empty buckets
> ------------------------------------------------------------------
>
>                 Key: HIVE-11719
>                 URL: https://issues.apache.org/jira/browse/HIVE-11719
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)