Posted to issues@hive.apache.org by "Abhishek Somani (JIRA)" <ji...@apache.org> on 2016/08/30 05:54:20 UTC
[jira] [Comment Edited] (HIVE-14633) #.of Files in a partition ! = #.Of buckets in a partitioned,bucketed table

    [ https://issues.apache.org/jira/browse/HIVE-14633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448135#comment-15448135 ] 

Abhishek Somani edited comment on HIVE-14633 at 8/30/16 5:54 AM:
-----------------------------------------------------------------

Isn't this expected? INSERT INTO just creates the copy files you see, each carrying the same bucket ID as the original file it sits next to (000000_0 and 000000_0_copy_1, for example). This should not affect any functionality; Hive handles these copies correctly. Others can confirm.

Do you see any functionality broken by this?
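To illustrate the point about copy files, here is a minimal Python sketch. It assumes, based only on the listings quoted in the description below (not on Hive source), that the bucket ID is the zero-padded numeric prefix of each file name, followed by a second numeric field (left uninterpreted here) and an optional `_copy_<n>` suffix. Grouping the second-insert files by that prefix gives 8 files but still only 4 bucket IDs. The helper name `group_by_bucket` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming pattern, inferred from the listings in this issue:
# <zero-padded bucket id>_<number>[_copy_<n>]
BUCKET_FILE = re.compile(r"^(\d+)_\d+(?:_copy_(\d+))?$")


def group_by_bucket(file_names):
    """Map each bucket ID to the files that hold rows for that bucket."""
    buckets = defaultdict(list)
    for name in file_names:
        m = BUCKET_FILE.match(name)
        if m is None:
            continue  # skip anything that does not look like a bucket file
        buckets[int(m.group(1))].append(name)
    return dict(buckets)


# File names from the "2nd Insert" listing in the description.
second_insert = [
    "000000_0", "000000_0_copy_1",
    "000001_0", "000001_0_copy_1",
    "000002_0", "000002_0_copy_1",
    "000003_0", "000003_0_copy_1",
]

groups = group_by_bucket(second_insert)
print(len(second_insert), "files")  # 8 files
print(len(groups), "buckets")       # 4 buckets
```

Under this assumption, the extra files do not add buckets; each copy is just a second file for an existing bucket.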


was (Author: asomani):
I think this is expected. Insert into will just create those copy files you see, with the same bucket id as seen above. This is not expected to affect any functionality and hive takes care of those copies correctly. Others can confirm.

Do you see any functionality broken by this?

> #.of Files in a partition ! = #.Of buckets in a partitioned,bucketed table
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14633
>                 URL: https://issues.apache.org/jira/browse/HIVE-14633
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>         Environment: HDP 2.3.2
>            Reporter: Hanu
>
> Ideally, the number of files should equal the number of buckets declared in the table DDL. This holds after the initial insert and after every INSERT OVERWRITE. However, INSERT INTO a bucketed Hive table creates extra files.
> Example:
> Number of buckets = 4
> Number of files after initial insert --> 4
> Number of files after 2nd insert --> 8
> Number of files after 3rd insert --> 12
> Number of files after nth insert --> n * (number of buckets)
> Listing after first insert:
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000000_0
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000001_0
> -rwxrwxrwx   3 hvallur hdfs        308 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000002_0
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000003_0
> Listing after 2nd insert:
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000000_0
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:47 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000000_0_copy_1
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000001_0
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:47 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000001_0_copy_1
> -rwxrwxrwx   3 hvallur hdfs        308 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000002_0
> -rwxrwxrwx   3 hvallur hdfs        302 2016-08-25 12:47 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000002_0_copy_1
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000003_0
> -rwxrwxrwx   3 hvallur hdfs         49 2016-08-25 12:47 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000003_0_copy_1
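The growth pattern in the description (n * number of buckets files after n inserts) can be reproduced with a small Python simulation. It is a sketch under a labeled assumption: the first insert writes `<bucket>_0`, and each later INSERT INTO adds `<bucket>_0_copy_<k>` with k incrementing per insert. The listing above only confirms `_copy_1`, so the numbering beyond that is hypothetical, as are the helper names `file_name` and `files_after`.

```python
NUM_BUCKETS = 4  # bucket count from the example in the description


def file_name(bucket, insert_no):
    """Hypothetical naming: insert 1 writes <bucket>_0; insert k > 1 adds
    <bucket>_0_copy_<k - 1> next to the existing files instead of replacing them."""
    base = f"{bucket:06d}_0"
    return base if insert_no == 1 else f"{base}_copy_{insert_no - 1}"


def files_after(n_inserts, num_buckets=NUM_BUCKETS):
    """All file names in the partition after n INSERT INTO statements."""
    return [
        file_name(b, i)
        for i in range(1, n_inserts + 1)
        for b in range(num_buckets)
    ]


for n in (1, 2, 3):
    names = files_after(n)
    # The bucket ID is the numeric prefix before the first underscore.
    bucket_ids = {name.split("_")[0] for name in names}
    print(n, len(names), len(bucket_ids))
# 1 4 4
# 2 8 4
# 3 12 4
```

The file count grows linearly with the number of inserts while the set of bucket IDs stays at 4, which matches both the reported listing and the explanation in the comment.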



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)