Posted to issues@hive.apache.org by "Pau Tallada Crespí (JIRA)" <ji...@apache.org> on 2018/06/28 13:31:00 UTC

[jira] [Commented] (HIVE-12895) Bucket files not renamed with multiple insert overwrite table statements

    [ https://issues.apache.org/jira/browse/HIVE-12895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526304#comment-16526304 ] 

Pau Tallada Crespí commented on HIVE-12895:
-------------------------------------------

Hi,

Any progress on this?

We just hit this bug with a single INSERT OVERWRITE into a dynamically partitioned, bucketed table:

Tbl: PARTITIONED BY (col1) CLUSTERED BY (col2) INTO 2048 BUCKETS

INSERT OVERWRITE TABLE Tbl PARTITION (col1)
SELECT a, b, c, col2, col1
FROM other_table
JOIN another_table
ON condition
WHERE criteria
DISTRIBUTE BY col1;
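
For anyone trying to reproduce this, the shorthand table definition above could be expanded roughly as follows. This is only a sketch: the column types are hypothetical (the comment does not state them), and the two SET lines are the standard Hive settings that a fully dynamic partition insert normally needs, assumed here rather than taken from the report.

```sql
-- Hypothetical column types; only the column names, the partition column,
-- the bucketing column and the bucket count come from the comment above.
CREATE TABLE Tbl (
  a    STRING,
  b    STRING,
  c    STRING,
  col2 INT
)
PARTITIONED BY (col1 STRING)
CLUSTERED BY (col2) INTO 2048 BUCKETS;

-- Assumed session settings: allow the partition column (col1) to be
-- resolved entirely from the SELECT output.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
```

With the table created this way, the INSERT OVERWRITE statement above can be run unchanged once other_table, another_table, the join condition and the filter are filled in.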

> Bucket files not renamed with multiple insert overwrite table statements
> ------------------------------------------------------------------------
>
>                 Key: HIVE-12895
>                 URL: https://issues.apache.org/jira/browse/HIVE-12895
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Charles Pritchard
>            Priority: Major
>
> With two tables that have different cluster by columns, using multiple INSERT OVERWRITE TABLE syntax results in the output files of one of the tables being named "_bucket_number_0" which should have clearly been renamed to the usual "00000_0" style. The temporary filename is not picked up for later selects, making this a more urgent issue.
> This is with:
> Tbl1: CLUSTERED BY (col1) SORTED BY(col1) INTO 1 BUCKETS;
> Tbl2: CLUSTERED BY (col2) SORTED BY(col2) INTO 1 BUCKETS;
> FROM statement
> INSERT OVERWRITE TABLE tbl1 select...
> INSERT OVERWRITE TABLE tbl2 select...;



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)