Posted to issues@hive.apache.org by "Ádám Szita (Jira)" <ji...@apache.org> on 2022/11/15 10:01:00 UTC

[jira] [Updated] (HIVE-26133) Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution

     [ https://issues.apache.org/jira/browse/HIVE-26133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ádám Szita updated HIVE-26133:
------------------------------
    Component/s: Iceberg integration

> Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26133
>                 URL: https://issues.apache.org/jira/browse/HIVE-26133
>             Project: Hive
>          Issue Type: Improvement
>          Components: Iceberg integration
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Insert overwrite commands in Hive rewrite only the partitions affected by the query.
> If we write out a record with specA (e.g. day(ts)), the result is a data file such as:
> /tableRoot/data/ts_day="2020-10-24"/ffffgggg.orc
> If we then change to specB (e.g. day(ts), name), the same record would go to a different partition:
> /tableRoot/data/ts_day="2020-10-24"/name="Mike"/ffffgggg.orc
> If we then overwrite the table with itself, the overwrite detects that the two copies of the record belong to different partitions (as they do) and therefore does not replace the original record with the new one, resulting in duplicate entries.
> {code:java}
> create table testice1000 (a int, b string) stored by iceberg stored as orc location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> +---------------+---------------+
> | testice1000.a | testice1000.b |
> +---------------+---------------+
> | 11            | ddd           |
> | 11            | ddd           |
> | 22            | ttt           |
> | 22            | ttt           |
> | 33            | rrfdfdf       |
> +---------------+---------------+
> {code}
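
The mechanism described in the issue can be illustrated with a small Python sketch. This is a toy model only, not Hive or Iceberg code: the table is a plain dict keyed by partition value, and every name in it (spec_a, spec_b, insert, insert_overwrite, scan) is hypothetical. It assumes an overwrite replaces only the partitions that the incoming rows map to under the current spec, which is the behaviour the description attributes to insert overwrite.

```python
# Toy model of a partition-scoped overwrite after a partition spec change.
# NOT Hive/Iceberg code: all names here are hypothetical illustrations.

def spec_a(row):
    # Original spec from the repro: the table is created unpartitioned,
    # so every row shares the same (empty) partition key.
    return ()

def spec_b(row):
    # Evolved spec from the repro: truncate(2, b).
    return (row[1][:2],)

def insert(table, rows, spec):
    # Append each row to the partition chosen by the given spec.
    for row in rows:
        table.setdefault(spec(row), []).append(row)

def insert_overwrite(table, rows, spec):
    # Group the incoming rows by partition under the current spec, then
    # replace only those partitions. Partitions written under an older
    # spec have different keys and are left untouched.
    replacement = {}
    for row in rows:
        replacement.setdefault(spec(row), []).append(row)
    table.update(replacement)

def scan(table):
    # Read every row from every partition, sorted for stable output.
    return sorted(row for rows in table.values() for row in rows)

table = {}
insert(table, [(11, "ddd"), (22, "ttt")], spec_a)   # written before evolution
insert(table, [(33, "rrfdfdf")], spec_b)            # written after evolution
insert_overwrite(table, scan(table), spec_b)        # overwrite table with itself

for row in scan(table):
    print(row)
```

In this model the rows written under the old spec stay in their original partition while copies of them land in new partitions, so they appear twice; the row written under the new spec is replaced in place and appears once, matching the result shown in the report.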



--
This message was sent by Atlassian Jira
(v8.20.10#820010)