Posted to issues@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2018/06/14 01:47:00 UTC

[jira] [Updated] (HIVE-19891) inserting into external tables with custom partition directories may cause data loss

     [ https://issues.apache.org/jira/browse/HIVE-19891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-19891:
------------------------------------
    Description: 
In the repro below, tbl1 is only used as a prop to create data; it could just as well be an existing directory for an external table.
Due to odd behavior in LoadTableDesc (some ancient code for overriding the old partition path), the custom partition path is overwritten after the insert query, and the data in it ceases to be part of the table. This can be seen in the desc formatted output with masking commented out in QTestUtil.
This affects branch-1 too, so the bug is pretty old.

{noformat}
-- tbl1 only exists to produce a partition directory that already holds data
drop table tbl1;
CREATE TABLE tbl1 (index int, value int ) PARTITIONED BY ( created_date string );
insert into tbl1 partition(created_date='2018-02-01') VALUES (2, 2);

-- external table whose partition is pointed at that custom directory
CREATE external TABLE tbl2 (index int, value int ) PARTITIONED BY ( created_date string );
ALTER TABLE tbl2 ADD PARTITION(created_date='2018-02-01');
ALTER TABLE tbl2 PARTITION(created_date='2018-02-01') SET LOCATION 'file:/Users/sergey/git/hivegit/itests/qtest/target/warehouse/tbl1/created_date=2018-02-01';
-- before the insert: the partition location is the custom path
select * from tbl2;
describe formatted tbl2 partition(created_date='2018-02-01');
-- after the insert: the partition location has been overwritten
insert into tbl2 partition(created_date='2018-02-01') VALUES (1, 1);
select * from tbl2;
describe formatted tbl2 partition(created_date='2018-02-01');
{noformat}
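
To make the failure mode easier to picture, here is a toy model of the behavior described above. It is not Hive code and does not reflect how LoadTableDesc is actually implemented; ToyMetastore, load_partition_buggy, load_partition_expected and demo are hypothetical names, and the paths are shortened. It also assumes that the newly inserted row lands in the default partition directory, which the description does not state.

```python
# Toy model only: this is NOT Hive code. All names here are hypothetical
# and exist purely to illustrate the behavior described in this issue.
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Maps a partition spec to its directory, plus a fake filesystem of rows."""
    table_root: str
    locations: dict = field(default_factory=dict)  # partition spec -> directory
    files: dict = field(default_factory=dict)      # directory -> list of rows

    def default_location(self, spec: str) -> str:
        return f"{self.table_root}/{spec}"

    def select(self, spec: str) -> list:
        # A query only reads the directory the metastore currently points at.
        return list(self.files.get(self.locations[spec], []))


def load_partition_buggy(ms: ToyMetastore, spec: str, rows: list) -> None:
    """Assumed buggy behavior: write to, and re-point at, the default path."""
    target = ms.default_location(spec)
    ms.files.setdefault(target, []).extend(rows)
    ms.locations[spec] = target  # the custom location is overwritten here


def load_partition_expected(ms: ToyMetastore, spec: str, rows: list) -> None:
    """Expected behavior: keep whatever location the partition already has."""
    target = ms.locations.get(spec, ms.default_location(spec))
    ms.files.setdefault(target, []).extend(rows)
    ms.locations[spec] = target


def demo(load) -> tuple:
    """Mirror the repro: custom location with one row, then one insert."""
    spec = "created_date=2018-02-01"
    custom = "/warehouse/tbl1/" + spec
    ms = ToyMetastore(table_root="/warehouse/tbl2")
    ms.files[custom] = [(2, 2)]   # data that already lives in the custom directory
    ms.locations[spec] = custom   # ALTER TABLE ... SET LOCATION
    load(ms, spec, [(1, 1)])      # insert into tbl2 partition(...)
    return ms.locations[spec], ms.select(spec)
```

Calling demo(load_partition_buggy) returns the default tbl2 path and only the newly inserted row, since the row in the custom directory is still on disk but no longer reachable through the table. demo(load_partition_expected) keeps the custom path and returns both rows.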


> inserting into external tables with custom partition directories may cause data loss
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-19891
>                 URL: https://issues.apache.org/jira/browse/HIVE-19891
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major


