Posted to issues@hive.apache.org by "J.P Feng (JIRA)" <ji...@apache.org> on 2018/09/19 08:56:00 UTC

[jira] [Updated] (HIVE-20594) insert overwrite may bring duplicated data when hdfs path exists but partition missing in hms

     [ https://issues.apache.org/jira/browse/HIVE-20594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.P Feng updated HIVE-20594:
----------------------------
    Attachment: HIVE-20594.patch

> insert overwrite may bring duplicated data when hdfs path exists but partition missing in hms
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20594
>                 URL: https://issues.apache.org/jira/browse/HIVE-20594
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.1
>            Reporter: J.P Feng
>            Priority: Major
>         Attachments: HIVE-20594.patch
>
>
> When I run insert overwrite on a partition of a partitioned table whose HDFS path exists but whose partition is missing from HMS (the Hive metastore), I get duplicated data.
>  
> sql: insert overwrite table hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns partition (month = '201808' ) select * from xxx where month = '201808';
>  
> 1. There are 10 files in hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns:
>     month=201808/000001_0
>     month=201808/000002_0 ... month=201808/000009_0
> 2. If hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns is an external table and I drop partition (month=201808), or I drop partition (month=201808) in some other way that does not remove the data under it, the files stay on HDFS while the partition is gone from HMS.
> 3. I then run: insert overwrite table hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns partition (month = '201808' ) select * from xxx where month = '201808'
> If this statement runs 9 map tasks, it may generate 9 files:
> month=201808/000001_0 ~ month=201808/000008_0
>  
> After the statement finishes, the old file `month=201808/000009_0` still remains in the partition directory, so queries on the partition may return duplicated data.
>  
>  
>  
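The steps above can be sketched as a small toy model. This is only an illustration of the reported behavior, not Hive's actual code path: a dict stands in for the HDFS partition directory, a set stands in for HMS, and the function `insert_overwrite` and helper `numbered` are hypothetical names. It assumes that old files are cleared only when the partition is registered in HMS, and that output files are numbered from 000000_0 so that the counts match the 10 and 9 files mentioned in the report.

```python
# Toy model only: a dict stands in for the HDFS partition directory and a set
# stands in for the metastore (HMS). This is not Hive's actual implementation.

def insert_overwrite(directory, registered, partition, new_files):
    """Simulate INSERT OVERWRITE into a single partition.

    Assumption (taken from the behavior described in the report): existing
    files are cleared only if the partition is known to the metastore.
    """
    files = directory.setdefault(partition, {})
    if partition in registered:
        files.clear()            # normal overwrite: old data is removed first
    files.update(new_files)      # same-named files are replaced, others stay
    registered.add(partition)    # the insert (re)registers the partition
    return sorted(files)


def numbered(count, tag):
    """Build `count` file names (000000_0, 000001_0, ...) tagged old/new."""
    return {f"{i:06d}_0": tag for i in range(count)}


partition = "month=201808"
directory = {partition: numbered(10, "old")}  # 10 files left on HDFS
registered = set()                            # partition dropped from HMS

remaining = insert_overwrite(directory, registered, partition, numbered(9, "new"))
stale = [name for name, tag in directory[partition].items() if tag == "old"]
print("files after insert:", len(remaining))
print("stale files:", stale)
```

In this model the leftover old file is what would surface as duplicated rows. Running the same insert again, once the partition is registered, clears the directory first and leaves only the 9 new files.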



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)