Posted to issues@hive.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2020/02/27 10:07:00 UTC

[jira] [Updated] (HIVE-22941) Empty files are inserted into external tables after HIVE-21714

     [ https://issues.apache.org/jira/browse/HIVE-22941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-22941:
--------------------------------
    Description: 
Multiple patches have targeted the issue where INSERT OVERWRITE is ineffective when the input is empty:
HIVE-18702: INSERT OVERWRITE TABLE doesn't clean the table directory before overwriting
HIVE-21714: Insert overwrite on an acid/mm table is ineffective if the input is empty
HIVE-21784: Insert overwrite on an acid (not mm) table is ineffective if the input is empty

Of these patches, HIVE-21714 seems to have a bad side effect on external tables, because of this part:
https://github.com/apache/hive/commit/9a10bc28bee5250c0f667c94a295706a44ed4d7e#diff-9bea2581a1fba611f2c10904857b8823R1268

The original issue was that the existing files in the table survived an insert overwrite, so a subsequent select count(*) still returned more than 0. HIVE-21714 seems to enable writing empty files regardless of execution engine, which is not the proper fix: the proper solution would be to avoid writing empty files on Tez entirely (this is what HIVE-14014 was about). I found that changing the logic to...
{code}
if (!isTez && (isStreaming || this.isInsertOverwrite)) 
{code}
(which could be an easy solution for external tables) breaks some test cases (both full ACID and MM) in insert_overwrite.q, which could mean they somehow rely on the generated empty file. We need to find a proper solution that is applicable to all table types without polluting external tables.
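
To make the difference concrete, here is a minimal, hypothetical sketch comparing the two conditions as plain boolean predicates. It is not the actual Hive source: the class and method names are invented for illustration, and describedBehavior is only an assumption derived from the statement above that empty files are written regardless of execution engine. Only the flag names isTez, isStreaming and isInsertOverwrite come from the snippet above.

```java
// Hypothetical illustration only; not taken from the Hive code base.
final class EmptyFileCondition {

    // Assumption: models the behavior described for HIVE-21714, where a
    // streaming output format or an insert overwrite leads to an empty file
    // being written no matter which execution engine is used.
    static boolean describedBehavior(boolean isTez, boolean isStreaming, boolean isInsertOverwrite) {
        return isStreaming || isInsertOverwrite;
    }

    // The variant quoted in the description: never write an empty file on Tez.
    static boolean proposedVariant(boolean isTez, boolean isStreaming, boolean isInsertOverwrite) {
        return !isTez && (isStreaming || isInsertOverwrite);
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        // Print every flag combination on which the two predicates disagree.
        for (boolean isTez : flags) {
            for (boolean isStreaming : flags) {
                for (boolean isInsertOverwrite : flags) {
                    boolean described = describedBehavior(isTez, isStreaming, isInsertOverwrite);
                    boolean proposed = proposedVariant(isTez, isStreaming, isInsertOverwrite);
                    if (described != proposed) {
                        System.out.printf(
                            "isTez=%b isStreaming=%b isInsertOverwrite=%b: described=%b proposed=%b%n",
                            isTez, isStreaming, isInsertOverwrite, described, proposed);
                    }
                }
            }
        }
    }
}
```

Under these assumptions the two predicates disagree only when isTez is true and at least one of the other flags is set. That covers the external-table case the variant is meant to fix, but also the ACID and MM insert overwrite cases on Tez, which would be consistent with the insert_overwrite.q failures.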

> Empty files are inserted into external tables after HIVE-21714
> --------------------------------------------------------------
>
>                 Key: HIVE-22941
>                 URL: https://issues.apache.org/jira/browse/HIVE-22941
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)