Posted to issues@hive.apache.org by "Vaibhav Gumashta (JIRA)" <ji...@apache.org> on 2019/03/15 07:30:00 UTC

[jira] [Resolved] (HIVE-21451) ACID: Avoid using hive.acid.key.index to determine if the file is original or not

     [ https://issues.apache.org/jira/browse/HIVE-21451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vaibhav Gumashta resolved HIVE-21451.
-------------------------------------
    Resolution: Duplicate

Thanks [~pvary]. Marked as dup of HIVE-20580.

> ACID: Avoid using hive.acid.key.index to determine if the file is original or not
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-21451
>                 URL: https://issues.apache.org/jira/browse/HIVE-21451
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>    Affects Versions: 3.1.1
>            Reporter: Vaibhav Gumashta
>            Priority: Major
>
> Transactional files written by Hive have each row decorated with a {{ROW__ID}} column. However, files brought into a transactional table with the {{LOAD DATA...}} command do not have these metadata columns (in Hive ACID parlance, these are called original files). Original files are decorated with an inferred {{ROW__ID}} that is generated while reading them. After they are compacted, however, the {{ROW__ID}} metadata column becomes part of the file itself.
> To determine whether a file is original, we currently check for the presence of {{hive.acid.key.index}}. Query-based compaction currently does not write {{hive.acid.key.index}} (HIVE-21165). This means that, even after compaction, such files may still be treated as original files.
> Irrespective of HIVE-21165, we should avoid using {{hive.acid.key.index}} to decide whether a file is original, and instead look for the presence of {{ROW__ID}}. {{hive.acid.key.index}} should be treated as a performance optimization, as it was seemingly meant to be.
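
The difference between the two checks described above can be sketched roughly as follows. This is an illustrative sketch only, not Hive's actual code: the class OriginalFileCheck, its method names, and the inputs (a set of file user-metadata keys and a list of top-level schema field names) are hypothetical. It also assumes that the ROW__ID metadata is stored in the file as the top-level fields originalTransaction, bucket, and rowId; the description does not spell out the physical layout.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of Hive.
final class OriginalFileCheck {
    // Metadata key named in the issue description.
    static final String ACID_KEY_INDEX = "hive.acid.key.index";

    // ASSUMPTION: field names that carry the ROW__ID metadata inside an ACID file.
    static final List<String> ROW_ID_FIELDS =
            List.of("originalTransaction", "bucket", "rowId");

    // Check the issue wants to avoid: a file counts as original when the
    // key index is missing from its user metadata.
    static boolean isOriginalByKeyIndex(Set<String> userMetadataKeys) {
        return !userMetadataKeys.contains(ACID_KEY_INDEX);
    }

    // Proposed check: a file counts as original only when its schema lacks
    // the ROW__ID metadata fields.
    static boolean isOriginalBySchema(List<String> topLevelFieldNames) {
        return !topLevelFieldNames.containsAll(ROW_ID_FIELDS);
    }
}
```

For a file produced by query-based compaction without the key index, isOriginalByKeyIndex(Set.of()) returns true, which is the misclassification described above, while isOriginalBySchema returns false as long as the schema contains the assumed ROW__ID fields.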



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)