Posted to issues@hive.apache.org by "Karen Coppage (Jira)" <ji...@apache.org> on 2019/11/14 15:49:00 UTC

[jira] [Commented] (HIVE-21266) Issue with single delta file

    [ https://issues.apache.org/jira/browse/HIVE-21266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974375#comment-16974375 ] 

Karen Coppage commented on HIVE-21266:
--------------------------------------

Translation: 

- Suppose you have 1 delta file from streaming ingest: {{delta_11_20}}, where {{txnid:13}} was aborted.

- A single delta file is not eligible for compaction, so compaction is skipped and the delta is not rewritten.

- Whether or not compaction runs, {{markCleaned()}} is called on the compaction. {{markCleaned()}} drops metadata, including the record of which transactions were aborted. The information that transaction 13 was aborted is therefore lost, while its rows are still present in the unrewritten delta.
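
The sequence above can be sketched as a toy model. This is only an illustration under stated assumptions: SingleDeltaSketch, abortedTxns, shouldCompact, isVisible and the range-based markCleaned below are hypothetical stand-ins, not Hive classes or the real metastore API. Only the eligibility arithmetic is taken from the check quoted in the issue description.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the single-delta scenario; none of these names are Hive APIs. */
class SingleDeltaSketch {
    /** Hypothetical stand-in for the metadata that records aborted transactions. */
    static final Set<Long> abortedTxns = new HashSet<>();

    /** Same arithmetic as the quoted check: a total of 1 or less means "nothing to compact". */
    static boolean shouldCompact(int deltaCount, boolean hasBase, int origCount) {
        return deltaCount + (hasBase ? 1 : 0) + origCount > 1;
    }

    /** Assumed behaviour of markCleaned(): forget aborted txns in the compacted range. */
    static void markCleaned(long minTxn, long maxTxn) {
        abortedTxns.removeIf(t -> t >= minTxn && t <= maxTxn);
    }

    /** A reader treats rows as committed unless their txn is known to be aborted. */
    static boolean isVisible(long txnId) {
        return !abortedTxns.contains(txnId);
    }

    /** Replays the scenario; returns true if the aborted rows become visible. */
    static boolean run() {
        abortedTxns.clear();
        abortedTxns.add(13L);                           // txnid:13 aborted inside delta_11_20
        boolean rewritten = shouldCompact(1, false, 0); // one delta, no base, no originals
        markCleaned(11, 20);                            // called whether or not the delta was rewritten
        // If the delta was not rewritten, the rows of txn 13 are still on disk.
        return !rewritten && isVisible(13L);
    }

    public static void main(String[] args) {
        System.out.println("aborted txn 13 visible after cleaning: " + run());
    }
}
```

In this model the skipped rewrite and the unconditional cleanup combine so that nothing marks transaction 13 as aborted any more, which is the failure described in the issue.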

> Issue with single delta file
> ----------------------------
>
>                 Key: HIVE-21266
>                 URL: https://issues.apache.org/jira/browse/HIVE-21266
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Karen Coppage
>            Priority: Major
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java#L353-L357]
>  
> {noformat}
> if ((deltaCount + (dir.getBaseDirectory() == null ? 0 : 1)) + origCount <= 1) {
>       LOG.debug("Not compacting {}; current base is {} and there are {} deltas and {} originals", sd.getLocation(), dir
>           .getBaseDirectory(), deltaCount, origCount);
>       return;
>     }
>  {noformat}
> Is problematic.
> Suppose you have 1 delta file from streaming ingest: {{delta_11_20}} where {{txnid:13}} was aborted.  The code above will not rewrite the delta (which drops anything that belongs to the aborted txn) and transition the compaction to "ready_for_cleaning" state which will drop the metadata about the aborted txn in {{markCleaned()}}.  Now aborted data will come back as committed.


