Posted to issues@hive.apache.org by "Sankar Hariappan (JIRA)" <ji...@apache.org> on 2018/03/16 15:12:05 UTC

[jira] [Commented] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

    [ https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402027#comment-16402027 ] 

Sankar Hariappan commented on HIVE-18747:
-----------------------------------------

Attached 01.patch with the changes below.
 # Added the MIN_HISTORY_LEVEL table to record the min_uncommitted_txn referred to by any ongoing txn.
 # If a txn is aborted, it retains an entry in MIN_HISTORY_LEVEL so that no future txn reads the aborted data. The compactor now removes this entry once the cleaner has deleted the aborted delta directories.
 # If there is no uncommitted txn in the system when the snapshot is taken, all data is committed. Even so, the current txn adds an entry marking itself as uncommitted, so that future txns do not read data written by it.
 # The cleaner uses the MIN_HISTORY_LEVEL table to clean up the entries in TXN_TO_WRITE_ID.
 ## It retains one entry (highest txn <= min_uncommitted_txn) as the LWM, to preserve writeId_hwm for future txns.
 ## min_uncommitted_txn is calculated using the elimination logic below.
 ### Assume min_uncommitted_txn = latest txn ID allocated in the system.
 ### If MIN_HISTORY_LEVEL is non-empty, overwrite min_uncommitted_txn = min(uncommitted_txn).
 ### Else, if MIN_HISTORY_LEVEL is empty (if all txns are committed), traverse the TXNS table to find min_uncommitted_txn = min(open_txn).
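
For reviewers, the elimination logic above can be sketched roughly as follows. This is only an illustration in Python, not code from the patch: the function name, its parameters, and the use of in-memory lists in place of the MIN_HISTORY_LEVEL and TXNS tables are all assumptions. It also assumes that when neither table yields a candidate, the initial value (latest allocated txn ID) stands.

```python
from typing import Iterable


def min_uncommitted_txn(latest_txn_id: int,
                        min_history_level: Iterable[int],
                        open_txns: Iterable[int]) -> int:
    """Sketch of the elimination logic (hypothetical helper, not the patch).

    latest_txn_id:     latest txn ID allocated in the system.
    min_history_level: uncommitted txn IDs recorded in MIN_HISTORY_LEVEL
                       (in-memory stand-in for the table).
    open_txns:         IDs of txns currently open in TXNS
                       (in-memory stand-in for the table).
    """
    # Step 1: start from the latest allocated txn ID.
    result = latest_txn_id

    recorded = list(min_history_level)
    if recorded:
        # Step 2: MIN_HISTORY_LEVEL is non-empty, so take its minimum.
        result = min(recorded)
    else:
        # Step 3: MIN_HISTORY_LEVEL is empty, so fall back to open txns.
        still_open = list(open_txns)
        if still_open:
            result = min(still_open)
        # Assumption: with no open txns either, keep the step 1 value.
    return result
```

Note that in this sketch a non-empty MIN_HISTORY_LEVEL takes precedence, so the TXNS table is only consulted when MIN_HISTORY_LEVEL has no rows.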

[~ekoifman], can you please review the patch?

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> ------------------------------------------------------------------
>
>                 Key: HIVE-18747
>                 URL: https://issues.apache.org/jira/browse/HIVE-18747
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Minor
>              Labels: ACID
>             Fix For: 3.0.0
>
>         Attachments: HIVE-18747.01.patch
>
>
> The per-table write ID implementation (HIVE-18192) maintains a map between txn ID and table write ID in the TXN_TO_WRITE_ID meta table.
> The entries in this table are used to generate the ValidWriteIdList for a given ValidTxnList to ensure snapshot isolation.
> When a table or database is dropped, these entries are cleaned up. However, entries for active tables also need to be cleaned up for better performance.
> This needs another table, MIN_HISTORY_LEVEL, to maintain the least txn that is referred to by any active ValidTxnList snapshot as an open/aborted txn. If no references are found in this table for a txn, it is eligible for cleanup.
> After cleanup, just one entry (highest txn <= min_uncommitted_txn) per table needs to be retained to mark the LWM (low water mark).
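
The retention rule in the description can be illustrated with a small sketch. It is written in Python purely for illustration and is not the patch's implementation: the TxnToWriteId record, the function name, and the in-memory list standing in for the TXN_TO_WRITE_ID table are assumptions. For each table it keeps the entry with the highest txn <= min_uncommitted_txn as the LWM, drops everything older, and leaves newer entries untouched.

```python
from typing import Dict, List, NamedTuple


class TxnToWriteId(NamedTuple):
    """Hypothetical in-memory row of TXN_TO_WRITE_ID."""
    table: str
    txn_id: int
    write_id: int


def clean_txn_to_write_id(entries: List[TxnToWriteId],
                          min_uncommitted_txn: int) -> List[TxnToWriteId]:
    """Return the rows that survive cleanup (sketch, not the patch)."""
    # Per table, find the highest txn ID <= min_uncommitted_txn: the LWM.
    lwm: Dict[str, int] = {}
    for e in entries:
        if e.txn_id <= min_uncommitted_txn:
            lwm[e.table] = max(lwm.get(e.table, e.txn_id), e.txn_id)

    # Keep the LWM row and everything newer; tables with no row at or
    # below min_uncommitted_txn have nothing eligible for cleanup.
    return [e for e in entries
            if e.table not in lwm or e.txn_id >= lwm[e.table]]


rows = [
    TxnToWriteId("t1", 1, 1),
    TxnToWriteId("t1", 3, 2),
    TxnToWriteId("t1", 5, 3),
    TxnToWriteId("t1", 8, 4),
    TxnToWriteId("t2", 9, 1),
]
remaining = clean_txn_to_write_id(rows, min_uncommitted_txn=6)
```

In this example, table t1 keeps the row for txn 5 as its LWM plus the newer row for txn 8, while t2 is untouched because its only row is above min_uncommitted_txn.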



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)