Posted to issues@hive.apache.org by "Barna Zsombor Klara (JIRA)" <ji...@apache.org> on 2016/08/19 02:59:20 UTC

[jira] [Commented] (HIVE-14427) CompactionTxnHandler.markCleaned() can delete aborted txns

    [ https://issues.apache.org/jira/browse/HIVE-14427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427526#comment-15427526 ] 

Barna Zsombor Klara commented on HIVE-14427:
--------------------------------------------

Hi [~ekoifman],

Are you currently working on this, or could I take a look at it?

Thanks,
Zsombor

> CompactionTxnHandler.markCleaned() can delete aborted txns
> ----------------------------------------------------------
>
>                 Key: HIVE-14427
>                 URL: https://issues.apache.org/jira/browse/HIVE-14427
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>            Reporter: Eugene Koifman
>
> We can modify 
> {noformat}
> s = "select distinct txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and txn_state = '" +
>           TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and tc_table = '" +
>           info.tableName + "'" + (info.highestTxnId == 0 ? "" : " and txn_id <= " + info.highestTxnId);
> {noformat}
> to use "select txn_id, count(*) ... group by txn_id", so that we know the number of components in each txn.
> Then, when running "delete from TXN_COMPONENTS where ...", we know how many rows were deleted.
> If the sum of all counts from the first query matches the total number of rows deleted, we know that all aborted txns in this set are now empty and can therefore be deleted here.
> This means we clean up aborted txns from the TXNS table sooner and avoid a large join in _cleanEmptyAbortedTxns()_. Also, the delete on TXNS here will have PKs in the WHERE clause, so it should be cheap.
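
The check proposed in the description could be sketched roughly as below. This is only an illustration and not the actual CompactionTxnHandler code: the class AbortedTxnCleanupSketch and its methods allComponentsDeleted() and buildDeleteFromTxns() are hypothetical names, the per-txn counts are assumed to have already been read from the grouped query, and rowsDeleted is assumed to be the update count returned by the "delete from TXN_COMPONENTS" statement.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch; none of these names exist in Hive.
class AbortedTxnCleanupSketch {

    /**
     * Returns true when the number of rows removed from TXN_COMPONENTS equals
     * the sum of the per-txn component counts, i.e. (under the assumption in
     * the description) every aborted txn in the set is now empty.
     */
    static boolean allComponentsDeleted(Map<Long, Integer> componentCounts, int rowsDeleted) {
        long expected = 0;
        for (int count : componentCounts.values()) {
            expected += count;
        }
        // An empty set means there is nothing to clean up in TXNS.
        return !componentCounts.isEmpty() && expected == rowsDeleted;
    }

    /** Builds a delete on TXNS that filters on the primary key only. */
    static String buildDeleteFromTxns(Map<Long, Integer> componentCounts) {
        StringJoiner sql = new StringJoiner(", ", "delete from TXNS where txn_id in (", ")");
        for (Long txnId : componentCounts.keySet()) {
            sql.add(txnId.toString());
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Assumed result of "select txn_id, count(*) ... group by txn_id".
        Map<Long, Integer> counts = new LinkedHashMap<>();
        counts.put(5L, 2);
        counts.put(7L, 1);

        int rowsDeleted = 3; // assumed update count of the TXN_COMPONENTS delete

        if (allComponentsDeleted(counts, rowsDeleted)) {
            System.out.println(buildDeleteFromTxns(counts));
        }
    }
}
```

If the counts and the number of deleted rows do not match, the sketch does nothing, which would leave those txns to the existing _cleanEmptyAbortedTxns()_ path.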



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)