Posted to issues@hive.apache.org by "Peter Vary (Jira)" <ji...@apache.org> on 2019/11/04 09:59:00 UTC

[jira] [Commented] (HIVE-22420) DbTxnManager.stopHeartbeat() should be thread-safe

    [ https://issues.apache.org/jira/browse/HIVE-22420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966523#comment-16966523 ] 

Peter Vary commented on HIVE-22420:
-----------------------------------

[~hamvas.aron]: LGTM +1. As discussed, please throw an exception in startHeartBeat, since that is still not reentrant.

> DbTxnManager.stopHeartbeat() should be thread-safe
> --------------------------------------------------
>
>                 Key: HIVE-22420
>                 URL: https://issues.apache.org/jira/browse/HIVE-22420
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Aron Hamvas
>            Assignee: Aron Hamvas
>            Priority: Major
>         Attachments: HIVE-22420.1.patch, HIVE-22420.2.patch
>
>
> When a transactional query is being executed and interrupted via HS2 close operation request, both the background pool thread executing the query and the HttpHandler thread running the close operation logic will eventually call the below method:
> {noformat}
> Driver.releaseLocksAndCommitOrRollback(commit boolean)
> {noformat}
> Since this method is invoked several times in both threads, it can happen that the two threads invoke it at the same time, and due to a race condition, the txnId field of the DbTxnManager used by both threads could be set to 0 without actually successfully aborting the transaction.
> The root cause is that the stopHeartbeat() method in DbTxnManager is not thread-safe:
> When Thread-1 and Thread-2 enter stopHeartbeat() at almost the same time, Thread-1 might successfully cancel the heartbeat task and set the heartbeatTask field to null while Thread-2 is still trying to observe its state. Thread-1 then returns to the calling rollbackTxn() method and continues execution there, while Thread-2 is thrown back to the same method with a NullPointerException. Thread-2 then sets txnId to 0, and Thread-1 sends this 0 value to HMS. As a result, the txn is not aborted, and the locks cannot be released later on either.
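
For illustration only, here is a minimal sketch of one way to close the race described above. It is not the attached patch and not the actual DbTxnManager code: the class HeartbeatSketch, its field layout, and the method signatures are assumptions made up for this example. The idea is that the null check, the cancel, and the reset of heartbeatTask happen under one lock, so a second caller sees either the live task or null, never a half-updated state. The start method throws if a heartbeat is already running, along the lines of the review comment above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the heartbeat handling; names are assumptions, not Hive's API.
class HeartbeatSketch {
    private final ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
    // Guarded by "this": only read or written inside synchronized methods.
    private ScheduledFuture<?> heartbeatTask;

    // Not reentrant by design: a second start without a stop is reported as an error.
    synchronized void startHeartbeat(Runnable beat, long periodMs) {
        if (heartbeatTask != null) {
            throw new IllegalStateException("heartbeat is already running");
        }
        heartbeatTask = pool.scheduleAtFixedRate(beat, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    // Safe to call from several threads: exactly one caller cancels the task,
    // the others find null and return false instead of hitting a NullPointerException.
    synchronized boolean stopHeartbeat() {
        if (heartbeatTask == null) {
            return false;
        }
        heartbeatTask.cancel(true);
        heartbeatTask = null;
        return true;
    }

    // Releases the scheduler thread once the sketch is no longer needed.
    void shutdown() {
        pool.shutdownNow();
    }
}
```

With this shape, two threads racing into stopHeartbeat() are serialized: the first one cancels the task and gets true, the second gets false and can carry on without touching txnId on an error path.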



--
This message was sent by Atlassian Jira
(v8.3.4#803005)