Posted to issues@hive.apache.org by "John Sherman (Jira)" <ji...@apache.org> on 2023/01/05 16:57:00 UTC

[jira] [Resolved] (HIVE-26875) Transaction conflict retry loop only executes once

     [ https://issues.apache.org/jira/browse/HIVE-26875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sherman resolved HIVE-26875.
---------------------------------
    Resolution: Fixed

Committed to master

> Transaction conflict retry loop only executes once
> --------------------------------------------------
>
>                 Key: HIVE-26875
>                 URL: https://issues.apache.org/jira/browse/HIVE-26875
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: John Sherman
>            Assignee: John Sherman
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently the "conflict retry loop" only executes once.
> [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L264]
> The intent of this loop is to detect whether a conflicting transaction has committed while we were waiting to acquire locks. If there is a conflicting transaction, it invalidates the snapshot, rolls back the transaction, opens a new transaction and tries to re-acquire locks (and then recompiles). It then checks again whether a conflicting transaction has committed and, if so, repeats the above steps, up to HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT times.
> However, isValidTxnState relies on getNonSharedLockedTables():
> [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java#L422]
> which does:
> {code:java}
>   private Set<String> getNonSharedLockedTables() {
>     if (CollectionUtils.isEmpty(driver.getContext().getHiveLocks())) {
>       return Collections.emptySet(); // Nothing to check
>     }{code}
> getHiveLocks gets populated by lockAndRespond. However,
> compileInternal ends up calling compile, which calls prepareForCompile, which calls prepareContext, which destroys the context holding the information that lockAndRespond populated. So when the loop condition is evaluated again after all of this, it will never detect a second conflict, because isValidTxnState will always return true (it thinks there are no locked objects).
> This manifests as duplicate records being created during concurrent UPDATEs if a transaction hits a conflict twice.
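
The failure mode described above can be reproduced in a small toy model. This is only a sketch, not Hive's code: the class RetryLoopSketch, its Context type, the compileResetsContext flag, the table name and the retry limit of 5 are all assumptions made for illustration. Only the method names lockAndRespond, compile and isValidTxnState echo the report. Keeping the locks across the recompile is one conceivable remedy shown for contrast, not necessarily the fix that was committed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every name below is hypothetical, not Hive's real API.
class RetryLoopSketch {
  // Arbitrary stand-in for HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT.
  static final int MAX_RETRY_SNAPSHOT_COUNT = 5;

  /** Stand-in for the driver context that records the acquired locks. */
  static class Context {
    final List<String> hiveLocks = new ArrayList<>();
  }

  private Context context = new Context();
  private int pendingConflicts;               // conflicting commits not yet noticed
  private final boolean compileResetsContext; // true reproduces the reported behaviour

  RetryLoopSketch(int conflicts, boolean compileResetsContext) {
    this.pendingConflicts = conflicts;
    this.compileResetsContext = compileResetsContext;
  }

  /** Acquires locks and records them in the current context. */
  private void lockAndRespond() {
    context.hiveLocks.clear();
    context.hiveLocks.add("default.target_table");
  }

  /** Recompiles; in the buggy variant this replaces the context and loses the locks. */
  private void compile() {
    if (compileResetsContext) {
      context = new Context();
    }
  }

  /** Returns false when a conflicting commit is detected on a locked table. */
  private boolean isValidTxnState() {
    if (context.hiveLocks.isEmpty()) {
      return true; // nothing to check, so any conflict goes unnoticed
    }
    if (pendingConflicts > 0) {
      pendingConflicts--; // this conflict is now handled by one retry
      return false;
    }
    return true;
  }

  /** Runs the retry loop and returns how many conflicts it detected. */
  int run() {
    int detected = 0;
    int retries = 0;
    lockAndRespond();
    while (!isValidTxnState() && ++retries <= MAX_RETRY_SNAPSHOT_COUNT) {
      detected++;
      // Rolling back and opening a new transaction are omitted here.
      lockAndRespond();
      compile();
    }
    return detected;
  }

  public static void main(String[] args) {
    System.out.println("context reset on compile: detected "
        + new RetryLoopSketch(2, true).run() + " of 2 conflicts");
    System.out.println("locks kept across compile: detected "
        + new RetryLoopSketch(2, false).run() + " of 2 conflicts");
  }
}
```

With two scripted conflicts, the variant whose compile step resets the context notices only the first one, because the validity check sees an empty lock list on the second pass. The variant that keeps the locks notices both.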


