
Posted to issues@ignite.apache.org by "Maxim Muzafarov (Jira)" <ji...@apache.org> on 2019/10/03 11:09:00 UTC

[jira] [Updated] (IGNITE-6527) Deadlock detection works incorrectly with some timeouts that haven't caused by deadlocks.

     [ https://issues.apache.org/jira/browse/IGNITE-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Muzafarov updated IGNITE-6527:
------------------------------------
    Fix Version/s:     (was: 2.8)
                   2.9

> Deadlock detection works incorrectly with some timeouts that haven't caused by deadlocks.
> -----------------------------------------------------------------------------------------
>
>                 Key: IGNITE-6527
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6527
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.3
>            Reporter: Vitaliy Biryukov
>            Assignee: Andrey N. Gura
>            Priority: Major
>             Fix For: 2.9
>
>         Attachments: TxOptimisticDeadlockDetectionIncorrectMessageTest.java
>
>
> Deadlock detection works incorrectly for timeouts that were not caused by deadlocks. This can happen when a deadlock forms only later, after the timeout. Detection can also find another deadlock that was not the cause of the timeout.
> *requested keys:* keys that are primary on the same node and are being locked in sequential order at the time of the timeout (or, with a near cache, all keys that the optimistic transaction has not yet locked).
> *candidates:* keys that are candidates to be locked on a primary node (the entries contained in GridDhtTxLocal).
> When the Wait-For-Graph is updated, the requested keys are used as candidates. However, the "TxDeadlock.toString" method uses the candidates that were received from messages.
> 1) This causes an incorrect error message.
> Example: 
> K1: TX1 holds lock, TX2 waits lock.
> K2: TX3 holds lock, TX1 waits lock.
> Transactions:
> TX1 [txId=GridCacheVersion [topVer=118090802, order=1506610794980, nodeOrder=1], nodeId=f03b1ae3-a100-479c-9671-11d5cef00000, threadId=455]
> TX2 [txId=GridCacheVersion [topVer=118090802, order=1506610794980, nodeOrder=2], nodeId=2c0c0e78-cab2-4b23-a985-4965e4200001, threadId=456]
> TX3 [txId=GridCacheVersion [topVer=118090802, order=1506610794980, nodeOrder=3], nodeId=3340dc48-f1a1-4ea8-8742-19b314300002, threadId=457]
> Keys:
> K1 [key=6, cache=cache]
> K2 [key=1, cache=cache]
> 2) Deadlock detection (DD) can find another deadlock that was not the cause of the timeout, but would have been the cause if the current deadlock had not happened.
> These are very rare situations, but they can happen.
> I see several solutions:
> * Just generate a correct message.
> * Log a warning and continue detecting.
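
The description refers to the Wait-For-Graph. In such a graph, an edge from one transaction to another means the first waits for a lock held by the second, and a deadlock is a cycle. As an illustration only, the sketch below models that graph with plain string transaction ids and uses a depth-first search to look for a cycle through a given transaction. `WaitForGraph`, `addWait` and `findCycleFrom` are hypothetical names and do not reflect Ignite's internal API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical wait-for graph: an edge waiter -> holder means "waiter waits for a lock held by holder". */
final class WaitForGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    /** Records that {@code waiter} is blocked on a lock held by {@code holder}. */
    void addWait(String waiter, String holder) {
        edges.computeIfAbsent(waiter, k -> new LinkedHashSet<>()).add(holder);
    }

    /**
     * Returns a cycle through {@code start} as [start, ..., start],
     * or an empty list if {@code start} is not part of any cycle.
     */
    List<String> findCycleFrom(String start) {
        Deque<String> path = new ArrayDeque<>();
        if (!dfs(start, start, path, new HashSet<>()))
            return Collections.emptyList();
        List<String> cycle = new ArrayList<>(path);
        cycle.add(start); // close the loop
        return cycle;
    }

    /** Depth-first search that keeps the current path and stops once it gets back to {@code start}. */
    private boolean dfs(String node, String start, Deque<String> path, Set<String> visited) {
        path.addLast(node);
        for (String next : edges.getOrDefault(node, Collections.emptySet())) {
            if (next.equals(start))
                return true;
            // A node already explored without reaching start cannot reach it on a later visit either.
            if (visited.add(next) && dfs(next, start, path, visited))
                return true;
        }
        path.removeLast();
        return false;
    }
}
```

Read as wait-for edges, the example message gives TX2 -> TX1 (from K1) and TX1 -> TX3 (from K2). With only those two edges, `findCycleFrom("TX1")` returns an empty list. The reported lines form a chain rather than a cycle, so on their own they do not describe a deadlock.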
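
For the first option in the list above ("just make a correct message"), one way to keep the message consistent with the detection result is to derive every line from the edges of the cycle that was actually detected, not from the candidates received in messages. The sketch below assumes the detector provides the cycle as an ordered list of transaction ids, plus a map from each wait edge to the key it is blocked on. `CycleReport` and the `"waiter->holder"` map encoding are hypothetical, and the line format only mimics the `K1: TX1 holds lock, TX2 waits lock.` lines quoted in the example.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical report builder: every line comes from an edge of the detected cycle. */
final class CycleReport {
    /**
     * @param cycle   transactions in wait order with the first one repeated at the end, e.g. [TX1, TX2, TX1];
     *                each element waits for a lock held by the next one
     * @param edgeKey the key a waiter is blocked on, indexed by "waiter->holder"
     */
    static String format(List<String> cycle, Map<String, String> edgeKey) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < cycle.size(); i++) {
            String waiter = cycle.get(i);
            String holder = cycle.get(i + 1);
            String key = edgeKey.get(waiter + "->" + holder);
            if (key == null) // refuse to print a line for an edge the detector never recorded
                throw new IllegalStateException("No key recorded for edge " + waiter + "->" + holder);
            sb.append(key).append(": ")
              .append(holder).append(" holds lock, ")
              .append(waiter).append(" waits lock.\n");
        }
        return sb.toString();
    }
}
```

Throwing on a missing edge is only a choice made for this sketch: it surfaces the mismatch instead of printing a misleading line. Under the second option (log a warning and continue detecting), that exception would become a warning.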



--
This message was sent by Atlassian Jira
(v8.3.4#803005)