Posted to issues@ignite.apache.org by "Andrey Gura (JIRA)" <ji...@apache.org> on 2018/08/28 10:36:00 UTC

[jira] [Commented] (IGNITE-6527) Deadlock detection works incorrectly with some timeouts that weren't caused by deadlocks.

    [ https://issues.apache.org/jira/browse/IGNITE-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594804#comment-16594804 ] 

Andrey Gura commented on IGNITE-6527:
-------------------------------------

[~VitaliyB] In the future, could you please move tickets to the "Patch Available" status when your code is ready for review? Otherwise, nobody sees the ticket on the boards.
Could you please also check that the issue is still reproducible and that the fixes are still relevant?

> Deadlock detection works incorrectly with some timeouts that weren't caused by deadlocks.
> -----------------------------------------------------------------------------------------
>
>                 Key: IGNITE-6527
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6527
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.3
>            Reporter: Vitaliy Biryukov
>            Assignee: Andrey Gura
>            Priority: Major
>             Fix For: 2.8
>
>         Attachments: TxOptimisticDeadlockDetectionIncorrectMessageTest.java
>
>
> Deadlock detection works incorrectly with timeouts that weren't caused by deadlocks: it may report a deadlock that arises only later, or it may detect a different deadlock that was not the cause of the timeout.
> *requested keys:* keys that have the same primary node and are locked in sequential order, blocking when the timeout occurs (or, with a near cache, all keys that have not been locked by an optimistic transaction).
> *candidates:* candidate keys to be locked on a primary node (the entries contained in GridDhtTxLocal).
> When the wait-for graph is updated, the requested keys are used as candidates, but the "TxDeadlock.toString" method uses the candidates received from messages.
> 1) This produces an incorrect error message.
> Example (the reported waits TX2 -> TX1 -> TX3 do not form a cycle):
> K1: TX1 holds lock, TX2 waits lock.
> K2: TX3 holds lock, TX1 waits lock.
> Transactions:
> TX1 [txId=GridCacheVersion [topVer=118090802, order=1506610794980, nodeOrder=1], nodeId=f03b1ae3-a100-479c-9671-11d5cef00000, threadId=455]
> TX2 [txId=GridCacheVersion [topVer=118090802, order=1506610794980, nodeOrder=2], nodeId=2c0c0e78-cab2-4b23-a985-4965e4200001, threadId=456]
> TX3 [txId=GridCacheVersion [topVer=118090802, order=1506610794980, nodeOrder=3], nodeId=3340dc48-f1a1-4ea8-8742-19b314300002, threadId=457]
> Keys:
> K1 [key=6, cache=cache]
> K2 [key=1, cache=cache]
> 2) Deadlock detection (DD) can detect a different deadlock that was not the cause of the timeout, but that would have been the cause if the current deadlock had not happened.
> These are very rare situations, but they can happen.
> I see several solutions:
> * Simply generate a correct message.
> * Log a warning and continue detecting.
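For item 1 of the description, the mismatch can be checked by hand: in the example, TX2 waits for TX1 (on K1) and TX1 waits for TX3 (on K2), and no edge leads back to TX2, so the listed edges contain no cycle and do not describe a deadlock. The sketch below makes that check explicit on a wait-for graph. The class and method names are hypothetical (not Ignite API), and it assumes each transaction waits for at most one other transaction at a time.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not Ignite API): checks whether the "holds / waits"
// edges printed in a deadlock message actually form a cycle.
final class WaitForCycleCheck {

    // waitsFor maps waiter -> holder ("waiter waits for a lock held by holder").
    // Assumption: each transaction waits for at most one other transaction.
    static boolean hasCycle(Map<String, String> waitsFor) {
        for (String start : waitsFor.keySet()) {
            Set<String> seen = new HashSet<>();
            String tx = start;
            // Follow the chain until it ends (null) or revisits a transaction.
            while (tx != null && seen.add(tx)) {
                tx = waitsFor.get(tx);
            }
            if (tx != null) {
                return true; // revisited a transaction: the chain loops
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Edges from the example in the issue:
        // K1: TX1 holds lock, TX2 waits lock -> TX2 waits for TX1
        // K2: TX3 holds lock, TX1 waits lock -> TX1 waits for TX3
        Map<String, String> reported = new HashMap<>();
        reported.put("TX2", "TX1");
        reported.put("TX1", "TX3");
        System.out.println("cycle in reported edges: " + hasCycle(reported));
    }
}
```

Running it prints `cycle in reported edges: false`; adding an edge TX3 -> TX2 would close the chain and make `hasCycle` return true.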

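The two options listed at the end of the description could be combined. A minimal sketch, assuming the same simplified wait-for graph (one outgoing edge per transaction, built from the requested keys): follow the chain from the timed-out transaction, report a deadlock only if the chain returns to that transaction, and build the message from exactly the edges that were traversed. If the chain instead runs into a cycle that does not contain the timed-out transaction, the sketch logs a warning through `java.util.logging` and returns an empty result rather than naming that cycle as the cause. `TimeoutDeadlockCheck` and `deadlockCausingTimeout` are hypothetical names, not part of Ignite, and near caches are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical sketch, not Ignite's deadlock detector.
final class TimeoutDeadlockCheck {
    private static final Logger LOG = Logger.getLogger(TimeoutDeadlockCheck.class.getName());

    // waitsFor: waiter -> holder edges built from the requested keys
    //           (assumption: at most one outgoing edge per transaction).
    // Returns the cycle through timedOut as "waiter -> holder" strings,
    // or an empty list if no cycle passes through timedOut.
    static List<String> deadlockCausingTimeout(Map<String, String> waitsFor, String timedOut) {
        Set<String> path = new LinkedHashSet<>(); // keeps traversal order
        String tx = timedOut;
        while (tx != null && path.add(tx)) {
            tx = waitsFor.get(tx);
        }
        if (tx == null) {
            return List.of(); // chain ended: an ordinary timeout
        }
        if (!tx.equals(timedOut)) {
            // A cycle exists further down the chain but does not contain
            // the timed-out transaction: warn instead of reporting it.
            LOG.warning("Cycle found that does not involve " + timedOut + " (entered at " + tx + ")");
            return List.of();
        }
        // Build the message from exactly the edges used for detection.
        List<String> cycle = new ArrayList<>();
        for (String waiter : path) {
            cycle.add(waiter + " -> " + waitsFor.get(waiter));
        }
        return cycle;
    }

    public static void main(String[] args) {
        Map<String, String> g = new HashMap<>();
        g.put("TX1", "TX2");
        g.put("TX2", "TX1");
        System.out.println(deadlockCausingTimeout(g, "TX1"));
    }
}
```

For the two-transaction cycle in `main` it prints `[TX1 -> TX2, TX2 -> TX1]`. What "continue detecting" should mean inside Ignite is left open here; the sketch only shows where the warning would be emitted.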


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)