
Posted to user@ignite.apache.org by Thomas Kramer <do...@gmx.de> on 2022/09/05 20:48:25 UTC

Deadlock analysis

I'm experiencing a transaction deadlock and would like to understand how
to find out the cause of it.

Here is a snippet from the log:

Deadlock detected:

K1: TX1 holds lock, TX2 waits lock.
K2: TX2 holds lock, TX1 waits lock.

Transactions:

TX1 [txId=GridCacheVersion [topVer=273263429, order=1661784224309, nodeOrder=4, dataCenterId=0], nodeId=8841e579-43b5-4c23-a690-1208bdd34d8c, threadId=30]
TX2 [txId=GridCacheVersion [topVer=273263429, order=1661784224257, nodeOrder=14, dataCenterId=0], nodeId=f08415e4-0ae7-45cd-aeca-2033267e92c3, threadId=3815]

Keys:

K1 [key=e9228c01-b17e-49a5-bc7f-14c1541d9916, cache=TaskList]
K2 [key=e9228c01-b17e-49a5-bc7f-14c1541d9916, cache=MediaSets]

I can see that the same key (e9228c01-b17e-49a5-bc7f-14c1541d9916) is
used in transactions on two different nodes.

Ignite documentation says: "One major rule that you must follow when
working with distributed transactions is that locks for the keys
participating in a transaction must be acquired in the same order.
Violating this rule can lead to a distributed deadlock."

If keys must always be locked in the same order, how can a single key
cause a deadlock here? Is it because it's in two different caches? Maybe
I don't fully understand how transaction locking works.
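
One way to read the report is to treat each lock as a (cache, key) pair and build a wait-for graph from the K1/K2 lines. K1 (TaskList) and K2 (MediaSets) share a key but are two different locks. TX1 and TX2 each hold one of them while waiting for the other.

The Java sketch below models only the report. `WaitForGraph`, `LockId` and `LockState` are hypothetical names, not Ignite classes. This is not how Ignite's own deadlock detection is implemented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only: WaitForGraph, LockId and LockState are hypothetical
// names, not Ignite classes. It models the deadlock report, not Ignite internals.
public final class WaitForGraph {

    // A lock is identified by cache AND key: the same key in two caches is two locks.
    record LockId(String cache, String key) {}

    // One "Kn: X holds lock, Y waits lock." line from the report.
    record LockState(LockId lock, String holder, String waiter) {}

    // Edge waiter -> holder means "waiter is blocked by holder".
    static Map<String, Set<String>> buildGraph(List<LockState> states) {
        Map<String, Set<String>> edges = new TreeMap<>();
        for (LockState s : states) {
            edges.computeIfAbsent(s.waiter(), k -> new TreeSet<>()).add(s.holder());
        }
        return edges;
    }

    // Returns a cycle such as [TX1, TX2, TX1], or an empty list if there is none.
    static List<String> findCycle(Map<String, Set<String>> edges) {
        for (String start : edges.keySet()) {
            List<String> path = new ArrayList<>();
            if (dfs(start, edges, path, new HashSet<>())) {
                String repeated = path.get(path.size() - 1);
                // Drop any prefix that leads into the cycle but is not part of it.
                return new ArrayList<>(path.subList(path.indexOf(repeated), path.size()));
            }
        }
        return List.of();
    }

    private static boolean dfs(String node, Map<String, Set<String>> edges,
                               List<String> path, Set<String> onPath) {
        path.add(node);
        if (!onPath.add(node)) {
            return true; // node is already on the current path: cycle closed
        }
        for (String next : edges.getOrDefault(node, Set.of())) {
            if (dfs(next, edges, path, onPath)) {
                return true;
            }
        }
        onPath.remove(node);
        path.remove(path.size() - 1);
        return false;
    }

    public static void main(String[] args) {
        String key = "e9228c01-b17e-49a5-bc7f-14c1541d9916";
        LockId k1 = new LockId("TaskList", key);
        LockId k2 = new LockId("MediaSets", key);
        System.out.println("K1 and K2 are the same lock: " + k1.equals(k2)); // false
        List<LockState> report = List.of(
                new LockState(k1, "TX1", "TX2"),  // K1: TX1 holds lock, TX2 waits lock.
                new LockState(k2, "TX2", "TX1")); // K2: TX2 holds lock, TX1 waits lock.
        System.out.println("Wait-for cycle: " + findCycle(buildGraph(report))); // [TX1, TX2, TX1]
    }
}
```

For the second question, the nodeId and threadId fields in the Transactions section most likely identify the node and thread that started each transaction (threadId=30 and threadId=3815 here). That may help narrow down which code path on each node opened it.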

Is there a code sample that demonstrates a potential violation? And how
can I find where in my source code the issue happens on both nodes?

Thanks,
Thomas.

Re: Deadlock analysis

Posted by Thomas Kramer <do...@gmx.de>.

Perfect, thanks a lot. I get it now!


On 15.09.22 12:33, Stephen Darlington wrote:
> Ignite locks rows, not caches, so no, that would not cause a deadlock.
>
> This could cause a deadlock:
>
> First:
>
> #1 tx.start();
> #2 cacheA.put(1, 1);
> #3 cacheB.put(2, 2);
> #4 tx.commit();
>
> Second:
>
> #5 tx.start();
> #6 cacheB.put(2, 2);
> #7 cacheA.put(1, 1);
> #8 tx.commit();
>
> Both these operations can execute in parallel. The first
> transaction starts and locks record 1 in cache A. Meanwhile, the second
> starts and locks record 2 in cache B.
>
> Now, the first transaction tries to lock record 2 in cache B, but it
> can’t because it’s locked by the second transaction.
>
> The second transaction tries to lock record 1 in cache A, but it can’t
> because it’s locked by the first transaction.
>
> The solution depends on your use case. You can lock your records in a
> predictable order, or you can switch to optimistic locking (in which
> case one of your transactions will fail on commit).

Re: Deadlock analysis

Posted by Stephen Darlington <st...@gridgain.com>.

Ignite locks rows, not caches, so no, that would not cause a deadlock.

This could cause a deadlock:

First:

#1 tx.start();
#2 cacheA.put(1, 1);
#3 cacheB.put(2, 2);
#4 tx.commit();

Second:

#5 tx.start();
#6 cacheB.put(2, 2);
#7 cacheA.put(1, 1);
#8 tx.commit();

Both these operations can execute in parallel. The first transaction starts and locks record 1 in cache A. Meanwhile, the second starts and locks record 2 in cache B.

Now, the first transaction tries to lock record 2 in cache B, but it can’t because it’s locked by the second transaction.

The second transaction tries to lock record 1 in cache A, but it can’t because it’s locked by the first transaction.

The solution depends on your use case. You can lock your records in a predictable order, or you can switch to optimistic locking (in which case one of your transactions will fail on commit).
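
The first suggestion, locking records in a predictable order, can be sketched locally. Every transaction sorts the (cache, key) pairs it is about to write and acquires them in that order. Two transactions touching the same rows then can never each hold one lock while waiting for the other.

The Java code below is an analogy with hypothetical names (`OrderedLocking`, `LockId`), using JDK locks in place of Ignite's per-entry locks. In an application using Ignite, the equivalent would presumably be to sort the keys yourself before issuing the put calls inside a pessimistic transaction.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Local sketch of "lock your records in a predictable order". LockId and the lock
// table are hypothetical; JDK locks stand in for Ignite's per-entry locks.
public final class OrderedLocking {

    // Total order on locks: by cache name, then by key.
    record LockId(String cache, int key) implements Comparable<LockId> {
        private static final Comparator<LockId> ORDER =
                Comparator.comparing(LockId::cache).thenComparingInt(LockId::key);

        @Override
        public int compareTo(LockId other) {
            return ORDER.compare(this, other);
        }
    }

    private final ConcurrentMap<LockId, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Locks every id in sorted order (whatever order the caller lists them in),
    // runs the body, then releases the locks in reverse order.
    void inTx(Collection<LockId> ids, Runnable body) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (LockId id : new TreeSet<>(ids)) {
                ReentrantLock lock = locks.computeIfAbsent(id, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            body.run();
        } finally {
            Collections.reverse(held);
            held.forEach(ReentrantLock::unlock);
        }
    }

    // Runs the two transactions (cacheA key 1, cacheB key 2, listed in opposite
    // orders) in parallel, `iterations` times each. Returns completed transactions,
    // or -1 if they did not finish in time.
    static int runBoth(int iterations) throws InterruptedException {
        OrderedLocking db = new OrderedLocking();
        LockId a1 = new LockId("cacheA", 1);
        LockId b2 = new LockId("cacheB", 2);
        AtomicInteger commits = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                db.inTx(List.of(a1, b2), commits::incrementAndGet); // "First"
            }
        });
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                db.inTx(List.of(b2, a1), commits::incrementAndGet); // "Second", opposite order
            }
        });
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            pool.shutdownNow();
            return -1;
        }
        return commits.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runBoth(10_000)); // 20000
    }
}
```

Ordering by cache name and then by key is an arbitrary choice. Any total order works as long as every code path uses the same one.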

> On 15 Sep 2022, at 10:56, Thomas Kramer <do...@gmx.de> wrote:
> 
> Modifying the previous example: would this still potentially result in a deadlock?
> 
> First:
> 
> #1 tx.start();
> #2 cacheA.put(1, 1);
> #3 cacheB.put(2, 2);
> #4 tx.commit();
> 
> Second:
> 
> #5 tx.start();
> #6 cacheB.put(1, 1);
> #7 cacheA.put(2, 2);
> #8 tx.commit();
> 
> Ignite locks cacheA on line #2 in the first thread. In parallel, the second thread locks cacheB on line #6 and then has to wait on line #7 for the locked cacheA. At the same time, the first thread must wait on line #3 because the second thread has already locked cacheB. So neither thread can continue. Is that understanding correct?
> 
> Do the keys matter for the deadlock in this scenario, or is the whole cache locked regardless of the key value?
> 
> Thanks!

Re: Deadlock analysis

Posted by Thomas Kramer <do...@gmx.de>.

Modifying the previous example: would this still potentially result in a
deadlock?

First:

#1 tx.start();
#2 cacheA.put(1, 1);
#3 cacheB.put(2, 2);
#4 tx.commit();

Second:

#5 tx.start();
#6 cacheB.put(1, 1);
#7 cacheA.put(2, 2);
#8 tx.commit();

Ignite locks cacheA on line #2 in the first thread. In parallel, the
second thread locks cacheB on line #6 and then has to wait on line #7
for the locked cacheA. At the same time, the first thread must wait on
line #3 because the second thread has already locked cacheB. So neither
thread can continue. Is that understanding correct?

Do the keys matter for the deadlock in this scenario, or is the whole
cache locked regardless of the key value?
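
The question of whether the keys matter can be sketched by modeling each lock as a (cache, key) pair rather than as a whole cache. That model is consistent with the reply in this thread that Ignite locks rows, not caches. Under it, the two transactions in the modified example share no lock at all.

`LockOverlap` and `LockId` are hypothetical names used only for illustration. A shared lock is necessary for a deadlock but not sufficient; the shared locks must also be taken in opposite orders.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not an Ignite API: a lock is modeled as a (cache, key) pair,
// so two transactions can only block each other on pairs they both write.
public final class LockOverlap {

    record LockId(String cache, int key) {}

    // Locks needed by both transactions; an empty result means they never contend.
    static Set<LockId> shared(Set<LockId> tx1, Set<LockId> tx2) {
        Set<LockId> common = new HashSet<>(tx1);
        common.retainAll(tx2);
        return common;
    }

    public static void main(String[] args) {
        Set<LockId> first = Set.of(new LockId("cacheA", 1), new LockId("cacheB", 2));
        // Modified example: "Second" writes cacheB key 1 and cacheA key 2.
        Set<LockId> second = Set.of(new LockId("cacheB", 1), new LockId("cacheA", 2));
        System.out.println(shared(first, second).isEmpty()); // true: no common lock, no deadlock
        // Earlier pattern: both write cacheA key 1 and cacheB key 2.
        Set<LockId> secondSameRows = Set.of(new LockId("cacheB", 2), new LockId("cacheA", 1));
        System.out.println(shared(first, secondSameRows).size()); // 2: taken in opposite order, they can deadlock
    }
}
```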

Thanks!


On 15.09.22 11:36, Stephen Darlington wrote:
> The important part is that they’re both waiting for each other to
> complete. Whether it’s one cache or ten is not significant.

Re: Deadlock analysis

Posted by Stephen Darlington <st...@gridgain.com>.

The important part is that they’re both waiting for each other to complete. Whether it’s one cache or ten is not significant.

> On 14 Sep 2022, at 12:44, Thomas Kramer <do...@gmx.de> wrote:
> 
> OK, that makes sense. However, in my log below, the deadlock involves two different caches. How does this work?
> 
> K1 [key=e9228c01-b17e-49a5-bc7f-14c1541d9916, cache=TaskList]
> K2 [key=e9228c01-b17e-49a5-bc7f-14c1541d9916, cache=MediaSets]

Re: Deadlock analysis

Posted by Thomas Kramer <do...@gmx.de>.

OK, that makes sense. However, in my log below, the deadlock involves two
different caches. How does this work?

K1 [key=e9228c01-b17e-49a5-bc7f-14c1541d9916, cache=TaskList]
K2 [key=e9228c01-b17e-49a5-bc7f-14c1541d9916, cache=MediaSets]


On 14.09.22 11:59, Николай Ижиков wrote:
> Basically, a deadlock looks like the following:
>
> First:
>
> tx.start();
>
> cache.put(1, 1);
> cache.put(2, 2);
>
> tx.commit();
>
> Second:
>
> tx.start();
>
> cache.put(2, 2);
> cache.put(1, 1);
>
> tx.commit();
>
> So if «first» locks key=1 and «second» locks key=2 concurrently, both
> processes hang: «first» waiting for key=2 and «second» for key=1.

Re: Deadlock analysis

Posted by Николай Ижиков <ni...@apache.org>.

Basically, a deadlock looks like the following:

First:

tx.start();

cache.put(1, 1);
cache.put(2, 2);

tx.commit();

Second:

tx.start();

cache.put(2, 2);
cache.put(1, 1);

tx.commit();

So if «first» locks key=1 and «second» locks key=2 concurrently, both processes hang: «first» waiting for key=2 and «second» for key=1.
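
The «first»/«second» interleaving can be reproduced locally without an Ignite cluster. The Java sketch below is an analogy, not Ignite code:

- `ReentrantLock` objects stand in for the per-key locks.
- Barriers force the unlucky interleaving.
- `tryLock` with a timeout replaces the indefinite wait, so the demo reports the deadlock instead of hanging.

The class name `OppositeOrderDeadlock` is hypothetical. As far as the Ignite documentation describes it, the "Deadlock detected" report in the original question is produced when a transaction with a timeout times out.

```java
import java.util.Arrays;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Local simulation, not Ignite code: two JDK ReentrantLocks stand in for the
// per-key locks a pessimistic transaction takes on cache.put(1, 1) and cache.put(2, 2).
public final class OppositeOrderDeadlock {

    // Returns whether "first" and "second" managed to get their second lock.
    static boolean[] run(long timeoutMs) throws Exception {
        ReentrantLock key1 = new ReentrantLock();
        ReentrantLock key2 = new ReentrantLock();
        CyclicBarrier bothHoldFirstKey = new CyclicBarrier(2);
        CyclicBarrier bothTriedSecondKey = new CyclicBarrier(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // "first" locks key=1 then key=2; "second" locks key=2 then key=1.
            Future<Boolean> first = pool.submit(() ->
                    tx(key1, key2, timeoutMs, bothHoldFirstKey, bothTriedSecondKey));
            Future<Boolean> second = pool.submit(() ->
                    tx(key2, key1, timeoutMs, bothHoldFirstKey, bothTriedSecondKey));
            return new boolean[] {first.get(), second.get()};
        } finally {
            pool.shutdown();
        }
    }

    private static boolean tx(ReentrantLock firstKey, ReentrantLock secondKey, long timeoutMs,
                              CyclicBarrier afterFirst, CyclicBarrier afterSecond) throws Exception {
        firstKey.lock();                 // like the first cache.put(...)
        try {
            afterFirst.await();          // both transactions now hold one key each
            // A real deadlock would wait forever; the timeout lets the demo report it instead.
            boolean got = secondKey.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            afterSecond.await();         // keep the first key until the other side has tried too
            if (got) {
                secondKey.unlock();
            }
            return got;
        } finally {
            firstKey.unlock();           // like tx.commit() or rollback releasing the lock
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(run(200))); // [false, false]
    }
}
```

Neither side releases its first key before both have tried for their second key. Both `tryLock` calls therefore time out, which is exactly the mutual wait described above.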

> On 14 Sep 2022, at 12:39, Thomas Kramer <do...@gmx.de> wrote:
> 
> Hi,
> 
> does the group here have any suggestion on this? I'm trying to find the root of the deadlock we're getting on the production servers from time to time.
> 
> So I'm trying to better understand why this can happen, and I'm looking for sample code that demonstrates such a scenario.
> 
> Thanks!

Re: Deadlock analysis

Posted by Thomas Kramer <do...@gmx.de>.

Hi,

does the group here have any suggestion on this? I'm trying to find the
root of the deadlock we're getting on the production servers from time
to time.

So I'm trying to better understand why this can happen, and I'm looking
for sample code that demonstrates such a scenario.

Thanks!


On 05.09.22 22:48, Thomas Kramer wrote:
>
> I'm experiencing a transaction deadlock and would like to understand
> how to find out the cause of it.
>
> Here is a snippet from the log:
>
> Deadlock detected:
>
> K1: TX1 holds lock, TX2 waits lock.
> K2: TX2 holds lock, TX1 waits lock.
>
> Transactions:
>
> TX1 [txId=GridCacheVersion [topVer=273263429, order=1661784224309, nodeOrder=4, dataCenterId=0], nodeId=8841e579-43b5-4c23-a690-1208bdd34d8c, threadId=30]
> TX2 [txId=GridCacheVersion [topVer=273263429, order=1661784224257, nodeOrder=14, dataCenterId=0], nodeId=f08415e4-0ae7-45cd-aeca-2033267e92c3, threadId=3815]
>
> Keys:
>
> K1 [key=e9228c01-b17e-49a5-bc7f-14c1541d9916, cache=TaskList]
> K2 [key=e9228c01-b17e-49a5-bc7f-14c1541d9916, cache=MediaSets]
>
> I can see that the same key (e9228c01-b17e-49a5-bc7f-14c1541d9916) is
> used in transactions on two different nodes.
>
> Ignite documentation says: "One major rule that you must follow when
> working with distributed transactions is that locks for the keys
> participating in a transaction must be acquired in the same order.
> Violating this rule can lead to a distributed deadlock."
>
> If keys must always be locked in the same order, how can a single key
> cause a deadlock here? Is it because it's in two different caches?
> Maybe I don't fully understand how transaction locking works.
>
> Is there a code sample that demonstrates a potential violation? And
> how can I find where in my source code the issue happens on both
> nodes?
>
> Thanks,
> Thomas.
>
>