Posted to issues@ignite.apache.org by "Mikhail Petrov (Jira)" <ji...@apache.org> on 2022/09/20 13:41:00 UTC

[jira] [Created] (IGNITE-17731) Possible LRT in case of postponed GridDhtLockRequest

Mikhail Petrov created IGNITE-17731:
---------------------------------------

Summary: Possible LRT in case of postponed GridDhtLockRequest
Key: IGNITE-17731
URL: https://issues.apache.org/jira/browse/IGNITE-17731
Project: Ignite
Issue Type: Bug
Reporter: Mikhail Petrov

Let's assume the following scenario:

1. The TX coordinator starts a transaction and sends GridDhtLockRequest to the "near" nodes.
2. Some GridDhtLockRequest messages are delayed by the network.
3. Not all "near" nodes receive GridDhtLockRequest, and as a result not all of them respond to the TX coordinator.
4. The TX coordinator aborts the TX by timeout.
5. The completed TX ID is stored in IgniteTxManager#completedVersHashMap.
6. The TX load continues (assume puts into a TX cache), and the record about the completed TX described above is evicted from the map.
7. The GridDhtLockRequest from step 2 is finally received by the "near" nodes. They lock the keys, start the local TX, and respond to the TX coordinator.
However, the TX coordinator currently ignores the GridDhtLockResponse and does nothing, because the info about the initial TX has been evicted.

As a result, the near nodes keep holding the key locks and wait for the next steps of the TX protocol, which will never happen because the TX has already completed.
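
The eviction in steps 5-7 can be illustrated with a toy model. This is only a sketch and not Ignite code: the class CompletedTxEvictionSketch, its methods, the numeric TX IDs, and the capacity value are all hypothetical, and the real IgniteTxManager#completedVersHashMap may use a different eviction policy. The sketch only assumes a bounded map that drops its eldest entry once it is full.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a bounded "completed TX" map (hypothetical, not Ignite code). */
public class CompletedTxEvictionSketch {
    /** Bounded, insertion-ordered map: the eldest entry is dropped once capacity is exceeded. */
    public static <K, V> Map<K, V> boundedMap(final int capacity) {
        return new LinkedHashMap<K, V>() {
            @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /**
     * Returns true if the record of the aborted TX is still present
     * after laterTxs more transactions have completed.
     */
    public static boolean remembersAbortedTx(int capacity, int laterTxs) {
        Map<Long, Boolean> completed = boundedMap(capacity);

        long abortedTxId = 0L;

        // Steps 4-5: the TX is rolled back by timeout and recorded as completed.
        completed.put(abortedTxId, Boolean.FALSE);

        // Step 6: the TX load continues and newer records push out older ones.
        for (long id = 1; id <= laterTxs; id++)
            completed.put(id, Boolean.TRUE);

        // Step 7: a late response can only be matched if the record survived.
        return completed.containsKey(abortedTxId);
    }

    public static void main(String[] args) {
        System.out.println(remembersAbortedTx(4, 2)); // few later TXs: record survives
        System.out.println(remembersAbortedTx(4, 4)); // enough later TXs: record evicted
    }
}
```

With a capacity of 4, the record survives 2 later transactions but is gone after 4, which is the point at which a late lock response can no longer be attributed to a known, already completed TX.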

As a workaround, the TX can be explicitly killed on the near node.

It is proposed to handle this situation and not acquire locks on the near node if neither the TX coordinator nor other cluster nodes have any notion of the TX to which the current lock request belongs.
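
The idea of the proposal might be sketched as follows. This is a minimal illustration, not the actual fix: the class LateLockRequestSketch, the txKnownToCluster predicate, and the string keys are hypothetical, and the issue does not specify how a near node would learn whether the cluster still knows the TX (for example, by an extra message round trip). The sketch only shows the guard: check first, lock second.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongPredicate;

/** Toy near node that refuses lock requests for unknown TXs (hypothetical, not Ignite code). */
public class LateLockRequestSketch {
    /** Keys currently locked on this node. */
    private final Set<String> lockedKeys = new HashSet<>();

    /** Assumed check: does the coordinator or any other node still know this TX? */
    private final LongPredicate txKnownToCluster;

    public LateLockRequestSketch(LongPredicate txKnownToCluster) {
        this.txKnownToCluster = txKnownToCluster;
    }

    /**
     * Handles a (possibly delayed) lock request.
     * Returns true if the lock was acquired, false if the request was rejected.
     */
    public boolean onLockRequest(long txId, String key) {
        // Reject before touching any locks if nobody knows the TX anymore.
        if (!txKnownToCluster.test(txId))
            return false;

        // Set.add returns false if the key is already locked.
        return lockedKeys.add(key);
    }

    public boolean isLocked(String key) {
        return lockedKeys.contains(key);
    }
}
```

For example, with `new LateLockRequestSketch(txId -> txId == 1L)`, a request for TX 1 acquires its lock, while a delayed request for TX 2 is rejected and leaves no lock behind on the near node.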
