Posted to issues@ignite.apache.org by "Evgeny Stanilovsky (Jira)" <ji...@apache.org> on 2023/01/11 07:24:03 UTC

[jira] [Updated] (IGNITE-18326) SQL query may forget to finish implicit TX.

     [ https://issues.apache.org/jira/browse/IGNITE-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Stanilovsky updated IGNITE-18326:
----------------------------------------
    Description: 
Scenario
* Start a grid of [CMG, MetaStorage, DataNode] nodes.
* Stop the DataNode.
* Run an SQL query and wait on the future until it times out.
* Observe: the query can't be started because the DataNode holding the partition is absent, and the future throws CancelledException. There is no way to close the cursor because the future failed, and the implicit transaction object can't be accessed.
* Start the DataNode again.
* Run the same query again.
* Observe: the query fails because it can't lock the entry, since the previous Tx was neither committed nor rolled back.

Most likely, no one reads from the cursor, or we forget to close the cursor when the session is closed.
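
To illustrate the concern above, here is a minimal sketch of finishing an implicit transaction on every completion path of the query future, including the failure path where the caller never gets a cursor to close. It uses only the JDK. FakeTx and runWithImplicitTx are hypothetical names invented for this illustration and are not the Ignite API or the actual fix.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

public class ImplicitTxSketch {
    /** Hypothetical stand-in for an implicit transaction; not the Ignite API. */
    public static final class FakeTx {
        public final AtomicBoolean finished = new AtomicBoolean();
        public volatile boolean committed;

        /** Marks the tx as committed unless it was already finished. */
        public void commit() {
            if (finished.compareAndSet(false, true)) {
                committed = true;
            }
        }

        /** Marks the tx as finished without committing. */
        public void rollback() {
            finished.compareAndSet(false, true);
        }
    }

    /**
     * Ties the tx outcome to the query future: commit on success,
     * roll back on any failure (timeout, cancellation, missing node).
     */
    public static <T> CompletableFuture<T> runWithImplicitTx(FakeTx tx, CompletableFuture<T> query) {
        return query.whenComplete((result, error) -> {
            if (error != null) {
                tx.rollback();
            } else {
                tx.commit();
            }
        });
    }

    public static void main(String[] args) {
        FakeTx tx = new FakeTx();
        CompletableFuture<String> query = new CompletableFuture<>();
        CompletableFuture<String> guarded = runWithImplicitTx(tx, query);

        // Simulate the query failing before a cursor is ever returned.
        query.completeExceptionally(new CancellationException("query timed out"));

        System.out.println("finished=" + tx.finished.get() + ", committed=" + tx.committed
                + ", failed=" + guarded.isCompletedExceptionally());
    }
}
```

The point of the sketch is that the rollback is attached to the future itself, so it does not depend on the caller obtaining and closing a cursor.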

---- *UPDATED* ---

After some investigation I found that the tx is committed and rolled back correctly; the only problem I can find for now is the one mentioned above: "it can't lock the entry due to previous Tx". See [1], the test called *testImplicitTransaction0*: it performs all the steps described above by Andrey. Sometimes it passes, but frequently we get:


{noformat}
2023-01-09 14:41:53:674 +0300 [WARNING][ForkJoinPool.commonPool-worker-11][ReplicaManager] Failed to process replica request [request=ReadWriteMultiRowReplicaRequestImpl [binaryRows=ArrayList [org.apache.ignite.internal.schema.row.Row@57114800], commitPartitionId=6c2142ce-3faa-4bc4-8ce7-7a5333bd92b9_part_0, groupId=6c2142ce-3faa-4bc4-8ce7-7a5333bd92b9_part_0, requestType=RW_INSERT_ALL, term=3, timestamp=HybridTimestamp [physical=1673264513670, logical=0], transactionId=000edb17-d281-0000-8a18-8deb88e18dfa]]
java.util.concurrent.CompletionException: org.apache.ignite.internal.tx.LockException: IGN-TX-5 TraceId:aa3bc7b7-f098-40eb-b1e1-a902e13933e0 Failed to acquire a lock due to a conflict [txId=000edb17-d281-0000-8a18-8deb88e18dfa, waiter=WaiterImpl [txId=000edb17-bb72-0000-8a18-8deb88e18dfa, upgraded=false, prevLockMode=null, lockMode=X, locked=true, ex=null, isDone=true]]
	Suppressed: java.lang.RuntimeException: This is a trimmed root
		at org.apache.ignite.internal.testframework.IgniteTestUtils.await(IgniteTestUtils.java:747)
		at org.apache.ignite.internal.testframework.IgniteTestUtils.await(IgniteTestUtils.java:767)
		at org.apache.ignite.internal.sql.engine.util.CursorUtils.getAllFromCursor(CursorUtils.java:70)
		at org.apache.ignite.internal.cluster.AbstractClusterStartStopTest.sql(AbstractClusterStartStopTest.java:269)
Caused by: org.apache.ignite.internal.tx.LockException: IGN-TX-5 TraceId:aa3bc7b7-f098-40eb-b1e1-a902e13933e0 Failed to acquire a lock due to a conflict [txId=000edb17-d281-0000-8a18-8deb88e18dfa, waiter=WaiterImpl [txId=000edb17-bb72-0000-8a18-8deb88e18dfa, upgraded=false, prevLockMode=null, lockMode=X, locked=true, ex=null, isDone=true]]
	at app//org.apache.ignite.internal.tx.impl.HeapLockManager$LockState.isWaiterReadyToNotify(HeapLockManager.java:240)
	at app//org.apache.ignite.internal.tx.impl.HeapLockManager$LockState.tryAcquire(HeapLockManager.java:197)
	at app//org.apache.ignite.internal.tx.impl.HeapLockManager.acquire(HeapLockManager.java:76)
	at app//org.apache.ignite.internal.table.distributed.HashIndexLocker.locksForLookup(HashIndexLocker.java:68)
	at app//org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.resolveRowByPk(PartitionReplicaListener.java:1035)
	at app//org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processMultiEntryAction(PartitionReplicaListener.java:1228)
	at app//org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$invoke$0(PartitionReplicaListener.java:255)
{noformat}
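
The kind of conflict reported in the log can be illustrated with a deliberately simplified sketch: an exclusive lock held by one transaction makes a second transaction's acquire fail until the first one releases its locks. LockConflictSketch and its methods are hypothetical and far simpler than HeapLockManager; the two transaction ids are copied from the log only to make the example concrete, and which of them actually held the lock is an assumption based on the waiter entry (lockMode=X, locked=true).

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class LockConflictSketch {
    /** Maps a key to the id of the tx holding its exclusive lock (hypothetical model). */
    private final Map<String, UUID> owners = new ConcurrentHashMap<>();

    /** Returns true if txId now holds (or already held) the exclusive lock on key. */
    public boolean tryAcquireExclusive(UUID txId, String key) {
        UUID holder = owners.putIfAbsent(key, txId);
        return holder == null || holder.equals(txId);
    }

    /** Releases every lock held by txId, as a commit or rollback is expected to do. */
    public void releaseAll(UUID txId) {
        owners.values().removeIf(txId::equals);
    }

    public static void main(String[] args) {
        LockConflictSketch locks = new LockConflictSketch();
        // Ids taken from the log; their roles here are an assumption.
        UUID previousTx = UUID.fromString("000edb17-bb72-0000-8a18-8deb88e18dfa");
        UUID currentTx = UUID.fromString("000edb17-d281-0000-8a18-8deb88e18dfa");

        boolean first = locks.tryAcquireExclusive(previousTx, "pk-1");
        boolean conflicting = locks.tryAcquireExclusive(currentTx, "pk-1");

        // Once the previous tx releases its locks, the same request succeeds.
        locks.releaseAll(previousTx);
        boolean retried = locks.tryAcquireExclusive(currentTx, "pk-1");

        System.out.println("first=" + first + ", conflicting=" + conflicting + ", retried=" + retried);
    }
}
```

In this model the second query can only succeed after the locks of the first transaction are released, which is why the timing of lock release relative to the retry matters even if the first transaction is eventually finished correctly.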


[1] https://github.com/gridgain/apache-ignite-3/tree/ignite-18171-new-test

  was:
Scenario
* Start a grid of [CMG, MetaStorage, DataNode] nodes.
* Stop the DataNode.
* Run an SQL query and wait on the future until it times out.
* Observe: the query can't be started because the DataNode holding the partition is absent, and the future throws CancelledException. There is no way to close the cursor because the future failed, and the implicit transaction object can't be accessed.
* Start the DataNode again.
* Run the same query again.
* Observe: the query fails because it can't lock the entry, since the previous Tx was neither committed nor rolled back.

Most likely, no one reads from the cursor, or we forget to close the cursor when the session is closed.
A reproducer is in the IGNITE-18171 PR, in the ignite-runner module: org.apache.ignite.internal.cluster.ItNodeRestartTest#testImplicitTransaction


> SQL query may forget to finish implicit TX.
> -------------------------------------------
>
>                 Key: IGNITE-18326
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18326
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Andrey Mashenkov
>            Assignee: Evgeny Stanilovsky
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
