You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Andrey Mashenkov (Jira)" <ji...@apache.org> on 2020/05/27 12:13:00 UTC

[jira] [Created] (IGNITE-13082) Deadlock between topology update and CQ registration.

Andrey Mashenkov created IGNITE-13082:
-----------------------------------------

             Summary: Deadlock between topology update and CQ registration.
                 Key: IGNITE-13082
                 URL: https://issues.apache.org/jira/browse/IGNITE-13082
             Project: Ignite
          Issue Type: Task
            Reporter: Andrey Mashenkov


Relevant stack traces:Relevant stack traces:
{code}"sys-stripe-0-#65483%cache.BinaryMetadataRegistrationInsideEntryProcessorTest0%" #85739 prio=5 os_prio=0 tid=0x00007fda80139800 nid=0x5618 waiting on condition [0x00007fdc018e8000]   java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for  <0x00000000fc138298> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.lockListenerReadLock(GridCacheMapEntry.java:5032) at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2262) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2574) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2034) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1854) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3239) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:139) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109) at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308) at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1626) at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1246) at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:142) at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1137) at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:50) at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119) at java.lang.Thread.run(Thread.java:748)\{code}
{code}"disco-notifier-worker-#65517%cache.BinaryMetadataRegistrationInsideEntryProcessorTest0%" #85777 prio=5 os_prio=0 tid=0x00007fda800a9800 nid=0x5639 waiting on condition [0x00007fdc006d9000]   java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for  <0x00000000fbde5f30> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.localUpdateCounters(GridDhtPartitionTopologyImpl.java:2810) at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$2.onRegister(CacheContinuousQueryHandler.java:379) at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.registerListener(CacheContinuousQueryManager.java:946) at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.register(CacheContinuousQueryHandler.java:628) at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.registerHandler(GridContinuousProcessor.java:1818) at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.processStartRequest(GridContinuousProcessor.java:1444) at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.access$400(GridContinuousProcessor.java:113) at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:205) at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:196) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:639) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:510) - locked <0x00000000fb58bbc8> (a java.lang.Object) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4$$Lambda$91/1259207939.run(Unknown Source) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2650) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2688) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119) at java.lang.Thread.run(Thread.java:748)\{code}
The problematic code is in \{{CacheContinuousQueryManager.registerListener}}. It first acquires CQ listener write lock, and then it acquires topology read lock when update counters are being read.During cache update, we first acquire topology read lock and then acquire CQ listener read lock.If some other thread will try to acquire topology write lock in between, those two threads are deadlocked.
The issue seems to be introduced by IGNITE-10755 (topology read lock is inserted inside CQ write lock), so only 2.7/8.7 and upwards are affected.
The issue causes occasional Cache 1 suite hangs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)