Posted to issues@ignite.apache.org by "Ignite TC Bot (Jira)" <ji...@apache.org> on 2020/05/28 10:25:00 UTC

[jira] [Commented] (IGNITE-13082) Deadlock between topology update and CQ registration.

    [ https://issues.apache.org/jira/browse/IGNITE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118531#comment-17118531 ] 

Ignite TC Bot commented on IGNITE-13082:
----------------------------------------

{panel:title=Branch: [pull/7858/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=5342710&buildTypeId=IgniteTests24Java8_RunAll]

> Deadlock between topology update and CQ registration.
> -----------------------------------------------------
>
>                 Key: IGNITE-13082
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13082
>             Project: Ignite
>          Issue Type: Task
>    Affects Versions: 2.7
>            Reporter: Andrey Mashenkov
>            Assignee: Andrey Mashenkov
>            Priority: Major
>              Labels: deadlock
>             Fix For: 2.9
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Relevant stack traces:
>  
> {code:java}
> "sys-stripe-0-#65483%cache.BinaryMetadataRegistrationInsideEntryProcessorTest0%" #85739 prio=5 os_prio=0 tid=0x00007fda80139800 nid=0x5618 waiting on condition [0x00007fdc018e8000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000fc138298> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>         at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.lockListenerReadLock(GridCacheMapEntry.java:5032)
>         at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2262)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2574)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2034)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1854)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3239)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:139)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1626)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1246)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:142)
>         at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1137)
>         at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
>         at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> {code:java}
> "disco-notifier-worker-#65517%cache.BinaryMetadataRegistrationInsideEntryProcessorTest0%" #85777 prio=5 os_prio=0 tid=0x00007fda800a9800 nid=0x5639 waiting on condition [0x00007fdc006d9000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000fbde5f30> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.localUpdateCounters(GridDhtPartitionTopologyImpl.java:2810)
>         at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$2.onRegister(CacheContinuousQueryHandler.java:379)
>         at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.registerListener(CacheContinuousQueryManager.java:946)
>         at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.register(CacheContinuousQueryHandler.java:628)
>         at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.registerHandler(GridContinuousProcessor.java:1818)
>         at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.processStartRequest(GridContinuousProcessor.java:1444)
>         at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.access$400(GridContinuousProcessor.java:113)
>         at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:205)
>         at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:196)
>         at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:639)
>         at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:510)
>         - locked <0x00000000fb58bbc8> (a java.lang.Object)
>         at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4$$Lambda$91/1259207939.run(Unknown Source)
>         at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2650)
>         at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2688)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
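> The problematic code is in {{CacheContinuousQueryManager.registerListener}}. It first acquires the CQ listener write lock and then acquires the topology read lock while the update counters are being read. A cache update takes the locks in the opposite order: it first acquires the topology read lock and then the CQ listener read lock. If another thread tries to acquire the topology write lock in between, the two threads deadlock: the queued writer keeps the registration thread from taking the topology read lock, while the cache update still holds its topology read lock and waits for the CQ listener lock.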
> The issue seems to have been introduced by IGNITE-10755 (the topology read lock acquisition was inserted inside the CQ listener write lock).
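
The lock-order inversion described above can be reproduced in isolation. The following is a minimal sketch, not Ignite code: it assumes two plain {{ReentrantReadWriteLock}} instances stand in for the topology lock and the CQ listener lock, and every class, method, and variable name in it is hypothetical. Timed {{tryLock}} calls replace {{lock()}} so that the sketch reports the stall instead of hanging. It relies on the JDK behaviour that a reader without a hold on the lock queues behind a writer already waiting at the head of the queue, even in the default non-fair mode.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone model of the reported scenario; not Ignite code.
public class LockOrderInversionSketch {
    // Timed attempts stand in for lock() so the sketch terminates.
    static final long TIMEOUT_MS = 300;

    /** Returns {cacheUpdateGotListenerLock, cqRegistrationGotTopologyLock}. */
    static boolean[] run() throws InterruptedException {
        ReentrantReadWriteLock topologyLock = new ReentrantReadWriteLock();
        ReentrantReadWriteLock listenerLock = new ReentrantReadWriteLock();
        CountDownLatch firstLocksHeld = new CountDownLatch(2);
        CountDownLatch writerQueued = new CountDownLatch(1);
        CountDownLatch attemptsDone = new CountDownLatch(2);
        AtomicBoolean cacheUpdateOk = new AtomicBoolean();
        AtomicBoolean cqRegistrationOk = new AtomicBoolean();

        // Cache update path: topology read lock first, then listener read lock.
        Thread cacheUpdate = new Thread(() -> {
            topologyLock.readLock().lock();
            try {
                firstLocksHeld.countDown();
                writerQueued.await();
                if (listenerLock.readLock().tryLock(TIMEOUT_MS, TimeUnit.MILLISECONDS)) {
                    cacheUpdateOk.set(true);
                    listenerLock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                finishAttempt(attemptsDone);
                topologyLock.readLock().unlock();
            }
        });

        // CQ registration path: listener write lock first, then topology read lock.
        Thread cqRegistration = new Thread(() -> {
            listenerLock.writeLock().lock();
            try {
                firstLocksHeld.countDown();
                writerQueued.await();
                if (topologyLock.readLock().tryLock(TIMEOUT_MS, TimeUnit.MILLISECONDS)) {
                    cqRegistrationOk.set(true);
                    topologyLock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                finishAttempt(attemptsDone);
                listenerLock.writeLock().unlock();
            }
        });

        // Topology update path: queues for the write lock behind the cache update's read lock.
        Thread topologyUpdate = new Thread(() -> {
            topologyLock.writeLock().lock();
            topologyLock.writeLock().unlock();
        });

        cacheUpdate.start();
        cqRegistration.start();
        firstLocksHeld.await();
        topologyUpdate.start();
        // Wait until the writer is enqueued and parked before the second acquisitions start.
        while (!(topologyLock.hasQueuedThread(topologyUpdate)
                && topologyUpdate.getState() == Thread.State.WAITING)) {
            Thread.sleep(1);
        }
        writerQueued.countDown();

        cacheUpdate.join();
        cqRegistration.join();
        topologyUpdate.join();
        return new boolean[] {cacheUpdateOk.get(), cqRegistrationOk.get()};
    }

    // Keeps the first lock held until both threads have finished their second attempt.
    static void finishAttempt(CountDownLatch attemptsDone) {
        attemptsDone.countDown();
        try {
            attemptsDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = run();
        System.out.println("cache update got listener lock: " + result[0]);
        System.out.println("CQ registration got topology lock: " + result[1]);
    }
}
```

Under these assumptions both second acquisitions should time out, which corresponds to the two threads parked in {{ReadLock.lock}} in the stack traces. With untimed {{lock()}} calls, as in the traces, neither thread would ever proceed.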



--
This message was sent by Atlassian Jira
(v8.3.4#803005)