Posted to issues@geode.apache.org by "Owen Nichols (Jira)" <ji...@apache.org> on 2022/06/22 20:47:03 UTC

[jira] [Closed] (GEODE-9887) Deadlock when shutting down gws threads unnecessarily delay shutdown of server for 15 seconds

     [ https://issues.apache.org/jira/browse/GEODE-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen Nichols closed GEODE-9887.
-------------------------------

> Deadlock when shutting down gws threads unnecessarily delay shutdown of server for 15 seconds
> ---------------------------------------------------------------------------------------------
>
>                 Key: GEODE-9887
>                 URL: https://issues.apache.org/jira/browse/GEODE-9887
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: Jakov Varenina
>            Assignee: Jakov Varenina
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> The thread dump below shows the deadlock:
> 1. "Distributed system shutdown hook" takes lock 0x00000000c445e988, starts the "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread" threads, and waits for them to finish.
> 2. "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread5" sets the flag AckReaderThread.shutdown to true and waits for the shutdown to finish by joining the thread for at most 15 seconds.
> 3. The "AckReaderThread for : Event Processor for GatewaySender_sender1_4" thread waits for lock 0x00000000c445e988, which is owned by the "Distributed system shutdown hook" thread.
> This deadlock lasts only 15 seconds, because the thread join times out in every "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread" thread, forcing it to finish. Once these threads finish, "Distributed system shutdown hook" can continue execution, release the lock, and complete the shutdown of the server.
>  
> {code:java}
> "Distributed system shutdown hook" #14 prio=5 os_prio=0 cpu=20.78ms elapsed=11.33s tid=0x00007f848c005000 nid=0x1e04 waiting on condition  [0x00007f83ec415000]
>    java.lang.Thread.State: WAITING (parking)
>         at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
>         - parking to wait for  <0x00000000fcc00e50> (a java.util.concurrent.FutureTask)
>         at java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/LockSupport.java:194)
>         at java.util.concurrent.FutureTask.awaitDone(java.base@11.0.13/FutureTask.java:447)
>         at java.util.concurrent.FutureTask.get(java.base@11.0.13/FutureTask.java:190)
>         at java.util.concurrent.AbstractExecutorService.invokeAll(java.base@11.0.13/AbstractExecutorService.java:247)
>         at org.apache.geode.internal.cache.wan.parallel.ConcurrentParallelGatewaySenderEventProcessor.stopProcessing(ConcurrentParallelGatewaySenderEventProcessor.java:258)
>         at org.apache.geode.internal.cache.wan.AbstractGatewaySender.stopProcessing(AbstractGatewaySender.java:726)
>         at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderImpl.stop(ParallelGatewaySenderImpl.java:118)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2165)
>         - locked <0x00000000c11a7400> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
>         at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1559)
>         - locked <0x00000000c11a7400> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
>         at org.apache.geode.distributed.internal.InternalDistributedSystem.lambda$static$7(InternalDistributedSystem.java:2202)
>         at org.apache.geode.distributed.internal.InternalDistributedSystem$$Lambda$110/0x0000000100226840.run(Unknown Source)
>         at java.lang.Thread.run(java.base@11.0.13/Thread.java:829)
>    Locked ownable synchronizers:
>         - <0x00000000c445e988> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> "AckReaderThread for : Event Processor for GatewaySender_sender1_4" #402 daemon prio=5 os_prio=0 cpu=3168.26ms elapsed=640.74s tid=0x00007f8434023000 nid=0x1181 waiting on condition  [0x00007f83eda2b000]
>    java.lang.Thread.State: WAITING (parking)
>     at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
>     - parking to wait for  <0x00000000c445e988> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/LockSupport.java:194)
>    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.13/AbstractQueuedSynchronizer.java:885)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.13/AbstractQueuedSynchronizer.java:917)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.13/AbstractQueuedSynchronizer.java:1240)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(java.base@11.0.13/ReentrantReadWriteLock.java:959)
>     at org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher$AckReaderThread.run(GatewaySenderEventRemoteDispatcher.java:665)
>   Locked ownable synchronizers:
>     - None
> "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread5" #872 daemon prio=5 os_prio=0 cpu=1.39ms elapsed=14.09s tid=0x00007f849801a000 nid=0x1e13 in Object.wait()  [0x00007f849c442000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(java.base@11.0.13/Native Method)
>         - waiting on <no object reference available>
>         at java.lang.Thread.join(java.base@11.0.13/Thread.java:1308)
>         - waiting to re-lock in wait() <0x00000000c542ce20> (a org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher$AckReaderThread)
>         at org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher$AckReaderThread.shutdown(GatewaySenderEventRemoteDispatcher.java:771)
>         at org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher.stopAckReaderThread(GatewaySenderEventRemoteDispatcher.java:802)
>         at org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher.stop(GatewaySenderEventRemoteDispatcher.java:826)
>         at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.stopProcessing(AbstractGatewaySenderEventProcessor.java:1222)
>         at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1399)
>         at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1387)
>         at java.util.concurrent.FutureTask.run(java.base@11.0.13/FutureTask.java:264)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.13/ThreadPoolExecutor.java:1128)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.13/ThreadPoolExecutor.java:628)
>         at java.lang.Thread.run(java.base@11.0.13/Thread.java:829)
>    Locked ownable synchronizers:
>         - <0x00000000fcf4daa8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
>  
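
The three-step wait cycle described in the issue can be reproduced in isolation. The following is a minimal sketch, not Geode code: the class name ShutdownDeadlockSketch, the method simulateShutdown, and the thread names are hypothetical, and the sketch assumes the shutdown thread holds the write lock (the dump only shows that it owns the ReentrantReadWriteLock synchronizer). A short join timeout stands in for the 15 seconds so the effect is quick to observe.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone sketch of the wait cycle; none of these names exist in Geode.
class ShutdownDeadlockSketch {

    /** Runs the three-thread scenario and returns how long the "shutdown" caller was blocked, in ms. */
    static long simulateShutdown(long joinTimeoutMs) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // Step 3: stands in for AckReaderThread. It needs the lock that the shutdown thread holds.
        Thread ackReader = new Thread(() -> {
            lock.writeLock().lock();   // parks until the shutdown thread releases the lock
            lock.writeLock().unlock();
        }, "ack-reader");
        ackReader.setDaemon(true);

        // Step 2: stands in for the stopper thread. It joins the ack reader with a bounded timeout.
        Thread stopper = new Thread(() -> {
            try {
                ackReader.join(joinTimeoutMs); // gives up once the timeout expires
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "stopper");
        stopper.setDaemon(true);

        // Step 1: the caller plays the shutdown hook. It takes the lock, then waits for the stopper.
        long start = System.nanoTime();
        lock.writeLock().lock();
        try {
            ackReader.start();
            stopper.start();
            // Cycle: caller waits for stopper, stopper waits for ackReader, ackReader waits for caller.
            stopper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.writeLock().unlock(); // only now can the ack reader proceed
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long elapsed = simulateShutdown(300);
        System.out.println("shutdown was blocked for about " + elapsed + " ms");
    }
}
```

In this sketch the caller stays blocked for roughly the join timeout, which mirrors the 15-second delay reported above: only the bounded join breaks the cycle. Releasing the lock before waiting for the stopper, or not requiring the lock in the ack reader once shutdown is flagged, would avoid the wait; which approach the actual fix took is not stated in this message.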


