Posted to issues@geode.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/03/13 00:09:00 UTC
[jira] [Commented] (GEODE-6517) Race condition exists that a node failed to be shutdown as it is stuck on PRHARedundancyProvider.waitForPersistentBucketRecovery()

    [ https://issues.apache.org/jira/browse/GEODE-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791122#comment-16791122 ] 

ASF subversion and git services commented on GEODE-6517:
--------------------------------------------------------

Commit 6751717585dd3a6405578013f0d1bea5f289d8e6 in geode's branch refs/heads/feature/GEODE-6517 from eshu
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=6751717 ]

GEODE-6517: Fix a race by counting down the latch.


> Race condition exists that a node failed to be shutdown as it is stuck on PRHARedundancyProvider.waitForPersistentBucketRecovery()
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-6517
>                 URL: https://issues.apache.org/jira/browse/GEODE-6517
>             Project: Geode
>          Issue Type: Bug
>          Components: regions
>    Affects Versions: 1.1.0
>            Reporter: Eric Shu
>            Assignee: Eric Shu
>            Priority: Major
>
> Stack trace of the hung thread:
> "Shutdown Disconnector1" #93 prio=10 os_prio=0 tid=0x00007f84b8002800 nid=0x6875 waiting on condition [0x00007f844ee31000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000f14f0490> (a java.util.concurrent.CountDownLatch$Sync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>         at org.apache.geode.internal.cache.PRHARedundancyProvider.waitForPersistentBucketRecovery(PRHARedundancyProvider.java:2019)
>         at org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7536)
>         at org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2707)
>         at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6308)
>         at org.apache.geode.internal.cache.LocalRegion.handleCacheClose(LocalRegion.java:7387)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2281)
>         - locked <0x00000000f0abeb00> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
>         at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1593)
>         - locked <0x00000000f0abeb00> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
>         at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1255)
>         at org.apache.geode.management.internal.cli.functions.ShutDownFunction.lambda$disconnectInNonDaemonThread$0(ShutDownFunction.java:78)
>         at org.apache.geode.management.internal.cli.functions.ShutDownFunction$$Lambda$94/665093117.run(Unknown Source)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> The race occurs in recoverPersistentBuckets. In the window between the following latch being created and it being set to null in the catch block, the shutdown thread can obtain a reference to the latch and then wait forever for a countDown that never happens.
>     allBucketsRecoveredFromDisk = new CountDownLatch(proxyBucketArray.length);
>     try {
>       if (proxyBucketArray.length > 0) {
>         this.redundancyLogger = new RedundancyLogger(this);
>         Thread loggingThread = new LoggingThread(
>             "RedundancyLogger for region " + this.prRegion.getName(), false, this.redundancyLogger);
>         loggingThread.start();
>       }
>     } catch (RuntimeException e) {
>       allBucketsRecoveredFromDisk = null;
>       throw e;
>     }
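
The interleaving described above can be reproduced deterministically in a small standalone sketch. This is not the code from the commit, which is not shown in this message. It only illustrates one way "counting down the latch" could release a waiter that captured the reference before the field was nulled. The class LatchRaceSketch and the methods beginRecovery, abortRecovery and waiterReleased are hypothetical names, and a timed await replaces the untimed await from the stack trace so the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the provider class; not Geode code.
final class LatchRaceSketch {
  // Plays the role of allBucketsRecoveredFromDisk in the report.
  private volatile CountDownLatch allBucketsRecoveredFromDisk;

  // Recovery starts: one count per bucket still to be recovered.
  void beginRecovery(int bucketCount) {
    allBucketsRecoveredFromDisk = new CountDownLatch(bucketCount);
  }

  // Recovery fails before any bucket is recovered. With releaseWaiters=false
  // the field is only nulled, as in the reported code. With true, the latch
  // is first drained to zero (assumed shape of the fix).
  void abortRecovery(boolean releaseWaiters) {
    CountDownLatch latch = allBucketsRecoveredFromDisk;
    if (releaseWaiters && latch != null) {
      while (latch.getCount() > 0) {
        latch.countDown();
      }
    }
    allBucketsRecoveredFromDisk = null;
  }

  // Simulates the shutdown thread reading the field inside the race window.
  // Returns true if the waiter is released within the timeout.
  static boolean waiterReleased(boolean releaseWaiters) {
    LatchRaceSketch provider = new LatchRaceSketch();
    provider.beginRecovery(3);
    CountDownLatch seen = provider.allBucketsRecoveredFromDisk; // captured reference
    provider.abortRecovery(releaseWaiters);
    try {
      // The report's stack trace shows an untimed await(); a timeout is used
      // here only so the demonstration cannot hang.
      return seen.await(200, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("null only:       released=" + waiterReleased(false));
    System.out.println("count down first: released=" + waiterReleased(true));
  }
}
```

Nulling the field does not affect a thread that already holds the latch reference, which is why the first case times out. Draining the count before dropping the reference lets any such thread proceed.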



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)