You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@geode.apache.org by "Kirk Lund (Jira)" <ji...@apache.org> on 2019/12/02 17:30:00 UTC

[jira] [Resolved] (GEODE-7082) PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung or took too long

     [ https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk Lund resolved GEODE-7082.
------------------------------
    Fix Version/s: 1.12.0
       Resolution: Fixed

> PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung or took too long 
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-7082
>                 URL: https://issues.apache.org/jira/browse/GEODE-7082
>             Project: Geode
>          Issue Type: Bug
>          Components: tests
>            Reporter: Mark Hanson
>            Assignee: Kirk Lund
>            Priority: Major
>              Labels: flaky
>             Fix For: 1.12.0
>
>         Attachments: callstacks-2019-08-12-19-22-55.txt, callstacks-2019-08-12-19-23-06.txt, callstacks-2019-08-12-19-23-16.txt, dunit-hangs.txt
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]
> The run of the distributed test for java 8 failed because a timeout was exceeded.
> I've attached the dunit-hangs.txt and callstack files from dunit #981. It looks like PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung. Several threads are trying to create buckets while another is trying to shutdown the cluster. Thread stacks are below.
> There are several threads creating buckets and stuck waiting for replies to PrepareNewPersistentMemberMessage:
> {noformat}
> "Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 tid=0x00007f29a0005800 nid=0x1a84 waiting on condition [0x00007f2bd28c5000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000e1fb4410> (a java.util.concurrent.CountDownLatch$Sync)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
>         at org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
>         at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
>         at org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
>         at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
>         at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
>         at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
>         at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
>         at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
>         - locked <0x00000000e1fb4b28> (a org.apache.geode.internal.cache.ProxyBucketRegion)
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
>         at org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
>         at org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
>         at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
>         at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
>         at org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$55/1122354621.invoke(Unknown Source)
>         at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
>         at org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$51/717403135.run(Unknown Source)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x00000000e1fb4ee8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> There's one thread that appears to be stuck calling AdminDistributedSystemImpl.shutDownAllMembers (NOTE: I don't think we should have any tests using this deprecated internal-API to shutdown the cluster)::
> {noformat}
>       vm0.invoke(new SerializableCallable() {
>         @Override
>         public Object call() throws Exception {
>           InternalDistributedSystem ds =
>               (InternalDistributedSystem) getCache().getDistributedSystem();
>           AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 600000);
>           return null;
>         }
>       });
> {noformat}
> Here's the above thread's callstack:
> {noformat}
> "RMI TCP Connection(1)-172.17.0.25" #32 daemon prio=5 os_prio=0 tid=0x00007f2ad4001800 nid=0x280 waiting on condition [0x00007f2be13b5000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000e1fb0600> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
>         at org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lockInterruptibly(StoppableReentrantReadWriteLock.java:197)
>         at org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lock(StoppableReentrantReadWriteLock.java:182)
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1399)
>         at org.apache.geode.internal.cache.PartitionedRegion.closePartitionedRegion(PartitionedRegion.java:7140)
>         at org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7716)
>         at org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2707)
>         at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6175)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.shutDownOnePRGracefully(GemFireCacheImpl.java:1866)
>         - locked <0x00000000e1fb0900> (a org.apache.geode.internal.cache.PRHARedundancyProvider)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.shutdownSubTreeGracefully(GemFireCacheImpl.java:1687)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.shutDownAll(GemFireCacheImpl.java:1745)
>         - locked <0x00000000e01bfd78> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
>         at org.apache.geode.internal.admin.remote.ShutdownAllRequest.createResponse(ShutdownAllRequest.java:181)
>         at org.apache.geode.internal.admin.remote.ShutdownAllRequest.send(ShutdownAllRequest.java:88)
>         at org.apache.geode.admin.internal.AdminDistributedSystemImpl.shutDownAllMembers(AdminDistributedSystemImpl.java:2342)
>         at org.apache.geode.internal.cache.partitioned.PersistentColocatedPartitionedRegionDUnitTest$25.call(PersistentColocatedPartitionedRegionDUnitTest.java:1785)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
>         at org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69)
>         at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>         at sun.rmi.transport.Transport$1.run(Transport.java:200)
>         at sun.rmi.transport.Transport$1.run(Transport.java:197)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>         at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$8/910555258.run(Unknown Source)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x00000000e00ed098> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)