You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@geode.apache.org by "Shelley Lynn Hughes-Godfrey (JIRA)" <ji...@apache.org> on 2018/03/19 18:31:00 UTC
[jira] [Created] (GEODE-4892) CI failure: geode-wan distributedTest
hung in
org.apache.geode.internal.cache.wan.wancommand.StatusGatewaySenderCommandDUnitTest
testGatewaySenderStatus_OnMember Ended
Shelley Lynn Hughes-Godfrey created GEODE-4892:
--------------------------------------------------
Summary: CI failure: geode-wan distributedTest hung in org.apache.geode.internal.cache.wan.wancommand.StatusGatewaySenderCommandDUnitTest testGatewaySenderStatus_OnMember Ended
Key: GEODE-4892
URL: https://issues.apache.org/jira/browse/GEODE-4892
Project: Geode
Issue Type: Bug
Components: distributed lock service, gfsh
Reporter: Shelley Lynn Hughes-Godfrey
geode-wan distributed tests hung after 8 hours (usually takes 5-6 hours)
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/DistributedTest/builds/207
The progress files show this DistributedTest run starting ~4 am ... but it also shows this test completing at 9:05 ... which is only 5 hours. Are we somehow stuck in cache close for 3 hours before the hang is declared while trying to shut down the DUnit VM.
dunit-hangs.txt
```
Started @ 2018-03-17 03:57:50.567 +0000
Ended @ 2018-03-17 03:59:28.268 +0000
Started @ 2018-03-17 04:00:05.872 +0000
Ended @ 2018-03-17 04:52:41.879 +0000
Started @ 2018-03-17 04:53:00.358 +0000
Ended @ 2018-03-17 04:55:07.622 +0000
Started @ 2018-03-17 04:55:33.550 +0000
Ended @ 2018-03-17 07:34:11.639 +0000
Started @ 2018-03-17 07:34:51.022 +0000
Ended @ 2018-03-17 08:08:34.849 +0000
Started @ 2018-03-17 08:09:33.853 +0000
Ended @ 2018-03-17 08:31:43.646 +0000
Started @ 2018-03-17 08:31:54.175 +0000
Ended @ 2018-03-17 08:32:02.523 +0000
Started @ 2018-03-17 08:32:13.946 +0000
Ended @ 2018-03-17 08:32:34.626 +0000
Started @ 2018-03-17 08:33:20.100 +0000
2018-03-17 08:56:15.914 +0000 org.apache.geode.internal.cache.wan.wancommand.StatusGatewaySenderCommandDUnitTest testGatewaySenderStatus_OnMember
Ended @ 2018-03-17 09:05:29.197 +0000
```
Last distributedTest running:
```
Started @ 2018-03-17 08:33:20.100 +0000
2018-03-17 08:56:15.914 +0000 org.apache.geode.internal.cache.wan.wancommand.StatusGatewaySenderCommandDUnitTest testGatewaySenderStatus_OnMember
Ended @ 2018-03-17 09:05:29.197 +0000
```
Stack dumps are taken at 5-10 second intervals and they all look the same ... so it looks like 433 is stuck in waiting during cache close => distributedLockServiceDestroys waiting for replies from 493 for ElderInit. 493 is in ClusterDistributionManager.elderChangeWait().
********* Dumping stack for process 433:
2018-03-17 10:55:59
```
"RMI TCP Connection(1)-172.17.0.3" #30 daemon prio=5 os_prio=0 tid=0x00007f9760001800 nid=0x1ec waiting on condition [0x00007f975bffb000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e08dd870> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:853)
at org.apache.geode.distributed.internal.locks.ElderInitProcessor.init(ElderInitProcessor.java:72)
at org.apache.geode.distributed.internal.locks.ElderState.<init>(ElderState.java:56)
at org.apache.geode.distributed.internal.ClusterDistributionManager.getElderStateWithTryLock(ClusterDistributionManager.java:3359)
at org.apache.geode.distributed.internal.ClusterDistributionManager.getElderState(ClusterDistributionManager.java:3309)
at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.startElderCall(GrantorRequestProcessor.java:238)
at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.basicOp(GrantorRequestProcessor.java:347)
at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.basicOp(GrantorRequestProcessor.java:327)
at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.clearGrantor(GrantorRequestProcessor.java:318)
at org.apache.geode.distributed.internal.locks.DLockService.clearGrantor(DLockService.java:872)
at org.apache.geode.distributed.internal.locks.DLockGrantor.destroy(DLockGrantor.java:1227)
- locked <0x00000000e09a8768> (a org.apache.geode.distributed.internal.locks.DLockGrantor)
at org.apache.geode.distributed.internal.locks.DLockService.nullLockGrantorId(DLockService.java:646)
at org.apache.geode.distributed.internal.locks.DLockService.basicDestroy(DLockService.java:2358)
at org.apache.geode.distributed.internal.locks.DLockService.destroyAndRemove(DLockService.java:2276)
- locked <0x00000000e040b400> (a java.lang.Object)
at org.apache.geode.distributed.internal.locks.DLockService.destroyServiceNamed(DLockService.java:2214)
at org.apache.geode.distributed.DistributedLockService.destroy(DistributedLockService.java:84)
at org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderImpl.stop(SerialGatewaySenderImpl.java:136)
at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2169)
- locked <0x00000000e03ddf60> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1961)
at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1951)
at org.apache.geode.test.junit.rules.ServerStarterRule.stopMember(ServerStarterRule.java:102)
at org.apache.geode.test.junit.rules.MemberStarterRule.after(MemberStarterRule.java:83)
at org.apache.geode.test.dunit.rules.ClusterStartupRule.stopElementInsideVM(ClusterStartupRule.java:326)
at org.apache.geode.test.junit.rules.VMProvider.lambda$stopVM$fe0d42dc$1(VMProvider.java:35)
at org.apache.geode.test.junit.rules.VMProvider$$Lambda$79/501183203.run(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
```
********* Dumping stack for process 493:
2018-03-17 10:56:02
```
"Pooled Message Processor 1" #179 daemon prio=10 os_prio=0 tid=0x00007f3640002800 nid=0x763 in Object.wait() [0x00007f35edd65000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.geode.distributed.internal.ClusterDistributionManager.elderChangeWait(ClusterDistributionManager.java:3487)
at org.apache.geode.distributed.internal.ClusterDistributionManager.waitForElder(ClusterDistributionManager.java:3431)
- locked <0x00000000e069bde0> (a java.lang.Object)
at org.apache.geode.distributed.internal.locks.ElderInitProcessor$ElderInitMessage.process(ElderInitProcessor.java:178)
at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:382)
at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:448)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1118)
at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
at org.apache.geode.distributed.internal.ClusterDistributionManager$4$1.run(ClusterDistributionManager.java:789)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x00000000e07073a8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
```
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)