Posted to issues@geode.apache.org by "Bruce J Schuchardt (Jira)" <ji...@apache.org> on 2020/07/28 17:33:00 UTC

[jira] [Commented] (GEODE-8267) serverRestartsAfterOneLocatorDies hangs

    [ https://issues.apache.org/jira/browse/GEODE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166582#comment-17166582 ] 

Bruce J Schuchardt commented on GEODE-8267:
-------------------------------------------

See also GEODE-8389, which has a suspicious auto-reconnect error from the same run.

Stack traces from the artifacts also show these dangling auto-reconnect threads, which would be from a previous test and may be blocking the test that hung.
{noformat}
"ReconnectThread" #97 prio=5 os_prio=0 cpu=6655.62ms elapsed=5595.75s tid=0x00007f6f2c4e5800 nid=0x2ea in Object.wait()  [0x00007f6e6fbfc000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(java.base@11.0.7/Native Method)
        - waiting on <no object reference available>
        at org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2569)
        at org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
        - waiting to re-lock in wait() <0x00000000e063ad70> (a java.lang.Object)
        - locked <0x00000000e10b6060> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
        - locked <0x00000000e0cd2348> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
        at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
        at org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2315)
        at org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1287)
        at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:2030)
        at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl$$Lambda$453/0x0000000840bbe440.run(Unknown Source)
        at java.lang.Thread.run(java.base@11.0.7/Thread.java:834)

   Locked ownable synchronizers:
        - None

"RMI TCP Connection(5)-172.17.0.11" #310 daemon prio=5 os_prio=0 cpu=269.79ms elapsed=5330.58s tid=0x00007f6f30001800 nid=0x5b8 waiting for monitor entry  [0x00007f6f359da000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:156)
        - waiting to lock <0x00000000e0cd2348> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
        at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:142)
        at org.apache.geode.test.junit.rules.ServerStarterRule.startServer(ServerStarterRule.java:199)
        at org.apache.geode.test.junit.rules.ServerStarterRule.before(ServerStarterRule.java:91)
        at org.apache.geode.test.dunit.rules.ClusterStartupRule.lambda$startServerVM$729766c4$1(ClusterStartupRule.java:277)
        at org.apache.geode.test.dunit.rules.ClusterStartupRule$$Lambda$139/0x0000000840a2b440.call(Unknown Source)
        at org.apache.geode.test.dunit.internal.IdentifiableCallable.call(IdentifiableCallable.java:41)
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.7/Native Method)
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.7/NativeMethodAccessorImpl.java:62)
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.7/DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(java.base@11.0.7/Method.java:566)
        at org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
        at org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:78)
        at jdk.internal.reflect.GeneratedMethodAccessor250.invoke(Unknown Source)
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.7/DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(java.base@11.0.7/Method.java:566)
        at sun.rmi.server.UnicastServerRef.dispatch(java.rmi@11.0.7/UnicastServerRef.java:359)
        at sun.rmi.transport.Transport$1.run(java.rmi@11.0.7/Transport.java:200)
        at sun.rmi.transport.Transport$1.run(java.rmi@11.0.7/Transport.java:197)
        at java.security.AccessController.doPrivileged(java.base@11.0.7/Native Method)
        at sun.rmi.transport.Transport.serviceCall(java.rmi@11.0.7/Transport.java:196)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(java.rmi@11.0.7/TCPTransport.java:562)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(java.rmi@11.0.7/TCPTransport.java:796)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(java.rmi@11.0.7/TCPTransport.java:677)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$134/0x0000000840a25840.run(java.rmi@11.0.7/Unknown Source)
        at java.security.AccessController.doPrivileged(java.base@11.0.7/Native Method)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(java.rmi@11.0.7/TCPTransport.java:676)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.7/ThreadPoolExecutor.java:1128)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.7/ThreadPoolExecutor.java:628)
        at java.lang.Thread.run(java.base@11.0.7/Thread.java:834)
{noformat}
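
Reading the two traces together: the ReconnectThread appears to hold the InternalCacheBuilder class monitor (<0x00000000e0cd2348>) while it sits in a timed wait on a different object, and the RMI thread is BLOCKED waiting for that same monitor. The following is a minimal standalone sketch of that pattern, not Geode code. The class name, lock names, and timings are all made up for illustration. It only shows that Object.wait() releases the monitor being waited on, not any outer monitor the thread also holds.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; none of these names exist in Geode.
class HeldMonitorDemo {
    // Stand-in for the InternalCacheBuilder class monitor seen in the dump.
    static final Object BUILDER_LOCK = new Object();
    // Stand-in for the object the reconnect loop waits on.
    static final Object RECONNECT_LOCK = new Object();

    /** Returns true if a second thread could not enter BUILDER_LOCK while the first was waiting. */
    static boolean creatorIsBlocked() {
        CountDownLatch holderReady = new CountDownLatch(1);
        CountDownLatch creatorDone = new CountDownLatch(1);

        Thread reconnect = new Thread(() -> {
            synchronized (BUILDER_LOCK) {
                synchronized (RECONNECT_LOCK) {
                    holderReady.countDown();
                    try {
                        // wait() releases RECONNECT_LOCK only; BUILDER_LOCK stays held.
                        while (true) {
                            RECONNECT_LOCK.wait(100);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        }, "ReconnectThread-demo");

        Thread creator = new Thread(() -> {
            // Mirrors the RMI thread: needs the same monitor to proceed.
            synchronized (BUILDER_LOCK) {
                creatorDone.countDown();
            }
        }, "creator-demo");

        reconnect.setDaemon(true);
        creator.setDaemon(true);
        try {
            reconnect.start();
            holderReady.await();
            creator.start();
            // Give the creator a bounded chance to get in; it should not.
            boolean finished = creatorDone.await(500, TimeUnit.MILLISECONDS);
            // Stopping the waiter releases BUILDER_LOCK and unblocks the creator.
            reconnect.interrupt();
            reconnect.join();
            creator.join();
            return !finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("creator blocked: " + creatorIsBlocked());
    }
}
```

In the sketch the blocked thread only proceeds once the waiting thread is interrupted and leaves both synchronized blocks. If the same holds in the real run, a reconnect thread left over from an earlier test would keep any later cache creation in that JVM blocked for as long as it keeps retrying.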

> serverRestartsAfterOneLocatorDies hangs
> ---------------------------------------
>
>                 Key: GEODE-8267
>                 URL: https://issues.apache.org/jira/browse/GEODE-8267
>             Project: Geode
>          Issue Type: Bug
>          Components: configuration, locator, membership
>            Reporter: Bill Burcham
>            Priority: Major
>
> hang: [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/275#A]
>  
> The test hung in serverRestartsAfterOneLocatorDies after another failure in the same test class.
> Here's the hung thread:
> {noformat}
> "Test worker" #27 prio=5 os_prio=0 cpu=5016.73ms elapsed=5638.52s tid=0x00007f01c8ad4800 nid=0x18 runnable  [0x00007f019872c000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(java.base@11.0.7/Native Method)
>         at java.net.SocketInputStream.socketRead(java.base@11.0.7/SocketInputStream.java:115)
>         at java.net.SocketInputStream.read(java.base@11.0.7/SocketInputStream.java:168)
>         at java.net.SocketInputStream.read(java.base@11.0.7/SocketInputStream.java:140)
>         at java.io.BufferedInputStream.fill(java.base@11.0.7/BufferedInputStream.java:252)
>         at java.io.BufferedInputStream.read(java.base@11.0.7/BufferedInputStream.java:271)
>         - locked <0x00000000d08fe7a0> (a java.io.BufferedInputStream)
>         at java.io.DataInputStream.readByte(java.base@11.0.7/DataInputStream.java:270)
>         at sun.rmi.transport.StreamRemoteCall.executeCall(java.rmi@11.0.7/StreamRemoteCall.java:240)
>         at sun.rmi.server.UnicastRef.invoke(java.rmi@11.0.7/UnicastRef.java:164)
>         at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(java.rmi@11.0.7/RemoteObjectInvocationHandler.java:217)
>         at java.rmi.server.RemoteObjectInvocationHandler.invoke(java.rmi@11.0.7/RemoteObjectInvocationHandler.java:162)
>         at com.sun.proxy.$Proxy53.executeMethodOnObject(Unknown Source)
>         at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:607)
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:450)
>         at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:268)
>         at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:261)
>         at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:256)
>         at org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest.serverRestartsAfterOneLocatorDies(ClusterConfigLocatorRestartDUnitTest.java:114)
>         at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.7/Native Method)
> {noformat}
> Here's the previous test failure, which may have affected the test that hung:
> {code:java}
> org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest > serverRestartHangsWaitingForStartupMessageResponse FAILED
>     org.junit.runners.model.TestTimedOutException: test timed out after 300000 milliseconds
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
>         at java.net.SocketInputStream.read(SocketInputStream.java:168)
>         at java.net.SocketInputStream.read(SocketInputStream.java:140)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:271)
>         at java.io.DataInputStream.readByte(DataInputStream.java:270)
>         at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:240)
>         at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:164)
>         at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:217)
>         at java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:162)
>         at com.sun.proxy.$Proxy53.executeMethodOnObject(Unknown Source)
>         at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:607)
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:437)
>         at org.apache.geode.test.junit.rules.VMProvider.invoke(VMProvider.java:94)
>         at org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest.serverRestartHangsWaitingForStartupMessageResponse(ClusterConfigLocatorRestartDUnitTest.java:176)
> {code}
> Seems like 300s should be long enough so I fear there may be a real problem here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)