Posted to issues@geode.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/10/16 22:27:01 UTC
[jira] [Commented] (GEODE-5676) ClusterConfigLocatorRestartDUnitTest hung in CI

    [ https://issues.apache.org/jira/browse/GEODE-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652550#comment-16652550 ] 

ASF subversion and git services commented on GEODE-5676:
--------------------------------------------------------

Commit df30df1c8e9a1216a3f9bd07b712e3c4fa99031d in geode's branch refs/heads/develop from Dan Smith
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=df30df1 ]

GEODE-5676: Disconnect system before closing SocketCreatorFactory

The MemberStarterRule was closing SocketCreatorFactory before calling
DistributedSystem.disconnect. In the case of
ClusterConfigLocatorRestartDUnitTest there was a reconnect thread
running in the background that ended up throwing a NullPointerException
if the SocketCreatorFactory was closed. This led to an infinite loop in
the reconnect thread.

We should not be messing with the internal state of Geode until we call
disconnect to stop all of Geode's background threads.

Co-Authored-By: Dale Emery <de...@pivotal.io>
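
The ordering problem described in the commit message can be sketched in isolation. The following is a minimal illustration, not the actual MemberStarterRule or Geode code: the Worker class, the shared "resource" reference, and the tearDown method are hypothetical stand-ins for the reconnect thread, the SocketCreatorFactory state, and the rule's cleanup. It assumes only that a background thread keeps dereferencing shared state until it is told to stop.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class TeardownOrderSketch {

    /** Hypothetical background task that keeps using a shared resource until stopped. */
    static final class Worker {
        private final AtomicReference<String> resource;
        private final AtomicInteger failures = new AtomicInteger();
        private volatile boolean running = true;
        private final Thread thread;

        Worker(AtomicReference<String> resource) {
            this.resource = resource;
            this.thread = new Thread(this::loop, "worker");
            this.thread.setDaemon(true);
            this.thread.start();
        }

        private void loop() {
            while (running) {
                try {
                    resource.get().length(); // throws NPE once the resource is cleared
                } catch (NullPointerException e) {
                    failures.incrementAndGet(); // the loop retries until it is stopped
                }
                Thread.yield();
            }
        }

        void stop() throws InterruptedException {
            running = false;
            thread.join();
        }

        int failures() {
            return failures.get();
        }
    }

    /** Runs one teardown in the chosen order and returns how many failures the worker saw. */
    public static int tearDown(boolean stopWorkerFirst) {
        AtomicReference<String> resource = new AtomicReference<>("open");
        Worker worker = new Worker(resource);
        try {
            if (stopWorkerFirst) {
                worker.stop();       // analogous to disconnecting first
                resource.set(null);  // then clearing the shared state
            } else {
                resource.set(null);  // clearing shared state under a live worker
                long deadline = System.nanoTime() + 5_000_000_000L;
                while (worker.failures() == 0 && System.nanoTime() < deadline) {
                    Thread.sleep(1); // give the worker time to observe the cleared state
                }
                worker.stop();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return worker.failures();
    }

    public static void main(String[] args) {
        System.out.println("stop worker first, failures seen: " + tearDown(true));
        System.out.println("clear state first, failures seen: " + tearDown(false));
    }
}
```

In this sketch, stopping the worker before clearing the state means the worker never observes a null resource. Clearing the state first leaves the worker failing on every iteration until something stops it, which is the shape of the loop the commit message describes.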


> ClusterConfigLocatorRestartDUnitTest hung in CI
> -----------------------------------------------
>
>                 Key: GEODE-5676
>                 URL: https://issues.apache.org/jira/browse/GEODE-5676
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Dan Smith
>            Assignee: Dan Smith
>            Priority: Major
>              Labels: pull-request-available, swat
>             Fix For: 1.8.0
>
>         Attachments: callstacks.txt
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This test hung in a couple of runs of DistributedTest:
>   https://concourse.apachegeode-ci.info/teams/staging/pipelines/concourse-staging/jobs/DistributedTest/builds//430
>   https://concourse.apachegeode-ci.info/teams/staging/pipelines/concourse-staging/jobs/DistributedTest/builds//370
> {noformat}
> Started @ 2018-08-30 04:23:46.599 +0000
> 2018-08-30 04:48:33.135 +0000  org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest serverRestartsAfterLocatorReconnects
> Ended @ 2018-08-30 05:21:34.897 +0000
> {noformat}
> It seems to be stuck in teardown:
> {noformat}
> "ReconnectThread" #416 prio=5 os_prio=0 tid=0x00007fa86cad2000 nid=0xd07 in Object.wait() [0x00007fa744ecd000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	at org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2697)
> 	at org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2558)
> 	- locked <0x00000000e00bedc8> (a java.lang.Object)
> 	- locked <0x00000000e07af498> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
> 	- locked <0x00000000e00bedd8> (a java.lang.Class for org.apache.geode.cache.CacheFactory)
> 	at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1041)
> 	at org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:3987)
> 	at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.uncleanShutdown(GMSMembershipManager.java:1552)
> 	at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.lambda$forceDisconnect$1(GMSMembershipManager.java:2564)
> 	at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager$$Lambda$81/1816825082.run(Unknown Source)
> 	at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
> 	- None
> "RMI TCP Connection(8)-172.17.0.13" #32 daemon prio=5 os_prio=0 tid=0x00007fa874001800 nid=0x2ff waiting for monitor entry [0x00007fa8f0d15000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1367)
> 	- waiting to lock <0x00000000e07af498> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
> 	at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1022)
> 	at org.apache.geode.test.junit.rules.MemberStarterRule.disconnectDSIfAny(MemberStarterRule.java:182)
> 	at org.apache.geode.test.junit.rules.MemberStarterRule.after(MemberStarterRule.java:129)
> 	at org.apache.geode.test.dunit.rules.ClusterStartupRule.stopElementInsideVM(ClusterStartupRule.java:385)
> 	at org.apache.geode.test.junit.rules.VMProvider.lambda$stop$fe0d42dc$1(VMProvider.java:42)
> 	at org.apache.geode.test.junit.rules.VMProvider$$Lambda$77/1844235204.run(Unknown Source)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at hydra.MethExecutor.executeObject(MethExecutor.java:244)
> 	at org.apache.geode.test.dunit.standalone.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:70)
> 	at sun.reflect.GeneratedMethodAccessor116.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
> 	at sun.rmi.transport.Transport$1.run(Transport.java:200)
> 	at sun.rmi.transport.Transport$1.run(Transport.java:197)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
> 	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$7/137422085.run(Unknown Source)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
> 	- <0x00000000e0639ed0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
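
The two stacks above show one thread in a timed Object.wait() while still holding the GemFireCacheImpl class lock, and a second thread BLOCKED trying to take that same lock. A minimal sketch of that pattern follows, using plain Java monitors rather than any Geode code. The names "outer" and "inner" are hypothetical, and which object the real ReconnectThread waits on is an assumption here. The sketch relies only on the standard rule that wait() releases the monitor of the object it is called on and no other monitor the thread holds.

```java
import java.util.concurrent.CountDownLatch;

public class HeldMonitorSketch {

    /** Starts a holder and a contender, and returns the contender's observed thread state. */
    public static Thread.State observeContender() {
        final Object outer = new Object();  // stands in for the class-level lock
        final Object inner = new Object();  // stands in for the object being waited on
        final boolean[] done = {false};     // guarded by inner
        final CountDownLatch holding = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (outer) {
                synchronized (inner) {
                    holding.countDown();
                    while (!done[0]) {
                        try {
                            inner.wait(50); // releases inner only; outer stays held
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
            }
        }, "holder");

        Thread contender = new Thread(() -> {
            synchronized (outer) {
                // Reached only after the holder has released outer.
            }
        }, "contender");

        try {
            holder.start();
            holding.await();   // the holder now owns both monitors
            contender.start();
            Thread.State state = contender.getState();
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(1);
                state = contender.getState();
            }
            synchronized (inner) { // possible because wait() released inner
                done[0] = true;
                inner.notifyAll();
            }
            holder.join();
            contender.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("contender state while holder waits: " + observeContender());
    }
}
```

Unlike this sketch, where the holder's exit condition is eventually set, the reconnect loop in the issue never terminated, so the thread calling disconnect stayed blocked.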


