Posted to issues@geode.apache.org by "Bruce J Schuchardt (Jira)" <ji...@apache.org> on 2020/01/30 16:20:00 UTC

[jira] [Resolved] (GEODE-7460) DistributedMemberDUnitTest > testGroupsInAllVMs Failure

     [ https://issues.apache.org/jira/browse/GEODE-7460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bruce J Schuchardt resolved GEODE-7460.
---------------------------------------
    Fix Version/s: 1.12.0
       Resolution: Fixed

The code in the commit that Mark's bisecting pointed to has all been rewritten since October 2019, and I don't think it was actually involved in the failure anyway.  Looking at the log files, the dunit Locator did not receive heartbeat messages from the node that was kicked out, and that node also failed a TCP/IP availability check.  In pretty much every case I've looked at, this is due to a lack of CPU, garbage-collection pauses, or insufficient memory on the test machine.

This test doesn't do much except spin up a cluster and check that the group names were properly distributed.  The test didn't get to the point of doing any verification before it hit the ForcedDisconnectException.
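
For readers unfamiliar with the failure mode described above, here is a minimal Java sketch of a two-stage check: a member becomes a suspect when its heartbeats stop arriving, and it is removed only if a follow-up TCP connection attempt also fails.  This is an illustration only and is not Geode's membership code.  The class and method names (HeartbeatMonitor, recordHeartbeat, isSuspect, tcpCheck, shouldRemove) and the timeout parameters are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical two-stage failure detector; NOT Geode's actual implementation. */
public class HeartbeatMonitor {
  // Timestamp (in milliseconds) of the last heartbeat seen from each member.
  private final Map<String, Long> lastHeartbeatMillis = new ConcurrentHashMap<>();
  private final long heartbeatTimeoutMillis;

  public HeartbeatMonitor(long heartbeatTimeoutMillis) {
    this.heartbeatTimeoutMillis = heartbeatTimeoutMillis;
  }

  /** Stage 0: remember when a heartbeat arrived from the given member. */
  public void recordHeartbeat(String memberId, long nowMillis) {
    lastHeartbeatMillis.put(memberId, nowMillis);
  }

  /** Stage 1: a member is a suspect if it was never heard from or has gone quiet too long. */
  public boolean isSuspect(String memberId, long nowMillis) {
    Long last = lastHeartbeatMillis.get(memberId);
    return last == null || nowMillis - last > heartbeatTimeoutMillis;
  }

  /** Stage 2: try to open a TCP connection; any I/O failure or timeout counts as unavailable. */
  public static boolean tcpCheck(String host, int port, int connectTimeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  /** A member is removed only when both the heartbeat check and the TCP check fail. */
  public boolean shouldRemove(String memberId, String host, int port,
      long nowMillis, int connectTimeoutMillis) {
    return isSuspect(memberId, nowMillis) && !tcpCheck(host, port, connectTimeoutMillis);
  }
}
```

A process that is starved of CPU, stuck in a long garbage-collection pause, or swapping can miss both stages within the timeout even though nothing is wrong with the network, which is consistent with the diagnosis above.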

> DistributedMemberDUnitTest > testGroupsInAllVMs Failure
> -------------------------------------------------------
>
>                 Key: GEODE-7460
>                 URL: https://issues.apache.org/jira/browse/GEODE-7460
>             Project: Geode
>          Issue Type: Bug
>          Components: core
>            Reporter: Robert Houghton
>            Assignee: Bruce J Schuchardt
>            Priority: Major
>             Fix For: 1.12.0
>
>
> From the failing job:
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.12.0-SNAPSHOT.0016/test-results/distributedTest/1573784422/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.12.0-SNAPSHOT.0016/test-artifacts/1573784422/distributedtestfiles-OpenJDK8-1.12.0-SNAPSHOT.0016.tgz
> DistributedTest failure due to exception:
> org.apache.geode.distributed.DistributedMemberDUnitTest > testGroupsInAllVMs FAILED
>     org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.distributed.DistributedMemberDUnitTest$6.run in VM 0 running on Host 3e09f1029b44 with 4 VMs
>         at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
>         at org.apache.geode.distributed.DistributedMemberDUnitTest.testGroupsInAllVMs(DistributedMemberDUnitTest.java:333)
>         Caused by:
>         org.apache.geode.SystemConnectException: One or more peers generated exceptions during connection attempt
>             at org.apache.geode.distributed.internal.ClusterDistributionManager.sendStartupMessage(ClusterDistributionManager.java:1625)
>             at org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:354)
>             at org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:759)
>             at org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:136)
>             at org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3009)
>             at org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:269)
>             at org.apache.geode.distributed.DistributedSystem.connect(DistributedSystem.java:159)
>             at org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.getSystem(JUnit4DistributedTestCase.java:181)
>             at org.apache.geode.distributed.DistributedMemberDUnitTest$6.run(DistributedMemberDUnitTest.java:339)
>             Caused by:
>             org.apache.geode.distributed.DistributedSystemDisconnectedException: DistributedSystem is shutting down, caused by org.apache.geode.ForcedDisconnectException: Exiting due to possible network partition event due to loss of 1 cache processes: [172.17.0.14(myName:1)<v18>:41001]
>                 at org.apache.geode.distributed.internal.membership.adapter.GMSMembershipManager.directChannelSend(GMSMembershipManager.java:1591)
>                 at org.apache.geode.distributed.internal.membership.adapter.GMSMembershipManager.send(GMSMembershipManager.java:1751)
>                 at org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2058)
>                 at org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:1985)
>                 at org.apache.geode.distributed.internal.StartupOperation.sendStartupMessage(StartupOperation.java:74)
>                 at org.apache.geode.distributed.internal.ClusterDistributionManager.sendStartupMessage(ClusterDistributionManager.java:1622)
>                 ... 8 more
>                 Caused by:
>                 org.apache.geode.ForcedDisconnectException: Exiting due to possible network partition event due to loss of 1 cache processes: [172.17.0.14(myName:1)<v18>:41001]
>     java.lang.AssertionError: Suspicious strings were written to the log during this run.
>     Fix the strings or use IgnoredException.addIgnoredException to ignore.
>     -----------------------------------------------------------------------
>     Found suspect string in log4j at line 554
>     [fatal 2019/11/15 00:40:19.902 GMT <unicast receiver,3e09f1029b44-61225> tid=165] Possible loss of quorum due to the loss of 1 cache processes: [172.17.0.14(myName:1)<v18>:41001]
>     -----------------------------------------------------------------------
>     Found suspect string in log4j at line 556
>     [fatal 2019/11/15 00:40:19.903 GMT <unicast receiver,3e09f1029b44-61225> tid=165] Membership service failure: Exiting due to possible network partition event due to loss of 1 cache processes: [172.17.0.14(myName:1)<v18>:41001]
>     org.apache.geode.ForcedDisconnectException: Exiting due to possible network partition event due to loss of 1 cache processes: [172.17.0.14(myName:1)<v18>:41001]
>     	at org.apache.geode.distributed.internal.membership.adapter.GMSMembershipManager$ManagerImpl.forceDisconnect(GMSMembershipManager.java:2586)
>     	at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1073)
>     	at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.installView(GMSJoinLeave.java:1500)
>     	at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processMessage(GMSJoinLeave.java:1053)
>     	at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1330)
>     	at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1269)
>     	at org.jgroups.JChannel.invokeCallback(JChannel.java:816)
>     	at org.jgroups.JChannel.up(JChannel.java:741)
>     	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
>     	at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
>     	at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
>     	at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1077)
>     	at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:792)
>     	at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:433)
>     	at org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.up(StatRecorder.java:73)
>     	at org.apache.geode.distributed.internal.membership.gms.messenger.AddressManager.up(AddressManager.java:72)
>     	at org.jgroups.protocols.TP.passMessageUp(TP.java:1658)
>     	at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1876)
>     	at org.jgroups.util.DirectExecutor.execute(DirectExecutor.java:10)
>     	at org.jgroups.protocols.TP.handleSingleMessage(TP.java:1789)
>     	at org.jgroups.protocols.TP.receive(TP.java:1714)
>     	at org.apache.geode.distributed.internal.membership.gms.messenger.Transport.receive(Transport.java:152)
>     	at org.jgroups.protocols.UDP$PacketReceiver.run(UDP.java:701)
>     	at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)