You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@geode.apache.org by "Galen O'Sullivan (JIRA)" <ji...@apache.org> on 2018/05/21 23:19:00 UTC

[jira] [Assigned] (GEODE-5238) CI failure: SerialWANPropagationsFeatureDUnitTest.testSerialPropagationWithFilter fails with Possible loss of quorum suspect string

     [ https://issues.apache.org/jira/browse/GEODE-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Galen O'Sullivan reassigned GEODE-5238:
---------------------------------------

    Assignee: Galen O'Sullivan

> CI failure: SerialWANPropagationsFeatureDUnitTest.testSerialPropagationWithFilter fails with Possible loss of quorum suspect string
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-5238
>                 URL: https://issues.apache.org/jira/browse/GEODE-5238
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Barry Oglesby
>            Assignee: Galen O'Sullivan
>            Priority: Major
>         Attachments: TEST-org.apache.geode.internal.cache.wan.serial.SerialWANPropagationsFeatureDUnitTest.xml
>
>
> The members startup:
> {noformat}
> [vm1] [info 2018/05/19 02:18:54.055 UTC <RMI TCP Connection(3)-172.17.0.5> tid=27] Starting DistributionManager 172.17.0.5(154:locator)<ec><v0>:32771. (took 49 ms)
> [vm2] [info 2018/05/19 02:18:54.479 UTC <RMI TCP Connection(2)-172.17.0.5> tid=27] Starting DistributionManager 172.17.0.5(159)<v1>:32772. (took 336 ms)
> [vm3] [info 2018/05/19 02:18:55.477 UTC <RMI TCP Connection(3)-172.17.0.5> tid=27] Starting DistributionManager 172.17.0.5(168)<v2>:32773. (took 401 ms)
> [vm3] [info 2018/05/19 02:18:55.478 UTC <RMI TCP Connection(3)-172.17.0.5> tid=27] Initial (distribution manager) view = View[172.17.0.5(154:locator)<ec><v0>:32771|2] members: [172.17.0.5(154:locator)<ec><v0>:32771, 172.17.0.5(159)<v1>:32772\{lead}, 172.17.0.5(168)<v2>:32773]
> {noformat}
> So the members are:
> {noformat}
> vm1 - locator, coordinator (weight=3)
> vm2 - lead member (weight=15)
> vm3 - member (weight=10)``````
> vm1 - locator, coordinator (weight=3)
> vm2 - lead member (weight=15)
> vm3 - member (weight=10)
> {noformat}
> The WANTestBase cleanupVM method shuts down the members simultaneously.
> Log messages show vm1 (locator) shutting down first (and the other two members realize it):
> {noformat}
> [vm1] [info 2018/05/19 02:19:10.400 UTC <RMI TCP Connection(4)-172.17.0.5> tid=27] Shutting down DistributionManager 172.17.0.5(154:locator)<ec><v0>:32771. 
> [vm1] [info 2018/05/19 02:19:10.513 UTC <RMI TCP Connection(4)-172.17.0.5> tid=27] Now closing distribution for 172.17.0.5(154:locator)<ec><v0>:32771
> [vm1] [info 2018/05/19 02:19:10.513 UTC <RMI TCP Connection(4)-172.17.0.5> tid=27] Stopping membership services
> [vm1] [info 2018/05/19 02:19:10.513 UTC <RMI TCP Connection(4)-172.17.0.5> tid=27] Now closing distribution for 172.17.0.5(154:locator)<ec><v0>:32771
> [vm2] [info 2018/05/19 02:19:10.420 UTC <Pooled High Priority Message Processor 1> tid=310] Member at 172.17.0.5(154:locator)<ec><v0>:32771 gracefully left the distributed cache: shutdown message received
> [vm3] [info 2018/05/19 02:19:10.622 UTC <Pooled High Priority Message Processor 1> tid=310] Member at 172.17.0.5(154:locator)<ec><v0>:32771 gracefully left the distributed cache: shutdown message received
> {noformat}
> Then vm2 becomes coordinator:
> {noformat}
> [vm2] [info 2018/05/19 02:19:10.402 UTC <Pooled High Priority Message Processor 1> tid=310] This member is becoming the membership coordinator with address 172.17.0.5(159)<v1>:32772
> {noformat}
> Then vm2 shuts down its DistributionManager:
> {noformat}
> [vm2] [info 2018/05/19 02:19:11.239 UTC <RMI TCP Connection(2)-172.17.0.5> tid=27] Shutting down DistributionManager 172.17.0.5(159)<v1>:32772. 
> [vm2] [info 2018/05/19 02:19:11.352 UTC <RMI TCP Connection(2)-172.17.0.5> tid=27] Now closing distribution for 172.17.0.5(159)<v1>:32772
> [vm2] [info 2018/05/19 02:19:11.352 UTC <RMI TCP Connection(2)-172.17.0.5> tid=27] Stopping membership services
> [vm2] [info 2018/05/19 02:19:11.377 UTC <RMI TCP Connection(2)-172.17.0.5> tid=27] DistributionManager stopped in 138ms.
> [vm2] [info 2018/05/19 02:19:11.377 UTC <RMI TCP Connection(2)-172.17.0.5> tid=27] Marking DistributionManager 172.17.0.5(159)<v1>:32772 as closed.
> {noformat}
> Then vm3 does some suspect checking and logs the quorum loss warning:
> {noformat}
> [vm3] [info 2018/05/19 02:19:11.414 UTC <P2P message reader for 172.17.0.5(159)<v1>:32772 shared unordered uid=14 port=35502> tid=296] Performing final check for suspect member 172.17.0.5(159)<v1>:32772 reason=member unexpectedly shut down shared, unordered connection
> [vm3] [info 2018/05/19 02:19:11.434 UTC <P2P message reader for 172.17.0.5(159)<v1>:32772 shared unordered uid=14 port=35502> tid=296] This member is becoming the membership coordinator with address 172.17.0.5(168)<v2>:32773
> [vm3] [info 2018/05/19 02:19:11.436 UTC <Geode Membership View Creator> tid=360] View Creator thread is starting
> [vm3] [info 2018/05/19 02:19:11.436 UTC <Geode Membership View Creator> tid=360] 172.17.0.5(159)<v1>:32772 had a weight of 15
> [vm3] [warn 2018/05/19 02:19:11.436 UTC <Geode Membership View Creator> tid=360] total weight lost in this view change is 15 of 25. Quorum has been lost!
> [vm3] [fatal 2018/05/19 02:19:11.438 UTC <Geode Membership View Creator> tid=360] Possible loss of quorum due to the loss of 1 cache processes: [172.17.0.5(159)<v1>:32772]
> {noformat}
> Then vm3 shuts down its DistributionManager:
> {noformat}
> [vm3] [info 2018/05/19 02:19:11.448 UTC <RMI TCP Connection(3)-172.17.0.5> tid=27] Shutting down DistributionManager 172.17.0.5(168)<v2>:32773. 
> [vm3] [info 2018/05/19 02:19:11.588 UTC <RMI TCP Connection(3)-172.17.0.5> tid=27] Now closing distribution for 172.17.0.5(168)<v2>:32773
> [vm3] [info 2018/05/19 02:19:11.588 UTC <RMI TCP Connection(3)-172.17.0.5> tid=27] Stopping membership services
> [vm3] [info 2018/05/19 02:19:11.617 UTC <RMI TCP Connection(3)-172.17.0.5> tid=27] DistributionManager stopped in 168ms.
> [vm3] [info 2018/05/19 02:19:11.617 UTC <RMI TCP Connection(3)-172.17.0.5> tid=27] Marking DistributionManager 172.17.0.5(168)<v2>:32773 as closed.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)