Posted to issues@geode.apache.org by "Bruce J Schuchardt (Jira)" <ji...@apache.org> on 2020/11/05 19:09:00 UTC

[jira] [Created] (GEODE-8690) Member that fails availability check is never suspected again

Bruce J Schuchardt created GEODE-8690:
-----------------------------------------

             Summary: Member that fails availability check is never suspected again
                 Key: GEODE-8690
                 URL: https://issues.apache.org/jira/browse/GEODE-8690
             Project: Geode
          Issue Type: Bug
          Components: membership
    Affects Versions: 1.13.0, 1.12.0, 1.14.0
            Reporter: Bruce J Schuchardt


In a test run on support/1.12 there was a cluster with 3 locators and a number of servers.  It had a membership view like this:
{noformat}
[ loc1, loc2, loc3, server1, server2, etc]
{noformat}

The test killed loc1 and loc2 and then tried to restart loc2.  In this scenario loc3 should have detected the loss of the other two locators and become the membership coordinator, but it didn't.  Loc3 detected the loss of loc2 and then received a LEAVE request from loc1.  At that point it ought to have either started examining loc2 again or perhaps just become the coordinator, but it did neither, and the cluster was left with no coordinator.
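
The decision described above can be illustrated with a toy model. This is only a sketch and not Geode code: the class, its fields, and the `skip_unavailable` flag are hypothetical names, and it assumes the coordinator candidate is simply the first member of the view that has not been excluded. It covers only the "just become the coordinator" option; starting a new availability check on loc2 would be the alternative.

```python
from dataclasses import dataclass, field


@dataclass
class CoordinatorCheck:
    """Toy model of one member deciding whether to become coordinator."""

    me: str
    view: list  # member names in view order
    left: set = field(default_factory=set)  # members that sent a LEAVE request
    unavailable: set = field(default_factory=set)  # members that failed an availability check

    def candidate(self, skip_unavailable):
        """Return the first member still considered present, or None."""
        excluded = set(self.left)
        if skip_unavailable:
            excluded |= self.unavailable
        for member in self.view:
            if member not in excluded:
                return member
        return None

    def should_become_coordinator(self, skip_unavailable):
        """True when this member is itself the candidate."""
        return self.candidate(skip_unavailable) == self.me


# Hypothetical reconstruction of loc3's state in the scenario above.
check = CoordinatorCheck(
    me="loc3",
    view=["loc1", "loc2", "loc3", "server1", "server2"],
    left={"loc1"},
    unavailable={"loc2"},
)

# Only departed members are excluded: loc2 remains the candidate.
print(check.candidate(skip_unavailable=False))  # prints "loc2"
# Members that failed the availability check are excluded too: loc3 takes over.
print(check.candidate(skip_unavailable=True))  # prints "loc3"
```

With `skip_unavailable=False` the model reproduces the stuck state from the test run, where the candidate is a locator that is no longer running and nobody takes over.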

This is similar to GEODE-3780, but in that case an earlier availability check had passed.

In the test run the names of the locators are:
loc1=locatorgemfire_4_3
loc2=locatorgemfire_4_4
loc3=locatorgemfire_4_2

{noformat}
[info 2020/10/30 21:51:51.197 PDT <P2P message reader for (locatorgemfire_4_4_host2_3884:3884:locator)<ec><v1>:41005 shared unordered uid=2 port=42550> tid=0x36] Performing availability check for suspect member (locatorgemfire_4_4_host2_3884:3884:locator)<ec><v1>:41005 reason=member unexpectedly shut down shared, unordered connection

[info 2020/10/30 21:51:51.309 PDT <Pooled High Priority Message Processor 3> tid=0x51] received leave request from (locatorgemfire_4_3_host2_3866:3866:locator)<ec><v0>:41004 for (locatorgemfire_4_3_host2_3866:3866:locator)<ec><v0>:41004

[info 2020/10/30 21:51:51.345 PDT <Pooled High Priority Message Processor 3> tid=0x51] Checking to see if I should become coordinator.  My address is (locatorgemfire_4_2_host2_3852:3852:locator)<ec><v1>:41007

[info 2020/10/30 21:51:51.346 PDT <Pooled High Priority Message Processor 3> tid=0x51] View with removed and left members removed is View[rs-(locatorgemfire_4_3_host2_3866:3866:locator)<ec><v0>:41004|3] members: [(locatorgemfire_4_4_host2_3884:3884:locator)<ec><v1>:41005, (locatorgemfire_4_2_host2_3852:3852:locator)<ec><v1>:41007, (locatorgemfire_4_1_host2_3843:3843:locator)<ec><v1>:41006, (peergemfire_4_1_host2_3959:3959)<ec><v2>:41010{lead}, (peergemfire_4_2_host2_3967:3967)<ec><v2>:41009] and coordinator would be (locatorgemfire_4_4_host2_3884:3884:locator)<ec><v1>:41005
{noformat}



