Posted to issues@geode.apache.org by "Bill Burcham (Jira)" <ji...@apache.org> on 2021/11/18 18:20:00 UTC

[jira] [Created] (GEODE-9822) Split-brain Possible During Network Partition in Two-Locator Cluster

Bill Burcham created GEODE-9822:
-----------------------------------

             Summary: Split-brain Possible During Network Partition in Two-Locator Cluster
                 Key: GEODE-9822
                 URL: https://issues.apache.org/jira/browse/GEODE-9822
             Project: Geode
          Issue Type: Bug
          Components: membership
            Reporter: Bill Burcham


In a two-locator cluster with default member weights and the default setting (true) of enable-network-partition-detection, a long-lived network partition that separates the two members will cause a split-brain: there will be two coordinators at the same time.

The reason for this can be found in the GMSJoinLeave.isNetworkPartition() method. That method's name is misleading; a name like majorityLost() would probably be more apt. It needs to return true iff the weight of "crashed" members (in the prospective view) is greater than or equal to 50% of the total weight (of all members in the current view).

What the method actually does is return true iff the weight of "crashed" members is greater than 51% of the total weight. As a result, if we have two members of equal weight, and the coordinator sees that the non-coordinator is "crashed", the crashed weight is exactly 50% of the total, the method returns false, and the coordinator keeps running. If a network partition is happening, and the non-coordinator is still running, then it will also become a coordinator and start producing views. Now we'll have two coordinators producing views concurrently.
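
The difference between the two thresholds can be illustrated with a minimal Java sketch. This is not the GMSJoinLeave code: the class name, the method names reportedCheck and proposedCheck, and the member weight of 3 are all hypothetical, chosen only to show what happens when two members have equal weight.

```java
public class MajorityLostSketch {

    // Hypothetical rendering of the behavior described in this report:
    // true only when the crashed weight is strictly greater than 51% of the total.
    public static boolean reportedCheck(int crashedWeight, int totalWeight) {
        return crashedWeight * 100 > totalWeight * 51;
    }

    // Hypothetical rendering of the behavior this report asks for:
    // true when the crashed weight is at least 50% of the total.
    public static boolean proposedCheck(int crashedWeight, int totalWeight) {
        return crashedWeight * 2 >= totalWeight;
    }

    public static void main(String[] args) {
        int memberWeight = 3;              // assumed, illustrative weight; both members equal
        int totalWeight = 2 * memberWeight;

        // One of the two members is seen as "crashed": exactly 50% of the weight.
        System.out.println("reported: " + reportedCheck(memberWeight, totalWeight)); // false
        System.out.println("proposed: " + proposedCheck(memberWeight, totalWeight)); // true
    }
}
```

With the reported threshold, losing exactly half of the weight is not treated as a lost majority, so each side of the partition can keep (or assume) the coordinator role. With the proposed threshold, both sides would detect the loss.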



--
This message was sent by Atlassian Jira
(v8.20.1#820001)