Posted to issues@geode.apache.org by "Bruce J Schuchardt (Jira)" <ji...@apache.org> on 2020/11/17 21:20:00 UTC

[jira] [Updated] (GEODE-8721) member that should become coordinator never detects loss of current coordinator

     [ https://issues.apache.org/jira/browse/GEODE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bruce J Schuchardt updated GEODE-8721:
--------------------------------------
    Labels: release-blocker  (was: )

> member that should become coordinator never detects loss of current coordinator
> -------------------------------------------------------------------------------
>
>                 Key: GEODE-8721
>                 URL: https://issues.apache.org/jira/browse/GEODE-8721
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.14.0
>            Reporter: Bruce J Schuchardt
>            Priority: Major
>              Labels: release-blocker
>
> During a network partition, a server that should have become membership coordinator and then shut down its side of the partition never detected the loss of a server on the other side of the partition. Instead, it continually performed availability checks on that other server, and the checks passed. Its log file showed ever-increasing timestamps for when the other server had supposedly last contacted it, which was impossible given the network partition (created through iptables manipulation).
> At least one other server on its side of the network partition was doing the same thing. It looks like they were interfering with each other's availability checks in some way.
> {noformat}
> locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:12 PDT 2020
> locatorp1_26023/system.log: [info 2020/10/20 22:23:16.228 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check passed for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> bridgep1_25995/system.log: [info 2020/10/20 22:23:16.229 PDT <unicast receiver,rs-F21040449a0i3large-72-61636> tid=0x23] No longer suspecting 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:14 PDT 2020
> bridgep1_25998/system.log: [info 2020/10/20 22:23:17.213 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check passed for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> locatorp1_26023/system.log: [info 2020/10/20 22:23:17.232 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Performing availability check for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
> bridgep1_25998/system.log: [info 2020/10/20 22:23:18.215 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Performing availability check for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
> bridgep1_25995/system.log: [info 2020/10/20 22:23:21.006 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-61636> tid=0x21] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:16 PDT 2020
> {noformat}
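
The log lines above depend on one rule: an availability check passes if the suspect member is believed to have sent traffic recently. Below is a minimal, hypothetical Java sketch of that rule. The class `LastContactTracker`, its methods, the 5-second "recent" window and the 1-second check interval are illustrative assumptions, not Geode's actual membership code. It shows why a check of this kind can never fail while something keeps refreshing the last-contact timestamp, even though no real traffic can cross the partition.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: not Geode's membership implementation.
public class LastContactTracker {
    // Assumed window: traffic seen within this many ms counts as "recent".
    private final long recentWindowMillis;
    // Last time (ms) each member is believed to have contacted us.
    private final Map<String, Long> lastContact = new ConcurrentHashMap<>();

    public LastContactTracker(long recentWindowMillis) {
        this.recentWindowMillis = recentWindowMillis;
    }

    // Record traffic from a member, keeping the newest timestamp seen.
    public void recordContact(String member, long timeMillis) {
        lastContact.merge(member, timeMillis, Long::max);
    }

    // Availability check: passes if the member was heard from recently.
    public boolean availabilityCheckPasses(String member, long nowMillis) {
        Long last = lastContact.get(member);
        return last != null && nowMillis - last <= recentWindowMillis;
    }

    // Simulates 30 s of partition, checking once per second. Returns the
    // time (ms) at which the suspect fails its check, or -1 if it never fails.
    public static long simulatePartition(boolean spuriousRefresh) {
        LastContactTracker tracker = new LastContactTracker(5_000);
        String suspect = "locatorp2_host2"; // hypothetical member id
        long now = 0;
        tracker.recordContact(suspect, now); // last genuine traffic
        for (int i = 0; i < 30; i++) {
            now += 1_000;
            if (spuriousRefresh) {
                // Something reports "recent traffic" that cannot have
                // crossed the partition, as the increasing log timestamps suggest.
                tracker.recordContact(suspect, now - 2_000);
            }
            if (!tracker.availabilityCheckPasses(suspect, now)) {
                return now;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("no refresh, fails at ms: " + simulatePartition(false));
        System.out.println("spurious refresh, fails at ms: " + simulatePartition(true));
    }
}
```

Without a spurious refresh, the suspect fails its check at 6000 ms (the first check after the 5-second window expires). With a refresh that always reports traffic 2 seconds old, the check passes for the whole simulated partition, which matches the symptom in the logs.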
