Posted to issues@geode.apache.org by "Owen Nichols (Jira)" <ji...@apache.org> on 2020/09/02 01:38:00 UTC
[jira] [Updated] (GEODE-8467) server fails to notify of a ForcedDisconnect and fails to tear down the cache
[ https://issues.apache.org/jira/browse/GEODE-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen Nichols updated GEODE-8467:
--------------------------------
Fix Version/s: 1.14.0
> server fails to notify of a ForcedDisconnect and fails to tear down the cache
> -----------------------------------------------------------------------------
>
> Key: GEODE-8467
> URL: https://issues.apache.org/jira/browse/GEODE-8467
> Project: Geode
> Issue Type: Bug
> Components: membership
> Affects Versions: 1.10.0, 1.11.0, 1.12.0, 1.13.0, 1.14.0
> Reporter: Bruce J Schuchardt
> Assignee: Bruce J Schuchardt
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.13.0, 1.14.0
>
>
> A test with auto-reconnect enabled hung while restarting a server. The restarting server was still building its cache when it was kicked out of the cluster because of very high load on the test machine. Membership initiated a forced disconnect:
> {noformat}
> [fatal 2020/08/22 00:51:04.508 PDT <unicast receiver,rs-GEM-3035-PG2231-2a2i3large-hydra-client-25-42721> tid=0x23] Membership service failure: Member isn't responding to heartbeat requests
> org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException: Member isn't responding to heartbeat requests
> at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.forceDisconnect(GMSMembership.java:2012)
> at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1085)
> at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processMessage(GMSJoinLeave.java:688)
> at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1331)
> at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1267)
> {noformat}
>
> and then logged that it was generating a description of the cache:
> {noformat}
> [info 2020/08/22 00:51:05.933 PDT <unicast receiver,rs-GEM-3035-PG2231-2a2i3large-hydra-client-25-42721> tid=0x23] generating XML to rebuild the cache after reconnect completes
> {noformat}
>
> but it never logged completion of this step and never forked a thread to tear down the cache. Any exception thrown by XML generation would have been caught by JGroups code, which logs the problem at WARNING level. Because we have JGroups logging set to FATAL level, the failure would not have appeared in the logs.
> We need to add exception handling around XML generation and, if an exception is caught, disable reconnect attempts and have the server shut down.
> The bug isn't easy to hit. I've run the test that failed over 5000 times without encountering it.
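
The proposed handling could look roughly like the sketch below. It is only an illustration of the idea in the description, not the actual Geode change: the class ForcedDisconnectSketch, its methods, and the Supplier standing in for XML generation are all hypothetical names, and the "tear-down" thread merely signals a latch instead of closing a real cache.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for the forced-disconnect path; none of these names are Geode APIs.
class ForcedDisconnectSketch {
    private volatile boolean reconnectEnabled = true;
    private volatile String cacheXml; // stays null if XML generation failed
    private final CountDownLatch tearDownDone = new CountDownLatch(1);

    // Called on the receiver thread when membership forces a disconnect.
    void onForcedDisconnect(Supplier<String> xmlGenerator) {
        try {
            cacheXml = xmlGenerator.get();
        } catch (RuntimeException e) {
            // Handle the failure here instead of letting it escape to the caller,
            // which might log it at a level that is filtered out.
            reconnectEnabled = false;
            System.err.println("XML generation failed; reconnect disabled: " + e);
        } finally {
            // The tear-down thread is forked whether or not XML generation succeeded.
            Thread tearDown = new Thread(tearDownDone::countDown, "tear-down-sketch");
            tearDown.setDaemon(true);
            tearDown.start();
        }
    }

    boolean isReconnectEnabled() {
        return reconnectEnabled;
    }

    String getCacheXml() {
        return cacheXml;
    }

    // Waits for the simulated tear-down; returns false on timeout or interruption.
    boolean awaitTearDown(long timeoutMillis) {
        try {
            return tearDownDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ForcedDisconnectSketch sketch = new ForcedDisconnectSketch();
        // Simulate XML generation blowing up during a forced disconnect.
        sketch.onForcedDisconnect(() -> {
            throw new IllegalStateException("cache not fully built");
        });
        System.out.println("tear-down ran: " + sketch.awaitTearDown(1000));
        System.out.println("reconnect enabled: " + sketch.isReconnectEnabled());
    }
}
```

Starting the tear-down thread in a finally block is one way to make sure the cache is torn down even when XML generation fails, which is the behavior the description says was missing.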