Posted to issues@geode.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/02/20 18:16:00 UTC

[jira] [Commented] (GEODE-6423) availability checks sometimes immediately initiate removal

    [ https://issues.apache.org/jira/browse/GEODE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773251#comment-16773251 ] 

ASF subversion and git services commented on GEODE-6423:
--------------------------------------------------------

Commit c2d3e389e79434141ea723d1dd974c0843f61270 in geode's branch refs/heads/feature/GEODE-6423 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=c2d3e38 ]

GEODE-6423 availability checks sometimes immediately initiate removal

Ensure that the availability check is performed for the contracted
member-timeout period.  This allows a suspect to survive the check if
it's having a momentary glitch like a brief garbage-collection, or if
there is a short network outage.
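
The idea in the commit message can be sketched as follows. This is a
minimal illustration only, not the code from the commit: the class name
AvailabilityCheckSketch, the methods survivesFinalCheck and
closedLoopbackPort, and the retry-interval parameter are all hypothetical.
It assumes the final check is a timed Socket.connect(), which can fail
immediately (for example with "connection refused"), and it keeps retrying
until the member-timeout period has elapsed before reporting failure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the Geode code base.
public class AvailabilityCheckSketch {

  /**
   * Returns true if a TCP connection to the suspect succeeds at any point
   * within memberTimeoutMillis. A connect attempt that fails immediately
   * does not end the check; it is retried until the deadline passes.
   * Assumes retryIntervalMillis is positive.
   */
  static boolean survivesFinalCheck(InetSocketAddress suspect,
                                    long memberTimeoutMillis,
                                    long retryIntervalMillis) {
    final long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(memberTimeoutMillis);
    while (true) {
      long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remaining <= 0) {
        return false; // the whole member-timeout period was used up
      }
      try (Socket socket = new Socket()) {
        // A timed connect: it may block up to 'remaining' ms, or fail at once.
        socket.connect(suspect, (int) Math.min(remaining, Integer.MAX_VALUE));
        return true; // the suspect answered within the period
      } catch (IOException e) {
        // Immediate or timed-out failure: fall through and retry.
      }
      remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remaining <= 0) {
        return false;
      }
      try {
        // Pause between attempts, but never sleep past the deadline.
        Thread.sleep(Math.min(retryIntervalMillis, remaining));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  /** Demo helper: binds an ephemeral loopback port, then closes it again. */
  static int closedLoopbackPort() {
    try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
      return server.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    InetSocketAddress suspect =
        new InetSocketAddress(InetAddress.getLoopbackAddress(), closedLoopbackPort());
    long start = System.nanoTime();
    boolean alive = survivesFinalCheck(suspect, 1000, 100);
    long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    System.out.println("alive=" + alive + " after " + elapsedMillis + " ms");
  }
}
```

With a single timed connect, a refused connection would end the check
within a few milliseconds, as in the log below. In this sketch the
suspect is only reported as failed once the full timeout has passed, so
a member that recovers during that window would pass the check.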


> availability checks sometimes immediately initiate removal
> ----------------------------------------------------------
>
>                 Key: GEODE-6423
>                 URL: https://issues.apache.org/jira/browse/GEODE-6423
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the network goes down, the JGroupsMessenger service initiates suspect processing when it tries to send messages.  In 1.8 this seems to initiate immediate removal of the suspect:
> 1. an IOException while sending a UDP message initiates suspicion
> 2. suspect processing initiates a final check
> 3. the final check fails immediately (it uses a timed Socket.connect(), which can fail without waiting for the timeout)
> 4. the member is declared dead
> {noformat}
> [info 2019/02/13 17:44:59.366 CST perf157-130-167-server1 <Geode Failure Detection thread 3> tid=0xc2] received suspect message from myself for 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000: Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] Performing final check for suspect member 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 5> tid=0xc4] Performing final check for suspect member 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 reason=Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] Failure detection is now watching 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 5> tid=0xc4] Failure detection is now watching 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 3> tid=0xc2] received suspect message from myself for 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201: Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.369 CST perf157-130-167-server1 <Geode Failure Detection thread 6> tid=0xc5] Performing final check for suspect member 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201 reason=Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.369 CST perf157-130-167-server1 <Geode Failure Detection thread 6> tid=0xc5] Failure detection is now watching 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 5> tid=0xc4] Final check failed for member 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 5> tid=0xc4] Requesting removal of suspect member 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] Final check failed for member 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] Requesting removal of suspect member 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] This member is becoming the membership coordinator with address 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 6> tid=0xc5] Final check failed for member 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201
> [info 2019/02/13 17:44:59.373 CST perf157-130-167-server1 <Geode Failure Detection thread 6> tid=0xc5] Requesting removal of suspect member 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201
> [info 2019/02/13 17:44:59.376 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] ViewCreator starting on:192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.376 CST perf157-130-167-server1 <Geode Membership View Creator> tid=0xc6] View Creator thread is starting
> [info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership View Creator> tid=0xc6] 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 had a weight of 3
> [info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership View Creator> tid=0xc6] 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 had a weight of 10
> [info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership View Creator> tid=0xc6] preparing new view View[192.168.130.167(perf157-130-167-server1:225263)<v1>:16200|10] members: [192.168.130.167(perf157-130-167-server1:225263)<v1>:16200{lead}, 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201] crashed: [192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000, 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202]
> [info 2019/02/13 17:45:03.627 CST perf157-130-167-server1 <unicast receiver,perf157-130-167-62066> tid=0x21] received suspect message from 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 for 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000: Unable to send messages to this member via JGroups
> [info 2019/02/13 17:45:03.718 CST perf157-130-167-server1 <unicast receiver,perf157-130-167-62066> tid=0x21] Membership received a request to remove 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200 from 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
> [severe 2019/02/13 17:45:03.719 CST perf157-130-167-server1 <unicast receiver,perf157-130-167-62066> tid=0x21] Membership service failure: Unable to send messages to this member via JGroups
> org.apache.geode.ForcedDisconnectException: Unable to send messages to this member via JGroups
> {noformat}
>  
> We expect the final check to respect the member-timeout setting.
>  


