Posted to issues@geode.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/04/02 20:51:00 UTC

[jira] [Commented] (GEODE-6423) availability checks sometimes immediately initiate removal

    [ https://issues.apache.org/jira/browse/GEODE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808147#comment-16808147 ] 

ASF subversion and git services commented on GEODE-6423:
--------------------------------------------------------

Commit 251436cfb8618580a331841c8dd7938e48b56a8c in geode's branch refs/heads/feature/GEODE-6423b from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=251436c ]

GEODE-6423 availability checks sometimes immediately initiate removal

Do not loop when trying to form a TCP/IP connection to a suspect unless
the next step is to remove the suspect from membership.  In this case
there will be another invocation of the same method that will take the
removal step next.
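
As a rough illustration of the idea in this commit message, here is a minimal Java sketch. It is not the code from the commit: the class FinalCheckSketch, the method canConnect, the flag removalIsNextStep and the RETRY_PAUSE_MS constant are all hypothetical names, and the sketch assumes one reading of the message, namely that connection attempts are retried until the member-timeout elapses only when a failure would lead directly to removal, while every other caller makes a single attempt.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch; names and structure are assumptions, not Geode's code.
final class FinalCheckSketch {
    // Assumed pause between attempts; not a Geode setting.
    private static final long RETRY_PAUSE_MS = 50;

    /**
     * Returns true if a TCP connection to the suspect can be formed.
     * When removalIsNextStep is false, a single failed attempt returns false.
     * When it is true, attempts repeat until memberTimeoutMs has elapsed, so an
     * immediately failing connect() cannot short-circuit the timeout.
     */
    static boolean canConnect(InetSocketAddress suspect, long memberTimeoutMs,
                              boolean removalIsNextStep) throws InterruptedException {
        long deadline = System.nanoTime() + memberTimeoutMs * 1_000_000L;
        while (true) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false; // the full timeout was spent without a connection
            }
            try (Socket socket = new Socket()) {
                // Timed connect; it may still fail at once, e.g. "network unreachable".
                socket.connect(suspect, (int) Math.min(remainingMs, Integer.MAX_VALUE));
                return true;
            } catch (IOException e) {
                if (!removalIsNextStep) {
                    return false; // no loop: a later invocation decides on removal
                }
            }
            // Wait briefly before retrying, never longer than the time left.
            Thread.sleep(Math.min(RETRY_PAUSE_MS, remainingMs));
        }
    }
}
```

With this shape, a call such as FinalCheckSketch.canConnect(address, 5000, true) reports failure only after roughly five seconds of failed attempts, whereas passing false returns after the first failed attempt.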


> availability checks sometimes immediately initiate removal
> ----------------------------------------------------------
>
>                 Key: GEODE-6423
>                 URL: https://issues.apache.org/jira/browse/GEODE-6423
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>            Priority: Major
>             Fix For: 1.9.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> If the network goes down, the JGroupsMessenger service initiates suspect processing when it tries to send messages.  In 1.8 this seems to initiate immediate removal of the suspect:
> 1. an IOException while sending a UDP message initiates suspicion
> 2. suspect processing initiates a final check
> 3. the final check fails immediately (it uses a timed Socket.connect(), which fails immediately)
> 4. the member is declared dead
> {noformat}
> [info 2019/02/13 17:44:59.366 CST perf157-130-167-server1 <Geode Failure Detection thread 3> tid=0xc2] received suspect message from myself for 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000: Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] Performing final check for suspect member 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 5> tid=0xc4] Performing final check for suspect member 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 reason=Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] Failure detection is now watching 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 5> tid=0xc4] Failure detection is now watching 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 3> tid=0xc2] received suspect message from myself for 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201: Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.369 CST perf157-130-167-server1 <Geode Failure Detection thread 6> tid=0xc5] Performing final check for suspect member 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201 reason=Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.369 CST perf157-130-167-server1 <Geode Failure Detection thread 6> tid=0xc5] Failure detection is now watching 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 5> tid=0xc4] Final check failed for member 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 5> tid=0xc4] Requesting removal of suspect member 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] Final check failed for member 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] Requesting removal of suspect member 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] This member is becoming the membership coordinator with address 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 6> tid=0xc5] Final check failed for member 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201
> [info 2019/02/13 17:44:59.373 CST perf157-130-167-server1 <Geode Failure Detection thread 6> tid=0xc5] Requesting removal of suspect member 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201
> [info 2019/02/13 17:44:59.376 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] ViewCreator starting on:192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.376 CST perf157-130-167-server1 <Geode Membership View Creator> tid=0xc6] View Creator thread is starting
> [info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership View Creator> tid=0xc6] 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 had a weight of 3
> [info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership View Creator> tid=0xc6] 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 had a weight of 10
> [info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership View Creator> tid=0xc6] preparing new view View[192.168.130.167(perf157-130-167-server1:225263)<v1>:16200|10] members: [192.168.130.167(perf157-130-167-server1:225263)<v1>:16200{lead}, 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201] crashed: [192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000, 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202]
> [info 2019/02/13 17:45:03.627 CST perf157-130-167-server1 <unicast receiver,perf157-130-167-62066> tid=0x21] received suspect message from 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 for 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000: Unable to send messages to this member via JGroups
> [info 2019/02/13 17:45:03.718 CST perf157-130-167-server1 <unicast receiver,perf157-130-167-62066> tid=0x21] Membership received a request to remove 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200 from 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
> [severe 2019/02/13 17:45:03.719 CST perf157-130-167-server1 <unicast receiver,perf157-130-167-62066> tid=0x21] Membership service failure: Unable to send messages to this member via JGroups
> org.apache.geode.ForcedDisconnectException: Unable to send messages to this member via JGroups
> {noformat}
>  
> We expect the final check to respect the member-timeout setting.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)