Posted to dev@ignite.apache.org by "Alexey Scherbakov (Jira)" <ji...@apache.org> on 2021/04/12 09:11:00 UTC

[jira] [Created] (IGNITE-14517) Inconsistent behavior of the method NetworkCluster#allMembers

Alexey Scherbakov created IGNITE-14517:
------------------------------------------

             Summary: Inconsistent behavior of the method NetworkCluster#allMembers
                 Key: IGNITE-14517
                 URL: https://issues.apache.org/jira/browse/IGNITE-14517
             Project: Ignite
          Issue Type: Bug
            Reporter: Alexey Scherbakov


This method reports an incorrect number of alive nodes after a node is stopped gracefully.

The scenario:
 # Start a cluster of 3 nodes: n1, n2, n3.
 # Call NetworkCluster#allMembers on n2. It returns 3 members.
 # Stop node n1.
 # Call NetworkCluster#allMembers on n2 again. It still returns 3 members, but should return 2.

Here is a failing test from my working branch [1][2].

I've looked into the ScaleCube code and found that the stopped node remains stuck in io.scalecube.cluster.membership.MembershipProtocolImpl#membershipTable with the MemberStatus.LEAVING status.

A possible fix is to avoid using cluster.members altogether and instead maintain the local topology from membership events. The events seem to work fine.
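
A minimal sketch of the event-based idea, under stated assumptions: the class LocalTopology, its EventType enum and the onEvent method are hypothetical names used for illustration only, and are not part of the Ignite or ScaleCube API. A real implementation would translate the membership events delivered by the underlying library into calls like these.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: keeps a local view of the topology that is updated
 * only by membership events, never by reading the library's member table.
 */
class LocalTopology {
    /** Hypothetical event kinds; a real adapter would map library events to these. */
    enum EventType { ADDED, REMOVED }

    /** Names of the nodes currently considered alive. */
    private final Set<String> members = ConcurrentHashMap.newKeySet();

    /** Applies a single membership event to the local view. */
    void onEvent(EventType type, String nodeName) {
        switch (type) {
            case ADDED:
                members.add(nodeName);
                break;
            case REMOVED:
                // The node leaves the local view as soon as the event arrives.
                members.remove(nodeName);
                break;
        }
    }

    /** Returns an immutable snapshot of the current members. */
    Set<String> allMembers() {
        return Collections.unmodifiableSet(new HashSet<>(members));
    }
}
```

With this approach, the scenario above would end with two members in the local view once the removal event for n1 is received, regardless of what the library's internal membership table still holds.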

[1] https://github.com/gridgain/apache-ignite-3/blob/ignite-13885/modules/network/src/integrationTest/java/org/apache/ignite/network/scalecube/ITScaleCubeNetworkClusterMessagingTest.java

[2] https://ci.ignite.apache.org/viewLog.html?buildId=5963158&tab=buildResultsDiv&buildTypeId=ignite3_Tests_IntegrationTests&branch_ignite3_Tests=pull%2F78


