Posted to issues@ignite.apache.org by "Semen Boikov (JIRA)" <ji...@apache.org> on 2017/10/23 10:10:00 UTC

[jira] [Created] (IGNITE-6700) Node considered as failed can cause failure of others nodes

Semen Boikov created IGNITE-6700:
------------------------------------

             Summary: Node considered as failed can cause failure of others nodes
                 Key: IGNITE-6700
                 URL: https://issues.apache.org/jira/browse/IGNITE-6700
             Project: Ignite
          Issue Type: Bug
      Security Level: Public (Viewable by anyone)
          Components: general
            Reporter: Semen Boikov
            Assignee: Semen Boikov
            Priority: Critical


A node that is considered failed can cause other nodes in the cluster to fail.

The issue is in the processing of TcpDiscoveryAbstractMessage.failedNodes: if a message is received from a node that is already considered failed, its failedNodes should be ignored.

Possible scenario:
- there are 4 nodes (1 -> 2 -> 3 -> 4)
- node 3 temporarily loses its connection to the others
- node 2 considers 3 failed, and a node failed event is fired for 3
- node 3 considers 4 failed and adds 4 to nodeFailedList; it then restores its connection to 1, and currently 1 processes the nodeFailedList from 3 (even though 3 is already considered failed)
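
The proposed behavior can be sketched roughly as follows. This is only an illustration of the idea, not the actual Ignite code: DiscoveryMessageSketch, FailedNodesFilterSketch, markFailed and acceptedFailedNodes are hypothetical names, and node IDs are simplified to integers. The receiver keeps track of the nodes it already considers failed and drops the failed-nodes list carried by any message whose sender is in that set.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a discovery message; NOT the real
// TcpDiscoveryAbstractMessage. Node IDs are plain ints for brevity.
final class DiscoveryMessageSketch {
    final int senderId;
    final List<Integer> failedNodes;

    DiscoveryMessageSketch(int senderId, List<Integer> failedNodes) {
        this.senderId = senderId;
        this.failedNodes = failedNodes;
    }
}

// Hypothetical receiver-side filter illustrating the proposed rule.
final class FailedNodesFilterSketch {
    // Nodes this receiver already considers failed.
    private final Set<Integer> locallyFailed = new HashSet<>();

    void markFailed(int nodeId) {
        locallyFailed.add(nodeId);
    }

    // Returns the failed-nodes entries the receiver should act on.
    // If the sender itself is considered failed, its list is ignored.
    Collection<Integer> acceptedFailedNodes(DiscoveryMessageSketch msg) {
        if (locallyFailed.contains(msg.senderId))
            return Collections.emptyList();

        return msg.failedNodes;
    }
}
```

In the scenario above, node 1 would have marked 3 as failed before 3 reconnects, so the entry for node 4 carried by the message from 3 would be dropped, while a failed-nodes list sent by a healthy node such as 2 would still be processed.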



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)