Posted to issues@ignite.apache.org by "Semen Boikov (JIRA)" <ji...@apache.org> on 2017/10/23 10:10:00 UTC
[jira] [Created] (IGNITE-6700) Node considered as failed can cause failure of others nodes
Semen Boikov created IGNITE-6700:
------------------------------------
Summary: Node considered as failed can cause failure of others nodes
Key: IGNITE-6700
URL: https://issues.apache.org/jira/browse/IGNITE-6700
Project: Ignite
Issue Type: Bug
Security Level: Public (Viewable by anyone)
Components: general
Reporter: Semen Boikov
Assignee: Semen Boikov
Priority: Critical
A node that is considered failed can cause other nodes in the cluster to fail.
There is an issue in TcpDiscoveryAbstractMessage.failedNodes processing: if a message is received from a node that is already considered failed, its failedNodes should be ignored.
Possible scenario:
- there are 4 nodes (1 -> 2 -> 3 -> 4)
- node 3 temporarily loses its connection to the others
- node 2 considers 3 failed, and a node failed event is fired for 3
- node 3 considers 4 failed and adds 4 to nodeFailedList; it then restores its connection with 1, and currently 1 will process the nodeFailedList from 3 (even if 3 is already considered failed)
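
The intended behavior can be sketched as below. This is only an illustration of the rule "ignore failedNodes from a sender that is itself considered failed", not the actual TcpDiscovery code: the class RingNodeView and its methods are hypothetical names made up for this example, node IDs are plain integers, and it assumes node 1 has already seen the node failed event for 3.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of one node's view of the ring (not an Ignite class). */
class RingNodeView {
    /** IDs of nodes this node already considers failed. */
    private final Set<Integer> failed = new HashSet<>();

    /** Records that a node failed event was seen for the given node. */
    void markFailed(int nodeId) {
        failed.add(nodeId);
    }

    /**
     * Processes the failed-node list carried by a discovery message.
     * The list is ignored when the sender is already considered failed.
     *
     * @return the node IDs newly accepted as failed from this message.
     */
    Set<Integer> onMessage(int senderId, Set<Integer> msgFailedNodes) {
        if (failed.contains(senderId))
            return Collections.emptySet(); // sender is considered failed: drop its list

        Set<Integer> accepted = new HashSet<>(msgFailedNodes);
        accepted.removeAll(failed); // keep only nodes not yet known as failed
        failed.addAll(accepted);

        return accepted;
    }

    /** Returns true if this node considers the given node failed. */
    boolean isFailed(int nodeId) {
        return failed.contains(nodeId);
    }
}
```

In terms of the scenario: node 1 calls markFailed(3) when the node failed event for 3 arrives, so a later onMessage(3, {4}) returns an empty set and 4 stays alive from node 1's point of view.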