Posted to issues@ignite.apache.org by "Roman Puchkovskiy (Jira)" <ji...@apache.org> on 2023/02/01 08:14:00 UTC
[jira] [Created] (IGNITE-18685) Do not allow a node excluded from Logical Topology to enter Physical Topology again
Roman Puchkovskiy created IGNITE-18685:
------------------------------------------
Summary: Do not allow a node excluded from Logical Topology to enter Physical Topology again
Key: IGNITE-18685
URL: https://issues.apache.org/jira/browse/IGNITE-18685
Project: Ignite
Issue Type: Improvement
Reporter: Roman Puchkovskiy
Assignee: Roman Puchkovskiy
Fix For: 3.0.0-beta2
As per IGNITE-18630, a node excluded from Logical Topology (LT) must be excluded from Physical Topology (PT).
The following scenario is possible:
# A node is a part of both PT and LT
# Its network cable gets unplugged, but the node stays alive
# After the relevant timeouts expire, the cluster removes the node from LT (and, hence, from PT)
# The network cable gets plugged back in, so the node attempts to enter PT with the same old ID (aka Launch ID)
In such a situation, the node must be refused entry: the connection must be terminated at the handshake attempt. This has to be done in both {{RecoveryServerHandshakeManager}} and {{RecoveryClientHandshakeManager}}.
When a connection attempt is refused, the refusing node must first send an explanatory message (like 'your ID is stale') and then close the physical connection.
The refused node must take measures to refresh its identity (like initiating a critical failure using a Failure Handler).
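The refuse-then-close sequence and the reaction of the refused node could look roughly like the sketch below. Everything in it is an assumption made for illustration: {{Connection}}, {{RecordingConnection}}, {{acceptOrReject}}, {{onRejection}} and the message text are hypothetical names, not existing Ignite classes or methods, and the failure handler is modelled as a plain callback rather than the real Failure Handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Illustrative sketch only; none of these types are existing Ignite classes. */
final class StaleIdHandshakeSketch {
    /** Hypothetical stand-in for the physical connection. */
    interface Connection {
        void send(String message);

        void close();
    }

    /** Test double that records calls in the order they happen. */
    static final class RecordingConnection implements Connection {
        final List<String> events = new ArrayList<>();

        @Override
        public void send(String message) {
            events.add("send:" + message);
        }

        @Override
        public void close() {
            events.add("close");
        }
    }

    /** Hypothetical rejection text; the real message format is not defined here. */
    static final String STALE_ID_MESSAGE = "your ID is stale";

    /**
     * Refusing side: returns true if the handshake may proceed;
     * otherwise explains the refusal, closes the connection and returns false.
     */
    static boolean acceptOrReject(String remoteLaunchId, Predicate<String> isStale, Connection connection) {
        if (!isStale.test(remoteLaunchId)) {
            return true;
        }

        connection.send(STALE_ID_MESSAGE); // explain first
        connection.close();                // then drop the physical connection

        return false;
    }

    /** Refused side: escalates to a failure handler (modelled as a plain callback). */
    static void onRejection(String message, Consumer<IllegalStateException> failureHandler) {
        failureHandler.accept(new IllegalStateException("Handshake rejected: " + message));
    }
}
```

Passing the staleness check in as a predicate keeps the handshake logic independent of how stale IDs are stored, so the same check could be called from both the server-side and the client-side handshake managers.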
A subtle point is how to persist the fact that some node ID is stale. For starters, we could make this information volatile (only keep it in memory), but later we could record it using CMG.
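One way to leave room for both options is to hide the storage behind a small interface and start with an in-memory implementation. The sketch below is only an illustration under that assumption: {{StaleIds}}, {{VolatileStaleIds}} and their methods are hypothetical names, and a CMG-backed implementation is not shown.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch only; none of these types are existing Ignite classes. */
final class StaleIdsSketch {
    /** Hypothetical abstraction; a durable implementation could replace the volatile one later. */
    interface StaleIds {
        void markAsStale(String launchId);

        boolean isStale(String launchId);
    }

    /** Volatile variant: the information is lost when the remembering node restarts. */
    static final class VolatileStaleIds implements StaleIds {
        /** Thread-safe set, since handshakes may run on several network threads. */
        private final Set<String> ids = ConcurrentHashMap.newKeySet();

        @Override
        public void markAsStale(String launchId) {
            ids.add(launchId);
        }

        @Override
        public boolean isStale(String launchId) {
            return ids.contains(launchId);
        }
    }
}
```

With the volatile variant, a node that restarts forgets which IDs it considered stale, which is the gap that recording the information more durably would close.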
--
This message was sent by Atlassian Jira
(v8.20.10#820010)