Posted to issues@ignite.apache.org by "Roman Puchkovskiy (Jira)" <ji...@apache.org> on 2023/02/01 08:14:00 UTC

[jira] [Created] (IGNITE-18685) Do not allow a node excluded from Logical Topology to enter Physical Topology again

Roman Puchkovskiy created IGNITE-18685:
------------------------------------------

             Summary: Do not allow a node excluded from Logical Topology to enter Physical Topology again
                 Key: IGNITE-18685
                 URL: https://issues.apache.org/jira/browse/IGNITE-18685
             Project: Ignite
          Issue Type: Improvement
            Reporter: Roman Puchkovskiy
            Assignee: Roman Puchkovskiy
             Fix For: 3.0.0-beta2


As per IGNITE-18630, a node excluded from Logical Topology (LT) must be excluded from Physical Topology (PT).

The following scenario is possible:
 # A node is part of both PT and LT
 # Its network cable gets unplugged, but the node itself stays alive
 # After the relevant timeouts expire, the cluster removes the node from LT (and, hence, from PT)
 # The network cable is plugged back in, so the node attempts to enter the PT with the same old ID (aka Launch ID)

In such a situation, the node must be refused entry: the connection must be terminated during the handshake. This has to be done in both {{RecoveryServerHandshakeManager}} and {{RecoveryClientHandshakeManager}}.

When a connection attempt is refused, the refusing node must first send an explanatory message (like 'your ID is stale') and only then close the physical connection.
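
The refuse-then-close order described above could be sketched roughly as follows. This is only an illustration, not the actual handshake manager code: {{HandshakeChannel}}, {{StaleIdHandshakeCheck}} and {{RecordingChannel}} are hypothetical names, and representing the Launch ID and the message as a String is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical minimal view of a connection; not an Ignite API. */
interface HandshakeChannel {
    void send(String message);

    void close();
}

/** Hypothetical check that a handshake manager could run before accepting a peer. */
final class StaleIdHandshakeCheck {
    static final String STALE_ID_MESSAGE = "your ID is stale";

    // Assumption: Launch IDs are represented as strings here.
    private final Set<String> staleLaunchIds;

    StaleIdHandshakeCheck(Set<String> staleLaunchIds) {
        this.staleLaunchIds = staleLaunchIds;
    }

    /** Returns true if the handshake may proceed, false if the peer was refused. */
    boolean onHandshakeStart(String remoteLaunchId, HandshakeChannel channel) {
        if (!staleLaunchIds.contains(remoteLaunchId)) {
            return true;
        }

        // Explain first, then close, so the peer learns why it was refused.
        channel.send(STALE_ID_MESSAGE);
        channel.close();

        return false;
    }
}

/** In-memory channel that records what happened, for illustration only. */
final class RecordingChannel implements HandshakeChannel {
    final List<String> events = new ArrayList<>();

    @Override
    public void send(String message) {
        events.add("send:" + message);
    }

    @Override
    public void close() {
        events.add("close");
    }
}
```

The same check would have to be invoked on both the server and the client side of the handshake, since either side may be the one that sees the stale ID.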

The refused node must then take measures to refresh its identity (for example, by initiating a critical failure via a Failure Handler).
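
On the refused side, the reaction might look like the sketch below. The class name {{StaleIdRejectionListener}} is hypothetical, the Failure Handler is modelled as a plain {{Consumer<String>}} rather than any real Ignite interface, and the message text is the example from this description.

```java
import java.util.function.Consumer;

/** Hypothetical listener on the refused node; the failure handler is modelled as a Consumer. */
final class StaleIdRejectionListener {
    private static final String STALE_ID_MESSAGE = "your ID is stale";

    private final Consumer<String> criticalFailureHandler;

    private boolean failureRaised;

    StaleIdRejectionListener(Consumer<String> criticalFailureHandler) {
        this.criticalFailureHandler = criticalFailureHandler;
    }

    /** Raises a critical failure once when the peer reports that our ID is stale. */
    synchronized void onMessage(String message) {
        if (STALE_ID_MESSAGE.equals(message) && !failureRaised) {
            failureRaised = true;

            criticalFailureHandler.accept("Launch ID was rejected as stale; the node must refresh its identity");
        }
    }
}
```

The guard flag is there because several peers may refuse the node at about the same time, and one critical failure is enough.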

A subtle point is how to persist the fact that a node ID is stale. For starters, this information could be volatile (kept only in memory); later, it could be recorded using CMG.
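
A minimal sketch of the volatile variant, assuming Launch IDs are represented as strings; {{VolatileStaleIdRegistry}} is a hypothetical name, not an existing class.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory registry of stale Launch IDs; its contents are lost on restart. */
final class VolatileStaleIdRegistry {
    // Concurrent set, because handshakes may run on several network threads.
    private final Set<String> staleIds = ConcurrentHashMap.newKeySet();

    /** Remembers that the given Launch ID was removed from the Logical Topology. */
    void markStale(String launchId) {
        staleIds.add(launchId);
    }

    /** Tells whether a handshake from the given Launch ID should be refused. */
    boolean isStale(String launchId) {
        return staleIds.contains(launchId);
    }
}
```

With this approach each node only knows about the IDs it has recorded since it started, which is why a durable record would be the next step.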



--
This message was sent by Atlassian Jira
(v8.20.10#820010)