Posted to issues@ignite.apache.org by "Roman Puchkovskiy (Jira)" <ji...@apache.org> on 2022/12/15 14:32:00 UTC

[jira] [Created] (IGNITE-18423) It is possible for a node to receive a message before the sender appears in topology

Roman Puchkovskiy created IGNITE-18423:
------------------------------------------

             Summary: It is possible for a node to receive a message before the sender appears in topology
                 Key: IGNITE-18423
                 URL: https://issues.apache.org/jira/browse/IGNITE-18423
             Project: Ignite
          Issue Type: Bug
          Components: networking
            Reporter: Roman Puchkovskiy
             Fix For: 3.0.0-beta2


The scenario is as follows:
 # Two nodes (A and B) know about each other (i.e. each of them has the other node in its physical topology)
 # A network partition happens, so the nodes disappear from their counterparts' topologies
 # The partition disappears, and A starts seeing B (i.e. B gets added to A's view of the topology)
 # A sends a message to B using invoke
 # B fails to process the message (at least, it fails to send a response) because A is not yet seen by B (A is not in B's view of the topology)
 # B starts seeing A

The problem is that a channel might be established before the node on the other side is added to the topology.
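
The steps above can be replayed with a toy model. This is only a sketch to illustrate the ordering problem: {{TopologyRaceSketch}}, {{Node}}, {{onAppeared}}, {{onDisappeared}} and {{handleInvoke}} are hypothetical names, not Ignite or Scalecube APIs, and the assumption that B cannot respond to a sender missing from its topology is taken from step 5.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the race; none of these names come from Ignite or Scalecube.
class TopologyRaceSketch {
    /** A node that can only respond to senders present in its own topology view. */
    static class Node {
        final String name;
        final Set<String> topology = new HashSet<>();

        Node(String name) {
            this.name = name;
        }

        /** Models the discovery event "node appeared". */
        void onAppeared(String other) {
            topology.add(other);
        }

        /** Models the discovery event "node disappeared". */
        void onDisappeared(String other) {
            topology.remove(other);
        }

        /** Handles an invoke; returns true only if a response could be sent. */
        boolean handleInvoke(String sender) {
            // Assumption from step 5: the sender must already be in the topology.
            return topology.contains(sender);
        }
    }

    /** Replays the scenario and reports whether B could respond to A's invoke. */
    static boolean replayScenario() {
        Node a = new Node("A");
        Node b = new Node("B");

        a.onAppeared("B");                        // step 1: both nodes see each other
        b.onAppeared("A");
        a.onDisappeared("B");                     // step 2: network partition
        b.onDisappeared("A");
        a.onAppeared("B");                        // step 3: A starts seeing B
        boolean responded = b.handleInvoke("A");  // steps 4-5: A invokes, B handles
        b.onAppeared("A");                        // step 6: B starts seeing A (too late)

        return responded;
    }

    public static void main(String[] args) {
        System.out.println("B responded: " + replayScenario());
    }
}
```

In this model the invoke fails only because step 6 happens after step 4; swapping those two calls makes it succeed.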

It seems that there are two paths we could follow:
 # We don't use Scalecube's transport for sending messages; instead, we only use its discovery capabilities, and Scalecube uses our transport. As a result, our event listener (notified about a node being added to the topology) might be out of sync with the actual connection event. Scalecube probably guarantees that these events are in sync when its native transport is used. We could investigate how it does this (if it does) and see whether we can do the same.
 # Alternatively, a node establishing a connection might present itself (send the corresponding {{ClusterNode}}) as a part of a handshake procedure, and we could immediately simulate the corresponding event on the receiving side, hence closing the gap.
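
The second path could look roughly like the sketch below. It is an illustration under assumptions, not a proposed implementation: {{ClusterNodeStub}} is a simplified stand-in for {{ClusterNode}}, and {{Receiver}}, {{onHandshake}} and {{onDiscoveryAppeared}} are hypothetical names. The idea shown is that the handshake and the discovery listener feed the same code path, so whichever arrives first fires the "appeared" event and the later one is a no-op.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are real Ignite or Scalecube APIs.
class HandshakeSketch {
    /** Simplified stand-in for the node descriptor sent during the handshake. */
    static final class ClusterNodeStub {
        final String id;
        final String address;

        ClusterNodeStub(String id, String address) {
            this.id = id;
            this.address = address;
        }
    }

    /** Receiving side of a connection. */
    static class Receiver {
        final Map<String, ClusterNodeStub> topology = new HashMap<>();
        final List<String> events = new ArrayList<>();

        /** Called when the connecting node presents itself in the handshake. */
        void onHandshake(ClusterNodeStub remote) {
            addIfAbsent(remote);
        }

        /** Called by the discovery listener; may arrive before or after the handshake. */
        void onDiscoveryAppeared(ClusterNodeStub remote) {
            addIfAbsent(remote);
        }

        /** Fires the "appeared" event at most once per node. */
        private void addIfAbsent(ClusterNodeStub remote) {
            if (topology.putIfAbsent(remote.id, remote) == null) {
                events.add("appeared:" + remote.id);
            }
        }

        /** Handles an invoke; returns true only if the sender is known. */
        boolean handleInvoke(String senderId) {
            return topology.containsKey(senderId);
        }
    }

    public static void main(String[] args) {
        Receiver b = new Receiver();
        ClusterNodeStub a = new ClusterNodeStub("A", "10.0.0.1:3344");

        b.onHandshake(a);                              // A connects and presents itself
        System.out.println(b.handleInvoke("A"));       // sender is already known
        b.onDiscoveryAppeared(a);                      // late discovery event is ignored
        System.out.println(b.events);
    }
}
```

A real implementation would also have to decide what happens when discovery later reports the node as gone, or never reports it at all; the sketch does not cover that.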


