Posted to issues@ignite.apache.org by "Ignite TC Bot (Jira)" <ji...@apache.org> on 2021/09/08 16:30:00 UTC

[jira] [Commented] (IGNITE-13980) Remove duplicated ping: processing and raising StatusCheckMessage.

    [ https://issues.apache.org/jira/browse/IGNITE-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412068#comment-17412068 ] 

Ignite TC Bot commented on IGNITE-13980:
----------------------------------------

{panel:title=Branch: [pull/8696/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/8696/head] Base: [master] : No new tests found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6170374&buildTypeId=IgniteTests24Java8_RunAll]

> Remove duplicated ping: processing and raising StatusCheckMessage.
> ------------------------------------------------------------------
>
>                 Key: IGNITE-13980
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13980
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Vladimir Steshin
>            Assignee: Vladimir Steshin
>            Priority: Minor
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Suggestion: remove the duplicated ‘ping’ to make the code simpler.
> To make sure a node has not failed, TcpDiscoverySpi already has a robust ping (TcpDiscoveryConnectionCheckMessage) and the backward connection check. But there is also the status check message (TcpDiscoveryStatusCheckMessage), which looks outdated. This message was introduced in the first versions of discovery, when cluster stability and message delivery were still under development.
> Currently, TcpDiscoveryStatusCheckMessage is actually raised only occasionally, at cluster start. It does not happen later because of the ping: the ping updates the time of the last received message, and a recently received message is the reason not to raise the status check.
> It is possible that a node loses all incoming connections but keeps its connection to the next node. In this case the node gets removed from the ring by its follower, but it cannot recognize the failure because it still successfully sends messages to the next node. Instead of the complex processing of TcpDiscoveryStatusCheckMessage, it seems enough to answer a message with 'OK, but you are not in the ring'. Every other node sees the failure of the malfunctioning node and can notify it about this in the message response.
> The ticket has been additionally verified with the integration discovery test: https://github.com/apache/ignite/pull/8716
> We can keep TcpDiscoveryStatusCheckMessage for backward compatibility with older versions of Ignite. The subtask (IGNITE-14053) suggests complete removal of TcpDiscoveryStatusCheckMessage.
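
The 'OK, but you are not in the ring' reply proposed in the description could look roughly like the sketch below. It is only an illustration of the idea and not the TcpDiscoverySpi code: the class RingMembershipSketch, the Receipt enum, and the Receiver and Sender types are all hypothetical names, and the ring is reduced to a set of node IDs held by the receiving node.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch. None of these names come from Ignite's TcpDiscoverySpi. */
public class RingMembershipSketch {
    /** Reply of a node to a discovery message it has received (assumed, simplified). */
    public enum Receipt { OK, OK_BUT_NOT_IN_RING }

    /** Receiving node: keeps its own view of which node IDs are in the ring. */
    public static final class Receiver {
        private final Set<String> ring = new LinkedHashSet<>();

        public void addNode(String id) { ring.add(id); }

        public void removeNode(String id) { ring.remove(id); }

        /** Acknowledges the message and tells the sender whether it is still in the ring. */
        public Receipt onMessage(String senderId) {
            return ring.contains(senderId) ? Receipt.OK : Receipt.OK_BUT_NOT_IN_RING;
        }
    }

    /** Sending node: learns about its own removal from the reply, without a status check message. */
    public static final class Sender {
        private final String id;
        private boolean segmented;

        public Sender(String id) { this.id = id; }

        /** Sends a message to the next node and inspects the receipt. */
        public void send(Receiver next) {
            if (next.onMessage(id) == Receipt.OK_BUT_NOT_IN_RING)
                segmented = true; // The rest of the cluster has already dropped this node.
        }

        public boolean isSegmented() { return segmented; }
    }

    public static void main(String[] args) {
        Receiver next = new Receiver();
        next.addNode("A");

        Sender a = new Sender("A");
        a.send(next);
        System.out.println("segmented after first send: " + a.isSegmented());

        // The ring drops node A, but A can still deliver messages to its next node.
        next.removeNode("A");
        a.send(next);
        System.out.println("segmented after removal: " + a.isSegmented());
    }
}
```

In this sketch the sender needs no separate status check round trip: any ordinary message to the next node doubles as the membership probe, which matches the simplification the ticket argues for.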



--
This message was sent by Atlassian Jira
(v8.3.4#803005)