Posted to issues@ignite.apache.org by "Anton Vinogradov (Jira)" <ji...@apache.org> on 2020/06/15 14:50:00 UTC
[jira] [Commented] (IGNITE-13012) Fix failure detection timeout. Simplify node ping routine.
[ https://issues.apache.org/jira/browse/IGNITE-13012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135931#comment-17135931 ]
Anton Vinogradov commented on IGNITE-13012:
-------------------------------------------
LGTM,
[~sergey-chugunov], Could you please perform the @final check?
[~vladsz83], could you please provide a benchmark (JMH?) as proof?
> Fix failure detection timeout. Simplify node ping routine.
> ----------------------------------------------------------
>
> Key: IGNITE-13012
> URL: https://issues.apache.org/jira/browse/IGNITE-13012
> Project: Ignite
> Issue Type: Improvement
> Affects Versions: 2.8.1
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Major
> Labels: iep-45
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> A connection failure may not be detected within IgniteConfiguration.failureDetectionTimeout. The actual worst-case delay is ServerImpl.CON_CHECK_INTERVAL + IgniteConfiguration.failureDetectionTimeout. In addition, the node ping routine is duplicated.
> We should fix:
> 1. The failure detection timeout should take into account the last sent message. Currently, the ping is bound to its own timestamp:
> {code:java}ServerImpl.RingMessageWorker.lastTimeConnCheckMsgSent{code}
> This is odd, because any discovery message checks the connection.
> 2. Make the connection check interval depend on the failure detection timeout (FDT). The current value is a constant:
> {code:java}static int ServerImpl.CON_CHECK_INTERVAL = 500{code}
> 3. Remove the additional, accelerated connection checking. Once fix 1 is done, it will become even more useless.
> Although TCP discovery has a connection check period, it may send a ping before this period expires. This premature node ping relies on the time of any sent or even any received message.
> 4. Do not worry the user with “Node seems disconnected” when everything is OK. Once fixes 1 and 3 are done, this message will become even more useless.
> A node may log at INFO: “Local node seems to be disconnected from topology …” when it is not actually disconnected at all.
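
The following is a minimal sketch of fixes 1 and 2 and of the worst-case delay arithmetic from the description. It is not Ignite code: the class ConnectionCheckSketch, its methods, and the choice of one quarter of the timeout as the check interval are all hypothetical and chosen only for illustration. Time is passed in explicitly so that the behaviour is deterministic.

```java
// Hypothetical illustration; none of these names exist in Apache Ignite.
final class ConnectionCheckSketch {
    /** Check interval derived from the failure detection timeout (fix 2). */
    private final long checkIntervalMs;

    /** Time of the last sent message of ANY kind, not only pings (fix 1). */
    private long lastSentMs;

    ConnectionCheckSketch(long failureDetectionTimeoutMs, long nowMs) {
        if (failureDetectionTimeoutMs <= 0)
            throw new IllegalArgumentException("Timeout must be positive.");

        // Assumption: a quarter of the timeout. The issue does not specify a ratio.
        this.checkIntervalMs = Math.max(1, failureDetectionTimeoutMs / 4);
        this.lastSentMs = nowMs;
    }

    /** Worst-case detection delay when the interval is a constant independent of the timeout. */
    static long worstCaseDelayMs(long failureDetectionTimeoutMs, long conCheckIntervalMs) {
        return conCheckIntervalMs + failureDetectionTimeoutMs;
    }

    long checkIntervalMs() {
        return checkIntervalMs;
    }

    /** Any sent discovery message already checks the connection, so it resets the timer. */
    void onMessageSent(long nowMs) {
        lastSentMs = nowMs;
    }

    /** A dedicated connection check is due only if nothing was sent for a whole interval. */
    boolean connectionCheckDue(long nowMs) {
        return nowMs - lastSentMs >= checkIntervalMs;
    }

    public static void main(String[] args) {
        // 500 ms is CON_CHECK_INTERVAL from the issue; 10,000 ms is an example timeout.
        System.out.println("Worst case: " + worstCaseDelayMs(10_000L, 500L) + " ms");

        ConnectionCheckSketch sketch = new ConnectionCheckSketch(10_000L, 0L);
        sketch.onMessageSent(2_000L);
        System.out.println("Check due at 2500 ms: " + sketch.connectionCheckDue(2_500L));
    }
}
```

With a constant 500 ms interval and an example timeout of 10,000 ms, the worst-case delay in this sketch is 10,500 ms, which exceeds the configured timeout. Resetting the timer on every sent message also means that a busy node needs no separate ping at all.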