Posted to issues@ignite.apache.org by "Vladimir Steshin (Jira)" <ji...@apache.org> on 2020/06/05 10:56:00 UTC
[jira] [Commented] (IGNITE-13014) Remove double checking of node availability.
[ https://issues.apache.org/jira/browse/IGNITE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126662#comment-17126662 ]
Vladimir Steshin commented on IGNITE-13014:
-------------------------------------------
Closed. Interferes with IGNITE-7163. The backward connection checking is required.
> Remove double checking of node availability.
> ---------------------------------------------
>
> Key: IGNITE-13014
> URL: https://issues.apache.org/jira/browse/IGNITE-13014
> Project: Ignite
> Issue Type: Improvement
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Major
> Attachments: FailureDetectionResearch.txt, NodeFailureResearch.patch, WostCaseStepByStep.txt
>
>
> Proposal:
> Do not check a failed node a second time. Double checking prolongs node failure detection and gives no additional benefit. The routine is also messy and contains hardcoded values.
> Currently, node availability is checked twice. Suppose node 2 stops responding. Node 1 becomes unable to ping node 2 and asks node 3 to establish a permanent connection in place of node 2. Node 3 may or may not try to check node 2 as well.
> As a result, detecting a node failure can take up to ServerImpl.CON_CHECK_INTERVAL + 2 * IgniteConfiguration.failureDetectionTimeout + 300ms.
> See:
> * 'NodeFailureResearch.patch'. It adds the test 'FailureDetectionResearch', which emulates slow responses from a failed node and measures failure detection delays.
> * 'FailureDetectionResearch.txt' - results of the test.
> * 'WostCaseStepByStep.txt' - a description of how the worst case happens.
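
For illustration only, the worst-case bound from the proposal can be worked through in a small Java sketch. The class and method names are hypothetical, and the input values (500 ms for the connection check interval, 10,000 ms for failureDetectionTimeout) are assumptions chosen for the example, not figures from this issue. Only the 300 ms term and the shape of the formula come from the description above.

```java
public class WorstCaseDetection {
    /** Fixed extra delay from the formula in the issue description, in milliseconds. */
    static final long EXTRA_DELAY_MS = 300;

    /**
     * Upper bound on failure detection time per the issue description:
     * CON_CHECK_INTERVAL + 2 * failureDetectionTimeout + 300 ms.
     * The factor of 2 reflects the node being checked twice.
     */
    static long worstCaseMillis(long conCheckIntervalMs, long failureDetectionTimeoutMs) {
        return conCheckIntervalMs + 2 * failureDetectionTimeoutMs + EXTRA_DELAY_MS;
    }

    public static void main(String[] args) {
        long conCheckIntervalMs = 500;           // assumed value for illustration
        long failureDetectionTimeoutMs = 10_000; // assumed value for illustration

        long bound = worstCaseMillis(conCheckIntervalMs, failureDetectionTimeoutMs);
        System.out.println("Worst-case detection delay, ms: " + bound); // prints 20800
    }
}
```

Under these assumed inputs the second check accounts for roughly half of the bound, which is the delay the proposal aimed to remove.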
--
This message was sent by Atlassian Jira
(v8.3.4#803005)