Posted to issues@ignite.apache.org by "Vladimir Steshin (Jira)" <ji...@apache.org> on 2021/01/29 11:17:00 UTC
[jira] [Updated] (IGNITE-14068) Infinite node persistance in the
ring while outgoing connections are lost
[ https://issues.apache.org/jira/browse/IGNITE-14068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir Steshin updated IGNITE-14068:
--------------------------------------
Summary: Infinite node persistance in the ring while outgoing connections are lost (was: Infinite node persistance in the ring while outcoming connections are lost)
> Infinite node persistance in the ring while outgoing connections are lost
> -------------------------------------------------------------------------
>
> Key: IGNITE-14068
> URL: https://issues.apache.org/jira/browse/IGNITE-14068
> Project: Ignite
> Issue Type: Bug
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> If a node loses its +outgoing+ connections, it can decide that it is alone in the cluster and will not fail. This happens on small clusters, where the failed node can try, and fail, to connect to every other node before _connRecoveryTimeout_ expires.
> Consider:
> The cluster n1 -> n2 -> n3 -> n4 -> n1
> * n4 loses all outgoing connections.
> * n3 keeps successful ping to n4.
> * n4 attempts to connect to n1, n2, and n3. Each attempt fails because of the outgoing network failure.
> * spi.connRecoveryTimeout has not yet been reached, so n4 decides it is alone and continues working.
> * n3 still sends messages to n4, because n4's incoming connections still work.
> * The ring is actually broken at n4, but n3 cannot detect that n4 has failed.
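
The scenario above can be reproduced in a toy model. The sketch below is not Ignite code: the class RingSketch, the method recover, the fixed per-attempt cost and the FAILED outcome once the timeout is reached are all assumptions made for illustration. It only shows why a short ring lets the node run out of candidates before the recovery timeout expires.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the ring scenario. Hypothetical names; not the Ignite implementation. */
final class RingSketch {
    /** What the node concludes after trying to restore its outgoing link. */
    enum Decision { CONNECTED, ALONE, FAILED }

    /**
     * Tries every other node once, in ring order, starting after {@code self}.
     *
     * Assumptions: each failed attempt costs {@code attemptMs}; once the accumulated
     * time reaches {@code connRecoveryTimeoutMs} the node gives up and fails; if it
     * runs out of candidates first, it concludes that it is alone.
     */
    static Decision recover(List<String> ring, String self, boolean outgoingBroken,
                            long attemptMs, long connRecoveryTimeoutMs, List<String> tried) {
        int start = ring.indexOf(self);
        long elapsedMs = 0;

        for (int i = 1; i < ring.size(); i++) {
            String candidate = ring.get((start + i) % ring.size());
            tried.add(candidate);

            if (!outgoingBroken)
                return Decision.CONNECTED; // The first candidate accepts the connection.

            elapsedMs += attemptMs; // The attempt failed and consumed time.

            if (elapsedMs >= connRecoveryTimeoutMs)
                return Decision.FAILED; // Timeout reached before the ring was exhausted.
        }

        // Every other node was tried and the timeout was never reached.
        return Decision.ALONE;
    }

    public static void main(String[] args) {
        List<String> small = List.of("n1", "n2", "n3", "n4");
        List<String> tried = new ArrayList<>();

        // n4 has lost outgoing connections; 2 s per attempt, 10 s timeout (assumed values).
        Decision d = recover(small, "n4", true, 2_000, 10_000, tried);

        System.out.println("n4 tried " + tried + " and decided: " + d);
    }
}
```

With the assumed numbers, three failed attempts take less time than the timeout, so n4 ends in the ALONE state. In a ring of eight nodes with the same settings, the timeout is reached first and the node ends in the FAILED state, which matches the observation that the problem appears on small clusters.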