Posted to issues@ignite.apache.org by "Ignite TC Bot (Jira)" <ji...@apache.org> on 2021/09/09 10:59:00 UTC

[jira] [Commented] (IGNITE-14068) Infinite node presence in the ring while outgoing connections are lost

    [ https://issues.apache.org/jira/browse/IGNITE-14068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412512#comment-17412512 ] 

Ignite TC Bot commented on IGNITE-14068:
----------------------------------------

{panel:title=Branch: [pull/8881/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/8881/head] Base: [master] : New Tests (3)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}SPI{color} [[tests 3|https://ci.ignite.apache.org/viewLog.html?buildId=6171580]]
* {color:#013220}IgniteSpiTestSuite: TcpDiscoverySslSelfTest.testOutgoingConnectionsFailure - PASSED{color}
* {color:#013220}IgniteSpiTestSuite: TcpDiscoverySslTrustedSelfTest.testOutgoingConnectionsFailure - PASSED{color}
* {color:#013220}IgniteSpiTestSuite: TcpDiscoverySelfTest.testOutgoingConnectionsFailure - PASSED{color}

{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6171651&buildTypeId=IgniteTests24Java8_RunAll]

> Infinite node presence in the ring while outgoing connections are lost
> ----------------------------------------------------------------------
>
>                 Key: IGNITE-14068
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14068
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladimir Steshin
>            Assignee: Vladimir Steshin
>            Priority: Major
>          Time Spent: 8h
>  Remaining Estimate: 0h
>
> If a node loses its outgoing connections, it can decide it is alone in the cluster and will not fail. This happens on small clusters where the failed node attempts to connect to every other node before connRecoveryTimeout expires.
> Consider:
> - The cluster n1 -> n2 -> n3 -> n4 -> n1
> - n4 loses all outgoing connections.
> - n3 still pings n4 successfully.
> - n4 attempts to connect to n1, n2, n3. Each attempt fails due to the outgoing network failure.
> - spi.connRecoveryTimeout is not reached. n4 decides it is alone and continues working.
> - n3 still sends messages to n4. n4 does not lack incoming connections.
> - The ring is actually broken because of n4. n3 cannot detect the failure of n4.
> Solution: the node could watch its incoming traffic, which indicates that the incoming network is working. If all outgoing connections are lost but messages are still received, the node must leave the grid to prevent a ring break.
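
The decision described in the solution can be sketched roughly as follows. This is only an illustration of the idea and not the code from the pull request or from TcpDiscoverySpi: the class OutgoingFailureSketch, its Decision enum, and the incomingWindowMs parameter are hypothetical names introduced here, and the "recent incoming message" window is an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of the proposed check; not the actual Ignite implementation. */
final class OutgoingFailureSketch {
    /** Possible outcomes of the check. */
    enum Decision { STAY_IN_RING, ASSUME_ALONE, LEAVE_RING }

    /** Marker meaning that no incoming message has been seen yet. */
    private static final long NEVER = Long.MIN_VALUE;

    /** Assumed window (ms) within which an incoming message counts as recent. */
    private final long incomingWindowMs;

    /** Timestamp (ms) of the last message received from the ring. */
    private final AtomicLong lastIncomingMs = new AtomicLong(NEVER);

    OutgoingFailureSketch(long incomingWindowMs) {
        this.incomingWindowMs = incomingWindowMs;
    }

    /** Records incoming traffic, e.g. a message from the previous node. */
    void onMessageReceived(long nowMs) {
        lastIncomingMs.set(nowMs);
    }

    /**
     * Decides what the node should do after trying to connect to its peers.
     *
     * @param peers number of other nodes the node tried to reach
     * @param failedPeers number of those attempts that failed
     * @param nowMs current time in milliseconds
     */
    Decision decide(int peers, int failedPeers, long nowMs) {
        // At least one outgoing connection works: the ring can be kept.
        if (failedPeers < peers)
            return Decision.STAY_IN_RING;

        long last = lastIncomingMs.get();
        boolean recentIncoming = last != NEVER && nowMs - last <= incomingWindowMs;

        // Every outgoing attempt failed, yet messages still arrive:
        // the node is not alone, it is the one breaking the ring.
        return recentIncoming ? Decision.LEAVE_RING : Decision.ASSUME_ALONE;
    }
}
```

In the n1..n4 scenario above, n4 would call decide(3, 3, now) after failing to reach n1, n2 and n3. Because n3 keeps sending messages to it, the result under these assumptions would be LEAVE_RING rather than ASSUME_ALONE.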



--
This message was sent by Atlassian Jira
(v8.3.4#803005)