Posted to issues@ignite.apache.org by "Ignite TC Bot (Jira)" <ji...@apache.org> on 2021/03/17 12:17:00 UTC

[jira] [Commented] (IGNITE-13374) Initial PME hangs because of multiple blinking nodes

    [ https://issues.apache.org/jira/browse/IGNITE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303334#comment-17303334 ] 

Ignite TC Bot commented on IGNITE-13374:
----------------------------------------

{panel:title=Branch: [pull/8850/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/8850/head] Base: [master] : New Tests (1)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}Cache 6{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=5919076]]
* {color:#013220}IgniteCacheTestSuite6: ClientFastReplyCoordinatorFailureTest.testClientRepeatedReply - PASSED{color}

{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=5900929&buildTypeId=IgniteTests24Java8_RunAll]

> Initial PME hangs because of multiple blinking nodes
> ----------------------------------------------------
>
>                 Key: IGNITE-13374
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13374
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Assignee: Alexander Lapin
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The *root cause* of the issue is a race inside GridDhtPartitionsExchangeFuture on the client side between two processes:
>  # When the old coordinator fails and the new one takes over, the new coordinator sends GridDhtPartitionsSingleRequest messages to all nodes, including clients, to restore the exchange results. Processing this message on a client includes updating the current coordinator reference (the crd field).
>  # When the future receives the discovery notification about the old coordinator's failure, it should detect the coordinator change and send a GridDhtPartitionsSingleMessage to the new coordinator to obtain affinity. However, if the crd field has already been updated, the client fails to detect the coordinator failure and never sends the SingleMessage to the new coordinator, which in turn leaves the client hanging.
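
The ordering dependence described above can be illustrated with a small, deterministic sketch. This is not Ignite's code: the class CoordinatorRaceSketch, its nested ClientExchangeFuture, the method names, and the string node IDs are all hypothetical stand-ins, and the model assumes only what the description states (the request handler updates crd, and the discovery handler compares the failed node against crd).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the race; not the actual Ignite implementation. */
final class CoordinatorRaceSketch {
    /** Stand-in for the client-side state of GridDhtPartitionsExchangeFuture. */
    static final class ClientExchangeFuture {
        /** Current coordinator reference (the "crd" field from the description). */
        private String crd;

        /** Coordinators this client has sent a single message to. */
        final List<String> singleMessagesSentTo = new ArrayList<>();

        ClientExchangeFuture(String initialCrd) {
            this.crd = initialCrd;
        }

        /** Process 1: a single request arrives from the new coordinator and updates crd. */
        void onSingleRequest(String sender) {
            crd = sender;
        }

        /** Process 2: discovery reports a failed node; react only if it is the known coordinator. */
        void onNodeFailed(String failedNode, String newCrd) {
            if (failedNode.equals(crd)) {
                crd = newCrd;
                singleMessagesSentTo.add(newCrd); // models sending the single message
            }
            // Otherwise the future assumes the coordinator is unchanged and sends nothing.
        }
    }

    /** Replays one of the two possible orderings and reports whether the client would hang. */
    static boolean clientHangs(boolean requestArrivesFirst) {
        ClientExchangeFuture fut = new ClientExchangeFuture("oldCrd");

        if (requestArrivesFirst) {
            fut.onSingleRequest("newCrd");
            fut.onNodeFailed("oldCrd", "newCrd");
        } else {
            fut.onNodeFailed("oldCrd", "newCrd");
            fut.onSingleRequest("newCrd");
        }

        // In this model the client hangs if it never sent a single message to the new coordinator.
        return fut.singleMessagesSentTo.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("request first   -> hangs: " + clientHangs(true));  // true
        System.out.println("discovery first -> hangs: " + clientHangs(false)); // false
    }
}
```

The sketch replays both orderings sequentially instead of using real threads so that the outcome is reproducible. It only shows why the result depends on which handler runs first; it does not attempt to show how the pull request resolves the race.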



--
This message was sent by Atlassian Jira
(v8.3.4#803005)