Posted to issues@ignite.apache.org by "Ignite TC Bot (Jira)" <ji...@apache.org> on 2023/05/24 19:41:00 UTC
[jira] [Commented] (IGNITE-19410) Node failure in case multiple nodes join and leave a cluster simultaneously with security is enabled.

    [ https://issues.apache.org/jira/browse/IGNITE-19410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725933#comment-17725933 ] 

Ignite TC Bot commented on IGNITE-19410:
----------------------------------------

{panel:title=Branch: [pull/10701/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/10701/head] Base: [master] : New Tests (1)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}Security{color} [[tests 1|https://ci2.ignite.apache.org/viewLog.html?buildId=7188616]]
* {color:#013220}SecurityTestSuite: NodeSecurityContextPropagationTest.testProcessCustomDiscoveryMessageFromLeftNode - PASSED{color}

{panel}
[TeamCity *--> Run :: All* Results|https://ci2.ignite.apache.org/viewLog.html?buildId=7186616&buildTypeId=IgniteTests24Java8_RunAll]

> Node failure in case multiple nodes  join and leave a cluster simultaneously with security is enabled.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-19410
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19410
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mikhail Petrov
>            Priority: Major
>              Labels: ise
>         Attachments: NodeSecurityContextTest.java
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The case when nodes with security enabled join and leave the cluster simultaneously can cause the joining nodes to fail with the following exception:
> {code:java}
> [2023-05-03T14:54:31,208][ERROR][disco-notifier-worker-#332%ignite.NodeSecurityContextTest2%][IgniteTestResources] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Failed to find security context for subject with given ID : 4725544a-f144-4486-a705-46b2ac200011]]
>  java.lang.IllegalStateException: Failed to find security context for subject with given ID : 4725544a-f144-4486-a705-46b2ac200011
>     at org.apache.ignite.internal.processors.security.IgniteSecurityProcessor.withContext(IgniteSecurityProcessor.java:164) ~[classes/:?]
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$3$SecurityAwareNotificationTask.run(GridDiscoveryManager.java:949) ~[classes/:?]
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2822) ~[classes/:?]
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2860) [classes/:?]
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) [classes/:?]
>     at java.lang.Thread.run(Thread.java:750) [?:1.8.0_351] {code}
> Reproducer is attached.
> Simplified steps that lead to the failure:
> 1. The client node sends an arbitrary discovery message that produces an acknowledgment message once it has been processed by all cluster nodes.
> 2. The client node gracefully leaves the cluster.
> 3. The new node joins the cluster and receives a topology snapshot that does not include the left client node.
> 4. The new node receives an acknowledgment for the message from step 1 and fails while processing it because the message originator node is not listed in the current discovery cache or the discovery cache history (see IgniteSecurityProcessor#withContext(java.util.UUID)). This happens because the GridDiscoveryManager#historicalNode method is currently aware only of the topology history that accumulates after a node has joined the cluster. The complete cluster topology history that exists at the time a new node joins the cluster is stored in GridDiscoveryManager#topHist and is not taken into account by the GridDiscoveryManager#historicalNode method.
>  
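The lookup gap described in step 4 can be illustrated with a toy model. This is only a sketch under stated assumptions: the class TopologyHistorySketch and all of its methods are hypothetical and are not Ignite APIs, and the change in the pull request may be implemented differently. It keeps two maps, one for topology versions recorded after the local node joined and one for the history received at join time, and shows that a node which left before the join is found only when both are consulted.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.UUID;

/** Toy model only: none of these types or methods are Ignite APIs. */
public class TopologyHistorySketch {
    /** Topology snapshots recorded after the local node joined (version -> node IDs). */
    private final NavigableMap<Long, Collection<UUID>> localHistory = new TreeMap<>();

    /** Topology history handed to the local node at join time (version -> node IDs). */
    private final NavigableMap<Long, Collection<UUID>> joinTimeHistory = new TreeMap<>();

    void recordLocal(long ver, Collection<UUID> nodes) {
        localHistory.put(ver, nodes);
    }

    void recordJoinTime(long ver, Collection<UUID> nodes) {
        joinTimeHistory.put(ver, nodes);
    }

    /** Lookup that sees only post-join topology, mirroring the reported behaviour. */
    boolean knownAfterJoinOnly(UUID id) {
        return contains(localHistory, id);
    }

    /** Lookup that also consults the history received at join time. */
    boolean knownWithJoinTimeHistory(UUID id) {
        return contains(localHistory, id) || contains(joinTimeHistory, id);
    }

    private static boolean contains(Map<Long, Collection<UUID>> hist, UUID id) {
        return hist.values().stream().anyMatch(nodes -> nodes.contains(id));
    }

    public static void main(String[] args) {
        UUID server = UUID.randomUUID();
        UUID leftClient = UUID.randomUUID();
        UUID newNode = UUID.randomUUID();

        TopologyHistorySketch sketch = new TopologyHistorySketch();

        // Versions 1 and 2 precede the new node's join: the client is present, then leaves.
        sketch.recordJoinTime(1L, Arrays.asList(server, leftClient));
        sketch.recordJoinTime(2L, Arrays.asList(server));

        // Version 3 is the first snapshot the new node records itself.
        sketch.recordLocal(3L, Arrays.asList(server, newNode));

        System.out.println(sketch.knownAfterJoinOnly(leftClient));       // prints false
        System.out.println(sketch.knownWithJoinTimeHistory(leftClient)); // prints true
    }
}
```

In this model the first lookup corresponds to the situation in which the security context for the message originator cannot be resolved, and the second corresponds to a lookup that also takes the join-time history into account.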



--
This message was sent by Atlassian Jira
(v8.20.10#820010)