Posted to issues@ignite.apache.org by "Vyacheslav Koptilin (Jira)" <ji...@apache.org> on 2021/07/01 09:29:00 UTC

[jira] [Commented] (IGNITE-14394) Baseline auto-adjustment does not happen when 2+ nodes join the cluster

    [ https://issues.apache.org/jira/browse/IGNITE-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372577#comment-17372577 ] 

Vyacheslav Koptilin commented on IGNITE-14394:
----------------------------------------------

Hi [~agidaspov],

release notes have been updated.

> Baseline auto-adjustment does not happen when 2+ nodes join the cluster
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-14394
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14394
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>             Fix For: 2.11
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When baseline topology auto-adjustment is enabled and several server nodes join the cluster, the auto-adjustment may not be triggered.
> Steps to reproduce:
> 1. start an Ignite cluster and enable baseline auto-adjustment
> 2. start 2+ server nodes (to reproduce the issue, the corresponding exchanges must be merged)
> 3. wait for baseline auto-adjustment
> *RESOLUTION*
> The root cause of the issue relates to merging exchanges. Let's consider the following scenario:
> 1. 2 new nodes join the cluster
> 2. the first step triggers two calls to BaselineTopologyUpdater#triggerBaselineUpdate, which create the following baseline data objects:
>     data1 - target topology version - X   
>     data2 - target topology version - X+1. This data object invalidates the previous one (data1)
> 3. if the exchanges are merged, the first data object (data1) binds/listens to the real {{GridDhtPartitionsExchangeFuture}} instead of affinityReadyFuture (see {{GridCachePartitionExchangeManager#affinityReadyFuture}}), while data2 listens to affinityReadyFuture(X+1). As a result, the implementation schedules data2 first and then replaces it with data1, which is already invalidated at this point.
> The fix is straightforward: we should not schedule baseline data instances that relate to an "invalidated" target topology.
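
The scenario and the fix described above can be modelled with a minimal, self-contained sketch. It is an illustration only and not the actual Ignite code: the BaselineData and Scheduler classes, their fields, and the topology version 5 are hypothetical stand-ins, and the real change may look different.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class BaselineScheduleSketch {
    /** Hypothetical stand-in for a baseline data object (not the Ignite class). */
    static final class BaselineData {
        final long targetTopVer;
        private final AtomicBoolean invalidated = new AtomicBoolean();

        BaselineData(long targetTopVer) {
            this.targetTopVer = targetTopVer;
        }

        void invalidate() {
            invalidated.set(true);
        }

        boolean isInvalidated() {
            return invalidated.get();
        }
    }

    /** Hypothetical scheduler that holds at most one pending baseline update. */
    static final class Scheduler {
        private BaselineData scheduled;

        /** Returns true if the data was scheduled, false if it was skipped. */
        synchronized boolean schedule(BaselineData data) {
            // Guard from the resolution: never schedule invalidated data.
            if (data.isInvalidated())
                return false;

            scheduled = data;
            return true;
        }

        synchronized BaselineData scheduled() {
            return scheduled;
        }
    }

    public static void main(String[] args) {
        long x = 5; // arbitrary topology version, chosen for illustration

        BaselineData data1 = new BaselineData(x);
        BaselineData data2 = new BaselineData(x + 1);
        data1.invalidate(); // data2 supersedes data1

        Scheduler scheduler = new Scheduler();

        // With merged exchanges, data2 is scheduled first and data1 arrives later.
        scheduler.schedule(data2);
        scheduler.schedule(data1); // skipped by the guard

        System.out.println("scheduled target version: " + scheduler.scheduled().targetTopVer);
    }
}
```

Without the check in schedule(), the late call for data1 would overwrite data2, which is the failure mode described in step 3. With the check, the pending update keeps targeting version X+1.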



--
This message was sent by Atlassian Jira
(v8.3.4#803005)