Posted to user@ignite.apache.org by Вячеслав Коптилин <sl...@gmail.com> on 2021/02/04 16:10:33 UTC

Re: data rebalancing and partition map exchange with persistence

Hello Allan,

> Does data rebalancing occur when a node leaves or joins, or only when you
> manually change the baseline topology (assuming automatic baseline
> adjustment is disabled)? Again, this is on a cluster with persistence
> enabled.
Yes, this can happen when a node joins the cluster, for instance.
Consider the following scenario: you shut down a node that is part of the
current baseline topology, so that node can no longer apply updates. After
a while, the node is restarted and returns to the cluster. In this case,
rebalancing can be triggered in order to transfer the updates it missed.
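
If you want to observe this yourself, here is a minimal sketch (assuming a
running Ignite instance in a variable named "ignite", and that the rebalance
event types are enabled via IgniteConfiguration#setIncludeEventTypes) that
pins the baseline and logs rebalance start/stop events on the local node:

    import org.apache.ignite.events.Event;
    import org.apache.ignite.events.EventType;
    import org.apache.ignite.lang.IgnitePredicate;

    // Keep the baseline fixed so it only changes when you update it
    // explicitly (available since Ignite 2.8).
    ignite.cluster().baselineAutoAdjustEnabled(false);

    // Log rebalance start/stop events fired on this node. Note that
    // EVT_CACHE_REBALANCE_* must be listed in
    // IgniteConfiguration#setIncludeEventTypes, or they are never fired.
    IgnitePredicate<Event> lsnr = evt -> {
        System.out.println("Rebalance event: " + evt.name());
        return true; // keep the listener registered
    };

    ignite.events().localListen(lsnr,
        EventType.EVT_CACHE_REBALANCE_STARTED,
        EventType.EVT_CACHE_REBALANCE_STOPPED);

When a baseline node rejoins after downtime, you should see these events
fire for each cache that has partitions to transfer.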

> 2. Sometimes I look at the partition counts of a cache across all the
> nodes using
> Arrays.stream(ignite.affinity(cacheName).primaryPartitions(serverNode)).count()
> and I see 0 partitions.
> After a while it returns to a balanced state. What's going on here?
Well, when a partition needs to be rebalanced from one node (the supplier)
to another (the demander), we create the partition on the demander in the
MOVING state (meaning a backup that applies updates but cannot be used for
reads).
When the partition is fully rebalanced, it is switched to the OWNING state,
and the next PME (Late Affinity Assignment) may mark it as a primary.
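
By the way, the snippet in your question is missing a closing parenthesis;
a complete version (a sketch, assuming "ignite" and "cacheName" are in
scope) that prints primary and backup partition counts per server node
looks like this:

    import org.apache.ignite.cache.affinity.Affinity;
    import org.apache.ignite.cluster.ClusterNode;

    Affinity<Object> aff = ignite.affinity(cacheName);

    for (ClusterNode node : ignite.cluster().forServers().nodes()) {
        // primaryPartitions()/backupPartitions() return arrays of
        // partition ids assigned to the given node.
        int primaries = aff.primaryPartitions(node).length;
        int backups = aff.backupPartitions(node).length;

        System.out.println(node.consistentId()
            + ": primaries=" + primaries + ", backups=" + backups);
    }

Keep in mind this reflects the current affinity assignment only: a node
that is still receiving MOVING partitions will show 0 primaries until Late
Affinity Assignment switches them over.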

> 3. Is there a way to manually invoke the partition map exchange process?
I don't think so.

> 4. Sometimes I see 'partition lost' errors. If I am using persistence and
> all the baseline nodes are online and connected, is it safe to assume no
> data has been lost and just call ignite.resetLostPartitions(myCaches)?
If I am not mistaken, the answer is yes. Please take a look at
https://ignite.apache.org/docs/latest/configuring-caches/partition-loss-policy#recovering-from-a-partition-loss
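
For example (a sketch; "myCache" is a placeholder cache name), you can
inspect the lost partitions first and then reset them once you have
verified that all baseline nodes are back online:

    import java.util.Collections;

    // Partitions currently marked as LOST for this cache (empty if none).
    System.out.println("Lost: " + ignite.cache("myCache").lostPartitions());

    // With persistence and the full baseline online, the data is still on
    // disk; resetting only clears the LOST state so operations can resume.
    ignite.resetLostPartitions(Collections.singleton("myCache"));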

Thanks,
S.

Fri, Jan 29, 2021 at 15:56, Alan Ward <ar...@gmail.com>:

> I'm using Ignite 2.9.1, a 5 node cluster with persistence enabled,
> partitioned caches with 1 backup.
>
> I'm a bit confused about the difference between data rebalancing and
> partition map exchange in this context.
>
> 1. Does data rebalancing occur when a node leaves or joins, or only when
> you manually change the baseline topology (assuming automatic baseline
> adjustment is disabled)? Again, this is on a cluster with persistence
> enabled.
>
> 2. Sometimes I look at the partition counts of a cache across all the
> nodes using
> Arrays.stream(ignite.affinity(cacheName).primaryPartitions(serverNode)).count()
> and I see 0 partitions on one or even two nodes for some of the caches. After a
> while it returns to a balanced state. What's going on here? Is this data
> rebalancing at work, or is this the result of the partition map exchange
> process determining that one node is/was down and thus switching to use the
> backup partitions?
>
> 3. Is there a way to manually invoke the partition map exchange process? I
> figured it would happen on cluster restart, but even after restarting the
> cluster and seeing all baseline nodes connect I still observe the partition
> imbalance. It often takes hours for this to resolve.
>
> 4. Sometimes I see 'partition lost' errors. If I am using persistence and
> all the baseline nodes are online and connected, is it safe to assume no
> data has been lost and just call ignite.resetLostPartitions(myCaches)? Is
> there any way calling that method could lead to data loss with persistence
> enabled?
>
> Thanks for your help!
>

Re: data rebalancing and partition map exchange with persistence

Posted by Вячеслав Коптилин <sl...@gmail.com>.
Hi Alan,

I am sorry for the typo in your name, it was done unintentionally.

Thanks,
S.
