Posted to user@ignite.apache.org by maxi628 <ma...@gmail.com> on 2021/01/13 21:15:53 UTC

Ignite rebalancing when a server is rebooted w/ persistence enabled.

Hello everyone.

I have several Ignite clusters with version 2.7.6 and persistence enabled.
I have 3 caches on every cluster, with ~10M records each.

Sometimes when I reboot a node, it takes a long time to boot; it can be
hours.

By rebooting I mean stopping the container that's running Ignite and
starting it again, without ever changing the baseline topology; restarting
the container itself takes about 2 minutes.
The node joins the topology just fine but takes a long time to start
serving traffic.
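
For reference, one way to confirm that the restarted node is still part of
the baseline topology is the IgniteCluster API. A minimal sketch (the
client config file name is made up):

    import java.util.Collection;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cluster.BaselineNode;

    public class BaselineCheck {
        public static void main(String[] args) {
            // Start a node against the cluster (config name is illustrative).
            try (Ignite ignite = Ignition.start("client-config.xml")) {
                // The baseline should be unchanged after the container restart.
                Collection<BaselineNode> baseline =
                    ignite.cluster().currentBaselineTopology();

                if (baseline != null)
                    for (BaselineNode n : baseline)
                        System.out.println("Baseline node: " + n.consistentId());
            }
        }
    }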

Checking the logs, I've found several lines like these ones here:



So for some reason, after booting, it starts a process called
PartitionsEvictManager, which can take a lot of time.
What is the intended functionality behind PartitionsEvictManager?
Is it something that we should expect?

This is a problem because a rolling restart of all nodes in a cluster can
take up to a day.

Thanks.





Re: Ignite rebalancing when a server is rebooted w/ persistence enabled.

Posted by maxi628 <ma...@gmail.com>.
The cluster was almost idle.
It didn't receive lots of updates while that node was down.

Is there any way to confirm which of the two options you mentioned was
actually used?
And is there any way to configure a threshold to choose between those two
options?





Re: Ignite rebalancing when a server is rebooted w/ persistence enabled.

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

While the node was down, the partitions that it previously owned had their
data updated.

At this point we only have two options:
- Throw out the existing partitions and rebalance them from scratch. AFAIK
this involves the WAL, so it will take some time. I have heard that if you
wipe the node's persistence, then it won't use the WAL during rebalancing,
which should help a lot. However, I'm not confident here.
- Use historical rebalance, where the node will try to use other nodes'
WALs to get its partitions up to speed. This should be pretty fast, at
least if the rate of change in the cluster is low. However, as far as I
know, it will only be used under specific circumstances; maybe you didn't
get lucky here (a sketch of the relevant knob follows below).
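
For completeness: in 2.7.x the choice is, as far as I know, influenced by
the IGNITE_PDS_WAL_REBALANCE_THRESHOLD system property (roughly, how large
a partition must be, in entries, before historical/WAL rebalance is
considered; I'm not 100% sure of the exact semantics in your version). A
minimal sketch:

    public class WalRebalanceThreshold {
        public static void main(String[] args) {
            // Must be set before the node starts; equivalent to passing
            // -DIGNITE_PDS_WAL_REBALANCE_THRESHOLD=1000 to the JVM.
            System.setProperty("IGNITE_PDS_WAL_REBALANCE_THRESHOLD", "1000");

            // ... then Ignition.start(cfg) as usual ...
        }
    }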

Regards,
-- 
Ilya Kasnacheev


Thu, 14 Jan 2021 at 20:02, maxi628 <ma...@gmail.com>:

> Sorry, I'm attaching the log here: ignite_eviction.log
> <http://apache-ignite-users.70518.x6.nabble.com/file/t3058/ignite_eviction.log>
>
>
> I've read https://issues.apache.org/jira/browse/IGNITE-11974, and the
> thing is, this isn't an infinite loop.
> The remainingPartsToEvict=$something counter goes down until it reaches 0,
> and that's when we consider the node completely up.
>
> My question is: is it expected for a node to try to rebalance if it only
> went down for 2 minutes while being part of a baseline topology with
> persistence enabled?
> All caches are partitioned with 2 backups, and only 1 node is being
> restarted at a time.
> So shouldn't the other nodes holding backups of this node's primary
> partitions cover for it until it boots up again?
>
>
>

RE: Ignite rebalancing when a server is rebooted w/ persistence enabled.

Posted by maxi628 <ma...@gmail.com>.
Sorry, I'm attaching the log here: ignite_eviction.log
<http://apache-ignite-users.70518.x6.nabble.com/file/t3058/ignite_eviction.log>

I've read https://issues.apache.org/jira/browse/IGNITE-11974, and the
thing is, this isn't an infinite loop.
The remainingPartsToEvict=$something counter goes down until it reaches 0,
and that's when we consider the node completely up.

My question is: is it expected for a node to try to rebalance if it only
went down for 2 minutes while being part of a baseline topology with
persistence enabled?
All caches are partitioned with 2 backups, and only 1 node is being
restarted at a time.
So shouldn't the other nodes holding backups of this node's primary
partitions cover for it until it boots up again?
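
For concreteness, the setup described above corresponds roughly to the
following configuration. A sketch only (the cache name is made up and the
real config may differ):

    import org.apache.ignite.cache.CacheMode;
    import org.apache.ignite.configuration.CacheConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class ClusterConfigSketch {
        public static IgniteConfiguration config() {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Native persistence on the default data region.
            DataStorageConfiguration storage = new DataStorageConfiguration();
            storage.getDefaultDataRegionConfiguration()
                .setPersistenceEnabled(true);
            cfg.setDataStorageConfiguration(storage);

            // Partitioned cache with 2 backups, as described above.
            CacheConfiguration<Long, Object> cache =
                new CacheConfiguration<>("myCache");
            cache.setCacheMode(CacheMode.PARTITIONED);
            cache.setBackups(2);
            cfg.setCacheConfiguration(cache);

            return cfg;
        }
    }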




RE: Ignite rebalancing when a server is rebooted w/ persistence enabled.

Posted by Alexandr Shapkin <le...@gmail.com>.
Hi,

Looks like the error message is truncated. Could you please re-send it or
attach the full log file?

PartitionsEvictManager is part of the rebalancing routine; it clears local
data before demanding it from other nodes.

Also, I see the following JIRA:
https://issues.apache.org/jira/browse/IGNITE-11974
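
If the slow part turns out to be the rebalance itself rather than the
eviction, the public rebalance knobs on CacheConfiguration can also be
tuned. A sketch (values are illustrative, not recommendations; "myCache"
is a made-up name):

    import org.apache.ignite.cache.CacheRebalanceMode;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class RebalanceTuning {
        public static CacheConfiguration<Long, Object> tuned() {
            CacheConfiguration<Long, Object> cache =
                new CacheConfiguration<>("myCache");

            cache.setRebalanceMode(CacheRebalanceMode.ASYNC); // the default
            cache.setRebalanceBatchSize(512 * 1024); // bytes per supply message
            cache.setRebalanceThrottle(0); // ms between batches; 0 = no throttling

            return cache;
        }
    }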





From: maxi628 <maximiliano.628@gmail.com>
Sent: Thursday, January 14, 2021 12:16 AM
To: user@ignite.apache.org
Subject: Ignite rebalancing when a server is rebooted w/ persistence
enabled.


