Posted to user@ignite.apache.org by "Lo, Marcus " <ma...@citi.com> on 2021/05/20 09:50:07 UTC

Multiple ignite nodes crashed at the same time due to "Maximum number of retries 100000 reached for Put operation" error

Hi,

We have a 4-node Ignite cluster. After running the cluster for one day, we encountered the following error at almost the same time on nodes #2, #3, and #4:

Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteCheckedException: Maximum number of retries 1000 reached for Put operation (the tree may be corrupted). Increase IGNITE_BPLUS_TREE_LOCK_RETRIES system property if you regularly see this message (current value is 1000).]]
org.apache.ignite.IgniteCheckedException: Maximum number of retries 1000 reached for Put operation (the tree may be corrupted). Increase IGNITE_BPLUS_TREE_LOCK_RETRIES system property if you regularly see this message (current value is 1000).
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Get.checkLockRetry(BPlusTree.java:3109) [ignite-core-2.10.0.jar:2.10.0]
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.checkLockRetry(BPlusTree.java:3906) [ignite-core-2.10.0.jar:2.10.0]
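The error text itself says the tree "may be corrupted", and raising the retry limit can only help when the retries come from contention that eventually clears. The toy Java model below sketches why. It is a hypothetical simplification, not Ignite's BPlusTree code, and every name in it (RetryLimitModel, runWithRetries, RetryLimitExceeded) is invented for illustration. A bounded retry loop succeeds quickly when contention is transient. When every attempt sees the same inconsistent state, any finite limit, whether 1000 or 100000, is eventually exhausted.

```java
import java.util.function.Supplier;

/**
 * Hypothetical toy model of a bounded lock-retry loop. It is NOT Ignite's
 * implementation; it only mimics the shape of the
 * "Maximum number of retries N reached" failure.
 */
public class RetryLimitModel {

    /** Outcome of one attempt: done, or the page state forced a retry. */
    enum Result { FOUND, RETRY }

    /** Stands in for the IgniteCheckedException seen in the log. */
    static class RetryLimitExceeded extends RuntimeException {
        RetryLimitExceeded(int limit) {
            super("Maximum number of retries " + limit + " reached for Put operation");
        }
    }

    /** Repeats the step until it succeeds or the retry budget runs out; returns attempts used. */
    static int runWithRetries(Supplier<Result> step, int maxRetries) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            if (step.get() == Result.FOUND)
                return attempt;
        }
        throw new RetryLimitExceeded(maxRetries);
    }

    public static void main(String[] args) {
        // Transient contention: the step succeeds on the 3rd attempt, so the default limit is plenty.
        int[] calls = {0};
        Supplier<Result> contended = () -> ++calls[0] < 3 ? Result.RETRY : Result.FOUND;
        System.out.println("contended: succeeded after " + runWithRetries(contended, 1000) + " attempts");

        // Persistent inconsistency (e.g. a corrupted page): the step never succeeds,
        // so every finite limit is reached; a larger limit only delays the failure.
        Supplier<Result> corrupted = () -> Result.RETRY;
        for (int limit : new int[] {1000, 100000}) {
            try {
                runWithRetries(corrupted, limit);
            } catch (RetryLimitExceeded e) {
                System.out.println("corrupted: " + e.getMessage());
            }
        }
    }
}
```

Running it prints one success line for the contended case and two limit-exceeded lines (1000 and 100000) for the corrupted case. This mirrors what the thread reports: after the property was raised, the nodes still failed immediately.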

I tried increasing IGNITE_BPLUS_TREE_LOCK_RETRIES to 100,000 and restarting the nodes, but it didn't help: the nodes hit the same error straight away.
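For reference, IGNITE_BPLUS_TREE_LOCK_RETRIES is an ordinary JVM system property (its default is 1000, as the log shows). It is normally passed on the node's command line as -DIGNITE_BPLUS_TREE_LOCK_RETRIES=100000. Below is a minimal sketch for an embedded node. It assumes the value must be in place before the node starts in that JVM, since, as far as we understand, Ignite reads it once during initialization rather than on every operation. The class and method names (LockRetriesConfig, ensureLockRetries) are hypothetical.

```java
/**
 * Hypothetical helper: make sure IGNITE_BPLUS_TREE_LOCK_RETRIES is set before
 * an embedded Ignite node is started in this JVM. A value already supplied
 * via -D on the command line takes precedence.
 */
public class LockRetriesConfig {

    static final String PROP = "IGNITE_BPLUS_TREE_LOCK_RETRIES";

    /** Sets the property only if it is absent; returns the effective value (1000 if unparsable). */
    static int ensureLockRetries(int desired) {
        if (System.getProperty(PROP) == null)
            System.setProperty(PROP, Integer.toString(desired));
        return Integer.getInteger(PROP, 1000);
    }

    public static void main(String[] args) {
        int effective = ensureLockRetries(100_000);
        System.out.println(PROP + "=" + effective);
        // Start the node only after this point, e.g. with Ignition.start(...)
        // (requires ignite-core on the classpath, omitted here to keep the sketch self-contained).
    }
}
```

This only raises the ceiling for transient lock contention. It does not repair a tree that is actually corrupted, which matches the experience reported in this message.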

Can you please shed some light on how to resolve the issue? Thanks.

I have also attached the logs for your reference:
ignite-node-[1,2,3,4].log: the full log files for all nodes
ignite-restart.log: the log for node 2 when it crashed

Regards,
Marcus


RE: Multiple ignite nodes crashed at the same time due to "Maximum number of retries 100000 reached for Put operation" error

Posted by "Lo, Marcus " <ma...@citi.com>.
Hi Ilya,

Unfortunately, I had to rebuild the database and did not keep the persistence files. But you are right: the failed nodes fail every time on restart.

I will see if I can reproduce the issue. In the meantime, do you have any suggestions on what I should check?

Regards,
Marcus

From: Ilya Kasnacheev <il...@gmail.com>
Sent: Thursday, May 20, 2021 11:11 PM
To: user@ignite.apache.org
Subject: Re: Multiple ignite nodes crashed at the same time due to "Maximum number of retries 100000 reached for Put operation" error

Hello!

This looks like PDS (persistent data store) corruption to me. Can you by chance share the persistence files from the problematic node? I am assuming that it fails every time on restart?

Regards,
--
Ilya Kasnacheev




Re: Multiple ignite nodes crashed at the same time due to "Maximum number of retries 100000 reached for Put operation" error

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

This looks like PDS (persistent data store) corruption to me. Can you by
chance share the persistence files from the problematic node? I am assuming
that it fails every time on restart?
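The persistence files are what is needed to analyze suspected PDS corruption, so it helps to copy them aside before a failed node's data is wiped or rebuilt. Below is a minimal sketch using only the JDK. It assumes the node is stopped and that its persistence lives under the default work/db directory inside the Ignite work directory. That location is an assumption: custom storage, WAL, or WAL-archive paths in the data storage configuration put files elsewhere, and those directories would need archiving too. ArchivePersistence and zipDirectory are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/**
 * Hypothetical helper: pack a stopped node's persistence directory into one
 * zip file so it can be kept or shared. The default "work/db" path is an
 * assumption; pass the real location as the first argument.
 */
public class ArchivePersistence {

    /** Zips every regular file under srcDir into zipFile; returns the number of files added. */
    static int zipDirectory(Path srcDir, Path zipFile) throws IOException {
        List<Path> files;
        // Collect the file list first so a zip written inside srcDir is not archived into itself.
        try (Stream<Path> walk = Files.walk(srcDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                // Store paths relative to srcDir, with '/' separators as the zip format expects.
                String name = srcDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        // Run this only while the node is stopped, so the files do not change underneath.
        Path src = Paths.get(args.length > 0 ? args[0] : "work/db");
        Path dst = Paths.get(args.length > 1 ? args[1] : "ignite-pds-backup.zip");
        int n = zipDirectory(src, dst);
        System.out.println("Archived " + n + " files from " + src + " to " + dst);
    }
}
```

Archiving the whole directory tree, rather than selected cache folders, keeps whatever metadata sits next to the partition files. That metadata is likely to matter when someone else tries to open the data.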

Regards,
-- 
Ilya Kasnacheev

