Posted to user@ignite.apache.org by "Abhishek Gupta (BLOOMBERG/ 919 3RD A)" <ag...@bloomberg.net> on 2020/01/28 22:58:08 UTC

"Adding entry to partition that is concurrently evicted" error

Hello!
     I've got a 6 node Ignite 2.7.5 grid. I had this strange issue where multiple nodes hit the following exception - 

[ERROR] [sys-stripe-53-#54] GridCacheIoManager - Failed to process message [senderId=f4a736b6-cfff-4548-a8b4-358d54d19ac6, messageType=class o.a.i.i.processors.cache.distributed.near.GridNearGetRequest]
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtInvalidPartitionException: Adding entry to partition that is concurrently evicted [grp=mainCache, part=733, shouldBeMoving=, belongs=false, topVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1], curTopVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1]]

and then died after 
2020-01-27 13:30:19.849 [ERROR] [ttl-cleanup-worker-#159]  - JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.i.processors.cache.distributed.dht.topology.GridDhtInvalidPartitionException [part=1013, msg=Adding entry to partition that is concurrently evicted [grp=mainCache, part=1013, shouldBeMoving=, belongs=false, topVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1], curTopVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1]]]]]

The sequence of events was simply the following - 
One of the nodes (let's call it node 1) was down for 2.5 hours and then restarted. After a configured delay of 20 mins, it started to rebalance from the other 5 nodes. No other nodes joined or left in this period. 40 minutes into the rebalance, the above errors started showing up on the other nodes and they just bounced, and therefore there was data loss.
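For context, the "configured delay of 20 mins" above is presumably set via CacheConfiguration.setRebalanceDelay(). A rough sketch of that kind of configuration (class and method names here are just illustrative; only the cache name "mainCache" and the 20-minute value come from this thread):

```java
import org.apache.ignite.configuration.CacheConfiguration;

public class RebalanceDelayConfig {
    /** Builds a cache configuration with a delayed rebalance, as described above. */
    public static CacheConfiguration<Object, Object> mainCacheConfig() {
        CacheConfiguration<Object, Object> ccfg = new CacheConfiguration<>("mainCache");
        // Postpone rebalancing for 20 minutes after a topology change (value in ms).
        ccfg.setRebalanceDelay(20L * 60 * 1000);
        return ccfg;
    }
}
```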

I found a few links related to this, but nothing that explained the root cause or what my workaround could be - 

* http://apache-ignite-users.70518.x6.nabble.com/Adding-entry-to-partition-that-is-concurrently-evicted-td24782.html#a24786
* https://issues.apache.org/jira/browse/IGNITE-9803
* https://issues.apache.org/jira/browse/IGNITE-11620


Thanks,
Abhishek


Re: "Adding entry to partition that is concurrently evicted" error

Posted by Andrei Aleksandrov <ae...@gmail.com>.
Hi,

This problem should be fixed in Ignite 2.8. I am not sure why the 
fix isn't part of Ignite 2.7.6.

https://issues.apache.org/jira/browse/IGNITE-11127

Your nodes were halted because the failure handler kicked in:

https://apacheignite.readme.io/docs/critical-failures-handling#section-failure-handling

I am not sure about possible workarounds here (you could probably set 
the NoOpFailureHandler). You can also try starting a thread on the 
developer mailing list:

http://apache-ignite-developers.2346864.n4.nabble.com/Apache-Ignite-2-7-release-td34076i40.html
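If you want to try the NoOpFailureHandler workaround, the configuration would look roughly like this (a sketch against the Ignite 2.x API; note that NoOpFailureHandler only suppresses the JVM halt on critical failures, it does not fix the underlying partition eviction race, so failed system workers may leave a node in a degraded state):

```java
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.NoOpFailureHandler;

public class NoOpFailureHandlerConfig {
    /** Builds a node configuration that logs critical failures instead of halting. */
    public static IgniteConfiguration configure() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Replace the default handler (which stops/halts the node) with a no-op.
        cfg.setFailureHandler(new NoOpFailureHandler());
        return cfg;
    }
}
```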

BR,
Andrei
