Posted to issues@ignite.apache.org by "Igor (Jira)" <ji...@apache.org> on 2024/03/21 15:09:00 UTC
[jira] (IGNITE-21639) Server after kill does not start and stuck on election
[ https://issues.apache.org/jira/browse/IGNITE-21639 ]
Igor deleted comment on IGNITE-21639:
-------------------------------
was (Author: JIRAUSER299771):
Run with logs: https://ggtc.gridgain.com/buildConfiguration/Qa_PocTesterAwsBuildTypeAI3/10704411?hideTestsFromDependencies=false&hideProblemsFromDependencies=false&expandBuildDeploymentsSection=false&expandBuildChangesSection=true
> Server after kill does not start and stuck on election
> -------------------------------------------------------
>
> Key: IGNITE-21639
> URL: https://issues.apache.org/jira/browse/IGNITE-21639
> Project: Ignite
> Issue Type: Improvement
> Components: general, networking, platforms
> Affects Versions: 3.0.0-beta1
> Reporter: Igor
> Priority: Major
> Labels: ignite-3
> Attachments: poc-tester-SERVER-192.168.1.117-id-0-2024-02-29-22-56-11-client.log.0
>
>
> *Steps to reproduce:*
> # Start a 3-node cluster with each node on a separate machine (not in Docker).
> # Insert about 500 000 rows across 500 tables. The replication factor is 3.
> # Kill one node.
> # Start the killed node.
> *Expected:*
> The node starts, joins the cluster, and works normally.
> *Actual:*
> The node gets stuck on startup, repeating messages like these:
> {code:java}
> 2024-02-29 23:06:21:261 +0300 [INFO][%poc-tester-SERVER-192.168.1.117-id-0%JRaft-ElectionTimer-18][NodeImpl] Unsuccessful election round number 128
> 2024-02-29 23:06:21:261 +0300 [INFO][%poc-tester-SERVER-192.168.1.117-id-0%JRaft-ElectionTimer-18][NodeImpl] Node <154_part_24/poc-tester-SERVER-192.168.1.117-id-0> term 3 start preVote.
> 2024-02-29 23:06:21:282 +0300 [ERROR][%poc-tester-SERVER-192.168.1.117-id-0%JRaft-FSMCaller-Disruptor_stripe_5-0][StripedDisruptor] Handle disruptor event error [name=%poc-tester-SERVER-192.168.1.117-id-0%JRaft-FSMCaller-Disruptor-, event=org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTask@efb699b, hasHandler=false]
> java.lang.AssertionError: Safe time reordering detected [current=112016525904248838, proposed=112016523364991002]
> at org.apache.ignite.internal.table.distributed.raft.PartitionListener.lambda$onWrite$1(PartitionListener.java:169)
> at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
> at org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWrite(PartitionListener.java:159)
> at org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:674)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:557)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:525)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:444)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:136)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:130)
> at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:266)
> at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:231)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137)
> at java.base/java.lang.Thread.run(Thread.java:829){code}
>
> [^poc-tester-SERVER-192.168.1.117-id-0-2024-02-29-22-56-11-client.log.0]
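
For anyone reading the AssertionError above, here is a minimal sketch that compares the two safe time values from the log. SafeTimeCheck and its methods are hypothetical names, not Ignite code, and the exact condition asserted in PartitionListener may differ from the plain "proposed is earlier than current" check used here. The decoding step additionally assumes that the low 16 bits of each value are a logical counter and the remaining bits are physical time in epoch milliseconds; the report itself does not state this layout.

```java
import java.time.Instant;

/** Hypothetical helper, not part of Ignite: inspects the two values from the AssertionError. */
final class SafeTimeCheck {
    /** ASSUMPTION: low 16 bits are a logical counter, the rest is physical time in epoch millis. */
    static final int LOGICAL_BITS = 16;

    /** True when the proposed safe time is earlier than the current one. */
    public static boolean isReordered(long current, long proposed) {
        return proposed < current;
    }

    /** Physical part of the timestamp under the assumed layout. */
    public static long physicalMillis(long ts) {
        return ts >>> LOGICAL_BITS;
    }

    /** Logical counter part of the timestamp under the assumed layout. */
    public static int logical(long ts) {
        return (int) (ts & ((1L << LOGICAL_BITS) - 1));
    }

    public static void main(String[] args) {
        // Values copied from the "Safe time reordering detected" message.
        long current = 112016525904248838L;
        long proposed = 112016523364991002L;

        System.out.println("reordered = " + isReordered(current, proposed));
        System.out.println("current   = " + Instant.ofEpochMilli(physicalMillis(current))
                + " / logical " + logical(current));
        System.out.println("proposed  = " + Instant.ofEpochMilli(physicalMillis(proposed))
                + " / logical " + logical(proposed));
        System.out.println("lag (ms)  = " + (physicalMillis(current) - physicalMillis(proposed)));
    }
}
```

If the assumed layout holds, both values decode to instants shortly before the restart on 2024-02-29, and the proposed safe time lags the current one by 38746 ms. That would suggest the restarted node is applying a command stamped earlier than the safe time it already holds, but this is an interpretation of the log, not a confirmed root cause.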
--
This message was sent by Atlassian Jira
(v8.20.10#820010)