Posted to issues@ignite.apache.org by "Igor (Jira)" <ji...@apache.org> on 2024/03/21 15:09:00 UTC

[jira] (IGNITE-21639) Server after kill does not start and stuck on election

    [ https://issues.apache.org/jira/browse/IGNITE-21639 ]


    Igor deleted comment on IGNITE-21639:
    -------------------------------

was (Author: JIRAUSER299771):
The run with logs https://ggtc.gridgain.com/buildConfiguration/Qa_PocTesterAwsBuildTypeAI3/10704411?hideTestsFromDependencies=false&hideProblemsFromDependencies=false&expandBuildDeploymentsSection=false&expandBuildChangesSection=true

> Server after kill does not start and stuck on election 
> -------------------------------------------------------
>
>                 Key: IGNITE-21639
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21639
>             Project: Ignite
>          Issue Type: Improvement
>          Components: general, networking, platforms
>    Affects Versions: 3.0.0-beta1
>            Reporter: Igor
>            Priority: Major
>              Labels: ignite-3
>         Attachments: poc-tester-SERVER-192.168.1.117-id-0-2024-02-29-22-56-11-client.log.0
>
>
> *Steps to reproduce:*
>  # Start a 3-node cluster with each node on a separate machine (not in Docker).
>  # Insert about 500 000 rows across 500 tables. Replication is 3.
>  # Kill one node.
>  # Start the killed node again.
> *Expected:*
> The node starts, joins the cluster, and works normally.
> *Actual:*
> The node gets stuck during startup and keeps repeating messages like these:
> {code:java}
> 2024-02-29 23:06:21:261 +0300 [INFO][%poc-tester-SERVER-192.168.1.117-id-0%JRaft-ElectionTimer-18][NodeImpl] Unsuccessful election round number 128
> 2024-02-29 23:06:21:261 +0300 [INFO][%poc-tester-SERVER-192.168.1.117-id-0%JRaft-ElectionTimer-18][NodeImpl] Node <154_part_24/poc-tester-SERVER-192.168.1.117-id-0> term 3 start preVote. 
> 2024-02-29 23:06:21:282 +0300 [ERROR][%poc-tester-SERVER-192.168.1.117-id-0%JRaft-FSMCaller-Disruptor_stripe_5-0][StripedDisruptor] Handle disruptor event error [name=%poc-tester-SERVER-192.168.1.117-id-0%JRaft-FSMCaller-Disruptor-, event=org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTask@efb699b, hasHandler=false]
> java.lang.AssertionError: Safe time reordering detected [current=112016525904248838, proposed=112016523364991002]
>     at org.apache.ignite.internal.table.distributed.raft.PartitionListener.lambda$onWrite$1(PartitionListener.java:169)
>     at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
>     at org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWrite(PartitionListener.java:159)
>     at org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:674)
>     at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:557)
>     at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:525)
>     at org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:444)
>     at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:136)
>     at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:130)
>     at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:266)
>     at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:231)
>     at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137)
>     at java.base/java.lang.Thread.run(Thread.java:829){code}
>  
> [^poc-tester-SERVER-192.168.1.117-id-0-2024-02-29-22-56-11-client.log.0]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)