You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Denis Chudov (Jira)" <ji...@apache.org> on 2024/01/18 16:10:00 UTC

[jira] [Created] (IGNITE-21307) Drop the node in case of failure in watch listener

Denis Chudov created IGNITE-21307:
-------------------------------------

             Summary: Drop the node in case of failure in watch listener
                 Key: IGNITE-21307
                 URL: https://issues.apache.org/jira/browse/IGNITE-21307
             Project: Ignite
          Issue Type: Improvement
            Reporter: Denis Chudov


For the linearized watch processing, we have WatchProcessor#notificationFuture that is rewritten for each revision processing and meta storage safe time advance. If some watch processor completes exceptionally, this means that no further updates will be processed, because they need the previous updates to be processed successfully. This is implemented in futures chaining like this:

 
{code:java}
notificationFuture = notificationFuture
        .thenRunAsync(() -> revisionCallback.onSafeTimeAdvanced(time), watchExecutor)
        .whenComplete((ignored, e) -> {
            if (e != null) {
                LOG.error("Error occurred when notifying safe time advanced callback", e);
            }
        }); {code}
For now, we dont have any failure handing of exceptionally completed notification future. It leads to the endless log records with the same exception's stack trace, caused by meta storage safe time advances:

 
{code:java}
[2024-01-16T21:42:35,515][ERROR][%isot_n_0%JRaft-FSMCaller-Disruptor-metastorage-_stripe_0-0][WatchProcessor] Error occurred when notifying safe time advanced callback
java.util.concurrent.CompletionException: org.apache.ignite.internal.lang.IgniteInternalException: IGN-CMN-65535 TraceId:3877e098-6a1b-4f30-88a8-a4c13411d573 Peers are not ready [groupId=5_part_0]
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1081) ~[?:?]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.apache.ignite.internal.lang.IgniteInternalException: Peers are not ready [groupId=5_part_0]
    at org.apache.ignite.internal.raft.RaftGroupServiceImpl.randomNode(RaftGroupServiceImpl.java:725) ~[ignite-raft-9.0.127-SNAPSHOT.jar:?]
    at org.apache.ignite.internal.raft.RaftGroupServiceImpl.randomNode(RaftGroupServiceImpl.java:709) ~[ignite-raft-9.0.127-SNAPSHOT.jar:?]
    at org.apache.ignite.internal.raft.RaftGroupServiceImpl.refreshLeader(RaftGroupServiceImpl.java:234) ~[ignite-raft-9.0.127-SNAPSHOT.jar:?]
    at org.apache.ignite.internal.raft.RaftGroupServiceImpl.start(RaftGroupServiceImpl.java:190) ~[ignite-raft-9.0.127-SNAPSHOT.jar:?]
    at org.apache.ignite.internal.raft.client.TopologyAwareRaftGroupService.start(TopologyAwareRaftGroupService.java:187) ~[ignite-replicator-9.0.127-SNAPSHOT.jar:?]
    at org.apache.ignite.internal.raft.client.TopologyAwareRaftGroupServiceFactory.startRaftGroupService(TopologyAwareRaftGroupServiceFactory.java:73) ~[ignite-replicator-9.0.127-SNAPSHOT.jar:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupService(Loza.java:350) ~[ignite-raft-9.0.127-SNAPSHOT.jar:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$startPartitionAndStartClient$27(TableManager.java:917) ~[ignite-table-9.0.127-SNAPSHOT.jar:?]
    at org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:827) ~[ignite-core-9.0.127-SNAPSHOT.jar:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$startPartitionAndStartClient$28(TableManager.java:913) ~[ignite-table-9.0.127-SNAPSHOT.jar:?]
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) ~[?:?]
    ... 4 more {code}
So, the node can't operate properly and just produces tons of logs. Such nodes should be halted.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)