Posted to issues@ignite.apache.org by "Konstantin Orlov (Jira)" <ji...@apache.org> on 2023/03/22 15:56:00 UTC

[jira] [Commented] (IGNITE-19095) Cyclic retry of ActionRequest in RaftGroupServiceImpl

    [ https://issues.apache.org/jira/browse/IGNITE-19095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703723#comment-17703723 ] 

Konstantin Orlov commented on IGNITE-19095:
-------------------------------------------

Perhaps I'm wrong about the root cause, but the problem is real. Please see the attached log [^log_pollution.txt]. It is polluted with the exception {{[RaftGroupServiceImpl] Recoverable error during the request}}.

> Cyclic retry of ActionRequest in RaftGroupServiceImpl
> -----------------------------------------------------
>
>                 Key: IGNITE-19095
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19095
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Konstantin Orlov
>            Priority: Critical
>              Labels: ignite-3
>         Attachments: log_pollution.txt
>
>
> Please take a look at the following snippet:
> {code:java}
> private void handleThrowable(
>         ...
> ) {
>     if (recoverable(err)) {
>         ...
>         scheduleRetry(() -> sendWithRetry(randomNode(peer), requestFactory, stopTime, fut));
>     } else {
>         fut.completeExceptionally(err);
>     }
> }
> {code}
> In case of a recoverable error, the request is sent again. But if 2 out of 3 nodes have already been stopped, this retry logic gets stuck in an infinite loop. The reason is that ConnectException is considered recoverable, and when choosing another node we take into account only the node that failed during the current iteration.
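
One possible way to bound the retry described above is to remember every peer that has failed for the current request, not just the most recent one. The following is a minimal sketch under that assumption, not the actual RaftGroupServiceImpl code: the class RetryPeerSelection, the methods nextPeer and sendWithBoundedRetry, and the use of plain strings as peer identifiers are all hypothetical. It also picks peers in list order rather than randomly, purely to keep the example deterministic.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch: retry that excludes every peer that already failed. */
final class RetryPeerSelection {
    private RetryPeerSelection() {
    }

    /** Returns the first peer that has not failed yet, or empty if all have failed. */
    static Optional<String> nextPeer(List<String> peers, Set<String> alreadyFailed) {
        for (String peer : peers) {
            if (!alreadyFailed.contains(peer)) {
                return Optional.of(peer);
            }
        }
        return Optional.empty();
    }

    /**
     * Tries peers one by one until a send succeeds or every peer has failed.
     *
     * @param peers   all known peers of the group (assumed identifiers)
     * @param trySend stand-in for the real send; returns false on a recoverable error
     * @return the peer that accepted the request, or empty if none did
     */
    static Optional<String> sendWithBoundedRetry(List<String> peers, Predicate<String> trySend) {
        // Accumulates failures across iterations, unlike excluding only the last failed peer.
        Set<String> failed = new LinkedHashSet<>();
        Optional<String> candidate = nextPeer(peers, failed);

        while (candidate.isPresent()) {
            String peer = candidate.get();
            if (trySend.test(peer)) {
                return candidate;
            }
            failed.add(peer);
            candidate = nextPeer(peers, failed);
        }

        // Every peer failed: the caller can complete the future exceptionally here.
        return Optional.empty();
    }
}
```

With this shape, a request against a group where no peer is reachable ends after one attempt per peer instead of retrying indefinitely. Whether the failed set should be reset after some delay is a separate design question that this sketch does not address.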



--
This message was sent by Atlassian Jira
(v8.20.10#820010)