Posted to issues@ignite.apache.org by "Vladislav Pyatkov (Jira)" <ji...@apache.org> on 2023/03/28 10:50:00 UTC

[jira] [Updated] (IGNITE-19136) Handling timeout on waiting for replica readiness

     [ https://issues.apache.org/jira/browse/IGNITE-19136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladislav Pyatkov updated IGNITE-19136:
---------------------------------------
    Summary: Handling timeout on waiting for replica readiness  (was: Handling timeout on waiting for replica rediness)

> Handling timeout on waiting for replica readiness
> -------------------------------------------------
>
>                 Key: IGNITE-19136
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19136
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3
>
> *Motivation*
> There are several reasons why a replica can respond with _ReplicaNotReadyException_ (storage recovery has not completed yet, indexes have not been created). In this case, the requester should send an AwaitReplicaRequest and not send any further requests until an AwaitReplicaResponse is received.
> However, when waiting for replica readiness times out, the cause is not obvious. The result is an unhandled exception:
> {noformat}
> Replica is not ready [replicationGroupId=474283c9-a39e-431a-895f-751003052d7a_part_10, nodeName=irott_n_1]
>   at app//org.apache.ignite.internal.replicator.ReplicaManager.sendReplicaUnavailableErrorResponse(ReplicaManager.java:385)
>   at app//org.apache.ignite.internal.replicator.ReplicaManager.onReplicaMessageReceived(ReplicaManager.java:167)
>   at app//org.apache.ignite.network.DefaultMessagingService.onMessage(DefaultMessagingService.java:358)
>   at app//org.apache.ignite.network.DefaultMessagingService.lambda$onMessage$3(DefaultMessagingService.java:314)
>   at java.base@11.0.17/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at java.base@11.0.17/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base@11.0.17/java.lang.Thread.run(Thread.java:834)
> {noformat}
> *Workaround*
> Currently, the issue does not reproduce if an RW transaction is used before an RO transaction, because the RW transaction waits for the replica to become ready.
> *Definition of Done*
> All request types (RW, RO) should handle _ReplicaNotReadyException_ and wait until the replica is ready to handle them.
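
The behaviour described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Ignite implementation: the class ReplicaAwaitSketch, the method invokeWhenReady, the ReplicaAwaitTimeoutException type, and the local ReplicaNotReadyException stand-in are all hypothetical, and the two suppliers stand in for sending the original request and an AwaitReplicaRequest. The point is that a timeout while waiting for readiness is reported as a distinct, descriptive error instead of surfacing as an unhandled exception.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these types are the real Ignite classes. */
final class ReplicaAwaitSketch {
    /** Local stand-in for the exception named in the ticket. */
    static class ReplicaNotReadyException extends RuntimeException {
        ReplicaNotReadyException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical error reported when the readiness wait itself times out. */
    static class ReplicaAwaitTimeoutException extends RuntimeException {
        ReplicaAwaitTimeoutException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    /**
     * Sends a request; if the replica is not ready, waits for readiness
     * (bounded by timeoutMillis) and then retries the request once.
     * The same path is meant to serve both RW and RO requests.
     */
    static <T> CompletableFuture<T> invokeWhenReady(
            Supplier<CompletableFuture<T>> request,
            Supplier<CompletableFuture<Void>> awaitReplica,
            long timeoutMillis) {
        CompletableFuture<T> result = new CompletableFuture<>();

        request.get().whenComplete((res, err) -> {
            if (err == null) {
                result.complete(res);
                return;
            }
            Throwable cause = unwrap(err);
            if (!(cause instanceof ReplicaNotReadyException)) {
                result.completeExceptionally(cause);
                return;
            }
            // Replica is not ready: wait for readiness and send nothing else meanwhile.
            awaitReplica.get()
                    .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                    .whenComplete((ignored, awaitErr) -> {
                        if (awaitErr == null) {
                            // Replica reported readiness: retry the original request once.
                            request.get().whenComplete((res2, err2) -> {
                                if (err2 == null) {
                                    result.complete(res2);
                                } else {
                                    result.completeExceptionally(unwrap(err2));
                                }
                            });
                        } else if (unwrap(awaitErr) instanceof TimeoutException) {
                            // Make the reason for the failure explicit.
                            result.completeExceptionally(new ReplicaAwaitTimeoutException(
                                    "Timed out after " + timeoutMillis
                                            + " ms waiting for replica readiness", cause));
                        } else {
                            result.completeExceptionally(unwrap(awaitErr));
                        }
                    });
        });

        return result;
    }

    /** Strips the CompletionException wrapper that dependent stages may add. */
    private static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }
}
```

In this sketch the retry happens only once; if the second attempt also fails with ReplicaNotReadyException, that failure is passed to the caller. Whether a real fix should loop, and where the timeout value comes from, is left open here.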



--
This message was sent by Atlassian Jira
(v8.20.10#820010)