Posted to issues@ratis.apache.org by "Xinyu Tan (Jira)" <ji...@apache.org> on 2023/05/13 03:00:00 UTC
[jira] [Updated] (RATIS-1841) Fixed bug where cluster restart failed to transfer snapshot
[ https://issues.apache.org/jira/browse/RATIS-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xinyu Tan updated RATIS-1841:
-----------------------------
Attachment: screenshot-1.png
> Fixed bug where cluster restart failed to transfer snapshot
> -----------------------------------------------------------
>
> Key: RATIS-1841
> URL: https://issues.apache.org/jira/browse/RATIS-1841
> Project: Ratis
> Issue Type: Bug
> Affects Versions: 2.4.1, 2.5.0
> Reporter: Xinyu Tan
> Assignee: Xinyu Tan
> Priority: Major
> Attachments: image-2023-05-13-10-50-14-537.png, screenshot-1.png
>
>
> Hi, we have discovered that the [problem|https://issues.apache.org/jira/browse/RATIS-1838] we reported earlier is still recurring. The problem is that a multi-replica cluster may fail to restart after transferring an outdated snapshot.
> Upon further investigation, we found that the issue originates from the resetClient function. Specifically, there is a flaw in its logic that causes it to incorrectly set the nextIndex of followers to 0, which leads to the error message shown in the attached screenshot.
> !image-2023-05-13-10-50-14-537.png!
> Upon reviewing the code, we determined that the issue was introduced by this [PR|https://github.com/apache/ratis/pull/805]; the logic was correct before it was merged.
> After investigating further, we determined that the solution is to remove the index check, since the conditions onError and request == null are sufficient on their own to cover the required cases.
> PTAL~[~szetszwo]
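
For readers following the description above, here is a minimal, self-contained sketch of how an extra index condition in a reset guard could let nextIndex drop to 0, and how guarding on onError and request == null alone avoids it. This is not the Ratis code: the class, the FollowerState type, the method names, the -1 sentinel, and the "matchIndex == 0" comparison are all assumptions made for illustration, since the report does not spell out the exact index condition.

```java
final class ResetClientSketch {
    /** Assumed sentinel for "no index known yet"; hypothetical, not taken from the report. */
    static final long UNKNOWN_INDEX = -1L;

    /** Hypothetical stand-in for the leader's view of one follower. */
    static final class FollowerState {
        long matchIndex;
        long nextIndex;

        FollowerState(long matchIndex, long nextIndex) {
            this.matchIndex = matchIndex;
            this.nextIndex = nextIndex;
        }
    }

    /** Guard with an extra index condition (assumed shape of the flawed check). */
    static boolean skipWithIndexCheck(boolean onError, Long requestPreviousIndex, FollowerState f) {
        return onError && requestPreviousIndex == null && f.matchIndex == 0;
    }

    /** Guard after dropping the index condition, as the report proposes. */
    static boolean skipWithoutIndexCheck(boolean onError, Long requestPreviousIndex) {
        return onError && requestPreviousIndex == null;
    }

    /** Recompute nextIndex unless the guard says to leave it alone. */
    static void reset(FollowerState f, Long requestPreviousIndex, boolean skip) {
        if (skip) {
            return;
        }
        // Fall back to the follower's matchIndex when there is no request to read from.
        long base = (requestPreviousIndex != null) ? requestPreviousIndex : f.matchIndex;
        f.nextIndex = base + 1;
    }

    public static void main(String[] args) {
        // Follower that just received a snapshot: nextIndex is 101, matchIndex still unknown.
        FollowerState a = new FollowerState(UNKNOWN_INDEX, 101);
        reset(a, null, skipWithIndexCheck(true, null, a));
        System.out.println("with index check:    nextIndex = " + a.nextIndex);

        FollowerState b = new FollowerState(UNKNOWN_INDEX, 101);
        reset(b, null, skipWithoutIndexCheck(true, null));
        System.out.println("without index check: nextIndex = " + b.nextIndex);
    }
}
```

Under these assumptions, the first follower ends up with nextIndex 0 because the index comparison fails and the reset falls through, while the second keeps its nextIndex of 101 because the error-with-no-request case is skipped outright.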
--
This message was sent by Atlassian Jira
(v8.20.10#820010)