Posted to issues@ratis.apache.org by "Xinyu Tan (Jira)" <ji...@apache.org> on 2023/05/13 02:58:00 UTC
[jira] [Created] (RATIS-1841) Fixed bug where cluster restart failed to transfer snapshot
Xinyu Tan created RATIS-1841:
--------------------------------
Summary: Fixed bug where cluster restart failed to transfer snapshot
Key: RATIS-1841
URL: https://issues.apache.org/jira/browse/RATIS-1841
Project: Ratis
Issue Type: Bug
Affects Versions: 2.5.0, 2.4.1
Reporter: Xinyu Tan
Assignee: Xinyu Tan
Attachments: image-2023-05-13-10-50-14-537.png
Hi, we have discovered that the [problem|https://issues.apache.org/jira/browse/RATIS-1838] we reported earlier is still recurring: a multi-replica cluster may fail to restart after transferring an outdated snapshot.
Upon further investigation, we found that the issue originates from the resetClient function. Specifically, there is a flaw in its logic that causes it to incorrectly set the nextIndex of followers to 0, which leads to the error message shown in the attached screenshot.
!image-2023-05-13-10-50-14-537.png!
Upon reviewing the code, we determined that the issue was introduced by this [PR|https://github.com/apache/ratis/pull/805]; the code behaved correctly before it was merged.
After investigating further, we determined that the solution is to remove the index check, as the conditions onError and request == null are sufficient to cover the required cases.
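
To illustrate the reasoning, here is a minimal standalone sketch. It is not the actual Ratis code. The class and method names, the exact form of the index check (matchIndex == 0), and the sample values (matchIndex = -1, nextIndex = 101) are all assumptions made for illustration only. It models a reset after an error with no pending request, and compares the resulting nextIndex with and without the extra index check.

```java
import java.util.OptionalLong;

/** Simplified, hypothetical model of the reset logic; not the Ratis implementation. */
public class ResetClientSketch {

  /**
   * Variant with an extra index check (assumed here to be matchIndex == 0).
   * previousLogIndex is empty when there is no request (request == null).
   */
  static long nextIndexWithIndexCheck(long matchIndex, long nextIndex,
      OptionalLong previousLogIndex, boolean onError) {
    if (onError && matchIndex == 0 && !previousLogIndex.isPresent()) {
      return nextIndex; // keep nextIndex unchanged and retry
    }
    // Otherwise fall back to the entry after the previous log (or after matchIndex).
    final long candidate = 1 + previousLogIndex.orElse(matchIndex);
    return Math.min(nextIndex, candidate);
  }

  /** Variant without the index check: onError and request == null are enough. */
  static long nextIndexWithoutIndexCheck(long matchIndex, long nextIndex,
      OptionalLong previousLogIndex, boolean onError) {
    if (onError && !previousLogIndex.isPresent()) {
      return nextIndex; // keep nextIndex unchanged and retry
    }
    final long candidate = 1 + previousLogIndex.orElse(matchIndex);
    return Math.min(nextIndex, candidate);
  }

  public static void main(String[] args) {
    // Assumed scenario: the leader has not heard from the follower yet
    // (matchIndex = -1) and nextIndex points just past a snapshot at index 100.
    final long matchIndex = -1;
    final long nextIndex = 101;

    // With the index check, matchIndex != 0, so nextIndex is decreased to 0.
    System.out.println(nextIndexWithIndexCheck(
        matchIndex, nextIndex, OptionalLong.empty(), true));    // prints 0

    // Without the index check, nextIndex is left unchanged.
    System.out.println(nextIndexWithoutIndexCheck(
        matchIndex, nextIndex, OptionalLong.empty(), true));    // prints 101
  }
}
```

In this model, when a request is present (previousLogIndex is not empty), both variants behave the same and move nextIndex back to the entry after the previous log.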
PTAL~[~szetszwo]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)