Posted to issues@ignite.apache.org by "Roman Puchkovskiy (Jira)" <ji...@apache.org> on 2023/01/03 14:38:00 UTC

[jira] [Created] (IGNITE-18495) Fix RAFT snapshot installation hang due to response swap on retry

Roman Puchkovskiy created IGNITE-18495:
------------------------------------------

             Summary: Fix RAFT snapshot installation hang due to response swap on retry
                 Key: IGNITE-18495
                 URL: https://issues.apache.org/jira/browse/IGNITE-18495
             Project: Ignite
          Issue Type: Bug
            Reporter: Roman Puchkovskiy
            Assignee: Roman Puchkovskiy
             Fix For: 3.0.0-beta2


The scenario is as follows:
 # An InstallSnapshot request is sent, and its processing hangs indefinitely (it will be cancelled in step 3)
 # After a timeout, a second InstallSnapshot request is sent with the same index+term as the first; JRaft handles this case specially (processing of the previous request is NOT cancelled)
 # After a timeout, a third InstallSnapshot request is sent with a DIFFERENT index, so it cancels the first snapshot's processing, effectively unblocking the first thread

In the original JRaft implementation, after being unblocked, the first thread fails to clean up, so subsequent retries always see a phantom of an unfinished snapshot and the snapshotting process jams. Also, node stop might get stuck because one 'download' task will remain unfinished forever.
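
The three steps and the required cleanup can be illustrated with a toy model. This is a minimal sketch, not JRaft code: the class SnapshotRetrySketch, the nested Download type, and the methods register, process and unfinishedDownloads are all hypothetical names chosen for illustration, and the way a retry is detected here (comparing index and term) is a simplification of whatever JRaft actually does. The only point it tries to show is that the unblocked worker releases its bookkeeping in a finally block, so no unfinished 'download' is left behind after cancellation.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Toy model of the retry scenario. NOT JRaft code; all names are hypothetical. */
public class SnapshotRetrySketch {
    /** One InstallSnapshot attempt and the future its worker thread blocks on. */
    static final class Download {
        final long index;
        final long term;
        final CompletableFuture<Void> copy = new CompletableFuture<>();

        Download(long index, long term) {
            this.index = index;
            this.term = term;
        }
    }

    private final AtomicReference<Download> current = new AtomicReference<>();
    private final AtomicInteger unfinished = new AtomicInteger();

    /** Returns the download to process, or null if the request is a retry of the current one. */
    synchronized Download register(long index, long term) {
        Download prev = current.get();
        if (prev != null && prev.index == index && prev.term == term) {
            return null; // Step 2: same index+term, previous processing is NOT cancelled.
        }
        if (prev != null) {
            prev.copy.cancel(false); // Step 3: different index, unblock the previous worker.
        }
        Download next = new Download(index, term);
        current.set(next);
        unfinished.incrementAndGet();
        return next;
    }

    /** Worker body: blocks until the copy completes or is cancelled, then always cleans up. */
    boolean process(Download d) {
        try {
            d.copy.join(); // Step 1: may block for a long time.
            return true;
        } catch (CancellationException | CompletionException e) {
            return false;
        } finally {
            // Cleanup runs on every exit path, including cancellation.
            current.compareAndSet(d, null);
            unfinished.decrementAndGet();
        }
    }

    int unfinishedDownloads() {
        return unfinished.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SnapshotRetrySketch node = new SnapshotRetrySketch();
        Download first = node.register(5, 1);
        Thread worker = new Thread(() -> node.process(first));
        worker.start();

        Download retry = node.register(5, 1); // treated as a retry
        Download third = node.register(6, 1); // cancels the first download
        worker.join();                        // returns because the worker was unblocked

        System.out.println("retry ignored: " + (retry == null));
        System.out.println("first cancelled: " + first.copy.isCancelled());
        System.out.println("unfinished downloads: " + node.unfinishedDownloads());
        third.copy.complete(null);
    }
}
```

In this model, a shutdown routine that waits for unfinishedDownloads() to reach zero would not block on the cancelled first download, because its counter entry is released in the finally block.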



--
This message was sent by Atlassian Jira
(v8.20.10#820010)