Posted to issues@ignite.apache.org by "Roman Puchkovskiy (Jira)" <ji...@apache.org> on 2022/11/03 10:17:00 UTC

[jira] [Created] (IGNITE-18077) Handle 'RAFT log abruptly deleted' scenario

Roman Puchkovskiy created IGNITE-18077:
------------------------------------------

             Summary: Handle 'RAFT log abruptly deleted' scenario
                 Key: IGNITE-18077
                 URL: https://issues.apache.org/jira/browse/IGNITE-18077
             Project: Ignite
          Issue Type: Improvement
            Reporter: Roman Puchkovskiy
             Fix For: 3.0.0-beta2


The following case is possible: some writes are applied to the storage (via the RAFT infrastructure), then Ignite is stopped, the RAFT log is deleted (or its FS partition is simply unmounted), and the node is started again. On startup the node sees no RAFT log, so, according to RAFT semantics, it may conclude that it is a fresh node that has just entered the RAFT group, reset its index to 0, and request the log/snapshot from the leader. In reality it still has data in its state machine storage, and the correct action would be to remount the FS partition that holds the RAFT log.

So, we need special handling for situations where there is no RAFT log at all, but the main storage reports that its persistedIndex is non-zero. One option would be to put the whole Ignite node into Maintenance mode (or its equivalent).

Please note that the described situation is different from the 'fresh node' scenario, where the RAFT log is absent but the main storage's persistedIndex is 0. That is a normal situation when joining a RAFT group for the first time; no special handling is needed (it's handled by JRaft).
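
To make the distinction between the two cases concrete, here is a minimal Java sketch of the startup decision. It is only an illustration, not a proposed design: the class RaftLogStartupCheck, the enum StartupAction and the method decide are hypothetical names that exist in neither Ignite nor JRaft, and the sketch assumes the caller has already determined whether a RAFT log is present and has read persistedIndex from the main storage.

```java
/**
 * Hypothetical startup check. None of these names come from Ignite or JRaft;
 * they only illustrate the decision described in this issue.
 */
final class RaftLogStartupCheck {
    /** Possible outcomes of the check. */
    enum StartupAction {
        /** The RAFT log is present: start the RAFT node as usual. */
        START_NORMALLY,
        /** No log and nothing persisted: first-time join, already handled by JRaft. */
        JOIN_AS_FRESH_NODE,
        /** No log, but the storage holds applied data: the log was lost or unmounted. */
        ENTER_MAINTENANCE_MODE
    }

    private RaftLogStartupCheck() {
        // Static helper only.
    }

    /**
     * Decides how the node should proceed on startup.
     *
     * @param raftLogPresent whether any RAFT log was found (assumed to be checked by the caller)
     * @param persistedIndex the index reported by the main storage, 0 if nothing was persisted
     */
    static StartupAction decide(boolean raftLogPresent, long persistedIndex) {
        if (persistedIndex < 0) {
            throw new IllegalArgumentException("persistedIndex must be non-negative: " + persistedIndex);
        }
        if (raftLogPresent) {
            return StartupAction.START_NORMALLY;
        }
        // No log at all: the persisted index tells a fresh node from a node that lost its log.
        return persistedIndex == 0
                ? StartupAction.JOIN_AS_FRESH_NODE
                : StartupAction.ENTER_MAINTENANCE_MODE;
    }
}
```

With this sketch, decide(false, 0) yields JOIN_AS_FRESH_NODE, while decide(false, 42) yields ENTER_MAINTENANCE_MODE instead of letting the node reset its index to 0. How the presence of the log is detected, and what Maintenance mode actually does, is left to the design.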

A design is needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)