Posted to dev@ratis.apache.org by 宋子阳 <sz...@163.com> on 2022/04/11 08:24:08 UTC

Bugs related to installSnapshot

Hi folks,

I’ve discovered a bug in the installSnapshot RPC handler that causes the follower to reply with success when the installation has actually failed.

org.apache.ratis.server.storage.SnapshotManager.java

public void installSnapshot(StateMachine stateMachine,
    InstallSnapshotRequestProto request) throws IOException {
  ...
  if (snapshotChunkRequest.getDone()) {
    LOG.info("Install snapshot is done, renaming tmp dir:{} to:{}",
        tmpDir, dir.getStateMachineDir());
    dir.getStateMachineDir().delete();          // Here delete() may fail; its boolean result is ignored
    tmpDir.renameTo(dir.getStateMachineDir());  // renameTo() then fails as well; also unchecked
  }
}


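After the follower receives the entire snapshot, it first stores the files in a tmp dir and then renames the tmp dir to StateMachineDir. However, when StateMachineDir is not empty, delete() fails, and renameTo() then fails too. Since java.io.File's delete() and renameTo() signal failure only by returning false, and the code above ignores both return values, no exception is thrown and the follower still replies with success. In this scenario, the latest snapshot remains in the tmp dir and the state machine cannot load it.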
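The silent failure can be reproduced outside Ratis with plain java.io.File calls. Below is a minimal, self-contained sketch, assuming StateMachineDir and the tmp dir are ordinary local directories; the class name RenameFailureDemo, the directory names sm and tmp, and the snapshot file names are hypothetical stand-ins, not Ratis code.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical demo class, not part of Ratis: it mimics the delete()/renameTo()
// sequence from SnapshotManager.installSnapshot using java.io.File.
public class RenameFailureDemo {

    /** The boolean results that the quoted handler discards. */
    static final class Outcome {
        final boolean deleted;
        final boolean renamed;
        final boolean newSnapshotStillInTmp;

        Outcome(boolean deleted, boolean renamed, boolean newSnapshotStillInTmp) {
            this.deleted = deleted;
            this.renamed = renamed;
            this.newSnapshotStillInTmp = newSnapshotStillInTmp;
        }
    }

    static Outcome reproduce() throws IOException {
        // Temp files are left in place so the resulting layout can be inspected.
        File root = Files.createTempDirectory("ratis-demo").toFile();
        File smDir = new File(root, "sm");   // stands in for StateMachineDir
        File tmpDir = new File(root, "tmp"); // stands in for the tmp dir
        if (!smDir.mkdir() || !tmpDir.mkdir()) {
            throw new IOException("could not create demo directories under " + root);
        }
        // An older snapshot kept by the retention policy makes smDir non-empty.
        Files.write(new File(smDir, "snapshot.1_10").toPath(), new byte[] {1});
        // The newly received snapshot waits in tmpDir.
        Files.write(new File(tmpDir, "snapshot.2_20").toPath(), new byte[] {2});

        // Same two calls as the handler; neither throws on failure.
        boolean deleted = smDir.delete();          // false: directory is not empty
        boolean renamed = tmpDir.renameTo(smDir);  // false: target directory still exists
        boolean stillInTmp = new File(tmpDir, "snapshot.2_20").exists();
        return new Outcome(deleted, renamed, stillInTmp);
    }

    public static void main(String[] args) throws IOException {
        Outcome o = reproduce();
        System.out.println("delete() returned " + o.deleted
            + ", renameTo() returned " + o.renamed
            + ", new snapshot still in tmp: " + o.newSnapshotStillInTmp);
    }
}
```

Both calls return false, yet nothing is thrown, so a caller that ignores the return values cannot tell that the install failed.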

StateMachineDir can be non-empty because previously installed snapshots are stored there and may be kept by the retention policy. The next time the leader installs a snapshot on this follower, the failure described above occurs.
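One possible way to avoid both problems is sketched below; it is an illustrative approach under stated assumptions, not necessarily the change made for RATIS-1564. It assumes Java NIO paths and a hypothetical helper named installFromTmp: instead of deleting StateMachineDir, it moves each entry of the tmp dir into it, which keeps snapshots retained by the retention policy, and every failure surfaces as an IOException rather than an ignored boolean.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a Ratis API.
public class SnapshotDirInstaller {

    /**
     * Moves every top-level entry of tmpDir into stateMachineDir, then removes tmpDir.
     * Older snapshots already in stateMachineDir are left untouched.
     * Every failure is reported as an IOException, so it cannot be mistaken for success.
     */
    static void installFromTmp(Path tmpDir, Path stateMachineDir) throws IOException {
        Files.createDirectories(stateMachineDir);
        // Collect the listing first so the directory is not modified while being iterated.
        List<Path> entries;
        try (Stream<Path> listing = Files.list(tmpDir)) {
            entries = listing.collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Path target = stateMachineDir.resolve(entry.getFileName());
            // REPLACE_EXISTING overwrites a same-named file; if the target is a
            // non-empty directory, Files.move throws DirectoryNotEmptyException.
            Files.move(entry, target, StandardCopyOption.REPLACE_EXISTING);
        }
        // tmpDir should now be empty; Files.delete throws if it is not.
        Files.delete(tmpDir);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("ratis-fix-demo");
        Path sm = Files.createDirectories(root.resolve("sm"));
        Path tmp = Files.createDirectories(root.resolve("tmp"));
        Files.write(sm.resolve("snapshot.1_10"), new byte[] {1});  // retained older snapshot
        Files.write(tmp.resolve("snapshot.2_20"), new byte[] {2}); // newly received snapshot
        installFromTmp(tmp, sm);
        System.out.println("old kept: " + Files.exists(sm.resolve("snapshot.1_10"))
            + ", new installed: " + Files.exists(sm.resolve("snapshot.2_20"))
            + ", tmp removed: " + Files.notExists(tmp));
    }
}
```

Moving entries one by one is not atomic as a whole: a crash partway through could leave the new snapshot only partially moved, which a production fix would need to account for.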

Thanks!

William Song
Apache IoTDB 

Re: Bugs related to installSnapshot

Posted by 宋子阳 <sz...@163.com>.
Here is the JIRA issue: https://issues.apache.org/jira/browse/RATIS-1564. Please take a look :)
Thanks

> On Apr 11, 2022, at 16:36, Tsz Wo Sze <sz...@gmail.com> wrote:
> 
> Hi William,
> 
> Thanks a lot for reporting the bug.  Could you file a JIRA?
> 
> Tsz-Wo
> 


Re: Bugs related to installSnapshot

Posted by Tsz Wo Sze <sz...@gmail.com>.
Hi William,

Thanks a lot for reporting the bug.  Could you file a JIRA?

Tsz-Wo

