Posted to user@ratis.apache.org by William Song <sz...@163.com> on 2022/10/20 13:46:10 UTC

Does Ratis require all snapshot files be stored under statemachine dir?

Hi,

IoTDB StateMachine manages both in-memory state and data files persisted on disk. These data files are stored in a different directory, possibly even on a different disk. When asked to take a snapshot, instead of copying these data files into the statemachine dir (which wastes a lot of disk space and time), the statemachine only records the absolute paths of these files.
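A minimal sketch of the "record paths instead of copying" idea, in Java to match Ratis. The class name and the one-path-per-line manifest format are hypothetical, not IoTDB's actual snapshot format. Only the manifest is written to the statemachine dir; the data files stay on their own disks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical "paths only" snapshot: the data files stay where they are and
 * only a manifest listing their absolute paths is written to the sm dir.
 */
public final class PathManifestSnapshot {
  private PathManifestSnapshot() {}

  /** One absolute, normalized path per line. */
  public static List<String> toManifestLines(List<Path> dataFiles) {
    final List<String> lines = new ArrayList<>();
    for (Path file : dataFiles) {
      lines.add(file.toAbsolutePath().normalize().toString());
    }
    return lines;
  }

  /** Parses manifest lines back into paths, skipping blank lines. */
  public static List<Path> fromManifestLines(List<String> lines) {
    final List<Path> files = new ArrayList<>();
    for (String line : lines) {
      if (!line.isBlank()) {
        files.add(Path.of(line));
      }
    }
    return files;
  }

  /** Taking the snapshot writes only this small file, not the data files. */
  public static void writeManifest(Path manifest, List<Path> dataFiles) throws IOException {
    Files.write(manifest, toManifestLines(dataFiles));
  }

  public static List<Path> readManifest(Path manifest) throws IOException {
    return fromManifestLines(Files.readAllLines(manifest));
  }

  public static void main(String[] args) throws IOException {
    final Path smDir = Files.createTempDirectory("sm");
    final Path dataDir = Files.createTempDirectory("disk2-data"); // stands in for another disk
    final Path tsFile = Files.createFile(dataDir.resolve("a.tsfile"));
    final Path manifest = smDir.resolve("snapshot.manifest");
    writeManifest(manifest, List.of(tsFile));
    System.out.println(readManifest(manifest)); // lists tsFile's absolute path
  }
}
```

The catch, as described next, is that these absolute paths are exactly what the leader later has to turn into filenames for the follower.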

When the leader installs a snapshot on a follower, the statemachine provides the absolute paths of all data files in FileInfo. The leader seems to assume that all these files are under the statemachine dir: it computes each file's path relative to the sm dir and uses that as the filename [1]. For files outside the sm dir, this produces messy relative paths such as ../../../root/disk2/data/a.tsfile.
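The path arithmetic can be reproduced with java.nio.file.Path.relativize. This is a sketch under assumptions: /data/ratis/sm is a hypothetical sm dir, Unix-style paths are assumed, and the exact way FileChunkReader derives the name [1] may differ in detail.

```java
import java.nio.file.Path;

/**
 * Illustrates the leader-side naming: a data file outside the sm dir gets a
 * relative name that starts with "..". Assumes Unix-style paths.
 */
public final class LeaderRelativeName {
  private LeaderRelativeName() {}

  /** The filename the leader would send for dataFile if it relativizes against smDir. */
  public static String relativeName(Path smDir, Path dataFile) {
    return smDir.relativize(dataFile).toString();
  }

  public static void main(String[] args) {
    final Path smDir = Path.of("/data/ratis/sm"); // hypothetical sm dir
    // A file inside the sm dir gets a clean name: snapshot/state.bin
    System.out.println(relativeName(smDir, Path.of("/data/ratis/sm/snapshot/state.bin")));
    // A file on another disk does not: ../../../root/disk2/data/a.tsfile
    System.out.println(relativeName(smDir, Path.of("/root/disk2/data/a.tsfile")));
  }
}
```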

The receiving follower uses the relative filename to create the files holding the incoming snapshot chunks. As a result, a file may be named sm/tmp/snapshot-uuid/../../../root/disk2/data/a.tsfile, which sits outside the tmp dir. The later rename operation then won’t move these files to the correct place, and the statemachine can’t find the snapshot after InstallSnapshot.
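Resolving such a name against the follower's tmp dir shows where the chunk would actually land. The tmp path below is hypothetical, and resolveInside is an illustrative guard, not something Ratis is known to provide; it fails fast instead of silently writing outside the tmp dir.

```java
import java.nio.file.Path;

/**
 * Shows where a follower would write a chunk if it resolves the received
 * filename against its tmp snapshot dir, plus a hypothetical guard that
 * rejects names escaping that dir.
 */
public final class FollowerChunkTarget {
  private FollowerChunkTarget() {}

  /** Naive resolution: tmpDir + relative name, with ".." segments collapsed. */
  public static Path resolve(Path tmpDir, String relativeName) {
    return tmpDir.resolve(relativeName).normalize();
  }

  /** Hypothetical defensive variant: throw instead of writing outside tmpDir. */
  public static Path resolveInside(Path tmpDir, String relativeName) {
    final Path base = tmpDir.normalize();
    final Path target = resolve(base, relativeName);
    if (!target.startsWith(base)) {
      throw new IllegalArgumentException(
          "Snapshot file " + relativeName + " escapes " + base + " -> " + target);
    }
    return target;
  }

  public static void main(String[] args) {
    final Path tmpDir = Path.of("/data/ratis/sm/tmp/snapshot-uuid"); // hypothetical
    // Prints /data/ratis/root/disk2/data/a.tsfile: outside tmpDir, and not the
    // original /root/disk2/data/a.tsfile either.
    System.out.println(resolve(tmpDir, "../../../root/disk2/data/a.tsfile"));
  }
}
```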

What’s the proper solution to handle this?

Regards,
William


[1] https://github.com/apache/ratis/blob/75a1071a3c62a5a1a09c356cc1cdf281cc506baf/ratis-server/src/main/java/org/apache/ratis/server/storage/FileChunkReader.java#L55

[2] https://github.com/apache/ratis/blob/75a1071a3c62a5a1a09c356cc1cdf281cc506baf/ratis-server/src/main/java/org/apache/ratis/server/storage/SnapshotManager.java#L90


Re: Does Ratis require all snapshot files be stored under statemachine dir?

Posted by William Song <sz...@163.com>.
Hi Tsz-Wo,

It seems the key part is the ‘internal mechanism’. I just checked Ozone and found that SCMStateMachine downloads the snapshot from the leader using gRPC. It seems that NotifyInstallSnapshot provides great flexibility, while leaving the challenging implementation details to the state machine.

I’ll start a discussion in IoTDB to see whether we should also adopt this mechanism. Thanks for your reply!

William


> On Oct 21, 2022, at 14:31, Tsz Wo Sze <sz...@gmail.com> wrote:
> 
> The leader sends an InstallSnapshotNotification to the new follower.   The new follower will call StateMachine.followerEvent().notifyInstallSnapshotFromLeader(..) [1].  Then, the state machine transfers a snapshot from another node using an internal mechanism.
> 
> [1] https://github.com/apache/ratis/blob/master/ratis-server-api/src/main/java/org/apache/ratis/statemachine/StateMachine.java#L259
> 
> Tsz-Wo


Re: Does Ratis require all snapshot files be stored under statemachine dir?

Posted by Tsz Wo Sze <sz...@gmail.com>.
The leader sends an InstallSnapshotNotification to the new follower.   The
new follower will call
StateMachine.followerEvent().notifyInstallSnapshotFromLeader(..) [1].
Then, the state machine transfers a snapshot from another node using an
internal mechanism.
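Roughly, the follower-side flow can be sketched as follows. It is hypothetical: SnapshotFetcher, NotifiedSnapshotInstaller, and the long index stand in for the state machine's own transport and for Ratis types; the real hook is the notifyInstallSnapshotFromLeader(..) method cited in [1]. What it shows is that the callback returns a future right away while the transfer runs on the state machine's own thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical follower-side handler for InstallSnapshotNotification. The
 * SnapshotFetcher interface and the long index are stand-ins, not Ratis types.
 */
public final class NotifiedSnapshotInstaller implements AutoCloseable {

  /** Hypothetical transport (e.g. gRPC or HTTP); returns the last log index covered by the snapshot. */
  @FunctionalInterface
  public interface SnapshotFetcher {
    long fetchLatestSnapshot(String leaderId) throws Exception;
  }

  private final SnapshotFetcher fetcher;
  // Dedicated thread so the caller is not blocked by a large download.
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  public NotifiedSnapshotInstaller(SnapshotFetcher fetcher) {
    this.fetcher = fetcher;
  }

  /** Returns immediately; the future completes with the installed snapshot's index. */
  public CompletableFuture<Long> onInstallSnapshotNotification(String leaderId) {
    return CompletableFuture.supplyAsync(() -> {
      try {
        return fetcher.fetchLatestSnapshot(leaderId);
      } catch (Exception e) {
        throw new CompletionException(e);
      }
    }, executor);
  }

  @Override
  public void close() {
    executor.shutdown();
  }

  public static void main(String[] args) {
    try (NotifiedSnapshotInstaller installer = new NotifiedSnapshotInstaller(leader -> 42L)) {
      final long index = installer.onInstallSnapshotNotification("s0").join();
      System.out.println("installed snapshot up to index " + index);
    }
  }
}
```

A failed download surfaces as an exceptionally completed future rather than an exception on the calling thread.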

[1] https://github.com/apache/ratis/blob/master/ratis-server-api/src/main/java/org/apache/ratis/statemachine/StateMachine.java#L259

Tsz-Wo

On Fri, Oct 21, 2022 at 12:21 PM William Song <sz...@163.com> wrote:

> Hi Tsz-Wo,
>
> If Ozone disables InstallSnapshot, how does the leader sync data when a
> brand-new follower starts up and joins the cluster, but the leader no
> longer has the log entries before its latest snapshot?
>
> William

Re: Does Ratis require all snapshot files be stored under statemachine dir?

Posted by William Song <sz...@163.com>.
Hi Tsz-Wo,

If Ozone disables InstallSnapshot, how does the leader sync data when a brand-new follower starts up and joins the cluster, but the leader no longer has the log entries before its latest snapshot?

William

> On Oct 21, 2022, at 11:53, Tsz Wo Sze <sz...@gmail.com> wrote:
> 
> Hi William,
> 
> The disk layout of IoTDB looks similar to Ozone's. Unfortunately, it is hard to have a general mechanism to handle such a complicated layout. Also, when the data size is large, we may want to have incremental snapshots instead of transferring everything.
> 
> Does IoTDB have a large snapshot size?  
> 
> Ozone disables InstallSnapshot (raft.server.log.appender.install.snapshot.enabled = false) and uses InstallSnapshotNotification instead. The leader then only notifies followers to install a snapshot, and the followers use their own mechanism to install it.
> 
> Currently, Ratis assumes all the directories (raft log, snapshot & tmp) are under the same root directory.  We may support a more flexible directory layout (e.g. allowing directories under different roots) if it helps.
> 
> Tsz-Wo


Re: Does Ratis require all snapshot files be stored under statemachine dir?

Posted by Tsz Wo Sze <sz...@gmail.com>.
Hi William,

The disk layout of IoTDB looks similar to Ozone's. Unfortunately, it is hard
to have a general mechanism to handle such a complicated layout. Also,
when the data size is large, we may want to have incremental snapshots
instead of transferring everything.
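A minimal sketch of how an incremental transfer could decide what to send, assuming snapshot data files are immutable once written, so a file the follower already has with the same name and size can be skipped. The class name, file names, and sizes are made up, and a real implementation would more likely compare checksums.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical incremental-snapshot planner: send only what the follower lacks. */
public final class IncrementalSnapshotPlan {
  private IncrementalSnapshotPlan() {}

  /** Leader files (name -> size) the follower is missing or has with a different size, sorted by name. */
  public static List<String> filesToTransfer(Map<String, Long> leaderFiles, Map<String, Long> followerFiles) {
    final List<String> toSend = new ArrayList<>();
    for (Map.Entry<String, Long> e : new TreeMap<>(leaderFiles).entrySet()) {
      final Long followerSize = followerFiles.get(e.getKey());
      if (followerSize == null || !followerSize.equals(e.getValue())) {
        toSend.add(e.getKey());
      }
    }
    return toSend;
  }

  public static void main(String[] args) {
    final Map<String, Long> leader = Map.of("a.tsfile", 100L, "b.tsfile", 200L, "c.tsfile", 300L);
    final Map<String, Long> follower = Map.of("a.tsfile", 100L, "b.tsfile", 150L);
    System.out.println(filesToTransfer(leader, follower)); // [b.tsfile, c.tsfile]
  }
}
```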

Does IoTDB have a large snapshot size?

Ozone disables InstallSnapshot
(raft.server.log.appender.install.snapshot.enabled = false) and uses
InstallSnapshotNotification instead. The leader then only notifies
followers to install a snapshot, and the followers use their own mechanism
to install it.
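For reference, the setting can be applied programmatically on RaftProperties. This is a configuration fragment that needs ratis-common (for org.apache.ratis.conf.RaftProperties) on the classpath; the class name is hypothetical, and the key string is the one quoted above.

```java
import org.apache.ratis.conf.RaftProperties;

/** Hypothetical helper: server properties for notification-only snapshot installation. */
public final class NotificationOnlySnapshotConfig {
  private NotificationOnlySnapshotConfig() {}

  public static RaftProperties newProperties() {
    final RaftProperties properties = new RaftProperties();
    // With this key false, the leader sends InstallSnapshotNotification and the
    // follower's state machine fetches the snapshot by its own means.
    properties.setBoolean("raft.server.log.appender.install.snapshot.enabled", false);
    return properties;
  }
}
```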

Currently, Ratis assumes all the directories (raft log, snapshot & tmp) are
under the same root directory.  We may support a more flexible directory
layout (e.g. allowing directories under different roots) if it helps.

Tsz-Wo

