Posted to user@ratis.apache.org by Asad Awadia <as...@gmail.com> on 2021/12/14 19:39:44 UTC

Raft log persistance

Hello,

Is Ratis keeping a copy of all the writes made, on top of what is persisted in my own DB? That would mean ~2x storage used.

I read that the file system example avoids that, but I don't see how that is done in the code.

Regards,
Asad

Re: Raft log persistance

Posted by Asad Awadia <as...@gmail.com>.
Perfect, understood completely now.

Thank you so much!
________________________________
From: Tsz Wo Sze <sz...@gmail.com>
Sent: Tuesday, December 21, 2021 12:22:17 AM
To: user@ratis.apache.org <us...@ratis.apache.org>
Subject: Re: Raft log persistance

The application defines its data format, which is opaque to Ratis.  For example,

1) the application client may use BlockingApi.send(message) to send a message as a ByteString.  In this example, the message contains a header and a body but this format is opaque to Ratis.

2) Ratis packs the message as a RaftClientRequest and then delivers the request to the leader.

3) The leader invokes StateMachine.startTransaction(request).  The StateMachine implementation should use the request to build a TransactionContext.  It may choose to separate the state machine data by calling setLogData(..) and setStateMachineData(..) in TransactionContext.Builder.  In this example, the StateMachine implementation calls setLogData(header) and setStateMachineData(body).  Note that the StateMachine implementation is a part of the application, so it knows the message format.  See also FileStoreStateMachine.startTransaction(..).
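
The header/body split in steps 1) to 3) can be illustrated with a minimal, self-contained sketch.  It does not use the Ratis API: the wire format (a 4-byte header length, then the header, then the body) and the names MessageSplitSketch, Parts, pack and split are assumptions made up for this illustration.  In a real state machine, the two parts would be passed to setLogData(..) and setStateMachineData(..) as ByteStrings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical message format: [4-byte header length][header][body]. */
final class MessageSplitSketch {
  /** The two parts a state machine would hand to setLogData / setStateMachineData. */
  static final class Parts {
    final byte[] logData;          // small metadata, meant for the raft log
    final byte[] stateMachineData; // bulk payload, meant for the state machine only

    Parts(byte[] logData, byte[] stateMachineData) {
      this.logData = logData;
      this.stateMachineData = stateMachineData;
    }
  }

  /** Client side: pack header and body into one message that is opaque to the transport. */
  static byte[] pack(byte[] header, byte[] body) {
    return ByteBuffer.allocate(4 + header.length + body.length)
        .putInt(header.length)
        .put(header)
        .put(body)
        .array();
  }

  /** State machine side: the application knows the format, so it can split the message. */
  static Parts split(byte[] message) {
    ByteBuffer buf = ByteBuffer.wrap(message);
    int headerLength = buf.getInt();
    if (headerLength < 0 || headerLength > buf.remaining()) {
      throw new IllegalArgumentException("Bad header length: " + headerLength);
    }
    byte[] header = new byte[headerLength];
    buf.get(header);
    byte[] body = new byte[buf.remaining()];
    buf.get(body);
    return new Parts(header, body);
  }

  public static void main(String[] args) {
    byte[] header = "path=/dir/file.txt".getBytes(StandardCharsets.UTF_8);
    byte[] body = "file contents".getBytes(StandardCharsets.UTF_8);
    Parts parts = split(pack(header, body));
    System.out.println("logData          = " + new String(parts.logData, StandardCharsets.UTF_8));
    System.out.println("stateMachineData = " + parts.stateMachineData.length + " bytes");
  }
}
```

The point of the sketch is only that the split is decided by application code on the server side; nothing in the transport needs to understand the format.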

Tsz-Wo


On Mon, Dec 20, 2021 at 11:20 PM Asad Awadia <as...@gmail.com> wrote:
Ok I understand now.

How do you specify what goes where? Since the raft message is just a byte string and can't specify what is what.

In the filestoreclient it sends the bytes of a proto and in the statemachine the code just assumes that the logdata and statemachinedata are in the right places?

How is that being done?
________________________________
From: Tsz Wo Sze <sz...@gmail.com>
Sent: Monday, December 20, 2021 1:07:55 AM
To: user@ratis.apache.org <us...@ratis.apache.org>
Subject: Re: Raft log persistance

> ...  what is the difference between StateMachineLogEntryProto#logData and  StateMachineLogEntryProto#stateMachineEntry#stateMachineData ?

StateMachineLogEntryProto#logData is the data that will be written to RaftLog.  In contrast, StateMachineLogEntryProto#stateMachineEntry#stateMachineData won't be written to the RaftLog.  The state machine is going to store it.

> In the counter example the log data is the 'increment request' and in filestore the stateMachineData is the stateMachineLogEntry is being used?

In the counter example, the log data contains all the data in the request and the request is just the "increment request".

In filestore, a client may send a request to write a file containing data D to a particular path P.  The stateMachineData is the file data D and the log data is the metadata such as the file path P.

> What should be used where/when?

StateMachineLogEntryProto#logData is the logData specified in the RAFT algorithm.

StateMachineLogEntryProto#stateMachineEntry#stateMachineData is the data that the state machine wants to store by itself, and not in the RaftLog, in order to avoid storing two copies of the data.

Please feel free to let me know if you have more questions.

Tsz-Wo


On Sat, Dec 18, 2021 at 11:42 PM Asad Awadia <as...@gmail.com> wrote:
I think I am starting to understand.


Reading the raft.proto file - what is the difference between StateMachineLogEntryProto#logData and  StateMachineLogEntryProto#stateMachineEntry#stateMachineData ?

In the counter example the log data is the 'increment request' and in filestore the stateMachineData is the stateMachineLogEntry is being used?

What should be used where/when?
________________________________
From: Tsz Wo Sze <sz...@gmail.com>
Sent: Saturday, December 18, 2021 1:36:18 AM
To: user@ratis.apache.org <us...@ratis.apache.org>
Subject: Re: Raft log persistance

The file store example is just a simple example.  It is not intended to support all the features.  Indeed, it does not work for the case where the data has been deleted -- it will fail to read the file.

You are right that, for real applications supporting multiple versions of data, it should retrieve the data from the version corresponding to the given log index.
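
One possible shape for such a versioned store is sketched below.  This is not how the FileStore example works; the class VersionedStoreSketch and its methods are hypothetical, and the sketch keeps every version in memory, whereas a real application would also need a policy for discarding old versions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical store that keeps one version of each path per log index. */
final class VersionedStoreSketch {
  // path -> (log index of the write or delete -> content, where null marks a delete)
  private final Map<String, TreeMap<Long, byte[]>> versions = new HashMap<>();

  /** Records the content written by the log entry at logIndex. */
  void write(String path, long logIndex, byte[] data) {
    versions.computeIfAbsent(path, p -> new TreeMap<>()).put(logIndex, data.clone());
  }

  /** A delete adds a tombstone instead of discarding the earlier versions. */
  void delete(String path, long logIndex) {
    versions.computeIfAbsent(path, p -> new TreeMap<>()).put(logIndex, null);
  }

  /** Returns the content as of logIndex, or empty if the path was absent or deleted then. */
  Optional<byte[]> readAt(String path, long logIndex) {
    TreeMap<Long, byte[]> history = versions.get(path);
    if (history == null) {
      return Optional.empty();
    }
    // floorEntry finds the latest write or delete at or before logIndex.
    Map.Entry<Long, byte[]> latest = history.floorEntry(logIndex);
    return latest == null ? Optional.empty() : Optional.ofNullable(latest.getValue());
  }

  public static void main(String[] args) {
    VersionedStoreSketch store = new VersionedStoreSketch();
    store.write("/a.txt", 5, "v1".getBytes());
    store.write("/a.txt", 8, "v2".getBytes());
    store.delete("/a.txt", 10);
    // Reading at index 5 still works after the later overwrite and delete.
    System.out.println(store.readAt("/a.txt", 5).map(String::new).orElse("<none>"));
    System.out.println(store.readAt("/a.txt", 10).map(String::new).orElse("<none>"));
  }
}
```

With this layout, a read for the log entry at index 5 returns the data that entry wrote even though the file was later overwritten and deleted.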

Tsz-Wo


On Sat, Dec 18, 2021 at 1:05 PM Asad Awadia <as...@gmail.com> wrote:
In the file store example's read method, it looks like it is returning the file at the path without taking the raft index into consideration, i.e., it would keep returning the same file for many entry indices provided the file hadn't changed?

What is going on there? Shouldn't it return the data snapshot at instant entry.getIndex?

________________________________
From: Tsz Wo Sze <sz...@gmail.com>
Sent: Wednesday, December 15, 2021 7:37:20 PM
To: user@ratis.apache.org <us...@ratis.apache.org>
Subject: Re: Raft log persistance


> If I understand correctly,  in the write method we should take the request and apply it right there in the method into the state machine? So it only gets persisted into the state machine and not the raft log?

The write method should write the stateMachineData (i.e. LogEntryProto.getStateMachineLogEntry().getStateMachineEntry().getStateMachineData()) to its storage.  The stateMachineData will then be removed from the logEntryProto, and the logEntryProto without the stateMachineData will be written to the raft log.  So yes, the stateMachineData won't be written to the raft log.

> How will syncing up and backups work then? If there is no 'raft log' anymore since we are overriding the write method to only apply writes to the state machine. How will the other servers be able to sync and catch up?

When reading from the raft log, the server additionally calls the DataApi.read(LogEntryProto) method.  The state machine should return the stateMachineData corresponding to the given logEntryProto as a CompletableFuture<ByteString>.  The server will then add the stateMachineData back to the logEntryProto and send the logEntryProto to the other servers.
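
The write and read paths described above can be modelled with a small, self-contained sketch.  It is a simplified stand-in and not the Ratis implementation: Entry, DataStore, Server and their methods are hypothetical names, byte[] replaces ByteString, and the data is kept in an in-memory map keyed by log index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

final class SplitLogSketch {
  /** Simplified stand-in for a log entry; stateMachineData is null once stripped. */
  static final class Entry {
    final long index;
    final String logData;
    final byte[] stateMachineData;

    Entry(long index, String logData, byte[] stateMachineData) {
      this.index = index;
      this.logData = logData;
      this.stateMachineData = stateMachineData;
    }

    Entry withData(byte[] data) {
      return new Entry(index, logData, data);
    }
  }

  /** Stand-in for the state machine's own storage, keyed by log index. */
  static final class DataStore {
    private final Map<Long, byte[]> byIndex = new HashMap<>();

    CompletableFuture<Void> write(Entry entry) {
      byIndex.put(entry.index, entry.stateMachineData);
      return CompletableFuture.completedFuture(null);
    }

    CompletableFuture<byte[]> read(Entry entry) {
      return CompletableFuture.completedFuture(byIndex.get(entry.index));
    }
  }

  /** Stand-in for the server: its log never holds the bulk data. */
  static final class Server {
    final List<Entry> raftLog = new ArrayList<>();
    final DataStore stateMachine = new DataStore();

    /** Write path: the state machine stores the data, the log gets the stripped entry. */
    void append(Entry entry) {
      stateMachine.write(entry).join();
      raftLog.add(entry.withData(null));
    }

    /** Read path: re-attach the data before sending the entry to another server. */
    Entry entryForFollower(int position) {
      Entry stripped = raftLog.get(position);
      byte[] data = stateMachine.read(stripped).join();
      return stripped.withData(data);
    }
  }

  public static void main(String[] args) {
    Server server = new Server();
    server.append(new Entry(1, "path=/a.txt", "hello".getBytes()));
    System.out.println("data in log: " + (server.raftLog.get(0).stateMachineData != null));
    System.out.println("bytes sent to follower: " + server.entryForFollower(0).stateMachineData.length);
  }
}
```

Only one copy of the bulk data is stored (in the DataStore), yet a follower still receives a complete entry because the data is re-attached on the way out.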

Tsz-Wo


On Thu, Dec 16, 2021 at 3:13 AM Asad Awadia <as...@gmail.com> wrote:
I read through the file store example.

If I understand correctly,  in the write method we should take the request and apply it right there in the method into the state machine? So it only gets persisted into the state machine and not the raft log?


How will syncing up and backups work then? If there is no 'raft log' anymore since we are overriding the write method to only apply writes to the state machine. How will the other servers be able to sync and catch up?

________________________________
From: Tsz Wo Sze <sz...@gmail.com>
Sent: Tuesday, December 14, 2021 8:16:23 PM
To: user@ratis.apache.org <us...@ratis.apache.org>
Subject: Re: Raft log persistance

Hi Asad,

By default, all the write requests (including the data inside) are recorded in the raft log.  This is specified in the standard RAFT algorithm so that any server can replay the log in order to sync up to the current state.  If the state machine keeps another persistent copy of the data, you are right that there will be two copies.  In this sense, the standard RAFT algorithm is not suitable for data intensive applications.

In Ratis, we want to support data intensive applications such as Apache Ozone.  State machine implementations may choose to manage the data themselves.  In that case, the data won't be recorded in the raft log.  In order to use this feature, state machine implementations must override the read and write methods defined in StateMachine.DataApi.  The FileStore example has indeed overridden these two methods.

Hope it helps.
Tsz-Wo



Re: Raft log persistance

Posted by Tsz Wo Sze <sz...@gmail.com>.
The application defines its data format, which is opaque to Ratis.  For
example,

1) the application client may use BlockingApi.send(message) to send a
message as a ByteString.  In this example, the message contains a header
and a body but this format is opaque to Ratis.

2) Ratis packs the message as a RaftClientRequest and then delivers the
request to the leader.

3) The leader invokes StateMachine.startTransaction(request).  The
StateMachine implementation should use the request to build a
TransactionContext.  It may choose to separate the state machine data by
calling setLogData(..) and setStateMachineData(..) in
TransactionContext.Builder.  In this example, the StateMachine
implementation calls setLogData(header) and setStateMachineData(body).
Note that the StateMachine implementation is a part of the application so
that it knows the message format.  See also
FileStoreStateMachine.startTransaction(..).

Tsz-Wo



Re: Raft log persistance

Posted by Asad Awadia <as...@gmail.com>.
Ok I understand now.

How do you specify what goes where? Since the raft message is just a byte string and can't specify what is what.

In the filestoreclient it sends the bytes of a proto and in the statemachine the code just assumes that the logdata and statemachinedata are in the right places?

How is that being done?

Re: Raft log persistance

Posted by Tsz Wo Sze <sz...@gmail.com>.
> ...  what is the difference between StateMachineLogEntryProto#logData
and  StateMachineLogEntryProto#stateMachineEntry#stateMachineData ?

StateMachineLogEntryProto#logData is the data that will be written to
RaftLog.  In contrast,
StateMachineLogEntryProto#stateMachineEntry#stateMachineData
won't be written to the RaftLog.  The state machine is going to store it.

> In the counter example the log data is the 'increment request' and in
filestore the stateMachineData is the stateMachineLogEntry is being used?

In the counter example, the log data contains all the data in the request
and the request is just the "increment request".

In filestore, a client may send a request to write a file containing data D
to a particular path P.  The stateMachineData is the file data D and the
log data is the metadata such as the file path P.

> What should be used where/when?

StateMachineLogEntryProto#logData is the logData specified in the RAFT
algorithm.

StateMachineLogEntryProto#stateMachineEntry#stateMachineData is the data
that the state machine wants to store by itself, and not in the RaftLog,
in order to avoid storing two copies of the data.

Please feel free to let me know if you have more questions.

Tsz-Wo



Re: Raft log persistance

Posted by Asad Awadia <as...@gmail.com>.
I think I am starting to understand.


Reading the raft.proto file - what is the difference between StateMachineLogEntryProto#logData and  StateMachineLogEntryProto#stateMachineEntry#stateMachineData ?

In the counter example the log data is the 'increment request' and in filestore the stateMachineData is the stateMachineLogEntry is being used?

What should be used where/when?

Re: Raft log persistance

Posted by Tsz Wo Sze <sz...@gmail.com>.
The file store example is just a simple example; it is not intended to
support every feature.  Indeed, it does not handle the case where the data
has been deleted: it will fail to read the file.

You are right that a real application supporting multiple versions of
data should retrieve the data from the version corresponding to the given
log index.
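
To make the idea of "the version corresponding to the given log index"
concrete, here is a minimal sketch in plain Java.  It is not code from Ratis
or from the FileStore example; the class VersionedStoreSketch and its
write/read methods are hypothetical names, and the data is a String only to
keep the example short.  It assumes the application keeps every version it
may still be asked for.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical versioned store; not taken from Ratis or its FileStore example. */
class VersionedStoreSketch {
  // path -> (log index of the write -> data written by that log entry)
  private final Map<String, Map<Long, String>> versions = new HashMap<>();

  /** Remember the data that the log entry at logIndex wrote to path. */
  void write(String path, long logIndex, String data) {
    versions.computeIfAbsent(path, p -> new HashMap<>()).put(logIndex, data);
  }

  /** Return the data written by the entry at logIndex, or empty if that version is not kept. */
  Optional<String> read(String path, long logIndex) {
    Map<Long, String> byIndex = versions.get(path);
    return byIndex == null ? Optional.empty() : Optional.ofNullable(byIndex.get(logIndex));
  }

  public static void main(String[] args) {
    VersionedStoreSketch store = new VersionedStoreSketch();
    store.write("/a.txt", 5L, "first contents");
    store.write("/a.txt", 9L, "second contents");
    // Each index maps to the data of its own entry, even though the path is the same.
    System.out.println(store.read("/a.txt", 5L).orElse("<missing>"));
    System.out.println(store.read("/a.txt", 9L).orElse("<missing>"));
  }
}
```

A lookup by path alone would return "second contents" for both indices; that
is the behaviour the question is about.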

Tsz-Wo


On Sat, Dec 18, 2021 at 1:05 PM Asad Awadia <as...@gmail.com> wrote:

> In file store example read method - it looks like it is returning the file
> at the path without taking the raft index in consideration i.e it would
> keep returning the same file for many entry indices provided the file
> hadn't changed?
>
> What is going on there? Shouldn't it return the data snapshot at instant
> entry.getIndex?

Re: Raft log persistance

Posted by Asad Awadia <as...@gmail.com>.
In the file store example's read method, it looks like it returns the file at the path without taking the raft index into consideration, i.e. it would keep returning the same file for many entry indices, provided the file hadn't changed?

What is going on there? Shouldn't it return the data snapshot as of entry.getIndex?

________________________________
From: Tsz Wo Sze <sz...@gmail.com>
Sent: Wednesday, December 15, 2021 7:37:20 PM
To: user@ratis.apache.org <us...@ratis.apache.org>
Subject: Re: Raft log persistance


> If I understand correctly,  in the write method we should take the request and apply it right there in the method into the state machine? So it only gets persisted into the state machine and not the raft log?

The write method should write the stateMachineData (i.e. LogEntryProto.getStateMachineLogEntry().getStateMachineEntry().getStateMachineData()) to its storage.  The stateMachineData will be removed from the logEntryProto.   The logEntryProto with removed stateMachineData will be written to the raft log.  Yes, stateMachineData won't be written to the raft log.

> How will syncing up and backups work then? If there is no 'raft log' anymore since we are overriding the write method to only apply writes to the state machine. How will the other servers be able to sync and catch up?

When reading from the raft log, the server in addition calls the DataApi.read(LogEntryProto) method.  The state machine should return back the stateMachineData corresponding to the given logEntryProto as a CompletableFuture<ByteString>.  Then, the server will add the stateMachineData back to the logEntryProto and then send the logEntryProto to the other servers.

Tsz-Wo



Re: Raft log persistance

Posted by Tsz Wo Sze <sz...@gmail.com>.
> If I understand correctly,  in the write method we should take the
> request and apply it right there in the method into the state machine? So
> it only gets persisted into the state machine and not the raft log?

The write method should write the stateMachineData
(i.e. LogEntryProto.getStateMachineLogEntry().getStateMachineEntry().getStateMachineData())
to its storage.  The stateMachineData is then removed from the
logEntryProto, and the logEntryProto without the stateMachineData is
written to the raft log.  So yes, the stateMachineData won't be written to
the raft log.

> How will syncing up and backups work then? If there is no 'raft log'
> anymore since we are overriding the write method to only apply writes to
> the state machine. How will the other servers be able to sync and catch up?

When reading from the raft log, the server additionally calls the
DataApi.read(LogEntryProto) method.  The state machine should return
the stateMachineData corresponding to the given logEntryProto as a
CompletableFuture<ByteString>.  The server then adds the stateMachineData
back to the logEntryProto and sends the logEntryProto to the other servers.
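
The write and read flow described above can be modelled with a small toy in
plain Java.  This is only a sketch of the idea, not the Ratis API: the class
DataOutsideLogSketch, its nested Entry type, and the append and
loadForFollower methods are all hypothetical, and String stands in for
ByteString.  Only the shape (write stores the data, the log keeps the entry
without it, read returns the data as a CompletableFuture) follows the
description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Toy model of keeping state machine data out of the raft log; all names are hypothetical. */
class DataOutsideLogSketch {
  /** A log entry with a small header and an optional bulk payload. */
  static final class Entry {
    final long index;
    final String header; // always stays in the log
    final String data;   // null once the payload has been stripped

    Entry(long index, String header, String data) {
      this.index = index;
      this.header = header;
      this.data = data;
    }

    Entry withoutData() { return new Entry(index, header, null); }
    Entry withData(String d) { return new Entry(index, header, d); }
  }

  private final Map<Long, String> stateMachineStorage = new HashMap<>();
  private final List<Entry> raftLog = new ArrayList<>();

  /** Plays the role of the state machine's write: persist the payload itself. */
  CompletableFuture<Void> write(Entry entry) {
    stateMachineStorage.put(entry.index, entry.data);
    return CompletableFuture.completedFuture(null);
  }

  /** Plays the role of the state machine's read: return the payload for the entry. */
  CompletableFuture<String> read(Entry entry) {
    return CompletableFuture.completedFuture(stateMachineStorage.get(entry.index));
  }

  /** Server side of an append: the state machine keeps the payload, the log keeps the rest. */
  void append(Entry entry) {
    write(entry).join();
    raftLog.add(entry.withoutData());
  }

  /** Server side of replication: put the payload back before sending to another server. */
  Entry loadForFollower(int position) {
    Entry stripped = raftLog.get(position);
    return stripped.withData(read(stripped).join());
  }

  Entry logEntry(int position) { return raftLog.get(position); }

  public static void main(String[] args) {
    DataOutsideLogSketch server = new DataOutsideLogSketch();
    server.append(new Entry(1L, "WRITE /a.txt", "file bytes"));
    System.out.println("payload in log: " + server.logEntry(0).data);
    System.out.println("payload sent to follower: " + server.loadForFollower(0).data);
  }
}
```

In this model the payload is stored once, in the state machine's own storage,
while the log still carries enough (index and header) to ask for it again.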

Tsz-Wo


On Thu, Dec 16, 2021 at 3:13 AM Asad Awadia <as...@gmail.com> wrote:

> I read through the file store example.
>
> If I understand correctly,  in the write method we should take the request
> and apply it right there in the method into the state machine? So it only
> gets persisted into the state machine and not the raft log?
>
>
> How will syncing up and backups work then? If there is no 'raft log'
> anymore since we are overriding the write method to only apply writes to
> the state machine. How will the other servers be able to sync and catch up?

Re: Raft log persistance

Posted by Asad Awadia <as...@gmail.com>.
I read through the file store example.

If I understand correctly, in the write method we should take the request and apply it to the state machine right there in the method? So it only gets persisted in the state machine and not in the raft log?


How will syncing up and backups work then, if there is no 'raft log' anymore since we are overriding the write method to only apply writes to the state machine? How will the other servers be able to sync and catch up?

________________________________
From: Tsz Wo Sze <sz...@gmail.com>
Sent: Tuesday, December 14, 2021 8:16:23 PM
To: user@ratis.apache.org <us...@ratis.apache.org>
Subject: Re: Raft log persistance

Hi Asad,

By default, all the write requests (including the data inside) are recorded in the raft log.  This is specified in the standard RAFT algorithm so that any server can replay the log in order to sync up to the current state.  If the state machine keeps another persistent copy of the data, you are right that there will be two copies.  In this sense, the standard RAFT algorithm is not suitable for data intensive applications.

In Ratis, we want to support data intensive applications such as Apache Ozone.  State machine implementations may choose to manage the data itself.  In that case, the data won't be recorded in the raft log.  In order to use this feature, state machine implementations must override the read and write methods defined in StateMachine.DataApi.  The FileStore example indeed has overridden these two methods.

Hope it helps.
Tsz-Wo



Re: Raft log persistance

Posted by Tsz Wo Sze <sz...@gmail.com>.
Hi Asad,

By default, all the write requests (including the data inside) are recorded
in the raft log.  This is specified in the standard Raft algorithm so that
any server can replay the log in order to sync up to the current state.  If
the state machine keeps another persistent copy of the data, you are right
that there will be two copies.  In this sense, the standard Raft algorithm is
not suitable for data-intensive applications.
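
As a rough illustration of why the default log contains the data, here is a
minimal sketch in plain Java of rebuilding state by replaying a log.  It does
not use any Ratis class; LogReplaySketch, Put and replay are hypothetical
names, and a key-value map stands in for the application's DB.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy replay of a log of writes; names are hypothetical and unrelated to the Ratis API. */
class LogReplaySketch {
  /** One logged write request, including its data. */
  static final class Put {
    final String key;
    final String value;

    Put(String key, String value) {
      this.key = key;
      this.value = value;
    }
  }

  /** Rebuild the state from scratch by applying every logged write in order. */
  static Map<String, String> replay(List<Put> log) {
    Map<String, String> state = new LinkedHashMap<>();
    for (Put put : log) {
      state.put(put.key, put.value);
    }
    return state;
  }

  public static void main(String[] args) {
    List<Put> log = new ArrayList<>();
    log.add(new Put("k1", "a"));
    log.add(new Put("k2", "b"));
    log.add(new Put("k1", "c"));
    // A server that starts empty reaches the current state from the log alone.
    System.out.println(replay(log)); // prints {k1=c, k2=b}
  }
}
```

The values live in the log and again in the rebuilt state, which is the
second copy mentioned above.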

In Ratis, we want to support data-intensive applications such as Apache
Ozone.  State machine implementations may choose to manage the data
themselves.  In that case, the data won't be recorded in the raft log.  To
use this feature, a state machine implementation must override the read and
write methods defined in StateMachine.DataApi.  The FileStore example does
override these two methods.

Hope it helps.
Tsz-Wo


On Wed, Dec 15, 2021 at 3:40 AM Asad Awadia <as...@gmail.com> wrote:

> Hello,
>
> Is ratis keeping a copy of all the writes made? On top of what is
> persisted in my own DB? So ~2x storage used
>
> I read that the file system example avoids that but I don't see how that
> is being done in the code?
>
> Regards,
> Asad
>