Posted to dev@flink.apache.org by SHI Xiaogang <sh...@gmail.com> on 2017/02/14 03:18:25 UTC

[DISCUSS] Support Incremental Checkpointing in Flink

Hi all,


Incremental checkpointing can help a lot in improving the efficiency of
fault tolerance and recovery in Flink. I wrote an initial design of
incremental checkpointing in Flink, and am looking forward to your
comments.


https://docs.google.com/document/d/1VvvPp09gGdVb9D2wHx6NX99yQK0jSUMWHQPrQ_mn520/edit?usp=sharing


There are, I think, some more issues that need to be discussed when
introducing incremental checkpointing.


One is the implementation of savepoints. Savepoints are supposed to be full
and independent of the backend implementation. Currently, the
implementations of savepoints and checkpoints are identical in the
backends. With the introduction of incremental checkpointing, I think
backends should take different snapshots for them.
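
As a purely illustrative sketch of that distinction (the interface, enum and
method names below are invented for this example and are not the API from
the design document), a backend could be told explicitly which kind of
snapshot to take:

// Hypothetical sketch only; not the interface proposed in the design document.
public interface SnapshotCapableBackend {

    enum SnapshotKind {
        FULL_SELF_CONTAINED,   // savepoints: complete and backend-independent
        INCREMENTAL_ALLOWED    // checkpoints: may reference earlier deltas
    }

    /** Opaque handle to whatever the backend wrote for this snapshot. */
    interface SnapshotHandle {}

    /**
     * Takes a snapshot of the current state. For FULL_SELF_CONTAINED the
     * result must not reference any earlier snapshot; for INCREMENTAL_ALLOWED
     * the backend is free to persist only the changes since the last snapshot.
     */
    SnapshotHandle snapshot(long checkpointId, SnapshotKind kind) throws Exception;
}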


Regards,

Xiaogang

Re: [DISCUSS] Support Incremental Checkpointing in Flink

Posted by "wenlong.lwl" <we...@gmail.com>.
Hi, happy to see such a great proposal. Incremental checkpointing is really
necessary for jobs with large-scale state, as it can save both the space
and the time spent on checkpointing.

For checkpoints, I think it is better to let the state backend decide
whether a checkpoint is incremental or full, because only the state backend
knows whether an incremental checkpoint or a full checkpoint is better.
What we need to do is tell the state backend which checkpoints we want to
retain, just like the methods added to StateHandler in the proposal, and
then the state backend can calculate what needs to be cleaned up. In other
words, the concept of an incremental checkpoint exists in the
implementation of the state backend, not in the Flink framework.
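
As a rough illustration of that division of responsibility (all class, field
and method names below are made up for this sketch, not taken from the
proposal), the backend could track which internal files each retained
checkpoint references and derive on its own what can be deleted:

import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the framework only reports the retained checkpoints,
// and the backend computes which of its internal files are garbage.
public class IncrementalBackendCleanup {

    /** checkpointId -> backend-internal files referenced by that checkpoint. */
    private final Map<Long, Set<String>> filesPerCheckpoint;

    public IncrementalBackendCleanup(Map<Long, Set<String>> filesPerCheckpoint) {
        this.filesPerCheckpoint = filesPerCheckpoint;
    }

    /** Returns the files no longer referenced by any retained checkpoint. */
    public Set<String> filesToDelete(Set<Long> retainedCheckpointIds) {
        Set<String> stillNeeded = new HashSet<>();
        for (Long id : retainedCheckpointIds) {
            Set<String> files = filesPerCheckpoint.get(id);
            if (files != null) {
                stillNeeded.addAll(files);
            }
        }
        Set<String> toDelete = new HashSet<>();
        for (Set<String> files : filesPerCheckpoint.values()) {
            for (String file : files) {
                if (!stillNeeded.contains(file)) {
                    toDelete.add(file);
                }
            }
        }
        return toDelete;
    }
}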

For savepoints, if we can set a backend as the storage engine for
savepoints, then creating a savepoint becomes similar to creating a
checkpoint. When savepoints and checkpoints share the same backend
(currently they are naturally the same), we can take an incremental
snapshot when creating a savepoint too. What we need to do is save the
meta info of the savepoints created and make sure that, when we delete
data for checkpoints, the data shared by savepoints is retained.
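
One simple way to guarantee that (sketched below with made-up names) is to
reference-count every shared state file across both checkpoints and
savepoints, and only delete a file once its count drops to zero:

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of reference counting for state shared between
// checkpoints and savepoints; not an existing Flink class.
public class SharedStateRegistrySketch {

    private final Map<String, Integer> referenceCounts = new HashMap<>();

    /** Called when a checkpoint or savepoint registers a shared file. */
    public void register(String fileName) {
        referenceCounts.merge(fileName, 1, Integer::sum);
    }

    /**
     * Called when a checkpoint or savepoint that referenced the file is
     * discarded. Returns true if the file is now unreferenced and may be
     * physically deleted.
     */
    public boolean unregister(String fileName) {
        Integer count = referenceCounts.get(fileName);
        if (count == null) {
            return false; // unknown file, nothing to delete here
        }
        if (count <= 1) {
            referenceCounts.remove(fileName);
            return true;
        }
        referenceCounts.put(fileName, count - 1);
        return false;
    }
}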

Best regards
Wenlong Lyu

On 20 February 2017 at 10:21, 施晓罡(星罡) <xi...@alibaba-inc.com> wrote:

> Hi Gyula,
>
> It's the backend that decides what data is contained in the snapshot. The
> backend can take a complete snapshot when it finds that an incremental
> snapshot would cost more. In such cases, the taken snapshots are
> self-contained and will not reference any previous deltas.
>
> It's true that the growth in the number of deltas will increase the time
> taken to recover. But the backend can merge the deltas to limit their
> number. The methods to merge deltas may vary a lot between backends. For
> example, in RocksDBStateBackend, the merging of deltas is already done by
> RocksDB via compactions.
>
> Regards,
> Xiaogang
>
> ------------------------------------------------------------------
> From: Gyula Fóra <gy...@apache.org>
> Sent: Saturday, 18 February 2017, 05:37
> To: dev <dev@flink.apache.org>
> Subject: Re: [DISCUSS] Support Incremental Checkpointing in Flink
> Hi!
>
> This is an awesome proposal, I am looking forward to
> seeing it in action :)
>
> Some things I have been wondering:
>
> Which component decides whether the next checkpoint should be a delta or
> not? I guess the more deltas we take, the longer the recovery time will be
> if there are many overwrites in the database; on the other hand, if we
> rarely overwrite values it might make sense to keep a lot of deltas. Maybe
> the state backend should be able to decide this; it might not make sense
> to preconfigure this to a fixed value, but I am not sure. We could for
> instance use bloom filters to make the decision.
>
> If we created deltas with a lot of duplicate keys, then the recovery will
> suffer, potentially outweighing the benefits of the incremental checkpoint
> itself (it might violate some strict SLA on recovery). Would it make sense
> in some cases to merge the deltas in background batch jobs? This would
> probably make bookkeeping much harder, so it is just an idea I wanted to
> throw in there.
>
> Otherwise it seems that a lot of thought went into this, and looks very
> good!
>
> Have a nice weekend!
> Gyula
>
> On Tue, 14 Feb 2017 at 4:18, SHI Xiaogang <sh...@gmail.com> wrote:
>
> > Hi all,
> >
> >
> > Incremental checkpointing can help a lot in improving the efficiency of
> > fault tolerance and recovery in Flink. I wrote an initial design of
> > incremental checkpointing in Flink, and am looking forward to your
> > comments.
> >
> >
> >
> > https://docs.google.com/document/d/1VvvPp09gGdVb9D2wHx6NX99yQK0jSUMWHQPrQ_mn520/edit?usp=sharing
> >
> >
> > There are, I think, some more issues that need to be discussed when
> > introducing incremental checkpointing.
> >
> >
> > One is the implementation of savepoints. Savepoints are supposed to be
> > full and independent of the backend implementation. Currently, the
> > implementations of savepoints and checkpoints are identical in the
> > backends. With the introduction of incremental checkpointing, I think
> > backends should take different snapshots for them.
> >
> >
> > Regards,
> >
> > Xiaogang
> >
>

Re: [DISCUSS] Support Incremental Checkpointing in Flink

Posted by "施晓罡(星罡)" <xi...@alibaba-inc.com>.
Hi Gyula,

It's the backend that decides what data is contained in the snapshot. The backend can take a complete snapshot when it finds that an incremental snapshot would cost more. In such cases, the taken snapshots are self-contained and will not reference any previous deltas.

It's true that the growth in the number of deltas will increase the time taken to recover. But the backend can merge the deltas to limit their number. The methods to merge deltas may vary a lot between backends. For example, in RocksDBStateBackend, the merging of deltas is already done by RocksDB via compactions.
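
As a purely illustrative sketch of such a backend-internal decision (the
class, thresholds and method names below are assumptions made for this
example, not part of the proposal or of any Flink API), the backend could
fall back to a self-contained full snapshot once the accumulated deltas grow
too large or too numerous:

// Hypothetical sketch: per-checkpoint decision between an incremental and a
// full, self-contained snapshot, based on simple thresholds.
public class SnapshotDecision {

    private final int maxDeltasBeforeFullSnapshot;
    private final double maxDeltaToFullSizeRatio;

    public SnapshotDecision(int maxDeltasBeforeFullSnapshot,
                            double maxDeltaToFullSizeRatio) {
        this.maxDeltasBeforeFullSnapshot = maxDeltasBeforeFullSnapshot;
        this.maxDeltaToFullSizeRatio = maxDeltaToFullSizeRatio;
    }

    /**
     * Returns true if the next snapshot should be incremental, false if the
     * backend should take a complete, self-contained snapshot instead.
     */
    public boolean takeIncremental(int deltasSinceLastFull,
                                   long accumulatedDeltaBytes,
                                   long fullStateBytes) {
        if (deltasSinceLastFull >= maxDeltasBeforeFullSnapshot) {
            return false; // too many deltas would slow down recovery
        }
        if (fullStateBytes > 0
                && (double) accumulatedDeltaBytes / fullStateBytes
                        > maxDeltaToFullSizeRatio) {
            return false; // deltas are no longer cheaper than a full snapshot
        }
        return true;
    }
}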
Regards,
Xiaogang

------------------------------------------------------------------
From: Gyula Fóra <gy...@apache.org>
Sent: Saturday, 18 February 2017, 05:37
To: dev <de...@flink.apache.org>
Subject: Re: [DISCUSS] Support Incremental Checkpointing in Flink
Hi!

This is an awesome proposal, I am looking forward to seeing it in action :)

Some things I have been wondering:

Which component decides whether the next checkpoint should be a delta or
not? I guess the more deltas we take, the longer the recovery time will be
if there are many overwrites in the database; on the other hand, if we
rarely overwrite values it might make sense to keep a lot of deltas. Maybe
the state backend should be able to decide this; it might not make sense to
preconfigure this to a fixed value, but I am not sure. We could for instance
use bloom filters to make the decision.

If we created deltas with a lot of duplicate keys, then the recovery will
suffer, potentially outweighing the benefits of the incremental checkpoint
itself (it might violate some strict SLA on recovery). Would it make sense
in some cases to merge the deltas in background batch jobs? This would
probably make bookkeeping much harder, so it is just an idea I wanted to
throw in there.

Otherwise it seems that a lot of thought went into this, and looks very
good!

Have a nice weekend!
Gyula

On Tue, 14 Feb 2017 at 4:18, SHI Xiaogang <sh...@gmail.com> wrote:

> Hi all,
>
>
> Incremental checkpointing can help a lot in improving the efficiency of
> fault tolerance and recovery in Flink. I wrote an initial design of
> incremental checkpointing in Flink, and am looking forward to your
> comments.
>
>
>
> https://docs.google.com/document/d/1VvvPp09gGdVb9D2wHx6NX99yQK0jSUMWHQPrQ_mn520/edit?usp=sharing
>
>
> There are, I think, some more issues that need to be discussed when
> introducing incremental checkpointing.
>
>
> One is the implementation of savepoints. Savepoints are supposed to be
> full and independent of the backend implementation. Currently, the
> implementations of savepoints and checkpoints are identical in the
> backends. With the introduction of incremental checkpointing, I think
> backends should take different snapshots for them.
>
>
> Regards,
>
> Xiaogang
>

Re: [DISCUSS] Support Incremental Checkpointing in Flink

Posted by Gyula Fóra <gy...@apache.org>.
Hi!

This is an awesome proposal, I am looking forward to seeing it in action :)

Some things I have been wondering:

Which component decides whether the next checkpoint should be a delta or
not? I guess the more deltas we take, the longer the recovery time will be
if there are many overwrites in the database; on the other hand, if we
rarely overwrite values it might make sense to keep a lot of deltas. Maybe
the state backend should be able to decide this; it might not make sense to
preconfigure this to a fixed value, but I am not sure. We could for instance
use bloom filters to make the decision.
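
For what the bloom-filter idea might look like in practice, here is a rough,
purely hypothetical sketch (it assumes Guava's BloomFilter as the filter
implementation and invents all class and method names; nothing here is from
the proposal): keep a bloom filter over the keys of the last checkpoint,
estimate what fraction of the writes since then are overwrites, and prefer a
full snapshot when that ratio is high.

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import java.nio.charset.StandardCharsets;

// Hypothetical sketch: estimating the overwrite ratio since the last
// checkpoint with a bloom filter; false positives only skew the estimate.
public class OverwriteRatioEstimator {

    private final BloomFilter<CharSequence> keysInLastCheckpoint;
    private long writes;
    private long probableOverwrites;

    public OverwriteRatioEstimator(long expectedKeys) {
        this.keysInLastCheckpoint = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8), expectedKeys);
    }

    /** Records a key that is part of the last completed checkpoint. */
    public void addCheckpointedKey(String key) {
        keysInLastCheckpoint.put(key);
    }

    /** Records a write that happened since the last checkpoint. */
    public void recordWrite(String key) {
        writes++;
        if (keysInLastCheckpoint.mightContain(key)) {
            probableOverwrites++;
        }
    }

    /** Estimated fraction of writes that overwrote already-checkpointed keys. */
    public double overwriteRatio() {
        return writes == 0 ? 0.0 : (double) probableOverwrites / writes;
    }
}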

If we created deltas with a lot of duplicate keys, then the recovery will
suffer, potentially outweighing the benefits of the incremental checkpoint
itself (it might violate some strict SLA on recovery). Would it make sense
in some cases to merge the deltas in background batch jobs? This would
probably make bookkeeping much harder, so it is just an idea I wanted to
throw in there.
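
A minimal sketch of what such a background merge could do, under the
simplifying (and entirely assumed) representation of a delta as a key/value
map in which a null value marks a deleted key; real backend deltas would of
course look different:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapsing several deltas into one so that recovery
// only applies a single consolidated delta on top of the last full snapshot.
public class DeltaMerger {

    /**
     * Merges deltas given oldest first; later deltas override earlier entries
     * for the same key. Null values are tombstones and are kept so that
     * recovery also removes those keys from the base snapshot.
     */
    public static Map<String, byte[]> merge(List<Map<String, byte[]>> deltasOldestFirst) {
        Map<String, byte[]> merged = new HashMap<>();
        for (Map<String, byte[]> delta : deltasOldestFirst) {
            merged.putAll(delta);
        }
        return merged;
    }
}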

Otherwise it seems that a lot of thought went into this, and looks very
good!

Have a nice weekend!
Gyula

On Tue, 14 Feb 2017 at 4:18, SHI Xiaogang <sh...@gmail.com> wrote:

> Hi all,
>
>
> Incremental checkpointing can help a lot in improving the efficiency of
> fault tolerance and recovery in Flink. I wrote an initial design of
> incremental checkpointing in Flink, and am looking forward to your
> comments.
>
>
>
> https://docs.google.com/document/d/1VvvPp09gGdVb9D2wHx6NX99yQK0jSUMWHQPrQ_mn520/edit?usp=sharing
>
>
> There are, I think, some more issues that need to be discussed when
> introducing incremental checkpointing.
>
>
> One is the implementation of savepoints. Savepoints are supposed to be
> full and independent of the backend implementation. Currently, the
> implementations of savepoints and checkpoints are identical in the
> backends. With the introduction of incremental checkpointing, I think
> backends should take different snapshots for them.
>
>
> Regards,
>
> Xiaogang
>