Posted to user@flink.apache.org by Ken Krugler <kk...@transpac.com> on 2020/01/30 00:58:06 UTC

Using retained checkpoints as savepoints

Hi all,

Currently https://ci.apache.org/projects/flink/flink-docs-master/ops/state/checkpoints.html#difference-to-savepoints says checkpoints…

"do not support Flink specific features like rescaling"

But I believe they do, and really must if you can use them like a savepoint. Should that sentence be changed, or removed?

Also this page doesn’t talk about state migration, which is another aspect of restarting a (modified) workflow from a retained checkpoint…will that work?

This sentence about checkpoints on https://ci.apache.org/projects/flink/flink-docs-master/ops/state/savepoints.html#what-is-a-savepoint-how-is-a-savepoint-different-from-a-checkpoint implies not:

"Optimizations towards those goals can exploit certain properties, e.g. that the job code doesn’t change between the execution attempts"

Thanks,

— Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr

Re: Using retained checkpoints as savepoints

Posted by Stephan Ewen <se...@apache.org>.

Maybe one small addition:
  - For the heap state backend, there is no difference at all between the
format and behavior of retained checkpoints (after the job is canceled) and
savepoints. Same format and features.
  - For RocksDB incremental checkpoints, we do in fact support re-scaling,
and I think we should commit to always doing that in the future. But we
wanted to keep open the option of not supporting state migration, for the
reasons mentioned by Aljoscha.

Supporting re-scaling on checkpoints is important for the upcoming work on
(reactive) auto-scaling, so we need to commit to supporting it. That also
means we can update the docs to say so.

Best,
Stephan
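
[Illustration, not part of the original message] The re-scaling discussed
above relies on keyed state being stored per key group, so that a retained
checkpoint can be split across a different number of subtasks on restore.
The sketch below is a simplified model of that idea, not Flink code. The
operator-index formula follows what Flink's key-group assignment does to the
best of my knowledge; the CRC32 hash is a stand-in for Flink's own hash, and
MAX_PARALLELISM, key_group_for, operator_index_for and redistribute are
hypothetical names chosen for this example.

```python
import zlib

# Assumed value for illustration; in a real job the max parallelism is fixed
# when state is first written and cannot change across restores.
MAX_PARALLELISM = 128


def key_group_for(key: str, max_parallelism: int = MAX_PARALLELISM) -> int:
    """Map a key to a key group. CRC32 is a stand-in hash for this sketch."""
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def operator_index_for(key_group: int, parallelism: int,
                       max_parallelism: int = MAX_PARALLELISM) -> int:
    """Map a key group to the subtask that owns it at a given parallelism."""
    return key_group * parallelism // max_parallelism


def redistribute(state_by_key_group: dict, new_parallelism: int,
                 max_parallelism: int = MAX_PARALLELISM) -> dict:
    """Split a snapshot (key group -> state) across new_parallelism subtasks."""
    assignment = {index: {} for index in range(new_parallelism)}
    for key_group, state in state_by_key_group.items():
        owner = operator_index_for(key_group, new_parallelism, max_parallelism)
        assignment[owner][key_group] = state
    return assignment


if __name__ == "__main__":
    # Build a toy snapshot: per-key counters grouped by key group.
    snapshot = {}
    for key in ["user-1", "user-2", "user-3", "user-4"]:
        snapshot.setdefault(key_group_for(key), {})[key] = 1

    # "Restore" the same snapshot at two different parallelisms.
    for parallelism in (2, 3):
        print(parallelism, redistribute(snapshot, parallelism))
```

The point of the model: a key's key group never changes, only the mapping
from key groups to subtasks does, which is why a snapshot organized this way
can be restored at a different parallelism without rewriting the state.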



On Tue, Feb 18, 2020 at 1:06 PM Aljoscha Krettek <al...@apache.org>
wrote:

> Hi,
>
> the reason why we are quite conservative when it comes to stating
> properties of checkpoints is that we don't want to prevent ourselves
> from implementing possibly optimized checkpoint formats that would not
> support these features.
>
> You're right that currently checkpoints support most of the features of
> savepoints because they did not diverge far in their formats (or not at
> all).
>
> AFAIK, this is not written down anywhere so it would be good to discuss
> if we want to give those guarantees (which ties our hands a bit more) or
> keep it as is but properly document it.
>
> Best,
> Aljoscha

Re: Using retained checkpoints as savepoints

Posted by Aljoscha Krettek <al...@apache.org>.

Hi,

the reason why we are quite conservative when it comes to stating 
properties of checkpoints is that we don't want to prevent ourselves 
from implementing possibly optimized checkpoint formats that would not 
support these features.

You're right that currently checkpoints support most of the features of 
savepoints because they did not diverge far in their formats (or not at 
all).

AFAIK, this is not written down anywhere so it would be good to discuss 
if we want to give those guarantees (which ties our hands a bit more) or 
keep it as is but properly document it.

Best,
Aljoscha
