Posted to dev@flink.apache.org by "Tzu-Li (Gordon) Tai" <tz...@apache.org> on 2019/05/28 21:02:20 UTC

[DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

Hi Flink devs,

Congxian, Kostas, and I have recently been discussing unifying the binary
formats for keyed state in savepoints, which would allow for more
operational flexibility, such as swapping state backends across restores.

As part of this FLIP, another main proposal is to start allowing
checkpoints and savepoints to have different formats. In the future,
savepoint formats should be designed with interoperability in mind, with
reasonable snapshot / restore overhead being tolerable, while checkpoints
are allowed to be backend-specific for more efficient snapshots and
restores. Given recent proposals for the state backends, such as the
disk-spilling heap backend [1], this flexibility seems reasonable.

The main user-facing change is, of course, the binary format of savepoints.
In addition, we will no longer guarantee functional parity between
savepoints and full checkpoints in the future (w.r.t. operational features
related to upgrading applications; so far they have equal functionality).

Therefore, we would like to collect feedback on the proposal before
continuing efforts.

This is the FLIP:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints

I'm happy to discuss details and look forward to any feedback.

Cheers,
Gordon

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-to-support-disk-spilling-in-HeapKeyedStateBackend-td29109.html
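
To make the backend-swapping idea in the post above concrete, here is a
minimal sketch in Python. It is not Flink code and not the format proposed
in the FLIP: the two backend classes, the JSON-lines encoding, and the
function names (write_unified, read_unified, savepoint, restore) are all
hypothetical, chosen only for illustration. It shows two stores with
different internal layouts that both read and write one shared snapshot
format, so state written by one can be restored into the other.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple

# One logical entry of keyed state: (state name, key, value).
Entry = Tuple[str, str, int]


def write_unified(entries: Iterable[Entry]) -> bytes:
    """Serialize entries into a backend-independent format (toy: JSON lines)."""
    lines = [json.dumps(list(entry)) for entry in sorted(entries)]
    return "\n".join(lines).encode("utf-8")


def read_unified(blob: bytes) -> Iterator[Entry]:
    """Parse the toy unified format back into entries."""
    for line in blob.decode("utf-8").splitlines():
        state_name, key, value = json.loads(line)
        yield state_name, key, value


class NestedMapBackend:
    """Hypothetical backend A: state name -> key -> value."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, int]] = {}

    def put(self, state_name: str, key: str, value: int) -> None:
        self.tables.setdefault(state_name, {})[key] = value

    def get(self, state_name: str, key: str) -> int:
        return self.tables[state_name][key]

    def entries(self) -> Iterator[Entry]:
        for state_name, table in self.tables.items():
            for key, value in table.items():
                yield state_name, key, value


class FlatKeyBackend:
    """Hypothetical backend B: one flat map keyed by (state name, key)."""

    def __init__(self) -> None:
        self.store: Dict[Tuple[str, str], int] = {}

    def put(self, state_name: str, key: str, value: int) -> None:
        self.store[(state_name, key)] = value

    def get(self, state_name: str, key: str) -> int:
        return self.store[(state_name, key)]

    def entries(self) -> Iterator[Entry]:
        for (state_name, key), value in self.store.items():
            yield state_name, key, value


def savepoint(backend) -> bytes:
    """Snapshot any backend into the shared format."""
    return write_unified(backend.entries())


def restore(blob: bytes, backend) -> None:
    """Load a shared-format snapshot into any backend."""
    for state_name, key, value in read_unified(blob):
        backend.put(state_name, key, value)


# Take a snapshot with backend A, then restore it into backend B.
source = NestedMapBackend()
source.put("counts", "alice", 3)
source.put("counts", "bob", 5)
blob = savepoint(source)

target = FlatKeyBackend()
restore(blob, target)
```

Because both classes only meet at the shared byte format, neither needs to
know the other's internal layout. A backend-specific checkpoint format
could skip this conversion step, which is the trade-off between
interoperability and snapshot / restore overhead described above.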

RE: [DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

Posted by "Visser, M.J.H. (Martijn)" <ma...@ing.com.INVALID>.

On a related subject, it would be interesting to have the capability to encrypt savepoints. That would allow processing and storing of sensitive data in Flink. 

Re: [DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.

Thanks for the inputs Yu and Aljoscha!

I agree to rename this FLIP. Will call it "Unified binary format for Keyed
State".

I will proceed to open a VOTE thread to formally adopt the FLIP now.

Re: [DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

Posted by Aljoscha Krettek <al...@apache.org>.

Please also see my comment on https://issues.apache.org/jira/browse/FLINK-12619?focusedCommentId=16864098&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16864098

For this FLIP-41 it means we go forward with the design basically as is but should call it “Unified Format” or something like it.

If no-one else comments, we should proceed to a [VOTE] thread to formally adopt the FLIP.

Aljoscha

> On 14. Jun 2019, at 15:40, Yu Li <li...@apache.org> wrote:
> 
> Hi Aljoscha and all,
> 
> My 2 cents here:
> 
> 1. Conceptually, it is worth a second thought whether to introduce an
> optimized snapshot format for now (i.e. using the checkpoint format in
> savepoints), just as it is not recommended to use snapshots for backup in
> databases (although practically it could be implemented).
> 
> 2. The stop-with-checkpoint mechanism is like stopping a database instance
> with a data flush, and thus (IMHO) a different story from the
> checkpoint/savepoint (db snapshot/backup) distinction.
> 
> 3. In the long run we may improve checkpoints to allow a short enough
> interval that they become a form of transactional log; then we could
> enable checkpoint-based savepoints (like transactional-log-based backup).
> So I agree to still call the new format in FLIP-41 a "Unified Format",
> although in the short term it only unifies savepoints.
> 
> I have also written a document [1] with more details; please refer to it
> if interested. Thanks!
> 
> [1] https://docs.google.com/document/d/1uE4R3wNal6e67FkDe0UvcnsIMMDpr35j
> 
> Best Regards,
> Yu
> 
> 
> On Thu, 6 Jun 2019 at 19:42, Aljoscha Krettek <al...@apache.org> wrote:
> 
>> Btw, I think this FLIP is a very good effort, we just need to reframe the
>> effort a tiny bit. +1
>> 
>>> On 6. Jun 2019, at 13:41, Aljoscha Krettek <al...@apache.org> wrote:
>>> 
>>> Hi,
>>> 
>>> I had a brief discussion with Stephan that helped me sort my thoughts on
>> the broader topics of checkpoints, savepoints, binary formats,
>> user-triggered checkpoints, and periodic savepoints. I’ll try to summarise
>> my stance on this and also comment with the same message on the other
>> relevant Jira Issues and threads.
>>> 
>>> For reference, the relevant FLIP and Jira issues are these:
>>> 
>>> - https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints: Unified Savepoint Format
>>> - https://issues.apache.org/jira/browse/FLINK-12619: Add support for stop-with-checkpoint
>>> - https://issues.apache.org/jira/browse/FLINK-6755: User-triggered checkpoints
>>> - https://issues.apache.org/jira/browse/FLINK-4620: Automatically creating savepoints
>>> - https://issues.apache.org/jira/browse/FLINK-4511: Schedule periodic savepoints
>>> 
>>> There are roughly two different dimensions in the topic of
>> savepoints/checkpoints (I’ll use snapshot as the generic term for both):
>>> 1) who controls the snapshot
>>> 2) what’s the (binary) format of the snapshot
>>> 
>>> For 1), we currently have checkpoints and savepoints. Checkpoints are
>>> created by the system for fault tolerance. They are managed by the system,
>>> and the system is free to discard them when it sees fit. Savepoints are
>>> under the control of the user: a user can choose to create a savepoint,
>>> delete it, and restore from it at will. The system will not clean up
>>> savepoints. We should try to keep this separation and not muddle the two
>>> concepts.
>>> 
>>> For 2), we currently have various different formats between the
>>> different state backends and also for the same backend. For example,
>>> RocksDB can do full or incremental snapshots, local snapshots, and
>>> probably more.
>>> 
>>> FLIP-41 aims at introducing a unified “savepoint” format that is
>> interchangeable between the different state backends. In light of the above
>> points, we should say that FLIP-41 aims to introduce a canonical format
>> that is interchangeable between different backends. This doesn’t mean that
>> we should tie this format strictly to savepoints, though. For performance
>> reasons, users might choose to do savepoints that use one of the optimised
>> formats that the backends offer, for example incremental snapshots. Or they
>> might choose to use the canonical format for regular checkpoints so that
>> they can always switch between backends using periodically created
>> externalised checkpoints.
>>> 
>>> The motivation behind FLINK-12619 is to have a more lightweight
>>> alternative for stop-with-savepoint, for example using the incremental
>>> snapshot format that RocksDB has. With the above in mind, however, this
>>> becomes “Add support for choosing the snapshot format for
>>> stop-with-savepoint”. It should not be stop-with-checkpoint, because
>>> checkpoints are something that the system manages and not something that
>>> the user should trigger. The same is true for FLINK-6755; I think the
>>> motivation is the same. The change should, however, be called “Add support
>>> for choosing the snapshot format for savepoints”.
>>> 
>>> For the last two Jira issues mentioned above it should be quite clear
>>> what I think. I do, however, see a need for potentially different
>>> overlapping checkpoint periods or intervals. Users might want to have their
>>> regular checkpoints use an optimised format, but they also want to have a
>>> “canonical format” checkpoint every now and then so that the lineage of
>>> incremental checkpoints does not become too unwieldy.
>>> 
>>> Please let me know what you think!
>>> 
>>> Aljoscha
>>> 
>>>> On 5. Jun 2019, at 10:36, Tzu-Li (Gordon) Tai <tz...@apache.org>
>> wrote:
>>>> 
>>>> I want to quickly bump this discussion to gather more consensus from
>>>> others on the FLIP, and see if we want to aim this for the upcoming
>>>> 1.9.0 release. The proposal touches binary formats of savepoints, which
>>>> is a major part of Flink's public user interface, so having explicit
>>>> approval from other members of the community would be nice here.
>>>> 
>>>> Cheers,
>>>> Gordon
>>>> 
>>>> On Wed, May 29, 2019 at 11:45 AM Tzu-Li (Gordon) Tai <
>> tzulitai@apache.org>
>>>> wrote:
>>>> 
>>>>> I also should point out something that I forgot to mention in the
>>>>> initial post:
>>>>> Stefan has helped a lot in understanding the current status of state
>>>>> backends and also participated a lot in design choices for the FLIP :)
>>>>> 

Re: [DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

Posted by Yu Li <li...@apache.org>.

Hi Aljoscha and all,

My 2 cents here:

1. Conceptually, it is worth a second thought whether to introduce an
optimized snapshot format for now (i.e. using the checkpoint format in
savepoints), just as it is not recommended to use snapshots for backup in
databases (although practically it could be implemented).

2. The stop-with-checkpoint mechanism is like stopping a database instance
with a data flush, and thus (IMHO) a different story from the
checkpoint/savepoint (db snapshot/backup) distinction.

3. In the long run we may improve checkpoints to allow a short enough
interval that they become a form of transactional log; then we could
enable checkpoint-based savepoints (like transactional-log-based backup).
So I agree to still call the new format in FLIP-41 a "Unified Format",
although in the short term it only unifies savepoints.

I have also written a document [1] with more details; please refer to it
if interested. Thanks!

[1] https://docs.google.com/document/d/1uE4R3wNal6e67FkDe0UvcnsIMMDpr35j

Best Regards,
Yu


On Thu, 6 Jun 2019 at 19:42, Aljoscha Krettek <al...@apache.org> wrote:

> Btw, I think this FLIP is a very good effort, we just need to reframe the
> effort a tiny bit. +1
>
> > On 6. Jun 2019, at 13:41, Aljoscha Krettek <al...@apache.org> wrote:
> >
> > Hi,
> >
> > I had a brief discussion with Stephan that helped me sort my thoughts on
> the broader topics of checkpoints, savepoints, binary formats,
> user-triggered checkpoints, and periodic savepoints. I’ll try to summarise
> my stance on this and also comment with the same message on the other
> relevant Jira Issues and threads.
> >
> > For reference, the relevant FLIP and Jira issues are these:
> >
> > - https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints: Unified Savepoint Format
> > - https://issues.apache.org/jira/browse/FLINK-12619: Add support for
> stop-with-checkpoint
> > - https://issues.apache.org/jira/browse/FLINK-6755: User-triggered
> checkpoints
> > - https://issues.apache.org/jira/browse/FLINK-4620: Automatically
> creating savepoints
> > - https://issues.apache.org/jira/browse/FLINK-4511: Schedule periodic
> savepoints
> >
> > There are roughly two different dimensions in the topic of
> savepoints/checkpoints (I’ll use snapshot as the generic term for both):
> > 1) who controls the snapshot
> > 2) what’s the (binary) format of the snapshot
> >
> > For 1), we currently have checkpoints and savepoints. Checkpoints are
> created by the system for fault tolerance. They are managed by the system
> and the system is free to discard them when it sees fit. Savepoints are in
> the control of the user. A user can choose to create savepoints, delete
> them, and restore from them at will. The system will not clean
> up savepoints. We should try and keep this separation and not muddle the
> two concepts.
> >
> > For 2), we currently have various different formats between the
> different state backends and also for the same backend. E.g. RocksDB can do
> full or incremental snapshots, local snapshots, and probably more.
> >
> > FLIP-41 aims at introducing a unified “savepoint” format that is
> interchangeable between the different state backends. In light of the above
> points, we should say that FLIP-41 aims to introduce a canonical format
> that is interchangeable between different backends. This doesn’t mean that
> we should tie this format strictly to savepoints, though. For performance
> reasons, users might choose to do savepoints that use one of the optimised
> formats that the backends offer, for example incremental snapshots. Or they
> might choose to use the canonical format for regular checkpoints so that
> they can always switch between backends using periodically created
> externalised checkpoints.
> >
> > The motivation behind FLINK-12619 is to have a more lightweight
> alternative for stop-with-savepoint, for example using the incremental
> snapshot format that RocksDB has. With the above in mind, however, this
> becomes “Add support for choosing the snapshot format for
> stop-with-savepoint”. It should not be stop-with-checkpoint, because
> checkpoints are something that the system manages and not something that
> the user should trigger. The same is true for FLINK-6755, the motivation is
> the same I think. The change should be called “Add support for choosing the
> snapshot format for savepoints”, however.
> >
> > For the last two Jira issues mentioned above it should be quite clear
> what I think. I do, however, see a need for potentially different
> overlapping checkpoint periods or intervals. Users might want to have their
> regular checkpoints use an optimised format but they also want to have a
> “canonical format” checkpoint every now and then so that the lineage of
> incremental checkpoints does not become too unwieldy.
> >
> > Please let me know what you think!
> >
> > Aljoscha
> >
> >> On 5. Jun 2019, at 10:36, Tzu-Li (Gordon) Tai <tz...@apache.org>
> wrote:
> >>
> >> I want to quickly bump this discussion to gather more consensus from
> others
> >> on the FLIP, and see if we want to aim this for the upcoming 1.9.0
> release.
> >> The proposal touches binary formats of savepoints, which is a major
> part of
> >> Flink's public user interface, so having explicit approval from other
> >> members of the community would be nice here.
> >>
> >> Cheers,
> >> Gordon
> >>
> >> On Wed, May 29, 2019 at 11:45 AM Tzu-Li (Gordon) Tai <
> tzulitai@apache.org>
> >> wrote:
> >>
> >>> I also should point out something that I forgot to mention in the
> initial
> >>> post:
> >>> Stefan has helped a lot in understanding the current status of state
> >>> backends and also participated a lot in design choices for the FLIP :)
> >>>
> >>> On Wed, May 29, 2019 at 5:02 AM Tzu-Li (Gordon) Tai <
> tzulitai@apache.org>
> >>> wrote:
> >>>
> >>>> Hi Flink devs,
> >>>>
> >>>> Congxian, Kostas, and I have recently been discussing to unify the
> binary
> >>>> formats for keyed state in savepoints, which would allow for more
> >>>> operational flexibility such as swapping state backends across
> restores.
> >>>>
> >>>> As part of this FLIP, another main proposal is to start allowing
> >>>> checkpoints and savepoints to have different formats. Savepoint
> formats
> >>>> should in the future be designed with interoperability in mind and
> >>>> reasonable snapshot / restore overhead is tolerable, while
> checkpoints are
> >>>> allowed to be backend specific for more efficient snapshots and
> restores.
> >>>> From recent proposals in the state backends such as disk-spilling heap
> >>>> backend [1], this flexibility seems to be reasonable.
> >>>>
> >>>> The main user-facing API this would affect is of course, the binary
> >>>> formats of savepoints, as well as the fact that we will no longer be
> >>>> guaranteeing functional parity between savepoints and full
> checkpoints in
> >>>> the future (w.r.t. operational features related to upgrading
> applications;
> >>>> so far they have equal functionality).
> >>>>
> >>>> Therefore, we would like to collect feedback on the proposal before
> >>>> continuing efforts.
> >>>>
> >>>> This is the FLIP:
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints
> >>>> .
> >>>>
> >>>> I'm happy to discuss details and looking forward to any feedback.
> >>>>
> >>>> Cheers,
> >>>> Gordon
> >>>>
> >>>> [1]
> >>>>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-to-support-disk-spilling-in-HeapKeyedStateBackend-td29109.html
> >>>>
> >>>
> >
>
>

Re: [DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

Posted by Aljoscha Krettek <al...@apache.org>.

Btw, I think this FLIP is a very good effort, we just need to reframe the effort a tiny bit. +1

> On 6. Jun 2019, at 13:41, Aljoscha Krettek <al...@apache.org> wrote:
> 
> Hi,
> 
> I had a brief discussion with Stephan that helped me sort my thoughts on the broader topics of checkpoints, savepoints, binary formats, user-triggered checkpoints, and periodic savepoints. I’ll try to summarise my stance on this and also comment with the same message on the other relevant Jira Issues and threads.
> 
> For reference, the relevant FLIP and Jira issues are these:
> 
> - https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints: Unified Savepoint Format
> - https://issues.apache.org/jira/browse/FLINK-12619: Add support for stop-with-checkpoint
> - https://issues.apache.org/jira/browse/FLINK-6755: User-triggered checkpoints
> - https://issues.apache.org/jira/browse/FLINK-4620: Automatically creating savepoints
> - https://issues.apache.org/jira/browse/FLINK-4511: Schedule periodic savepoints
> 
> There are roughly two different dimensions in the topic of savepoints/checkpoints (I’ll use snapshot as the generic term for both):
> 1) who controls the snapshot
> 2) what’s the (binary) format of the snapshot
> 
> For 1), we currently have checkpoints and savepoints. Checkpoints are created by the system for fault tolerance. They are managed by the system and the system is free to discard them when it sees fit. Savepoints are in the control of the user. A user can choose to create savepoints, delete them, and restore from them at will. The system will not clean up savepoints. We should try and keep this separation and not muddle the two concepts.
> 
> For 2), we currently have various different formats between the different state backends and also for the same backend. E.g. RocksDB can do full or incremental snapshots, local snapshots, and probably more.
> 
> FLIP-41 aims at introducing a unified “savepoint” format that is interchangeable between the different state backends. In light of the above points, we should say that FLIP-41 aims to introduce a canonical format that is interchangeable between different backends. This doesn’t mean that we should tie this format strictly to savepoints, though. For performance reasons, users might choose to do savepoints that use one of the optimised formats that the backends offer, for example incremental snapshots. Or they might choose to use the canonical format for regular checkpoints so that they can always switch between backends using periodically created externalised checkpoints.
> 
> The motivation behind FLINK-12619 is to have a more lightweight alternative for stop-with-savepoint, for example using the incremental snapshot format that RocksDB has. With the above in mind, however, this becomes “Add support for choosing the snapshot format for stop-with-savepoint”. It should not be stop-with-checkpoint, because checkpoints are something that the system manages and not something that the user should trigger. The same is true for FLINK-6755, the motivation is the same I think. The change should be called “Add support for choosing the snapshot format for savepoints”, however.
> 
> For the last two Jira issues mentioned above it should be quite clear what I think. I do, however, see a need for potentially different overlapping checkpoint periods or intervals. Users might want to have their regular checkpoints use an optimised format but they also want to have a “canonical format” checkpoint every now and then so that the lineage of incremental checkpoints does not become too unwieldy.
> 
> Please let me know what you think!
> 
> Aljoscha
> 
>> On 5. Jun 2019, at 10:36, Tzu-Li (Gordon) Tai <tz...@apache.org> wrote:
>> 
>> I want to quickly bump this discussion to gather more consensus from others
>> on the FLIP, and see if we want to aim this for the upcoming 1.9.0 release.
>> The proposal touches binary formats of savepoints, which is a major part of
>> Flink's public user interface, so having explicit approval from other
>> members of the community would be nice here.
>> 
>> Cheers,
>> Gordon
>> 
>> On Wed, May 29, 2019 at 11:45 AM Tzu-Li (Gordon) Tai <tz...@apache.org>
>> wrote:
>> 
>>> I also should point out something that I forgot to mention in the initial
>>> post:
>>> Stefan has helped a lot in understanding the current status of state
>>> backends and also participated a lot in design choices for the FLIP :)
>>> 
>>> On Wed, May 29, 2019 at 5:02 AM Tzu-Li (Gordon) Tai <tz...@apache.org>
>>> wrote:
>>> 
>>>> Hi Flink devs,
>>>> 
>>>> Congxian, Kostas, and I have recently been discussing to unify the binary
>>>> formats for keyed state in savepoints, which would allow for more
>>>> operational flexibility such as swapping state backends across restores.
>>>> 
>>>> As part of this FLIP, another main proposal is to start allowing
>>>> checkpoints and savepoints to have different formats. Savepoint formats
>>>> should in the future be designed with interoperability in mind and
>>>> reasonable snapshot / restore overhead is tolerable, while checkpoints are
>>>> allowed to be backend specific for more efficient snapshots and restores.
>>>> From recent proposals in the state backends such as disk-spilling heap
>>>> backend [1], this flexibility seems to be reasonable.
>>>> 
>>>> The main user-facing API this would affect is of course, the binary
>>>> formats of savepoints, as well as the fact that we will no longer be
>>>> guaranteeing functional parity between savepoints and full checkpoints in
>>>> the future (w.r.t. operational features related to upgrading applications;
>>>> so far they have equal functionality).
>>>> 
>>>> Therefore, we would like to collect feedback on the proposal before
>>>> continuing efforts.
>>>> 
>>>> This is the FLIP:
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints
>>>> .
>>>> 
>>>> I'm happy to discuss details and looking forward to any feedback.
>>>> 
>>>> Cheers,
>>>> Gordon
>>>> 
>>>> [1]
>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-to-support-disk-spilling-in-HeapKeyedStateBackend-td29109.html
>>>> 
>>> 
>

Re: [DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

Posted by Aljoscha Krettek <al...@apache.org>.

Hi,

I had a brief discussion with Stephan that helped me sort my thoughts on the broader topics of checkpoints, savepoints, binary formats, user-triggered checkpoints, and periodic savepoints. I’ll try to summarise my stance on this and also comment with the same message on the other relevant Jira Issues and threads.

For reference, the relevant FLIP and Jira issues are these:

- https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints: Unified Savepoint Format
- https://issues.apache.org/jira/browse/FLINK-12619: Add support for stop-with-checkpoint
- https://issues.apache.org/jira/browse/FLINK-6755: User-triggered checkpoints
- https://issues.apache.org/jira/browse/FLINK-4620: Automatically creating savepoints
- https://issues.apache.org/jira/browse/FLINK-4511: Schedule periodic savepoints

There are roughly two different dimensions in the topic of savepoints/checkpoints (I’ll use snapshot as the generic term for both):
 1) who controls the snapshot
 2) what’s the (binary) format of the snapshot

For 1), we currently have checkpoints and savepoints. Checkpoints are created by the system for fault tolerance. They are managed by the system and the system is free to discard them when it sees fit. Savepoints are in the control of the user. A user can choose to create savepoints, delete them, and restore from them at will. The system will not clean up savepoints. We should try and keep this separation and not muddle the two concepts.

For 2), we currently have various different formats between the different state backends and also for the same backend. E.g. RocksDB can do full or incremental snapshots, local snapshots, and probably more.

FLIP-41 aims at introducing a unified “savepoint” format that is interchangeable between the different state backends. In light of the above points, we should say that FLIP-41 aims to introduce a canonical format that is interchangeable between different backends. This doesn’t mean that we should tie this format strictly to savepoints, though. For performance reasons, users might choose to do savepoints that use one of the optimised formats that the backends offer, for example incremental snapshots. Or they might choose to use the canonical format for regular checkpoints so that they can always switch between backends using periodically created externalised checkpoints.
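
To make the two dimensions concrete, here is a minimal Python sketch. It is not Flink code: the names `Owner`, `SnapshotFormat`, and `Snapshot` are hypothetical and exist only for illustration. The point is that who controls a snapshot and which format it uses can be chosen independently.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    """Dimension 1: who controls the snapshot (hypothetical names)."""
    SYSTEM = "checkpoint"  # created and cleaned up by the system
    USER = "savepoint"     # created, restored, and deleted by the user


class SnapshotFormat(Enum):
    """Dimension 2: the binary format of the snapshot (hypothetical names)."""
    CANONICAL = "canonical"            # backend-independent, as FLIP-41 proposes
    BACKEND_NATIVE = "backend-native"  # e.g. an incremental RocksDB snapshot


@dataclass(frozen=True)
class Snapshot:
    owner: Owner
    fmt: SnapshotFormat

    def system_may_discard(self) -> bool:
        # Only system-owned snapshots (checkpoints) are cleaned up automatically.
        return self.owner is Owner.SYSTEM

    def restorable_on_other_backend(self) -> bool:
        # Only the canonical format is interchangeable between backends.
        return self.fmt is SnapshotFormat.CANONICAL


# A savepoint written in an optimised, backend-specific format:
fast_savepoint = Snapshot(Owner.USER, SnapshotFormat.BACKEND_NATIVE)
# A periodic checkpoint written in the canonical format:
portable_checkpoint = Snapshot(Owner.SYSTEM, SnapshotFormat.CANONICAL)
```

In this model all four owner/format combinations are valid, which is the reason for not tying the canonical format strictly to savepoints.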

The motivation behind FLINK-12619 is to have a more lightweight alternative for stop-with-savepoint, for example using the incremental snapshot format that RocksDB has. With the above in mind, however, this becomes “Add support for choosing the snapshot format for stop-with-savepoint”. It should not be stop-with-checkpoint, because checkpoints are something that the system manages and not something that the user should trigger. The same is true for FLINK-6755, the motivation is the same I think. The change should be called “Add support for choosing the snapshot format for savepoints”, however.

For the last two Jira issues mentioned above it should be quite clear what I think. I do, however, see a need for potentially different overlapping checkpoint periods or intervals. Users might want to have their regular checkpoints use an optimised format but they also want to have a “canonical format” checkpoint every now and then so that the lineage of incremental checkpoints does not become too unwieldy.
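
As a rough illustration of such overlapping intervals, the following Python sketch picks a format per checkpoint. Everything here is an assumption for the sake of the example: the function names, the `canonical_every` parameter, and the simplified model in which a canonical snapshot exists at id 0 and every incremental checkpoint depends on all its predecessors back to the last canonical one. Flink does not expose such a policy today.

```python
def snapshot_format(checkpoint_id: int, canonical_every: int) -> str:
    """Pick the format for a periodic checkpoint (hypothetical policy)."""
    if canonical_every <= 0:
        raise ValueError("canonical_every must be positive")
    # Every canonical_every-th checkpoint is a full, canonical-format snapshot.
    return "canonical" if checkpoint_id % canonical_every == 0 else "incremental"


def lineage_length(checkpoint_id: int, canonical_every: int) -> int:
    """Snapshots needed to restore in this simplified model: the checkpoint
    itself plus its predecessors back to the last canonical checkpoint."""
    return checkpoint_id % canonical_every + 1


formats = [snapshot_format(i, canonical_every=4) for i in range(1, 9)]
```

With `canonical_every=4`, checkpoints 4 and 8 are canonical and the lineage in this model never grows beyond four snapshots, whatever the checkpoint id.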

Please let me know what you think!

Aljoscha

> On 5. Jun 2019, at 10:36, Tzu-Li (Gordon) Tai <tz...@apache.org> wrote:
> 
> I want to quickly bump this discussion to gather more consensus from others
> on the FLIP, and see if we want to aim this for the upcoming 1.9.0 release.
> The proposal touches binary formats of savepoints, which is a major part of
> Flink's public user interface, so having explicit approval from other
> members of the community would be nice here.
> 
> Cheers,
> Gordon
> 
> On Wed, May 29, 2019 at 11:45 AM Tzu-Li (Gordon) Tai <tz...@apache.org>
> wrote:
> 
>> I also should point out something that I forgot to mention in the initial
>> post:
>> Stefan has helped a lot in understanding the current status of state
>> backends and also participated a lot in design choices for the FLIP :)
>> 
>> On Wed, May 29, 2019 at 5:02 AM Tzu-Li (Gordon) Tai <tz...@apache.org>
>> wrote:
>> 
>>> Hi Flink devs,
>>> 
>>> Congxian, Kostas, and I have recently been discussing to unify the binary
>>> formats for keyed state in savepoints, which would allow for more
>>> operational flexibility such as swapping state backends across restores.
>>> 
>>> As part of this FLIP, another main proposal is to start allowing
>>> checkpoints and savepoints to have different formats. Savepoint formats
>>> should in the future be designed with interoperability in mind and
>>> reasonable snapshot / restore overhead is tolerable, while checkpoints are
>>> allowed to be backend specific for more efficient snapshots and restores.
>>> From recent proposals in the state backends such as disk-spilling heap
>>> backend [1], this flexibility seems to be reasonable.
>>> 
>>> The main user-facing API this would affect is of course, the binary
>>> formats of savepoints, as well as the fact that we will no longer be
>>> guaranteeing functional parity between savepoints and full checkpoints in
>>> the future (w.r.t. operational features related to upgrading applications;
>>> so far they have equal functionality).
>>> 
>>> Therefore, we would like to collect feedback on the proposal before
>>> continuing efforts.
>>> 
>>> This is the FLIP:
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints
>>> .
>>> 
>>> I'm happy to discuss details and looking forward to any feedback.
>>> 
>>> Cheers,
>>> Gordon
>>> 
>>> [1]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-to-support-disk-spilling-in-HeapKeyedStateBackend-td29109.html
>>> 
>>

Re: [DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.

I want to quickly bump this discussion to gather more consensus from others
on the FLIP, and see if we want to aim this for the upcoming 1.9.0 release.
The proposal touches binary formats of savepoints, which is a major part of
Flink's public user interface, so having explicit approval from other
members of the community would be nice here.

Cheers,
Gordon

On Wed, May 29, 2019 at 11:45 AM Tzu-Li (Gordon) Tai <tz...@apache.org>
wrote:

> I also should point out something that I forgot to mention in the initial
> post:
> Stefan has helped a lot in understanding the current status of state
> backends and also participated a lot in design choices for the FLIP :)
>
> On Wed, May 29, 2019 at 5:02 AM Tzu-Li (Gordon) Tai <tz...@apache.org>
> wrote:
>
>> Hi Flink devs,
>>
>> Congxian, Kostas, and I have recently been discussing to unify the binary
>> formats for keyed state in savepoints, which would allow for more
>> operational flexibility such as swapping state backends across restores.
>>
>> As part of this FLIP, another main proposal is to start allowing
>> checkpoints and savepoints to have different formats. Savepoint formats
>> should in the future be designed with interoperability in mind and
>> reasonable snapshot / restore overhead is tolerable, while checkpoints are
>> allowed to be backend specific for more efficient snapshots and restores.
>> From recent proposals in the state backends such as disk-spilling heap
>> backend [1], this flexibility seems to be reasonable.
>>
>> The main user-facing API this would affect is of course, the binary
>> formats of savepoints, as well as the fact that we will no longer be
>> guaranteeing functional parity between savepoints and full checkpoints in
>> the future (w.r.t. operational features related to upgrading applications;
>> so far they have equal functionality).
>>
>> Therefore, we would like to collect feedback on the proposal before
>> continuing efforts.
>>
>> This is the FLIP:
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints
>> .
>>
>> I'm happy to discuss details and looking forward to any feedback.
>>
>> Cheers,
>> Gordon
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-to-support-disk-spilling-in-HeapKeyedStateBackend-td29109.html
>>
>

Re: [DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.

I also should point out something that I forgot to mention in the initial
post:
Stefan has helped a lot in understanding the current status of state
backends and also participated a lot in design choices for the FLIP :)

On Wed, May 29, 2019 at 5:02 AM Tzu-Li (Gordon) Tai <tz...@apache.org>
wrote:

> Hi Flink devs,
>
> Congxian, Kostas, and I have recently been discussing to unify the binary
> formats for keyed state in savepoints, which would allow for more
> operational flexibility such as swapping state backends across restores.
>
> As part of this FLIP, another main proposal is to start allowing
> checkpoints and savepoints to have different formats. Savepoint formats
> should in the future be designed with interoperability in mind and
> reasonable snapshot / restore overhead is tolerable, while checkpoints are
> allowed to be backend specific for more efficient snapshots and restores.
> From recent proposals in the state backends such as disk-spilling heap
> backend [1], this flexibility seems to be reasonable.
>
> The main user-facing API this would affect is of course, the binary
> formats of savepoints, as well as the fact that we will no longer be
> guaranteeing functional parity between savepoints and full checkpoints in
> the future (w.r.t. operational features related to upgrading applications;
> so far they have equal functionality).
>
> Therefore, we would like to collect feedback on the proposal before
> continuing efforts.
>
> This is the FLIP:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints
> .
>
> I'm happy to discuss details and looking forward to any feedback.
>
> Cheers,
> Gordon
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-to-support-disk-spilling-in-HeapKeyedStateBackend-td29109.html
>

Re: [DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

Posted by Congxian Qiu <qc...@gmail.com>.

Hi Gordon

Thanks for the nice work. Big +1 for this.
With the proposal, we’ll have a unified binary format for savepoints, so users can switch backends using a savepoint if needed. Users will also be able to seamlessly migrate from savepoints of older Flink versions, as the wiki says.

Best,
Congxian
On May 29, 2019, 05:02 +0800, Tzu-Li (Gordon) Tai <tz...@apache.org>, wrote:
> Hi Flink devs,
>
> Congxian, Kostas, and I have recently been discussing to unify the binary formats for keyed state in savepoints, which would allow for more operational flexibility such as swapping state backends across restores.
>
> As part of this FLIP, another main proposal is to start allowing checkpoints and savepoints to have different formats. Savepoint formats should in the future be designed with interoperability in mind and reasonable snapshot / restore overhead is tolerable, while checkpoints are allowed to be backend specific for more efficient snapshots and restores. From recent proposals in the state backends such as disk-spilling heap backend [1], this flexibility seems to be reasonable.
>
> The main user-facing API this would affect is of course, the binary formats of savepoints, as well as the fact that we will no longer be guaranteeing functional parity between savepoints and full checkpoints in the future (w.r.t. operational features related to upgrading applications; so far they have equal functionality).
>
> Therefore, we would like to collect feedback on the proposal before continuing efforts.
>
> This is the FLIP: https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints.
>
> I'm happy to discuss details and looking forward to any feedback.
>
> Cheers,
> Gordon
>
> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-to-support-disk-spilling-in-HeapKeyedStateBackend-td29109.html