Posted to issues@flink.apache.org by "Dawid Wysakowicz (Jira)" <ji...@apache.org> on 2022/03/24 08:39:00 UTC

[jira] [Commented] (FLINK-26783) Restore from a stop-with-savepoint if failed during committing

    [ https://issues.apache.org/jira/browse/FLINK-26783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511694#comment-17511694 ] 

Dawid Wysakowicz commented on FLINK-26783:
------------------------------------------

After an offline discussion, we concluded that simply adding the savepoint to the {{CompletedCheckpointStore}} creates a savepoint ownership problem: after a restart, the savepoint would remain in the {{CompletedCheckpointStore}} and Flink would depend on its continued existence.

Therefore we propose a different approach to the underlying problem, namely that falling back to a checkpoint might produce duplicated records. We suggest not triggering a global failover when the savepoint completed successfully but the job failed while committing side effects. In that case we will complete the completable future exceptionally, with an exception explaining that the savepoint is consistent but might have uncommitted side effects, and asking users to manually restart the job from that savepoint if they want to commit those side effects.
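
To make the proposal more concrete, here is a rough sketch of how the future returned to the caller could be completed. It is only an illustration under assumptions, not the actual patch: the class {{StopWithSavepointSketch}}, the exception type {{UncommittedSideEffectsException}}, the method {{finishStopWithSavepoint}} and the savepoint path are all hypothetical names, and only the JDK's {{CompletableFuture}} is used.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical sketch; none of these names are actual Flink classes.
class StopWithSavepointSketch {

    /** Hypothetical exception: the savepoint is consistent, but side effects may be uncommitted. */
    static class UncommittedSideEffectsException extends Exception {
        final String savepointPath;

        UncommittedSideEffectsException(String savepointPath, Throwable cause) {
            super("The savepoint at " + savepointPath + " is consistent, but it might have "
                    + "uncommitted side effects. Restart the job from that savepoint manually "
                    + "if you want to commit them.", cause);
            this.savepointPath = savepointPath;
        }
    }

    /**
     * Completes the result future for a stop-with-savepoint whose savepoint already
     * completed successfully. If committing side effects failed (commitFailure != null),
     * the future is completed exceptionally instead of triggering a global failover.
     */
    static CompletableFuture<String> finishStopWithSavepoint(
            String savepointPath, Throwable commitFailure) {
        CompletableFuture<String> result = new CompletableFuture<>();
        if (commitFailure == null) {
            // Savepoint taken and side effects committed: report the savepoint path.
            result.complete(savepointPath);
        } else {
            // Savepoint is usable, but the user has to restart from it manually.
            result.completeExceptionally(
                    new UncommittedSideEffectsException(savepointPath, commitFailure));
        }
        return result;
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = finishStopWithSavepoint(
                "/savepoints/savepoint-example", new RuntimeException("commit failed"));
        try {
            future.join();
        } catch (CompletionException e) {
            // join() wraps the original exception; the cause carries the explanation.
            System.out.println(e.getCause().getMessage());
        }
    }
}
```

A caller that sees this exception can read the savepoint path from it and decide whether to restart the job from that savepoint; no automatic restore is attempted in this sketch.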

> Restore from a stop-with-savepoint if failed during committing
> --------------------------------------------------------------
>
>                 Key: FLINK-26783
>                 URL: https://issues.apache.org/jira/browse/FLINK-26783
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.15.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Dawid Wysakowicz
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> We decided stop-with-savepoint should commit side-effects and thus we should fail over to those savepoints if a failure happens when committing side effects.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)