Posted to issues@flink.apache.org by "Till Rohrmann (JIRA)" <ji...@apache.org> on 2019/04/09 14:04:00 UTC

[jira] [Commented] (FLINK-11159) Allow configuration whether to fall back to savepoints for restore

    [ https://issues.apache.org/jira/browse/FLINK-11159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813449#comment-16813449 ] 

Till Rohrmann commented on FLINK-11159:
---------------------------------------

I'm not sure whether taking an additional checkpoint while taking a savepoint is the best solution here. Doing that would double the I/O operations, which could affect the cluster. I would favor a simpler solution that makes it possible to select whether savepoints take part in recovery. Effectively, we would have an option that controls whether savepoints are added to the {{CompletedCheckpointStore}}.
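
To illustrate the idea, here is a minimal sketch of how such an option might behave. It is a toy model in Python, not Flink code: the class {{ToyCompletedCheckpointStore}}, the flag {{add_savepoints}}, and the helper {{restore_point}} are hypothetical names invented for this example, and the real {{CompletedCheckpointStore}} interface is not modeled. It replays the chain {{chk1 -> chk2 -> sp}} from the issue description and reports which snapshot a recovery would pick.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """A completed snapshot; is_savepoint separates savepoints from checkpoints."""
    snapshot_id: int
    name: str
    is_savepoint: bool = False


@dataclass
class ToyCompletedCheckpointStore:
    """Toy stand-in for the store that recovery reads from (not Flink's class)."""
    # Hypothetical option: should savepoints take part in recovery?
    add_savepoints: bool = True
    snapshots: List[Snapshot] = field(default_factory=list)

    def on_completed(self, snapshot: Snapshot) -> None:
        # When the option is off, savepoints are never added to the store.
        if snapshot.is_savepoint and not self.add_savepoints:
            return
        self.snapshots.append(snapshot)

    def latest(self) -> Optional[Snapshot]:
        # Recovery restores from the most recent snapshot the store knows about.
        return max(self.snapshots, key=lambda s: s.snapshot_id, default=None)


def restore_point(add_savepoints: bool) -> str:
    """Replay chk1 -> chk2 -> sp and return the name of the restore target."""
    store = ToyCompletedCheckpointStore(add_savepoints=add_savepoints)
    chain = (
        Snapshot(1, "chk1"),
        Snapshot(2, "chk2"),
        Snapshot(3, "sp", is_savepoint=True),
    )
    for snap in chain:
        store.on_completed(snap)
    latest = store.latest()
    return latest.name if latest else "none"
```

In this model, {{restore_point(True)}} selects the savepoint, while {{restore_point(False)}} falls back to {{chk2}}, so everything processed between {{chk2}} and {{sp}} would be reprocessed after recovery.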

> Allow configuration whether to fall back to savepoints for restore
> ------------------------------------------------------------------
>
>                 Key: FLINK-11159
>                 URL: https://issues.apache.org/jira/browse/FLINK-11159
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.5.5, 1.6.2, 1.7.0
>            Reporter: Nico Kruber
>            Assignee: vinoyang
>            Priority: Major
>
> Ever since FLINK-3397, upon failure, Flink restarts from the latest checkpoint or savepoint, whichever is more recent. With the introduction of local recovery and the knowledge that a RocksDB checkpoint restore just copies the files, it may be time to reconsider this and make it configurable:
> In certain situations, it may be faster to restore from the latest checkpoint only (even if there is a more recent savepoint) and reprocess the data in between. On the downside, that may not be correct, because it can break side effects if the savepoint was the latest snapshot. For example, consider this chain: {{chk1 -> chk2 -> sp … restore chk2 -> …}}. All side effects between {{chk2 -> sp}} would then be reproduced.
> Making this configurable will allow users to choose whatever they need, or can tolerate, to get the lowest recovery time in Flink.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)