Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/17 17:02:00 UTC
[jira] [Commented] (FLINK-4809) Operators should tolerate checkpoint failures
[ https://issues.apache.org/jira/browse/FLINK-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441180#comment-16441180 ]
ASF GitHub Bot commented on FLINK-4809:
---------------------------------------
Github user rmetzger commented on a diff in the pull request:
https://github.com/apache/flink/pull/4883#discussion_r182155763
--- Diff: docs/dev/stream/state/checkpointing.md ---
@@ -118,6 +120,9 @@ env.getCheckpointConfig.setMinPauseBetweenCheckpoints(500)
// checkpoints have to complete within one minute, or are discarded
env.getCheckpointConfig.setCheckpointTimeout(60000)
+// prevent the tasks from failing if an error happens in their checkpointing, the checkpoint will just be declined.
+env.getCheckpointConfig.setFailTasksOnCheckpointingErrors(false)
--- End diff --
This line is missing from the Java tab.
> Operators should tolerate checkpoint failures
> ---------------------------------------------
>
> Key: FLINK-4809
> URL: https://issues.apache.org/jira/browse/FLINK-4809
> Project: Flink
> Issue Type: Sub-task
> Components: State Backends, Checkpointing
> Reporter: Stephan Ewen
> Assignee: Stefan Richter
> Priority: Major
> Fix For: 1.5.0
>
>
> Operators should try/catch exceptions in the synchronous and asynchronous parts of the checkpoint and send a {{DeclineCheckpoint}} message as a result.
> The decline message should have the failure cause attached to it.
> The checkpoint barrier should be sent anyway as a first step, before attempting to make a state checkpoint, to make sure that downstream operators do not block in alignment.
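
The behaviour described in the issue can be sketched in plain Java, without any Flink dependency. This is only an illustration under stated assumptions: the class CheckpointDeclineSketch, the Snapshot interface, the Outbox record, and the event strings are all hypothetical names invented for this sketch, not Flink's actual classes or its implementation. It shows the ordering (barrier first), the try/catch around both snapshot parts, and the failure cause travelling with the decline.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointDeclineSketch {

    /** Hypothetical stand-in for one part (sync or async) of a state snapshot. */
    interface Snapshot {
        void run() throws Exception;
    }

    /** Hypothetical record of what the operator emitted, in order. */
    static final class Outbox {
        final List<String> events = new ArrayList<>();
        Throwable declineCause; // failure cause attached to the decline, if any
    }

    static Outbox performCheckpoint(long checkpointId, Snapshot syncPart, Snapshot asyncPart) {
        Outbox out = new Outbox();

        // Step 1: forward the barrier before touching any state, so that
        // downstream operators do not block in alignment if the snapshot fails.
        out.events.add("barrier:" + checkpointId);

        try {
            // Step 2: run both parts of the snapshot; either may throw.
            syncPart.run();
            asyncPart.run();
            out.events.add("ack:" + checkpointId);
        } catch (Exception e) {
            // Step 3: tolerate the failure. Decline the checkpoint and attach
            // the cause instead of letting the exception fail the task.
            out.declineCause = e;
            out.events.add("decline:" + checkpointId);
        }
        return out;
    }

    public static void main(String[] args) {
        Outbox ok = performCheckpoint(1L, () -> {}, () -> {});
        System.out.println(ok.events);

        Outbox failed = performCheckpoint(2L, () -> {},
                () -> { throw new java.io.IOException("disk full"); });
        System.out.println(failed.events + " cause=" + failed.declineCause.getMessage());
    }
}
```

In this sketch the barrier event is recorded in both the successful and the failing run, and only the second event differs (acknowledge versus decline). A real operator would run the asynchronous part on another thread; it is called inline here purely to keep the example short.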
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)