You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@flink.apache.org by "Piotr Nowojski (Jira)" <ji...@apache.org> on 2020/05/12 12:20:00 UTC

[jira] [Assigned] (FLINK-17350) StreamTask should always fail immediately on failures in synchronous part of a checkpoint

     [ https://issues.apache.org/jira/browse/FLINK-17350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr Nowojski reassigned FLINK-17350:
--------------------------------------

    Assignee: Piotr Nowojski

> StreamTask should always fail immediately on failures in synchronous part of a checkpoint
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-17350
>                 URL: https://issues.apache.org/jira/browse/FLINK-17350
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / Task
>    Affects Versions: 1.6.4, 1.7.2, 1.8.3, 1.9.2, 1.10.0
>            Reporter: Piotr Nowojski
>            Assignee: Piotr Nowojski
>            Priority: Critical
>             Fix For: 1.11.0
>
>
> This bugs also Affects 1.5.x branch.
> As described in point 1 here: https://issues.apache.org/jira/browse/FLINK-17327?focusedCommentId=17090576&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17090576
> {{setTolerableCheckpointFailureNumber(...)}} and its deprecated {{setFailTaskOnCheckpointError(...)}} predecessor are implemented incorrectly. Since Flink 1.5 (https://issues.apache.org/jira/browse/FLINK-4809) they can lead to operators (and especially sinks with an external state) end up in an inconsistent state. That's also true even if they are not used, because of another issue: FLINK-17351
> If we mix this with intermittent external system failure. Sink reports an exception, transaction was lost/aborted, Sink is in failed state, but if there will be a happy coincidence that it manages to accept further records, this exception can be lost and all of the records in those failed checkpoints will be lost forever as well.
> For details please check FLINK-17327.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)