Posted to issues@flink.apache.org by "yelun (JIRA)" <ji...@apache.org> on 2019/07/11 07:06:00 UTC
[jira] [Commented] (FLINK-12858) Potentially not properly working Flink job in case of stop-with-savepoint failure
[ https://issues.apache.org/jira/browse/FLINK-12858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882692#comment-16882692 ]
yelun commented on FLINK-12858:
-------------------------------
Hi [~1u0], can you give more description of this issue, such as what "non-source tasks" means? Thanks.
> Potentially not properly working Flink job in case of stop-with-savepoint failure
> ---------------------------------------------------------------------------------
>
> Key: FLINK-12858
> URL: https://issues.apache.org/jira/browse/FLINK-12858
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Reporter: Alex
> Priority: Minor
>
> The current implementation of stop-with-savepoint (FLINK-11458) blocks the thread that carries {{StreamTask.performCheckpoint()}} on {{syncSavepointLatch}}. For non-source tasks, this thread is implied to be the task's main thread (stop-with-savepoint deliberately stops any activity in the task's main thread).
> Unlocking happens either when the task is cancelled or when the corresponding checkpoint is acknowledged.
> It's possible that other downstream tasks of the same Flink job "soft" fail the checkpoint/savepoint for various reasons (for example, due to the max buffered bytes limit in {{BarrierBuffer.checkSizeLimit()}}). In such a case, the checkpoint abortion would be notified to the JM. But it looks like the checkpoint coordinator would handle such an abortion as usual and assume that the Flink job continues running.
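To make the described hang concrete, here is a minimal, hypothetical sketch (these are not Flink's actual classes; only {{syncSavepointLatch}} and {{StreamTask.performCheckpoint()}} are named in the report) of the latch pattern above: the task's main thread blocks until the savepoint is acknowledged or the task is cancelled, and a declined checkpoint trips neither path, so the thread stays blocked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the synchronous-savepoint latch described above.
// The task's main thread awaits the latch; only an acknowledgement or a
// task cancellation counts it down. A *declined* checkpoint does neither,
// which is the gap this issue points at.
public class SyncSavepointLatchSketch {
    private final CountDownLatch latch = new CountDownLatch(1);

    // Stand-in for the wait inside performCheckpoint() during
    // stop-with-savepoint. The timeout exists only so this sketch is
    // testable; the real latch would wait indefinitely.
    public boolean blockUntilAckOrCancel(long timeoutMillis) throws InterruptedException {
        return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public void acknowledgeCheckpoint() { latch.countDown(); }
    public void cancelTask()            { latch.countDown(); }
    // Note: there is deliberately no method for a declined checkpoint --
    // nothing releases the main thread in that case.

    public static void main(String[] args) throws InterruptedException {
        SyncSavepointLatchSketch acked = new SyncSavepointLatchSketch();
        acked.acknowledgeCheckpoint();
        System.out.println("acked released: " + acked.blockUntilAckOrCancel(100));

        SyncSavepointLatchSketch declined = new SyncSavepointLatchSketch();
        // A downstream decline never trips the latch, so the wait times out.
        System.out.println("declined released: " + declined.blockUntilAckOrCancel(100));
    }
}
```

Under this reading, the coordinator treating the decline as an ordinary abort leaves the blocked task in the "declined" state above: neither released nor failed.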
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)