Posted to issues@flink.apache.org by "Congxian Qiu(klion26) (Jira)" <ji...@apache.org> on 2020/08/04 02:11:00 UTC
[jira] [Commented] (FLINK-18675) Checkpoint not maintaining minimum
pause duration between checkpoints
[ https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170497#comment-17170497 ]
Congxian Qiu(klion26) commented on FLINK-18675:
-----------------------------------------------
[~raviratnakar] I think the problem here is that {{CheckpointRequestDecider}} uses a stale value of {{lastCheckpointCompletionRelativeTime}} when checking whether the checkpoint request is too early.
1. We retrieve the value of {{lastCheckpointCompletionRelativeTime}} when calling {{CheckpointRequestDecider#chooseRequestToExecute}} in {{CheckpointCoordinator#triggerCheckpoint}}.
2. A pending checkpoint completes and updates the variables {{pendingCheckpoints}} and {{lastCheckpointCompletionRelativeTime}}.
3. In {{CheckpointRequestDecider#chooseRequestToExecute}} we use the previous (now stale) value of {{lastCheckpointCompletionRelativeTime}} to check whether the current checkpoint request is too early.
I think we can solve the problem by reading the value of {{lastCheckpointCompletionRelativeTime}} inside {{CheckpointRequestDecider#chooseRequestToExecute}}.
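To make the ordering issue concrete, here is a minimal standalone sketch of the three steps above. It is not Flink's actual code: the class {{MinPauseSketch}} and the methods {{isTooEarly}}, {{decideWithSnapshot}} and {{decideWithSupplier}} are hypothetical names, and the timestamps are made up. It only shows why a value read before the completion lets the next checkpoint through, while a value read at decision time does not.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical illustration only; none of these names exist in Flink.
final class MinPauseSketch {

    /** A request is too early if less than minPauseMs has passed since the last completion. */
    public static boolean isTooEarly(long nowMs, long lastCompletionMs, long minPauseMs) {
        return nowMs - lastCompletionMs < minPauseMs;
    }

    /** Variant A: the caller passes a value it read earlier, which may be stale by now. */
    public static boolean decideWithSnapshot(long nowMs, long snapshotMs, long minPauseMs) {
        return isTooEarly(nowMs, snapshotMs, minPauseMs);
    }

    /** Variant B: the value is read at decision time through a supplier. */
    public static boolean decideWithSupplier(long nowMs, LongSupplier lastCompletionMs, long minPauseMs) {
        return isTooEarly(nowMs, lastCompletionMs.getAsLong(), minPauseMs);
    }

    public static void main(String[] args) {
        long minPauseMs = 180_000L;                    // 3 minutes, as in the report
        AtomicLong lastCompletionMs = new AtomicLong(0L);

        // Step 1: the value is read before the decision is made.
        long snapshotMs = lastCompletionMs.get();

        // Step 2: a pending checkpoint completes and updates the shared value.
        lastCompletionMs.set(200_000L);

        // Step 3: the decision runs just after that completion.
        long nowMs = 200_001L;
        boolean staleSaysTooEarly = decideWithSnapshot(nowMs, snapshotMs, minPauseMs);
        boolean freshSaysTooEarly = decideWithSupplier(nowMs, lastCompletionMs::get, minPauseMs);

        System.out.println("stale value, too early? " + staleSaysTooEarly);
        System.out.println("fresh value, too early? " + freshSaysTooEarly);
    }
}
```

With the stale value the request is not considered too early, so the checkpoint would be triggered immediately; with the fresh value the minimum pause is respected.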
> Checkpoint not maintaining minimum pause duration between checkpoints
> ---------------------------------------------------------------------
>
> Key: FLINK-18675
> URL: https://issues.apache.org/jira/browse/FLINK-18675
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.11.0
> Environment: !image.png!
> Reporter: Ravi Bhushan Ratnakar
> Priority: Critical
> Attachments: image.png
>
>
> I am running a streaming job with Flink 1.11.0 on Kubernetes infrastructure. I have configured checkpointing as follows:
> Interval - 3 minutes
> Minimum pause between checkpoints - 3 minutes
> Checkpoint timeout - 10 minutes
> Checkpointing Mode - Exactly Once
> Number of Concurrent Checkpoints - 1
>
> Other configs
> Time Characteristics - Processing Time
>
> I am observing an unusual behaviour. *When a checkpoint completes successfully and its end-to-end duration is almost equal to or greater than the Minimum pause duration, the next checkpoint gets triggered immediately without maintaining the Minimum pause duration*. Kindly notice this behaviour from checkpoint id 194 onward in the attached screenshot.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)