
Posted to issues@flink.apache.org by "Congxian Qiu(klion26) (Jira)" <ji...@apache.org> on 2020/08/04 02:11:00 UTC

[jira] [Commented] (FLINK-18675) Checkpoint not maintaining minimum pause duration between checkpoints

    [ https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170497#comment-17170497 ] 

Congxian Qiu(klion26) commented on FLINK-18675:
-----------------------------------------------

[~raviratnakar] I think the problem here is that {{CheckpointRequestDecider}} uses a stale value of {{lastCheckpointCompletionRelativeTime}} when checking whether a checkpoint request is too early. The sequence is:

1. We read the value of {{lastCheckpointCompletionRelativeTime}} when calling {{CheckpointRequestDecider#chooseRequestToExecute}} in {{CheckpointCoordinator#triggerCheckpoint}}.
2. Meanwhile, a pending checkpoint completes and updates the variables {{pendingCheckpoints}} and {{lastCheckpointCompletionRelativeTime}}.
3. In {{CheckpointRequestDecider#chooseRequestToExecute}}, we then use the previous (stale) value of {{lastCheckpointCompletionRelativeTime}} to check whether the current checkpoint request is too early.
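The three steps above can be replayed sequentially in a small model. This is a hypothetical sketch, not Flink code: {{CoordinatorState}}, {{complete_pending}}, {{may_trigger}} and {{stale_decision}} are illustrative names, the fields only stand in for {{pendingCheckpoints}} and {{lastCheckpointCompletionRelativeTime}}, and the concurrent interleaving is simulated in a single thread. It shows how the decider sees the live pending count (0) but the old completion time, so the request passes both checks.

```python
from dataclasses import dataclass


# Hypothetical, simplified model of the coordinator state (not Flink's classes).
@dataclass
class CoordinatorState:
    min_pause_ms: int
    max_concurrent: int
    pending: int             # stands in for pendingCheckpoints.size()
    last_completion_ms: int  # stands in for lastCheckpointCompletionRelativeTime


def complete_pending(state: CoordinatorState, at_ms: int) -> None:
    """A pending checkpoint completes: both fields are updated together."""
    state.pending -= 1
    state.last_completion_ms = at_ms


def may_trigger(pending: int, last_completion_ms: int, now_ms: int,
                state: CoordinatorState) -> bool:
    """Allow a new checkpoint only if concurrency and minimum pause permit it."""
    if pending >= state.max_concurrent:
        return False
    return now_ms - last_completion_ms >= state.min_pause_ms


def stale_decision(state: CoordinatorState, now_ms: int, completion_ms: int) -> bool:
    # Step 1: the completion time is read before the decision is made.
    snapshot = state.last_completion_ms
    # Step 2: meanwhile a pending checkpoint completes and updates both fields.
    complete_pending(state, completion_ms)
    # Step 3: the decision uses the live pending count but the stale snapshot.
    return may_trigger(state.pending, snapshot, now_ms, state)
```

With a 3-minute minimum pause, one pending checkpoint, a previous completion at 0 ms and the pending one completing at 199000 ms, a request at 200000 ms is allowed even though only 1 second has passed since the latest completion.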

I think we can solve this by reading the value of {{lastCheckpointCompletionRelativeTime}} inside {{CheckpointRequestDecider#chooseRequestToExecute}} itself.
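A minimal sketch of that idea, assuming the decider can be handed suppliers for both values and that, in the real code, they would be read under the same lock that guards their updates. {{RequestDecider}} and {{choose_request_to_execute}} are hypothetical stand-ins that only mirror the Flink names; this is not the actual patch.

```python
from typing import Callable


class RequestDecider:
    """Hypothetical simplified decider; not Flink's CheckpointRequestDecider."""

    def __init__(self, min_pause_ms: int, max_concurrent: int,
                 pending_count: Callable[[], int],
                 last_completion_ms: Callable[[], int]) -> None:
        self.min_pause_ms = min_pause_ms
        self.max_concurrent = max_concurrent
        # Suppliers are invoked at decision time, so the caller caches nothing.
        self._pending_count = pending_count
        self._last_completion_ms = last_completion_ms

    def choose_request_to_execute(self, now_ms: int) -> bool:
        if self._pending_count() >= self.max_concurrent:
            return False
        # Read the completion time here, inside the decision, not before the call.
        return now_ms - self._last_completion_ms() >= self.min_pause_ms
```

Because both values are read at the moment of the decision, a completion at 199000 ms is visible to a request at 200000 ms, which is then rejected until 379000 ms (3 minutes later).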

> Checkpoint not maintaining minimum pause duration between checkpoints
> ---------------------------------------------------------------------
>
>                 Key: FLINK-18675
>                 URL: https://issues.apache.org/jira/browse/FLINK-18675
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.11.0
>         Environment: !image.png!
>            Reporter: Ravi Bhushan Ratnakar
>            Priority: Critical
>         Attachments: image.png
>
>
> I am running a streaming job with Flink 1.11.0 on Kubernetes infrastructure. I have configured checkpointing as follows:
> Interval - 3 minutes
> Minimum pause between checkpoints - 3 minutes
> Checkpoint timeout - 10 minutes
> Checkpointing Mode - Exactly Once
> Number of concurrent checkpoints - 1
>  
> Other configs
> Time Characteristics - Processing Time
>  
> I am observing an unusual behaviour. *When a checkpoint completes successfully* *and its end-to-end duration is almost equal to or greater than the minimum pause duration, the next checkpoint is triggered immediately without maintaining the minimum pause duration*. Kindly notice this behaviour from checkpoint id 194 onward in the attached screenshot.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)