Posted to user@flink.apache.org by Yu Yang <yu...@gmail.com> on 2020/02/03 08:36:36 UTC

check-pointing does not follow interval setting on some clusters

Hi all,

We use Flink 1.9.1 for development and have observed irregular checkpointing
intervals on one of our clusters. That is unexpected, given that our code calls
"env.enableCheckpointing(Time.minutes(5).toMilliseconds)" to
set the checkpointing interval to 5 minutes. It is also puzzling that
the issue appears on one cluster but not on the other cluster
that we have. Any insights on this?

The screenshot below shows the irregular checkpointing intervals
with "env.enableCheckpointing(Time.minutes(5).toMilliseconds)"
in our code. The intervals between checkpoints are longer than 5 minutes.
[image: Screen Shot 2020-02-03 at 12.31.21 AM.png]

The screenshot below shows checkpoints triggering at a 5-minute
cadence on the other cluster. That job's checkpointing interval is also set to
5 minutes.

[image: Screen Shot 2020-02-03 at 12.31.01 AM.png]
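
To put numbers on the irregularity rather than reading it off a screenshot, the gaps between consecutive checkpoint trigger times can be compared with the configured interval. The following is a minimal sketch in plain Java with no Flink dependency. The class name, the method names, the 10-second tolerance, and the sample timestamps are all hypothetical; the timestamps are not taken from the screenshots above and would have to be replaced with trigger times copied from the cluster in question.

```java
import java.util.Arrays;

public class CheckpointGaps {

    // Returns the gap in milliseconds between each pair of consecutive
    // trigger timestamps. The input is assumed to be sorted ascending.
    static long[] gapsMillis(long[] triggerMillis) {
        if (triggerMillis.length < 2) {
            return new long[0];
        }
        long[] gaps = new long[triggerMillis.length - 1];
        for (int i = 1; i < triggerMillis.length; i++) {
            gaps[i - 1] = triggerMillis[i] - triggerMillis[i - 1];
        }
        return gaps;
    }

    // Counts the gaps that exceed the configured interval by more than
    // the given tolerance (both in milliseconds).
    static long countLate(long[] gaps, long intervalMillis, long toleranceMillis) {
        long limit = intervalMillis + toleranceMillis;
        return Arrays.stream(gaps).filter(g -> g > limit).count();
    }

    public static void main(String[] args) {
        // Hypothetical trigger times in epoch milliseconds (illustration only).
        long[] triggers = {0L, 300_000L, 750_000L, 1_050_000L};
        long interval = 5 * 60 * 1000L;  // 5 minutes, as set in the job
        long tolerance = 10 * 1000L;     // assumed slack of 10 seconds

        long[] gaps = gapsMillis(triggers);
        System.out.println("gaps (ms): " + Arrays.toString(gaps));
        System.out.println("late gaps: " + countLate(gaps, interval, tolerance));
    }
}
```

Running the same calculation on the trigger times from both clusters would show how far, and how often, the first cluster drifts from the 5-minute setting.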

Thanks!

Regards,
-Yu

Re: check-pointing does not follow interval setting on some clusters

Posted by Congxian Qiu <qc...@gmail.com>.
Hi,

This is indeed strange.
From the screenshot, the checkpoints complete very quickly. Could you please
share the checkpoint configuration and the JobManager log?
Best,
Congxian


On Tue, Feb 4, 2020 at 6:48 PM, Fabian Hueske <fh...@gmail.com> wrote:

> Hi Yu,
>
> This looks indeed strange.
> There is a configuration that limits the number of concurrent checkpoints
> but given the end-to-end duration this cannot be the reason.
>
> Is the JobManager in the first setup maybe overloaded?
> Can you post your complete checkpointing configuration?
>
> Best,
> Fabian

Re: check-pointing does not follow interval setting on some clusters

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Yu,

This does indeed look strange.
There is a configuration option that limits the number of concurrent checkpoints,
but given the end-to-end durations this cannot be the reason.

Could the JobManager in the first setup be overloaded?
Can you post your complete checkpointing configuration?

Best,
Fabian
