Posted to user@flink.apache.org by Dmitry Minaev <dm...@five9.com> on 2018/05/15 01:28:26 UTC

minPauseBetweenCheckpoints for failed checkpoints

Hello!

I have a question about the checkpointing parameter minPauseBetweenCheckpoints, which sets the minimum pause between checkpointing attempts.
I’ve noticed the following (strange) behavior in Flink.

I set the following parameters for a sample Flink job:

Checkpointing Mode = Exactly Once
Interval = 10s
Timeout = 30s
Minimum Pause Between Checkpoints = 15s
Maximum Concurrent Checkpoints = 1
Persist Checkpoints Externally = Disabled
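
For readers who want to reproduce this setup, the settings listed above roughly correspond to the following calls. This is a minimal sketch, assuming the DataStream API of a Flink 1.x release with the Flink streaming dependency on the classpath; the class and method names `CheckpointSettingsSketch` and `configure` are made up for illustration and are not taken from the original job.

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical helper; assumes Flink 1.x package names.
final class CheckpointSettingsSketch {

    static StreamExecutionEnvironment configure() {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Interval = 10s, Checkpointing Mode = Exactly Once
        env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);

        // Timeout = 30s
        env.getCheckpointConfig().setCheckpointTimeout(30_000L);

        // Minimum Pause Between Checkpoints = 15s
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(15_000L);

        // Maximum Concurrent Checkpoints = 1
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);

        // Persist Checkpoints Externally = Disabled: nothing to call here,
        // externalized checkpoints are simply not enabled.
        return env;
    }
}
```

All durations are given in milliseconds.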

Then I started a job that intentionally makes some of the checkpoints fail by timeout.
I noticed that Flink takes minPauseBetweenCheckpoints into account only when a checkpoint doesn’t fail by timeout:

My first checkpoint was triggered at 18:03:11 and failed after the expected 30 seconds. But immediately after that, at 18:03:41, a new checkpoint was triggered. This doesn’t make sense to me, since I’m using minPauseBetweenCheckpoints = 15 seconds. I would expect Flink to wait 15 seconds before starting a new checkpoint.

However, minPauseBetweenCheckpoints seems to work as expected for checkpoints that complete successfully before the configured timeout. For example, my 4th checkpoint started at 18:04:41 and completed at 18:04:56, and the next checkpoint waited another 15 seconds and started at 18:05:11.
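
The timestamps reported above are consistent with a simple model in which the next checkpoint starts at the later of "previous trigger time plus interval" and "previous end time plus pause", with the pause counted only after a successful checkpoint. The sketch below encodes that model so the arithmetic can be checked. It is an illustration of the observed timings, not Flink's scheduling code, and the names `CheckpointPauseModel`, `nextTrigger`, and `applyPause` are hypothetical.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical model of the timings reported in this thread; not Flink's scheduler.
final class CheckpointPauseModel {
    static final Duration INTERVAL = Duration.ofSeconds(10);
    static final Duration MIN_PAUSE = Duration.ofSeconds(15);

    /**
     * Earliest start of the next checkpoint, given when the previous one was
     * triggered, when it ended (completed or timed out), and whether the
     * minimum pause is applied after it.
     */
    static LocalTime nextTrigger(LocalTime triggered, LocalTime ended, boolean applyPause) {
        LocalTime byInterval = triggered.plus(INTERVAL);
        LocalTime byPause = applyPause ? ended.plus(MIN_PAUSE) : ended;
        return byInterval.isAfter(byPause) ? byInterval : byPause;
    }

    public static void main(String[] args) {
        LocalTime firstStart = LocalTime.of(18, 3, 11);
        LocalTime firstEnd = LocalTime.of(18, 3, 41);   // failed by the 30s timeout
        LocalTime fourthStart = LocalTime.of(18, 4, 41);
        LocalTime fourthEnd = LocalTime.of(18, 4, 56);  // completed successfully

        // Observed: no pause after the timed-out checkpoint.
        System.out.println("after failure (observed): " + nextTrigger(firstStart, firstEnd, false));
        // Expected by the poster: pause applied after the failure as well.
        System.out.println("after failure (expected): " + nextTrigger(firstStart, firstEnd, true));
        // Observed: pause applied after the successful checkpoint.
        System.out.println("after success (observed): " + nextTrigger(fourthStart, fourthEnd, true));
    }
}
```

Under this model the pause-free case yields 18:03:41 and the successful case yields 18:05:11, matching the two reported start times, while applying the pause after the failure would have pushed the second checkpoint to 18:03:56.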

Please see attached screenshots for configuration and checkpoint history.

My question is: is this expected behavior or a bug? Is there a way to have a pause between checkpoints even when a checkpoint fails by timeout?

Thank you!

--
Kind regards,
Dmitry Minaev

Re: minPauseBetweenCheckpoints for failed checkpoints

Posted by Timo Walther <tw...@apache.org>.
Hi Dmitry,

I think minPauseBetweenCheckpoints is intended as a pause after
successful checkpoints. After a failure, a user usually wants to get a
successful checkpoint again as quickly as possible. Stefan (in CC) might
know more about this.

Regards,
Timo
