Posted to user@flink.apache.org by Paul Lam <pa...@gmail.com> on 2018/08/01 12:07:33 UTC

Default Restart Strategy Not Work With Checkpointing

Hi, 
I’m running a Flink 1.5.0 standalone cluster with `restart-strategy` set to `failure-rate`. The web frontend shows that the JobManager and the TaskManagers have picked up this configuration, but streaming jobs with checkpointing enabled still use the fixed-delay strategy and ignore the default restart strategy (there are no explicit overrides in the user code).

I read the source code and found a possible explanation (though I'm not sure about it): the client generates the JobGraph without consulting flink-conf.yaml and sets the restart strategy to fixed delay if checkpointing is on. The server side (JobMaster) follows the default restart strategy configured in flink-conf.yaml but gives the one in the JobGraph higher priority, so the configured default is always overridden by the fixed-delay strategy.
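
To make the suspected precedence concrete, here is a minimal, self-contained sketch of the behavior described above. It is not Flink code: the class `RestartStrategyPrecedence`, its two methods, and the string labels are hypothetical names invented for illustration, and the logic only models the explanation given in this message, which itself may be incomplete.

```java
import java.util.Optional;

// Hypothetical model of the precedence described in this message; not Flink's actual classes.
public class RestartStrategyPrecedence {

    // Client side (as described): flink-conf.yaml is never consulted here.
    // If user code sets nothing and checkpointing is on, fixed delay is put into the job graph.
    static Optional<String> clientSideStrategy(Optional<String> userCodeStrategy,
                                               boolean checkpointingEnabled) {
        if (userCodeStrategy.isPresent()) {
            return userCodeStrategy;
        }
        return checkpointingEnabled ? Optional.of("fixed-delay") : Optional.empty();
    }

    // Server side (as described): the strategy carried by the job graph wins;
    // the cluster default from flink-conf.yaml is only a fallback.
    static String effectiveStrategy(Optional<String> jobGraphStrategy, String clusterDefault) {
        return jobGraphStrategy.orElse(clusterDefault);
    }

    public static void main(String[] args) {
        String clusterDefault = "failure-rate"; // value of restart-strategy in flink-conf.yaml

        // Checkpointing on, no explicit strategy in user code: the default is shadowed.
        System.out.println(effectiveStrategy(
                clientSideStrategy(Optional.empty(), true), clusterDefault));

        // Checkpointing off, no explicit strategy: the cluster default applies.
        System.out.println(effectiveStrategy(
                clientSideStrategy(Optional.empty(), false), clusterDefault));

        // Explicit strategy in user code: it is carried by the job graph and wins.
        System.out.println(effectiveStrategy(
                clientSideStrategy(Optional.of("failure-rate"), true), clusterDefault));
    }
}
```

In this model, the first case resolves to the fixed-delay strategy even though the cluster default is `failure-rate`, which matches the behavior observed on the cluster. The third case suggests that setting the strategy explicitly in user code would sidestep the problem, assuming the model reflects what the client and the JobMaster really do.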

If I understand correctly, this might be a bug. Is there any suggestion for avoiding it for now?

Best regards,
Paul Lam

Re: Default Restart Strategy Not Work With Checkpointing

Posted by Chesnay Schepler <ch...@apache.org>.
Please see FLINK-9143 <https://issues.apache.org/jira/browse/FLINK-9143>.
