Posted to user@flink.apache.org by Alexander Smirnov <al...@gmail.com> on 2018/08/21 07:10:59 UTC

How do I investigate checkpoint failures

Hello,

I have a cluster with multiple jobs running on it. One of the jobs has
checkpoints that fail constantly:
[image: image.png]

How do I investigate it?

Thank you,
Alex

Re: How do I investigate checkpoint failures

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hi Alex,

The first thing to do in such cases is to go through the JobManager and
TaskManager logs and look for exceptions there.
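
As a rough illustration of that log check, here is a minimal sketch in
plain Java (no Flink dependency). The class name LogExceptionScan and the
markers it looks for (names ending in Exception or Error, "Caused by:"
lines, and lines mentioning both "checkpoint" and "expired") are
assumptions made for this example, not an official Flink tool; adjust them
to whatever your logs actually contain.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: prints log lines that look like exceptions or
// expired checkpoints, prefixed with their 1-based line number.
final class LogExceptionScan {

    // Matches identifiers such as IOException or OutOfMemoryError.
    private static final Pattern EXCEPTION_NAME =
        Pattern.compile("\\b\\w*(Exception|Error)\\b");

    // Assumed heuristics; not derived from any Flink log format guarantee.
    static boolean isSuspicious(String line) {
        String lower = line.toLowerCase();
        return EXCEPTION_NAME.matcher(line).find()
            || line.contains("Caused by:")
            || (lower.contains("checkpoint") && lower.contains("expired"));
    }

    // Returns "lineNumber: text" for every suspicious line.
    static List<String> findSuspiciousLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (isSuspicious(lines.get(i))) {
                hits.add((i + 1) + ": " + lines.get(i));
            }
        }
        return hits;
    }

    // Usage: java LogExceptionScan.java <jobmanager.log> <taskmanager.log> ...
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println("== " + path);
            List<String> lines = Files.readAllLines(Paths.get(path));
            for (String hit : findSuspiciousLines(lines)) {
                System.out.println(hit);
            }
        }
    }
}
```

Pass it the JobManager and TaskManager log files; the line numbers in the
output tell you where to start reading the surrounding context.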

The cause reported for the latest failed checkpoint is that the checkpoint
expired. You can try increasing the checkpoint timeout (more checkpoint
configuration options are described in [1]).
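
For illustration, raising the timeout in the job code could look roughly
like the sketch below. The class name and the interval and timeout values
are placeholders chosen for this example, not recommendations; pick values
that fit your state size and backend. It needs the Flink streaming
dependency on the classpath.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical example class; only the two checkpoint calls matter here.
public class CheckpointTimeoutExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 60 seconds (placeholder interval).
        env.enableCheckpointing(60_000L);

        // Allow each checkpoint up to 30 minutes before it is declared
        // expired (placeholder value; the default is 10 minutes).
        env.getCheckpointConfig().setCheckpointTimeout(30 * 60 * 1000L);

        // Define sources, transformations, and sinks here, then call
        // env.execute(...) to submit the job.
    }
}
```

A longer timeout only hides the symptom if checkpoints are slow for another
reason, so it is still worth checking the logs for the underlying cause.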

Best,

Dawid


[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/checkpointing.html#enabling-and-configuring-checkpointing

