Posted to user@flink.apache.org by Noah <no...@enabled.com> on 2020/11/02 22:09:30 UTC

Good tutorial troubleshoot and reading logs

Hi there,

Is there a good tutorial page out there that explains Flink's 
logging?  Specifically, I am seeing hung jobs and would like to 
understand more about what is causing them to hang.  I would also 
like more details about the checkpoint logging.

Cheers

Re: Good tutorial troubleshoot and reading logs

Posted by Robert Metzger <rm...@apache.org>.
Hi Noah,

Sadly, there's no generic guide on how to approach Flink logs.
What exactly do you mean by "the job hangs"?
Did you verify via the metrics that it is not making any progress anymore
at all? If so, are all operators affected, or just some?

If your Flink cluster really is stuck, and you are certain that the sources
are receiving data, then I'd suggest taking a thread dump of some
TaskManagers to see where they are stuck.
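
For readers who have not looked at a thread dump before, here is a minimal
sketch of what one contains: each live thread's name, its state, and its
current stack. It uses only the standard java.lang.management API and dumps
the threads of the JVM it runs in. The class name ThreadDumpSketch is made up
for illustration and is not part of Flink. For a real TaskManager you would
more likely point the JDK's jstack tool at the TaskManager process ID rather
than run code like this.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration only; not part of Flink.
public class ThreadDumpSketch {

    /** Returns a textual dump of all live threads in the current JVM. */
    public static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // (false, false): skip locked-monitor and synchronizer details,
        // which not every JVM supports.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            // The stack frames show where the thread currently is.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

When reading such a dump for a stuck job, threads that stay in BLOCKED or
WAITING with the same stack across several dumps taken a few seconds apart
are usually the ones worth a closer look.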

Best,
Robert


