Posted to user@flink.apache.org by Fritz Budiyanto <fb...@icloud.com> on 2017/05/23 05:14:55 UTC

Need help debugging back pressure job

Hi All,

Any tips on debugging back pressure? I have a workload that gets stuck after running for a couple of hours.
I assume the cause of the back pressure is the block right after the one shown as having back pressure. Is this right?

Any idea how to get the backtrace? (I'm using a standalone combined jm/tm with a parallelism of 1, and the suspected block is a ProcessFunction with event timers.)

—
Fritz

Re: Need help debugging back pressure job

Posted by Till Rohrmann <tr...@apache.org>.

Hi Fritz,

you're right that back pressure should propagate upstream to the sources.
Thus, the cause of the back pressure should be the operator following the
last operator with back pressure.
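
As a rough illustration of that rule, here is a minimal Python sketch. It assumes a simple linear pipeline and that you have copied the per-operator back pressure status from the web UI by hand; the function name and the operator names in the usage note are hypothetical, not part of Flink.

```python
from typing import List, Optional, Tuple


def suspected_bottleneck(chain: List[Tuple[str, bool]]) -> Optional[str]:
    """Return the operator right after the last back-pressured one.

    `chain` lists operators in source-to-sink order, each paired with a
    flag saying whether it currently shows back pressure. This input
    format is an assumption made for illustration only.
    """
    last_bp = -1
    for i, (_, back_pressured) in enumerate(chain):
        if back_pressured:
            last_bp = i
    # No back pressure at all, or the last operator is back-pressured:
    # the rule gives no candidate in either case.
    if last_bp == -1 or last_bp + 1 >= len(chain):
        return None
    return chain[last_bp + 1][0]
```

For example, with the hypothetical chain [("Source", True), ("Map", True), ("Process", False), ("Sink", False)] the function returns "Process", which is the operator to look at first.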

In order to debug it you could take a look at the stack trace of the TM.
Simply go to the machine on which the TM runs, find out the process id via
jps and then call jstack with the respective process id.
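
The jps/jstack steps can be scripted. The following is only a sketch: it assumes jps and jstack from the JDK are on the PATH of the machine running the TM, and that the process shows up in the jps output with "TaskManager" in its name. That name is an assumption; in a combined jm/tm setup the JVM may be listed under a different name, so check the raw jps output first. The helper names are made up for this example.

```python
import subprocess
from typing import List


def find_pids(jps_output: str, name_fragment: str) -> List[int]:
    """Pick the PIDs whose jps entry contains `name_fragment`.

    jps prints one "<pid> <main class name>" pair per line.
    """
    pids = []
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[0].isdigit() and name_fragment in parts[1]:
            pids.append(int(parts[0]))
    return pids


def dump_threads(name_fragment: str = "TaskManager") -> None:
    """Run jps, then print a jstack thread dump for each matching JVM."""
    jps_out = subprocess.run(
        ["jps"], capture_output=True, text=True, check=True
    ).stdout
    for pid in find_pids(jps_out, name_fragment):
        dump = subprocess.run(
            ["jstack", str(pid)], capture_output=True, text=True, check=True
        ).stdout
        print(f"=== thread dump for PID {pid} ===")
        print(dump)
```

Calling dump_threads() a few times, some seconds apart, while the job is stuck and comparing the dumps should show whether the operator's thread is always blocked at the same place.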

Alternatively, you can try to debug the cluster remotely [1].

[1]
https://cwiki.apache.org/confluence/display/FLINK/Remote+Debugging+of+Flink+Clusters

Cheers,
Till
