Posted to dev@flink.apache.org by Gyula Fóra <gy...@apache.org> on 2015/12/07 09:45:42 UTC

Monitoring backpressure

Hey guys,

Is there any way to monitor the backpressure in the Flink job? I find it
hard to debug slow operators because of the backpressure mechanism so it
would be good to get some info out of the network layer on what exactly
caused the backpressure.

For example:

task1 -> task2 -> task3 -> task4

I want to figure out whether task 2 or task 3 is slow.

Any ideas?

Thanks,
Gyula

Re: Monitoring backpressure

Posted by Gyula Fóra <gy...@gmail.com>.
Thanks Stephan,

I will try with the profiler for now.

Gyula

Stephan Ewen <se...@apache.org> wrote (on Mon, Dec 7, 2015, 10:51):

> I have discussed this quite a bit with other people.
>
> It is not totally straightforward. One could try and measure exhaustion of
> the output buffer pools, but that fluctuates a lot - it would need some
> work to get a stable metric from that...
>
> If you have a profiler that you can attach to the processes, you could
> check whether a lot of time is spent within the "requestBufferBlocking()"
> method of the buffer pool...
>
> Stephan
>
>
> On Mon, Dec 7, 2015 at 9:45 AM, Gyula Fóra <gy...@apache.org> wrote:
>
> > Hey guys,
> >
> > Is there any way to monitor the backpressure in the Flink job? I find it
> > hard to debug slow operators because of the backpressure mechanism so it
> > would be good to get some info out of the network layer on what exactly
> > caused the backpressure.
> >
> > For example:
> >
> > task1 -> task2 -> task3 -> task4
> >
> > I want to figure out whether task 2 or task 3 is slow.
> >
> > Any ideas?
> >
> > Thanks,
> > Gyula
> >
>

Re: Monitoring backpressure

Posted by Chesnay Schepler <ch...@apache.org>.
Hello Alan,

The backpressure information can be retrieved from the web UI's REST API
<https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/rest_api.html>:

    /jobs/<jobid>/vertices/<vertexid>/backpressure

This will give you a JSON object that looks something like this:

    {
        status: "ok"|"deprecated"
        backpressure-level: "ok"|"low"|"high"
        end-timestamp: <timestamp>
        subtasks: [
            {
                subtask: 0
                backpressure-level: "ok"|"low"|"high"
                ratio: <ratio>
            },
        ]
    }

For more details you can check out the JobVertexBackPressureHandler class.
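
As a rough illustration of how a monitoring client might consume this endpoint, here is a minimal Python sketch. The base URL (localhost:8081) and the helper names are assumptions made for this example, and the field names are taken from the response outline in this message rather than checked against a running cluster.

```python
import json
from urllib.request import urlopen

# Assumption: the web UI is reachable here; adjust for your setup.
BASE_URL = "http://localhost:8081"


def backpressure_url(job_id, vertex_id, base_url=BASE_URL):
    """Build the backpressure endpoint URL for one job vertex."""
    return f"{base_url}/jobs/{job_id}/vertices/{vertex_id}/backpressure"


def parse_backpressure(payload):
    """Return (vertex_level, {subtask_index: ratio}) from a JSON response.

    Field names ("backpressure-level", "subtasks", "subtask", "ratio")
    follow the outline in this thread and are assumed, not verified.
    """
    doc = json.loads(payload)
    ratios = {s["subtask"]: s["ratio"] for s in doc.get("subtasks", [])}
    return doc.get("backpressure-level"), ratios


def fetch_backpressure(job_id, vertex_id, base_url=BASE_URL):
    """Query the endpoint once and parse the result (needs a live cluster)."""
    url = backpressure_url(job_id, vertex_id, base_url)
    with urlopen(url, timeout=10) as resp:
        return parse_backpressure(resp.read().decode("utf-8"))
```

Only fetch_backpressure touches the network; parse_backpressure can be exercised offline with a saved response.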

Regards,
Chesnay
On 07.12.2016 00:58, alan@opsclarity.com wrote:
> Hey Stephan,
>
> My company (OpsClarity) is building a monitoring integration for Flink.
> Since backpressure is one of the most critical concepts in a streaming
> system, we need a way to expose backpressure state to a monitoring system
> (such as ours). I see that the Flink UI has a way to sample the pipeline
> and mark a stage as high|med|ok with respect to backpressure. I'd love to
> be able to encode that as a metric, perhaps as simple as 2|1|0, so that we
> can plot backpressure state over time per stage. This would also allow
> users to set alerts on backpressure state.
>
> How might we get access to this backpressure state information?
>
> Thanks,
> Alan


Re: Monitoring backpressure

Posted by "alan@opsclarity.com" <al...@opsclarity.com>.
Hey Stephan,

My company (OpsClarity) is building a monitoring integration for Flink.
Since backpressure is one of the most critical concepts in a streaming
system, we need a way to expose backpressure state to a monitoring system
(such as ours). I see that the Flink UI has a way to sample the pipeline
and mark a stage as high|med|ok with respect to backpressure. I'd love to
be able to encode that as a metric, perhaps as simple as 2|1|0, so that we
can plot backpressure state over time per stage. This would also allow
users to set alerts on backpressure state.
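
To make the 2|1|0 idea concrete, here is a small Python sketch of one possible encoding and a simple alert rule. The level names ("ok", "low", "high") are the ones shown in the REST response elsewhere in this thread; the function names, the alert threshold, and the "three consecutive samples" rule are assumptions made up for this illustration.

```python
# Assumed mapping from backpressure level to a plottable gauge value.
LEVEL_TO_METRIC = {"ok": 0, "low": 1, "high": 2}


def encode_level(level):
    """Map a backpressure level string to 0, 1, or 2."""
    key = level.strip().lower()
    if key not in LEVEL_TO_METRIC:
        raise ValueError(f"unknown backpressure level: {level!r}")
    return LEVEL_TO_METRIC[key]


def should_alert(history, threshold=2, min_consecutive=3):
    """Return True if the last `min_consecutive` samples all reach `threshold`.

    Requiring several consecutive samples is a hypothetical way to avoid
    alerting on a single noisy reading.
    """
    recent = history[-min_consecutive:]
    return len(recent) == min_consecutive and all(v >= threshold for v in recent)
```

A per-stage time series of encode_level values could then be plotted directly, with should_alert applied to the most recent samples.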

How might we get access to this backpressure state information?

Thanks,
Alan



--
View this message in context: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Monitoring-backpressure-tp9472p14868.html
Sent from the Apache Flink Mailing List archive at Nabble.com.

Re: Monitoring backpressure

Posted by Stephan Ewen <se...@apache.org>.
I have discussed this quite a bit with other people.

It is not totally straightforward. One could try and measure exhaustion of
the output buffer pools, but that fluctuates a lot - it would need some
work to get a stable metric from that...

If you have a profiler that you can attach to the processes, you could
check whether a lot of time is spent within the "requestBufferBlocking()"
method of the buffer pool...
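
One way to turn the profiler idea into a number is to take repeated stack samples per task and compute the fraction in which a frame mentions "requestBufferBlocking". The Python sketch below assumes each sample is simply a list of frame strings; that representation, the function names, and the 0.5 threshold are hypothetical choices for this illustration. It also applies a simple heuristic to the task1 -> task2 -> task3 -> task4 example: a task blocked on output buffers is waiting for something downstream, so the suspect is the task right after the last blocked one.

```python
def blocked_ratio(stack_samples, method="requestBufferBlocking"):
    """Fraction of samples with at least one frame mentioning `method`."""
    if not stack_samples:
        return 0.0
    blocked = sum(
        1 for frames in stack_samples if any(method in frame for frame in frames)
    )
    return blocked / len(stack_samples)


def first_slow_task(ratios, threshold=0.5):
    """Guess the slow task from (task_name, blocked_ratio) pairs in pipeline order.

    Returns the name of the task directly downstream of the last task whose
    ratio exceeds `threshold`, or None if no task is blocked or the last
    blocked task has no successor.
    """
    last_blocked = -1
    for i, (_, ratio) in enumerate(ratios):
        if ratio > threshold:
            last_blocked = i
    if last_blocked == -1 or last_blocked + 1 >= len(ratios):
        return None
    return ratios[last_blocked + 1][0]
```

For instance, if task1 and task2 spend most samples blocked while task3 and task4 do not, first_slow_task points at task3. Averaging over many samples is also one way to damp the fluctuation mentioned above.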

Stephan


On Mon, Dec 7, 2015 at 9:45 AM, Gyula Fóra <gy...@apache.org> wrote:

> Hey guys,
>
> Is there any way to monitor the backpressure in the Flink job? I find it
> hard to debug slow operators because of the backpressure mechanism so it
> would be good to get some info out of the network layer on what exactly
> caused the backpressure.
>
> For example:
>
> task1 -> task2 -> task3 -> task4
>
> I want to figure out whether task 2 or task 3 is slow.
>
> Any ideas?
>
> Thanks,
> Gyula
>