Posted to issues@flink.apache.org by "Barisa (JIRA)" <ji...@apache.org> on 2019/02/04 11:29:00 UTC

[jira] [Commented] (FLINK-3310) Add back pressure statistics to web frontend

    [ https://issues.apache.org/jira/browse/FLINK-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16759781#comment-16759781 ] 

Barisa commented on FLINK-3310:
-------------------------------

Hi, is the back pressure sampling operation expensive?

I'm asking because we are considering polling this information once a minute and exposing it as a Prometheus metric.

 

The same question was already asked on the user mailing list (quoted below):

[http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Continuous-Monitoring-of-back-pressure-tt25869.html]

 

I'm currently writing some code to convert the back-pressure REST API data into Prometheus-compatible output. I was just curious why back-pressure wasn't already exposed as a metric in the in-built Prometheus exporter? Is it because the thread-sampling is too intensive? Or too slow (particularly if running multiple jobs)? In our case we're running a single job per cluster. Any feedback would be appreciated.
Regards,
Dave
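
For illustration, the conversion described above could look roughly like the sketch below. It is only a sketch under assumptions: the REST path /jobs/<jobid>/vertices/<vertexid>/backpressure and the JSON fields "subtasks", "subtask" and "ratio" are my reading of the Flink REST API and should be checked against the version in use. The metric name flink_vertex_backpressure_ratio and the helper names are made up for this example.

```python
import json
import urllib.request


def fetch_back_pressure(base_url, job_id, vertex_id, timeout_s=5.0):
    """Fetch back pressure stats for one job vertex.

    ASSUMPTION: the endpoint path below matches the Flink REST API of
    the running cluster; verify it against your Flink version.
    """
    url = f"{base_url}/jobs/{job_id}/vertices/{vertex_id}/backpressure"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.load(resp)


def to_prometheus(job_id, vertex_id, payload,
                  metric="flink_vertex_backpressure_ratio"):
    """Render the per-subtask ratios in Prometheus text exposition format.

    ASSUMPTION: `payload` has a "subtasks" list whose items carry
    "subtask" (index) and "ratio" (0.0 to 1.0). The metric name is
    hypothetical.
    """
    lines = [f"# TYPE {metric} gauge"]
    for sub in payload.get("subtasks", []):
        labels = (f'job_id="{job_id}",vertex_id="{vertex_id}",'
                  f'subtask="{sub["subtask"]}"')
        lines.append(f"{metric}{{{labels}}} {float(sub['ratio'])}")
    return "\n".join(lines) + "\n"
```

A poller would call fetch_back_pressure() once per interval for each vertex and serve the concatenated to_prometheus() output on a scrape endpoint. Whether a one-minute interval is cheap enough is exactly the open question in this comment, so the sketch says nothing about cost.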

> Add back pressure statistics to web frontend
> --------------------------------------------
>
>                 Key: FLINK-3310
>                 URL: https://issues.apache.org/jira/browse/FLINK-3310
>             Project: Flink
>          Issue Type: Improvement
>          Components: Webfrontend
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>            Priority: Minor
>             Fix For: 1.0.0
>
>
> When a task receives data at a higher rate than it can process, it back pressures the preceding tasks. Currently, there is no way to tell whether this is the case. An indicator of back pressure is tasks being stuck in buffer requests on the network stack. This means that they have filled all their buffers with data, but the following tasks/network are not consuming them fast enough.
> A simple way to measure back pressure is to sample running tasks and report back pressure if they are stuck in the blocking buffer calls.
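
The sampling idea in the description can be illustrated with a small stand-alone sketch. This is not Flink's implementation: it is a Python analogy that samples a thread's stack a fixed number of times and reports the fraction of samples in which the thread was inside a given blocking function. The function name request_buffer_blocking, the sample count and the delay are all hypothetical.

```python
import sys
import threading
import time


def in_function(frame, func_name):
    """Return True if any frame on the stack is executing func_name."""
    while frame is not None:
        if frame.f_code.co_name == func_name:
            return True
        frame = frame.f_back
    return False


def sample_back_pressure(thread, func_name, num_samples=20, delay_s=0.005):
    """Fraction of stack samples in which `thread` was inside `func_name`."""
    hits = 0
    for _ in range(num_samples):
        # sys._current_frames() maps thread ids to their current top frame.
        frame = sys._current_frames().get(thread.ident)
        if frame is not None and in_function(frame, func_name):
            hits += 1
        time.sleep(delay_s)
    return hits / num_samples


def request_buffer_blocking(entered, release):
    """Hypothetical stand-in for a task stuck in a blocking buffer request."""
    entered.set()
    release.wait()


def demo():
    """Sample a worker that stays blocked for the whole sampling window."""
    entered, release = threading.Event(), threading.Event()
    worker = threading.Thread(target=request_buffer_blocking,
                              args=(entered, release))
    worker.start()
    entered.wait()  # make sure the worker is inside the blocking call
    ratio = sample_back_pressure(worker, "request_buffer_blocking")
    release.set()
    worker.join()
    return ratio
```

In this analogy a ratio near 1.0 means the thread spent almost all sampled moments waiting for a buffer, and a ratio near 0.0 means it was rarely blocked. The cost grows with the number of samples and sampled threads, which is why the polling frequency matters.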


