Posted to issues@flink.apache.org by "Piotr Nowojski (Jira)" <ji...@apache.org> on 2021/01/12 19:50:00 UTC

[jira] [Closed] (FLINK-20852) Enrich back pressure stats per subtask in the WebUI

     [ https://issues.apache.org/jira/browse/FLINK-20852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr Nowojski closed FLINK-20852.
----------------------------------
    Resolution: Fixed

Merged to master as 56314b72123..f43fc042909.

In the end I decided not to include the {{inPoolUsage}} and {{outPoolUsage}} metrics, partly because they (at least currently) have a problem with ignoring local input channels.

However, my main reason is that after trying it out, I was a bit confused by the presence of those two metrics, displayed as raw numbers right beside {{backPressuredTimeMsPerSecond}}, {{idleTimeMsPerSecond}} and {{busyTimeMsPerSecond}}. That's because the pool usage metrics are sampled (once every 10 seconds), so they present only a single momentary reading, while the {{***MsPerSecond}} metrics are time based and present an average over the last couple of seconds. If the back pressure, pool usage values or task load change frequently, the pool usage metrics would jump around randomly, sometimes oscillating between 0% and 100%, while {{***MsPerSecond}} would be much more stable: load spikes shorter than 10 seconds would be averaged out, while load spikes longer than 10 seconds would show up as gradual changes. To avoid confusing non-power users, pool usage (or other sampled metrics) would need to be sampled more frequently and presented as a chart over time.
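
To illustrate the difference, here is a rough, self-contained simulation. It is not Flink code: the load pattern (3 seconds fully back pressured, 3 seconds idle), the 5-second averaging window and all function names are assumptions made up for this sketch. Only the 10-second sampling period comes from the description above.

```python
def pool_usage(t_ms: int) -> float:
    """Hypothetical instantaneous pool usage: 100% for 3 s, then 0% for 3 s."""
    return 1.0 if (t_ms // 3000) % 2 == 0 else 0.0


def sampled_readings(duration_s: int, period_s: int = 10) -> list[float]:
    """Sampled metric: one instantaneous reading every `period_s` seconds."""
    return [pool_usage(t_s * 1000) for t_s in range(0, duration_s, period_s)]


def averaged_ms_per_second(duration_s: int, window_s: int = 5,
                           report_every_s: int = 10) -> list[float]:
    """Time-based metric: back-pressured milliseconds per second,
    averaged over the trailing `window_s` seconds (assumed window size)."""
    readings = []
    for end_s in range(window_s, duration_s + 1, report_every_s):
        start_ms = (end_s - window_s) * 1000
        # Count every millisecond in the window that was back pressured.
        pressured_ms = sum(pool_usage(t) for t in range(start_ms, end_s * 1000))
        readings.append(pressured_ms / window_s)
    return readings


if __name__ == "__main__":
    print("sampled pool usage:     ", sampled_readings(60))
    print("averaged ms per second: ", averaged_ms_per_second(60))
```

With this made-up load, the sampled readings flip between 0.0 and 1.0 depending on where each sample happens to land, while the averaged values stay within a narrow band around 500 ms per second.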

> Enrich back pressure stats per subtask in the WebUI
> ---------------------------------------------------
>
>                 Key: FLINK-20852
>                 URL: https://issues.apache.org/jira/browse/FLINK-20852
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Web Frontend
>    Affects Versions: 1.12.0
>            Reporter: Piotr Nowojski
>            Assignee: Piotr Nowojski
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.13.0
>
>
> We can enrich the back pressure tab in the WebUI with a couple more metrics that can help us diagnose the problem, like:
> * backPressuredTimeMsPerSecond
> * busyTimeMsPerSecond
> * idleTimeMsPerSecond
> * -inPoolUsage-
> * -outPoolUsage-


