Posted to issues@flink.apache.org by "Piotr Nowojski (Jira)" <ji...@apache.org> on 2019/11/13 13:12:00 UTC

[jira] [Commented] (FLINK-14712) Add NetWork metric for IOMetricsInfo

    [ https://issues.apache.org/jira/browse/FLINK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973329#comment-16973329 ] 

Piotr Nowojski commented on FLINK-14712:
----------------------------------------

First and foremost, there is an ongoing effort to re-implement the back pressure monitor (https://issues.apache.org/jira/browse/FLINK-14472). It still uses the same principle as the current thread sampling: back pressure is reported if the {{LocalBufferPool}} is exhausted (no buffer is available). However, it checks this in a more lightweight and programmatic way.
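
A minimal sketch of the availability-based principle follows. It is not the FLINK-14472 code. It assumes a hypothetical {{isAvailable}} probe that reports whether the task's output buffer pool currently has a free buffer. The class {{BackPressureSampler}} and its method {{sampleRatio}} are illustrative names, not Flink API. The monitor polls the probe a fixed number of times and reports the fraction of samples in which the pool was exhausted. No stack traces are taken.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch (not Flink API): estimates back pressure by polling
 * whether the output buffer pool has a free buffer, instead of sampling
 * thread stack traces.
 */
final class BackPressureSampler {

    private BackPressureSampler() {}

    /**
     * Polls {@code isAvailable} {@code numSamples} times and returns the
     * fraction of samples (0.0 to 1.0) in which the pool was exhausted.
     */
    static double sampleRatio(BooleanSupplier isAvailable, int numSamples) {
        if (numSamples <= 0) {
            throw new IllegalArgumentException("numSamples must be positive");
        }
        int exhausted = 0;
        for (int i = 0; i < numSamples; i++) {
            // An exhausted pool means the task would have to wait for an output buffer.
            if (!isAvailable.getAsBoolean()) {
                exhausted++;
            }
        }
        return (double) exhausted / numSamples;
    }
}
```

A real monitor would space the samples out in time. This sketch only shows that the check is a cheap boolean query rather than a thread dump.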

Switching the back pressure monitor to rely on the metrics that you proposed might cause some issues. I think those metrics are still not as reliable as checking the state of the {{LocalBufferPool}}. For example, some of the metrics do not work for {{LocalInputChannel}}s. However, using the metrics the way we described in the blog post can indeed give you more information (for example, the "input buffers full, output buffers empty" case).
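
The "input buffers full, output buffers empty" case can be written as a simple rule. The following is a minimal sketch under stated assumptions. The usage values are taken as ratios between 0.0 and 1.0. The names {{BufferStatus}} and {{BufferUsageClassifier}} are hypothetical. The 0.9 and 0.1 cut-offs are illustrative assumptions, not values defined by Flink. As noted above, input-side metrics may not cover {{LocalInputChannel}}s, so this rule is a hint, not a replacement for the {{LocalBufferPool}} check.

```java
/** Hypothetical classification of a subtask from its buffer-usage metrics. */
enum BufferStatus {
    /** Output buffers (nearly) full: back pressured by a downstream task. */
    BACK_PRESSURED,
    /** Input buffers full but output buffers empty: likely the source of back pressure. */
    LIKELY_BACK_PRESSURE_SOURCE,
    /** Neither pattern applies. */
    OK
}

final class BufferUsageClassifier {

    // Illustrative thresholds (assumptions, not Flink defaults).
    static final float HIGH = 0.9f;
    static final float LOW = 0.1f;

    private BufferUsageClassifier() {}

    /**
     * @param inputUsage   estimated input buffer usage of the subtask (0.0 to 1.0)
     * @param outPoolUsage estimated output buffer pool usage of the subtask (0.0 to 1.0)
     */
    static BufferStatus classify(float inputUsage, float outPoolUsage) {
        if (outPoolUsage >= HIGH) {
            // Full output buffers: the subtask cannot hand off data fast enough.
            return BufferStatus.BACK_PRESSURED;
        }
        if (inputUsage >= HIGH && outPoolUsage <= LOW) {
            // Data piles up at the input while the output is idle: the subtask itself is slow.
            return BufferStatus.LIKELY_BACK_PRESSURE_SOURCE;
        }
        return BufferStatus.OK;
    }
}
```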

Or do you mean that we could combine the two methods? That is, instead of replacing the current back pressure detection mechanism, we would add information about buffer usage based on the metrics.

> Add NetWork metric for IOMetricsInfo
> ------------------------------------
>
>                 Key: FLINK-14712
>                 URL: https://issues.apache.org/jira/browse/FLINK-14712
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Metrics, Runtime / Network, Runtime / REST
>            Reporter: lining
>            Priority: Major
>         Attachments: image-2019-11-12-14-30-16-130.png
>
>
> h4. (1) The current monitor is heavy-weight. 
>  *   Backpressure monitoring works by repeatedly taking stack trace samples of your running tasks.
> h4. (2) It is difficult to find out which vertex is the source of backpressure.
>  * Users need to know the network metrics of both the current vertex and its upstream vertices to judge whether the current vertex is the source of backpressure. Currently, users have to record this information themselves.
> h3. Proposed Changes
> Update IOMetricsInfo to add outPoolUsage, inputExclusiveBuffersUsage and inputFloatingBuffersUsage:
> {code:java}
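> public final class IOMetricsInfo {
>     // Estimated usage of the output buffer pool (0.0 to 1.0).
>     private final float outPoolUsage;
>     // Estimated usage of the exclusive input buffers (0.0 to 1.0).
>     private final float inputExclusiveBuffersUsage;
>     // Estimated usage of the floating input buffers (0.0 to 1.0).
>     private final float inputFloatingBuffersUsage;
> }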
> {code}
> In JobDetailsInfo.JobVertexDetailsInfo, merge the subtask values using Math.max (note: outPoolUsage is taken from upstream).
> According to the attached image: !image-2019-11-12-14-30-16-130.png!
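
The proposed vertex-level merge can be sketched as follows, assuming each subtask reports the three usage values as ratios between 0.0 and 1.0. The {{BufferUsage}} holder and its {{mergeMax}} helper are hypothetical names, not Flink's IOMetricsInfo or JobVertexDetailsInfo API. Each field of the vertex summary is the maximum over its subtasks, so a single saturated subtask is not averaged away. The note that outPoolUsage is taken from upstream is not modelled here.

```java
import java.util.List;

/** Hypothetical holder for the three proposed buffer-usage values (0.0 to 1.0). */
final class BufferUsage {
    final float outPoolUsage;
    final float inputExclusiveBuffersUsage;
    final float inputFloatingBuffersUsage;

    BufferUsage(float outPoolUsage, float inputExclusiveBuffersUsage, float inputFloatingBuffersUsage) {
        this.outPoolUsage = outPoolUsage;
        this.inputExclusiveBuffersUsage = inputExclusiveBuffersUsage;
        this.inputFloatingBuffersUsage = inputFloatingBuffersUsage;
    }

    /**
     * Merges per-subtask values into one vertex-level value by taking the
     * maximum of each field. An empty list yields all zeros.
     */
    static BufferUsage mergeMax(List<BufferUsage> subtasks) {
        float out = 0f;
        float exclusive = 0f;
        float floating = 0f;
        for (BufferUsage s : subtasks) {
            out = Math.max(out, s.outPoolUsage);
            exclusive = Math.max(exclusive, s.inputExclusiveBuffersUsage);
            floating = Math.max(floating, s.inputFloatingBuffersUsage);
        }
        return new BufferUsage(out, exclusive, floating);
    }
}
```

Taking the maximum rather than the average matches the goal of locating back pressure. One subtask with full buffers is enough to slow down the whole pipeline.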



--
This message was sent by Atlassian Jira
(v8.3.4#803005)