Posted to issues@flink.apache.org by "zhijiang (Jira)" <ji...@apache.org> on 2019/09/24 03:35:00 UTC

[jira] [Commented] (FLINK-12576) inputQueueLength metric does not work for LocalInputChannels

    [ https://issues.apache.org/jira/browse/FLINK-12576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936348#comment-16936348 ] 

zhijiang commented on FLINK-12576:
----------------------------------

Thanks for reporting this [~alpinegizmo]

I want to confirm two things:

1. Is the input metric you mention here {{inputQueueLength}}?

2. Have you checked whether this problem also exists before release-1.9, especially in the non-local case with 2 single-slot TMs?

This ticket previously made two main changes. The first was to take local input channels into account in the input metric ({{inputQueueLength}}). The second was that, for remote input channels, the metric value is now read without synchronization instead of inside a synchronized block. So I wonder whether this could cause a visibility issue for the metric reporter thread. However, in your testing the issue seems to be tied only to the parallelism of the back-pressured operator.
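
To make the visibility concern concrete, here is a minimal Java sketch. It is not Flink's actual code: {{SketchChannel}} and its method names are hypothetical, chosen only for illustration. It contrasts reading a queue length under the same lock that the writer uses with reading it without any synchronization, which the Java memory model does not guarantee to be up to date when called from another thread such as a metric reporter.

```java
import java.util.ArrayDeque;

/** Hypothetical stand-in for an input channel; NOT Flink's real class. */
class SketchChannel {
    private final Object lock = new Object();
    private final ArrayDeque<byte[]> receivedBuffers = new ArrayDeque<>();

    /** Called by the (hypothetical) network thread when a buffer arrives. */
    void onBuffer(byte[] buffer) {
        synchronized (lock) {
            receivedBuffers.add(buffer);
        }
    }

    /** Called by the (hypothetical) task thread to consume a buffer. */
    byte[] pollBuffer() {
        synchronized (lock) {
            return receivedBuffers.poll();
        }
    }

    /**
     * Reads the size under the writers' lock, so a reporter thread
     * observes all updates made before it acquired the lock.
     */
    int synchronizedQueueLength() {
        synchronized (lock) {
            return receivedBuffers.size();
        }
    }

    /**
     * Reads the size without the lock. This avoids contention, but there is
     * no happens-before edge with the writers, so another thread may see a
     * stale value. Clamped to 0 as a defensive measure.
     */
    int unsynchronizedQueueLength() {
        return Math.max(0, receivedBuffers.size());
    }
}

class QueueLengthSketch {
    public static void main(String[] args) {
        SketchChannel channel = new SketchChannel();
        channel.onBuffer(new byte[8]);
        channel.onBuffer(new byte[8]);
        // Single-threaded here, so both reads agree; the difference only
        // matters when the reader is a different thread than the writer.
        System.out.println(channel.synchronizedQueueLength());
        System.out.println(channel.unsynchronizedQueueLength());
    }
}
```

For a metric, an occasionally stale value is usually acceptable; a value that never updates would point to a different cause than visibility alone.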

 

> inputQueueLength metric does not work for LocalInputChannels
> ------------------------------------------------------------
>
>                 Key: FLINK-12576
>                 URL: https://issues.apache.org/jira/browse/FLINK-12576
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Metrics, Runtime / Network
>    Affects Versions: 1.6.4, 1.7.2, 1.8.0, 1.9.0
>            Reporter: Piotr Nowojski
>            Assignee: Aitozi
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently {{inputQueueLength}} ignores LocalInputChannels ({{SingleInputGate#getNumberOfQueuedBuffers}}). This can cause mistakes when looking for the causes of back pressure (if a task is back pressuring the whole Flink job, but there is a data skew and only local input channels are being used).
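
The gap described in the quoted issue can be pictured with a small Java sketch. All types here ({{SketchInputChannel}}, {{SketchRemoteChannel}}, {{SketchLocalChannel}}, {{SketchInputGate}}) are hypothetical and do not mirror Flink's real classes; the sketch only shows how a gate-level sum that skips local channels reports zero when all queued data sits in local channels.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical channel abstraction; NOT Flink's real InputChannel. */
interface SketchInputChannel {
    int queuedBuffers();
}

final class SketchRemoteChannel implements SketchInputChannel {
    private final int queued;
    SketchRemoteChannel(int queued) { this.queued = queued; }
    @Override public int queuedBuffers() { return queued; }
}

final class SketchLocalChannel implements SketchInputChannel {
    private final int queued;
    SketchLocalChannel(int queued) { this.queued = queued; }
    @Override public int queuedBuffers() { return queued; }
}

final class SketchInputGate {
    /** Sums only remote channels, i.e. the behaviour the issue describes. */
    static int remoteOnlyQueueLength(List<SketchInputChannel> channels) {
        int total = 0;
        for (SketchInputChannel channel : channels) {
            if (channel instanceof SketchRemoteChannel) {
                total += channel.queuedBuffers();
            }
        }
        return total;
    }

    /** Sums every channel, so local queues contribute to the metric too. */
    static int allChannelsQueueLength(List<SketchInputChannel> channels) {
        int total = 0;
        for (SketchInputChannel channel : channels) {
            total += channel.queuedBuffers();
        }
        return total;
    }
}

class InputGateSketch {
    public static void main(String[] args) {
        // Skewed case: the remote channel is idle, the local one is backed up.
        List<SketchInputChannel> channels = Arrays.asList(
                new SketchRemoteChannel(0), new SketchLocalChannel(7));
        System.out.println(SketchInputGate.remoteOnlyQueueLength(channels));
        System.out.println(SketchInputGate.allChannelsQueueLength(channels));
    }
}
```

In the skewed case above, the remote-only sum hides the backlog entirely, which is why the metric can mislead a back pressure investigation.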



--
This message was sent by Atlassian Jira
(v8.3.4#803005)