You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Gyula Fora (Jira)" <ji...@apache.org> on 2023/04/17 16:04:00 UTC

[jira] [Commented] (FLINK-31827) Incorrect estimation of the target data rate of a vertex when only a subset of its upstream vertex's output is consumed

    [ https://issues.apache.org/jira/browse/FLINK-31827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713181#comment-17713181 ] 

Gyula Fora commented on FLINK-31827:
------------------------------------

The same issue can also be easily reproduced with side outputs. The main problem is that Flink does not provide target jobvertex level in/out record metrics only “aggregated” ones. 

I have a prototype fix that would add the missing metrics on the Flink side and then the autoscaler can be improved to use that information from Flink 1.18 and later

> Incorrect estimation of the target data rate of a vertex when only a subset of its upstream vertex's output is consumed
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-31827
>                 URL: https://issues.apache.org/jira/browse/FLINK-31827
>             Project: Flink
>          Issue Type: Bug
>          Components: Autoscaler
>            Reporter: Zhanghao Chen
>            Priority: Major
>         Attachments: image-2023-04-17-23-37-35-280.png
>
>
> Currently, the target data rate of a vertex = SUM(target data rate * input/output ratio) for all of its upstream vertices. This assumes that all output records of an upstream vertex is consumed by the downstream vertex. However, it does not always hold. Consider the following job plan generated by a Flink SQL job. The middle vertex contains multiple chained Calc(select xx) operators, each connecting to a separate downstream sink tasks. As a result, each sink task only consumes a sub-portion of the middle vertex's output.
> To fix it, we need operator level edge info to infer the upstream-downstream relationship as well as operator level output metrics. The metrics part is easy but AFAIK, there's no way to get the operator level edge info from the Flink REST API yet.
> !image-2023-04-17-23-37-35-280.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)