Posted to issues@flink.apache.org by "Yingjie Cao (Jira)" <ji...@apache.org> on 2022/07/25 06:39:00 UTC

[jira] [Assigned] (FLINK-28663) Allow multiple downstream consumer job vertices sharing the same intermediate dataset at scheduler side

     [ https://issues.apache.org/jira/browse/FLINK-28663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yingjie Cao reassigned FLINK-28663:
-----------------------------------

    Assignee: Yingjie Cao

> Allow multiple downstream consumer job vertices sharing the same intermediate dataset at scheduler side
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28663
>                 URL: https://issues.apache.org/jira/browse/FLINK-28663
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Yingjie Cao
>            Assignee: Yingjie Cao
>            Priority: Major
>
> Currently, one intermediate dataset can only be consumed by one downstream consumer vertex. If multiple consumer vertices consume the same output of the same upstream vertex, multiple intermediate datasets are produced. We can optimize this behavior to produce only one intermediate dataset that is shared by all of those consumer vertices. As a first step, we should allow multiple downstream consumer job vertices to share the same intermediate dataset on the scheduler side. (Note that this optimization only works for blocking shuffle, because a pipelined shuffle result partition cannot be consumed multiple times.)
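The sharing rule described in the issue can be illustrated with a minimal Java sketch. All names here are hypothetical and are not Flink's actual scheduler or JobGraph API: SharedDataSetSketch, DataSet, ShuffleKind, connectOnePerConsumer and connectShared. The sketch only models two behaviors. Under the old behavior, each consumer gets its own data set. Under the proposed one, a blocking output is registered once with several consumers, while a pipelined output still needs one data set per consumer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model for illustration only; NOT Flink's real classes.
public class SharedDataSetSketch {

    enum ShuffleKind { BLOCKING, PIPELINED }

    // One intermediate data set produced by an upstream vertex.
    static final class DataSet {
        final String producer;
        final ShuffleKind kind;
        private final List<String> consumers = new ArrayList<>();

        DataSet(String producer, ShuffleKind kind) {
            this.producer = producer;
            this.kind = kind;
        }

        void addConsumer(String consumer) {
            // A pipelined result partition can be read only once, so it must not be shared.
            if (kind == ShuffleKind.PIPELINED && !consumers.isEmpty()) {
                throw new IllegalStateException(
                        "pipelined data set of " + producer + " already has a consumer");
            }
            consumers.add(consumer);
        }

        List<String> getConsumers() {
            return Collections.unmodifiableList(consumers);
        }
    }

    // Old behavior: the same output is materialized once per consumer vertex.
    static List<DataSet> connectOnePerConsumer(String producer, ShuffleKind kind, List<String> consumers) {
        List<DataSet> result = new ArrayList<>();
        for (String consumer : consumers) {
            DataSet ds = new DataSet(producer, kind);
            ds.addConsumer(consumer);
            result.add(ds);
        }
        return result;
    }

    // Proposed behavior: a blocking output is produced once and shared;
    // pipelined outputs fall back to one data set per consumer.
    static List<DataSet> connectShared(String producer, ShuffleKind kind, List<String> consumers) {
        if (kind == ShuffleKind.PIPELINED) {
            return connectOnePerConsumer(producer, kind, consumers);
        }
        DataSet shared = new DataSet(producer, kind);
        for (String consumer : consumers) {
            shared.addConsumer(consumer);
        }
        return List.of(shared);
    }

    public static void main(String[] args) {
        List<String> consumers = List.of("sinkA", "sinkB", "sinkC");
        System.out.println(connectOnePerConsumer("source", ShuffleKind.BLOCKING, consumers).size()); // 3
        System.out.println(connectShared("source", ShuffleKind.BLOCKING, consumers).size());         // 1
        System.out.println(connectShared("source", ShuffleKind.PIPELINED, consumers).size());        // 3
    }
}
```

In this sketch, the guard in addConsumer mirrors the note in the issue that a pipelined result partition cannot be consumed multiple times. Sharing is therefore an opt-in path that applies only to blocking outputs, and pipelined outputs keep the old one-per-consumer layout.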



--
This message was sent by Atlassian Jira
(v8.20.10#820010)