Posted to issues@flink.apache.org by "Yingjie Cao (Jira)" <ji...@apache.org> on 2022/07/25 06:38:00 UTC

[jira] [Created] (FLINK-28663) Allow multiple downstream consumer job vertices sharing the same intermediate dataset at scheduler side

Yingjie Cao created FLINK-28663:
-----------------------------------

             Summary: Allow multiple downstream consumer job vertices sharing the same intermediate dataset at scheduler side
                 Key: FLINK-28663
                 URL: https://issues.apache.org/jira/browse/FLINK-28663
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Coordination
            Reporter: Yingjie Cao


Currently, an intermediate dataset can be consumed by only one downstream consumer vertex. If multiple consumer vertices consume the same output of the same upstream vertex, multiple intermediate datasets are produced. We can optimize this behavior to produce a single intermediate dataset that is shared by all of those consumer vertices. As the first step, we should allow multiple downstream consumer job vertices to share the same intermediate dataset on the scheduler side. (Note that this optimization only works for blocking shuffle, because a pipelined shuffle result partition cannot be consumed multiple times.)
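
To illustrate the proposal, here is a minimal sketch of a dataset that tracks a list of consumers instead of a single one. All class and method names (SketchVertex, SketchDataSet, SketchShuffleType, addConsumer) are hypothetical and are not Flink's actual scheduler classes; the rule that a pipelined result accepts only one consumer is modeled directly from the note above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model for illustration only; not Flink's real API.
enum SketchShuffleType { BLOCKING, PIPELINED }

final class SketchVertex {
    final String name;
    // Datasets produced by this vertex.
    final List<SketchDataSet> outputs = new ArrayList<>();

    SketchVertex(String name) {
        this.name = name;
    }
}

final class SketchDataSet {
    final SketchVertex producer;
    final SketchShuffleType type;
    // A list of consumers rather than a single consumer field.
    private final List<SketchVertex> consumers = new ArrayList<>();

    SketchDataSet(SketchVertex producer, SketchShuffleType type) {
        this.producer = producer;
        this.type = type;
        producer.outputs.add(this);
    }

    void addConsumer(SketchVertex consumer) {
        // A pipelined result can be read only once, so it cannot be shared.
        if (type == SketchShuffleType.PIPELINED && !consumers.isEmpty()) {
            throw new IllegalStateException(
                    "A pipelined dataset cannot have more than one consumer");
        }
        consumers.add(consumer);
    }

    List<SketchVertex> getConsumers() {
        return Collections.unmodifiableList(consumers);
    }
}
```

With this shape, two consumers of the same upstream output register against one blocking dataset, so the producer ends up with a single output instead of two.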



--
This message was sent by Atlassian Jira
(v8.20.10#820010)