Posted to issues@flink.apache.org by "Yingjie Cao (Jira)" <ji...@apache.org> on 2022/07/15 06:01:00 UTC

[jira] [Created] (FLINK-28561) Merge subpartition shuffle data read request for better sequential IO

Yingjie Cao created FLINK-28561:
-----------------------------------

             Summary: Merge subpartition shuffle data read request for better sequential IO
                 Key: FLINK-28561
                 URL: https://issues.apache.org/jira/browse/FLINK-28561
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Network
            Reporter: Yingjie Cao
             Fix For: 1.17.0


Currently, the shuffle data of each subpartition for blocking shuffle is read separately. To achieve better performance and reduce IOPS, we can merge consecutive data requests for the same file and serve them in one IO request. More specifically,

1) If multiple data requests read the same data (for example, broadcast data), the reader will read the data only once and send the same piece of data to multiple downstream consumers.

2) If multiple data requests read consecutive data in one file, we will merge those requests into one large request and read a larger block of data sequentially, which is good for file IO performance.
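
The two cases above can be illustrated with a minimal sketch. It is not the actual Flink implementation: the ReadRequestMerger and Range names are hypothetical, and it assumes each request can be described as a byte range (offset, length) within a single shuffle data file. Identical ranges (case 1) and adjacent or overlapping ranges (case 2) collapse into one range that a single sequential read can serve.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, for illustration only; not part of Flink. */
class ReadRequestMerger {

    /** Assumed shape of a read request: bytes [offset, offset + length) of one file. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }
    }

    /**
     * Merges ranges that are identical, overlapping, or directly adjacent,
     * so that each resulting range can be served by one sequential read.
     */
    static List<Range> merge(List<Range> requests) {
        // Sort a copy by start offset so mergeable ranges become neighbours.
        List<Range> sorted = new ArrayList<>(requests);
        sorted.sort(Comparator.comparingLong((Range r) -> r.offset));

        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                // Starts inside or right at the end of the previous range: extend it.
                if (r.offset <= last.end()) {
                    long newEnd = Math.max(last.end(), r.end());
                    merged.set(merged.size() - 1, new Range(last.offset, newEnd - last.offset));
                    continue;
                }
            }
            // A gap in the file: start a new merged range.
            merged.add(r);
        }
        return merged;
    }
}
```

For example, requests for (0, 100), (0, 100), (100, 50) and (400, 10) would become two reads: one covering bytes 0 to 150 and one covering bytes 400 to 410. A real reader would additionally need to track which downstream consumers receive which slice of each merged read, which this sketch leaves out.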



--
This message was sent by Atlassian Jira
(v8.20.10#820010)