Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/07/21 12:40:00 UTC

[GitHub] [flink] reswqa opened a new pull request, #20333: [FLINK-28623][network] Optimize the use of off heap memory by blocking and hybrid shuffle reader

reswqa opened a new pull request, #20333:
URL: https://github.com/apache/flink/pull/20333

   ## What is the purpose of the change
   
   *Currently, each file reader (`PartitionedFileReader` or `HsSubpartitionFileReaderImpl`) internally allocates a `headerBuffer` of 8 bytes. In addition, `PartitionedFileReader` has a 12-byte `indexEntryBuf`. Because file readers are created at subpartition granularity, when the parallelism is very high and each TM has many slots, the total memory used by these buffers can reach the MB level. Since all file readers of the same ResultPartition read data in a single thread, it is enough to allocate one `headerBuffer` per ResultPartition and share it among that partition's readers.*
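
As a rough illustration of the arithmetic above, the following sketch compares per-reader and per-partition allocation for the sort-merge case (8-byte header buffer plus 12-byte index entry buffer). The partition and subpartition counts are hypothetical values chosen for the example, not measurements from a real job, and the class name is invented for illustration.

```java
public class HeaderBufferEstimate {

    // Sizes stated in the PR description.
    static final int HEADER_BUFFER_BYTES = 8;
    static final int INDEX_ENTRY_BUFFER_BYTES = 12;

    /** Bytes used when every subpartition reader owns both buffers. */
    static long perReaderBytes(long resultPartitions, long subpartitionsPerPartition) {
        long readers = resultPartitions * subpartitionsPerPartition;
        return readers * (HEADER_BUFFER_BYTES + INDEX_ENTRY_BUFFER_BYTES);
    }

    /** Bytes used when each result partition owns one shared pair of buffers. */
    static long perPartitionBytes(long resultPartitions) {
        return resultPartitions * (HEADER_BUFFER_BYTES + INDEX_ENTRY_BUFFER_BYTES);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 32 result partitions on one TM,
        // each consumed by 5000 downstream subtasks.
        long partitions = 32;
        long subpartitions = 5000;
        System.out.println("per reader:    " + perReaderBytes(partitions, subpartitions) + " bytes");
        System.out.println("per partition: " + perPartitionBytes(partitions) + " bytes");
    }
}
```

With these assumed numbers the per-reader layout needs 3,200,000 bytes of buffer capacity, while the shared layout needs 640 bytes.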
   
   
   ## Brief change log
   
     - *Move `headerBuffer` and `indexEntryBuffer` from `PartitionedFileReader` to `SortMergeResultPartitionReadScheduler`*
     - *Move `headerBuffer` from `HsSubpartitionFileReaderImpl` to `HsResultPartitionReadScheduler`.*
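
The idea behind both moves can be sketched as follows. This is a minimal, self-contained illustration and not the actual Flink code: `SubpartitionReader`, `ReadScheduler`, and `channelOf` are hypothetical names, and the 8-byte header is simply treated as one `long`. The point is only that the scheduler owns a single direct buffer and passes it to each reader, which is safe because all readers of one partition run in the same thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.ArrayList;
import java.util.List;

public class SharedHeaderBufferSketch {

    static final int HEADER_LENGTH = 8;

    /** Hypothetical per-subpartition reader that owns no header buffer of its own. */
    static final class SubpartitionReader {
        private final ReadableByteChannel channel;

        SubpartitionReader(ReadableByteChannel channel) {
            this.channel = channel;
        }

        /** Fills the caller-supplied buffer with one header and decodes it. */
        long readHeader(ByteBuffer headerBuffer) throws IOException {
            headerBuffer.clear();
            while (headerBuffer.hasRemaining()) {
                if (channel.read(headerBuffer) < 0) {
                    throw new IOException("Unexpected end of channel");
                }
            }
            headerBuffer.flip();
            return headerBuffer.getLong();
        }
    }

    /** Hypothetical scheduler: one header buffer per result partition. */
    static final class ReadScheduler {
        private final ByteBuffer headerBuffer = ByteBuffer.allocateDirect(HEADER_LENGTH);
        private final List<SubpartitionReader> readers = new ArrayList<>();

        void register(SubpartitionReader reader) {
            readers.add(reader);
        }

        /** Reads one header from every reader, reusing the same buffer each time. */
        long[] readAllHeaders() throws IOException {
            long[] headers = new long[readers.size()];
            for (int i = 0; i < headers.length; i++) {
                headers[i] = readers.get(i).readHeader(headerBuffer);
            }
            return headers;
        }
    }

    /** Test helper: an in-memory channel holding a single 8-byte header. */
    static ReadableByteChannel channelOf(long value) {
        byte[] bytes = ByteBuffer.allocate(HEADER_LENGTH).putLong(value).array();
        return Channels.newChannel(new ByteArrayInputStream(bytes));
    }

    static long[] demo() {
        ReadScheduler scheduler = new ReadScheduler();
        scheduler.register(new SubpartitionReader(channelOf(7L)));
        scheduler.register(new SubpartitionReader(channelOf(42L)));
        try {
            return scheduler.readAllHeaders();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (long header : demo()) {
            System.out.println(header);
        }
    }
}
```

Sharing is only valid under the single-thread assumption stated in the description; if readers of one partition ran concurrently, each would need its own buffer or external synchronization.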
   
   
   ## Verifying this change
   
   
   *This change is already covered by existing tests.*
   
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature?  no
     - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] wsry commented on pull request #20333: [FLINK-28623][network] Optimize the use of off heap memory by blocking and hybrid shuffle reader

Posted by GitBox <gi...@apache.org>.
wsry commented on PR #20333:
URL: https://github.com/apache/flink/pull/20333#issuecomment-1209096499

   @reswqa Thanks for the update. LGTM. Will merge after CI gives green.




[GitHub] [flink] reswqa commented on a diff in pull request #20333: [FLINK-28623][network] Optimize the use of off heap memory by blocking and hybrid shuffle reader

Posted by GitBox <gi...@apache.org>.
reswqa commented on code in PR #20333:
URL: https://github.com/apache/flink/pull/20333#discussion_r940855012


##########
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/SortMergeResultPartitionReadScheduler.java:
##########
@@ -70,6 +71,17 @@ class SortMergeResultPartitionReadScheduler implements Runnable, BufferRecycler
      */
     private static final Duration DEFAULT_BUFFER_REQUEST_TIMEOUT = Duration.ofMinutes(5);
 
+    /** Used to read buffers from file channel. */

Review Comment:
   fixed.





[GitHub] [flink] wsry commented on a diff in pull request #20333: [FLINK-28623][network] Optimize the use of off heap memory by blocking and hybrid shuffle reader

Posted by GitBox <gi...@apache.org>.
wsry commented on code in PR #20333:
URL: https://github.com/apache/flink/pull/20333#discussion_r940808314


##########
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/SortMergeResultPartitionReadScheduler.java:
##########
@@ -70,6 +71,17 @@ class SortMergeResultPartitionReadScheduler implements Runnable, BufferRecycler
      */
     private static final Duration DEFAULT_BUFFER_REQUEST_TIMEOUT = Duration.ofMinutes(5);
 
+    /** Used to read buffers from file channel. */

Review Comment:
   buffers -> buffer headers





[GitHub] [flink] reswqa commented on pull request #20333: [FLINK-28623][network] Optimize the use of off heap memory by blocking and hybrid shuffle reader

Posted by GitBox <gi...@apache.org>.
reswqa commented on PR #20333:
URL: https://github.com/apache/flink/pull/20333#issuecomment-1203410969

   @flinkbot run azure




[GitHub] [flink] wsry closed pull request #20333: [FLINK-28623][network] Optimize the use of off heap memory by blocking and hybrid shuffle reader

Posted by GitBox <gi...@apache.org>.
wsry closed pull request #20333: [FLINK-28623][network] Optimize the use of off heap memory by blocking and hybrid shuffle reader
URL: https://github.com/apache/flink/pull/20333




[GitHub] [flink] wsry commented on pull request #20333: [FLINK-28623][network] Optimize the use of off heap memory by blocking and hybrid shuffle reader

Posted by GitBox <gi...@apache.org>.
wsry commented on PR #20333:
URL: https://github.com/apache/flink/pull/20333#issuecomment-1208822800

   @reswqa Thanks for the change. I left a minor comment. Please rebase master and fix the conflicts.




[GitHub] [flink] flinkbot commented on pull request #20333: [FLINK-28623][network] Optimize the use of off heap memory by blocking and hybrid shuffle reader

Posted by GitBox <gi...@apache.org>.
flinkbot commented on PR #20333:
URL: https://github.com/apache/flink/pull/20333#issuecomment-1191441650

   ## CI report:
   
   * 3a35e9801c59656845be8eac49e5cf6d3edb71eb UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>

