Posted to reviews@spark.apache.org by "shuwang21 (via GitHub)" <gi...@apache.org> on 2023/08/05 04:15:39 UTC

[GitHub] [spark] shuwang21 commented on a diff in pull request #41489: [SPARK-43987][Shuffle] Separate finalizeShuffleMerge Processing to Dedicated Thread Pools

shuwang21 commented on code in PR #41489:
URL: https://github.com/apache/spark/pull/41489#discussion_r1284961300


##########
common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java:
##########
@@ -324,6 +326,33 @@ public boolean separateChunkFetchRequest() {
     return conf.getInt("spark.shuffle.server.chunkFetchHandlerThreadsPercent", 0) > 0;
   }
 
+  /**
+   * Percentage of io.serverThreads used by netty to process FinalizeShuffleMerge. When the config
+   * `spark.shuffle.server.finalizeShuffleMergeThreadsPercent` is set, shuffle server will use a
+   * separate EventLoopGroup to process FinalizeShuffleMerge messages, which are I/O intensive and
+   * could take long time to process due to disk contentions. The number of threads used for handling
+   * finalizeShuffleMerge requests are percentage of io.serverThreads (if defined) else it is a
+   * percentage of 2 * #cores.
+   */
+  public int finalizeShuffleMergeHandlerThreads() {
+    if (!this.getModuleName().equalsIgnoreCase("shuffle")) {
+      return 0;
+    }
+    int finalizeShuffleMergeThreadsPercent =
+        Integer.parseInt(conf.get("spark.shuffle.server.finalizeShuffleMergeThreadsPercent"));

Review Comment:
   I think this is already handled by the existing `separateFinalizeShuffleMerge` check.
   1. When `spark.shuffle.server.finalizeShuffleMergeThreadsPercent` is missing, `separateFinalizeShuffleMerge` returns false, so `finalizeShuffleMergeHandlerThreads` is never called to create the finalize handler.
   2. The finalize handler is created only when `spark.shuffle.server.finalizeShuffleMergeThreadsPercent` is greater than 0.
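
The guard described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the PR's code: the class `FinalizeConfSketch`, the helper `finalizeThreadsOrZero`, and the use of a plain `Map` in place of Spark's config provider are all hypothetical. The module-name check from the diff is omitted, and the `Math.ceil` rounding and the default of 0 in the guard are assumptions.

```java
import java.util.Map;

// Hypothetical stand-in for TransportConf, backed by a plain Map (assumption).
public class FinalizeConfSketch {
  static final String KEY = "spark.shuffle.server.finalizeShuffleMergeThreadsPercent";

  // Guard: true only when the percentage is present and greater than 0.
  // A missing key falls back to "0" here (assumed default).
  static boolean separateFinalizeShuffleMerge(Map<String, String> conf) {
    return Integer.parseInt(conf.getOrDefault(KEY, "0")) > 0;
  }

  // Parses the key without a default, as in the diff. In this sketch a missing
  // key makes Integer.parseInt(null) throw NumberFormatException, so the
  // method is only safe to call behind the guard above.
  static int finalizeShuffleMergeHandlerThreads(Map<String, String> conf, int serverThreads) {
    int percent = Integer.parseInt(conf.get(KEY));
    // Use serverThreads if defined, else 2 * #cores, per the Javadoc in the diff.
    int base = serverThreads > 0
        ? serverThreads
        : 2 * Runtime.getRuntime().availableProcessors();
    // Rounding up is an assumption of this sketch.
    return (int) Math.ceil(base * (percent / 100.0));
  }

  // Hypothetical caller: consult the guard first, then size the pool.
  static int finalizeThreadsOrZero(Map<String, String> conf, int serverThreads) {
    return separateFinalizeShuffleMerge(conf)
        ? finalizeShuffleMergeHandlerThreads(conf, serverThreads)
        : 0;
  }

  public static void main(String[] args) {
    // Key missing: the guard is false, so the parsing method is never reached.
    System.out.println(finalizeThreadsOrZero(Map.of(), 8));          // prints 0
    // 50% of 8 server threads.
    System.out.println(finalizeThreadsOrZero(Map.of(KEY, "50"), 8)); // prints 4
  }
}
```

In this sketch the unguarded parse stays safe only as long as every caller goes through the guard first; a direct call with the key unset would throw.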


