You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/01 06:14:27 UTC

[GitHub] [spark] Ngone51 commented on a change in pull request #34156: [WIP] [SPARK-36892] [Core] Disable batch fetch for a shuffle when push based shuffle is enabled

Ngone51 commented on a change in pull request #34156:
URL: https://github.com/apache/spark/pull/34156#discussion_r719968691



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -1448,7 +1448,7 @@ private[spark] object MapOutputTracker extends Logging {
     // TODO: improve push based shuffle to read partial merged blocks satisfying the start/end
     // TODO: map indexes
     if (mergeStatuses.exists(_.nonEmpty) && startMapIndex == 0
-      && endMapIndex == mapStatuses.length) {
+      && endMapIndex == mapStatuses.length && endPartition - startPartition == 1) {

Review comment:
       > No behavior change : other than disabling block fetch.
   
   @mridulm But this is the major concern from me now by disabling batch fetch for all the cases when PBS is enabled. Because the case like `PartialMapperPartitionSpec` can no longer use the batch fetch when PBS is enabled (which could have perf issue), while it could use in other ways:
    1) `endPartition - startPartition == 1`
    2) only disable batch fetch for the case of partition coalesce rather than all the cases
     And sound like @Victsm wants to take the way 2) as he said:
   
   > We are evaluating passing this additional information to BlockStoreShuffleReader as well so that we can properly disable batch fetch only for cases of partition coalesce but not for local shuffle read.
   
   For 2), just in case you may touch`ShuffleManager`: there're users who use `ShuffleManager` as API to implement their own manager, though it's private API. But I still don't want to see changes in this interface to cause compatibility issues in 3.2 at this monent.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org