You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/17 10:02:47 UTC

[GitHub] [spark] manbuyun opened a new pull request #34304: Modify the assignment logic of dirFetchRequests variables

manbuyun opened a new pull request #34304:
URL: https://github.com/apache/spark/pull/34304


   ### What changes were proposed in this pull request?
   
   In the ShuffleBlockFetcherIterator.fetchHostLocalBlocks method, we generate dirFetchRequests based on externalShuffleServiceEnabled. But in fact, the MapStatus object generated in the shuffle write phase had already generated the BlockManagerId object according to externalShuffleServiceEnabled in the BlockManager.initialize method.
   
   So we don't need to judge it again.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #34304: [SPARK-37029][Shuffle] Modify the assignment logic of dirFetchRequests variables

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #34304:
URL: https://github.com/apache/spark/pull/34304#discussion_r730939578



##########
File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
##########
@@ -623,18 +623,8 @@ final class ShuffleBlockFetcherIterator(
       logDebug(s"Asynchronous fetching host-local blocks without cached executors' dir: " +
         s"${hostLocalBlocksWithMissingDirs.mkString(", ")}")
 
-      // If the external shuffle service is enabled, we'll fetch the local directories for
-      // multiple executors from the external shuffle service, which located at the same host
-      // with the executors, in once. Otherwise, we'll fetch the local directories from those
-      // executors directly one by one. The fetch requests won't be too much since one host is
-      // almost impossible to have many executors at the same time practically.
-      val dirFetchRequests = if (blockManager.externalShuffleServiceEnabled) {
-        val host = blockManager.blockManagerId.host
-        val port = blockManager.externalShuffleServicePort
-        Seq((host, port, hostLocalBlocksWithMissingDirs.keys.toArray))
-      } else {
-        hostLocalBlocksWithMissingDirs.keys.map(bmId => (bmId.host, bmId.port, Array(bmId))).toSeq
-      }
+      val dirFetchRequests = hostLocalBlocksWithMissingDirs.keys.map(bmId =>
+        (bmId.host, bmId.port, Array(bmId))).toSeq

Review comment:
       I think there's a behavior difference after the change - previously, there would be only one call on `getHostLocalDirs` below. After this change, it becomes the number of `hostLocalBlocksWithMissingDirs`, which could introduce more RPC calls.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] manbuyun closed pull request #34304: [SPARK-37029][Shuffle] Modify the assignment logic of dirFetchRequests variables

Posted by GitBox <gi...@apache.org>.
manbuyun closed pull request #34304:
URL: https://github.com/apache/spark/pull/34304


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34304: [SPARK-37029][Shuffle]Modify the assignment logic of dirFetchRequests variables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34304:
URL: https://github.com/apache/spark/pull/34304#issuecomment-945088244


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #34304: [SPARK-37029][Shuffle] Modify the assignment logic of dirFetchRequests variables

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #34304:
URL: https://github.com/apache/spark/pull/34304#discussion_r730939578



##########
File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
##########
@@ -623,18 +623,8 @@ final class ShuffleBlockFetcherIterator(
       logDebug(s"Asynchronous fetching host-local blocks without cached executors' dir: " +
         s"${hostLocalBlocksWithMissingDirs.mkString(", ")}")
 
-      // If the external shuffle service is enabled, we'll fetch the local directories for
-      // multiple executors from the external shuffle service, which located at the same host
-      // with the executors, in once. Otherwise, we'll fetch the local directories from those
-      // executors directly one by one. The fetch requests won't be too much since one host is
-      // almost impossible to have many executors at the same time practically.
-      val dirFetchRequests = if (blockManager.externalShuffleServiceEnabled) {
-        val host = blockManager.blockManagerId.host
-        val port = blockManager.externalShuffleServicePort
-        Seq((host, port, hostLocalBlocksWithMissingDirs.keys.toArray))
-      } else {
-        hostLocalBlocksWithMissingDirs.keys.map(bmId => (bmId.host, bmId.port, Array(bmId))).toSeq
-      }
+      val dirFetchRequests = hostLocalBlocksWithMissingDirs.keys.map(bmId =>
+        (bmId.host, bmId.port, Array(bmId))).toSeq

Review comment:
       I think there's a behavior difference after the change - previously, there's only one call on `getHostLocalDirs`. After this change, it becomes the number of `hostLocalBlocksWithMissingDirs`, which introduce more RPC calls.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org