Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/28 18:47:12 UTC

[GitHub] [spark] Victsm edited a comment on pull request #30164: [SPARK-32919][SHUFFLE] Driver side changes for coordinating push based shuffle by selecting external shuffle services for merging partitions

Victsm edited a comment on pull request #30164:
URL: https://github.com/apache/spark/pull/30164#issuecomment-718135591


   @tgravescs 
   What we have is the same as what's described in the paper and the SPIP doc.
   To handle dynamic resource allocation (DRA), we are essentially doing 2 things:
   1. Choosing shuffle service locations beyond the currently active Spark executors.
   2. Launching Spark executors with DRA based on the locations of the chosen shuffle services.
   
   This PR enables the first.
   By keeping track of all historical locations of executors launched for a given Spark application, we get 2 benefits.
   1) When DRA kicks in later on and significantly reduces the number of active executors, we can still look into the historical locations of past executors to get enough shuffle service locations to perform block push/merge.
   2) On a YARN cluster with authentication enabled, picking the historical locations of past executors ensures that executors can complete SASL authentication with the shuffle service at those locations, and that the local dirs storing the merged shuffle files get cleaned up when the application finishes.
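
As a rough illustration of the first point, the selection could be sketched as below. This is not the code in this PR: `MergerLocationTracker`, `executor_added`, and `select_mergers` are hypothetical names, and the policy of preferring active hosts before falling back to historical ones is an assumption made for the sketch.

```python
from typing import Dict, Iterable, List


class MergerLocationTracker:
    """Hypothetical helper: remembers every host that has ever run an executor."""

    def __init__(self) -> None:
        # A dict keeps insertion order, so hosts stay in first-seen order.
        self._historical_hosts: Dict[str, None] = {}

    def executor_added(self, host: str) -> None:
        # Record the host once; it is kept even after the executor goes away.
        self._historical_hosts.setdefault(host, None)

    def select_mergers(self, active_hosts: Iterable[str], needed: int) -> List[str]:
        # Assumed policy: take hosts with active executors first (deduplicated).
        chosen = list(dict.fromkeys(active_hosts))[:needed]
        # Then top up from historical hosts until enough mergers are found.
        for host in self._historical_hosts:
            if len(chosen) >= needed:
                break
            if host not in chosen:
                chosen.append(host)
        return chosen
```

With executors seen on hosts `a`, `b`, `c`, `d` and only `c` still active after DRA scales down, asking for 3 merger locations would return `c` followed by two historical hosts.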
   
   A follow-up patch for the driver-side change (MapOutputTracker#getPreferredLocationsForShuffle) enables the second.
   The preferred locations for a shuffle now take into consideration the shuffle service locations chosen for that shuffle.
   This also sets the preferred locations for the corresponding `ShuffledRDD`, which then has 2 impacts.
   1) When TaskSetManager schedules tasks to executors, this would impact the task placement strategy.
   2) When ExecutorAllocationManager requests more executors for DRA, this preferred location would be passed to YARN to request containers with the preferred locality.
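
To make the second point more concrete, here is a minimal sketch of how a reduce partition's preferred locations might combine the merger location with map output locality. It is an illustration only, not the follow-up patch: `preferred_locations`, its parameters, and the `fraction_threshold` default are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def preferred_locations(
    merger_host: Optional[str],
    bytes_by_host: Dict[str, int],
    fraction_threshold: float = 0.2,  # assumed cutoff, for illustration only
) -> List[str]:
    """Hypothetical: rank hosts where a reduce task would read mostly local data."""
    prefs: List[str] = []
    # The host whose shuffle service holds the merged partition comes first.
    if merger_host is not None:
        prefs.append(merger_host)
    # Then add hosts holding a large enough share of the unmerged map output.
    total = sum(bytes_by_host.values())
    if total > 0:
        for host, size in bytes_by_host.items():
            if host not in prefs and size / total >= fraction_threshold:
                prefs.append(host)
    return prefs
```

The resulting host list is what the scheduler and the executor allocation logic would both consume, which is why one change to the preferred locations affects task placement and container requests at the same time.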

