Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/24 08:08:23 UTC

[GitHub] [spark] rmcyang commented on pull request #34500: [WIP][SPARK-33574][CORE] Improve locality for push-based shuffle especially for join-like operations

rmcyang commented on PR #34500:
URL: https://github.com/apache/spark/pull/34500#issuecomment-1135547465

   > Is this still WIP @rmcyang ? Also, can you please add tests for the sql aqe codepath ? Essentially will this optimization help sql join when AQE is enabled.
   
   I ran some tests on our internal branch. With AQE disabled, merger locations are reused as expected; with AQE enabled, however, each sibling stage may end up with a different set of merger locations. When `findCoPartitionedSiblingMapStages` is called with AQE enabled, the shuffle map stage cannot identify its sibling map stages. As far as I can tell, this is because every shuffle map stage is now submitted as its own job, so the DAG is sliced into many parts. As a result, this improvement currently only takes effect when AQE is disabled.
   @mridulm, any thoughts on whether we should figure out a better mechanism to track sibling stage info, so that this improvement can also apply when AQE is enabled?
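
To make the sibling-lookup problem concrete, here is a rough toy model in Python. It is not the actual `DAGScheduler` code: `Stage`, `find_siblings`, and the "visible stages" list are hypothetical names chosen for illustration, and the sketch assumes that siblings are discovered through a shared child stage that is visible in the same job.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Stage:
    """Hypothetical stand-in for a stage; not Spark's Stage class."""
    stage_id: int
    parents: Tuple["Stage", ...] = ()


def find_siblings(stage: Stage, visible: Iterable[Stage]) -> Set[int]:
    """Return ids of stages sharing a child with `stage`, using only
    the stages that are visible in the current job (an assumption)."""
    siblings: Set[int] = set()
    for candidate in visible:
        if stage in candidate.parents:
            siblings.update(
                p.stage_id for p in candidate.parents if p != stage
            )
    return siblings


# A join-like plan: two shuffle map stages feeding one join stage.
map_a = Stage(0)
map_b = Stage(1)
join = Stage(2, parents=(map_a, map_b))

# AQE disabled: one job sees the whole DAG, so the sibling is found.
without_aqe = find_siblings(map_a, [map_a, map_b, join])

# AQE enabled (as modeled here): the map stage is its own job and
# sees neither the join stage nor the other map stage.
with_aqe = find_siblings(map_a, [map_a])

print(without_aqe, with_aqe)
```

In this model the lookup only succeeds when the shared child stage is part of the same job, which is the situation that goes away once each shuffle map stage is submitted separately.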


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org