Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/10/10 11:12:36 UTC

[GitHub] [spark] ulysses-you commented on pull request #38176: [SPARK-40715][SQL] Support preferring shuffled hash join thought LocalMapThreshold is less than advisory partition size

ulysses-you commented on PR #38176:
URL: https://github.com/apache/spark/pull/38176#issuecomment-1273154006

   `CoalesceShufflePartitions` makes each partition's size close to `ADVISORY_PARTITION_SIZE_IN_BYTES` unless some partitions are skewed. So it is not meaningful to compare the partition size with `AUTO_BROADCASTJOIN_THRESHOLD`...
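
   To illustrate the point about coalescing, here is a rough sketch in plain Python. It is a simplified greedy model, not Spark's actual `CoalesceShufflePartitions` logic; the function name `coalesce` and the sample sizes are made up for the illustration, and the 64 MB / 10 MB figures are only assumed values for the advisory size and the broadcast threshold.

```python
# Simplified model of partition coalescing. This is NOT Spark's implementation;
# it only shows why coalesced partitions end up near the advisory size.
MB = 1024 * 1024


def coalesce(sizes, advisory_size):
    """Greedily merge adjacent partitions while the merged size fits advisory_size."""
    merged, current = [], 0
    for size in sizes:
        # Close the current group if adding this partition would exceed the target.
        if current > 0 and current + size > advisory_size:
            merged.append(current)
            current = 0
        current += size
    if current > 0:
        merged.append(current)
    return merged


advisory = 64 * MB             # assumed advisory partition size
broadcast_threshold = 10 * MB  # assumed broadcast join threshold

# Twenty small shuffle partitions plus one skewed partition (hypothetical sizes).
shuffle_partitions = [8 * MB] * 20 + [300 * MB]
result = coalesce(shuffle_partitions, advisory)

# How many coalesced partitions would still fit under the broadcast threshold?
small_enough = [s for s in result if s <= broadcast_threshold]
print([s // MB for s in result], len(small_enough))
```

   In this model every coalesced partition lands well above the smaller threshold, so a per-partition check against it would never pass once coalescing has run.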
   
   I'm thinking of adding a new rule that optimizes SMJ to SHJ by splitting the bigger partitions into smaller ones (as the skew join optimization does), so that we can build the hash relation safely. But it would only work for inner joins, since we need to split the build side.
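
   One way to read the inner-join restriction, sketched in plain Python rather than as a Spark rule: if the build side is split into chunks and the stream side is replayed against each chunk, the union of the per-chunk results matches the full inner join, but a left outer join would emit unmatched stream rows once per chunk. All names here (`hash_join`, `split`, `chunked_join`) and the row-count limit are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def hash_join(stream, build, how="inner"):
    """Join (key, value) rows by building a hash table on the build side."""
    table = defaultdict(list)
    for key, value in build:
        table[key].append(value)
    out = []
    for key, value in stream:
        matches = table.get(key)  # .get() does not create missing keys
        if matches:
            out.extend((key, value, m) for m in matches)
        elif how == "left_outer":
            out.append((key, value, None))
    return out


def split(rows, max_rows):
    """Cut the build side into chunks of at most max_rows rows."""
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]


def chunked_join(stream, build, max_rows, how="inner"):
    """Join the full stream side against each build-side chunk and union the results."""
    out = []
    for chunk in split(build, max_rows):
        out.extend(hash_join(stream, chunk, how))
    return out


stream = [(1, "s1"), (2, "s2"), (3, "s3")]
build = [(1, "b1"), (1, "b2"), (2, "b3"), (2, "b4")]

inner_full = hash_join(stream, build)
inner_chunked = chunked_join(stream, build, max_rows=2)
outer_full = hash_join(stream, build, how="left_outer")
outer_chunked = chunked_join(stream, build, max_rows=2, how="left_outer")

print(sorted(inner_full) == sorted(inner_chunked))  # inner join results agree
print(len(outer_full), len(outer_chunked))          # outer join row counts differ
```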


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
