Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/26 12:28:50 UTC

[GitHub] [spark] manuzhang commented on pull request #30494: [SPARK-33551][SQL] Do not use custom shuffle reader for repartition

manuzhang commented on pull request #30494:
URL: https://github.com/apache/spark/pull/30494#issuecomment-734270568


   @maryannxue, @cloud-fan, 
   
   Sorry for not raising this earlier, but I'd like to discuss a case that does not seem to be covered here.
   
   1. The final stage *before write* is a `SortMergeJoin` whose partitioning matches that of the target table.
   2. AQE switches the `SortMergeJoin` to a `BroadcastHashJoin` because one side is smaller than the broadcast threshold.
   3. `OptimizeLocalShuffleReader` is then applied to the probe side, which has a different partitioning, and this breaks the user-intended partitioning.
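
   The steps above might be reproduced with something like the sketch below. It is only an illustration under assumptions: the object name, the data sizes, the `1MB` threshold, and the `key` column are made up for the example, and the write to the target table is left out. The idea is that the filtered side looks large to the static planner but turns out to be tiny at runtime, so AQE can re-plan the join.

   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.col

   // Hypothetical reproduction sketch; names and sizes are assumptions.
   object LocalShuffleReaderSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[4]")
         .appName("aqe-local-shuffle-reader-sketch")
         // Enable AQE and the local shuffle reader optimization.
         .config("spark.sql.adaptive.enabled", "true")
         .config("spark.sql.adaptive.localShuffleReader.enabled", "true")
         // Low threshold so that the static plan is expected to pick SortMergeJoin.
         .config("spark.sql.autoBroadcastJoinThreshold", "1MB")
         .getOrCreate()

       val large = spark.range(0L, 1000000L).withColumnRenamed("id", "key")

       // Statically estimated at the size of the full range, but only a
       // handful of rows survive the filter at runtime.
       val small = spark.range(0L, 1000000L)
         .withColumnRenamed("id", "key")
         .filter(col("key") % 100000 === 0)

       val joined = large.join(small, "key")

       // Run the query first so that explain() prints the final adaptive plan.
       joined.collect()
       joined.explain()

       spark.stop()
     }
   }
   ```

   If the join is converted at runtime, the final plan should show a `BroadcastHashJoin` with the probe side read through a local shuffle reader, so the output is no longer hash-partitioned on `key` the way the original `SortMergeJoin` output was.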

