Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/06/18 07:31:58 UTC

[GitHub] [spark] cloud-fan edited a comment on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

cloud-fan edited a comment on pull request #28846:
URL: https://github.com/apache/spark/pull/28846#issuecomment-645380472


   Can you elaborate on this? How does this optimization help plan a broadcast join?
   
   > For example, the children of a sort merge join will be materialized as query stages in a batch. Then AQE brings the optimization in and optimize sort merge join to broadcast join.
   
   AQE needs to wait for the stage to finish so that it knows the stage's output size and can change the SMJ to a BHJ. How can we avoid unnecessary I/O after the stage has finished?
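
To make the ordering concern concrete, here is a toy Python model of the decision, not Spark code. `QueryStage`, `finish`, and `choose_join` are hypothetical names invented for this sketch; the only real Spark detail assumed is that `spark.sql.autoBroadcastJoinThreshold` defaults to 10 MB. The model treats a side as broadcastable only once its stage has materialized and reported a size, which is a simplification of what AQE does.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed to mirror the default of spark.sql.autoBroadcastJoinThreshold (10 MB).
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass
class QueryStage:
    """Hypothetical stand-in for a shuffle query stage."""
    name: str
    materialized: bool = False
    output_bytes: Optional[int] = None  # unknown until the stage finishes

    def finish(self, output_bytes: int) -> None:
        # The runtime size only becomes available once the stage has run.
        self.materialized = True
        self.output_bytes = output_bytes


def choose_join(left: QueryStage, right: QueryStage,
                threshold: int = BROADCAST_THRESHOLD_BYTES) -> str:
    """Pick a join strategy using only sizes that are already known."""
    for side in (left, right):
        if side.materialized and side.output_bytes <= threshold:
            return "BroadcastHashJoin"
    # No finished side is small enough, so keep the original plan.
    return "SortMergeJoin"


left, right = QueryStage("left"), QueryStage("right")
before = choose_join(left, right)   # no stage has finished yet
right.finish(1024)                  # small side finishes and reports 1 KB
after = choose_join(left, right)
print(before, "->", after)
```

In this model the switch to a broadcast join can only happen after at least one child stage has finished, which is the point the question above is probing.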

