Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/06/17 01:58:16 UTC

[GitHub] [spark] viirya opened a new pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

viirya opened a new pull request #28846:
URL: https://github.com/apache/spark/pull/28846

### What changes were proposed in this pull request?

This patch changes how query stages are created in AQE (adaptive query execution). Instead of creating query stages in a batch, it creates and materializes them incrementally, so that re-optimization can be applied earlier. This can avoid unnecessary local shuffles.

### Why are the changes needed?

Currently, AQE creates query stages in a batch. For example, the children of a sort merge join are all materialized as query stages in one batch. AQE then re-optimizes the plan and may convert the sort merge join to a broadcast join. Apart from the broadcast exchange, no exchange is needed on the other side of the join, but that exchange has already been materialized. Currently AQE wraps the materialized exchange with a local shuffle reader, but this still incurs unnecessary I/O. Creating query stages incrementally avoids this unnecessary local shuffle.
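The difference between the two strategies can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in this patch: `Side`, `plan_batch`, `plan_incremental`, and `BROADCAST_THRESHOLD` are hypothetical names, the size check stands in for AQE's re-optimization step, and the left-to-right order in which sides are materialized is an assumption.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a broadcast size limit (not a real Spark setting).
BROADCAST_THRESHOLD = 10


@dataclass
class Side:
    name: str
    size: int  # in this toy model, known only after the side is materialized


def plan_batch(left, right):
    """Materialize both exchanges in one batch, then re-optimize."""
    shuffled = [left.name, right.name]  # both shuffles have already run
    smaller = min(left.size, right.size)
    join = "broadcast" if smaller <= BROADCAST_THRESHOLD else "sort_merge"
    return join, shuffled


def plan_incremental(left, right):
    """Materialize one side at a time, re-optimizing after each stage."""
    shuffled = []
    for side in (left, right):
        shuffled.append(side.name)  # materialize this side's exchange
        if side.size <= BROADCAST_THRESHOLD:
            # Switch to a broadcast join now; any side not yet
            # materialized never needs its shuffle exchange.
            return "broadcast", shuffled
    return "sort_merge", shuffled


small = Side("dim", 5)
big = Side("fact", 1000)
print(plan_batch(small, big))
print(plan_incremental(small, big))
```

In this model both strategies choose a broadcast join, but the batch strategy has already shuffled both sides by the time it decides, while the incremental one skips the shuffle of `fact` when the small side happens to be materialized first. If the large side is materialized first, the two behave the same.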

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Unit tests.
