Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2020/06/17 02:01:00 UTC

[jira] [Commented] (SPARK-32012) Incrementally create and materialize query stage to avoid unnecessary local shuffle

    [ https://issues.apache.org/jira/browse/SPARK-32012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138024#comment-17138024 ] 

Apache Spark commented on SPARK-32012:
--------------------------------------

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/28846

> Incrementally create and materialize query stage to avoid unnecessary local shuffle
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-32012
>                 URL: https://issues.apache.org/jira/browse/SPARK-32012
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>
> AQE currently creates query stages in batches. For example, the children of a sort merge join are materialized as query stages in one batch. AQE then applies its optimizations and converts the sort merge join to a broadcast join. Apart from the broadcast exchange, no exchange is needed on the other side of the join, but that exchange has already been materialized. Currently AQE wraps the materialized exchange with a local reader, but this still incurs unnecessary I/O. We can avoid the unnecessary local shuffle by creating query stages incrementally.
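
The difference between the two strategies described above can be illustrated with a toy model. This is only a sketch and not Spark's implementation: the names Side, plan_batch, plan_incremental and BROADCAST_THRESHOLD are hypothetical, and it assumes that a side's row count becomes known only after its exchange has been materialized.

```python
from dataclasses import dataclass

# Hypothetical row-count limit below which a side is broadcast (not a Spark setting).
BROADCAST_THRESHOLD = 10


@dataclass
class Side:
    name: str
    rows: int  # treated as known only once the side's exchange is materialized


def plan_batch(left: Side, right: Side) -> dict:
    """Materialize both shuffle exchanges in one batch, then re-optimize."""
    shuffled = [left.name, right.name]  # both shuffles are already written
    smaller = min(left, right, key=lambda s: s.rows)
    if smaller.rows <= BROADCAST_THRESHOLD:
        # The other side's shuffle was not needed; it can only be read back locally.
        return {"join": "broadcast", "shuffled": shuffled}
    return {"join": "sort_merge", "shuffled": shuffled}


def plan_incremental(first: Side, second: Side) -> dict:
    """Materialize one side, re-optimize, and shuffle the other side only if needed."""
    shuffled = [first.name]
    if first.rows <= BROADCAST_THRESHOLD:
        # Broadcast the first side; the second side never gets a shuffle exchange.
        return {"join": "broadcast", "shuffled": shuffled}
    shuffled.append(second.name)
    return {"join": "sort_merge", "shuffled": shuffled}


dim, fact = Side("dim", 5), Side("fact", 1000)
print(plan_batch(dim, fact))
print(plan_incremental(dim, fact))
```

In this model both strategies end up with a broadcast join, but the batch plan has already shuffled the large side, while the incremental plan shuffles only the side it materialized first. The sketch deliberately ignores how the first side is chosen; if the small side happened to be second, this toy version would still shuffle both.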



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
