Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/03/16 22:53:05 UTC
[jira] [Updated] (SPARK-29375) Exchange reuse across all subquery levels
[ https://issues.apache.org/jira/browse/SPARK-29375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-29375:
----------------------------------
Affects Version/s: (was: 3.0.0)
3.1.0
> Exchange reuse across all subquery levels
> -----------------------------------------
>
> Key: SPARK-29375
> URL: https://issues.apache.org/jira/browse/SPARK-29375
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Peter Toth
> Priority: Major
>
> Currently, exchange reuse does not work across subquery levels: an exchange computed inside a subquery is not reused for an identical exchange in the outer query.
> Here is an example query:
> {noformat}
> SELECT
>   (SELECT max(a.key) FROM testData AS a JOIN testData AS b ON b.key = a.key),
>   a.key
> FROM testData AS a
> JOIN testData AS b ON b.key = a.key
> {noformat}
> for which the current plan is:
> {noformat}
> *(5) Project [Subquery scalar-subquery#240, [id=#193] AS scalarsubquery()#247, key#13]
> :  +- Subquery scalar-subquery#240, [id=#193]
> :     +- *(6) HashAggregate(keys=[], functions=[max(key#13)], output=[max(key)#246])
> :        +- Exchange SinglePartition, true, [id=#189]
> :           +- *(5) HashAggregate(keys=[], functions=[partial_max(key#13)], output=[max#251])
> :              +- *(5) Project [key#13]
> :                 +- *(5) SortMergeJoin [key#13], [key#243], Inner
> :                    :- *(2) Sort [key#13 ASC NULLS FIRST], false, 0
> :                    :  +- Exchange hashpartitioning(key#13, 5), true, [id=#145]
> :                    :     +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
> :                    :        +- Scan[obj#12]
> :                    +- *(4) Sort [key#243 ASC NULLS FIRST], false, 0
> :                       +- ReusedExchange [key#243], Exchange hashpartitioning(key#13, 5), true, [id=#145]
> +- *(5) SortMergeJoin [key#13], [key#241], Inner
>    :- *(2) Sort [key#13 ASC NULLS FIRST], false, 0
>    :  +- Exchange hashpartitioning(key#13, 5), true, [id=#205]
>    :     +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
>    :        +- Scan[obj#12]
>    +- *(4) Sort [key#241 ASC NULLS FIRST], false, 0
>       +- ReusedExchange [key#241], Exchange hashpartitioning(key#13, 5), true, [id=#205]
> {noformat}
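> but it could be improved as follows, so that the main query reuses the subquery's exchange (id=#145) instead of computing its own (id=#205):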
> {noformat}
> *(5) Project [Subquery scalar-subquery#240, [id=#211] AS scalarsubquery()#247, key#13]
> :  +- Subquery scalar-subquery#240, [id=#211]
> :     +- *(6) HashAggregate(keys=[], functions=[max(key#13)], output=[max(key)#246])
> :        +- Exchange SinglePartition, true, [id=#207]
> :           +- *(5) HashAggregate(keys=[], functions=[partial_max(key#13)], output=[max#251])
> :              +- *(5) Project [key#13]
> :                 +- *(5) SortMergeJoin [key#13], [key#243], Inner
> :                    :- *(2) Sort [key#13 ASC NULLS FIRST], false, 0
> :                    :  +- Exchange hashpartitioning(key#13, 5), true, [id=#145]
> :                    :     +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
> :                    :        +- Scan[obj#12]
> :                    +- *(4) Sort [key#243 ASC NULLS FIRST], false, 0
> :                       +- ReusedExchange [key#243], Exchange hashpartitioning(key#13, 5), true, [id=#145]
> +- *(5) SortMergeJoin [key#13], [key#241], Inner
>    :- *(2) Sort [key#13 ASC NULLS FIRST], false, 0
>    :  +- ReusedExchange [key#13], Exchange hashpartitioning(key#13, 5), true, [id=#145]
>    +- *(4) Sort [key#241 ASC NULLS FIRST], false, 0
>       +- ReusedExchange [key#241], Exchange hashpartitioning(key#13, 5), true, [id=#205]
> {noformat}
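The difference between the two plans above can be illustrated with a toy model. The sketch below is not Spark code: `Node`, `reuse_exchanges`, `count`, and `build_plan` are hypothetical names, and the "canonical form" is a simplified stand-in that ignores attribute ids such as `key#13` vs `key#243`. It only shows how sharing one lookup table across subquery levels, rather than starting a fresh table per subquery, changes the number of exchanges that remain in the example plan.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Toy physical-plan node (hypothetical, not a Spark class)."""
    op: str                                    # e.g. "Exchange", "Sort", "Scan"
    args: str = ""                             # simplified operator description
    children: List["Node"] = field(default_factory=list)
    subqueries: List["Node"] = field(default_factory=list)

    def canonical(self) -> Tuple:
        # Simplified canonical form: equal tuples mean "same result".
        return (self.op, self.args, tuple(c.canonical() for c in self.children))


def reuse_exchanges(node: Node, seen: Dict[Tuple, Node], across_levels: bool) -> Node:
    """Replace repeated Exchange subtrees with a ReusedExchange marker.

    `seen` maps a canonical form to the first Exchange found with that form.
    With across_levels=False every subquery starts with an empty table.
    """
    if node.op == "Exchange":
        key = node.canonical()
        if key in seen:
            return Node("ReusedExchange", node.args)
        seen[key] = node
    # Visit subqueries first, so the subquery keeps the original Exchange.
    node.subqueries = [
        reuse_exchanges(s, seen if across_levels else {}, across_levels)
        for s in node.subqueries
    ]
    node.children = [reuse_exchanges(c, seen, across_levels) for c in node.children]
    return node


def count(node: Node, op: str) -> int:
    """Count operators named `op` in the plan, including subqueries."""
    own = 1 if node.op == op else 0
    return own + sum(count(c, op) for c in node.children + node.subqueries)


def build_plan() -> Node:
    """Toy version of the example query: a self-join plus a scalar subquery over the same self-join."""
    def join_side() -> Node:
        exchange = Node("Exchange", "hashpartitioning(key, 5)", [Node("Scan", "testData")])
        return Node("Sort", "key", [exchange])

    def self_join() -> Node:
        return Node("SortMergeJoin", "key = key", [join_side(), join_side()])

    partial = Node("HashAggregate", "partial_max(key)", [self_join()])
    subquery = Node("HashAggregate", "max(key)",
                    [Node("Exchange", "SinglePartition", [partial])])
    return Node("Project", "scalar-subquery, key", [self_join()], [subquery])


if __name__ == "__main__":
    per_level = reuse_exchanges(build_plan(), {}, across_levels=False)
    all_levels = reuse_exchanges(build_plan(), {}, across_levels=True)
    print(count(per_level, "Exchange"), count(per_level, "ReusedExchange"))    # 3 2
    print(count(all_levels, "Exchange"), count(all_levels, "ReusedExchange"))  # 2 3
```

With a per-level table the toy plan keeps three exchanges, matching ids #189, #145, and #205 in the current plan. With a shared table it keeps two, matching ids #207 and #145 in the proposed plan.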
--
This message was sent by Atlassian Jira
(v8.3.4#803005)