Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2020/06/21 08:03:00 UTC
[jira] [Assigned] (SPARK-28940) Subquery reuse across all subquery levels
[ https://issues.apache.org/jira/browse/SPARK-28940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-28940:
------------------------------------
Assignee: Apache Spark
> Subquery reuse across all subquery levels
> -----------------------------------------
>
> Key: SPARK-28940
> URL: https://issues.apache.org/jira/browse/SPARK-28940
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Peter Toth
> Assignee: Apache Spark
> Priority: Major
>
> Currently, subquery reuse doesn't work across subquery levels: a subquery is reused only when its duplicate appears at the same nesting level, so an identical subquery nested deeper in the plan is planned and executed again.
> Here is an example query:
> {noformat}
> SELECT (SELECT avg(key) FROM testData), (SELECT (SELECT avg(key) FROM testData))
> FROM testData
> LIMIT 1
> {noformat}
> where the plan is currently:
> {noformat}
> CollectLimit 1
> +- *(1) Project [Subquery scalar-subquery#268, [id=#231] AS scalarsubquery()#276, Subquery scalar-subquery#270, [id=#266] AS scalarsubquery()#277]
>    :  :- Subquery scalar-subquery#268, [id=#231]
>    :  :  +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#272])
>    :  :     +- Exchange SinglePartition, true, [id=#227]
>    :  :        +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#282, count#283L])
>    :  :           +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
>    :  :              +- Scan[obj#12]
>    :  +- Subquery scalar-subquery#270, [id=#266]
>    :     +- *(1) Project [Subquery scalar-subquery#269, [id=#263] AS scalarsubquery()#275]
>    :        :  +- Subquery scalar-subquery#269, [id=#263]
>    :        :     +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#274])
>    :        :        +- Exchange SinglePartition, true, [id=#259]
>    :        :           +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#286, count#287L])
>    :        :              +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
>    :        :                 +- Scan[obj#12]
>    :        +- *(1) Scan OneRowRelation[]
>    +- *(1) SerializeFromObject
>       +- Scan[obj#12]
> {noformat}
> but it could be:
> {noformat}
> CollectLimit 1
> +- *(1) Project [ReusedSubquery Subquery scalar-subquery#241, [id=#148] AS scalarsubquery()#248, Subquery scalar-subquery#242, [id=#164] AS scalarsubquery()#249]
>    :  :- ReusedSubquery Subquery scalar-subquery#241, [id=#148]
>    :  +- Subquery scalar-subquery#242, [id=#164]
>    :     +- *(1) Project [Subquery scalar-subquery#241, [id=#148] AS scalarsubquery()#247]
>    :        :  +- Subquery scalar-subquery#241, [id=#148]
>    :        :     +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#246])
>    :        :        +- Exchange SinglePartition, true, [id=#144]
>    :        :           +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#258, count#259L])
>    :        :              +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
>    :        :                 +- Scan[obj#12]
>    :        +- *(1) Scan OneRowRelation[]
>    +- *(1) SerializeFromObject
>       +- Scan[obj#12]
> {noformat}
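> The core idea behind the improved plan — collect subqueries from every nesting level, keep the first occurrence of each canonicalized plan, and point later duplicates at it (the {{ReusedSubquery}} nodes above) — can be sketched with a toy tree. This is an illustrative sketch only, not Spark's actual rule; the {{Subquery}} class and its {{canonical}}/{{reuses}} fields are invented stand-ins for a canonicalized physical plan:
> {noformat}
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Subquery:
    canonical: str                       # stand-in for a canonicalized plan
    children: List["Subquery"] = field(default_factory=list)
    reuses: Optional["Subquery"] = None  # set when this node reuses another

def reuse_across_levels(roots: List[Subquery]) -> None:
    """Walk subqueries at every depth with a single shared map: the first
    occurrence of each canonical plan is kept, and any later occurrence --
    no matter how deeply nested -- is marked as reusing the first one."""
    seen: Dict[str, Subquery] = {}

    def visit(node: Subquery) -> None:
        first = seen.get(node.canonical)
        if first is not None and first is not node:
            node.reuses = first          # becomes a ReusedSubquery marker
        else:
            seen[node.canonical] = node
        for child in node.children:
            visit(child)

    for root in roots:
        visit(root)
> {noformat}
> With one map shared across all levels, a top-level {{avg(key)}} subquery and the identical one nested inside another scalar subquery collapse to a single computation, as in the second plan above. A per-level map — applying the rule to each subquery's plan in isolation — would miss exactly the cross-level duplicate this ticket targets.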
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org