Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/02/06 15:30:00 UTC
[jira] [Commented] (SPARK-34137) The tree string does not contain statistics for nested scalar sub queries
[ https://issues.apache.org/jira/browse/SPARK-34137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280219#comment-17280219 ]
Apache Spark commented on SPARK-34137:
--------------------------------------
User 'AngersZhuuuu' has created a pull request for this issue:
https://github.com/apache/spark/pull/31485
> The tree string does not contain statistics for nested scalar sub queries
> -------------------------------------------------------------------------
>
> Key: SPARK-34137
> URL: https://issues.apache.org/jira/browse/SPARK-34137
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Yuming Wang
> Priority: Major
>
> How to reproduce:
> {code:scala}
> spark.sql("create table t1 using parquet as select id as a, id as b from range(1000)")
> spark.sql("create table t2 using parquet as select id as c, id as d from range(2000)")
> spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR ALL COLUMNS")
> spark.sql("ANALYZE TABLE t2 COMPUTE STATISTICS FOR ALL COLUMNS")
> spark.sql("set spark.sql.cbo.enabled=true")
> spark.sql(
>   """
>     |WITH max_store_sales AS
>     |  (SELECT max(csales) tpcds_cmax
>     |  FROM (SELECT
>     |    sum(b) csales
>     |  FROM t1 WHERE a < 100 ) x),
>     |best_ss_customer AS
>     |  (SELECT
>     |    c
>     |  FROM t2
>     |  WHERE d > (SELECT * FROM max_store_sales))
>     |
>     |SELECT c FROM best_ss_customer
>     |""".stripMargin).explain("cost")
> {code}
> Output:
> {noformat}
> == Optimized Logical Plan ==
> Project [c#4263L], Statistics(sizeInBytes=31.3 KiB, rowCount=2.00E+3)
> +- Filter (isnotnull(d#4264L) AND (d#4264L > scalar-subquery#4262 [])), Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
>    :  +- Aggregate [max(csales#4260L) AS tpcds_cmax#4261L]
>    :     +- Aggregate [sum(b#4266L) AS csales#4260L]
>    :        +- Project [b#4266L]
>    :           +- Filter ((a#4265L < 100) AND isnotnull(a#4265L))
>    :              +- Relation default.t1[a#4265L,b#4266L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
>    +- Relation default.t2[c#4263L,d#4264L] parquet, Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
> {noformat}
> Another case is TPC-DS q23a.
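
For readers who want a mental model of the symptom, the toy sketch below shows one way a plan printer can end up omitting statistics for nested subquery plans. It assumes the statistics suffix is printed only for nodes whose statistics have already been computed and cached, and that computing statistics for a node visits its children but not the plans nested inside its expressions. Both assumptions are illustrative, and `Node`, `StatsDemo`, `allNodes`, `render`, and `example` are hypothetical names rather than Spark APIs. The sketch does not claim to reproduce Spark's implementation or the change made in the linked pull request.

```scala
// Toy model only: hypothetical types for illustration, not Spark's classes.
final class Node(
    val name: String,
    val rowCount: Long,
    val children: Seq[Node] = Nil,
    val subqueries: Seq[Node] = Nil) { // plans nested inside this node's expressions

  // Filled in only once `stats` has been requested for this node.
  private var cache: Option[String] = None

  // Computes and caches statistics for this node and its children.
  // Assumption of the sketch: nested subquery plans are NOT visited here.
  def stats: String = cache.getOrElse {
    children.foreach(_.stats)
    val s = s"Statistics(rowCount=$rowCount)"
    cache = Some(s)
    s
  }

  def cachedStats: Option[String] = cache
}

object StatsDemo {
  // Every node in the tree, descending into nested subquery plans as well.
  def allNodes(n: Node): Seq[Node] =
    n +: (n.subqueries.flatMap(allNodes) ++ n.children.flatMap(allNodes))

  // One line per node; the statistics suffix appears only if it is already cached.
  def render(n: Node, depth: Int = 0): Seq[String] = {
    val line = ("  " * depth) + n.name + n.cachedStats.map(", " + _).getOrElse("")
    line +: (n.subqueries.flatMap(render(_, depth + 2)) ++
      n.children.flatMap(render(_, depth + 1)))
  }

  // A plan shaped loosely like the one in the report: a Filter whose
  // condition contains a scalar subquery over t1.
  def example(): Node = {
    val t1 = new Node("Relation t1", 1000)
    val sub = new Node("Aggregate", 1, children = Seq(t1))
    val t2 = new Node("Relation t2", 2000)
    val filter = new Node("Filter", 2000, children = Seq(t2), subqueries = Seq(sub))
    new Node("Project", 2000, children = Seq(filter))
  }
}
```

In this model, calling `stats` on the root before `StatsDemo.render` leaves the subquery's lines without a suffix, whereas requesting `stats` for every node returned by `StatsDemo.allNodes` first gives every line a suffix.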
--
This message was sent by Atlassian Jira
(v8.3.4#803005)