Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/03/20 08:57:00 UTC

[jira] [Commented] (SPARK-34807) Push down filter through window after TransposeWindow

    [ https://issues.apache.org/jira/browse/SPARK-34807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305370#comment-17305370 ] 

Apache Spark commented on SPARK-34807:
--------------------------------------

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/31907

> Push down filter through window after TransposeWindow
> -----------------------------------------------------
>
>                 Key: SPARK-34807
>                 URL: https://issues.apache.org/jira/browse/SPARK-34807
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> {code:scala}
>       spark.range(10).selectExpr("id AS a", "id AS b", "id AS c", "id AS d").createTempView("t1")
>       val df = spark.sql(
>         """
>           |SELECT *
>           |  FROM (
>           |    SELECT b,
>           |      sum(d) OVER (PARTITION BY a, b),
>           |      rank() OVER (PARTITION BY a ORDER BY c)
>           |    FROM t1
>           |  ) v1
>           |WHERE b = 2
>           |""".stripMargin)
> {code}
> Current optimized plan:
> {noformat}
> == Optimized Logical Plan ==
> Project [b#221L, sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L, RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232]
> +- Filter (b#221L = 2)
>    +- Window [rank(c#222L) windowspecdefinition(a#220L, c#222L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232], [a#220L], [c#222L ASC NULLS FIRST]
>       +- Project [b#221L, a#220L, c#222L, sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L]
>          +- Window [sum(d#223L) windowspecdefinition(a#220L, b#221L, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L], [a#220L, b#221L]
>             +- Project [id#218L AS b#221L, id#218L AS d#223L, id#218L AS a#220L, id#218L AS c#222L]
>                +- Range (0, 10, step=1, splits=Some(2))
> {noformat}
> Expected optimized plan:
> {noformat}
> == Optimized Logical Plan ==
> Project [b#221L, sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L, RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232]
> +- Window [sum(d#223L) windowspecdefinition(a#220L, b#221L, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L], [a#220L, b#221L]
>    +- Project [b#221L, d#223L, a#220L, RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232]
>       +- Filter (b#221L = 2)
>          +- Window [rank(c#222L) windowspecdefinition(a#220L, c#222L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232], [a#220L], [c#222L ASC NULLS FIRST]
>             +- Project [id#218L AS b#221L, id#218L AS d#223L, id#218L AS a#220L, id#218L AS c#222L]
>                +- Range (0, 10, step=1, splits=Some(2))
> {noformat}
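The legality of the pushdown in the expected plan rests on a simple invariant: a predicate that references only a window's partitioning columns keeps or drops whole partitions, so evaluating it before or after the window function yields the same result rows. Here `b = 2` references only `b`, which is a partition key of the `sum(d) OVER (PARTITION BY a, b)` window, but not of the `rank() OVER (PARTITION BY a ...)` window; TransposeWindow reorders the two Window operators so the filter can sink below the sum window. A minimal sketch of that check (a simplified standalone model with hypothetical names, not Spark's actual Catalyst rule):

```scala
// Simplified model of a Window operator: only its partitioning columns matter
// for the pushdown test. (Hypothetical, not Spark's Catalyst classes.)
case class WindowOp(partitionCols: Set[String])

// A filter may move below a Window iff every attribute it references
// is one of the window's partitioning attributes.
def canPushThrough(predicateRefs: Set[String], w: WindowOp): Boolean =
  predicateRefs.subsetOf(w.partitionCols)

// b = 2 references {b}; the sum window partitions by {a, b} -> pushable.
println(canPushThrough(Set("b"), WindowOp(Set("a", "b"))))  // true
// The rank window partitions by {a} only -> not pushable below it.
println(canPushThrough(Set("b"), WindowOp(Set("a"))))       // false
```

This is why transposing the two windows first matters: in the original plan the rank window sits above the sum window, and the filter is blocked at the rank window; after the transpose, the filter reaches the sum window, where the check above succeeds.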



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
