Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2020/09/19 05:46:00 UTC

[jira] [Assigned] (SPARK-32939) Avoid re-compute expensive expression

     [ https://issues.apache.org/jira/browse/SPARK-32939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-32939:
------------------------------------

    Assignee: Apache Spark

> Avoid re-compute expensive expression
> -------------------------------------
>
>                 Key: SPARK-32939
>                 URL: https://issues.apache.org/jira/browse/SPARK-32939
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: angerszhu
>            Assignee: Apache Spark
>            Priority: Major
>
> {code:java}
>   test("Pushdown demo") {
>     withTable("t") {
>       withTempDir { loc =>
>         sql(
>           s"""CREATE TABLE t(c1 INT, s STRING) PARTITIONED BY(P1 STRING)
>              | LOCATION '${loc.getAbsolutePath}'
>              |""".stripMargin)
>         sql(
>           """
>             |SELECT c1,
>             |case
>             |  when get_json_object(s,'$.a')=1 then "a"
>             |  when get_json_object(s,'$.a')=2 then "b"
>             |end as s_type
>             |FROM t
>             |WHERE get_json_object(s,'$.a') in (1, 2)
>           """.stripMargin).explain(true)
>       }
>     }
>   }
> {code}
> This produces the following plan:
> {code}
> == Physical Plan ==
> *(1) Project [c1#1, CASE WHEN (cast(get_json_object(s#2, $.a) as int) = 1) THEN a WHEN (cast(get_json_object(s#2, $.a) as int) = 2) THEN b END AS s_type#0]
> +- *(1) Filter get_json_object(s#2, $.a) IN (1,2)
>    +- Scan hive default.t [c1#1, s#2], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#1, s#2], [P1#3], Statistics(sizeInBytes=8.0 EiB)
> {code}
> We can see that get_json_object(s#2, $.a) is computed three times: twice in the Project and once in the Filter.
> Expensive expressions are often recomputed many times in queries of this shape.
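
The cost described above can be illustrated outside Spark with a small plain-Scala sketch. It is not Spark's implementation or the fix proposed in this issue. The names `RecomputeSketch`, `extractA`, `naive`, and `shared` are hypothetical, and `extractA` is only a regex-based stand-in for get_json_object(s, '$.a') that counts how often it is called.

```scala
object RecomputeSketch {
  // Hypothetical stand-in for get_json_object(s, '$.a'); counts its own calls.
  var calls = 0
  private val FieldA = "\"a\"\\s*:\\s*(\\d+)".r

  def extractA(s: String): Option[Int] = {
    calls += 1
    FieldA.findFirstMatchIn(s).map(_.group(1).toInt)
  }

  // Mirrors the plan shape: one call in the filter, up to two in the CASE WHEN.
  def naive(rows: Seq[(Int, String)]): Seq[(Int, String)] =
    rows
      .filter { case (_, s) => extractA(s).exists(Set(1, 2)) }
      .map { case (c1, s) =>
        val sType =
          if (extractA(s).contains(1)) "a"
          else if (extractA(s).contains(2)) "b"
          else null
        (c1, sType)
      }

  // Evaluates the expression once per row and reuses the result
  // for both the filter and the CASE WHEN branches.
  def shared(rows: Seq[(Int, String)]): Seq[(Int, String)] =
    rows.flatMap { case (c1, s) =>
      extractA(s) match {
        case Some(1) => Some((c1, "a"))
        case Some(2) => Some((c1, "b"))
        case _       => None
      }
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq((10, """{"a": 1}"""), (20, """{"a": 2}"""), (30, """{"a": 3}"""))

    calls = 0
    val naiveResult = naive(rows)
    val naiveCalls = calls

    calls = 0
    val sharedResult = shared(rows)
    val sharedCalls = calls

    // Both strategies return the same rows; only the call counts differ.
    assert(naiveResult == sharedResult)
    println(s"naive: $naiveCalls calls, shared: $sharedCalls calls")
  }
}
```

In this sketch the shared version calls the extractor exactly once per input row, while the naive version calls it again for every predicate that mentions it. Whether and how Spark should share the result across Filter and Project is the open question of this issue.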



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
