Posted to issues@spark.apache.org by "Cheng Lian (JIRA)" <ji...@apache.org> on 2016/07/01 14:30:11 UTC

[jira] [Updated] (SPARK-16208) Add `PropagateEmptyRelation` optimizer

     [ https://issues.apache.org/jira/browse/SPARK-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian updated SPARK-16208:
-------------------------------
    Assignee: Dongjoon Hyun  (was: Apache Spark)

> Add `PropagateEmptyRelation` optimizer
> --------------------------------------
>
>                 Key: SPARK-16208
>                 URL: https://issues.apache.org/jira/browse/SPARK-16208
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>             Fix For: 2.1.0
>
>
> This issue adds a new logical optimizer, `PropagateEmptyRelation`, to collapse logical plans that consist only of empty LocalRelations.
> **Optimizer Targets**
> 1. Binary (or higher)-node Logical Plans
>    - Union with all empty children.
>    - Join with one or two empty children (including Intersect/Except).
> 2. Unary-node Logical Plans
>    - Project/Filter/Sample/Limit/Repartition with all empty children.
>    - Aggregate with all empty children and without AggregateFunction expressions such as COUNT.
>    - Generate with Explode only, because other UserDefinedGenerators such as Hive UDTFs can return results.
> **Sample Query**
> {code}
> WITH t1 AS (SELECT a FROM VALUES 1 t(a)),
>      t2 AS (SELECT b FROM VALUES 1 t(b) WHERE 1=2)
> SELECT a,b
> FROM t1, t2
> WHERE a=b
> GROUP BY a,b
> HAVING a>1
> ORDER BY a,b
> {code}
> **Before**
> {code}
> scala> sql("with t1 as (select a from values 1 t(a)), t2 as (select b from values 1 t(b) where 1=2) select a,b from t1, t2 where a=b group by a,b having a>1 order by a,b").explain
> == Physical Plan ==
> *Sort [a#0 ASC, b#1 ASC], true, 0
> +- Exchange rangepartitioning(a#0 ASC, b#1 ASC, 200)
>    +- *HashAggregate(keys=[a#0, b#1], functions=[])
>       +- Exchange hashpartitioning(a#0, b#1, 200)
>          +- *HashAggregate(keys=[a#0, b#1], functions=[])
>             +- *BroadcastHashJoin [a#0], [b#1], Inner, BuildRight
>                :- *Filter (isnotnull(a#0) && (a#0 > 1))
>                :  +- LocalTableScan [a#0]
>                +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
>                   +- *Filter (isnotnull(b#1) && (b#1 > 1))
>                      +- LocalTableScan <empty>, [b#1]
> {code}
> **After**
> {code}
> scala> sql("with t1 as (select a from values 1 t(a)), t2 as (select b from values 1 t(b) where 1=2) select a,b from t1, t2 where a=b group by a,b having a>1 order by a,b").explain
> == Physical Plan ==
> LocalTableScan <empty>, [a#0, b#1]
> {code}
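
The optimizer targets listed in the description can be sketched on a toy plan model. This is only an illustration under stated assumptions: the case classes below (`Plan`, `LocalRelation`, `InnerJoin`, and so on) and the `PropagateEmptySketch` object are hypothetical stand-ins written for this example, not Spark's Catalyst classes or the actual rule from the patch. The sketch covers only Project, Filter, Aggregate, inner Join, and Union, and it assumes the join is an inner join, where a single empty child is enough to make the result empty.

```scala
// Toy model only: hypothetical stand-ins, NOT Spark's Catalyst classes.
sealed trait Plan
case class LocalRelation(output: Seq[String], rows: Seq[Seq[Any]]) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class Aggregate(grouping: Seq[String], hasAggregateFunction: Boolean, child: Plan) extends Plan
case class InnerJoin(left: Plan, right: Plan) extends Plan
case class Union(children: Seq[Plan]) extends Plan

object PropagateEmptySketch {
  // A plan counts as empty only if it is a LocalRelation with no rows.
  def isEmpty(plan: Plan): Boolean = plan match {
    case LocalRelation(_, rows) => rows.isEmpty
    case _                      => false
  }

  // Column names produced by a plan (simplified: Aggregate exposes only grouping columns).
  def output(plan: Plan): Seq[String] = plan match {
    case LocalRelation(out, _)     => out
    case Project(columns, _)       => columns
    case Filter(_, child)          => output(child)
    case Aggregate(grouping, _, _) => grouping
    case InnerJoin(left, right)    => output(left) ++ output(right)
    case Union(children)           => children.headOption.map(output).getOrElse(Seq.empty)
  }

  // Replace a plan with an empty LocalRelation that keeps the same output columns.
  private def empty(plan: Plan): Plan = LocalRelation(output(plan), Seq.empty)

  def optimize(plan: Plan): Plan = {
    // Rewrite children first so that emptiness propagates bottom-up.
    val withNewChildren = plan match {
      case Project(columns, child)  => Project(columns, optimize(child))
      case Filter(condition, child) => Filter(condition, optimize(child))
      case Aggregate(g, hasFn, c)   => Aggregate(g, hasFn, optimize(c))
      case InnerJoin(left, right)   => InnerJoin(optimize(left), optimize(right))
      case Union(children)          => Union(children.map(optimize))
      case leaf: LocalRelation      => leaf
    }
    withNewChildren match {
      // Union collapses only when every child is empty.
      case u @ Union(children) if children.nonEmpty && children.forall(isEmpty) => empty(u)
      // An inner join with at least one empty side produces no rows.
      case j @ InnerJoin(left, right) if isEmpty(left) || isEmpty(right) => empty(j)
      case p @ Project(_, child) if isEmpty(child) => empty(p)
      case f @ Filter(_, child) if isEmpty(child)  => empty(f)
      // Aggregates with functions such as COUNT are kept: they can return a row on empty input.
      case a @ Aggregate(_, false, child) if isEmpty(child) => empty(a)
      case other => other
    }
  }
}
```

In this model, a plan shaped like the sample query, `Aggregate(Seq("a", "b"), hasAggregateFunction = false, Filter("a = b", InnerJoin(t1, t2)))` with an empty `t2`, is reduced by `PropagateEmptySketch.optimize` to a single empty `LocalRelation` with columns `a` and `b`, which mirrors the one-line plan shown under **After**. An `Aggregate` that carries an aggregate function is left unchanged.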



