You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Cheng Lian (JIRA)" <ji...@apache.org> on 2016/07/01 14:30:11 UTC
[jira] [Updated] (SPARK-16208) Add `PropagateEmptyRelation`
optimizer
[ https://issues.apache.org/jira/browse/SPARK-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-16208:
-------------------------------
Assignee: Dongjoon Hyun (was: Apache Spark)
> Add `PropagateEmptyRelation` optimizer
> --------------------------------------
>
> Key: SPARK-16208
> URL: https://issues.apache.org/jira/browse/SPARK-16208
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Fix For: 2.1.0
>
>
> This issue adds a new logical optimizer, `PropagateEmptyRelation`, to collapse a logical plans consisting of only empty LocalRelations.
> **Optimizer Targets**
> 1. Binary(or Higher)-node Logical Plans
> - Union with all empty children.
> - Join with one or two empty children (including Intersect/Except).
> 2. Unary-node Logical Plans
> - Project/Filter/Sample/Join/Limit/Repartition with all empty children.
> - Aggregate with all empty children and without AggregateFunction expressions, COUNT.
> - Generate with Explode because other UserDefinedGenerators like Hive UDTF returns results.
> **Sample Query**
> {code}
> WITH t1 AS (SELECT a FROM VALUES 1 t(a)),
> t2 AS (SELECT b FROM VALUES 1 t(b) WHERE 1=2)
> SELECT a,b
> FROM t1, t2
> WHERE a=b
> GROUP BY a,b
> HAVING a>1
> ORDER BY a,b
> {code}
> **Before**
> {code}
> scala> sql("with t1 as (select a from values 1 t(a)), t2 as (select b from values 1 t(b) where 1=2) select a,b from t1, t2 where a=b group by a,b having a>1 order by a,b").explain
> == Physical Plan ==
> *Sort [a#0 ASC, b#1 ASC], true, 0
> +- Exchange rangepartitioning(a#0 ASC, b#1 ASC, 200)
> +- *HashAggregate(keys=[a#0, b#1], functions=[])
> +- Exchange hashpartitioning(a#0, b#1, 200)
> +- *HashAggregate(keys=[a#0, b#1], functions=[])
> +- *BroadcastHashJoin [a#0], [b#1], Inner, BuildRight
> :- *Filter (isnotnull(a#0) && (a#0 > 1))
> : +- LocalTableScan [a#0]
> +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
> +- *Filter (isnotnull(b#1) && (b#1 > 1))
> +- LocalTableScan <empty>, [b#1]
> {code}
> **After**
> {code}
> scala> sql("with t1 as (select a from values 1 t(a)), t2 as (select b from values 1 t(b) where 1=2) select a,b from t1, t2 where a=b group by a,b having a>1 order by a,b").explain
> == Physical Plan ==
> LocalTableScan <empty>, [a#0, b#1]
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org