Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2015/12/01 03:05:11 UTC
[jira] [Updated] (SPARK-12032) Filter can't be pushed down to correct Join because of bad order of Join
[ https://issues.apache.org/jira/browse/SPARK-12032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-12032:
-------------------------------------
Issue Type: Improvement (was: Bug)
> Filter can't be pushed down to correct Join because of bad order of Join
> ------------------------------------------------------------------------
>
> Key: SPARK-12032
> URL: https://issues.apache.org/jira/browse/SPARK-12032
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
> Priority: Critical
>
> For this query:
> {code}
> select d.d_year, count(*) cnt
> FROM store_sales, date_dim d, customer c
> WHERE ss_customer_sk = c.c_customer_sk AND c.c_first_shipto_date_sk = d.d_date_sk
> group by d.d_year
> {code}
> The current optimized plan is:
> {code}
> == Optimized Logical Plan ==
> Aggregate [d_year#147], [d_year#147,(count(1),mode=Complete,isDistinct=false) AS cnt#425L]
>  Project [d_year#147]
>   Join Inner, Some(((ss_customer_sk#283 = c_customer_sk#101) && (c_first_shipto_date_sk#106 = d_date_sk#141)))
>    Project [d_date_sk#141,d_year#147,ss_customer_sk#283]
>     Join Inner, None
>      Project [ss_customer_sk#283]
>       Relation[] ParquetRelation[store_sales]
>      Project [d_date_sk#141,d_year#147]
>       Relation[] ParquetRelation[date_dim]
>    Project [c_customer_sk#101,c_first_shipto_date_sk#106]
>     Relation[] ParquetRelation[customer]
> {code}
> This plan joins store_sales and date_dim without any join condition, i.e. as a Cartesian product. The condition c.c_first_shipto_date_sk = d.d_date_sk is not pushed down because of the bad order of the joins.
> The optimizer should reorder the joins so that date_dim is joined after customer; each condition can then be pushed down to the correct join.
> The plan should be
> {code}
> Aggregate [d_year#147], [d_year#147,(count(1),mode=Complete,isDistinct=false) AS cnt#425L]
>  Project [d_year#147]
>   Join Inner, Some((c_first_shipto_date_sk#106 = d_date_sk#141))
>    Project [c_first_shipto_date_sk#106]
>     Join Inner, Some((ss_customer_sk#283 = c_customer_sk#101))
>      Project [ss_customer_sk#283]
>       Relation[store_sales]
>      Project [c_first_shipto_date_sk#106,c_customer_sk#101]
>       Relation[customer]
>    Project [d_year#147,d_date_sk#141]
>     Relation[date_dim]
> {code}
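
To illustrate the reordering requested above, here is a minimal sketch of a greedy left-deep join ordering in Python. It is not Spark's optimizer code and does not use any Spark API: the function name reorder_joins, the dict-of-column-sets input, and the "take the first connected relation" heuristic are all assumptions made for this example. It only shows how picking, at each step, a relation that shares a condition with the relations already joined avoids the unconditioned join in the current plan.

```python
def reorder_joins(relations, conditions):
    """Greedy left-deep join ordering (illustrative sketch, not Spark code).

    relations:  dict mapping relation name -> set of its column names,
                listed in FROM-clause order (dicts keep insertion order).
    conditions: list of (left_column, right_column) equality predicates.
    Returns a list of (relation, conditions_applied_at_that_join) steps.
    """
    def owner(column):
        # Find which relation a column belongs to.
        for name, columns in relations.items():
            if column in columns:
                return name
        raise KeyError(column)

    remaining = list(relations)
    pending = list(conditions)
    joined = [remaining.pop(0)]          # start from the first relation
    plan = [(joined[0], [])]

    while remaining:
        def usable(rel):
            # Conditions that involve `rel` and are fully covered once
            # `rel` is added to the relations joined so far.
            covered = set(joined) | {rel}
            return [c for c in pending
                    if rel in (owner(c[0]), owner(c[1]))
                    and {owner(c[0]), owner(c[1])} <= covered]

        # Prefer a relation that has a usable condition; fall back to the
        # next one in FROM order (a Cartesian product) only if none does.
        chosen = next((r for r in remaining if usable(r)), remaining[0])
        applied = usable(chosen)
        remaining.remove(chosen)
        pending = [c for c in pending if c not in applied]
        joined.append(chosen)
        plan.append((chosen, applied))
    return plan


# The query from this issue, with relations in FROM-clause order.
relations = {
    "store_sales": {"ss_customer_sk"},
    "date_dim": {"d_date_sk", "d_year"},
    "customer": {"c_customer_sk", "c_first_shipto_date_sk"},
}
conditions = [
    ("ss_customer_sk", "c_customer_sk"),
    ("c_first_shipto_date_sk", "d_date_sk"),
]

for relation, applied in reorder_joins(relations, conditions):
    print(relation, applied)
```

With the inputs above, the sketch joins customer directly after store_sales and date_dim last, so every join carries a condition, which matches the shape of the desired plan.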
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)