Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2019/04/25 16:56:01 UTC
[jira] [Assigned] (SPARK-27280) infer filters from Join's OR condition
[ https://issues.apache.org/jira/browse/SPARK-27280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-27280:
------------------------------------
Assignee: Apache Spark
> infer filters from Join's OR condition
> --------------------------------------
>
> Key: SPARK-27280
> URL: https://issues.apache.org/jira/browse/SPARK-27280
> Project: Spark
> Issue Type: Improvement
> Components: Optimizer, SQL
> Affects Versions: 3.0.0
> Reporter: Song Jun
> Assignee: Apache Spark
> Priority: Major
>
> In some cases, we can infer filters from a Join condition that contains OR expressions.
> For example, TPC-DS query 48:
> {code:java}
> select sum (ss_quantity)
> from store_sales, store, customer_demographics, customer_address, date_dim
> where s_store_sk = ss_store_sk
> and ss_sold_date_sk = d_date_sk and d_year = 2000
> and
> (
> (
> cd_demo_sk = ss_cdemo_sk
> and
> cd_marital_status = 'S'
> and
> cd_education_status = 'Secondary'
> and
> ss_sales_price between 100.00 and 150.00
> )
> or
> (
> cd_demo_sk = ss_cdemo_sk
> and
> cd_marital_status = 'M'
> and
> cd_education_status = 'College'
> and
> ss_sales_price between 50.00 and 100.00
> )
> or
> (
> cd_demo_sk = ss_cdemo_sk
> and
> cd_marital_status = 'U'
> and
> cd_education_status = '2 yr Degree'
> and
> ss_sales_price between 150.00 and 200.00
> )
> )
> and
> (
> (
> ss_addr_sk = ca_address_sk
> and
> ca_country = 'United States'
> and
> ca_state in ('AL', 'OH', 'MD')
> and ss_net_profit between 0 and 2000
> )
> or
> (ss_addr_sk = ca_address_sk
> and
> ca_country = 'United States'
> and
> ca_state in ('VA', 'TX', 'IA')
> and ss_net_profit between 150 and 3000
> )
> or
> (ss_addr_sk = ca_address_sk
> and
> ca_country = 'United States'
> and
> ca_state in ('RI', 'WI', 'KY')
> and ss_net_profit between 50 and 25000
> )
> )
> ;
> {code}
> We can infer two filters from the join's OR condition:
> {code:java}
> for customer_demographics:
> cd_marital_status in ('S', 'M', 'U') and cd_education_status in ('Secondary', 'College', '2 yr Degree')
> for store_sales:
> (ss_sales_price between 100.00 and 150.00 or ss_sales_price between 50.00 and 100.00 or ss_sales_price between 150.00 and 200.00)
> {code}
> We can then push the two filters above down to customer_demographics and store_sales, respectively.
> A PR will be submitted soon.
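
The inference described in the issue can be illustrated with a small, language-neutral sketch. It is not Spark's optimizer code and does not reflect the eventual PR; the names Predicate, pred, and infer_filters are hypothetical, and predicates are modeled as plain SQL strings tagged with the tables they reference. The idea: if every OR branch contains at least one predicate that touches only table T, then the OR of those per-branch predicates is implied by the whole condition and can be applied to T before an inner join. Note that the sketch keeps each branch's predicates ANDed together, which yields a somewhat tighter filter for customer_demographics than the IN-list form given in the issue; both are implied by the original condition.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical model: a predicate is (tables it references, SQL text).
# A branch is a list of predicates that are ANDed; branches are ORed.
Predicate = Tuple[FrozenSet[str], str]
Branch = List[Predicate]


def pred(sql: str, *tables: str) -> Predicate:
    """Build a predicate tagged with the tables it references."""
    return (frozenset(tables), sql)


def infer_filters(branches: List[Branch]) -> Dict[str, str]:
    """Infer one single-table filter per table from an OR of AND-branches.

    A table gets a filter only if every branch constrains it with at
    least one predicate that references that table alone.
    """
    if not branches:
        return {}
    # Candidate tables: those with a single-table predicate in the first branch.
    candidates = set()
    for tables, _ in branches[0]:
        if len(tables) == 1:
            candidates |= tables
    inferred: Dict[str, str] = {}
    for table in sorted(candidates):
        per_branch = []
        for branch in branches:
            own = [sql for tables, sql in branch if tables == frozenset([table])]
            if not own:
                break  # this branch does not constrain the table: infer nothing
            per_branch.append("(" + " AND ".join(own) + ")")
        else:
            inferred[table] = " OR ".join(per_branch)
    return inferred


CD, SS = "customer_demographics", "store_sales"
# The first OR group of the query above, one list per branch.
branches = [
    [pred("cd_demo_sk = ss_cdemo_sk", CD, SS),
     pred("cd_marital_status = 'S'", CD),
     pred("cd_education_status = 'Secondary'", CD),
     pred("ss_sales_price BETWEEN 100.00 AND 150.00", SS)],
    [pred("cd_demo_sk = ss_cdemo_sk", CD, SS),
     pred("cd_marital_status = 'M'", CD),
     pred("cd_education_status = 'College'", CD),
     pred("ss_sales_price BETWEEN 50.00 AND 100.00", SS)],
    [pred("cd_demo_sk = ss_cdemo_sk", CD, SS),
     pred("cd_marital_status = 'U'", CD),
     pred("cd_education_status = '2 yr Degree'", CD),
     pred("ss_sales_price BETWEEN 150.00 AND 200.00", SS)],
]
filters = infer_filters(branches)
for table, sql in filters.items():
    print(table + ": " + sql)
```

The join predicate cd_demo_sk = ss_cdemo_sk references two tables, so it is skipped when building the per-table filters and stays in the join condition.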
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)