You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Song Jun (JIRA)" <ji...@apache.org> on 2019/03/26 07:16:00 UTC

[jira] [Created] (SPARK-27280) infer filters from Join's OR condition

Song Jun created SPARK-27280:
--------------------------------

             Summary: infer filters from Join's OR condition
                 Key: SPARK-27280
                 URL: https://issues.apache.org/jira/browse/SPARK-27280
             Project: Spark
          Issue Type: Improvement
          Components: Optimizer, SQL
    Affects Versions: 3.0.0
            Reporter: Song Jun


In some case, We can infer filters from Join condition with OR expressions.

for example, tpc-ds query 48:

{code:java}
select sum (ss_quantity)
 from store_sales, store, customer_demographics, customer_address, date_dim
 where s_store_sk = ss_store_sk
 and  ss_sold_date_sk = d_date_sk and d_year = 2000
 and  
 (
  (
   cd_demo_sk = ss_cdemo_sk
   and 
   cd_marital_status = 'S'
   and 
   cd_education_status = 'Secondary'
   and 
   ss_sales_price between 100.00 and 150.00  
   )
 or
  (
  cd_demo_sk = ss_cdemo_sk
   and 
   cd_marital_status = 'M'
   and 
   cd_education_status = 'College'
   and 
   ss_sales_price between 50.00 and 100.00   
  )
 or 
 (
  cd_demo_sk = ss_cdemo_sk
  and 
   cd_marital_status = 'U'
   and 
   cd_education_status = '2 yr Degree'
   and 
   ss_sales_price between 150.00 and 200.00  
 )
 )
 and
 (
  (
  ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and
  ca_state in ('AL', 'OH', 'MD')
  and ss_net_profit between 0 and 2000  
  )
 or
  (ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and
  ca_state in ('VA', 'TX', 'IA')
  and ss_net_profit between 150 and 3000 
  )
 or
  (ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and
  ca_state in ('RI', 'WI', 'KY')
  and ss_net_profit between 50 and 25000 
  )
 )
;
{code}

we can infer two filters from the join or condidtion:

{code:java}
for customer_demographics:
cd_marital_status in(‘D',‘U',‘M') and cd_education_status in('4 yr Degree’,’Secondary’,’Primary')

for store_sales:
 (ss_sales_price between 100.00 and 150.00 or ss_sales_price between 50.00 and 100.00 or ss_sales_price between 150.00 and 200.00)
{code}

then then we can push down the above two filters to filter  customer_demographics/store_sales.

A pr will be submit soon.






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org