You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Greg Rahn (JIRA)" <ji...@apache.org> on 2019/02/06 00:53:00 UTC
[jira] [Created] (IMPALA-8165) Planner does not push through predicates when there is a disjunction

Greg Rahn created IMPALA-8165:
---------------------------------

             Summary: Planner does not push through predicates when there is a disjunction
                 Key: IMPALA-8165
                 URL: https://issues.apache.org/jira/browse/IMPALA-8165
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
            Reporter: Greg Rahn


If we take a simple query like such:

{noformat}
select avg(ss_quantity)
from store_sales 
join household_demographics on (ss_hdemo_sk=hd_demo_sk)
where (ss_sales_price between 0 and 100 and hd_dep_count = 1)
 or (ss_sales_price between 100 and 200 and hd_dep_count = 2);
{noformat}

and look at the plan we see that neither table scan has any predicates pushed to it, the only filter is in the join

(from impalad version 2.12.0-cdh5.16.x RELEASE (build 3f68649c7bf8a01fb6ba0cbe35dd2492adb836dd)
{noformat}
PLAN-ROOT SINK
|
06:AGGREGATE [FINALIZE]
| output: avg:merge(ss_quantity)
|
05:EXCHANGE [UNPARTITIONED]
|
03:AGGREGATE
| output: avg(ss_quantity)
|
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: ss_hdemo_sk = hd_demo_sk
| other predicates: (ss_sales_price >= 0 AND ss_sales_price <= 100 AND hd_dep_count = 1) OR (ss_sales_price >= 100 AND ss_sales_price <= 200 AND hd_dep_count = 2)
| runtime filters: RF000 <- hd_demo_sk
|
|--04:EXCHANGE [BROADCAST]
| |
| 01:SCAN HDFS [tpcds_1000_parquet.household_demographics]
| partitions=1/1 files=1 size=41.08KB
|
00:SCAN HDFS [tpcds_1000_parquet.store_sales]
 partitions=1824/1824 files=1824 size=189.24GB
 runtime filters: RF000 -> ss_hdemo_sk
{noformat}

If we look at PostgreSQL 11.1, we'll see that not only does the join filter, but the table scan has the appropriate filters pushed to it.

{noformat}
 Finalize Aggregate  (cost=67549.69..67549.70 rows=1 width=32)
   ->  Gather  (cost=67549.47..67549.68 rows=2 width=32)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=66549.47..66549.48 rows=1 width=32)
               ->  Hash Join  (cost=113.12..66547.64 rows=734 width=4)
                     Hash Cond: (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk)
                     Join Filter: (((store_sales.ss_sales_price >= '0'::numeric) AND 
                                    (store_sales.ss_sales_price <= '100'::numeric) AND 
                                    (household_demographics.hd_dep_count = 1)) OR 
                                   ((store_sales.ss_sales_price >= '100'::numeric) AND 
                                    (store_sales.ss_sales_price <= '200'::numeric) AND 
                                    (household_demographics.hd_dep_count = 2)))
                     ->  Parallel Seq Scan on store_sales  (cost=0.00..66343.20 rows=7305 width=22)
                           Filter: (((ss_sales_price >= '0'::numeric) AND (ss_sales_price <= '100'::numeric)) OR 
                                    ((ss_sales_price >= '100'::numeric) AND (ss_sales_price <= '200'::numeric)))
                     ->  Hash  (cost=112.62..112.62 rows=40 width=8)
                           ->  Seq Scan on household_demographics  (cost=0.00..112.62 rows=40 width=8)
                                 Filter: ((hd_dep_count = 1) OR (hd_dep_count = 2))
{noformat}





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)