You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2020/04/10 22:08:00 UTC

[jira] [Resolved] (IMPALA-8165) Planner does not push through predicates when there is a disjunction

     [ https://issues.apache.org/jira/browse/IMPALA-8165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha resolved IMPALA-8165.
--------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

This is fixed by IMPALA-9183. 

> Planner does not push through predicates when there is a disjunction
> --------------------------------------------------------------------
>
>                 Key: IMPALA-8165
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8165
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Greg Rahn
>            Assignee: Aman Sinha
>            Priority: Major
>              Labels: planner, sql-performance, tpc-ds
>             Fix For: Impala 4.0
>
>
> If we take a simple query like such:
> {noformat}
> select avg(ss_quantity)
> from store_sales 
> join household_demographics on (ss_hdemo_sk=hd_demo_sk)
> where (ss_sales_price between 0 and 100 and hd_dep_count = 1)
>  or (ss_sales_price between 100 and 200 and hd_dep_count = 2);
> {noformat}
> and look at the plan we see that neither table scan has any predicates pushed to it, the only filter is in the join
> (from impalad version 2.12.0-cdh5.16.x RELEASE (build 3f68649c7bf8a01fb6ba0cbe35dd2492adb836dd)
> {noformat}
> PLAN-ROOT SINK
> |
> 06:AGGREGATE [FINALIZE]
> | output: avg:merge(ss_quantity)
> |
> 05:EXCHANGE [UNPARTITIONED]
> |
> 03:AGGREGATE
> | output: avg(ss_quantity)
> |
> 02:HASH JOIN [INNER JOIN, BROADCAST]
> | hash predicates: ss_hdemo_sk = hd_demo_sk
> | other predicates: (ss_sales_price >= 0 AND ss_sales_price <= 100 AND hd_dep_count = 1) OR 
> |                   (ss_sales_price >= 100 AND ss_sales_price <= 200 AND hd_dep_count = 2)
> | runtime filters: RF000 <- hd_demo_sk
> |
> |--04:EXCHANGE [BROADCAST]
> | |
> | 01:SCAN HDFS [tpcds_1000_parquet.household_demographics]
> | partitions=1/1 files=1 size=41.08KB
> |
> 00:SCAN HDFS [tpcds_1000_parquet.store_sales]
>  partitions=1824/1824 files=1824 size=189.24GB
>  runtime filters: RF000 -> ss_hdemo_sk
> {noformat}
> If we look at PostgreSQL 11.1, we'll see that not only does the join filter, but the table scan has the appropriate filters pushed to it.
> {noformat}
>  Finalize Aggregate  (cost=67549.69..67549.70 rows=1 width=32)
>    ->  Gather  (cost=67549.47..67549.68 rows=2 width=32)
>          Workers Planned: 2
>          ->  Partial Aggregate  (cost=66549.47..66549.48 rows=1 width=32)
>                ->  Hash Join  (cost=113.12..66547.64 rows=734 width=4)
>                      Hash Cond: (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk)
>                      Join Filter: (((store_sales.ss_sales_price >= '0'::numeric) AND 
>                                     (store_sales.ss_sales_price <= '100'::numeric) AND 
>                                     (household_demographics.hd_dep_count = 1)) OR 
>                                    ((store_sales.ss_sales_price >= '100'::numeric) AND 
>                                     (store_sales.ss_sales_price <= '200'::numeric) AND 
>                                     (household_demographics.hd_dep_count = 2)))
>                      ->  Parallel Seq Scan on store_sales  (cost=0.00..66343.20 rows=7305 width=22)
>                            Filter: (((ss_sales_price >= '0'::numeric) AND (ss_sales_price <= '100'::numeric)) OR 
>                                     ((ss_sales_price >= '100'::numeric) AND (ss_sales_price <= '200'::numeric)))
>                      ->  Hash  (cost=112.62..112.62 rows=40 width=8)
>                            ->  Seq Scan on household_demographics  (cost=0.00..112.62 rows=40 width=8)
>                                  Filter: ((hd_dep_count = 1) OR (hd_dep_count = 2))
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org