You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/12/23 23:52:01 UTC

[jira] [Resolved] (IMPALA-2108) Improve partition pruning by extracting partition-column filters from non-trivial disjunctions.

     [ https://issues.apache.org/jira/browse/IMPALA-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-2108.
-----------------------------------
      Assignee:     (was: Bharath Vissapragada)
    Resolution: Duplicate

> Improve partition pruning by extracting partition-column filters from non-trivial disjunctions.
> -----------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-2108
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2108
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 1.2.4, Impala 1.3, Impala 1.4, Impala 2.1, Impala 2.2
>            Reporter: Alexander Behm
>            Priority: Minor
>              Labels: newbie, performance
>
> *Problem Statement*
> Impala fails to prune partitions if the partition-column filters are part of a "non-trivial" disjunction where each disjunct itself consists of conjuncts referencing both partition and non-partition columns.
> Consider the following example:
> {code}
> create table test_table (c1 INT, c2 STRING) PARTITIONED BY (pc INT);
> [localhost.localdomain:21000] > explain select c1 from test_table where (pc=1 and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c');
> Query: explain select c1 from test_table where (pc=1 and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c') <-- Partition-column filters inside non-trivial djsiunctions
> +----------------------------------------------------------------------------------------+
> | Explain String                                                                         |
> +----------------------------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=176.00MB VCores=1                              |
> | WARNING: The following tables are missing relevant table and/or column statistics.     |
> | default.test_table                                                                     |
> |                                                                                        |
> | 01:EXCHANGE [UNPARTITIONED]                                                            |
> | |                                                                                      |
> | 00:SCAN HDFS [default.test_table]                                                      |
> |    partitions=5/5 files=9 size=36B                                                     |
> |    predicates: (pc = 1 AND c2 = 'a') OR (pc = 2 AND c2 = 'b') OR (pc = 3 AND c2 = 'c') |
> +----------------------------------------------------------------------------------------+
> Fetched 9 row(s) in 0.04s
> [localhost.localdomain:21000] > 
> {code}
> *Cause*
> This is a limitation in how Impala filters partitions.
> *Workaround*
> The above example can be fixed by manually rewriting the predicate as follows:
> {code}
> select c1 from test_table where ((pc=1 and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c')) and (pc=1 OR pc=2 OR pc=3);
> {code}
> *Proposed fix*
> The proposed fix is for Impala to automatically do what is stated in the workaround above:
> Extract the partition-column filters from the disjunctions, create a new predicate with all those partition-column filters connected with OR, and add the new predicate to the original one with AND.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)