You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/12/23 23:52:01 UTC
[jira] [Resolved] (IMPALA-2108) Improve partition pruning by
extracting partition-column filters from non-trivial disjunctions.
[ https://issues.apache.org/jira/browse/IMPALA-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-2108.
-----------------------------------
Assignee: (was: Bharath Vissapragada)
Resolution: Duplicate
> Improve partition pruning by extracting partition-column filters from non-trivial disjunctions.
> -----------------------------------------------------------------------------------------------
>
> Key: IMPALA-2108
> URL: https://issues.apache.org/jira/browse/IMPALA-2108
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 1.2.4, Impala 1.3, Impala 1.4, Impala 2.1, Impala 2.2
> Reporter: Alexander Behm
> Priority: Minor
> Labels: newbie, performance
>
> *Problem Statement*
> Impala fails to prune partitions if the partition-column filters are part of a "non-trivial" disjunction where each disjunct itself consists of conjuncts referencing both partition and non-partition columns.
> Consider the following example:
> {code}
> create table test_table (c1 INT, c2 STRING) PARTITIONED BY (pc INT);
> [localhost.localdomain:21000] > explain select c1 from test_table where (pc=1 and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c');
> Query: explain select c1 from test_table where (pc=1 and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c') <-- Partition-column filters inside non-trivial djsiunctions
> +----------------------------------------------------------------------------------------+
> | Explain String |
> +----------------------------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=176.00MB VCores=1 |
> | WARNING: The following tables are missing relevant table and/or column statistics. |
> | default.test_table |
> | |
> | 01:EXCHANGE [UNPARTITIONED] |
> | | |
> | 00:SCAN HDFS [default.test_table] |
> | partitions=5/5 files=9 size=36B |
> | predicates: (pc = 1 AND c2 = 'a') OR (pc = 2 AND c2 = 'b') OR (pc = 3 AND c2 = 'c') |
> +----------------------------------------------------------------------------------------+
> Fetched 9 row(s) in 0.04s
> [localhost.localdomain:21000] >
> {code}
> *Cause*
> This is a limitation in how Impala filters partitions.
> *Workaround*
> The above example can be fixed by manually rewriting the predicate as follows:
> {code}
> select c1 from test_table where ((pc=1 and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c')) and (pc=1 OR pc=2 OR pc=3);
> {code}
> *Proposed fix*
> The proposed fix is for Impala to automatically do what is stated in the workaround above:
> Extract the partition-column filters from the disjunctions, create a new predicate with all those partition-column filters connected with OR, and add the new predicate to the original one with AND.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)