Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/05/03 23:48:04 UTC

[jira] [Created] (IMPALA-5280) Coalesce chains of OR conditions to an IN predicate.

Alexander Behm created IMPALA-5280:
--------------------------------------

             Summary: Coalesce chains of OR conditions to an IN predicate.
                 Key: IMPALA-5280
                 URL: https://issues.apache.org/jira/browse/IMPALA-5280
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.8.0
            Reporter: Alexander Behm
         Attachments: same_query_profile_on_CDH5.12.txt

It would be nice to implement an ExprRewriteRule that coalesces multiple compatible OR conditions into a single IN predicate, e.g.:
{code}
(c=1) OR (c=2) OR (c=3) OR (c=4) ...
->
c IN (1, 2, 3, 4...)
{code}
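
To make the proposed transformation concrete, here is a minimal Python sketch of the idea. It is not Impala's ExprRewriteRule API: the Eq, In and Or classes and the coalesce_or_to_in function are hypothetical names invented for this illustration, and "compatible" is simplified to "every disjunct is an equality against a literal on the same column". Type compatibility and NULL handling are not modeled.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical toy expression nodes, not Impala classes.
@dataclass(frozen=True)
class Eq:
    col: str
    val: object


@dataclass(frozen=True)
class In:
    col: str
    vals: Tuple[object, ...]


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Eq, In, Or]


def flatten_or(expr):
    """Return the disjuncts of a (possibly nested) OR tree, left to right."""
    if isinstance(expr, Or):
        return flatten_or(expr.left) + flatten_or(expr.right)
    return [expr]


def coalesce_or_to_in(expr):
    """Rewrite c=v1 OR c=v2 OR ... into c IN (v1, v2, ...).

    The expression is returned unchanged unless every disjunct is an
    equality on the same column (the simplified notion of "compatible"
    assumed in this sketch).
    """
    disjuncts = flatten_or(expr)
    if len(disjuncts) < 2:
        return expr
    if not all(isinstance(d, Eq) for d in disjuncts):
        return expr
    col = disjuncts[0].col
    if any(d.col != col for d in disjuncts):
        return expr
    # Drop duplicate literals while keeping their first-seen order.
    vals = tuple(dict.fromkeys(d.val for d in disjuncts))
    return In(col, vals)


chain = Or(Or(Or(Eq("c", 1), Eq("c", 2)), Eq("c", 3)), Eq("c", 4))
rewritten = coalesce_or_to_in(chain)
```

With the chain above, rewritten is In("c", (1, 2, 3, 4)). A chain that mixes columns, such as Or(Eq("c", 1), Eq("d", 2)), is left as is.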

Long chains of OR are generally unwieldy, and transforming them to IN has the following benefits:
* IN predicates with long value lists are evaluated with a hash set in the BE
* It is easier to extract min/max values from an IN predicate for Parquet min/max filtering
* The IN predicate may be faster to codegen than a deep binary tree of ORs
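
The first two benefits can be illustrated with a small Python sketch. This is not the Impala backend: eval_or_chain, make_in_evaluator, in_list_bounds and row_group_may_match are hypothetical helpers, NULL semantics are ignored, and the row-group check assumes that the file statistics are a simple [min, max] range per column.

```python
def eval_or_chain(value, literals):
    """Evaluate c=l1 OR c=l2 OR ... with one comparison per literal."""
    for lit in literals:
        if value == lit:
            return True
    return False


def make_in_evaluator(literals):
    """Build the hash set once, then answer each row with one lookup."""
    lookup = frozenset(literals)
    return lambda value: value in lookup


def in_list_bounds(literals):
    """Smallest and largest literal of the IN list."""
    return min(literals), max(literals)


def row_group_may_match(rg_min, rg_max, literals):
    """Return False only if the row group's [rg_min, rg_max] range cannot
    overlap the range spanned by the IN-list values."""
    lo, hi = in_list_bounds(literals)
    return not (rg_max < lo or rg_min > hi)


values = [1, 2, 3, 4]
matches = make_in_evaluator(values)
```

Both evaluators give the same answer for every input; the difference is that the hash-set version does not get slower as the value list grows. With the values above, a row group whose statistics are [10, 20] can be skipped, while one with [3, 8] has to be read.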

Note that this new rule complements existing rules to yield interesting improvements, e.g.:
{code}
(c1=1 AND c2='a') OR (c1=2 AND c2='a') OR (c1=3 AND c2='a')
->
c2='a' AND c1 IN (1, 2, 3)
{code}
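
The combined effect can be sketched as follows, assuming the existing rule in question is one that factors a conjunct shared by every disjunct out of the OR. This is a toy Python model, not Impala code: each disjunct is represented as a set of (column, literal) equality conjuncts, and factor_and_coalesce is a hypothetical name.

```python
def factor_and_coalesce(disjuncts):
    """Factor out conjuncts common to all disjuncts, then coalesce the rest.

    disjuncts: list of sets of (column, literal) equality conjuncts.
    Returns (common_conjuncts, (column, in_values)), or None when the
    remainders are not single equalities on one shared column.
    """
    if not disjuncts:
        return None
    # Step 1: conjuncts present in every disjunct can be pulled out.
    common = set.intersection(*(set(d) for d in disjuncts))
    remainders = [set(d) - common for d in disjuncts]
    # Step 2: what is left must be exactly one equality per disjunct ...
    if any(len(r) != 1 for r in remainders):
        return None
    leftovers = [next(iter(r)) for r in remainders]
    # ... and all of them must compare the same column.
    columns = {col for col, _ in leftovers}
    if len(columns) != 1:
        return None
    col = columns.pop()
    # Drop duplicate literals while keeping their first-seen order.
    vals = tuple(dict.fromkeys(val for _, val in leftovers))
    return sorted(common), (col, vals)


# (c1=1 AND c2='a') OR (c1=2 AND c2='a') OR (c1=3 AND c2='a')
example = [
    {("c1", 1), ("c2", "a")},
    {("c1", 2), ("c2", "a")},
    {("c1", 3), ("c2", "a")},
]
result = factor_and_coalesce(example)
```

For the example above, result is ([("c2", "a")], ("c1", (1, 2, 3))), which corresponds to c2='a' AND c1 IN (1, 2, 3).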

I've attached a relevant query profile from one of Mostafa's experiments.


