Posted to issues@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2020/01/31 01:11:00 UTC

[jira] [Created] (DRILL-7556) Generalize the "Base" storage plugin filter push down mechanism

Paul Rogers created DRILL-7556:
----------------------------------

             Summary: Generalize the "Base" storage plugin filter push down mechanism
                 Key: DRILL-7556
                 URL: https://issues.apache.org/jira/browse/DRILL-7556
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.18.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
             Fix For: 1.18.0


DRILL-7458 adds a Base framework for storage plugins. The framework includes a simplified representation of the filters that Drill can push down into a plugin. It assumes that plugins can generally handle only filters of the form:

{code}
column relop constant
{code}

For example, {{`foo` < 10}} or {{`bar` = 'Fred'}}. (The code "flips" expressions of the form {{constant relop column}} into this form.)
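Below is a minimal sketch of the flipping step. It is written in Python for brevity and does not use Drill's actual Java classes. The {{Col}}, {{Const}} and {{RelOp}} classes and the {{normalize}} function are hypothetical. The operator table assumes the usual SQL comparison operators.

{code:python}
from dataclasses import dataclass

# Hypothetical mini expression model for illustration; not Drill's Java API.
@dataclass(frozen=True)
class Col:
    name: str

@dataclass(frozen=True)
class Const:
    value: object

@dataclass(frozen=True)
class RelOp:
    op: str
    left: object
    right: object

# Operator to use when the two operands of a comparison are swapped.
FLIPPED = {"=": "=", "<>": "<>", "<": ">", ">": "<", "<=": ">=", ">=": "<="}

def normalize(expr):
    """Return expr in 'column relop constant' form, or None for any other shape."""
    if not isinstance(expr, RelOp) or expr.op not in FLIPPED:
        return None
    if isinstance(expr.left, Col) and isinstance(expr.right, Const):
        return expr
    if isinstance(expr.left, Const) and isinstance(expr.right, Col):
        # 10 > foo is the same predicate as foo < 10
        return RelOp(FLIPPED[expr.op], expr.right, expr.left)
    return None
{code}

For example, {{normalize(RelOp(">", Const(10), Col("foo")))}} yields the equivalent of {{`foo` < 10}}. A column-to-column comparison returns {{None}} and is left for Drill to evaluate.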

[~volodymyr] argues that this is too narrow and proposes two additional cases:

{code}
column-expr relop constant
fn(column) = constant
{code}

Examples:

{code:sql}
foo + 10 = 20
substr(bar, 2, 6) = 'Fred'
{code}

The first case should be handled by a general expression rewriter that simplifies constant expressions:

{code:sql}
foo + 10 = 20 --> foo = 10
{code}

Filter push-down then needs to handle only the simplified expression, rather than every push-down mechanism having to do the simplification itself.
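Here is a sketch of such a rewriter. It assumes a hypothetical expression model ({{Col}}, {{Const}}, {{Arith}}, {{RelOp}}) and handles only addition and subtraction of numeric constants. It ignores overflow. It also skips cases such as {{10 - foo}}, where isolating the column would reverse the comparison.

{code:python}
from dataclasses import dataclass

# Hypothetical expression model for illustration; not Drill's Java API.
@dataclass(frozen=True)
class Col:
    name: str

@dataclass(frozen=True)
class Const:
    value: object

@dataclass(frozen=True)
class Arith:
    op: str  # "+" or "-"
    left: object
    right: object

@dataclass(frozen=True)
class RelOp:
    op: str
    left: object
    right: object

def fold(e):
    """Evaluate +/- on constant operands bottom-up, e.g. 1 + 2 -> 3."""
    if isinstance(e, Arith):
        l, r = fold(e.left), fold(e.right)
        if isinstance(l, Const) and isinstance(r, Const):
            return Const(l.value + r.value if e.op == "+" else l.value - r.value)
        return Arith(e.op, l, r)
    return e

def simplify(pred):
    """Rewrite (col +/- c1) relop c2 as col relop c', leaving other shapes folded only."""
    left, right = fold(pred.left), fold(pred.right)
    if isinstance(left, Arith) and isinstance(right, Const):
        if isinstance(left.left, Col) and isinstance(left.right, Const):
            # foo + 10 = 20 -> foo = 20 - 10;  foo - 10 = 20 -> foo = 20 + 10
            delta = left.right.value
            new = right.value - delta if left.op == "+" else right.value + delta
            return RelOp(pred.op, left.left, Const(new))
        if left.op == "+" and isinstance(left.left, Const) and isinstance(left.right, Col):
            # 10 + foo = 20 -> foo = 20 - 10
            return RelOp(pred.op, left.right, Const(right.value - left.left.value))
    return RelOp(pred.op, left, right)
{code}

Adding or subtracting a constant preserves ordering, so the comparison operator can be kept unchanged in these rewrites.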

For this ticket, we wish to handle the second case: any expression that references a single column from the target table. Provide a new push-down node for this non-relop case so that simple plugins can ignore such expressions, while more complex plugins (such as Parquet) can optionally handle them.
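One way to tell the two shapes apart is to collect the columns that an expression references. The sketch below is hypothetical: it is Python rather than Drill's Java, and the {{Col}}, {{Const}}, {{Call}} and {{classify}} names are invented. It also assumes that constant-first comparisons have already been flipped.

{code:python}
from dataclasses import dataclass

# Hypothetical expression model for illustration; not Drill's Java API.
@dataclass(frozen=True)
class Col:
    table: str
    name: str

@dataclass(frozen=True)
class Const:
    value: object

@dataclass(frozen=True)
class Call:
    """A function or operator applied to arguments, e.g. substr(bar, 2, 6) or a = b."""
    fn: str
    args: tuple

RELOPS = {"=", "<>", "<", "<=", ">", ">="}

def columns(e):
    """Collect every column referenced anywhere in the expression."""
    if isinstance(e, Col):
        return {e}
    if isinstance(e, Call):
        return set().union(*(columns(a) for a in e.args))
    return set()

def classify(e, table):
    """Return 'relop', 'single_column', or None if not pushable to `table`."""
    cols = columns(e)
    if len(cols) != 1 or next(iter(cols)).table != table:
        return None
    if (isinstance(e, Call) and e.fn in RELOPS and len(e.args) == 2
            and isinstance(e.args[0], Col) and isinstance(e.args[1], Const)):
        return "relop"          # the shape simple plugins already handle
    return "single_column"      # the proposed new node; plugins may ignore it
{code}

With this split, a simple plugin only looks at {{relop}} filters. A plugin such as Parquet could also inspect {{single_column}} ones like {{substr(bar, 2, 6) = 'Fred'}}.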

A second improvement is to handle the more complex case: two or more columns, all of which come from the same target table. For example:

{code:sql}
foo + bar = 20
{code}

Here both {{foo}} and {{bar}} come from the same table. Only a sophisticated plugin (perhaps the JDBC storage plugin) could handle this case, but the option should be available.
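The check for this case generalizes the single-column test: every column that the expression references must belong to the target table. This is again a hypothetical Python sketch, not Drill code. The {{Col}}, {{Const}}, {{Call}} and {{pushable_to}} names are invented for illustration.

{code:python}
from dataclasses import dataclass

# Hypothetical expression model for illustration; not Drill's Java API.
@dataclass(frozen=True)
class Col:
    table: str
    name: str

@dataclass(frozen=True)
class Const:
    value: object

@dataclass(frozen=True)
class Call:
    fn: str
    args: tuple

def columns(e):
    """Collect every column referenced anywhere in the expression."""
    if isinstance(e, Col):
        return {e}
    if isinstance(e, Call):
        return set().union(*(columns(a) for a in e.args))
    return set()

def pushable_to(e, table):
    """True if e references at least one column and all of them come from `table`."""
    cols = columns(e)
    return bool(cols) and all(c.table == table for c in cols)
{code}

Under this test, {{foo + bar = 20}} is a candidate for its table. A join condition such as {{t1.a = t2.b}} is never a candidate, because it spans two tables.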

As part of this work, we must handle join-equivalent columns:

{code:sql}
SELECT ... FROM t1, t2
  WHERE t1.a = t2.b
  AND t1.a = 20
{code}

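Because {{t1.a = t2.b}}, the filter {{t1.a = 20}} implies the join-equivalent filter {{t2.b = 20}}. If the plugin for table {{t2}} can handle filter push-down, it should receive {{t2.b = 20}} even though the query never states that filter directly.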
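A common way to compute join equivalence is to group equated columns into equivalence classes with a union-find structure. Each {{column = constant}} filter is then copied to every column in its class. The sketch below makes three assumptions. It considers inner-join equality conditions only, since outer joins need more care. It represents columns as plain {{table.column}} strings. The {{UnionFind}} and {{derive_filters}} names are invented for illustration.

{code:python}
from collections import defaultdict

class UnionFind:
    """Minimal disjoint-set structure over hashable keys."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def derive_filters(col_equalities, const_filters):
    """
    col_equalities: (col, col) pairs from join conditions, e.g. ("t1.a", "t2.b")
    const_filters:  (col, value) pairs from `col = constant` conjuncts
    Returns a dict mapping table -> set of (col, value) filters, including derived ones.
    """
    uf = UnionFind()
    for a, b in col_equalities:
        uf.union(a, b)
    # Attach each constant to the equivalence class of its column.
    class_values = defaultdict(set)
    for col, value in const_filters:
        class_values[uf.find(col)].add(value)
    # Every column in a class receives every constant known for that class.
    result = defaultdict(set)
    for col in list(uf.parent):
        for value in class_values.get(uf.find(col), ()):
            result[col.split(".")[0]].add((col, value))
    return dict(result)
{code}

For the query above, {{derive_filters([("t1.a", "t2.b")], [("t1.a", 20)])}} yields a filter for each table: {{t1.a = 20}} for {{t1}} and the derived {{t2.b = 20}} for {{t2}}.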

It is not clear whether the Drill logical plan already handles join equivalence. If not, support should be added. If so, the filter push-down documentation should describe how the mechanism works.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)