Posted to issues@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2021/04/25 21:45:00 UTC

[jira] [Updated] (DRILL-7556) Generalize the "Base" storage plugin filter push down mechanism

     [ https://issues.apache.org/jira/browse/DRILL-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-7556:
-------------------------------
    Fix Version/s:     (was: 1.19.0)

> Generalize the "Base" storage plugin filter push down mechanism
> ---------------------------------------------------------------
>
>                 Key: DRILL-7556
>                 URL: https://issues.apache.org/jira/browse/DRILL-7556
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.18.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> DRILL-7458 adds a Base framework for storage plugins which includes a simplified representation of filters that Drill can push down into a plugin. It assumes that plugins can generally only handle filters of the form:
> {code}
> column relop constant
> {code}
> For example, {{`foo` < 10}} or {{`bar` = "Fred"}}. (The code "flips" expressions of the form {{constant relop column}}.)
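> The "flip" step can be illustrated with a minimal sketch. The names {{RelOpFlip}}, {{RelOp}}, {{ColRelOp}} and {{fromConstantFirst}} are hypothetical and are not taken from the DRILL-7458 code; the sketch only shows that swapping the operands requires reversing the ordering operators.
>

```java
public class RelOpFlip {

    /** Hypothetical relational operators; not Drill's actual enum. */
    public enum RelOp {
        EQ, NE, LT, LE, GT, GE;

        /** Operator to use once the two operands have been swapped. */
        public RelOp flip() {
            switch (this) {
                case LT: return GT;
                case LE: return GE;
                case GT: return LT;
                case GE: return LE;
                default: return this; // EQ and NE are symmetric
            }
        }
    }

    /** Normalized predicate of the form: column relop constant. */
    public static final class ColRelOp {
        public final String column;
        public final RelOp op;
        public final Object constant;

        public ColRelOp(String column, RelOp op, Object constant) {
            this.column = column;
            this.op = op;
            this.constant = constant;
        }
    }

    /** Converts "constant relop column" into "column relop' constant". */
    public static ColRelOp fromConstantFirst(Object constant, RelOp op, String column) {
        return new ColRelOp(column, op.flip(), constant);
    }
}
```

> Under these assumptions, {{10 > `foo`}} normalizes to {{`foo` < 10}}, while {{=}} and {{<>}} are unchanged by the swap.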
> [~volodymyr] considers this too narrow and suggests two additional cases:
> {code}
> column-expr relop constant
> fn(column) = constant
> {code}
> Examples:
> {code:sql}
> foo + 10 = 20
> substr(bar, 2, 6) = 'Fred'
> {code}
> The first case should be handled by a general expression rewriter that simplifies constant expressions:
> {code:sql}
> foo + 10 = 20 --> foo = 10
> {code}
> Then, filter push-down need only handle the simplified expression, rather than every push-down mechanism needing to do the simplification itself.
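> As a rough illustration of the rewrite above, the following sketch handles only {{(column + c1) = c2}} and {{(column - c1) = c2}} over integer constants. The class and method names are hypothetical; a real rewriter would operate on Drill's expression tree and would have to cover other types and operators.
>

```java
public class ConstantFold {

    /** Simplified predicate of the form: column = value. */
    public static final class ColumnEquals {
        public final String column;
        public final long value;

        public ColumnEquals(String column, long value) {
            this.column = column;
            this.value = value;
        }

        @Override
        public String toString() {
            return column + " = " + value;
        }
    }

    /**
     * Rewrites (column op c1) = c2 into column = value, where op is '+' or '-'.
     * The exact-arithmetic helpers throw ArithmeticException on overflow
     * instead of silently producing a wrong constant.
     */
    public static ColumnEquals simplify(String column, char op, long c1, long c2) {
        switch (op) {
            case '+':
                return new ColumnEquals(column, Math.subtractExact(c2, c1));
            case '-':
                return new ColumnEquals(column, Math.addExact(c2, c1));
            default:
                throw new IllegalArgumentException("Unsupported operator: " + op);
        }
    }
}
```

> Here {{simplify("foo", '+', 10, 20)}} yields {{foo = 10}}, matching the example. Multiplication and division are left out on purpose: they need extra care (sign, divisibility) before the constant can be moved across the comparison.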
> For this ticket, we wish to handle the second case: any expression that references a single column of the target table. Provide a new push-down node for the non-relop case so that simple plugins can ignore such expressions, while more complex plugins (such as Parquet) can optionally handle them.
> A second improvement is to handle the more complex case: two or more columns, all of which come from the same target table. For example:
> {code:sql}
> foo + bar = 20
> {code}
> Where both {{foo}} and {{bar}} are from the same table. It would be a very sophisticated plugin indeed (maybe the JDBC storage plugin) which can handle this case, but it should be available.
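> One way to picture the cases discussed so far is a classifier over the column references found in a filter expression. This is only a sketch under assumed names ({{PushDownClassifier}}, {{ColumnRef}}, {{Kind}}); it is not the push-down node proposed for this ticket.
>

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PushDownClassifier {

    /** Hypothetical categories of filter expressions. */
    public enum Kind { NO_COLUMNS, SINGLE_COLUMN, MULTI_COLUMN_ONE_TABLE, MULTI_TABLE }

    /** A column reference qualified by the table it belongs to. */
    public static final class ColumnRef {
        public final String table;
        public final String column;

        public ColumnRef(String table, String column) {
            this.table = table;
            this.column = column;
        }
    }

    /** Classifies an expression by the distinct tables and columns it references. */
    public static Kind classify(List<ColumnRef> refs) {
        Set<String> tables = new HashSet<>();
        Set<String> columns = new HashSet<>();
        for (ColumnRef ref : refs) {
            tables.add(ref.table);
            columns.add(ref.table + "." + ref.column);
        }
        if (columns.isEmpty()) {
            return Kind.NO_COLUMNS;
        }
        if (tables.size() > 1) {
            return Kind.MULTI_TABLE; // not a candidate for a single plugin
        }
        return columns.size() == 1 ? Kind.SINGLE_COLUMN : Kind.MULTI_COLUMN_ONE_TABLE;
    }
}
```

> With this sketch, {{substr(bar, 2, 6) = 'Fred'}} classifies as {{SINGLE_COLUMN}} and {{foo + bar = 20}} as {{MULTI_COLUMN_ONE_TABLE}}, provided both columns belong to the same table.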
> As part of this work, we must handle join-equivalent columns:
> {code:sql}
> SELECT ... FROM t1, t2
>   WHERE t1.a = t2.b
>   AND t1.a = 20
> {code}
> The expression {{t1.a = 20}} is join-equivalent to {{t2.b = 20}}, so if the plugin for table {{t2}} can handle filter push-down, it should be offered {{t2.b = 20}}.
> It is not clear whether the Drill logical plan already handles join equivalence. If not, it should be added. If so, the filter push-down documentation should describe how the mechanism works.
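> The join-equivalence example can be sketched as follows. This is an assumption-laden illustration, not a description of how the Drill planner works: the class {{JoinEquivalence}} and its {{derive}} method are hypothetical, predicates are plain strings, and only one step of equivalence is followed (no transitive closure across several joins).
>

```java
import java.util.ArrayList;
import java.util.List;

public class JoinEquivalence {

    /**
     * Given equi-join conditions as pairs {left, right} of qualified column
     * names and a filter "column = constant", returns the equivalent filters
     * on the columns directly joined to that column.
     */
    public static List<String> derive(List<String[]> equiJoins, String column, String constant) {
        List<String> derived = new ArrayList<>();
        for (String[] pair : equiJoins) {
            if (pair[0].equals(column)) {
                derived.add(pair[1] + " = " + constant);
            } else if (pair[1].equals(column)) {
                derived.add(pair[0] + " = " + constant);
            }
        }
        return derived;
    }
}
```

> For the query above, the join condition {{t1.a = t2.b}} together with the filter {{t1.a = 20}} produces the derived filter {{t2.b = 20}}, which could then be offered to the plugin for {{t2}}.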



--
This message was sent by Atlassian Jira
(v8.3.4#803005)