Posted to issues@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2020/01/31 01:11:00 UTC
[jira] [Created] (DRILL-7556) Generalize the "Base" storage plugin filter push down mechanism
Paul Rogers created DRILL-7556:
----------------------------------
Summary: Generalize the "Base" storage plugin filter push down mechanism
Key: DRILL-7556
URL: https://issues.apache.org/jira/browse/DRILL-7556
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.18.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Fix For: 1.18.0
DRILL-7458 adds a Base framework for storage plugins that includes a simplified representation of the filters Drill can push down into a plugin. It assumes that plugins can generally handle only filters of the form:
{code}
column relop constant
{code}
For example, {{`foo` < 10}} or {{`bar` = 'Fred'}}. (The code "flips" expressions of the form {{constant relop column}} into this form.)
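A minimal Java sketch of the {{column relop constant}} representation and of the flip. All names here ({{RelOp}}, {{ColRelOpConst}}, {{FilterNormalizer}}) are hypothetical and are not the classes added by DRILL-7458. The sketch only shows that flipping must also invert the operator: {{10 < `foo`}} means {{`foo` > 10}}.

{code:java}
// Hypothetical relational operators; not Drill's actual enum.
enum RelOp {
  EQ("="), NE("<>"), LT("<"), LE("<="), GT(">"), GE(">=");

  final String symbol;

  RelOp(String symbol) {
    this.symbol = symbol;
  }

  // Operator to use when the operands are swapped: c < col is col > c.
  RelOp invert() {
    return switch (this) {
      case LT -> GT;
      case LE -> GE;
      case GT -> LT;
      case GE -> LE;
      default -> this; // = and <> are symmetric
    };
  }
}

// Hypothetical "column relop constant" filter term.
record ColRelOpConst(String column, RelOp op, Object value) {
  @Override
  public String toString() {
    return "`" + column + "` " + op.symbol + " " + value;
  }
}

final class FilterNormalizer {
  // Normalizes "constant relop column" into "column relop constant".
  static ColRelOpConst flip(Object constant, RelOp op, String column) {
    return new ColRelOpConst(column, op.invert(), constant);
  }
}
{code}

For example, {{FilterNormalizer.flip(10, RelOp.LT, "foo")}} yields the term {{`foo` > 10}}, so a plugin only ever sees the column on the left.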
[~volodymyr] points out that this is too narrow and proposes two additional cases:
{code}
column-expr relop constant
fn(column) = constant
{code}
Examples:
{code:sql}
foo + 10 = 20
substr(bar, 2, 6) = 'Fred'
{code}
The first case should be handled by a general expression rewriter that simplifies constant expressions:
{code:sql}
foo + 10 = 20 --> foo = 10
{code}
Filter push-down then needs to handle only the simplified expression, rather than every push-down mechanism having to do the simplification itself.
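A rough sketch of such a rewriter for the simplest shape, {{col + c1 relop c2}} rewritten to {{col relop (c2 - c1)}}, assuming integer operands. The tiny expression tree ({{Expr}}, {{Col}}, {{Lit}}, {{Add}}, {{Compare}}) is hypothetical and is not Drill's logical expression model. Subtracting the same constant from both sides preserves every comparison operator, which is why one rule covers {{=}}, {{<}}, and the rest; multiplication is less simple, because a negative factor reverses an inequality.

{code:java}
// Hypothetical minimal expression tree, just enough for this example.
interface Expr {}
record Col(String name) implements Expr {}
record Lit(long value) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}

// A comparison "lhs op rhs"; the operator is kept as a string for brevity.
record Compare(Expr lhs, String op, Expr rhs) {}

final class ConstantFolder {
  // Rewrites "(col + c1) op c2" or "(c1 + col) op c2" into "col op (c2 - c1)".
  // Subtracting the same constant from both sides preserves =, <>, <, <=, >, >=
  // for integers. Any other shape is returned unchanged.
  static Compare simplify(Compare c) {
    if (c.lhs() instanceof Add add && c.rhs() instanceof Lit rhs) {
      if (add.left() instanceof Col col && add.right() instanceof Lit k) {
        return fold(c, col, rhs.value(), k.value());
      }
      if (add.left() instanceof Lit k && add.right() instanceof Col col) {
        return fold(c, col, rhs.value(), k.value());
      }
    }
    return c;
  }

  private static Compare fold(Compare original, Col col, long c2, long c1) {
    try {
      return new Compare(col, original.op(), new Lit(Math.subtractExact(c2, c1)));
    } catch (ArithmeticException overflow) {
      return original; // c2 - c1 would overflow a long; leave the filter as-is
    }
  }
}
{code}

With this rule, {{foo + 10 = 20}} reaches the push-down mechanism as {{foo = 10}}, while shapes the rule does not recognize (such as {{foo + bar = 20}}) pass through unchanged.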
This ticket addresses the second case: any expression that references a single column of the target table. Add a new push-down node for the non-relop case so that simple plugins can ignore such expressions, while more complex plugins (such as Parquet) can optionally handle them.
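One possible shape for the new node, sketched in Java under the assumption that the plugin is offered each conjunct and reports whether it will evaluate that conjunct itself. {{FilterNode}}, {{RelOpNode}}, {{SingleColumnExprNode}}, {{FilterAcceptor}}, and both example plugins are hypothetical names for illustration only. A simple plugin accepts only relop nodes, so a single-column expression stays with Drill; a richer plugin can opt in per function.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical filter nodes offered to a plugin during push-down.
interface FilterNode {}

// "column relop constant", e.g. `foo` < 10.
record RelOpNode(String column, String op, Object value) implements FilterNode {}

// Any other predicate on exactly one column of the target table,
// e.g. substr(bar, 2, 6) = 'Fred'. The outermost function name is kept
// so a plugin can decide whether it knows how to evaluate it.
record SingleColumnExprNode(String column, String function, String sql) implements FilterNode {}

// Plugin-side hook: return true if the plugin will evaluate the node itself.
interface FilterAcceptor {
  boolean accept(FilterNode node);
}

// A simple plugin: understands only relop nodes and ignores everything else.
final class SimplePlugin implements FilterAcceptor {
  @Override
  public boolean accept(FilterNode node) {
    return node instanceof RelOpNode;
  }
}

// A richer plugin that can also evaluate a few known functions on a column.
final class RicherPlugin implements FilterAcceptor {
  private final Set<String> supportedFunctions = Set.of("substr");

  @Override
  public boolean accept(FilterNode node) {
    if (node instanceof RelOpNode) {
      return true;
    }
    return node instanceof SingleColumnExprNode e
        && supportedFunctions.contains(e.function());
  }
}

final class PushDown {
  // Returns the conjuncts the plugin declined; Drill must still evaluate these.
  static List<FilterNode> residual(List<FilterNode> conjuncts, FilterAcceptor plugin) {
    List<FilterNode> remaining = new ArrayList<>();
    for (FilterNode n : conjuncts) {
      if (!plugin.accept(n)) {
        remaining.add(n);
      }
    }
    return remaining;
  }
}
{code}

Keeping the non-relop case in its own node type means existing simple plugins need no changes: they never match the new type, so such filters fall through to Drill.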
A second improvement is to handle the more complex case: two or more columns, all of which come from the same target table. For example:
{code:sql}
foo + bar = 20
{code}
where both {{foo}} and {{bar}} come from the same table. Only a very sophisticated plugin (perhaps the JDBC storage plugin) could handle this case, but the option should be available.
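Deciding whether a multi-column predicate is a push-down candidate for a given table reduces to checking that every referenced column belongs to that table. The sketch below assumes the planner can report each predicate's column references as ({{table}}, {{column}}) pairs; {{ColumnRef}}, {{Predicate}}, and {{TableScope}} are hypothetical.

{code:java}
import java.util.HashSet;
import java.util.List;

// Hypothetical qualified column reference, e.g. t1.foo.
record ColumnRef(String table, String column) {}

// Hypothetical predicate summary: its SQL text plus every column it references.
record Predicate(String sql, List<ColumnRef> columns) {}

final class TableScope {
  // True if every column the predicate references belongs to the given table.
  // Only such predicates are candidates for push-down into that table's plugin.
  // A predicate with no column references is a constant and is not a candidate.
  static boolean onlyReferences(Predicate p, String table) {
    if (p.columns().isEmpty()) {
      return false;
    }
    for (ColumnRef c : p.columns()) {
      if (!c.table().equals(table)) {
        return false;
      }
    }
    return true;
  }

  // Number of distinct columns: 1 is the single-column case, 2 or more the multi-column case.
  static int distinctColumns(Predicate p) {
    return new HashSet<>(p.columns()).size();
  }
}
{code}

A plugin could then be offered single-column and multi-column candidates separately, so that only plugins which opt in ever see the multi-column form.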
As part of this work, we must handle join-equivalent columns:
{code:sql}
SELECT ... FROM t1, t2
WHERE t1.a = t2.b
AND t1.a = 20
{code}
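Because of the join condition, {{t1.a = 20}} is join-equivalent to {{t2.b = 20}}. If the plugin for table {{t2}} can handle filter push-down, it should receive {{t2.b = 20}} even though the query states the filter only on {{t1.a}}.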
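Join equivalence can be modeled with a union-find structure over column names: each equi-join predicate merges two columns into one class, and a filter on any member can be restated for the members that belong to the target table. This is a hypothetical sketch ({{JoinEquivalence}} is not part of Drill); it uses plain strings such as {{t1.a}} for columns and assumes an inner join, since outer joins need more care about which side a filter may move to.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of deriving join-equivalent filters; not Drill's planner logic.
final class JoinEquivalence {
  // Union-find parent pointers, keyed by qualified column name.
  private final Map<String, String> parent = new HashMap<>();

  private String find(String col) {
    parent.putIfAbsent(col, col);
    String p = parent.get(col);
    if (!p.equals(col)) {
      p = find(p);
      parent.put(col, p); // path compression
    }
    return p;
  }

  // Records an equi-join predicate such as t1.a = t2.b.
  void addJoinEquality(String left, String right) {
    parent.put(find(left), find(right));
  }

  // Given a filter "column = constant", returns the equivalent filter for every
  // column of the target table that is in the same equivalence class.
  List<String> deriveFor(String column, Object constant, String table) {
    String root = find(column);
    List<String> derived = new ArrayList<>();
    for (String c : new ArrayList<>(parent.keySet())) {
      if (c.startsWith(table + ".") && find(c).equals(root)) {
        derived.add(c + " = " + constant);
      }
    }
    return derived;
  }
}
{code}

For the query above, recording {{t1.a = t2.b}} and then asking for filters on {{t2}} derived from {{t1.a = 20}} yields {{t2.b = 20}}, which the {{t2}} plugin could then accept like any other relop filter.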
It is not clear whether the Drill logical plan already handles join equivalence. If not, it should be added. If so, the filter push-down documentation should describe how that mechanism works.