Posted to issues@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2021/04/25 21:46:00 UTC

[jira] [Updated] (DRILL-7558) Generalize filter push-down planner phase

     [ https://issues.apache.org/jira/browse/DRILL-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-7558:
-------------------------------
    Fix Version/s:     (was: 1.19.0)

> Generalize filter push-down planner phase
> -----------------------------------------
>
>                 Key: DRILL-7558
>                 URL: https://issues.apache.org/jira/browse/DRILL-7558
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.18.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> DRILL-7458 provides a base framework for storage plugins, including a simplified filter push-down mechanism. [~volodymyr] notes that it may be *too* simple:
> {quote}
> What about the case when this rule was applied for one filter, but at some point the planner pushed another filter down to just above the scan? For example, consider this plan:
> {code}
> Filter(a=2)
>   Join(t1.b=t2.b, type=inner)
>     Filter(b=3)
>       Scan(t1)
>     Scan(t2)
> {code}
> Filter b=3 will be pushed into the scan; then the planner will push Filter(a=2) from above the join down below it:
> {code}
> Join(t1.b=t2.b, type=inner)
>   Filter(a=2)
>     Scan(t1, b=3)
>   Scan(t2)
> {code}
> In this case, checking only whether a filter was already pushed is not enough.
> {quote}
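>
> To make the problem concrete, here is a minimal Java sketch that models only the rewritten plan. It is not Drill's or Calcite's API: {{Scan}}, {{pushNaive}} and {{pushIncremental}} are hypothetical names. The naive rule treats "a filter was already pushed" as "nothing left to push" (a guard of this kind presumably exists to keep the rule from re-firing on the same scan), so {{a=2}} is stranded above the scan. The incremental variant merges new conjuncts with those the scan already holds and returns the unchanged scan when nothing is new, which still keeps the rule from firing forever.
> {code:java}
> import java.util.ArrayList;
> import java.util.LinkedHashSet;
> import java.util.List;
> import java.util.Set;
>
> // Hypothetical, simplified model of the plan above; not Drill's or Calcite's API.
> public class PushDownSketch {
>
>   // A scan that records the predicates the storage plugin has accepted.
>   static final class Scan {
>     final String table;
>     final Set<String> pushed;
>
>     Scan(String table, Set<String> pushed) {
>       this.table = table;
>       this.pushed = pushed;
>     }
>
>     @Override
>     public String toString() {
>       return "Scan(" + table + (pushed.isEmpty() ? "" : ", " + String.join(" AND ", pushed)) + ")";
>     }
>   }
>
>   // Naive rule: if any filter was already pushed, assume there is nothing to do.
>   // The new conjuncts are reported in 'leftover' and stay above the scan.
>   static Scan pushNaive(List<String> filter, Scan scan, List<String> leftover) {
>     if (!scan.pushed.isEmpty()) {
>       leftover.addAll(filter);
>       return scan;
>     }
>     return new Scan(scan.table, new LinkedHashSet<>(filter));
>   }
>
>   // Incremental rule: merge new conjuncts with those already pushed.
>   // Returning the same scan when nothing is new keeps the rule from looping.
>   static Scan pushIncremental(List<String> filter, Scan scan) {
>     Set<String> merged = new LinkedHashSet<>(scan.pushed);
>     merged.addAll(filter);
>     if (merged.equals(scan.pushed)) {
>       return scan;
>     }
>     return new Scan(scan.table, merged);
>   }
>
>   public static void main(String[] args) {
>     // State after the rewrite: Filter(a=2) sits directly on Scan(t1, b=3).
>     Scan t1 = new Scan("t1", new LinkedHashSet<>(List.of("b=3")));
>     List<String> leftover = new ArrayList<>();
>     System.out.println(pushNaive(List.of("a=2"), t1, leftover) + " leftover=" + leftover);
>     System.out.println(pushIncremental(List.of("a=2"), t1));
>   }
> }
> {code}
> Running {{main}} prints {{Scan(t1, b=3) leftover=[a=2]}} followed by {{Scan(t1, b=3 AND a=2)}}.
>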
> Drill divides planning into a number of *phases*, each defined by a set of *rules*. Most storage plugins perform filter push-down during the physical planning stage. However, by this point Drill has already decided on the degree of parallelism, so it is too late to use filter push-down to set it. Yet for a data source such as a REST API, we want to use filters to help shard the query (that is, to set the degree of parallelism).
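>
> As a hypothetical illustration of using filters to shard a query: if a filter such as {{region IN ('us', 'eu', 'ap')}} is pushed into a REST scan, each value could become a separate request, and the number of requests is a natural degree of parallelism. The Java sketch below assumes an API that accepts one {{region}} query parameter per request; the endpoint, class and method names are invented for illustration.
> {code:java}
> import java.net.URLEncoder;
> import java.nio.charset.StandardCharsets;
> import java.util.ArrayList;
> import java.util.List;
>
> // Hypothetical sketch: split a pushed-down IN-list filter into independent
> // REST requests. The URL and parameter names are illustrative placeholders.
> public class ShardSketch {
>
>   // One shard is one request that a single minor fragment could issue.
>   record Shard(String url) {}
>
>   static List<Shard> shardsFromInList(String baseUrl, String column, List<String> values) {
>     List<Shard> shards = new ArrayList<>();
>     for (String value : values) {
>       String encoded = URLEncoder.encode(value, StandardCharsets.UTF_8);
>       shards.add(new Shard(baseUrl + "?" + column + "=" + encoded));
>     }
>     return shards;
>   }
>
>   public static void main(String[] args) {
>     // Filter: region IN ('us', 'eu', 'ap'), pushed into a hypothetical REST scan.
>     List<Shard> shards =
>         shardsFromInList("https://api.example.com/sales", "region", List.of("us", "eu", "ap"));
>     // The shard count (a parallelism hint) is known only after push-down.
>     System.out.println(shards.size() + " shards");
>     shards.forEach(s -> System.out.println(s.url()));
>   }
> }
> {code}
> The point of the sketch is the ordering: the shard count exists only once the filter has been pushed down, so push-down has to run before parallelism is chosen.
>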
>  
> DRILL-7458 performs filter push-down at *logical* planning time to work around the above limitation. (In Drill, there are three different phases that could be considered the logical phase, depending on which planning options are set to control Calcite.)
> [~volodymyr] points out that the logical planning phase may be the wrong choice because it performs rewrites of the kind he cited.
> Thus, we need to research where to insert filter push-down. It must come:
> * After rewrites of the kind described above.
> * After join equivalence computations. (See DRILL-7556.)
> * Before the decision is made about the number of minor fragments.
> The goal of this ticket is to either:
> * Research to identify an existing phase which satisfies these requirements, or
> * Create a new phase.
> Due to the way Calcite works, it is not a good idea to have a single phase handle two tasks that depend on one another. That is, we cannot perform filter push-down in the same phase that defines the filters, nor in the phase that chooses parallelism.
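>
> The placement rules above can be written down as a small check. This Java sketch is hypothetical: the phase names are placeholders, not Drill's planner phases, and it assumes phases run in a fixed linear order. {{existingCandidates}} covers the first option (reuse a phase that already sits in the right place) and {{validGaps}} the second (create a new phase); both keep push-down out of the phase that performs the rewrites and out of the phase that chooses parallelism.
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> // Hypothetical sketch of the placement constraints above. Phase names are
> // placeholders, not Drill's actual planner phases.
> public class PhasePlacement {
>
>   private static int indexOrThrow(List<String> phases, String name) {
>     int i = phases.indexOf(name);
>     if (i < 0) {
>       throw new IllegalArgumentException("unknown phase: " + name);
>     }
>     return i;
>   }
>
>   // First position strictly after both prerequisite phases.
>   private static int earliest(List<String> phases, String rewrites, String joinEquivalence) {
>     return Math.max(indexOrThrow(phases, rewrites), indexOrThrow(phases, joinEquivalence)) + 1;
>   }
>
>   // Gaps where a NEW push-down phase could go; gap i means "insert before phases.get(i)".
>   static List<Integer> validGaps(List<String> phases, String rewrites,
>                                  String joinEquivalence, String parallelism) {
>     List<Integer> gaps = new ArrayList<>();
>     int last = indexOrThrow(phases, parallelism);
>     for (int i = earliest(phases, rewrites, joinEquivalence); i <= last; i++) {
>       gaps.add(i);
>     }
>     return gaps;
>   }
>
>   // EXISTING phases strictly between the prerequisites and the parallelism decision.
>   static List<String> existingCandidates(List<String> phases, String rewrites,
>                                          String joinEquivalence, String parallelism) {
>     List<String> result = new ArrayList<>();
>     int last = indexOrThrow(phases, parallelism);
>     for (int i = earliest(phases, rewrites, joinEquivalence); i < last; i++) {
>       result.add(phases.get(i));
>     }
>     return result;
>   }
>
>   public static void main(String[] args) {
>     List<String> phases = List.of("rewrite", "joinEquivalence", "chooseParallelism", "physical");
>     System.out.println("existing candidates: "
>         + existingCandidates(phases, "rewrite", "joinEquivalence", "chooseParallelism"));
>     System.out.println("gaps for a new phase: "
>         + validGaps(phases, "rewrite", "joinEquivalence", "chooseParallelism"));
>   }
> }
> {code}
> With this example order no existing phase qualifies, and the only valid gap is between {{joinEquivalence}} and {{chooseParallelism}}, which corresponds to creating a new phase. If the parallelism decision came before either prerequisite, both lists would be empty.
>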
> Background: Calcite is a rule-based query planner inspired by [Volcano|https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf].
> The above issue is a known weakness of rule-based planners; it was identified as early as the [Cascades query framework paper|https://www.csd.uoc.gr/~hy460/pdf/CascadesFrameworkForQueryOptimization.pdf], the follow-up to Volcano.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)