Posted to issues@calcite.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/12/03 02:50:00 UTC

[jira] [Updated] (CALCITE-4920) Introduce logical space pruning to TopDownRuleDriver

     [ https://issues.apache.org/jira/browse/CALCITE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated CALCITE-4920:
------------------------------------
    Labels: pull-request-available  (was: )

> Introduce logical space pruning to TopDownRuleDriver
> ----------------------------------------------------
>
>                 Key: CALCITE-4920
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4920
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Jinpeng Wu
>            Assignee: Jinpeng Wu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Last year, we submitted a PR introducing the TopDownRuleDriver. The rule driver implements the top-down search strategy suggested by the Cascades framework[1] and provides a basic branch-and-bound pruning mechanism, based on upper-bound and lower-bound costs, as suggested by the Columbia paper[2].
> However, the previous version of TopDownRuleDriver could only prune implementation rules and enforcement rules, not transformation rules. The reason mainly concerns logical properties.
> In the classic Volcano/Cascades model, logical properties, such as output row count, are properties bound to an equivalence set that never change during optimization. The Columbia optimizer[2] depends heavily on this premise. However, Calcite does not obey such rules: in Calcite, the logical properties of a RelSubset are likely to change during optimization. Calcite is not the only optimizer engine that suffers from this; Orca's logical properties of an equivalence set also change, and it cannot apply logical pruning either.
> How does the logical properties problem prevent logical pruning? Take this plan as an example: sink <- op1 <- op2 <- scan.
> By applying a transformation rule, op1 <- op2 is transformed to op3 <- op4. So we get a new alternative plan, say sink <- op3 <- op4 <- scan, in which op3 is in the same equivalence set as op1.
> After implementations and enforcements, the sub-plan (op1 <- op2 <- scan) gets fully optimized and yields a winner with cost C1.
> Now we are going to optimize op3. We know that another plan in the same equivalence set has a cost of C1, so we can use C1 as a cost limit while optimizing op3. As a first step, we build op3 into a physical node, say impl-op3, and compute its self-cost as SC3.
> Ideally, if SC3 is already greater than C1, we can conclude that op3 will never be part of the best plan, and thus the optimization of op4 can be skipped. That's the basic idea of group pruning in the Columbia optimizer[2].
> Here comes the problem: when we calculate the self-cost of impl-op3, we need to leverage the metadata, like row count, of impl-op3, which in turn asks impl-op3's input to derive its own metadata. However, the equivalence set of op4 is not yet fully explored and its row count may not be final. So the self-cost of impl-op3 may be incorrect. If we apply group pruning according to such a cost, op4 loses its opportunity to be explored, and with it the opportunity to become part of the best plan.
> To ensure correctness, we require that all descendants be fully explored when calculating a node's cost. That's why our first version of TopDownRuleDriver only prunes implementation rules and enforcement rules.
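> To make the group-pruning idea concrete, here is a minimal, self-contained sketch of branch-and-bound with an upper-bound cost limit. The types are hypothetical stand-ins, not Calcite's actual API (memoization of winners is omitted for brevity):
> {code:java}
> import java.util.List;
>
> // Hypothetical model: a Group is an equivalence set; an Expr is one operator
> // in the group, with a self-cost and input groups.
> final class GroupPruningSketch {
>   interface Expr {
>     double selfCost();      // e.g. SC3 for impl-op3
>     List<Group> inputs();   // e.g. the group containing op4
>   }
>   interface Group {
>     List<Expr> exprs();
>     double winnerCost();    // best cost known so far, e.g. C1
>   }
>
>   /** Optimizes a group under an upper-bound cost limit. */
>   static double optimize(Group group, double upperBound) {
>     double best = group.winnerCost();
>     for (Expr expr : group.exprs()) {
>       double total = expr.selfCost();
>       if (total > upperBound) {
>         continue; // group pruning: this expr's inputs need not be optimized
>       }
>       for (Group input : expr.inputs()) {
>         total += optimize(input, upperBound - total);
>         if (total > upperBound) {
>           break;  // budget exceeded part-way; abandon this expr
>         }
>       }
>       if (total <= upperBound) {
>         best = Math.min(best, total);
>       }
>     }
>     return best;
>   }
> }
> {code}
> The pitfall described above is precisely that selfCost() may be computed from metadata that is still changing while op4's set is explored, so this pruning may discard a plan that would later become the best one.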
> Over the past year, we tried several ways to solve the problem. For example, we tried to make Calcite's logical properties stable, as Xiening proposed. But the proposal was rejected, because changes to metadata after transformations are natural. We also tried to identify, by categories or annotations, rules that never change the logical properties, and to give up pruning for the other rules. But we failed again, because it introduced too much complexity for rule designers.
> Those failures drove us to reconsider the problem from its very essence: if we cannot make SC3 stable, what if we give up using SC3 and leverage other costs for pruning?
> Here is a simple description of the new idea. After obtaining C1, we eagerly build op3 and op4, without further exploration on them. Because op4's input, the scan, was fully optimized during the optimization of op1, we can compute a stable cumulative cost for impl-op4. Let's denote it C4. If we find that C4 is already greater than C1, then we know impl-op4 will never be part of the best plan, and some optimization steps can be skipped (to keep things simple, let impl-op4 be the only input of impl-op3; see the sketch after the notes below):
> 1. The enforcement rules among impl-sink, impl-op3 and impl-op4, as well as trait pass-through. These steps were not handled properly in the previous version.
> 2. The trait derivation of impl-op4 and impl-op3.
> 3. The exploration of op3, provided that the substitutes produced by exploration always use op4 as input. This is the key to logical pruning; I will explain it in more detail later on.
> Note that the exploration of op4 is not pruned, as we don't know whether op4's other alternatives would yield a lower cost. Moreover, the implementation of op3 is not skipped, as it has already been applied. But the implementations of other alternatives of op3 can be skipped if the exploration is pruned.
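> Here is a minimal sketch of that check. The names and types are illustrative stand-ins for the example above, not the PR's actual code:
> {code:java}
> // Bottom-up pruning: compare impl-op4's *cumulative* cost (C4) against the
> // winner's cost (C1), instead of impl-op3's unstable self-cost (SC3).
> final class BottomUpPruningSketch {
>   /** C4 = impl-op4's self-cost plus the winner costs of its inputs. */
>   static double cumulativeCost(double selfCost, double... inputWinnerCosts) {
>     double total = selfCost;
>     for (double c : inputWinnerCosts) {
>       total += c;   // inputs (e.g. the scan) are already fully optimized
>     }
>     return total;
>   }
>
>   /** True if enforcement, derivation and exploration can be skipped. */
>   static boolean canSkip(double c4, double c1) {
>     // Unlike SC3, C4 is stable: all of impl-op4's inputs are fully
>     // optimized, so their metadata, and hence C4, will not change later.
>     return c4 > c1;
>   }
> }
> {code}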
> The new solution is a hybrid of top-down and bottom-up optimization. Optimization requests with cost limits are passed down in a top-down manner, while cost propagation and pruning take place in a bottom-up manner. It ensures that by the time a logical node is explored, it has already been implemented (where possible) and all of its implementation's inputs are fully optimized. Besides better space pruning, this design brings some other benefits:
> 1. When deriving traits, knowledge about inputs is stable and concrete. Trait derivation is executed only once per node (except for some corner cases), and the OMAKASE mode is fully supported.
> 2. No more cost-improvement propagation in subsets, and hence no more RelMetadataQuery cache invalidations.
> 3. Exploration rules and enforcement rules can use the metadata query without worrying about regressions: the input metadata is guaranteed to be complete and stable.
> Following this new idea, we implemented a new TopDownRuleDriver. Note that the new version is very different from the previous one, so the Calcite community may decide whether and how to merge it.
> The condition "if the substitutes always use op4 as input" is easier said than implemented. For example, op3 can be transformed to op5 via an exploration rule, and op5 <- op4 can be transformed to op6 <- op7. In this case, op6 <- op7 no longer uses op4 as input, so the transformation from op3 to op5 should still take place. How do we identify such cases?
> An ideal solution might be to inspect the rule set, build the network of node transformations, determine that no rule accepts the pattern op5 <- op4, and then decide that the transformation from op3 to op5 can be skipped.
> However, Calcite's interfaces do not allow the rule driver to know what kinds of nodes a transformation may produce without invoking the rule. That is, we cannot know in advance that the exploration of op3 will produce op5 or any other node. So we use a simpler but weaker solution: we only check whether there are rules that match op4 as a non-root operand (RelOptRuleOperand's ordinalInRule is not 0). This solution is not perfect, of course; there is room for further improvement.
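> A sketch of that check, written against Calcite's rule-operand API (the actual PR code may differ):
> {code:java}
> import java.util.Collection;
> import org.apache.calcite.plan.RelOptRule;
> import org.apache.calcite.plan.RelOptRuleOperand;
> import org.apache.calcite.rel.RelNode;
>
> // Is there any rule whose pattern could match nodes like op4 somewhere other
> // than the root operand? If not, we assume every substitute produced by
> // exploring op3 keeps op4 as its input.
> final class NonRootMatchCheck {
>   static boolean mayMatchAsNonRoot(
>       Class<? extends RelNode> nodeClass, Collection<RelOptRule> rules) {
>     for (RelOptRule rule : rules) {
>       for (RelOptRuleOperand operand : rule.getOperands()) {
>         // ordinalInRule == 0 marks the root operand of the rule's pattern.
>         if (operand.ordinalInRule != 0
>             && operand.getMatchedClass().isAssignableFrom(nodeClass)) {
>           return true;   // some rule may consume op4 as a non-root input
>         }
>       }
>     }
>     return false;        // the exploration of op3 can safely be pruned
>   }
> }
> {code}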
> One may doubt the correctness of the assumption that op5 always has op4 as its input if no rule matches the pattern xx <- op4. Exceptions do exist when a rule discards its inputs; for example, FilterReduceExpressionsRule transforms the whole subtree into an empty Values when it finds that the condition is always false. Luckily, I have only seen substitution rules behave this way, and substitution rules are always applied.
> Another consideration: although we cannot identify whether an exploration will trigger subsequent explorations, experience tells us that this is quite rare, especially once substitution rules have been applied. So it is sometimes worthwhile to simply give up such explorations for performance reasons. We therefore provide an option, aggressivePruning; when it is turned on, pruning no longer considers the subsequent effects of an exploration. Users may turn this feature on if they want better performance. Note that this option defaults to ON in the code, as we found few regressions during testing.
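> For illustration only, a hypothetical way to configure this (setTopDownOpt is existing Calcite API; the aggressive-pruning accessor comes from this proposal and its name may differ):
> {code:java}
> import org.apache.calcite.plan.volcano.VolcanoPlanner;
>
> class PlannerSetup {
>   static VolcanoPlanner newPlanner() {
>     VolcanoPlanner planner = new VolcanoPlanner();
>     planner.setTopDownOpt(true);           // enable the top-down rule driver
>     // planner.setAggressivePruning(true); // proposed option; accessor name is
>     //                                     // illustrative, defaults to ON in the PR
>     return planner;
>   }
> }
> {code}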
> To make the pruning more efficient, we introduce another rule: when optimizing subsets, the subset with the default distribution and default collation is optimized first. This rule does not introduce regressions, because there should always be a converter from the default subset to a concrete subset, and the default subset always has a cost no larger than the concrete subset's (all rels in the concrete subset also belong to the default subset). But it brings several benefits:
> - When computing the lower bound for an exploration, we do not know which physical properties the future implementation will require, so the default subset provides a lower-bound cost (see the sketch after this list).
> - Logical nodes can be completely ignored while optimizing other subsets, as they must already have been optimized when the default subset was optimized.
> - We can still use the root cost of a subtree (the SC3 in the example) to prune enforcement rules, because when applying enforcement rules we know that all inputs are fully optimized.
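> A small sketch of how the default subset serves as a lower bound (hypothetical types, not the actual PR code):
> {code:java}
> // The default-trait subset (no particular distribution or collation) is
> // optimized first. Its winner cost is a valid lower bound for every concrete
> // subset of the same set, because each rel in a concrete subset also belongs
> // to the default subset.
> final class DefaultSubsetLowerBound {
>   interface Subset {
>     double winnerCost();
>     Subset defaultSubset();   // the subset with default distribution/collation
>   }
>
>   /** Lower bound usable before the requested physical traits are known. */
>   static double lowerBound(Subset subset) {
>     return subset.defaultSubset().winnerCost();
>   }
>
>   /** Prune an exploration whose future physical traits are unknown. */
>   static boolean canPruneExploration(Subset subset, double upperBound) {
>     return lowerBound(subset) > upperBound;
>   }
> }
> {code}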
> Another interesting topic is introducing parallel optimization, as Orca does. The new driver introduces a TaskScheduler that manages tasks. Ideally, it could record the dependencies between tasks and schedule leaf tasks in parallel. However, the current version does not implement such a parallel scheduler, because most of VolcanoPlanner's data structures are not thread-safe. Community members who are interested in parallel optimization could help improve it.
> [1] https://www.csd.uoc.gr/~hy460/pdf/CascadesFrameworkForQueryOptimization.pdf
> [2] http://web.cecs.pdx.edu/~len/IDEAS01.pdf



--
This message was sent by Atlassian Jira
(v8.20.1#820001)