You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Андрей Цвелодуб <a....@gmail.com> on 2018/11/08 20:50:47 UTC

Question about complex rule operands and Rel tree in general

Hello everyone!

I have a question that I can't find an answer to, so maybe someone could
help me.
As a part of Rel Rules, there is always an operand, that matches a part of
the tree, and says if the rule should be executed.
The operand can be complex, so I can say for example - match an Aggregate
on top of Project on top of Filter. AFAIU, this operand will only match if
exactly this three nodes will be somewhere in the tree.
But here is my question - what if I want a rule that will match a more
generic structure, like this
>Aggregate
>-...
>--* any number of any nodes in any levels
>---...
>----  Project
Is there an official way to do that?

My first approach was to match any Aggregate and then try to inspect the
underlying tree in matches()/onMatch(), but this turned out to be quite
unreliable since it involves inspecting RelSubsets (and this shouldn't be
done, as follows from
https://lists.apache.org/thread.html/ee2349272e9d344228595c0940820b2fc525cc6115388c48e99495a6@%3Cdev.calcite.apache.org%3E).


In case I'm doing it all wrong, I can formulate my question even broader -
is there a mechanism to perform validation of the execution tree during the
planning process, i.e. skip some plans as unimplementable based on their
internal structure. As an example imagine I want to say that in
JdbcConvention, all plans that have a Filter node, over a Project node that
has more than three fields, should not be implemented. (Modifying cost
calculation is also not an option since the plan still has RelSubsets)

I hope this makes sense, and thanks in advance!

Best Regards,
Andrew Tsvielodub

Re: Question about complex rule operands and Rel tree in general

Posted by Андрей Цвелодуб <a....@gmail.com>.
Yes, I have already appreciated the value and flexibility that predefined
rules provide, and would not want to lose it :)
I didn't look into the query metadata much before, but that looks promising!
Thanks!

On Thu, Nov 8, 2018 at 11:05 PM Julian Hyde <jh...@apache.org> wrote:

> For validation (i.e. that doesn’t modify the tree), I would use a visitor.
> RelVisitor may suffice.
>
> There are also a few “whole tree” transformations, e.g. column pruning.
> Use sparingly.
>
> You are correct that rules and their operands do not “scale” to match
> large sections of the tree. We could in principle extend matching a little
> (e.g. better handing of Union with many inputs) but the locality is mostly
> a good thing. In a Volcano graph, there are multiple nodes in each
> equivalence set, therefore huge numbers of paths through the graph. Deep
> matches would quickly become intractable.
>
> I strongly recommend using traits, and in particular predicates
> (RelMdPredicates / RelOptPredicateList). Let’s suppose you want to know
> whether a particular input column is always equal to 5. You could write a
> rule that looks for a Project several layers down whose expression is the
> literal 5. But much better is to look at the predicates. Predicates are
> propagated up the tree, which means you don’t need to look at the
> structure, and you can reason and act locally.
>
> Similar arguments apply for sort and distribution (which are also traits).
>
> If are able to package your logic into a RelOptRule you will be pleased
> with the results. It composes beautifully and efficiently with the hundreds
> of other rules, and with all the flavors of metadata.
>
> Julian
>
>
> > On Nov 8, 2018, at 12:50 PM, Андрей Цвелодуб <a....@gmail.com>
> wrote:
> >
> > Hello everyone!
> >
> > I have a question that I can't find an answer to, so maybe someone could
> > help me.
> > As a part of Rel Rules, there is always an operand, that matches a part
> of
> > the tree, and says if the rule should be executed.
> > The operand can be complex, so I can say for example - match an Aggregate
> > on top of Project on top of Filter. AFAIU, this operand will only match
> if
> > exactly this three nodes will be somewhere in the tree.
> > But here is my question - what if I want a rule that will match a more
> > generic structure, like this
> >> Aggregate
> >> -...
> >> --* any number of any nodes in any levels
> >> ---...
> >> ----  Project
> > Is there an official way to do that?
> >
> > My first approach was to match any Aggregate and then try to inspect the
> > underlying tree in matches()/onMatch(), but this turned out to be quite
> > unreliable since it involves inspecting RelSubsets (and this shouldn't be
> > done, as follows from
> >
> https://lists.apache.org/thread.html/ee2349272e9d344228595c0940820b2fc525cc6115388c48e99495a6@%3Cdev.calcite.apache.org%3E
> ).
> >
> >
> > In case I'm doing it all wrong, I can formulate my question even broader
> -
> > is there a mechanism to perform validation of the execution tree during
> the
> > planning process, i.e. skip some plans as unimplementable based on their
> > internal structure. As an example imagine I want to say that in
> > JdbcConvention, all plans that have a Filter node, over a Project node
> that
> > has more than three fields, should not be implemented. (Modifying cost
> > calculation is also not an option since the plan still has RelSubsets)
> >
> > I hope this makes sense, and thanks in advance!
> >
> > Best Regards,
> > Andrew Tsvielodub
>
>

Re: Question about complex rule operands and Rel tree in general

Posted by Julian Hyde <jh...@apache.org>.
For validation (i.e. that doesn’t modify the tree), I would use a visitor. RelVisitor may suffice.

There are also a few “whole tree” transformations, e.g. column pruning. Use sparingly.

You are correct that rules and their operands do not “scale” to match large sections of the tree. We could in principle extend matching a little (e.g. better handing of Union with many inputs) but the locality is mostly a good thing. In a Volcano graph, there are multiple nodes in each equivalence set, therefore huge numbers of paths through the graph. Deep matches would quickly become intractable.

I strongly recommend using traits, and in particular predicates (RelMdPredicates / RelOptPredicateList). Let’s suppose you want to know whether a particular input column is always equal to 5. You could write a rule that looks for a Project several layers down whose expression is the literal 5. But much better is to look at the predicates. Predicates are propagated up the tree, which means you don’t need to look at the structure, and you can reason and act locally.

Similar arguments apply for sort and distribution (which are also traits).

If are able to package your logic into a RelOptRule you will be pleased with the results. It composes beautifully and efficiently with the hundreds of other rules, and with all the flavors of metadata.

Julian


> On Nov 8, 2018, at 12:50 PM, Андрей Цвелодуб <a....@gmail.com> wrote:
> 
> Hello everyone!
> 
> I have a question that I can't find an answer to, so maybe someone could
> help me.
> As a part of Rel Rules, there is always an operand, that matches a part of
> the tree, and says if the rule should be executed.
> The operand can be complex, so I can say for example - match an Aggregate
> on top of Project on top of Filter. AFAIU, this operand will only match if
> exactly this three nodes will be somewhere in the tree.
> But here is my question - what if I want a rule that will match a more
> generic structure, like this
>> Aggregate
>> -...
>> --* any number of any nodes in any levels
>> ---...
>> ----  Project
> Is there an official way to do that?
> 
> My first approach was to match any Aggregate and then try to inspect the
> underlying tree in matches()/onMatch(), but this turned out to be quite
> unreliable since it involves inspecting RelSubsets (and this shouldn't be
> done, as follows from
> https://lists.apache.org/thread.html/ee2349272e9d344228595c0940820b2fc525cc6115388c48e99495a6@%3Cdev.calcite.apache.org%3E).
> 
> 
> In case I'm doing it all wrong, I can formulate my question even broader -
> is there a mechanism to perform validation of the execution tree during the
> planning process, i.e. skip some plans as unimplementable based on their
> internal structure. As an example imagine I want to say that in
> JdbcConvention, all plans that have a Filter node, over a Project node that
> has more than three fields, should not be implemented. (Modifying cost
> calculation is also not an option since the plan still has RelSubsets)
> 
> I hope this makes sense, and thanks in advance!
> 
> Best Regards,
> Andrew Tsvielodub