Posted to dev@calcite.apache.org by Priyendra Deshwal <pr...@gmail.com> on 2021/03/07 15:52:14 UTC

TopDown Optimizer Questions

Hello friends,

I have been learning about the top-down optimizer and I am trying to
understand the semantics of the passThrough and derive calls. I have a few
questions about those.

   1. The main passThrough call is `RelNode passThrough(RelTraitSet
   required)`. It is my understanding that this is called when the optimizer
   wants to "rewrite" a `RelNode` to have certain `required` traits. A few
   questions here:
      1. This "rewrite" may involve passing the traits down to the
      children. All the example implementations that I have come across only
      propagate the traits down one level, from parent to children. That is,
      there is no recursive `passThrough` call on the children. Is that
      recursive propagation handled in the `TopDownRuleDriver` somewhere?
      2. What should the behavior of a RelNode be when it is unable to
      honor the requirements? Should it return null?
      3. At what point does the enforcer operator come into play? Is it
      the RelNode's responsibility to add the enforcer operator for cases
      where it is unable to honor a traitset, or does the planner somehow do
      that automatically?
      4. The current `RelNode` may already have certain traits. Should the
      required traits overwrite the existing traits or should they be "merged"
      into the existing traits?
   2. I am unclear on the semantics of the `derive` call. The various
   derive modes add to my confusion.
      1. I tried searching for DeriveMode.BOTH and found no references to
      it in the main planner machinery. Is its implementation currently
      incomplete? Only LEFT_FIRST, RIGHT_FIRST and OMAKASE are referenced in
      TopDownRuleDriver.java.
      2. Can someone briefly explain the expected semantics of the
      following calls? I am confused because the first one returns a single
      RelNode whereas the second one returns a List<>, and the arguments are
      also quite different between the two methods.
         1. RelNode derive(RelTraitSet childTraits, int childId);
         2. List<RelNode> derive(List<List<RelTraitSet>> inputTraits);
      3. The default implementation of RelNode derive(RelTraitSet
      childTraits, int childId) confused me a bit. My understanding of derive
      was that it takes the child's traits and passes those up the tree. But
      the default implementation also calls `convert` on the children. That
      suggests that even the children are being modified in this call. Why is
      that necessary?

Apologies for the barrage of questions and thanks in advance.

Regards!

Re: TopDown Optimizer Questions

Posted by Fan Liya <li...@gmail.com>.

Hi Priyendra,

I am not sure if I fully understand your questions, but I want to try to
answer this one:

"At what point does the enforcer operator come into play? Is it the
RelNode's responsibility to add the enforcer operator for cases where it is
unable to honor a traitset or does the planner somehow do that
automatically?"

When a new RelSubset is created, some AbstractConverter objects are created
connecting the different RelSubsets in the same RelSet. Please see the code
of RelSet#getOrCreateSubset: it calls the RelSet#addConverters method, which
creates the AbstractConverter objects.
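
To make the bookkeeping above concrete, here is a toy sketch in plain Java.
It is not Calcite code: the class EquivalenceSetSketch, its string-valued
"traits", and the rule of adding a placeholder in both directions are all
assumptions made for illustration. The real addConverters method applies
more conditions when deciding which converters to create.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one equivalence set; NOT Calcite's RelSet.
public class EquivalenceSetSketch {
    // Placeholder edges "from -> to", standing in for AbstractConverter objects.
    final List<String> converters = new ArrayList<>();
    // One entry per distinct trait, standing in for the subsets of this set.
    final Map<String, String> subsets = new LinkedHashMap<>();

    // Returns the subset for the given trait, creating it on first request.
    String getOrCreateSubset(String trait) {
        String existing = subsets.get(trait);
        if (existing != null) {
            return existing; // already known: no new converters are added
        }
        String created = "subset[" + trait + "]";
        // Simplifying assumption: link the new subset with every existing one,
        // in both directions, using placeholder converters.
        for (String other : subsets.keySet()) {
            converters.add(other + " -> " + trait);
            converters.add(trait + " -> " + other);
        }
        subsets.put(trait, created);
        return created;
    }

    public static void main(String[] args) {
        EquivalenceSetSketch set = new EquivalenceSetSketch();
        set.getOrCreateSubset("unsorted");
        set.getOrCreateSubset("sorted(a)");
        set.getOrCreateSubset("sorted(a)"); // reuses the existing subset
        System.out.println(set.converters);
    }
}
```

The placeholders carry no cost or implementation of their own; they only
record that a conversion between two subsets is wanted.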

Later in the optimization process (e.g. in the plan implementation phase),
the AbstractConverter objects are converted into physical nodes, depending
on the specific context. For example, when a collation is required, the
AbstractConverter object may be translated to a Sort operator. The
conversion is usually performed by a ConverterRule, and often depends on
the specific system.

After the AbstractConverter objects are converted, feasible physical plans
can be produced, and the planner can choose the one with the minimum cost.
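
As a rough illustration of these last two steps, here is a toy model in
plain Java. It does not use Calcite's API: the names Plan, enforce,
cheapest and SORT_COST, as well as the cost numbers, are hypothetical. A
Sort is added only when an alternative does not already deliver the
required collation, and the cheapest resulting plan is kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of enforcement and cost-based choice; NOT Calcite code.
public class EnforcerSketch {
    // Assumed flat cost of adding a sort; a real planner derives this from statistics.
    static final double SORT_COST = 20.0;

    /** Toy physical plan: a description, the collation it delivers ("" = none), and a cost. */
    static final class Plan {
        final String description;
        final String collation;
        final double cost;

        Plan(String description, String collation, double cost) {
            this.description = description;
            this.collation = collation;
            this.cost = cost;
        }
    }

    /** Plays the role of the converter step: wrap in a Sort only when the collation is missing. */
    static Plan enforce(Plan input, String requiredCollation) {
        if (requiredCollation.isEmpty() || requiredCollation.equals(input.collation)) {
            return input; // requirement already satisfied, no enforcer needed
        }
        return new Plan("Sort[" + requiredCollation + "](" + input.description + ")",
            requiredCollation, input.cost + SORT_COST);
    }

    /** Enforces the requirement on every alternative, then keeps the cheapest result. */
    static Plan cheapest(List<Plan> alternatives, String requiredCollation) {
        Plan best = null;
        for (Plan alternative : alternatives) {
            Plan candidate = enforce(alternative, requiredCollation);
            if (best == null || candidate.cost < best.cost) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Plan> alternatives = new ArrayList<>();
        alternatives.add(new Plan("TableScan", "", 10.0));
        alternatives.add(new Plan("IndexScan", "a", 25.0));
        Plan best = cheapest(alternatives, "a");
        System.out.println(best.description + " cost=" + best.cost);
    }
}
```

With these made-up numbers, sorting the table scan costs 30 in total while
the index scan already delivers the collation for 25, so the index scan is
chosen without any Sort.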

Best,
Liya Fan
