You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Roman Kondakov <ko...@mail.ru.INVALID> on 2020/01/07 10:35:00 UTC

Question about Volcano's planner root converters

Hello!

I have a couple questions about method
VolcanoPlanner#ensureRootConverters [1]. In this method we first
calculate the difference of traits between root subset and other subsets
which belong to the same set as the root:

ImmutableList<RelTrait> difference =
          root.getTraitSet().difference(subset.getTraitSet());

and then, if the differences list size equals to 1, we register a new
AbstractConverter from the given subset's trait to the root trait.

The problems I can see here are caused by the strict condition that
difference.size() should be equals to 1 in order to create converters to
the root traits. This requirement may often lead to CanNotPlanException.
Let's consider an example: In the root set  we have a root node with
traits of distribution SINGLETON and empty sortedness [] (I omitted
convention trait for simplicity) and Subset of physical nodes with
distribution HASH and sortedness [1]. Set dump would look like this in
the given case:

Set#1 <-- root set
  rel#29:Subset#1.SINGLETON.[] <-- root subset
    [empty subset]
  rel#20:Subset#2.HASH.[1]
    rel#27:IgniteTableScan.HASH.[1]

Root converters were not created here because the difference size
between traitsets was 2: HASH != SINGLETON and [] != [1]. But it looks
strange because an empty collation [] includes [1] as well as all
possible collations. So, the actual traits difference size is 1. And
converter from HASH to SINGLETON might be created here.

My questions are
1. Should we change the condition of root converters creation from the
direct traitset difference to the difference with considering traits
hierarchy (like [] includes [1])?
2. Should we cover the case when the actual traits difference size is
more than 1? I.e. root traits are SINGLETON.[2] and neighbor traits are
HASH.[1]. In this case we can create the chain of converters:

Subset.HASH.[1] ->
AbstractConverter[HASH->SINGLETON] ->
AbstractConverter[[1]->[2]] ->
RootSubset[SINGLETON.[2]]

This may help us to avoid CanNotPlanException in many cases at the cost
of creation of N! chains of abstract converters. I guess N should be
relatively small here (N <= 3) because traits dimensionality is usually
small.

What do you think?


[1]
https://github.com/apache/calcite/blob/963a266d7b3859d141964217f4d16cae350b10e1/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L675

-- 
Kind Regards
Roman Kondakov


Re: Question about Volcano's planner root converters

Posted by Haisheng Yuan <h....@alibaba-inc.com>.
> 1. Should we change the condition of root converters creation from the
direct traitset difference to the difference with considering traits
hierarchy (like [] includes [1])?

I think so. Specifically non-empty required trait diff. Singleton is the only diff in your example. It doesn't care about sort order.

> 2. Should we cover the case when the actual traits difference size is
more than 1? I.e. root traits are SINGLETON.[2] and neighbor traits are
HASH.[1].

Yes. This is quite common for MPP databases where there is a master node to gather all the data from segments. Note that it should not be N!. Ideally, each physical operator should have API to derive properties from child. e.g. Singleton/Gather physical operator will deliver singleton distribution, but will break sort order unless you gather merge the data, even the child is a Sort node. But Sort operator will keep the distribution of child operator.

So we can either do
Sort
  +-- GatherMotion
               +-- TableScan
or
GatherMerge
       +-- Sort
               +-- TableScan

- Haisheng

------------------------------------------------------------------
发件人:Roman Kondakov<ko...@mail.ru.INVALID>
日 期:2020年01月07日 18:35:00
收件人:<de...@calcite.apache.org>
主 题:Question about Volcano's planner root converters

Hello!

I have a couple questions about method
VolcanoPlanner#ensureRootConverters [1]. In this method we first
calculate the difference of traits between root subset and other subsets
which belong to the same set as the root:

ImmutableList<RelTrait> difference =
          root.getTraitSet().difference(subset.getTraitSet());

and then, if the differences list size equals to 1, we register a new
AbstractConverter from the given subset's trait to the root trait.

The problems I can see here are caused by the strict condition that
difference.size() should be equals to 1 in order to create converters to
the root traits. This requirement may often lead to CanNotPlanException.
Let's consider an example: In the root set  we have a root node with
traits of distribution SINGLETON and empty sortedness [] (I omitted
convention trait for simplicity) and Subset of physical nodes with
distribution HASH and sortedness [1]. Set dump would look like this in
the given case:

Set#1 <-- root set
  rel#29:Subset#1.SINGLETON.[] <-- root subset
    [empty subset]
  rel#20:Subset#2.HASH.[1]
    rel#27:IgniteTableScan.HASH.[1]

Root converters were not created here because the difference size
between traitsets was 2: HASH != SINGLETON and [] != [1]. But it looks
strange because an empty collation [] includes [1] as well as all
possible collations. So, the actual traits difference size is 1. And
converter from HASH to SINGLETON might be created here.

My questions are
1. Should we change the condition of root converters creation from the
direct traitset difference to the difference with considering traits
hierarchy (like [] includes [1])?
2. Should we cover the case when the actual traits difference size is
more than 1? I.e. root traits are SINGLETON.[2] and neighbor traits are
HASH.[1]. In this case we can create the chain of converters:

Subset.HASH.[1] ->
AbstractConverter[HASH->SINGLETON] ->
AbstractConverter[[1]->[2]] ->
RootSubset[SINGLETON.[2]]

This may help us to avoid CanNotPlanException in many cases at the cost
of creation of N! chains of abstract converters. I guess N should be
relatively small here (N <= 3) because traits dimensionality is usually
small.

What do you think?


[1]
https://github.com/apache/calcite/blob/963a266d7b3859d141964217f4d16cae350b10e1/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L675

-- 
Kind Regards
Roman Kondakov