You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Vladimir Ozerov <pp...@gmail.com> on 2021/03/08 07:25:38 UTC

Make RelNode attribute order part of MEMO

Hi,

Currently, the order of attributes is used to define operators equivalence.
This leads to several interesting problems, such as possible duplication of
expressions (e.g., "a,a,b") or additional work in rules to detect trivial
projects and/or input permutations (e.g. "a,b -> b,a").

But the biggest problem is the join order planning. In Calcite, AxB is not
equivalent to BxA:

   1. It makes the ruleset [JoinCommuteRule, JoinAssociateRule]
   insufficient to explore all bushy trees because the commute operation adds
   a project on top of the new join (AxB -> project(BxA)), thus not
   allowing for the associate rule to be executed on the upper join. The
   solution is to add the ProjectJoinTransposeRule to the ruleset, but this
   increases the planning time dramatically, making Apache Calcite unsuitable
   for the cost-based join planning even for relatively simple join graphs.
   2. It increases the number of join-related rule calls, which complicates
   the implementation of the new join enumeration planning rules (e.g.
   top-down enumerators) because duplicate derivations compromise performance.

My question is - has the community considered an idea to make the order of
columns a common property of all operators, somewhat similar to the trait,
but without an explicit enforcer?

For example, consider the following MEMO which the planner creates when
trying to transform the join AxB to BxA:

G1: { Scan1[table=t1, cols=a,b] }
G2: { Scan2[table=t2, cols=c,d] }
G3: { AxB[G1, G2], Project[G4, cols=$2,$3,$0,$1] }
G4: { BxA[G2, G1] }

However, if we make the column order part of the MEMO, we may potentially
have something like this:

G1: { Scan1[table=t1, cols=a,b] }
G2: { Scan2[table=t2, cols=c,d] }
G3 [cols=G1.$0, G1.$1, G2.$0, G2.$1]: { AxB[G1, G2], BxA[G2, G1] }

Notice, how we were able to put AxB and BxA to the same equivalence group.
To my knowledge, CockroachDB uses a somewhat similar design.

I realize that this is rather a radical idea, and more design work is
required to come with a proper proposal. At this point, I just would like
to kindly ask the community to share high-level feedback on that. Were
similar ideas proposed before?

Thank you,
Vladimir.

Re: Make RelNode attribute order part of MEMO

Posted by Julian Hyde <jh...@gmail.com>.

> There are hundreds of usages of the RexInputRef, so the implementation of
> this idea might be prohibitively expensive. But putting this problem aside,
> do you envision any other potential blockers for the implementation of that
> idea?

Suppose we wanted to build permutations into every registered RelNode. It’s rather difficult to put aside this problem, given that it would affect every RelOptRule (including ones in user code that we can’t fix).

Practically, we could achieve this by introducing a new API for rules (say an alternative to RelOptRuleCall.rels that returns the matched relational expressions and their field permutations.

One issue I see with unique field IDs is self-joins. Consider the query

SELECT *
FROM Emp AS e
  JOIN Emp AS m ON e.mgr = m.empno
WHERE e.sal > m.sal

Do e.sal and m.sal have the same field ID? For some purposes we would want them to have the same ID. But then the self-join brings them side-by-side, and we can’t have the same field ID twice in the result, so we have to arbitrarily give one of them a new ID. That arbitrariness is concerning.

We would save a lot of effort figuring out mappings between input and output fields. But we would have to put a lot of effort figuring out when to split fields (the self-join example above), combine them (when two columns become equal due to an = predicate) and other mappings (applying aggregate functions and expressions with certain identities).

We would end up maintaining equivalence sets of fields, because we did not know that two fields were identical when we created the RelNodes and discover the fact later. In the current scheme, based on int field offsets, all of that reasoning is local to rules and RelNodes, and that is a good thing. 

Julian


> On Mar 13, 2021, at 8:55 AM, Vladimir Ozerov <pp...@gmail.com> wrote:
> 
> Hi Julian,
> 
> Thank you for sharing these issues. We end up with almost the same ideas.
> We attempted to add an input permute to Join, which allowed us to avoid
> projects. However, that complicates the integration with other rules, just
> as you mention in [2]. Globally unique column IDs seem like a better option
> from that perspective. Moreover, unique IDs may simplify the implementation
> of other optimizations. For example, many join enumeration techniques,
> whether DP-based or top-down, require the decomposition of the join graph
> into independent vertices (inputs) and edges (conditions) and careful
> reconstruction of the alternative join trees, and RexInputRef is not very
> suitable for that process. Another possible example is recently reported
> [3].
> 
> There are hundreds of usages of the RexInputRef, so the implementation of
> this idea might be prohibitively expensive. But putting this problem aside,
> do you envision any other potential blockers for the implementation of that
> idea?
> 
> Regards,
> Vladimir.
> 
> [1] https://github.com/apache/calcite/pull/2359
> [2] https://issues.apache.org/jira/browse/CALCITE-62
> [3] https://issues.apache.org/jira/browse/CALCITE-4534
> 
> вт, 9 мар. 2021 г. в 22:40, Julian Hyde <jh...@apache.org>:
> 
>> I investigated something similar a long time ago. We noticed that a
>> lot of trivial Project operators were being generated to compensate
>> for field re-ordering due to join transposition. And so the idea was
>> to allow each RelNode (and especially Join) to permute its output
>> fields.
>> 
>> Here is the case: https://issues.apache.org/jira/browse/CALCITE-62.
>> https://issues.apache.org/jira/browse/CALCITE-55 is related.
>> 
>> The problem, as I noted in CALCITE-62, is that people writing rules
>> have another mapping to deal with.
>> 
>> I believe that other systems, such as Spark's Catalyst planner, use
>> globally unique IDs for columns (as opposed to Calcite, whose column
>> references are only locally unique, ordinals of the input
>> operator(s)). Globally unique IDs would be superior for this problem
>> but perhaps introduce other challenges.
>> 
>> Julian
>> 
>> On Sun, Mar 7, 2021 at 11:25 PM Vladimir Ozerov <pp...@gmail.com>
>> wrote:
>>> 
>>> Hi,
>>> 
>>> Currently, the order of attributes is used to define operators
>> equivalence.
>>> This leads to several interesting problems, such as possible duplication
>> of
>>> expressions (e.g., "a,a,b") or additional work in rules to detect trivial
>>> projects and/or input permutations (e.g. "a,b -> b,a").
>>> 
>>> But the biggest problem is the join order planning. In Calcite, AxB is
>> not
>>> equivalent to BxA:
>>> 
>>>   1. It makes the ruleset [JoinCommuteRule, JoinAssociateRule]
>>>   insufficient to explore all bushy trees because the commute operation
>> adds
>>>   a project on top of the new join (AxB -> project(BxA)), thus not
>>>   allowing for the associate rule to be executed on the upper join. The
>>>   solution is to add the ProjectJoinTransposeRule to the ruleset, but
>> this
>>>   increases the planning time dramatically, making Apache Calcite
>> unsuitable
>>>   for the cost-based join planning even for relatively simple join
>> graphs.
>>>   2. It increases the number of join-related rule calls, which
>> complicates
>>>   the implementation of the new join enumeration planning rules (e.g.
>>>   top-down enumerators) because duplicate derivations compromise
>> performance.
>>> 
>>> My question is - has the community considered an idea to make the order
>> of
>>> columns a common property of all operators, somewhat similar to the
>> trait,
>>> but without an explicit enforcer?
>>> 
>>> For example, consider the following MEMO which the planner creates when
>>> trying to transform the join AxB to BxA:
>>> 
>>> G1: { Scan1[table=t1, cols=a,b] }
>>> G2: { Scan2[table=t2, cols=c,d] }
>>> G3: { AxB[G1, G2], Project[G4, cols=$2,$3,$0,$1] }
>>> G4: { BxA[G2, G1] }
>>> 
>>> However, if we make the column order part of the MEMO, we may potentially
>>> have something like this:
>>> 
>>> G1: { Scan1[table=t1, cols=a,b] }
>>> G2: { Scan2[table=t2, cols=c,d] }
>>> G3 [cols=G1.$0, G1.$1, G2.$0, G2.$1]: { AxB[G1, G2], BxA[G2, G1] }
>>> 
>>> Notice, how we were able to put AxB and BxA to the same equivalence
>> group.
>>> To my knowledge, CockroachDB uses a somewhat similar design.
>>> 
>>> I realize that this is rather a radical idea, and more design work is
>>> required to come with a proper proposal. At this point, I just would like
>>> to kindly ask the community to share high-level feedback on that. Were
>>> similar ideas proposed before?
>>> 
>>> Thank you,
>>> Vladimir.
>>

Re: Make RelNode attribute order part of MEMO

Posted by Vladimir Ozerov <pp...@gmail.com>.

Hi Julian,

Thank you for sharing these issues. We end up with almost the same ideas.
We attempted to add an input permute to Join, which allowed us to avoid
projects. However, that complicates the integration with other rules, just
as you mention in [2]. Globally unique column IDs seem like a better option
from that perspective. Moreover, unique IDs may simplify the implementation
of other optimizations. For example, many join enumeration techniques,
whether DP-based or top-down, require the decomposition of the join graph
into independent vertices (inputs) and edges (conditions) and careful
reconstruction of the alternative join trees, and RexInputRef is not very
suitable for that process. Another possible example is recently reported
[3].

There are hundreds of usages of the RexInputRef, so the implementation of
this idea might be prohibitively expensive. But putting this problem aside,
do you envision any other potential blockers for the implementation of that
idea?

Regards,
Vladimir.

[1] https://github.com/apache/calcite/pull/2359
[2] https://issues.apache.org/jira/browse/CALCITE-62
[3] https://issues.apache.org/jira/browse/CALCITE-4534

вт, 9 мар. 2021 г. в 22:40, Julian Hyde <jh...@apache.org>:

> I investigated something similar a long time ago. We noticed that a
> lot of trivial Project operators were being generated to compensate
> for field re-ordering due to join transposition. And so the idea was
> to allow each RelNode (and especially Join) to permute its output
> fields.
>
> Here is the case: https://issues.apache.org/jira/browse/CALCITE-62.
> https://issues.apache.org/jira/browse/CALCITE-55 is related.
>
> The problem, as I noted in CALCITE-62, is that people writing rules
> have another mapping to deal with.
>
> I believe that other systems, such as Spark's Catalyst planner, use
> globally unique IDs for columns (as opposed to Calcite, whose column
> references are only locally unique, ordinals of the input
> operator(s)). Globally unique IDs would be superior for this problem
> but perhaps introduce other challenges.
>
> Julian
>
> On Sun, Mar 7, 2021 at 11:25 PM Vladimir Ozerov <pp...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > Currently, the order of attributes is used to define operators
> equivalence.
> > This leads to several interesting problems, such as possible duplication
> of
> > expressions (e.g., "a,a,b") or additional work in rules to detect trivial
> > projects and/or input permutations (e.g. "a,b -> b,a").
> >
> > But the biggest problem is the join order planning. In Calcite, AxB is
> not
> > equivalent to BxA:
> >
> >    1. It makes the ruleset [JoinCommuteRule, JoinAssociateRule]
> >    insufficient to explore all bushy trees because the commute operation
> adds
> >    a project on top of the new join (AxB -> project(BxA)), thus not
> >    allowing for the associate rule to be executed on the upper join. The
> >    solution is to add the ProjectJoinTransposeRule to the ruleset, but
> this
> >    increases the planning time dramatically, making Apache Calcite
> unsuitable
> >    for the cost-based join planning even for relatively simple join
> graphs.
> >    2. It increases the number of join-related rule calls, which
> complicates
> >    the implementation of the new join enumeration planning rules (e.g.
> >    top-down enumerators) because duplicate derivations compromise
> performance.
> >
> > My question is - has the community considered an idea to make the order
> of
> > columns a common property of all operators, somewhat similar to the
> trait,
> > but without an explicit enforcer?
> >
> > For example, consider the following MEMO which the planner creates when
> > trying to transform the join AxB to BxA:
> >
> > G1: { Scan1[table=t1, cols=a,b] }
> > G2: { Scan2[table=t2, cols=c,d] }
> > G3: { AxB[G1, G2], Project[G4, cols=$2,$3,$0,$1] }
> > G4: { BxA[G2, G1] }
> >
> > However, if we make the column order part of the MEMO, we may potentially
> > have something like this:
> >
> > G1: { Scan1[table=t1, cols=a,b] }
> > G2: { Scan2[table=t2, cols=c,d] }
> > G3 [cols=G1.$0, G1.$1, G2.$0, G2.$1]: { AxB[G1, G2], BxA[G2, G1] }
> >
> > Notice, how we were able to put AxB and BxA to the same equivalence
> group.
> > To my knowledge, CockroachDB uses a somewhat similar design.
> >
> > I realize that this is rather a radical idea, and more design work is
> > required to come with a proper proposal. At this point, I just would like
> > to kindly ask the community to share high-level feedback on that. Were
> > similar ideas proposed before?
> >
> > Thank you,
> > Vladimir.
>

Re: Make RelNode attribute order part of MEMO

Posted by Julian Hyde <jh...@apache.org>.

I investigated something similar a long time ago. We noticed that a
lot of trivial Project operators were being generated to compensate
for field re-ordering due to join transposition. And so the idea was
to allow each RelNode (and especially Join) to permute its output
fields.

Here is the case: https://issues.apache.org/jira/browse/CALCITE-62.
https://issues.apache.org/jira/browse/CALCITE-55 is related.

The problem, as I noted in CALCITE-62, is that people writing rules
have another mapping to deal with.

I believe that other systems, such as Spark's Catalyst planner, use
globally unique IDs for columns (as opposed to Calcite, whose column
references are only locally unique, ordinals of the input
operator(s)). Globally unique IDs would be superior for this problem
but perhaps introduce other challenges.

Julian

On Sun, Mar 7, 2021 at 11:25 PM Vladimir Ozerov <pp...@gmail.com> wrote:
>
> Hi,
>
> Currently, the order of attributes is used to define operators equivalence.
> This leads to several interesting problems, such as possible duplication of
> expressions (e.g., "a,a,b") or additional work in rules to detect trivial
> projects and/or input permutations (e.g. "a,b -> b,a").
>
> But the biggest problem is the join order planning. In Calcite, AxB is not
> equivalent to BxA:
>
>    1. It makes the ruleset [JoinCommuteRule, JoinAssociateRule]
>    insufficient to explore all bushy trees because the commute operation adds
>    a project on top of the new join (AxB -> project(BxA)), thus not
>    allowing for the associate rule to be executed on the upper join. The
>    solution is to add the ProjectJoinTransposeRule to the ruleset, but this
>    increases the planning time dramatically, making Apache Calcite unsuitable
>    for the cost-based join planning even for relatively simple join graphs.
>    2. It increases the number of join-related rule calls, which complicates
>    the implementation of the new join enumeration planning rules (e.g.
>    top-down enumerators) because duplicate derivations compromise performance.
>
> My question is - has the community considered an idea to make the order of
> columns a common property of all operators, somewhat similar to the trait,
> but without an explicit enforcer?
>
> For example, consider the following MEMO which the planner creates when
> trying to transform the join AxB to BxA:
>
> G1: { Scan1[table=t1, cols=a,b] }
> G2: { Scan2[table=t2, cols=c,d] }
> G3: { AxB[G1, G2], Project[G4, cols=$2,$3,$0,$1] }
> G4: { BxA[G2, G1] }
>
> However, if we make the column order part of the MEMO, we may potentially
> have something like this:
>
> G1: { Scan1[table=t1, cols=a,b] }
> G2: { Scan2[table=t2, cols=c,d] }
> G3 [cols=G1.$0, G1.$1, G2.$0, G2.$1]: { AxB[G1, G2], BxA[G2, G1] }
>
> Notice, how we were able to put AxB and BxA to the same equivalence group.
> To my knowledge, CockroachDB uses a somewhat similar design.
>
> I realize that this is rather a radical idea, and more design work is
> required to come with a proper proposal. At this point, I just would like
> to kindly ask the community to share high-level feedback on that. Were
> similar ideas proposed before?
>
> Thank you,
> Vladimir.