You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Vladimir Sitnikov <si...@gmail.com> on 2019/12/29 19:09:30 UTC

[DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Hi,

We have a 1-year old issue with an idea to sort RexNode operands so they
are consistent.

For instance, "x=5" and "5=x" have the same semantics, so it would make
sense to stick to a single implementation.
A discussion can be found in
https://issues.apache.org/jira/browse/CALCITE-2450

We do not normalize RexNodes, thus it results in excessive planning time,
especially when the planner is trying to reorder joins.
For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are
different joins, however, they are equivalent.

The normalization does not seem to cost much, however, it enables me to
activate more rules (e.g. EnumerabeMergeRule),
so it is good as it enables to consider more sophisticated plans.

I see two approaches:
a) Normalize in RexNode constructor. This seems easy to implement, however,
there's a catch
if someone assumed that the order of operands would be the same as the one
that was passed to the constructor.
I don't think there are such assumptions in the wild, but there might be.
The javadoc for the relevant methods says nothing regarding the operand
order.
However, the good thing would be RexNode would feel the same in the
debugger and in its toString representation.

b) Normalize at RexCall#computeDigest only.
In other words, keep the operands unsorted, but make sure the digest is
created as if the operands were sorted.
This seems to be the most transparent change, however, it might surprise
that `toString` does not match to whatever is seen in the debugger.

In any case, making `RexCall#toString` print sorted representation would
alter lots of tests.
For :core it is like 5540 tests completed, 358 failed, 91 skipped :((

WDYT?

Hopefully, making the RexNode representation sorted would reduce the number
of `$1=$0` vs `$0=$1` plan diffs.

Vladimir

Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Vladimir Sitnikov <si...@gmail.com>.

It turned out "b" (sort operands in computeDigest) is easier to implement.
I've filed a PR: https://github.com/apache/calcite/pull/1703

>($0, 2) vs <(2, $0) might be less trivial to implement, but I think it is
worth doing at the same time.
In any case, lots of expressions will need to be updated, and it is better
if we make < vs > stable as well.

A similar question is if we want to normalize >($0, 2) and <(2, $0)

The suggested ordering is:
1) Order by string length
2) Order by string representation

In other words,  >($0, 2) normalizes to <(2, $0) because 2 is shorter.
And >($0, +(2,3)) is kept intact


Note: I do not have an immediate demand for normalizing < vs > (and >= vs
<=),
however, it looks like it is worth doing to minimize the overall damage.

Vladimir

Re: Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Haisheng Yuan <h....@alibaba-inc.com>.

>> =(CAST(PREV(UP.$0, 0)):INTEGER NOT NULL, 100)
I don't see the value of reordering this kind of expression. In GPDB, var vs constant comparison is reordered for input ref only, without additional function calls, like $1=5, because it is transformed into range constraints for rex simplification. Perhaps Calcite doesn't need this.


>> What do you think re "$n.field = 42" where $n.field is a dot operator I'm not fond of adding complicated checks there, however, I think I can move the literal to the right if the other side is a non-literal.

Sounds good. I am also fine to keep it unchanged.


>> "sort by numeric value" for cases like $1=$8, so that semantics would be more or less automatic.
I think this is good enough for Calcite, and it has same effect with what I said. Because operand with smaller index always come from left relation, if the two come from different relations.


- Haisheng

------------------------------------------------------------------
发件人：Vladimir Sitnikov<si...@gmail.com>
日　期：2019年12月30日 04:33:11
收件人：Apache Calcite dev list<de...@calcite.apache.org>
主　题：Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Just in case, my motivation of comparing by string length first is for the
cases like below:

=(CAST(PREV(UP.$0, 0)):INTEGER NOT NULL, 100)

vs

=(100, CAST(PREV(UP.$0, 0)):INTEGER NOT NULL)

As for me, the second one is easier to understand, do the expression starts
with simpler bits, and
the complicated parts are put later.

In the same way, it is not amusing to see cases like
AND(...,...,...,...,,,..,  null, ....)

Vladimir

Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Vladimir Sitnikov <si...@gmail.com>.

Just in case, my motivation of comparing by string length first is for the
cases like below:

=(CAST(PREV(UP.$0, 0)):INTEGER NOT NULL, 100)

vs

=(100, CAST(PREV(UP.$0, 0)):INTEGER NOT NULL)

As for me, the second one is easier to understand, do the expression starts
with simpler bits, and
the complicated parts are put later.

In the same way, it is not amusing to see cases like
AND(...,...,...,...,,,..,  null, ....)

Vladimir

Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Vladimir Sitnikov <si...@gmail.com>.

Haisheng> variable always left, constant always right for applicable binary
operators;

Oh, I did not think of making different behavior for literals, variables.

What do you think re "$n.field = 42" where $n.field is a dot operator
I'm not fond of adding complicated checks there, however, I think
I can move the literal to the right if the other side is a non-literal.

Haisheng>- for join conditions, left operand always comes from left
relation, right operand always comes from right relation for reversible
binary operators.

I'm afraid it would be hard to implement :(
I don't like to implement operator-specific logic.

On the other hand "sort by length, then sort by string representation" is
more or less the same as
"sort by numeric value" for cases like $1=$8, so that semantics would be
more or less automatic.

Vladimir

Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Haisheng Yuan <h....@alibaba-inc.com>.

I recommend the way GPDB does.

Normalize the the logical plan expression in the preprocessing phase: 
- variable always left, constant always right for applicable binary operators;
- for join conditions, left operand always comes from left relation, right operand always comes from right relation for reversable binary operators.

- Haisheng

------------------------------------------------------------------
发件人：Enrico Olivelli<eo...@gmail.com>
日　期：2019年12月30日 03:28:38
收件人：<de...@calcite.apache.org>
主　题：Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Il dom 29 dic 2019, 20:09 Vladimir Sitnikov <si...@gmail.com>
ha scritto:

> Hi,
>
> We have a 1-year old issue with an idea to sort RexNode operands so they
> are consistent.
>
> For instance, "x=5" and "5=x" have the same semantics, so it would make
> sense to stick to a single implementation.
> A discussion can be found in
> https://issues.apache.org/jira/browse/CALCITE-2450
>
> We do not normalize RexNodes, thus it results in excessive planning time,
> especially when the planner is trying to reorder joins.
> For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are
> different joins, however, they are equivalent.
>
> The normalization does not seem to cost much, however, it enables me to
> activate more rules (e.g. EnumerabeMergeRule),
> so it is good as it enables to consider more sophisticated plans.
>
> I see two approaches:
> a) Normalize in RexNode constructor. This seems easy to implement, however,
> there's a catch
> if someone assumed that the order of operands would be the same as the one
> that was passed to the constructor.
> I don't think there are such assumptions in the wild, but there might be.
> The javadoc for the relevant methods says nothing regarding the operand
> order.
> However, the good thing would be RexNode would feel the same in the
> debugger and in its toString representation.
>
> b) Normalize at RexCall#computeDigest only.
> In other words, keep the operands unsorted, but make sure the digest is
> created as if the operands were sorted.
> This seems to be the most transparent change, however, it might surprise
> that `toString` does not match to whatever is seen in the debugger.
>
> In any case, making `RexCall#toString` print sorted representation would
> alter lots of tests.
> For :core it is like 5540 tests completed, 358 failed, 91 skipped :((
>
> WDYT?
>

I really would love this feature.

Just my 2 cents
Enrico



> Hopefully, making the RexNode representation sorted would reduce the number
> of `$1=$0` vs `$0=$1` plan diffs.
>
> Vladimir
>

Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Enrico Olivelli <eo...@gmail.com>.

Il dom 29 dic 2019, 20:09 Vladimir Sitnikov <si...@gmail.com>
ha scritto:

> Hi,
>
> We have a 1-year old issue with an idea to sort RexNode operands so they
> are consistent.
>
> For instance, "x=5" and "5=x" have the same semantics, so it would make
> sense to stick to a single implementation.
> A discussion can be found in
> https://issues.apache.org/jira/browse/CALCITE-2450
>
> We do not normalize RexNodes, thus it results in excessive planning time,
> especially when the planner is trying to reorder joins.
> For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are
> different joins, however, they are equivalent.
>
> The normalization does not seem to cost much, however, it enables me to
> activate more rules (e.g. EnumerabeMergeRule),
> so it is good as it enables to consider more sophisticated plans.
>
> I see two approaches:
> a) Normalize in RexNode constructor. This seems easy to implement, however,
> there's a catch
> if someone assumed that the order of operands would be the same as the one
> that was passed to the constructor.
> I don't think there are such assumptions in the wild, but there might be.
> The javadoc for the relevant methods says nothing regarding the operand
> order.
> However, the good thing would be RexNode would feel the same in the
> debugger and in its toString representation.
>
> b) Normalize at RexCall#computeDigest only.
> In other words, keep the operands unsorted, but make sure the digest is
> created as if the operands were sorted.
> This seems to be the most transparent change, however, it might surprise
> that `toString` does not match to whatever is seen in the debugger.
>
> In any case, making `RexCall#toString` print sorted representation would
> alter lots of tests.
> For :core it is like 5540 tests completed, 358 failed, 91 skipped :((
>
> WDYT?
>

I really would love this feature.

Just my 2 cents
Enrico



> Hopefully, making the RexNode representation sorted would reduce the number
> of `$1=$0` vs `$0=$1` plan diffs.
>
> Vladimir
>

Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Haisheng Yuan <h....@alibaba-inc.com>.

> For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are different joins, however, they are equivalent.

How is the alternative generated? I would rather check how to stop generating this alternative than reordering predicates. 

- Haisheng

------------------------------------------------------------------
发件人：Vladimir Sitnikov<si...@gmail.com>
日　期：2019年12月30日 15:04:25
收件人：Apache Calcite dev list<de...@calcite.apache.org>
主　题：Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Danny>How much cases are there in production ? This example itself seems
very marginalized. I’m not against with it, I’m suspicious about the value
of the feature.

It improves JdbcTest#testJoinManyWay 2 times or so.

master.
JdbcTest#testJoinManyWay: 5.8sec
https://travis-ci.org/apache/calcite/jobs/630602718#L1646

with normalization fix:
JdbcTest#testJoinManyWay: 3.1sec
https://travis-ci.org/apache/calcite/jobs/630719276#L1432

The fix is vital for the proper support of EnumerableMergeJoin as well.
If EnumerableMergeJoin is activated, then JdbcTest#testJoinManyWay can't
complete within 5 minutes (see PR 1702)

Vladimir

Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Chunwei Lei <ch...@gmail.com>.

Personally I would prefer to display the expression as what users write.
But I agree with Haisheng.

Maybe we can add an optimizer config option to enable/disable this feature.


Best,
Chunwei


On Tue, Dec 31, 2019 at 9:50 AM Haisheng Yuan <h....@alibaba-inc.com>
wrote:

> Normalizing scalar expressions is helpful, especially for deduplicating
> derived constraints that is pushed down to child relations. We should not
> stop making improvements just because there are many plan changes.
>
> How about adding optimizer config option to enable/disable the feature?
>
> Normalizing every binary operator might seem not necessary, but simply
> normalizing equal expression with input refs or literals alone can still
> help a lot, I guess, because many of them are generated from equivalent
> classes.
>
> - Haisheng
>
> ------------------------------------------------------------------
> 发件人：Rui Wang<am...@apache.org>
> 日 期：2019年12月31日 07:09:44
> 收件人：<de...@calcite.apache.org>
> 主 题：Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form
>
> I think the concern of breaking plan comparison has appeared more than
> once. Not only to this proposal but also to others (e.g. replace "$" with
> "_" in names).
>
> From a think of another perspective, the widely used practice of
> string comparison based plan evaluation also reduces the flexibility of
> making changes in Calcite. We need to carefully think about if a change has
> a large impact that affects toString(), or something similar.
>
> So if Calcite community could offer another solution for downstream
> projects, and announce that don't guarantee the backward compatibility of
> toString(), then it could be in a better situation.
>
> A few candidate solutions:
> 1. offer utils that convert a plan represented in string to Relnodes.
> 2. use RelBuilder to build plans and does comparison.
>
> and then just maintain the backward compatibility of the solution.
>
> just my two cents.
>
>
> -Rui
>
>
> On Mon, Dec 30, 2019 at 12:20 PM Vladimir Sitnikov <
> sitnikov.vladimir@gmail.com> wrote:
>
> > The change improves slow tests from 80 min to 60, and the changes are
> > minimal
> >
> > Vladimir
> >
>
>

Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Vladimir Sitnikov <si...@gmail.com>.

Stamatis>This is a change that most likely will have impact on many
projects

I don't see how it will impact projects. Really.

Are there projects that use up to date Calcite versions?
Are they ready for adding a CI job to test with Calcite master branch?

It is very disappointing to hear that it will impact many projects while
there's zero evidence.
Just in case you missed: PR#1703 produces ZERO user-visible differences in
the execution plan output.
Note: PR#1703 does not change optimizer rules, so it should even produce
the same outputs (as you see, there are virtually zero changes to the
execution plans in Calcite code). There's a single change regarding the
plan, and that is PlannerTest which uses DIGEST_ATTRIBUTES (a non-default
flag)

For instance, Sort#computeSelfCost was broken in 1.14
https://issues.apache.org/jira/browse/CALCITE-1842, and nobody complained.
In the same way, (broken) EnumerableNestedLoopJoin was added in 1.20
https://issues.apache.org/jira/browse/CALCITE-2969, and nobody complained.
Just in case, EnumerableNestedLoopJoin did not account it would have to
restart the inner input multiple times, so it underestimated the cost.

I don't blame the authors/committers, but I highlight a couple of examples
that produce noticeable plan changes where execution plans
both have user-visible changes, and the plan could easily degrade (because
the costing was screwed).
There was not a single discussion on the number of test cases Calcite
consumers would have to change in order to adapt new Join nodes in the
execution plan output.

It does not sound fair to block normalization (which is vital for
performance) and push Join refactorings and cost adjustments like trivial
changes.

Vladimir

Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Stamatis Zampetakis <za...@gmail.com>.

This is a change that most likely will have impact on many projects so I
think it deserves a bit more discussion. Given that 1.22 should be out
quite soon and at the same time this period many people are on holidays it
would seem better to at least postpone the merge for 1.23.

In terms of code, I would prefer if the changes were part of another class
(i.e., RexNormalizer?) that would leave the RexNode hierarchy unchanged.
Many projects use Calcite as a library, combining components the way they
like. To this end the approach of a separate class seems more appropriate.
This seems inline with the spirit of the other classes that we have for
simplifying, flattening etc.

Best,
Stamatis

On Tue, Dec 31, 2019, 2:47 PM Vladimir Sitnikov <si...@gmail.com>
wrote:

> Feng>Maybe we can go a step further to provide configuration interfaces,
> which
> Feng>enable users to determine whether to enable a specified optimization
>
> It might be OK to add kill-flags so added features can be disabled for
> debugging purposes.
> However, I would prefer to avoid adding options that will be used in
> non-default values in production systems.
>
> For instance, I think https://github.com/apache/calcite/pull/1703 (Reorder
> RexCall predicates to a canonical form)
> is ready for merge.
>
> It improves planning performance across many tests, and it a small
> patch: +255 -84.
>
> The patch is an improvement, there are very tiny drawbacks, so I'm inclined
> to commit the change.
> Just in case you did not know
> RelOptUtil#toString(org.apache.calcite.rel.RelNode) output is preserved by
> the patch.
> In other words, all the tests that use RelOptUtil#toString for dumping
> plans will likely work as-is.
>
> I have added a system property calcite.enable.rexnode.digest.normalize
> (default=true) so the feature can be disabled
> if someone thinks it causes issues (e.g. if there's a suspicion it slows
> down the planning for some reason).
> However, the full Calcite suite is executed with the default value only, so
> I would not recommend using normalize=false in production.
>
> Vladimir
>

Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Vladimir Sitnikov <si...@gmail.com>.

Feng>Maybe we can go a step further to provide configuration interfaces,
which
Feng>enable users to determine whether to enable a specified optimization

It might be OK to add kill-flags so added features can be disabled for
debugging purposes.
However, I would prefer to avoid adding options that will be used in
non-default values in production systems.

For instance, I think https://github.com/apache/calcite/pull/1703 (Reorder
RexCall predicates to a canonical form)
is ready for merge.

It improves planning performance across many tests, and it a small
patch: +255 -84.

The patch is an improvement, there are very tiny drawbacks, so I'm inclined
to commit the change.
Just in case you did not know
RelOptUtil#toString(org.apache.calcite.rel.RelNode) output is preserved by
the patch.
In other words, all the tests that use RelOptUtil#toString for dumping
plans will likely work as-is.

I have added a system property calcite.enable.rexnode.digest.normalize
(default=true) so the feature can be disabled
if someone thinks it causes issues (e.g. if there's a suspicion it slows
down the planning for some reason).
However, the full Calcite suite is executed with the default value only, so
I would not recommend using normalize=false in production.

Vladimir

Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Feng Zhu <we...@gmail.com>.

I personally agree with Haisheng Yuan and Rui Wang.
Expression normalization is a common technique in many database products
and benefits to query optimizer. We should not stop the efforts in this
direction.
But in current calcite code base, even mininal changes in plan may bring
great impacts, considerations on backward compatibility is necessary.
Adding optimizer config is a reasonable idea. Because every coin has two
sides, for example, some optimizations sacrifice readability when the
RelNode/RexNode be converted back to SQL again.

Maybe we can go a step further to provide configuration interfaces, which
enable users to determine whether to enable a specified optimization.
In frameworks like Hadoop/Hive/Spark/Flink, users can control nearly all of
the optimizations through configurations.


Best,
DonnyZone

Danny Chan <yu...@gmail.com> 于2019年12月31日周二 上午11:22写道：

> Or better, we can make the digest change transparent to user, keep it as
> what user writes them, but, in the Planner cache, we use the normalized
> digest as the key.
>
> Best,
> Danny Chan
> 在 2019年12月31日 +0800 AM9:50，Haisheng Yuan <h....@alibaba-inc.com>，写道：
> > Normalizing scalar expressions is helpful, especially for deduplicating
> derived constraints that is pushed down to child relations. We should not
> stop making improvements just because there are many plan changes.
> >
> > How about adding optimizer config option to enable/disable the feature?
> >
> > Normalizing every binary operator might seem not necessary, but simply
> normalizing equal expression with input refs or literals alone can still
> help a lot, I guess, because many of them are generated from equivalent
> classes.
> >
> > - Haisheng
> >
> > ------------------------------------------------------------------
> > 发件人：Rui Wang<am...@apache.org>
> > 日 期：2019年12月31日 07:09:44
> > 收件人：<de...@calcite.apache.org>
> > 主 题：Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form
> >
> > I think the concern of breaking plan comparison has appeared more than
> > once. Not only to this proposal but also to others (e.g. replace "$" with
> > "_" in names).
> >
> > From a think of another perspective, the widely used practice of
> > string comparison based plan evaluation also reduces the flexibility of
> > making changes in Calcite. We need to carefully think about if a change
> has
> > a large impact that affects toString(), or something similar.
> >
> > So if Calcite community could offer another solution for downstream
> > projects, and announce that don't guarantee the backward compatibility of
> > toString(), then it could be in a better situation.
> >
> > A few candidate solutions:
> > 1. offer utils that convert a plan represented in string to Relnodes.
> > 2. use RelBuilder to build plans and does comparison.
> >
> > and then just maintain the backward compatibility of the solution.
> >
> > just my two cents.
> >
> >
> > -Rui
> >
> >
> > On Mon, Dec 30, 2019 at 12:20 PM Vladimir Sitnikov <
> > sitnikov.vladimir@gmail.com> wrote:
> >
> > > The change improves slow tests from 80 min to 60, and the changes are
> > > minimal
> > >
> > > Vladimir
> > >
> >
>

Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Danny Chan <yu...@gmail.com>.

Or better, we can make the digest change transparent to user, keep it as what user writes them, but, in the Planner cache, we use the normalized digest as the key.

Best,
Danny Chan
在 2019年12月31日 +0800 AM9:50，Haisheng Yuan <h....@alibaba-inc.com>，写道：
> Normalizing scalar expressions is helpful, especially for deduplicating derived constraints that is pushed down to child relations. We should not stop making improvements just because there are many plan changes.
>
> How about adding optimizer config option to enable/disable the feature?
>
> Normalizing every binary operator might seem not necessary, but simply normalizing equal expression with input refs or literals alone can still help a lot, I guess, because many of them are generated from equivalent classes.
>
> - Haisheng
>
> ------------------------------------------------------------------
> 发件人：Rui Wang<am...@apache.org>
> 日　期：2019年12月31日 07:09:44
> 收件人：<de...@calcite.apache.org>
> 主　题：Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form
>
> I think the concern of breaking plan comparison has appeared more than
> once. Not only to this proposal but also to others (e.g. replace "$" with
> "_" in names).
>
> From a think of another perspective, the widely used practice of
> string comparison based plan evaluation also reduces the flexibility of
> making changes in Calcite. We need to carefully think about if a change has
> a large impact that affects toString(), or something similar.
>
> So if Calcite community could offer another solution for downstream
> projects, and announce that don't guarantee the backward compatibility of
> toString(), then it could be in a better situation.
>
> A few candidate solutions:
> 1. offer utils that convert a plan represented in string to Relnodes.
> 2. use RelBuilder to build plans and does comparison.
>
> and then just maintain the backward compatibility of the solution.
>
> just my two cents.
>
>
> -Rui
>
>
> On Mon, Dec 30, 2019 at 12:20 PM Vladimir Sitnikov <
> sitnikov.vladimir@gmail.com> wrote:
>
> > The change improves slow tests from 80 min to 60, and the changes are
> > minimal
> >
> > Vladimir
> >
>

Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Haisheng Yuan <h....@alibaba-inc.com>.

Normalizing scalar expressions is helpful, especially for deduplicating derived constraints that is pushed down to child relations. We should not stop making improvements just because there are many plan changes.

How about adding optimizer config option to enable/disable the feature?

Normalizing every binary operator might seem not necessary, but simply normalizing equal expression with input refs or literals alone can still help a lot, I guess, because many of them are generated from equivalent classes.

- Haisheng

------------------------------------------------------------------
发件人：Rui Wang<am...@apache.org>
日　期：2019年12月31日 07:09:44
收件人：<de...@calcite.apache.org>
主　题：Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

I think the concern of breaking plan comparison has appeared more than
once. Not only to this proposal but also to others (e.g. replace "$" with
"_" in names).

From a think of another perspective, the widely used practice of
string comparison based plan evaluation also reduces the flexibility of
making changes in Calcite. We need to carefully think about if a change has
a large impact that affects toString(), or something similar.

So if Calcite community could offer another solution for downstream
projects, and announce that don't guarantee the backward compatibility of
toString(), then it could be in a better situation.

A few candidate solutions:
1. offer utils that convert a plan represented in string to Relnodes.
2. use RelBuilder to build plans and does comparison.

and then just maintain the backward compatibility of the solution.

just my two cents.

-Rui

On Mon, Dec 30, 2019 at 12:20 PM Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:

> The change improves slow tests from 80 min to 60, and the changes are
> minimal
>
> Vladimir
>

Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Rui Wang <am...@apache.org>.

I think the concern of breaking plan comparison has appeared more than
once. Not only to this proposal but also to others (e.g. replace "$" with
"_" in names).

From a think of another perspective, the widely used practice of
string comparison based plan evaluation also reduces the flexibility of
making changes in Calcite. We need to carefully think about if a change has
a large impact that affects toString(), or something similar.

So if Calcite community could offer another solution for downstream
projects, and announce that don't guarantee the backward compatibility of
toString(), then it could be in a better situation.

A few candidate solutions:
1. offer utils that convert a plan represented in string to Relnodes.
2. use RelBuilder to build plans and does comparison.

and then just maintain the backward compatibility of the solution.

just my two cents.

-Rui

On Mon, Dec 30, 2019 at 12:20 PM Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:

> The change improves slow tests from 80 min to 60, and the changes are
> minimal
>
> Vladimir
>

Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Vladimir Sitnikov <si...@gmail.com>.

The change improves slow tests from 80 min to 60, and the changes are
minimal

Vladimir

Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Julian Hyde <jh...@apache.org>.

I think this proposed change does more harm than good. I agree with the arguments that Danny and Stamatis have made. We should not do this.

> On Dec 30, 2019, at 9:32 AM, Vladimir Sitnikov <si...@gmail.com> wrote:
> 
> Technically speaking, I would love to refrain from using toString  for
> equals/hashCode, however,
> it looks like a much more invasive change.
> 
> 
> Yet another idea is to skip normalization when rendering a plan
> with SqlExplainLevel != DIGEST_ATTRIBUTES.
> In other words, the normalization is there, it is on by default, however,
> it is deactivated for rendering the plans.
> 
> Unfortunately, that results in adding public methods and a thread-local,
> however, we could mark it as a Calcite-internal or so.
> 
> I've updated https://github.com/apache/calcite/pull/1703 accordingly.
> 
> Vladimir

Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Vladimir Sitnikov <si...@gmail.com>.

Technically speaking, I would love to refrain from using toString  for
equals/hashCode, however,
it looks like a much more invasive change.


Yet another idea is to skip normalization when rendering a plan
with SqlExplainLevel != DIGEST_ATTRIBUTES.
In other words, the normalization is there, it is on by default, however,
it is deactivated for rendering the plans.

Unfortunately, that results in adding public methods and a thread-local,
however, we could mark it as a Calcite-internal or so.

I've updated https://github.com/apache/calcite/pull/1703 accordingly.

Vladimir

Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Stamatis Zampetakis <za...@gmail.com>.

Normalization sounds useful but I would prefer to keep it optional and
separate from the RexNode classes (another RexNode visitor?).

There are various use cases where people rely on the structure of the plan
to do things and I am afraid that if we always apply the normalization we
will break many projects. For instance, all projects that have unit tests
that verify the structure of the plan (via string comparisons or visitors)
may need to be updated.

Best,
Stamatis

On Mon, Dec 30, 2019, 11:40 AM Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:

> Danny>almost all of the plan change are meaningless
>
> What do you mean by meaningless?
> The purpose of the change is to improve planning time, and to improve plan
> stability.
>
> Danny>and the execution graph is very probably relevant with the
> state/storage, if we breaks them, the state also crashed
>
> Can you please elaborate?
> Calcite users must not depend on the string representation of RexNodes.
>
> Vladimir
>

Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Vladimir Sitnikov <si...@gmail.com>.

Danny>almost all of the plan change are meaningless

What do you mean by meaningless?
The purpose of the change is to improve planning time, and to improve plan
stability.

Danny>and the execution graph is very probably relevant with the
state/storage, if we breaks them, the state also crashed

Can you please elaborate?
Calcite users must not depend on the string representation of RexNodes.

Vladimir

Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Danny Chan <yu...@gmail.com>.

I agree for some cases, the change is useful.

But the plan compatibility is more important for production, because many systems use the plan to generate an execution graph, and the execution graph is very probably relevant with the state/storage, if we breaks them, the state also crashed.

We can change the plan for necessary requirements, but within your PR, you did not solve that, almost all of the plan change are meaningless, can you stop doing that ?

I do want to be awful when I upgrade to the next calcite version ~

Best,
Danny Chan
在 2019年12月30日 +0800 PM3:04，Vladimir Sitnikov <si...@gmail.com>，写道：
> Danny>How much cases are there in production ? This example itself seems
> very marginalized. I’m not against with it, I’m suspicious about the value
> of the feature.
>
> It improves JdbcTest#testJoinManyWay 2 times or so.
>
> master.
> JdbcTest#testJoinManyWay: 5.8sec
> https://travis-ci.org/apache/calcite/jobs/630602718#L1646
>
> with normalization fix:
> JdbcTest#testJoinManyWay: 3.1sec
> https://travis-ci.org/apache/calcite/jobs/630719276#L1432
>
> The fix is vital for the proper support of EnumerableMergeJoin as well.
> If EnumerableMergeJoin is activated, then JdbcTest#testJoinManyWay can't
> complete within 5 minutes (see PR 1702)
>
> Vladimir

Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Vladimir Sitnikov <si...@gmail.com>.

Danny>How much cases are there in production ? This example itself seems
very marginalized. I’m not against with it, I’m suspicious about the value
of the feature.

It improves JdbcTest#testJoinManyWay 2 times or so.

master.
JdbcTest#testJoinManyWay: 5.8sec
https://travis-ci.org/apache/calcite/jobs/630602718#L1646

with normalization fix:
JdbcTest#testJoinManyWay: 3.1sec
https://travis-ci.org/apache/calcite/jobs/630719276#L1432

The fix is vital for the proper support of EnumerableMergeJoin as well.
If EnumerableMergeJoin is activated, then JdbcTest#testJoinManyWay can't
complete within 5 minutes (see PR 1702)

Vladimir

Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Posted by Danny Chan <yu...@gmail.com>.

> We do not normalize RexNodes, thus it results in excessive planning time,
> especially when the planner is trying to reorder joins.
> For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are
> different joins, however, they are equivalent.

How much cases are there in production ? This example itself seems very marginalized. I’m not against with it, I’m suspicious about the value of the feature.

> It turned out "b" (sort operands in computeDigest) is easier to implement.
> I've filed a PR: https://github.com/apache/calcite/pull/1703

I’m strongly -1 for this way, because it beaks the plan test where almost all of the change are meaningless.



Best,
Danny Chan
在 2019年12月30日 +0800 AM3:09，dev@calcite.apache.org，写道：
>
> We do not normalize RexNodes, thus it results in excessive planning time,
> especially when the planner is trying to reorder joins.
> For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are
> different joins, however, they are equivalent.