You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Danny Chan <yu...@gmail.com> on 2020/02/14 06:07:01 UTC

[ DISCUSS ] Revert change: CALCITE-3713 Remove column names from Project#digest

The change in CALCITE-3713 make a radical change to the plan digest, but it only brings in little promotion(at least from Calcite tests and Flink cases)

During the upgrade of Flink Calcite version to 1.22.0-SNAPSHOT, the 2000+ plan changes to due to this patch, it is hard for me to figure out the reason of the change: because of this patch, or because of other bug-fix/promotion.

Even though I change the plan verification to EXPLAIN_ATTRIBUTE, there are still 500+ plan diffs.

This makes the Flink Calcite upgrade almost impossible to go on and I’m burdened out of it.

So, PLEASE, if possible, can we REVERT this patch, 2000+ plan changes for no promotion(useless) (at lease for Flink).

I did show my options when discussing if the patch should be merged, me and Julian all gave -1 for that but the code still merged, sigh !

[1] https://ponymail-vm.apache.org/_GUI_/thread.html/54bf3ed733eb7e725ce3ea397334aad8f1323ead13e450b1753b1521%40%3Cdev.calcite.apache.org%3E

Best,
Danny Chan

Re: [ DISCUSS ] Revert change: CALCITE-3713 Remove column names from Project#digest

Posted by Danny Chan <yu...@gmail.com>.
> if you use explain_digest attributes, it means you expect it might
change due to implementation details.

That is what you just defined that, why a digest should be changed just because of the implementation.

Best,
Danny Chan
在 2020年2月14日 +0800 PM2:54,Vladimir Sitnikov <si...@gmail.com>,写道:
> The subject of the mail does not match the body.
>
> Please double check and ensure you mention a single change.
>
>
> PS if you use explain_digest attributes, it means you expect it might
> change due to implementation details.
>
> Vladimir

Re: [ DISCUSS ] Revert change: CALCITE-3713 Remove column names from Project#digest

Posted by Danny Chan <yu...@gmail.com>.
Thanks Vladimir, too many config options is hard to maintain, so no need to
add that, i would use the explain_attributes and update the log files.

The explain would not change frequently, right?

Vladimir Sitnikov <si...@gmail.com>于2020年2月18日 周二上午12:45写道:

> > And state that it is going to be removed before say 1.24 (about 6 months
> from now).
>
> I'm ok with keeping the property working as long as it does not impact
> features/bugfixes development provided the property does not make the code
> hard to read.
>
> Vladimir
>

Re: [ DISCUSS ] Revert change: CALCITE-3713 Remove column names from Project#digest

Posted by Vladimir Sitnikov <si...@gmail.com>.
> And state that it is going to be removed before say 1.24 (about 6 months
from now).

I'm ok with keeping the property working as long as it does not impact
features/bugfixes development provided the property does not make the code
hard to read.

Vladimir

Re: [ DISCUSS ] Revert change: CALCITE-3713 Remove column names from Project#digest

Posted by Julian Hyde <jh...@gmail.com>.
I can imagine that changing a lot of log files is painful to the Flink project. There’s no reason to force them to do it right away.

I think a config parameter is fine. How about making it deprecated or experimental, to discourage others from using it? And state that it is going to be removed before say 1.24 (about 6 months from now). 

Julian

> On Feb 17, 2020, at 12:06 AM, Vladimir Sitnikov <si...@gmail.com> wrote:
> 
> 
>> 
>> adding an option
> 
> I'm ok with adding an option provided:
> 1) It is "remove column names" by default
> 2) It is supported on a best effort basis. In other words, it is NOT to be
> used in production systems on a day-to-day basis
> 3) We do not add an extensive test suite for that property (see #1, #2)
> 4) We do not mean the option to be a "public API". In other words, it is
> there, it pretends to be working, however, it should not impact the
> development (== we can drop it if the maintenance takes significant effort)
> 
> That is I agree the property might be helpful to compare behaviors, however
> having too many options (e.g. an option for every bugfix) might complicate
> the code with doubtful gains.
> 
> Vladimir

Re: [ DISCUSS ] Revert change: CALCITE-3713 Remove column names from Project#digest

Posted by Vladimir Sitnikov <si...@gmail.com>.
>adding an option

I'm ok with adding an option provided:
1) It is "remove column names" by default
2) It is supported on a best effort basis. In other words, it is NOT to be
used in production systems on a day-to-day basis
3) We do not add an extensive test suite for that property (see #1, #2)
4) We do not mean the option to be a "public API". In other words, it is
there, it pretends to be working, however, it should not impact the
development (== we can drop it if the maintenance takes significant effort)

That is I agree the property might be helpful to compare behaviors, however
having too many options (e.g. an option for every bugfix) might complicate
the code with doubtful gains.

Vladimir

Re: [ DISCUSS ] Revert change: CALCITE-3713 Remove column names from Project#digest

Posted by Xiening Dai <xn...@gmail.com>.
I think CALCITE-3713 is overall a positive change. We have seen the number of rule applies are reduced in our test (not significant though).  After fixing the performance issue, CALCITE-3713 is fine and IMO it should stay. Having say that, I understand the pain of updating test base line. So adding an option is reasonable too.


> On Feb 16, 2020, at 7:33 PM, Danny Chan <yu...@gmail.com> wrote:
> 
> Hi, fellows, can we gave a solution here to solve the problem ? This is a blocking issue from the Apache Flink side, we rely heavily on the digest for plan tests.
> 
> I have 2 solutions here:
> 
> 
> 1. Revert the change of CALCITE-3713
> 2. We add a config option to switch the feature off
> 
> Either is okey for me, I just need a conclusion ~
> 
> 
> Best,
> Danny Chan
> 在 2020年2月14日 +0800 PM2:54,Vladimir Sitnikov <si...@gmail.com>,写道:
>> The subject of the mail does not match the body.
>> 
>> Please double check and ensure you mention a single change.
>> 
>> 
>> PS if you use explain_digest attributes, it means you expect it might
>> change due to implementation details.
>> 
>> Vladimir


Re: [ DISCUSS ] Revert change: CALCITE-3713 Remove column names from Project#digest

Posted by Vladimir Sitnikov <si...@gmail.com>.
>it is hard for me to figure out the reason of the change

It is written in the JIRA description and in the commit message: it
improves planning performance by reducing the planning space.

For Calcite it reduces slow tests from 64 min to 40 min.

Before (3862sec):
https://github.com/apache/calcite/runs/382954125#step:4:794
After (2402sec):
https://github.com/apache/calcite/pull/1740/checks?check_run_id=382955763#step:4:369

I see you might be struggling with plan issues, however:
* The fix brings 30% improvement for Calcite tests
* It improves some of the plans where the old column was emp0 and the new
one is emp (see PR1740)
* You provide no clue on the nature of the failures you are facing
* You suggest making changes to Calcite based on the number of Flink
failures only

^^ That is sad

Vladimir

Re: [ DISCUSS ] Revert change: CALCITE-3713 Remove column names from Project#digest

Posted by Danny Chan <yu...@gmail.com>.
Hi, fellows, can we gave a solution here to solve the problem ? This is a blocking issue from the Apache Flink side, we rely heavily on the digest for plan tests.

I have 2 solutions here:


1. Revert the change of CALCITE-3713
2. We add a config option to switch the feature off

Either is okey for me, I just need a conclusion ~


Best,
Danny Chan
在 2020年2月14日 +0800 PM2:54,Vladimir Sitnikov <si...@gmail.com>,写道:
> The subject of the mail does not match the body.
>
> Please double check and ensure you mention a single change.
>
>
> PS if you use explain_digest attributes, it means you expect it might
> change due to implementation details.
>
> Vladimir

Re: [ DISCUSS ] Revert change: CALCITE-3713 Remove column names from Project#digest

Posted by Vladimir Sitnikov <si...@gmail.com>.
The subject of the mail does not match the body.

Please double check and ensure you mention a single change.


PS if you use explain_digest attributes, it means you expect it might
change due to implementation details.

Vladimir