You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Julian Hyde <jh...@apache.org> on 2020/02/20 18:10:48 UTC

[DISCUSS] Refactor how planner rules are parameterized

I have an idea for a refactoring to RelOptRule. I haven’t fully thought it through, but I’m going to sketch it out here to see whether folks agree about the problems/solutions.

It will be a breaking change (in the sense that people will have to change their code in order to get it to compile) but relatively safe (in that once the code compiles, it will have the same behavior as before). Also it will give Calcite developers and users a lot more flexibility going forward.

The problem I see is that people often want different variants of planner rules. An example is FilterJoinRule, which has a 'boolean smart’ parameter, a predicate (which returns whether to pull up filter conditions), operands (which determine the precise sub-classes of RelNode that the rule should match) and a relBuilderFactory (which controls the type of RelNode created by this rule).

Suppose you have an instance of FilterJoinRule and you want to change ‘smart’ from true to false. The ‘smart’ parameter is immutable (good!) but you can’t easily create a clone of the rule because you don’t know the values of the other parameters. Your instance might even be (unbeknownst to you) a sub-class with extra parameters and a private constructor.

So, my proposal is to put all of the config information of a RelOptRule into a single ‘config’ parameter that contains all relevant properties. Each sub-class of RelOptRule would have one constructor with just a ‘config’ parameter. Each config knows which sub-class of RelOptRule to create. Therefore it is easy to copy a config, change one or more properties, and create a new rule instance.

Adding a property to a rule’s config does not require us to add or deprecate any constructors.

The operands are part of the config, so if you have a rule that matches a EnumerableFilter on an EnumerableJoin and you want to make it match an EnumerableFilter on an EnumerableNestedLoopJoin, you can easily create one with one changed operand.

The config is immutable and self-describing, so we can use it to automatically generate a unique description for each rule instance.

Julian

[1] https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93 <https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93>


Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Julian Hyde <jh...@apache.org>.
I have force-pushed to https://github.com/apache/calcite/pull/2024.
There are now some commits to be merged before 1.24 (hopefully today
or tomorrow) and some to be merged after 1.24.

Can someone give it a final review?

Chunwei,

Please, let's do the 1.24 release sooner rather than later. The part
of this change that has to occur after 1.24 is large (204 files, 9k
lines added, 6k lines removed) and so is susceptible to bit-rot. I
agreed to Stamatis' proposal to do this in two phases on the
assumption that the release was not far off.

Smaller releases are easier for the release manager, too. :)

Julian

On Mon, Jul 6, 2020 at 1:40 AM Chunwei Lei <ch...@gmail.com> wrote:
>
> If I am right, I am the release manager of 1.24.
>
> I suggest releasing 1.24 a bit later(at the end of this month)
> since 1.23 was released not long ago(on 2020.5.23).
>
>
> Best,
> Chunwei
>
>
> On Fri, Jul 3, 2020 at 10:54 PM Michael Mior <mm...@apache.org> wrote:
>
> > I'll just add for the relatively common case where a rule is matching
> > a node with a specific node as a child, and so on that we could easily
> > add a less verbose API to make this part of the rule definition
> > equally concise.
> >
> > --
> > Michael Mior
> > mmior@apache.org
> >
> > Le dim. 14 juin 2020 à 15:36, Haisheng Yuan <hy...@apache.org> a écrit :
> > >
> > > Hi Julian,
> > >
> > > Thanks for working on this.
> > >
> > > We haven't reached a consensus yet.
> > >
> > > Frankly speaking, I agree with what Stamatis said earlier. Flexibility
> > doesn't come with no cost. Admittedly, with this patch, any rule
> > constructor refactoring won't need to deprecate old constructors and break
> > backward compatibility, however, it also makes the rule definition much
> > more verbose, less readable and understandable. IMHO, it does more harm
> > than good.
> > >
> > > Let's see how CassandraFilterRule becomes before and after the change.
> > >
> > > Before this change:
> > >
> > >   private static class CassandraFilterRule extends RelOptRule {
> > >     private static final CassandraFilterRule INSTANCE = new
> > CassandraFilterRule();
> > >
> > >     private CassandraFilterRule() {
> > >       super(operand(LogicalFilter.class,
> > operand(CassandraTableScan.class, none())),
> > >           "CassandraFilterRule");
> > >     }
> > >   }
> > >
> > > After this change:
> > >
> > >   private static class CassandraFilterRule
> > >       extends RelOptNewRule<CassandraFilterRule.Config> {
> > >     private static final CassandraFilterRule INSTANCE =
> > >         Config.EMPTY
> > >             .withOperandSupplier(b0 ->
> > >                 b0.operand(LogicalFilter.class)
> > >                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
> > >                         .noInputs()))
> > >             .as(Config.class)
> > >             .toRule();
> > >
> > >     /** Creates a CassandraFilterRule. */
> > >     protected CassandraFilterRule(Config config) {
> > >       super(config);
> > >     }
> > >
> > >     /** Rule configuration. */
> > >     public interface Config extends RelOptNewRule.Config {
> > >       @Override default CassandraFilterRule toRule() {
> > >         return new CassandraFilterRule(this);
> > >       }
> > >     }
> > >   }
> > >
> > >
> > > Intuitively, if we want to check what does the rule generally match or
> > how it is defined, we just check the constructor. Now we checkout the
> > constructor, only config is there, go to Config, there is still nothing
> > interesting, we have to go to the INSTANCE. What is the difference with
> > just using operand and optionConfig as the rule constructor?
> > >
> > >    protected CassandraFilterRule(RelOptRuleOperand operand, Config
> > config) {
> > >      super(operand, config);
> > >    }
> > >
> > > Or even simply replace Config with int, with each bit represent an
> > option, if all of them are boolean options.
> > >
> > > Nothing is more flexible than just using RelOptRuleOperand as the
> > parameter, just like the base class RelOptRule does. But do we want it?
> > >
> > > At the same time, with the new approach, it is now legit to create the
> > following instance:
> > >
> > >   private static final CassandraFilterRule INSTANCE2 =
> > >         Config.EMPTY
> > >             .withOperandSupplier(b0 ->
> > >                 b0.operand(LogicalProject.class)  // Even the is
> > intended to match Filter, but it compiles
> > >                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
> > >                         .noInputs()))
> > >             .as(Config.class)
> > >             .toRule();
> > >
> > > Before we continue to the discussion and code review, we need to go back
> > to the original problem, is it a real problem that is facing us? Is there
> > any real demand or just artificial demand? How about we conduct a Calcite
> > user survey to see how Calcite devs and users think? Then let's see how to
> > move forward.
> > >
> > > [+1] Hell yeah, the new approach is awesome, let's go with it.
> > > [+0] Meh, I am OK with current approach, I don't see any burden or
> > problem with it.
> > > [-1] Hell no, current approach is bad, the new one is even worse.
> > >
> > >
> > > Thanks,
> > > Haisheng
> > >
> > >
> > > On 2020/06/12 23:51:56, Julian Hyde <jh...@apache.org> wrote:
> > > > There is now a PR: https://github.com/apache/calcite/pull/2024. Can
> > > > people please review?
> > > >
> > > > Here's the TL;DR:
> > > >
> > > > Previously, it was not easy to customize, re-use or extend planner
> > > > rules. If you wished to customize a rule (i.e. create a new instance
> > > > of a rule with different properties) you would have to call the rule's
> > > > constructor. Frequently the required constructor did not exist, so we
> > > > would have to add a new constructor and deprecate the old one.
> > > >
> > > > After this change, you start off from an instance of the rule, modify
> > > > its configuration, and call toRule() on the configuration. (Rule
> > > > constructors are now private, because only the configuration ever
> > > > calls them.)
> > > >
> > > > A good illustration of this is DruidRules, which used to contain many
> > > > sub-classes. Those sub-classes are no longer needed. Old code:
> > > >
> > > >   public static final DruidSortProjectTransposeRule
> > SORT_PROJECT_TRANSPOSE =
> > > >       new DruidSortProjectTransposeRule(RelFactories.LOGICAL_BUILDER);
> > > >
> > > >     public static class DruidSortProjectTransposeRule
> > > >         extends SortProjectTransposeRule {
> > > >       public DruidSortProjectTransposeRule(RelBuilderFactory
> > > > relBuilderFactory) {
> > > >         super(
> > > >             operand(Sort.class,
> > > >                 operand(Project.class, operand(DruidQuery.class,
> > none()))),
> > > >             relBuilderFactory, null);
> > > >       }
> > > >     }
> > > >
> > > > New code:
> > > >
> > > >   public static final SortProjectTransposeRule SORT_PROJECT_TRANSPOSE =
> > > >       SortProjectTransposeRule.INSTANCE.config
> > > >           .withOperandFor(Sort.class, Project.class, DruidQuery.class)
> > > >           .toRule();
> > > >
> > > > The change maintains backwards compatibility to a large degree. In a
> > > > few places, I had to change rule instances from type RelOptRule to
> > > > Supplier<RelOptRule>, to avoid deadlocks during class loading. For
> > > > instance, instead of writing FilterJoinRule.FILTER_ON_JOIN you must
> > > > now write FilterJoinRule.FILTER_ON_JOIN.get().
> > > >
> > > > Julian
> > > >
> > > >
> > > > On Thu, Apr 16, 2020 at 12:08 PM Julian Hyde <jh...@apache.org> wrote:
> > > > >
> > > > > I have now pushed a dev branch with a prototype. Please see
> > > > > https://issues.apache.org/jira/browse/CALCITE-3923 for details.
> > > > >
> > > > > Having built the prototype, I believe that this change is beneficial
> > > > > and we should do it. But I would like to get to consensus on the
> > > > > design before we pull the trigger.
> > > > >
> > > > > Julian
> > > > >
> > > > > On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <jh...@apache.org>
> > wrote:
> > > > > >
> > > > > > Haisheng,
> > > > > >
> > > > > > I hear you. I agree that major changes to rules will require new
> > rule
> > > > > > classes (not merely sub-classes). People should copy-paste,
> > refactor,
> > > > > > and all that good stuff. But I think there are a lot of cases
> > where we
> > > > > > need to make minor changes to rules (there are many of these in the
> > > > > > code base already), and this change will help.
> > > > > >
> > > > > > I have logged https://issues.apache.org/jira/browse/CALCITE-3923
> > and
> > > > > > am going to start working on a prototype. When we have a prototype
> > we
> > > > > > will be able to assess how big an impact the API change will have.
> > > > > > (E.g. whether it will be a breaking change.)
> > > > > >
> > > > > > Julian
> > > > > >
> > > > > > On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <
> > h.yuan@alibaba-inc.com> wrote:
> > > > > > >
> > > > > > > I don't think it is worth the refactoring. People who want to
> > customize the rule, in most cases, won't be satisfied by a different
> > parameter, they most likely still need to rewrite (copy & paste) the rule
> > with some slightly their own logic. For many Calcite users, the rule is not
> > reusable even with flexible configurations.
> > > > > > >
> > > > > > > - Haisheng
> > > > > > >
> > > > > > >
> > ------------------------------------------------------------------
> > > > > > > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > > > > > > 日 期:2020年03月14日 22:54:04
> > > > > > > 收件人:<de...@calcite.apache.org>
> > > > > > > 主 题:Re: [DISCUSS] Refactor how planner rules are parameterized
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > Apologies for the late reply but I just realised that I had
> > written the
> > > > > > > mail and never pressed the send button.
> > > > > > >
> > > > > > > I think it is a nice idea and certainly a problem worth
> > addressing. If I
> > > > > > > understood well you're thinking something like the current
> > constructor of
> > > > > > > the RelBuilder [1] that accepts a Context parameter. Indeed it
> > seems that
> > > > > > > with this change even rules that are not designed to be
> > configured can be
> > > > > > > changed much more gracefully (without adding new constructors
> > and breaking
> > > > > > > changes).
> > > > > > >
> > > > > > > On the other hand, some of the advantages that you mention can
> > also be
> > > > > > > turned into disadvantages. For instance, copying a rule without
> > knowing the
> > > > > > > values of the other parameters is a bit risky and might be
> > harder to reason
> > > > > > > about its correctness. Moreover, private constructors, final
> > classes, etc.,
> > > > > > > are primarily used for encapsulation purposes so allowing the
> > state of the
> > > > > > > rule escape somehow breaks the original design of the rule.
> > > > > > >
> > > > > > > Another problem with respect to rules is cross convention
> > matching and
> > > > > > > transformations [2]. Many rules should not fire for operands
> > that are in
> > > > > > > different conventions; a typical example that comes in my mind is
> > > > > > > FilterProjectTransposeRule [3]. In the same spirit most rules
> > should not
> > > > > > > generate mixed convention transformations. Although a different
> > problem, I
> > > > > > > am mentioning it here since it could affect the design of the
> > new API.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stamatis
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> > > > > > >
> > > > > > > [2]
> > > > > > >
> > https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > > > > > > [3]
> > > > > > >
> > https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> > > > > > >
> > > > > > > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <mm...@apache.org>
> > wrote:
> > > > > > >
> > > > > > > > This sounds reasonable to me. It also sounds like we could
> > make this
> > > > > > > > backwards compatible by retaining (but deprecating) the
> > existing
> > > > > > > > constructors and factory methods that will no longer be needed.
> > > > > > > > --
> > > > > > > > Michael Mior
> > > > > > > > mmior@apache.org
> > > > > > > >
> > > > > > > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org>
> > a écrit :
> > > > > > > > >
> > > > > > > > > I have an idea for a refactoring to RelOptRule. I haven’t
> > fully thought
> > > > > > > > it through, but I’m going to sketch it out here to see whether
> > folks agree
> > > > > > > > about the problems/solutions.
> > > > > > > > >
> > > > > > > > > It will be a breaking change (in the sense that people will
> > have to
> > > > > > > > change their code in order to get it to compile) but
> > relatively safe (in
> > > > > > > > that once the code compiles, it will have the same behavior as
> > before).
> > > > > > > > Also it will give Calcite developers and users a lot more
> > flexibility going
> > > > > > > > forward.
> > > > > > > > >
> > > > > > > > > The problem I see is that people often want different
> > variants of
> > > > > > > > planner rules. An example is FilterJoinRule, which has a
> > 'boolean smart’
> > > > > > > > parameter, a predicate (which returns whether to pull up filter
> > > > > > > > conditions), operands (which determine the precise sub-classes
> > of RelNode
> > > > > > > > that the rule should match) and a relBuilderFactory (which
> > controls the
> > > > > > > > type of RelNode created by this rule).
> > > > > > > > >
> > > > > > > > > Suppose you have an instance of FilterJoinRule and you want
> > to change
> > > > > > > > ‘smart’ from true to false. The ‘smart’ parameter is immutable
> > (good!) but
> > > > > > > > you can’t easily create a clone of the rule because you don’t
> > know the
> > > > > > > > values of the other parameters. Your instance might even be
> > (unbeknownst to
> > > > > > > > you) a sub-class with extra parameters and a private
> > constructor.
> > > > > > > > >
> > > > > > > > > So, my proposal is to put all of the config information of a
> > RelOptRule
> > > > > > > > into a single ‘config’ parameter that contains all relevant
> > properties.
> > > > > > > > Each sub-class of RelOptRule would have one constructor with
> > just a
> > > > > > > > ‘config’ parameter. Each config knows which sub-class of
> > RelOptRule to
> > > > > > > > create. Therefore it is easy to copy a config, change one or
> > more
> > > > > > > > properties, and create a new rule instance.
> > > > > > > > >
> > > > > > > > > Adding a property to a rule’s config does not require us to
> > add or
> > > > > > > > deprecate any constructors.
> > > > > > > > >
> > > > > > > > > The operands are part of the config, so if you have a rule
> > that matches
> > > > > > > > a EnumerableFilter on an EnumerableJoin and you want to make
> > it match an
> > > > > > > > EnumerableFilter on an EnumerableNestedLoopJoin, you can
> > easily create one
> > > > > > > > with one changed operand.
> > > > > > > > >
> > > > > > > > > The config is immutable and self-describing, so we can use
> > it to
> > > > > > > > automatically generate a unique description for each rule
> > instance.
> > > > > > > > >
> > > > > > > > > Julian
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > >
> > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > <
> > > > > > > >
> > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > >
> >

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Chunwei Lei <ch...@gmail.com>.
If I am right, I am the release manager of 1.24.

I suggest releasing 1.24 a bit later(at the end of this month)
since 1.23 was released not long ago(on 2020.5.23).


Best,
Chunwei


On Fri, Jul 3, 2020 at 10:54 PM Michael Mior <mm...@apache.org> wrote:

> I'll just add for the relatively common case where a rule is matching
> a node with a specific node as a child, and so on that we could easily
> add a less verbose API to make this part of the rule definition
> equally concise.
>
> --
> Michael Mior
> mmior@apache.org
>
> Le dim. 14 juin 2020 à 15:36, Haisheng Yuan <hy...@apache.org> a écrit :
> >
> > Hi Julian,
> >
> > Thanks for working on this.
> >
> > We haven't reached a consensus yet.
> >
> > Frankly speaking, I agree with what Stamatis said earlier. Flexibility
> doesn't come with no cost. Admittedly, with this patch, any rule
> constructor refactoring won't need to deprecate old constructors and break
> backward compatibility, however, it also makes the rule definition much
> more verbose, less readable and understandable. IMHO, it does more harm
> than good.
> >
> > Let's see how CassandraFilterRule becomes before and after the change.
> >
> > Before this change:
> >
> >   private static class CassandraFilterRule extends RelOptRule {
> >     private static final CassandraFilterRule INSTANCE = new
> CassandraFilterRule();
> >
> >     private CassandraFilterRule() {
> >       super(operand(LogicalFilter.class,
> operand(CassandraTableScan.class, none())),
> >           "CassandraFilterRule");
> >     }
> >   }
> >
> > After this change:
> >
> >   private static class CassandraFilterRule
> >       extends RelOptNewRule<CassandraFilterRule.Config> {
> >     private static final CassandraFilterRule INSTANCE =
> >         Config.EMPTY
> >             .withOperandSupplier(b0 ->
> >                 b0.operand(LogicalFilter.class)
> >                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
> >                         .noInputs()))
> >             .as(Config.class)
> >             .toRule();
> >
> >     /** Creates a CassandraFilterRule. */
> >     protected CassandraFilterRule(Config config) {
> >       super(config);
> >     }
> >
> >     /** Rule configuration. */
> >     public interface Config extends RelOptNewRule.Config {
> >       @Override default CassandraFilterRule toRule() {
> >         return new CassandraFilterRule(this);
> >       }
> >     }
> >   }
> >
> >
> > Intuitively, if we want to check what does the rule generally match or
> how it is defined, we just check the constructor. Now we checkout the
> constructor, only config is there, go to Config, there is still nothing
> interesting, we have to go to the INSTANCE. What is the difference with
> just using operand and optionConfig as the rule constructor?
> >
> >    protected CassandraFilterRule(RelOptRuleOperand operand, Config
> config) {
> >      super(operand, config);
> >    }
> >
> > Or even simply replace Config with int, with each bit represent an
> option, if all of them are boolean options.
> >
> > Nothing is more flexible than just using RelOptRuleOperand as the
> parameter, just like the base class RelOptRule does. But do we want it?
> >
> > At the same time, with the new approach, it is now legit to create the
> following instance:
> >
> >   private static final CassandraFilterRule INSTANCE2 =
> >         Config.EMPTY
> >             .withOperandSupplier(b0 ->
> >                 b0.operand(LogicalProject.class)  // Even the is
> intended to match Filter, but it compiles
> >                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
> >                         .noInputs()))
> >             .as(Config.class)
> >             .toRule();
> >
> > Before we continue to the discussion and code review, we need to go back
> to the original problem, is it a real problem that is facing us? Is there
> any real demand or just artificial demand? How about we conduct a Calcite
> user survey to see how Calcite devs and users think? Then let's see how to
> move forward.
> >
> > [+1] Hell yeah, the new approach is awesome, let's go with it.
> > [+0] Meh, I am OK with current approach, I don't see any burden or
> problem with it.
> > [-1] Hell no, current approach is bad, the new one is even worse.
> >
> >
> > Thanks,
> > Haisheng
> >
> >
> > On 2020/06/12 23:51:56, Julian Hyde <jh...@apache.org> wrote:
> > > There is now a PR: https://github.com/apache/calcite/pull/2024. Can
> > > people please review?
> > >
> > > Here's the TL;DR:
> > >
> > > Previously, it was not easy to customize, re-use or extend planner
> > > rules. If you wished to customize a rule (i.e. create a new instance
> > > of a rule with different properties) you would have to call the rule's
> > > constructor. Frequently the required constructor did not exist, so we
> > > would have to add a new constructor and deprecate the old one.
> > >
> > > After this change, you start off from an instance of the rule, modify
> > > its configuration, and call toRule() on the configuration. (Rule
> > > constructors are now private, because only the configuration ever
> > > calls them.)
> > >
> > > A good illustration of this is DruidRules, which used to contain many
> > > sub-classes. Those sub-classes are no longer needed. Old code:
> > >
> > >   public static final DruidSortProjectTransposeRule
> SORT_PROJECT_TRANSPOSE =
> > >       new DruidSortProjectTransposeRule(RelFactories.LOGICAL_BUILDER);
> > >
> > >     public static class DruidSortProjectTransposeRule
> > >         extends SortProjectTransposeRule {
> > >       public DruidSortProjectTransposeRule(RelBuilderFactory
> > > relBuilderFactory) {
> > >         super(
> > >             operand(Sort.class,
> > >                 operand(Project.class, operand(DruidQuery.class,
> none()))),
> > >             relBuilderFactory, null);
> > >       }
> > >     }
> > >
> > > New code:
> > >
> > >   public static final SortProjectTransposeRule SORT_PROJECT_TRANSPOSE =
> > >       SortProjectTransposeRule.INSTANCE.config
> > >           .withOperandFor(Sort.class, Project.class, DruidQuery.class)
> > >           .toRule();
> > >
> > > The change maintains backwards compatibility to a large degree. In a
> > > few places, I had to change rule instances from type RelOptRule to
> > > Supplier<RelOptRule>, to avoid deadlocks during class loading. For
> > > instance, instead of writing FilterJoinRule.FILTER_ON_JOIN you must
> > > now write FilterJoinRule.FILTER_ON_JOIN.get().
> > >
> > > Julian
> > >
> > >
> > > On Thu, Apr 16, 2020 at 12:08 PM Julian Hyde <jh...@apache.org> wrote:
> > > >
> > > > I have now pushed a dev branch with a prototype. Please see
> > > > https://issues.apache.org/jira/browse/CALCITE-3923 for details.
> > > >
> > > > Having built the prototype, I believe that this change is beneficial
> > > > and we should do it. But I would like to get to consensus on the
> > > > design before we pull the trigger.
> > > >
> > > > Julian
> > > >
> > > > On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <jh...@apache.org>
> wrote:
> > > > >
> > > > > Haisheng,
> > > > >
> > > > > I hear you. I agree that major changes to rules will require new
> rule
> > > > > classes (not merely sub-classes). People should copy-paste,
> refactor,
> > > > > and all that good stuff. But I think there are a lot of cases
> where we
> > > > > need to make minor changes to rules (there are many of these in the
> > > > > code base already), and this change will help.
> > > > >
> > > > > I have logged https://issues.apache.org/jira/browse/CALCITE-3923
> and
> > > > > am going to start working on a prototype. When we have a prototype
> we
> > > > > will be able to assess how big an impact the API change will have.
> > > > > (E.g. whether it will be a breaking change.)
> > > > >
> > > > > Julian
> > > > >
> > > > > On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <
> h.yuan@alibaba-inc.com> wrote:
> > > > > >
> > > > > > I don't think it is worth the refactoring. People who want to
> customize the rule, in most cases, won't be satisfied by a different
> parameter, they most likely still need to rewrite (copy & paste) the rule
> with some slightly their own logic. For many Calcite users, the rule is not
> reusable even with flexible configurations.
> > > > > >
> > > > > > - Haisheng
> > > > > >
> > > > > >
> ------------------------------------------------------------------
> > > > > > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > > > > > 日 期:2020年03月14日 22:54:04
> > > > > > 收件人:<de...@calcite.apache.org>
> > > > > > 主 题:Re: [DISCUSS] Refactor how planner rules are parameterized
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > Apologies for the late reply but I just realised that I had
> written the
> > > > > > mail and never pressed the send button.
> > > > > >
> > > > > > I think it is a nice idea and certainly a problem worth
> addressing. If I
> > > > > > understood well you're thinking something like the current
> constructor of
> > > > > > the RelBuilder [1] that accepts a Context parameter. Indeed it
> seems that
> > > > > > with this change even rules that are not designed to be
> configured can be
> > > > > > changed much more gracefully (without adding new constructors
> and breaking
> > > > > > changes).
> > > > > >
> > > > > > On the other hand, some of the advantages that you mention can
> also be
> > > > > > turned into disadvantages. For instance, copying a rule without
> knowing the
> > > > > > values of the other parameters is a bit risky and might be
> harder to reason
> > > > > > about its correctness. Moreover, private constructors, final
> classes, etc.,
> > > > > > are primarily used for encapsulation purposes so allowing the
> state of the
> > > > > > rule escape somehow breaks the original design of the rule.
> > > > > >
> > > > > > Another problem with respect to rules is cross convention
> matching and
> > > > > > transformations [2]. Many rules should not fire for operands
> that are in
> > > > > > different conventions; a typical example that comes in my mind is
> > > > > > FilterProjectTransposeRule [3]. In the same spirit most rules
> should not
> > > > > > generate mixed convention transformations. Although a different
> problem, I
> > > > > > am mentioning it here since it could affect the design of the
> new API.
> > > > > >
> > > > > > Best,
> > > > > > Stamatis
> > > > > >
> > > > > > [1]
> > > > > >
> https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> > > > > >
> > > > > > [2]
> > > > > >
> https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > > > > > [3]
> > > > > >
> https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> > > > > >
> > > > > > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <mm...@apache.org>
> wrote:
> > > > > >
> > > > > > > This sounds reasonable to me. It also sounds like we could
> make this
> > > > > > > backwards compatible by retaining (but deprecating) the
> existing
> > > > > > > constructors and factory methods that will no longer be needed.
> > > > > > > --
> > > > > > > Michael Mior
> > > > > > > mmior@apache.org
> > > > > > >
> > > > > > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org>
> a écrit :
> > > > > > > >
> > > > > > > > I have an idea for a refactoring to RelOptRule. I haven’t
> fully thought
> > > > > > > it through, but I’m going to sketch it out here to see whether
> folks agree
> > > > > > > about the problems/solutions.
> > > > > > > >
> > > > > > > > It will be a breaking change (in the sense that people will
> have to
> > > > > > > change their code in order to get it to compile) but
> relatively safe (in
> > > > > > > that once the code compiles, it will have the same behavior as
> before).
> > > > > > > Also it will give Calcite developers and users a lot more
> flexibility going
> > > > > > > forward.
> > > > > > > >
> > > > > > > > The problem I see is that people often want different
> variants of
> > > > > > > planner rules. An example is FilterJoinRule, which has a
> 'boolean smart’
> > > > > > > parameter, a predicate (which returns whether to pull up filter
> > > > > > > conditions), operands (which determine the precise sub-classes
> of RelNode
> > > > > > > that the rule should match) and a relBuilderFactory (which
> controls the
> > > > > > > type of RelNode created by this rule).
> > > > > > > >
> > > > > > > > Suppose you have an instance of FilterJoinRule and you want
> to change
> > > > > > > ‘smart’ from true to false. The ‘smart’ parameter is immutable
> (good!) but
> > > > > > > you can’t easily create a clone of the rule because you don’t
> know the
> > > > > > > values of the other parameters. Your instance might even be
> (unbeknownst to
> > > > > > > you) a sub-class with extra parameters and a private
> constructor.
> > > > > > > >
> > > > > > > > So, my proposal is to put all of the config information of a
> RelOptRule
> > > > > > > into a single ‘config’ parameter that contains all relevant
> properties.
> > > > > > > Each sub-class of RelOptRule would have one constructor with
> just a
> > > > > > > ‘config’ parameter. Each config knows which sub-class of
> RelOptRule to
> > > > > > > create. Therefore it is easy to copy a config, change one or
> more
> > > > > > > properties, and create a new rule instance.
> > > > > > > >
> > > > > > > > Adding a property to a rule’s config does not require us to
> add or
> > > > > > > deprecate any constructors.
> > > > > > > >
> > > > > > > > The operands are part of the config, so if you have a rule
> that matches
> > > > > > > a EnumerableFilter on an EnumerableJoin and you want to make
> it match an
> > > > > > > EnumerableFilter on an EnumerableNestedLoopJoin, you can
> easily create one
> > > > > > > with one changed operand.
> > > > > > > >
> > > > > > > > The config is immutable and self-describing, so we can use
> it to
> > > > > > > automatically generate a unique description for each rule
> instance.
> > > > > > > >
> > > > > > > > Julian
> > > > > > > >
> > > > > > > > [1]
> > > > > > >
> https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > <
> > > > > > >
> https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > >
>

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Michael Mior <mm...@apache.org>.
I'll just add for the relatively common case where a rule is matching
a node with a specific node as a child, and so on that we could easily
add a less verbose API to make this part of the rule definition
equally concise.

--
Michael Mior
mmior@apache.org

Le dim. 14 juin 2020 à 15:36, Haisheng Yuan <hy...@apache.org> a écrit :
>
> Hi Julian,
>
> Thanks for working on this.
>
> We haven't reached a consensus yet.
>
> Frankly speaking, I agree with what Stamatis said earlier. Flexibility doesn't come with no cost. Admittedly, with this patch, any rule constructor refactoring won't need to deprecate old constructors and break backward compatibility, however, it also makes the rule definition much more verbose, less readable and understandable. IMHO, it does more harm than good.
>
> Let's see how CassandraFilterRule becomes before and after the change.
>
> Before this change:
>
>   private static class CassandraFilterRule extends RelOptRule {
>     private static final CassandraFilterRule INSTANCE = new CassandraFilterRule();
>
>     private CassandraFilterRule() {
>       super(operand(LogicalFilter.class, operand(CassandraTableScan.class, none())),
>           "CassandraFilterRule");
>     }
>   }
>
> After this change:
>
>   private static class CassandraFilterRule
>       extends RelOptNewRule<CassandraFilterRule.Config> {
>     private static final CassandraFilterRule INSTANCE =
>         Config.EMPTY
>             .withOperandSupplier(b0 ->
>                 b0.operand(LogicalFilter.class)
>                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
>                         .noInputs()))
>             .as(Config.class)
>             .toRule();
>
>     /** Creates a CassandraFilterRule. */
>     protected CassandraFilterRule(Config config) {
>       super(config);
>     }
>
>     /** Rule configuration. */
>     public interface Config extends RelOptNewRule.Config {
>       @Override default CassandraFilterRule toRule() {
>         return new CassandraFilterRule(this);
>       }
>     }
>   }
>
>
> Intuitively, if we want to check what does the rule generally match or how it is defined, we just check the constructor. Now we checkout the constructor, only config is there, go to Config, there is still nothing interesting, we have to go to the INSTANCE. What is the difference with just using operand and optionConfig as the rule constructor?
>
>    protected CassandraFilterRule(RelOptRuleOperand operand, Config config) {
>      super(operand, config);
>    }
>
> Or even simply replace Config with int, with each bit represent an option, if all of them are boolean options.
>
> Nothing is more flexible than just using RelOptRuleOperand as the parameter, just like the base class RelOptRule does. But do we want it?
>
> At the same time, with the new approach, it is now legit to create the following instance:
>
>   private static final CassandraFilterRule INSTANCE2 =
>         Config.EMPTY
>             .withOperandSupplier(b0 ->
>                 b0.operand(LogicalProject.class)  // Even the is intended to match Filter, but it compiles
>                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
>                         .noInputs()))
>             .as(Config.class)
>             .toRule();
>
> Before we continue to the discussion and code review, we need to go back to the original problem, is it a real problem that is facing us? Is there any real demand or just artificial demand? How about we conduct a Calcite user survey to see how Calcite devs and users think? Then let's see how to move forward.
>
> [+1] Hell yeah, the new approach is awesome, let's go with it.
> [+0] Meh, I am OK with current approach, I don't see any burden or problem with it.
> [-1] Hell no, current approach is bad, the new one is even worse.
>
>
> Thanks,
> Haisheng
>
>
> On 2020/06/12 23:51:56, Julian Hyde <jh...@apache.org> wrote:
> > There is now a PR: https://github.com/apache/calcite/pull/2024. Can
> > people please review?
> >
> > Here's the TL;DR:
> >
> > Previously, it was not easy to customize, re-use or extend planner
> > rules. If you wished to customize a rule (i.e. create a new instance
> > of a rule with different properties) you would have to call the rule's
> > constructor. Frequently the required constructor did not exist, so we
> > would have to add a new constructor and deprecate the old one.
> >
> > After this change, you start off from an instance of the rule, modify
> > its configuration, and call toRule() on the configuration. (Rule
> > constructors are now private, because only the configuration ever
> > calls them.)
> >
> > A good illustration of this is DruidRules, which used to contain many
> > sub-classes. Those sub-classes are no longer needed. Old code:
> >
> >   public static final DruidSortProjectTransposeRule SORT_PROJECT_TRANSPOSE =
> >       new DruidSortProjectTransposeRule(RelFactories.LOGICAL_BUILDER);
> >
> >     public static class DruidSortProjectTransposeRule
> >         extends SortProjectTransposeRule {
> >       public DruidSortProjectTransposeRule(RelBuilderFactory
> > relBuilderFactory) {
> >         super(
> >             operand(Sort.class,
> >                 operand(Project.class, operand(DruidQuery.class, none()))),
> >             relBuilderFactory, null);
> >       }
> >     }
> >
> > New code:
> >
> >   public static final SortProjectTransposeRule SORT_PROJECT_TRANSPOSE =
> >       SortProjectTransposeRule.INSTANCE.config
> >           .withOperandFor(Sort.class, Project.class, DruidQuery.class)
> >           .toRule();
> >
> > The change maintains backwards compatibility to a large degree. In a
> > few places, I had to change rule instances from type RelOptRule to
> > Supplier<RelOptRule>, to avoid deadlocks during class loading. For
> > instance, instead of writing FilterJoinRule.FILTER_ON_JOIN you must
> > now write FilterJoinRule.FILTER_ON_JOIN.get().
> >
> > Julian
> >
> >
> > On Thu, Apr 16, 2020 at 12:08 PM Julian Hyde <jh...@apache.org> wrote:
> > >
> > > I have now pushed a dev branch with a prototype. Please see
> > > https://issues.apache.org/jira/browse/CALCITE-3923 for details.
> > >
> > > Having built the prototype, I believe that this change is beneficial
> > > and we should do it. But I would like to get to consensus on the
> > > design before we pull the trigger.
> > >
> > > Julian
> > >
> > > On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <jh...@apache.org> wrote:
> > > >
> > > > Haisheng,
> > > >
> > > > I hear you. I agree that major changes to rules will require new rule
> > > > classes (not merely sub-classes). People should copy-paste, refactor,
> > > > and all that good stuff. But I think there are a lot of cases where we
> > > > need to make minor changes to rules (there are many of these in the
> > > > code base already), and this change will help.
> > > >
> > > > I have logged https://issues.apache.org/jira/browse/CALCITE-3923 and
> > > > am going to start working on a prototype. When we have a prototype we
> > > > will be able to assess how big an impact the API change will have.
> > > > (E.g. whether it will be a breaking change.)
> > > >
> > > > Julian
> > > >
> > > > On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <h....@alibaba-inc.com> wrote:
> > > > >
> > > > > I don't think it is worth the refactoring. People who want to customize the rule, in most cases, won't be satisfied by a different parameter, they most likely still need to rewrite (copy & paste) the rule with some slightly their own logic. For many Calcite users, the rule is not reusable even with flexible configurations.
> > > > >
> > > > > - Haisheng
> > > > >
> > > > > ------------------------------------------------------------------
> > > > > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > > > > 日 期:2020年03月14日 22:54:04
> > > > > 收件人:<de...@calcite.apache.org>
> > > > > 主 题:Re: [DISCUSS] Refactor how planner rules are parameterized
> > > > >
> > > > > Hello,
> > > > >
> > > > > Apologies for the late reply but I just realised that I had written the
> > > > > mail and never pressed the send button.
> > > > >
> > > > > I think it is a nice idea and certainly a problem worth addressing. If I
> > > > > understood well you're thinking something like the current constructor of
> > > > > the RelBuilder [1] that accepts a Context parameter. Indeed it seems that
> > > > > with this change even rules that are not designed to be configured can be
> > > > > changed much more gracefully (without adding new constructors and breaking
> > > > > changes).
> > > > >
> > > > > On the other hand, some of the advantages that you mention can also be
> > > > > turned into disadvantages. For instance, copying a rule without knowing the
> > > > > values of the other parameters is a bit risky and might be harder to reason
> > > > > about its correctness. Moreover, private constructors, final classes, etc.,
> > > > > are primarily used for encapsulation purposes so allowing the state of the
> > > > > rule escape somehow breaks the original design of the rule.
> > > > >
> > > > > Another problem with respect to rules is cross convention matching and
> > > > > transformations [2]. Many rules should not fire for operands that are in
> > > > > different conventions; a typical example that comes in my mind is
> > > > > FilterProjectTransposeRule [3]. In the same spirit most rules should not
> > > > > generate mixed convention transformations. Although a different problem, I
> > > > > am mentioning it here since it could affect the design of the new API.
> > > > >
> > > > > Best,
> > > > > Stamatis
> > > > >
> > > > > [1]
> > > > > https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> > > > >
> > > > > [2]
> > > > > https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > > > > [3]
> > > > > https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> > > > >
> > > > > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <mm...@apache.org> wrote:
> > > > >
> > > > > > This sounds reasonable to me. It also sounds like we could make this
> > > > > > backwards compatible by retaining (but deprecating) the existing
> > > > > > constructors and factory methods that will no longer be needed.
> > > > > > --
> > > > > > Michael Mior
> > > > > > mmior@apache.org
> > > > > >
> > > > > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org> a écrit :
> > > > > > >
> > > > > > > I have an idea for a refactoring to RelOptRule. I haven’t fully thought
> > > > > > it through, but I’m going to sketch it out here to see whether folks agree
> > > > > > about the problems/solutions.
> > > > > > >
> > > > > > > It will be a breaking change (in the sense that people will have to
> > > > > > change their code in order to get it to compile) but relatively safe (in
> > > > > > that once the code compiles, it will have the same behavior as before).
> > > > > > Also it will give Calcite developers and users a lot more flexibility going
> > > > > > forward.
> > > > > > >
> > > > > > > The problem I see is that people often want different variants of
> > > > > > planner rules. An example is FilterJoinRule, which has a 'boolean smart’
> > > > > > parameter, a predicate (which returns whether to pull up filter
> > > > > > conditions), operands (which determine the precise sub-classes of RelNode
> > > > > > that the rule should match) and a relBuilderFactory (which controls the
> > > > > > type of RelNode created by this rule).
> > > > > > >
> > > > > > > Suppose you have an instance of FilterJoinRule and you want to change
> > > > > > ‘smart’ from true to false. The ‘smart’ parameter is immutable (good!) but
> > > > > > you can’t easily create a clone of the rule because you don’t know the
> > > > > > values of the other parameters. Your instance might even be (unbeknownst to
> > > > > > you) a sub-class with extra parameters and a private constructor.
> > > > > > >
> > > > > > > So, my proposal is to put all of the config information of a RelOptRule
> > > > > > into a single ‘config’ parameter that contains all relevant properties.
> > > > > > Each sub-class of RelOptRule would have one constructor with just a
> > > > > > ‘config’ parameter. Each config knows which sub-class of RelOptRule to
> > > > > > create. Therefore it is easy to copy a config, change one or more
> > > > > > properties, and create a new rule instance.
> > > > > > >
> > > > > > > Adding a property to a rule’s config does not require us to add or
> > > > > > deprecate any constructors.
> > > > > > >
> > > > > > > The operands are part of the config, so if you have a rule that matches
> > > > > > a EnumerableFilter on an EnumerableJoin and you want to make it match an
> > > > > > EnumerableFilter on an EnumerableNestedLoopJoin, you can easily create one
> > > > > > with one changed operand.
> > > > > > >
> > > > > > > The config is immutable and self-describing, so we can use it to
> > > > > > automatically generate a unique description for each rule instance.
> > > > > > >
> > > > > > > Julian
> > > > > > >
> > > > > > > [1]
> > > > > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > <
> > > > > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> >

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Stamatis Zampetakis <za...@gmail.com>.
I agree with this plan. We already have ~70 commits in master so it is good
enough to go for a release

On Thu, Jul 2, 2020 at 12:27 AM Julian Hyde <jh...@apache.org> wrote:

> PS If we take that approach, let's release 1.24 soon. CALCITE-3923 is
> a big PR and bit-rot will set in if it lingers too long.
>
> On Wed, Jul 1, 2020 at 3:23 PM Julian Hyde <jh...@apache.org> wrote:
> >
> > Stamatis,
> >
> > Do you mean we should put into the next release (1.24) just a few
> > lines '@Deprecated // to be removed before 1.25; use xxx instead'
> > tagging the breaking changes, and then make the breaking changes in
> > master immediately after 1.24?
> >
> > I would support that approach. We could deal with
> > https://issues.apache.org/jira/browse/CALCITE-4079 at the same time.
> >
> > For the record, there won't be many breaking changes in
> > https://issues.apache.org/jira/browse/CALCITE-3923. (Just a few
> > constants that we have to remove because of class-loading issues.) But
> > there will be a lot of things deprecated because they have been moved
> > to a "better" place.
> >
> > On Sun, Jun 28, 2020 at 4:08 PM Stamatis Zampetakis <za...@gmail.com>
> wrote:
> > >
> > > Indeed nice explanation Julian, thanks!
> > >
> > > Regarding the transition one thing that might help is to try to put
> the new
> > > model in a small release without other breaking changes.
> > > We could get 1.24 out without the refactoring and soon afterwards
> release
> > > 1.25.
> > >
> > > Given that the community has been rather silent on this I guess that
> people
> > > don't really mind if it goes as is in the next release so I may be over
> > > complicating things for no reason.
> > > In the end, we have also committed features with bigger impact with no
> > > special treatment.
> > >
> > > Best,
> > > Stamatis
> > >
> > >
> > > On Fri, Jun 19, 2020 at 11:37 PM Haisheng Yuan <hy...@apache.org>
> wrote:
> > >
> > > > Thank you for the detailed explanation, Julian.
> > > >
> > > > I will step aside, let you and other community members decide.
> > > > Anyway, my comment is not blocking.
> > > >
> > > > Thanks,
> > > > Haisheng
> > > >
> > > > On 2020/06/17 23:12:40, Julian Hyde <jh...@apache.org> wrote:
> > > > > "I prefer the KISS principle" is a bit unfair. What I'm advocating
> is
> > > > > also simple. But maybe from a different perspective.
> > > > >
> > > > > I am looking at it from the perspective of people who use rules,
> and
> > > > > compose them into rule sets to perform particular optimizations. I
> > > > > believe this is the most important perspective to focus on. These
> > > > > people are the key audience to serve. The corpus of re-usable,
> > > > > composable rules may be Calcite's most important contribution.
> > > > >
> > > > > To these people, a rule is not a class but an object. It has a
> > > > > particular signature (in terms of the pattern of RelNodes that it
> > > > > matches), maybe one or two configuration parameters (e.g.
> > > > > FilterJoinRule.smart controls whether to automatically strengthen a
> > > > > LEFT join to INNER), and it has code to generate a new RelNode when
> > > > > matched.
> > > > >
> > > > > What do I mean by 'object'? Think of how you use regular
> expressions
> > > > > in Java. You call java.util.regex.Pattern.compile("a[bc]*") and it
> > > > > gives you a Pattern object. You do not know what fields are inside
> > > > > that Pattern, you do not care whether the object you receive is a
> > > > > Pattern or some sub-class of Pattern, but you know there are one or
> > > > > two methods such as 'matcher(CharSequence)' that you can call.
> > > > >
> > > > > Our current API treats planner rules as classes. If you want to
> > > > > customize a property of a rule (say make a FilterJoinRule with
> > > > > smart=false, or change its RelBuilder, or make it match a
> > > > > CassandraFilter rather than a LogicalFilter) then you have to make
> a
> > > > > new rule instance by calling the constructor.
> > > > >
> > > > > When you call the constructor, you have to supply ALL of the
> > > > > properties, including operands, and including properties that you
> > > > > don't know or care about. If someone in three months adds a new
> > > > > property, the signature of the constructor will change, and you
> will
> > > > > have to go back and change your code. This is broken.
> > > > >
> > > > > Haisheng asked for evidence that the current system is broken.
> > > > > https://issues.apache.org/jira/browse/CALCITE-3825 ("Split
> > > > > AbstractMaterializedViewRule into multiple classes") is one
> example.
> > > > > In my company, that change literally broke our code because we had
> > > > > changed properties of some rules.
> > > > >
> > > > > Another example is
> https://issues.apache.org/jira/browse/CALCITE-3975
> > > > > ("Add options to ProjectFilterTransposeRule to push down project
> and
> > > > > filter expressions whole, not just field references"). With that
> > > > > change, I broke my own code. I added two new arguments, 'boolean
> > > > > wholeProject, boolean wholeFilter' to ProjectFilterTransposeRule's
> > > > > constructor, deprecated the previous constructor, and created
> quite a
> > > > > few conflicts in another dev branch that I was working on.
> > > > >
> > > > > These kinds of problems are not uncommon. Because it is difficult
> to
> > > > > reconfigure rules, or evolve them by adding extra arguments, people
> > > > > are more likely to copy-paste rule code into their own projects,
> and
> > > > > less like to contribute back.
> > > > >
> > > > > In most rules, operands are created evaluating a complex
> expression -
> > > > > made up of calls to static methods such as RelOptRule.operand - in
> a
> > > > > constructor as an argument to the base class constructor. This is a
> > > > > code smell. Such code is hard to move elsewhere.
> > > > >
> > > > > We treat operands as immutable, but actually they are mutable.
> Every
> > > > > operand contains mutable fields 'solveOrder', 'ordinalInParent' and
> > > > > 'ordinalInRule' that are assigned in the RelOptRule constructor. We
> > > > > don't prevent you from passing the same operand to two different
> rule
> > > > > instances, but if you did, bad things would happen. Actually I
> think
> > > > > people would love to be able to say 'create a rule instance the
> same
> > > > > as this, but replacing LogicalFilter with CassandraFilter' but we
> > > > > don't make it easy, and copying operands from one rule to another
> is
> > > > > certainly the wrong way to achieve it.
> > > > >
> > > > > Given all of these problems, the solution was to convert rules into
> > > > > data objects. Those data objects are the new Config interface and
> its
> > > > > sub-interfaces. (I used an interface because the ImmutableBeans
> > > > > framework makes it easy to create immutable data objects without
> > > > > generating code or writing a lot of boilerplate code by hand.)
> > > > >
> > > > > These Config interfaces mean one extra class per rule, and about 5
> > > > > extra lines of source code. If you measure simplicity in lines of
> > > > > code, then I suppose they make things a little less simple.
> > > > >
> > > > > The new way of building operands, using a builder that makes
> > > > > call-backs for each nested operand, and has a method for each
> > > > > attribute of an operand (e.g. predicate()) is also 'less simple' if
> > > > > you count lines of code. But it is more elegant in that it can
> > > > > (potentially) produce genuinely immutable operands, is strongly
> typed,
> > > > > and offers helpful code-completion when you use it in an IDE.
> > > > >
> > > > > I hope you now understand the problems I am trying to solve, and
> why
> > > > > this new approach is better. That said, this change isn't perfect.
> We
> > > > > can evolve this change to make the code more concise, or easier to
> > > > > understand.
> > > > >
> > > > > Let's discuss some possible improvements.
> > > > >
> > > > > One possible improvement would be to stop using rule instances
> > > > > (RelOptRule) and instead use rule config instances
> > > > > (RelOptRule.Config). The planner could easily call Config.toRule()
> > > > > when building a program. The sub-classes of RelOptRule would be
> > > > > largely invisible (private sub-classes of the Config classes), and
> > > > > that is appropriate, because all they provide is the
> implementation of
> > > > > the onMatch() method. There would be a one-time impact because a
> lot
> > > > > of types would change from RelOptRule to RelOptRule.Config.
> > > > >
> > > > > Another improvement would be to move around the rule constants.
> > > > > FilterCorrelateRule.INSTANCE and
> AggregateFilterTransposeRule.INSTANCE
> > > > > would become the fields FILTER_CORRELATE and
> > > > > AGGREGATE_FILTER_TRANSPOSE in a new class, LogicalRules. People
> would
> > > > > no longer need to reference the names of rule classes in their
> code,
> > > > > just the instances. And of course they can now easily customize
> those
> > > > > instances.
> > > > >
> > > > > I hope I have convinced you that rules need to become 'data
> objects'
> > > > > with a small amount of behavior, and that people need to be able to
> > > > > customize them without calling their constructor or even knowing
> their
> > > > > precise class.
> > > > >
> > > > > I would love to hear suggestions for how we can make the
> transition to
> > > > > this new model as smooth as possible.
> > > > >
> > > > > Julian
> > > > >
> > > > > On Sun, Jun 14, 2020 at 9:15 PM Haisheng Yuan <hy...@apache.org>
> wrote:
> > > > > >
> > > > > > Druid adapter rules are not used by Druid. As far as I know,
> only Hive
> > > > uses these rules, I even think they should be moved to Hive project.
> :) If
> > > > any other projects other than Hive are using them, please let us
> know.
> > > > Basically, these Druid rules are not generally applicable. The PR
> changed
> > > > many built-in rules to specifically accommodate Druid adapter rules,
> > > > especially with third operand that can match any operator, even
> > > > EnumerableTableScan is legit.
> > > > > >
> > > > > > Most converter rules are not designed to be flexible, no one
> would
> > > > extend them. RelOptRule#operand() methods are deprecated, which
> implies
> > > > someday, they will be removed and migrated to the new Config. But
> most down
> > > > stream projects don't need to worry about the their rules'
> extensibility
> > > > and compatibility, because they can modify their own rules freely,
> anytime.
> > > > > >
> > > > > > At the end of the day, we may find that only a small fraction of
> > > > transformation rules need refactoring, e.g.
> ProjectFilterTransposeRule.
> > > > Note that even FilterProjectTransposeRule doesn't need refactoring,
> we can
> > > > safely remove copyFilter and copyProject, it is safe to always copy
> them,
> > > > do we really want to match physical operators and generate logical
> > > > alternatives?
> > > > > >
> > > > > > Last but the most important, rule operands, IMO, should be out of
> > > > Config's reach. How frequent do we need to change rule operands or
> its
> > > > matching class? IMO, operands of rule constructor should remain
> unchanged,
> > > > RelFactory and Rule description can also remain the same, or adapted
> by
> > > > Config silently without changing the rule itself, if there are no
> other
> > > > additional parameters. Other than that, everything can be put into
> Config.
> > > > Therefore I don't think RelOptNewRule is needed, because RelOptRule
> should
> > > > be able to integrate with Config seamlessly, without breaking
> anything.
> > > > > >
> > > > > > All in all, I prefer the KISS principle, keep it simple, stupid.
> > > > > >
> > > > > > So my opinion is +0.5.
> > > > > >
> > > > > > Thanks,
> > > > > > Haisheng
> > > > > >
> > > > > > On 2020/06/14 23:36:21, Stamatis Zampetakis <za...@gmail.com>
> wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > Thanks for putting this together Julian.
> > > > > > >
> > > > > > > I went quickly over the PR just to grasp better the idea and
> here
> > > > are some
> > > > > > > first thoughts.
> > > > > > >
> > > > > > > I really like the fact that rules creation is now much more
> uniform
> > > > than
> > > > > > > before.
> > > > > > > I also like the fact that in some cases subclassing can be
> avoided
> > > > thanks
> > > > > > > to the flexible configuration. As a fan of the Liskov
> > > > > > > substitution principle I find it positive to be able to avoid
> > > > classes such
> > > > > > > as DruidProjectFilterTransposeRule although to be honest I
> don't
> > > > know why
> > > > > > > these Druid rules impose those additional restrictions.
> > > > > > >
> > > > > > > On the other hand, I also feel that the new approach is a bit
> harder
> > > > to
> > > > > > > follow than the previous one. The fact that we have more
> extension
> > > > points
> > > > > > > gives more flexibility but at the same time complicates the
> > > > implementation
> > > > > > > a bit. I guess regular committers will not have much trouble to
> > > > adapt to
> > > > > > > the changes but for newcomers there may be more questions. For
> > > > instance:
> > > > > > > What do we do when we want to customize the behavior of an
> existing
> > > > rule?
> > > > > > > * Use an existing config with different parameters.
> > > > > > > * Extend the config.
> > > > > > > * Extend the rule.
> > > > > > > * Extend the config and the rule.
> > > > > > > * Create a new rule (if yes under which interface RelOptRule or
> > > > > > > RelOptNewRule).
> > > > > > >
> > > > > > > As Haisheng mentioned, the fact that every rule can be
> configured
> > > > with any
> > > > > > > RelOptRuleOperand is also a point possibly deserving more
> discussion.
> > > > > > > Ideally, the API should be able to guide developers to pass the
> > > > correct
> > > > > > > configuration parameters and not fail at runtime.
> > > > > > >
> > > > > > > Overall, I have mixed feelings about the proposed refactoring
> > > > (definitely
> > > > > > > not something blocking), I guess cause I haven't seen as many
> > > > use-cases as
> > > > > > > Julian.
> > > > > > > I'm curious to see what others think about the changes. It's a
> pity
> > > > to take
> > > > > > > a decision based on the feedback of only two/three people.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stamatis
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Jun 14, 2020 at 9:36 PM Haisheng Yuan <
> hyuan@apache.org>
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Julian,
> > > > > > > >
> > > > > > > > Thanks for working on this.
> > > > > > > >
> > > > > > > > We haven't reached a consensus yet.
> > > > > > > >
> > > > > > > > Frankly speaking, I agree with what Stamatis said earlier.
> > > > Flexibility
> > > > > > > > doesn't come with no cost. Admittedly, with this patch, any
> rule
> > > > > > > > constructor refactoring won't need to deprecate old
> constructors
> > > > and break
> > > > > > > > backward compatibility, however, it also makes the rule
> definition
> > > > much
> > > > > > > > more verbose, less readable and understandable. IMHO, it
> does more
> > > > harm
> > > > > > > > than good.
> > > > > > > >
> > > > > > > > Let's see how CassandraFilterRule becomes before and after
> the
> > > > change.
> > > > > > > >
> > > > > > > > Before this change:
> > > > > > > >
> > > > > > > >   private static class CassandraFilterRule extends
> RelOptRule {
> > > > > > > >     private static final CassandraFilterRule INSTANCE = new
> > > > > > > > CassandraFilterRule();
> > > > > > > >
> > > > > > > >     private CassandraFilterRule() {
> > > > > > > >       super(operand(LogicalFilter.class,
> > > > operand(CassandraTableScan.class,
> > > > > > > > none())),
> > > > > > > >           "CassandraFilterRule");
> > > > > > > >     }
> > > > > > > >   }
> > > > > > > >
> > > > > > > > After this change:
> > > > > > > >
> > > > > > > >   private static class CassandraFilterRule
> > > > > > > >       extends RelOptNewRule<CassandraFilterRule.Config> {
> > > > > > > >     private static final CassandraFilterRule INSTANCE =
> > > > > > > >         Config.EMPTY
> > > > > > > >             .withOperandSupplier(b0 ->
> > > > > > > >                 b0.operand(LogicalFilter.class)
> > > > > > > >                     .oneInput(b1 ->
> > > > b1.operand(CassandraTableScan.class)
> > > > > > > >                         .noInputs()))
> > > > > > > >             .as(Config.class)
> > > > > > > >             .toRule();
> > > > > > > >
> > > > > > > >     /** Creates a CassandraFilterRule. */
> > > > > > > >     protected CassandraFilterRule(Config config) {
> > > > > > > >       super(config);
> > > > > > > >     }
> > > > > > > >
> > > > > > > >     /** Rule configuration. */
> > > > > > > >     public interface Config extends RelOptNewRule.Config {
> > > > > > > >       @Override default CassandraFilterRule toRule() {
> > > > > > > >         return new CassandraFilterRule(this);
> > > > > > > >       }
> > > > > > > >     }
> > > > > > > >   }
> > > > > > > >
> > > > > > > >
> > > > > > > > Intuitively, if we want to check what does the rule generally
> > > > match or how
> > > > > > > > it is defined, we just check the constructor. Now we
> checkout the
> > > > > > > > constructor, only config is there, go to Config, there is
> still
> > > > nothing
> > > > > > > > interesting, we have to go to the INSTANCE. What is the
> difference
> > > > with
> > > > > > > > just using operand and optionConfig as the rule constructor?
> > > > > > > >
> > > > > > > >    protected CassandraFilterRule(RelOptRuleOperand operand,
> Config
> > > > config)
> > > > > > > > {
> > > > > > > >      super(operand, config);
> > > > > > > >    }
> > > > > > > >
> > > > > > > > Or even simply replace Config with int, with each bit
> represent an
> > > > option,
> > > > > > > > if all of them are boolean options.
> > > > > > > >
> > > > > > > > Nothing is more flexible than just using RelOptRuleOperand
> as the
> > > > > > > > parameter, just like the base class RelOptRule does. But do
> we
> > > > want it?
> > > > > > > >
> > > > > > > > At the same time, with the new approach, it is now legit to
> create
> > > > the
> > > > > > > > following instance:
> > > > > > > >
> > > > > > > >   private static final CassandraFilterRule INSTANCE2 =
> > > > > > > >         Config.EMPTY
> > > > > > > >             .withOperandSupplier(b0 ->
> > > > > > > >                 b0.operand(LogicalProject.class)  // Even
> the is
> > > > intended
> > > > > > > > to match Filter, but it compiles
> > > > > > > >                     .oneInput(b1 ->
> > > > b1.operand(CassandraTableScan.class)
> > > > > > > >                         .noInputs()))
> > > > > > > >             .as(Config.class)
> > > > > > > >             .toRule();
> > > > > > > >
> > > > > > > > Before we continue to the discussion and code review, we
> need to
> > > > go back
> > > > > > > > to the original problem, is it a real problem that is facing
> us?
> > > > Is there
> > > > > > > > any real demand or just artificial demand? How about we
> conduct a
> > > > Calcite
> > > > > > > > user survey to see how Calcite devs and users think? Then
> let's
> > > > see how to
> > > > > > > > move forward.
> > > > > > > >
> > > > > > > > [+1] Hell yeah, the new approach is awesome, let's go with
> it.
> > > > > > > > [+0] Meh, I am OK with current approach, I don't see any
> burden or
> > > > problem
> > > > > > > > with it.
> > > > > > > > [-1] Hell no, current approach is bad, the new one is even
> worse.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Haisheng
> > > > > > > >
> > > > > > > >
> > > > > > > > On 2020/06/12 23:51:56, Julian Hyde <jh...@apache.org>
> wrote:
> > > > > > > > > There is now a PR:
> https://github.com/apache/calcite/pull/2024.
> > > > Can
> > > > > > > > > people please review?
> > > > > > > > >
> > > > > > > > > Here's the TL;DR:
> > > > > > > > >
> > > > > > > > > Previously, it was not easy to customize, re-use or extend
> > > > planner
> > > > > > > > > rules. If you wished to customize a rule (i.e. create a new
> > > > instance
> > > > > > > > > of a rule with different properties) you would have to
> call the
> > > > rule's
> > > > > > > > > constructor. Frequently the required constructor did not
> exist,
> > > > so we
> > > > > > > > > would have to add a new constructor and deprecate the old
> one.
> > > > > > > > >
> > > > > > > > > After this change, you start off from an instance of the
> rule,
> > > > modify
> > > > > > > > > its configuration, and call toRule() on the configuration.
> (Rule
> > > > > > > > > constructors are now private, because only the
> configuration ever
> > > > > > > > > calls them.)
> > > > > > > > >
> > > > > > > > > A good illustration of this is DruidRules, which used to
> contain
> > > > many
> > > > > > > > > sub-classes. Those sub-classes are no longer needed. Old
> code:
> > > > > > > > >
> > > > > > > > >   public static final DruidSortProjectTransposeRule
> > > > > > > > SORT_PROJECT_TRANSPOSE =
> > > > > > > > >       new
> > > > DruidSortProjectTransposeRule(RelFactories.LOGICAL_BUILDER);
> > > > > > > > >
> > > > > > > > >     public static class DruidSortProjectTransposeRule
> > > > > > > > >         extends SortProjectTransposeRule {
> > > > > > > > >       public
> DruidSortProjectTransposeRule(RelBuilderFactory
> > > > > > > > > relBuilderFactory) {
> > > > > > > > >         super(
> > > > > > > > >             operand(Sort.class,
> > > > > > > > >                 operand(Project.class,
> operand(DruidQuery.class,
> > > > > > > > none()))),
> > > > > > > > >             relBuilderFactory, null);
> > > > > > > > >       }
> > > > > > > > >     }
> > > > > > > > >
> > > > > > > > > New code:
> > > > > > > > >
> > > > > > > > >   public static final SortProjectTransposeRule
> > > > SORT_PROJECT_TRANSPOSE =
> > > > > > > > >       SortProjectTransposeRule.INSTANCE.config
> > > > > > > > >           .withOperandFor(Sort.class, Project.class,
> > > > DruidQuery.class)
> > > > > > > > >           .toRule();
> > > > > > > > >
> > > > > > > > > The change maintains backwards compatibility to a large
> degree.
> > > > In a
> > > > > > > > > few places, I had to change rule instances from type
> RelOptRule
> > > > to
> > > > > > > > > Supplier<RelOptRule>, to avoid deadlocks during class
> loading.
> > > > For
> > > > > > > > > instance, instead of writing FilterJoinRule.FILTER_ON_JOIN
> you
> > > > must
> > > > > > > > > now write FilterJoinRule.FILTER_ON_JOIN.get().
> > > > > > > > >
> > > > > > > > > Julian
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Apr 16, 2020 at 12:08 PM Julian Hyde <
> jhyde@apache.org>
> > > > wrote:
> > > > > > > > > >
> > > > > > > > > > I have now pushed a dev branch with a prototype. Please
> see
> > > > > > > > > > https://issues.apache.org/jira/browse/CALCITE-3923 for
> > > > details.
> > > > > > > > > >
> > > > > > > > > > Having built the prototype, I believe that this change is
> > > > beneficial
> > > > > > > > > > and we should do it. But I would like to get to
> consensus on
> > > > the
> > > > > > > > > > design before we pull the trigger.
> > > > > > > > > >
> > > > > > > > > > Julian
> > > > > > > > > >
> > > > > > > > > > On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <
> jhyde@apache.org>
> > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Haisheng,
> > > > > > > > > > >
> > > > > > > > > > > I hear you. I agree that major changes to rules will
> require
> > > > new rule
> > > > > > > > > > > classes (not merely sub-classes). People should
> copy-paste,
> > > > refactor,
> > > > > > > > > > > and all that good stuff. But I think there are a lot of
> > > > cases where
> > > > > > > > we
> > > > > > > > > > > need to make minor changes to rules (there are many of
> these
> > > > in the
> > > > > > > > > > > code base already), and this change will help.
> > > > > > > > > > >
> > > > > > > > > > > I have logged
> > > > https://issues.apache.org/jira/browse/CALCITE-3923 and
> > > > > > > > > > > am going to start working on a prototype. When we have
> a
> > > > prototype we
> > > > > > > > > > > will be able to assess how big an impact the API
> change will
> > > > have.
> > > > > > > > > > > (E.g. whether it will be a breaking change.)
> > > > > > > > > > >
> > > > > > > > > > > Julian
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <
> > > > > > > > h.yuan@alibaba-inc.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > I don't think it is worth the refactoring. People
> who want
> > > > to
> > > > > > > > customize the rule, in most cases, won't be satisfied by a
> > > > different
> > > > > > > > parameter, they most likely still need to rewrite (copy &
> paste)
> > > > the rule
> > > > > > > > with some slightly their own logic. For many Calcite users,
> the
> > > > rule is not
> > > > > > > > reusable even with flexible configurations.
> > > > > > > > > > > >
> > > > > > > > > > > > - Haisheng
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > ------------------------------------------------------------------
> > > > > > > > > > > > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > > > > > > > > > > > 日 期:2020年03月14日 22:54:04
> > > > > > > > > > > > 收件人:<de...@calcite.apache.org>
> > > > > > > > > > > > 主 题:Re: [DISCUSS] Refactor how planner rules are
> > > > parameterized
> > > > > > > > > > > >
> > > > > > > > > > > > Hello,
> > > > > > > > > > > >
> > > > > > > > > > > > Apologies for the late reply but I just realised
> that I had
> > > > > > > > written the
> > > > > > > > > > > > mail and never pressed the send button.
> > > > > > > > > > > >
> > > > > > > > > > > > I think it is a nice idea and certainly a problem
> worth
> > > > > > > > addressing. If I
> > > > > > > > > > > > understood well you're thinking something like the
> current
> > > > > > > > constructor of
> > > > > > > > > > > > the RelBuilder [1] that accepts a Context parameter.
> > > > Indeed it
> > > > > > > > seems that
> > > > > > > > > > > > with this change even rules that are not designed to
> be
> > > > configured
> > > > > > > > can be
> > > > > > > > > > > > changed much more gracefully (without adding new
> > > > constructors and
> > > > > > > > breaking
> > > > > > > > > > > > changes).
> > > > > > > > > > > >
> > > > > > > > > > > > On the other hand, some of the advantages that you
> mention
> > > > can
> > > > > > > > also be
> > > > > > > > > > > > turned into disadvantages. For instance, copying a
> rule
> > > > without
> > > > > > > > knowing the
> > > > > > > > > > > > values of the other parameters is a bit risky and
> might be
> > > > harder
> > > > > > > > to reason
> > > > > > > > > > > > about its correctness. Moreover, private
> constructors,
> > > > final
> > > > > > > > classes, etc.,
> > > > > > > > > > > > are primarily used for encapsulation purposes so
> allowing
> > > > the
> > > > > > > > state of the
> > > > > > > > > > > > rule escape somehow breaks the original design of
> the rule.
> > > > > > > > > > > >
> > > > > > > > > > > > Another problem with respect to rules is cross
> convention
> > > > matching
> > > > > > > > and
> > > > > > > > > > > > transformations [2]. Many rules should not fire for
> > > > operands that
> > > > > > > > are in
> > > > > > > > > > > > different conventions; a typical example that comes
> in my
> > > > mind is
> > > > > > > > > > > > FilterProjectTransposeRule [3]. In the same spirit
> most
> > > > rules
> > > > > > > > should not
> > > > > > > > > > > > generate mixed convention transformations. Although a
> > > > different
> > > > > > > > problem, I
> > > > > > > > > > > > am mentioning it here since it could affect the
> design of
> > > > the new
> > > > > > > > API.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Stamatis
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > >
> > > > > > > >
> > > >
> https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> > > > > > > > > > > >
> > > > > > > > > > > > [2]
> > > > > > > > > > > >
> > > > > > > >
> > > >
> https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > > > > > > > > > > > [3]
> > > > > > > > > > > >
> > > > > > > >
> > > >
> https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <
> > > > mmior@apache.org>
> > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > This sounds reasonable to me. It also sounds like
> we
> > > > could make
> > > > > > > > this
> > > > > > > > > > > > > backwards compatible by retaining (but
> deprecating) the
> > > > existing
> > > > > > > > > > > > > constructors and factory methods that will no
> longer be
> > > > needed.
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Michael Mior
> > > > > > > > > > > > > mmior@apache.org
> > > > > > > > > > > > >
> > > > > > > > > > > > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <
> > > > jhyde@apache.org> a
> > > > > > > > écrit :
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have an idea for a refactoring to RelOptRule. I
> > > > haven’t
> > > > > > > > fully thought
> > > > > > > > > > > > > it through, but I’m going to sketch it out here to
> see
> > > > whether
> > > > > > > > folks agree
> > > > > > > > > > > > > about the problems/solutions.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It will be a breaking change (in the sense that
> people
> > > > will
> > > > > > > > have to
> > > > > > > > > > > > > change their code in order to get it to compile)
> but
> > > > relatively
> > > > > > > > safe (in
> > > > > > > > > > > > > that once the code compiles, it will have the same
> > > > behavior as
> > > > > > > > before).
> > > > > > > > > > > > > Also it will give Calcite developers and users a
> lot more
> > > > > > > > flexibility going
> > > > > > > > > > > > > forward.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The problem I see is that people often want
> different
> > > > variants
> > > > > > > > of
> > > > > > > > > > > > > planner rules. An example is FilterJoinRule, which
> has a
> > > > > > > > 'boolean smart’
> > > > > > > > > > > > > parameter, a predicate (which returns whether to
> pull up
> > > > filter
> > > > > > > > > > > > > conditions), operands (which determine the precise
> > > > sub-classes
> > > > > > > > of RelNode
> > > > > > > > > > > > > that the rule should match) and a relBuilderFactory
> > > > (which
> > > > > > > > controls the
> > > > > > > > > > > > > type of RelNode created by this rule).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Suppose you have an instance of FilterJoinRule
> and you
> > > > want to
> > > > > > > > change
> > > > > > > > > > > > > ‘smart’ from true to false. The ‘smart’ parameter
> is
> > > > immutable
> > > > > > > > (good!) but
> > > > > > > > > > > > > you can’t easily create a clone of the rule
> because you
> > > > don’t
> > > > > > > > know the
> > > > > > > > > > > > > values of the other parameters. Your instance
> might even
> > > > be
> > > > > > > > (unbeknownst to
> > > > > > > > > > > > > you) a sub-class with extra parameters and a
> private
> > > > constructor.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So, my proposal is to put all of the config
> > > > information of a
> > > > > > > > RelOptRule
> > > > > > > > > > > > > into a single ‘config’ parameter that contains all
> > > > relevant
> > > > > > > > properties.
> > > > > > > > > > > > > Each sub-class of RelOptRule would have one
> constructor
> > > > with
> > > > > > > > just a
> > > > > > > > > > > > > ‘config’ parameter. Each config knows which
> sub-class of
> > > > > > > > RelOptRule to
> > > > > > > > > > > > > create. Therefore it is easy to copy a config,
> change
> > > > one or more
> > > > > > > > > > > > > properties, and create a new rule instance.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Adding a property to a rule’s config does not
> require
> > > > us to
> > > > > > > > add or
> > > > > > > > > > > > > deprecate any constructors.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The operands are part of the config, so if you
> have a
> > > > rule
> > > > > > > > that matches
> > > > > > > > > > > > > a EnumerableFilter on an EnumerableJoin and you
> want to
> > > > make it
> > > > > > > > match an
> > > > > > > > > > > > > EnumerableFilter on an EnumerableNestedLoopJoin,
> you can
> > > > easily
> > > > > > > > create one
> > > > > > > > > > > > > with one changed operand.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The config is immutable and self-describing, so
> we can
> > > > use it
> > > > > > > > to
> > > > > > > > > > > > > automatically generate a unique description for
> each rule
> > > > > > > > instance.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Julian
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1]
> > > > > > > > > > > > >
> > > > > > > >
> > > >
> https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > > > > > > <
> > > > > > > > > > > > >
> > > > > > > >
> > > >
> https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
>

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Julian Hyde <jh...@apache.org>.
PS If we take that approach, let's release 1.24 soon. CALCITE-3923 is
a big PR and bit-rot will set in if it lingers too long.

On Wed, Jul 1, 2020 at 3:23 PM Julian Hyde <jh...@apache.org> wrote:
>
> Stamatis,
>
> Do you mean we should put into the next release (1.24) just a few
> lines '@Deprecated // to be removed before 1.25; use xxx instead'
> tagging the breaking changes, and then make the breaking changes in
> master immediately after 1.24?
>
> I would support that approach. We could deal with
> https://issues.apache.org/jira/browse/CALCITE-4079 at the same time.
>
> For the record, there won't be many breaking changes in
> https://issues.apache.org/jira/browse/CALCITE-3923. (Just a few
> constants that we have to remove because of class-loading issues.) But
> there will be a lot of things deprecated because they have been moved
> to a "better" place.
>
> On Sun, Jun 28, 2020 at 4:08 PM Stamatis Zampetakis <za...@gmail.com> wrote:
> >
> > Indeed nice explanation Julian, thanks!
> >
> > Regarding the transition one thing that might help is to try to put the new
> > model in a small release without other breaking changes.
> > We could get 1.24 out without the refactoring and soon afterwards release
> > 1.25.
> >
> > Given that the community has been rather silent on this I guess that people
> > don't really mind if it goes as is in the next release so I may be over
> > complicating things for no reason.
> > In the end, we have also committed features with bigger impact with no
> > special treatment.
> >
> > Best,
> > Stamatis
> >
> >
> > On Fri, Jun 19, 2020 at 11:37 PM Haisheng Yuan <hy...@apache.org> wrote:
> >
> > > Thank you for the detailed explanation, Julian.
> > >
> > > I will step aside, let you and other community members decide.
> > > Anyway, my comment is not blocking.
> > >
> > > Thanks,
> > > Haisheng
> > >
> > > On 2020/06/17 23:12:40, Julian Hyde <jh...@apache.org> wrote:
> > > > "I prefer the KISS principle" is a bit unfair. What I'm advocating is
> > > > also simple. But maybe from a different perspective.
> > > >
> > > > I am looking at it from the perspective of people who use rules, and
> > > > compose them into rule sets to perform particular optimizations. I
> > > > believe this is the most important perspective to focus on. These
> > > > people are the key audience to serve. The corpus of re-usable,
> > > > composable rules may be Calcite's most important contribution.
> > > >
> > > > To these people, a rule is not a class but an object. It has a
> > > > particular signature (in terms of the pattern of RelNodes that it
> > > > matches), maybe one or two configuration parameters (e.g.
> > > > FilterJoinRule.smart controls whether to automatically strengthen a
> > > > LEFT join to INNER), and it has code to generate a new RelNode when
> > > > matched.
> > > >
> > > > What do I mean by 'object'? Think of how you use regular expressions
> > > > in Java. You call java.util.regex.Pattern.compile("a[bc]*") and it
> > > > gives you a Pattern object. You do not know what fields are inside
> > > > that Pattern, you do not care whether the object you receive is a
> > > > Pattern or some sub-class of Pattern, but you know there are one or
> > > > two methods such as 'matcher(CharSequence)' that you can call.
> > > >
> > > > Our current API treats planner rules as classes. If you want to
> > > > customize a property of a rule (say make a FilterJoinRule with
> > > > smart=false, or change its RelBuilder, or make it match a
> > > > CassandraFilter rather than a LogicalFilter) then you have to make a
> > > > new rule instance by calling the constructor.
> > > >
> > > > When you call the constructor, you have to supply ALL of the
> > > > properties, including operands, and including properties that you
> > > > don't know or care about. If someone in three months adds a new
> > > > property, the signature of the constructor will change, and you will
> > > > have to go back and change your code. This is broken.
> > > >
> > > > Haisheng asked for evidence that the current system is broken.
> > > > https://issues.apache.org/jira/browse/CALCITE-3825 ("Split
> > > > AbstractMaterializedViewRule into multiple classes") is one example.
> > > > In my company, that change literally broke our code because we had
> > > > changed properties of some rules.
> > > >
> > > > Another example is https://issues.apache.org/jira/browse/CALCITE-3975
> > > > ("Add options to ProjectFilterTransposeRule to push down project and
> > > > filter expressions whole, not just field references"). With that
> > > > change, I broke my own code. I added two new arguments, 'boolean
> > > > wholeProject, boolean wholeFilter' to ProjectFilterTransposeRule's
> > > > constructor, deprecated the previous constructor, and created quite a
> > > > few conflicts in another dev branch that I was working on.
> > > >
> > > > These kinds of problems are not uncommon. Because it is difficult to
> > > > reconfigure rules, or evolve them by adding extra arguments, people
> > > > are more likely to copy-paste rule code into their own projects, and
> > > > less like to contribute back.
> > > >
> > > > In most rules, operands are created evaluating a complex expression -
> > > > made up of calls to static methods such as RelOptRule.operand - in a
> > > > constructor as an argument to the base class constructor. This is a
> > > > code smell. Such code is hard to move elsewhere.
> > > >
> > > > We treat operands as immutable, but actually they are mutable. Every
> > > > operand contains mutable fields 'solveOrder', 'ordinalInParent' and
> > > > 'ordinalInRule' that are assigned in the RelOptRule constructor. We
> > > > don't prevent you from passing the same operand to two different rule
> > > > instances, but if you did, bad things would happen. Actually I think
> > > > people would love to be able to say 'create a rule instance the same
> > > > as this, but replacing LogicalFilter with CassandraFilter' but we
> > > > don't make it easy, and copying operands from one rule to another is
> > > > certainly the wrong way to achieve it.
> > > >
> > > > Given all of these problems, the solution was to convert rules into
> > > > data objects. Those data objects are the new Config interface and its
> > > > sub-interfaces. (I used an interface because the ImmutableBeans
> > > > framework makes it easy to create immutable data objects without
> > > > generating code or writing a lot of boilerplate code by hand.)
> > > >
> > > > These Config interfaces mean one extra class per rule, and about 5
> > > > extra lines of source code. If you measure simplicity in lines of
> > > > code, then I suppose they make things a little less simple.
> > > >
> > > > The new way of building operands, using a builder that makes
> > > > call-backs for each nested operand, and has a method for each
> > > > attribute of an operand (e.g. predicate()) is also 'less simple' if
> > > > you count lines of code. But it is more elegant in that it can
> > > > (potentially) produce genuinely immutable operands, is strongly typed,
> > > > and offers helpful code-completion when you use it in an IDE.
> > > >
> > > > I hope you now understand the problems I am trying to solve, and why
> > > > this new approach is better. That said, this change isn't perfect. We
> > > > can evolve this change to make the code more concise, or easier to
> > > > understand.
> > > >
> > > > Let's discuss some possible improvements.
> > > >
> > > > One possible improvement would be to stop using rule instances
> > > > (RelOptRule) and instead use rule config instances
> > > > (RelOptRule.Config). The planner could easily call Config.toRule()
> > > > when building a program. The sub-classes of RelOptRule would be
> > > > largely invisible (private sub-classes of the Config classes), and
> > > > that is appropriate, because all they provide is the implementation of
> > > > the onMatch() method. There would be a one-time impact because a lot
> > > > of types would change from RelOptRule to RelOptRule.Config.
> > > >
> > > > Another improvement would be to move around the rule constants.
> > > > FilterCorrelateRule.INSTANCE and AggregateFilterTransposeRule.INSTANCE
> > > > would become the fields FILTER_CORRELATE and
> > > > AGGREGATE_FILTER_TRANSPOSE in a new class, LogicalRules. People would
> > > > no longer need to reference the names of rule classes in their code,
> > > > just the instances. And of course they can now easily customize those
> > > > instances.
> > > >
> > > > I hope I have convinced you that rules need to become 'data objects'
> > > > with a small amount of behavior, and that people need to be able to
> > > > customize them without calling their constructor or even knowing their
> > > > precise class.
> > > >
> > > > I would love to hear suggestions for how we can make the transition to
> > > > this new model as smooth as possible.
> > > >
> > > > Julian
> > > >
> > > > On Sun, Jun 14, 2020 at 9:15 PM Haisheng Yuan <hy...@apache.org> wrote:
> > > > >
> > > > > Druid adapter rules are not used by Druid. As far as I know, only Hive
> > > uses these rules, I even think they should be moved to Hive project. :) If
> > > any other projects other than Hive are using them, please let us know.
> > > Basically, these Druid rules are not generally applicable. The PR changed
> > > many built-in rules to specifically accommodate Druid adapter rules,
> > > especially with third operand that can match any operator, even
> > > EnumerableTableScan is legit.
> > > > >
> > > > > Most converter rules are not designed to be flexible, no one would
> > > extend them. RelOptRule#operand() methods are deprecated, which implies
> > > someday, they will be removed and migrated to the new Config. But most down
> > > stream projects don't need to worry about the their rules' extensibility
> > > and compatibility, because they can modify their own rules freely, anytime.
> > > > >
> > > > > At the end of the day, we may find that only a small fraction of
> > > transformation rules need refactoring, e.g. ProjectFilterTransposeRule.
> > > Note that even FilterProjectTransposeRule doesn't need refactoring, we can
> > > safely remove copyFilter and copyProject, it is safe to always copy them,
> > > do we really want to match physical operators and generate logical
> > > alternatives?
> > > > >
> > > > > Last but the most important, rule operands, IMO, should be out of
> > > Config's reach. How frequent do we need to change rule operands or its
> > > matching class? IMO, operands of rule constructor should remain unchanged,
> > > RelFactory and Rule description can also remain the same, or adapted by
> > > Config silently without changing the rule itself, if there are no other
> > > additional parameters. Other than that, everything can be put into Config.
> > > Therefore I don't think RelOptNewRule is needed, because RelOptRule should
> > > be able to integrate with Config seamlessly, without breaking anything.
> > > > >
> > > > > All in all, I prefer the KISS principle, keep it simple, stupid.
> > > > >
> > > > > So my opinion is +0.5.
> > > > >
> > > > > Thanks,
> > > > > Haisheng
> > > > >
> > > > > On 2020/06/14 23:36:21, Stamatis Zampetakis <za...@gmail.com> wrote:
> > > > > > Hello,
> > > > > >
> > > > > > Thanks for putting this together Julian.
> > > > > >
> > > > > > I went quickly over the PR just to grasp better the idea and here
> > > are some
> > > > > > first thoughts.
> > > > > >
> > > > > > I really like the fact that rules creation is now much more uniform
> > > than
> > > > > > before.
> > > > > > I also like the fact that in some cases subclassing can be avoided
> > > thanks
> > > > > > to the flexible configuration. As a fan of the Liskov
> > > > > > substitution principle I find it positive to be able to avoid
> > > classes such
> > > > > > as DruidProjectFilterTransposeRule although to be honest I don't
> > > know why
> > > > > > these Druid rules impose those additional restrictions.
> > > > > >
> > > > > > On the other hand, I also feel that the new approach is a bit harder
> > > to
> > > > > > follow than the previous one. The fact that we have more extension
> > > points
> > > > > > gives more flexibility but at the same time complicates the
> > > implementation
> > > > > > a bit. I guess regular committers will not have much trouble to
> > > adapt to
> > > > > > the changes but for newcomers there may be more questions. For
> > > instance:
> > > > > > What do we do when we want to customize the behavior of an existing
> > > rule?
> > > > > > * Use an existing config with different parameters.
> > > > > > * Extend the config.
> > > > > > * Extend the rule.
> > > > > > * Extend the config and the rule.
> > > > > > * Create a new rule (if yes under which interface RelOptRule or
> > > > > > RelOptNewRule).
> > > > > >
> > > > > > As Haisheng mentioned, the fact that every rule can be configured
> > > with any
> > > > > > RelOptRuleOperand is also a point possibly deserving more discussion.
> > > > > > Ideally, the API should be able to guide developers to pass the
> > > correct
> > > > > > configuration parameters and not fail at runtime.
> > > > > >
> > > > > > Overall, I have mixed feelings about the proposed refactoring
> > > (definitely
> > > > > > not something blocking), I guess cause I haven't seen as many
> > > use-cases as
> > > > > > Julian.
> > > > > > I'm curious to see what others think about the changes. It's a pity
> > > to take
> > > > > > a decision based on the feedback of only two/three people.
> > > > > >
> > > > > > Best,
> > > > > > Stamatis
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sun, Jun 14, 2020 at 9:36 PM Haisheng Yuan <hy...@apache.org>
> > > wrote:
> > > > > >
> > > > > > > Hi Julian,
> > > > > > >
> > > > > > > Thanks for working on this.
> > > > > > >
> > > > > > > We haven't reached a consensus yet.
> > > > > > >
> > > > > > > Frankly speaking, I agree with what Stamatis said earlier.
> > > Flexibility
> > > > > > > doesn't come with no cost. Admittedly, with this patch, any rule
> > > > > > > constructor refactoring won't need to deprecate old constructors
> > > and break
> > > > > > > backward compatibility, however, it also makes the rule definition
> > > much
> > > > > > > more verbose, less readable and understandable. IMHO, it does more
> > > harm
> > > > > > > than good.
> > > > > > >
> > > > > > > Let's see how CassandraFilterRule becomes before and after the
> > > change.
> > > > > > >
> > > > > > > Before this change:
> > > > > > >
> > > > > > >   private static class CassandraFilterRule extends RelOptRule {
> > > > > > >     private static final CassandraFilterRule INSTANCE = new
> > > > > > > CassandraFilterRule();
> > > > > > >
> > > > > > >     private CassandraFilterRule() {
> > > > > > >       super(operand(LogicalFilter.class,
> > > operand(CassandraTableScan.class,
> > > > > > > none())),
> > > > > > >           "CassandraFilterRule");
> > > > > > >     }
> > > > > > >   }
> > > > > > >
> > > > > > > After this change:
> > > > > > >
> > > > > > >   private static class CassandraFilterRule
> > > > > > >       extends RelOptNewRule<CassandraFilterRule.Config> {
> > > > > > >     private static final CassandraFilterRule INSTANCE =
> > > > > > >         Config.EMPTY
> > > > > > >             .withOperandSupplier(b0 ->
> > > > > > >                 b0.operand(LogicalFilter.class)
> > > > > > >                     .oneInput(b1 ->
> > > b1.operand(CassandraTableScan.class)
> > > > > > >                         .noInputs()))
> > > > > > >             .as(Config.class)
> > > > > > >             .toRule();
> > > > > > >
> > > > > > >     /** Creates a CassandraFilterRule. */
> > > > > > >     protected CassandraFilterRule(Config config) {
> > > > > > >       super(config);
> > > > > > >     }
> > > > > > >
> > > > > > >     /** Rule configuration. */
> > > > > > >     public interface Config extends RelOptNewRule.Config {
> > > > > > >       @Override default CassandraFilterRule toRule() {
> > > > > > >         return new CassandraFilterRule(this);
> > > > > > >       }
> > > > > > >     }
> > > > > > >   }
> > > > > > >
> > > > > > >
> > > > > > > Intuitively, if we want to check what does the rule generally
> > > match or how
> > > > > > > it is defined, we just check the constructor. Now we checkout the
> > > > > > > constructor, only config is there, go to Config, there is still
> > > nothing
> > > > > > > interesting, we have to go to the INSTANCE. What is the difference
> > > with
> > > > > > > just using operand and optionConfig as the rule constructor?
> > > > > > >
> > > > > > >    protected CassandraFilterRule(RelOptRuleOperand operand, Config
> > > config)
> > > > > > > {
> > > > > > >      super(operand, config);
> > > > > > >    }
> > > > > > >
> > > > > > > Or even simply replace Config with int, with each bit represent an
> > > option,
> > > > > > > if all of them are boolean options.
> > > > > > >
> > > > > > > Nothing is more flexible than just using RelOptRuleOperand as the
> > > > > > > parameter, just like the base class RelOptRule does. But do we
> > > want it?
> > > > > > >
> > > > > > > At the same time, with the new approach, it is now legit to create
> > > the
> > > > > > > following instance:
> > > > > > >
> > > > > > >   private static final CassandraFilterRule INSTANCE2 =
> > > > > > >         Config.EMPTY
> > > > > > >             .withOperandSupplier(b0 ->
> > > > > > >                 b0.operand(LogicalProject.class)  // Even the is
> > > intended
> > > > > > > to match Filter, but it compiles
> > > > > > >                     .oneInput(b1 ->
> > > b1.operand(CassandraTableScan.class)
> > > > > > >                         .noInputs()))
> > > > > > >             .as(Config.class)
> > > > > > >             .toRule();
> > > > > > >
> > > > > > > Before we continue to the discussion and code review, we need to
> > > go back
> > > > > > > to the original problem, is it a real problem that is facing us?
> > > Is there
> > > > > > > any real demand or just artificial demand? How about we conduct a
> > > Calcite
> > > > > > > user survey to see how Calcite devs and users think? Then let's
> > > see how to
> > > > > > > move forward.
> > > > > > >
> > > > > > > [+1] Hell yeah, the new approach is awesome, let's go with it.
> > > > > > > [+0] Meh, I am OK with current approach, I don't see any burden or
> > > problem
> > > > > > > with it.
> > > > > > > [-1] Hell no, current approach is bad, the new one is even worse.
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Haisheng
> > > > > > >
> > > > > > >
> > > > > > > On 2020/06/12 23:51:56, Julian Hyde <jh...@apache.org> wrote:
> > > > > > > > There is now a PR: https://github.com/apache/calcite/pull/2024.
> > > Can
> > > > > > > > people please review?
> > > > > > > >
> > > > > > > > Here's the TL;DR:
> > > > > > > >
> > > > > > > > Previously, it was not easy to customize, re-use or extend
> > > planner
> > > > > > > > rules. If you wished to customize a rule (i.e. create a new
> > > instance
> > > > > > > > of a rule with different properties) you would have to call the
> > > rule's
> > > > > > > > constructor. Frequently the required constructor did not exist,
> > > so we
> > > > > > > > would have to add a new constructor and deprecate the old one.
> > > > > > > >
> > > > > > > > After this change, you start off from an instance of the rule,
> > > modify
> > > > > > > > its configuration, and call toRule() on the configuration. (Rule
> > > > > > > > constructors are now private, because only the configuration ever
> > > > > > > > calls them.)
> > > > > > > >
> > > > > > > > A good illustration of this is DruidRules, which used to contain
> > > many
> > > > > > > > sub-classes. Those sub-classes are no longer needed. Old code:
> > > > > > > >
> > > > > > > >   public static final DruidSortProjectTransposeRule
> > > > > > > SORT_PROJECT_TRANSPOSE =
> > > > > > > >       new
> > > DruidSortProjectTransposeRule(RelFactories.LOGICAL_BUILDER);
> > > > > > > >
> > > > > > > >     public static class DruidSortProjectTransposeRule
> > > > > > > >         extends SortProjectTransposeRule {
> > > > > > > >       public DruidSortProjectTransposeRule(RelBuilderFactory
> > > > > > > > relBuilderFactory) {
> > > > > > > >         super(
> > > > > > > >             operand(Sort.class,
> > > > > > > >                 operand(Project.class, operand(DruidQuery.class,
> > > > > > > none()))),
> > > > > > > >             relBuilderFactory, null);
> > > > > > > >       }
> > > > > > > >     }
> > > > > > > >
> > > > > > > > New code:
> > > > > > > >
> > > > > > > >   public static final SortProjectTransposeRule
> > > SORT_PROJECT_TRANSPOSE =
> > > > > > > >       SortProjectTransposeRule.INSTANCE.config
> > > > > > > >           .withOperandFor(Sort.class, Project.class,
> > > DruidQuery.class)
> > > > > > > >           .toRule();
> > > > > > > >
> > > > > > > > The change maintains backwards compatibility to a large degree.
> > > In a
> > > > > > > > few places, I had to change rule instances from type RelOptRule
> > > to
> > > > > > > > Supplier<RelOptRule>, to avoid deadlocks during class loading.
> > > For
> > > > > > > > instance, instead of writing FilterJoinRule.FILTER_ON_JOIN you
> > > must
> > > > > > > > now write FilterJoinRule.FILTER_ON_JOIN.get().
> > > > > > > >
> > > > > > > > Julian
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Apr 16, 2020 at 12:08 PM Julian Hyde <jh...@apache.org>
> > > wrote:
> > > > > > > > >
> > > > > > > > > I have now pushed a dev branch with a prototype. Please see
> > > > > > > > > https://issues.apache.org/jira/browse/CALCITE-3923 for
> > > details.
> > > > > > > > >
> > > > > > > > > Having built the prototype, I believe that this change is
> > > beneficial
> > > > > > > > > and we should do it. But I would like to get to consensus on
> > > the
> > > > > > > > > design before we pull the trigger.
> > > > > > > > >
> > > > > > > > > Julian
> > > > > > > > >
> > > > > > > > > On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <jh...@apache.org>
> > > wrote:
> > > > > > > > > >
> > > > > > > > > > Haisheng,
> > > > > > > > > >
> > > > > > > > > > I hear you. I agree that major changes to rules will require
> > > new rule
> > > > > > > > > > classes (not merely sub-classes). People should copy-paste,
> > > refactor,
> > > > > > > > > > and all that good stuff. But I think there are a lot of
> > > cases where
> > > > > > > we
> > > > > > > > > > need to make minor changes to rules (there are many of these
> > > in the
> > > > > > > > > > code base already), and this change will help.
> > > > > > > > > >
> > > > > > > > > > I have logged
> > > https://issues.apache.org/jira/browse/CALCITE-3923 and
> > > > > > > > > > am going to start working on a prototype. When we have a
> > > prototype we
> > > > > > > > > > will be able to assess how big an impact the API change will
> > > have.
> > > > > > > > > > (E.g. whether it will be a breaking change.)
> > > > > > > > > >
> > > > > > > > > > Julian
> > > > > > > > > >
> > > > > > > > > > On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <
> > > > > > > h.yuan@alibaba-inc.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > I don't think it is worth the refactoring. People who want
> > > to
> > > > > > > customize the rule, in most cases, won't be satisfied by a
> > > different
> > > > > > > parameter, they most likely still need to rewrite (copy & paste)
> > > the rule
> > > > > > > with some slightly their own logic. For many Calcite users, the
> > > rule is not
> > > > > > > reusable even with flexible configurations.
> > > > > > > > > > >
> > > > > > > > > > > - Haisheng
> > > > > > > > > > >
> > > > > > > > > > >
> > > ------------------------------------------------------------------
> > > > > > > > > > > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > > > > > > > > > > 日 期:2020年03月14日 22:54:04
> > > > > > > > > > > 收件人:<de...@calcite.apache.org>
> > > > > > > > > > > 主 题:Re: [DISCUSS] Refactor how planner rules are
> > > parameterized
> > > > > > > > > > >
> > > > > > > > > > > Hello,
> > > > > > > > > > >
> > > > > > > > > > > Apologies for the late reply but I just realised that I had
> > > > > > > written the
> > > > > > > > > > > mail and never pressed the send button.
> > > > > > > > > > >
> > > > > > > > > > > I think it is a nice idea and certainly a problem worth
> > > > > > > addressing. If I
> > > > > > > > > > > understood well you're thinking something like the current
> > > > > > > constructor of
> > > > > > > > > > > the RelBuilder [1] that accepts a Context parameter.
> > > Indeed it
> > > > > > > seems that
> > > > > > > > > > > with this change even rules that are not designed to be
> > > configured
> > > > > > > can be
> > > > > > > > > > > changed much more gracefully (without adding new
> > > constructors and
> > > > > > > breaking
> > > > > > > > > > > changes).
> > > > > > > > > > >
> > > > > > > > > > > On the other hand, some of the advantages that you mention
> > > can
> > > > > > > also be
> > > > > > > > > > > turned into disadvantages. For instance, copying a rule
> > > without
> > > > > > > knowing the
> > > > > > > > > > > values of the other parameters is a bit risky and might be
> > > harder
> > > > > > > to reason
> > > > > > > > > > > about its correctness. Moreover, private constructors,
> > > final
> > > > > > > classes, etc.,
> > > > > > > > > > > are primarily used for encapsulation purposes so allowing
> > > the
> > > > > > > state of the
> > > > > > > > > > > rule escape somehow breaks the original design of the rule.
> > > > > > > > > > >
> > > > > > > > > > > Another problem with respect to rules is cross convention
> > > matching
> > > > > > > and
> > > > > > > > > > > transformations [2]. Many rules should not fire for
> > > operands that
> > > > > > > are in
> > > > > > > > > > > different conventions; a typical example that comes in my
> > > mind is
> > > > > > > > > > > FilterProjectTransposeRule [3]. In the same spirit most
> > > rules
> > > > > > > should not
> > > > > > > > > > > generate mixed convention transformations. Although a
> > > different
> > > > > > > problem, I
> > > > > > > > > > > am mentioning it here since it could affect the design of
> > > the new
> > > > > > > API.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Stamatis
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > >
> > > > > > >
> > > https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> > > > > > > > > > >
> > > > > > > > > > > [2]
> > > > > > > > > > >
> > > > > > >
> > > https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > > > > > > > > > > [3]
> > > > > > > > > > >
> > > > > > >
> > > https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <
> > > mmior@apache.org>
> > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > This sounds reasonable to me. It also sounds like we
> > > could make
> > > > > > > this
> > > > > > > > > > > > backwards compatible by retaining (but deprecating) the
> > > existing
> > > > > > > > > > > > constructors and factory methods that will no longer be
> > > needed.
> > > > > > > > > > > > --
> > > > > > > > > > > > Michael Mior
> > > > > > > > > > > > mmior@apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <
> > > jhyde@apache.org> a
> > > > > > > écrit :
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have an idea for a refactoring to RelOptRule. I
> > > haven’t
> > > > > > > fully thought
> > > > > > > > > > > > it through, but I’m going to sketch it out here to see
> > > whether
> > > > > > > folks agree
> > > > > > > > > > > > about the problems/solutions.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It will be a breaking change (in the sense that people
> > > will
> > > > > > > have to
> > > > > > > > > > > > change their code in order to get it to compile) but
> > > relatively
> > > > > > > safe (in
> > > > > > > > > > > > that once the code compiles, it will have the same
> > > behavior as
> > > > > > > before).
> > > > > > > > > > > > Also it will give Calcite developers and users a lot more
> > > > > > > flexibility going
> > > > > > > > > > > > forward.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The problem I see is that people often want different
> > > variants
> > > > > > > of
> > > > > > > > > > > > planner rules. An example is FilterJoinRule, which has a
> > > > > > > 'boolean smart’
> > > > > > > > > > > > parameter, a predicate (which returns whether to pull up
> > > filter
> > > > > > > > > > > > conditions), operands (which determine the precise
> > > sub-classes
> > > > > > > of RelNode
> > > > > > > > > > > > that the rule should match) and a relBuilderFactory
> > > (which
> > > > > > > controls the
> > > > > > > > > > > > type of RelNode created by this rule).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Suppose you have an instance of FilterJoinRule and you
> > > want to
> > > > > > > change
> > > > > > > > > > > > ‘smart’ from true to false. The ‘smart’ parameter is
> > > immutable
> > > > > > > (good!) but
> > > > > > > > > > > > you can’t easily create a clone of the rule because you
> > > don’t
> > > > > > > know the
> > > > > > > > > > > > values of the other parameters. Your instance might even
> > > be
> > > > > > > (unbeknownst to
> > > > > > > > > > > > you) a sub-class with extra parameters and a private
> > > constructor.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So, my proposal is to put all of the config
> > > information of a
> > > > > > > RelOptRule
> > > > > > > > > > > > into a single ‘config’ parameter that contains all
> > > relevant
> > > > > > > properties.
> > > > > > > > > > > > Each sub-class of RelOptRule would have one constructor
> > > with
> > > > > > > just a
> > > > > > > > > > > > ‘config’ parameter. Each config knows which sub-class of
> > > > > > > RelOptRule to
> > > > > > > > > > > > create. Therefore it is easy to copy a config, change
> > > one or more
> > > > > > > > > > > > properties, and create a new rule instance.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Adding a property to a rule’s config does not require
> > > us to
> > > > > > > add or
> > > > > > > > > > > > deprecate any constructors.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The operands are part of the config, so if you have a
> > > rule
> > > > > > > that matches
> > > > > > > > > > > > a EnumerableFilter on an EnumerableJoin and you want to
> > > make it
> > > > > > > match an
> > > > > > > > > > > > EnumerableFilter on an EnumerableNestedLoopJoin, you can
> > > easily
> > > > > > > create one
> > > > > > > > > > > > with one changed operand.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The config is immutable and self-describing, so we can
> > > use it
> > > > > > > to
> > > > > > > > > > > > automatically generate a unique description for each rule
> > > > > > > instance.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Julian
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > >
> > > > > > >
> > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > > > > > <
> > > > > > > > > > > >
> > > > > > >
> > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Julian Hyde <jh...@apache.org>.
Stamatis,

Do you mean we should put into the next release (1.24) just a few
lines '@Deprecated // to be removed before 1.25; use xxx instead'
tagging the breaking changes, and then make the breaking changes in
master immediately after 1.24?

I would support that approach. We could deal with
https://issues.apache.org/jira/browse/CALCITE-4079 at the same time.

For the record, there won't be many breaking changes in
https://issues.apache.org/jira/browse/CALCITE-3923. (Just a few
constants that we have to remove because of class-loading issues.) But
there will be a lot of things deprecated because they have been moved
to a "better" place.

On Sun, Jun 28, 2020 at 4:08 PM Stamatis Zampetakis <za...@gmail.com> wrote:
>
> Indeed nice explanation Julian, thanks!
>
> Regarding the transition one thing that might help is to try to put the new
> model in a small release without other breaking changes.
> We could get 1.24 out without the refactoring and soon afterwards release
> 1.25.
>
> Given that the community has been rather silent on this I guess that people
> don't really mind if it goes as is in the next release so I may be over
> complicating things for no reason.
> In the end, we have also committed features with bigger impact with no
> special treatment.
>
> Best,
> Stamatis
>
>
> On Fri, Jun 19, 2020 at 11:37 PM Haisheng Yuan <hy...@apache.org> wrote:
>
> > Thank you for the detailed explanation, Julian.
> >
> > I will step aside, let you and other community members decide.
> > Anyway, my comment is not blocking.
> >
> > Thanks,
> > Haisheng
> >
> > On 2020/06/17 23:12:40, Julian Hyde <jh...@apache.org> wrote:
> > > "I prefer the KISS principle" is a bit unfair. What I'm advocating is
> > > also simple. But maybe from a different perspective.
> > >
> > > I am looking at it from the perspective of people who use rules, and
> > > compose them into rule sets to perform particular optimizations. I
> > > believe this is the most important perspective to focus on. These
> > > people are the key audience to serve. The corpus of re-usable,
> > > composable rules may be Calcite's most important contribution.
> > >
> > > To these people, a rule is not a class but an object. It has a
> > > particular signature (in terms of the pattern of RelNodes that it
> > > matches), maybe one or two configuration parameters (e.g.
> > > FilterJoinRule.smart controls whether to automatically strengthen a
> > > LEFT join to INNER), and it has code to generate a new RelNode when
> > > matched.
> > >
> > > What do I mean by 'object'? Think of how you use regular expressions
> > > in Java. You call java.util.regex.Pattern.compile("a[bc]*") and it
> > > gives you a Pattern object. You do not know what fields are inside
> > > that Pattern, you do not care whether the object you receive is a
> > > Pattern or some sub-class of Pattern, but you know there are one or
> > > two methods such as 'matcher(CharSequence)' that you can call.
> > >
> > > Our current API treats planner rules as classes. If you want to
> > > customize a property of a rule (say make a FilterJoinRule with
> > > smart=false, or change its RelBuilder, or make it match a
> > > CassandraFilter rather than a LogicalFilter) then you have to make a
> > > new rule instance by calling the constructor.
> > >
> > > When you call the constructor, you have to supply ALL of the
> > > properties, including operands, and including properties that you
> > > don't know or care about. If someone in three months adds a new
> > > property, the signature of the constructor will change, and you will
> > > have to go back and change your code. This is broken.
> > >
> > > Haisheng asked for evidence that the current system is broken.
> > > https://issues.apache.org/jira/browse/CALCITE-3825 ("Split
> > > AbstractMaterializedViewRule into multiple classes") is one example.
> > > In my company, that change literally broke our code because we had
> > > changed properties of some rules.
> > >
> > > Another example is https://issues.apache.org/jira/browse/CALCITE-3975
> > > ("Add options to ProjectFilterTransposeRule to push down project and
> > > filter expressions whole, not just field references"). With that
> > > change, I broke my own code. I added two new arguments, 'boolean
> > > wholeProject, boolean wholeFilter' to ProjectFilterTransposeRule's
> > > constructor, deprecated the previous constructor, and created quite a
> > > few conflicts in another dev branch that I was working on.
> > >
> > > These kinds of problems are not uncommon. Because it is difficult to
> > > reconfigure rules, or evolve them by adding extra arguments, people
> > > are more likely to copy-paste rule code into their own projects, and
> > > less like to contribute back.
> > >
> > > In most rules, operands are created evaluating a complex expression -
> > > made up of calls to static methods such as RelOptRule.operand - in a
> > > constructor as an argument to the base class constructor. This is a
> > > code smell. Such code is hard to move elsewhere.
> > >
> > > We treat operands as immutable, but actually they are mutable. Every
> > > operand contains mutable fields 'solveOrder', 'ordinalInParent' and
> > > 'ordinalInRule' that are assigned in the RelOptRule constructor. We
> > > don't prevent you from passing the same operand to two different rule
> > > instances, but if you did, bad things would happen. Actually I think
> > > people would love to be able to say 'create a rule instance the same
> > > as this, but replacing LogicalFilter with CassandraFilter' but we
> > > don't make it easy, and copying operands from one rule to another is
> > > certainly the wrong way to achieve it.
> > >
> > > Given all of these problems, the solution was to convert rules into
> > > data objects. Those data objects are the new Config interface and its
> > > sub-interfaces. (I used an interface because the ImmutableBeans
> > > framework makes it easy to create immutable data objects without
> > > generating code or writing a lot of boilerplate code by hand.)
> > >
> > > These Config interfaces mean one extra class per rule, and about 5
> > > extra lines of source code. If you measure simplicity in lines of
> > > code, then I suppose they make things a little less simple.
> > >
> > > The new way of building operands, using a builder that makes
> > > call-backs for each nested operand, and has a method for each
> > > attribute of an operand (e.g. predicate()) is also 'less simple' if
> > > you count lines of code. But it is more elegant in that it can
> > > (potentially) produce genuinely immutable operands, is strongly typed,
> > > and offers helpful code-completion when you use it in an IDE.
> > >
> > > I hope you now understand the problems I am trying to solve, and why
> > > this new approach is better. That said, this change isn't perfect. We
> > > can evolve this change to make the code more concise, or easier to
> > > understand.
> > >
> > > Let's discuss some possible improvements.
> > >
> > > One possible improvement would be to stop using rule instances
> > > (RelOptRule) and instead use rule config instances
> > > (RelOptRule.Config). The planner could easily call Config.toRule()
> > > when building a program. The sub-classes of RelOptRule would be
> > > largely invisible (private sub-classes of the Config classes), and
> > > that is appropriate, because all they provide is the implementation of
> > > the onMatch() method. There would be a one-time impact because a lot
> > > of types would change from RelOptRule to RelOptRule.Config.
> > >
> > > Another improvement would be to move around the rule constants.
> > > FilterCorrelateRule.INSTANCE and AggregateFilterTransposeRule.INSTANCE
> > > would become the fields FILTER_CORRELATE and
> > > AGGREGATE_FILTER_TRANSPOSE in a new class, LogicalRules. People would
> > > no longer need to reference the names of rule classes in their code,
> > > just the instances. And of course they can now easily customize those
> > > instances.
> > >
> > > I hope I have convinced you that rules need to become 'data objects'
> > > with a small amount of behavior, and that people need to be able to
> > > customize them without calling their constructor or even knowing their
> > > precise class.
> > >
> > > I would love to hear suggestions for how we can make the transition to
> > > this new model as smooth as possible.
> > >
> > > Julian
> > >
> > > On Sun, Jun 14, 2020 at 9:15 PM Haisheng Yuan <hy...@apache.org> wrote:
> > > >
> > > > Druid adapter rules are not used by Druid. As far as I know, only Hive
> > uses these rules, I even think they should be moved to Hive project. :) If
> > any other projects other than Hive are using them, please let us know.
> > Basically, these Druid rules are not generally applicable. The PR changed
> > many built-in rules to specifically accommodate Druid adapter rules,
> > especially with third operand that can match any operator, even
> > EnumerableTableScan is legit.
> > > >
> > > > Most converter rules are not designed to be flexible, no one would
> > extend them. RelOptRule#operand() methods are deprecated, which implies
> > someday, they will be removed and migrated to the new Config. But most down
> > stream projects don't need to worry about the their rules' extensibility
> > and compatibility, because they can modify their own rules freely, anytime.
> > > >
> > > > At the end of the day, we may find that only a small fraction of
> > transformation rules need refactoring, e.g. ProjectFilterTransposeRule.
> > Note that even FilterProjectTransposeRule doesn't need refactoring, we can
> > safely remove copyFilter and copyProject, it is safe to always copy them,
> > do we really want to match physical operators and generate logical
> > alternatives?
> > > >
> > > > Last but the most important, rule operands, IMO, should be out of
> > Config's reach. How frequent do we need to change rule operands or its
> > matching class? IMO, operands of rule constructor should remain unchanged,
> > RelFactory and Rule description can also remain the same, or adapted by
> > Config silently without changing the rule itself, if there are no other
> > additional parameters. Other than that, everything can be put into Config.
> > Therefore I don't think RelOptNewRule is needed, because RelOptRule should
> > be able to integrate with Config seamlessly, without breaking anything.
> > > >
> > > > All in all, I prefer the KISS principle, keep it simple, stupid.
> > > >
> > > > So my opinion is +0.5.
> > > >
> > > > Thanks,
> > > > Haisheng
> > > >
> > > > On 2020/06/14 23:36:21, Stamatis Zampetakis <za...@gmail.com> wrote:
> > > > > Hello,
> > > > >
> > > > > Thanks for putting this together Julian.
> > > > >
> > > > > I went quickly over the PR just to grasp better the idea and here
> > are some
> > > > > first thoughts.
> > > > >
> > > > > I really like the fact that rules creation is now much more uniform
> > than
> > > > > before.
> > > > > I also like the fact that in some cases subclassing can be avoided
> > thanks
> > > > > to the flexible configuration. As a fan of the Liskov
> > > > > substitution principle I find it positive to be able to avoid
> > classes such
> > > > > as DruidProjectFilterTransposeRule although to be honest I don't
> > know why
> > > > > these Druid rules impose those additional restrictions.
> > > > >
> > > > > On the other hand, I also feel that the new approach is a bit harder
> > to
> > > > > follow than the previous one. The fact that we have more extension
> > points
> > > > > gives more flexibility but at the same time complicates the
> > implementation
> > > > > a bit. I guess regular committers will not have much trouble to
> > adapt to
> > > > > the changes but for newcomers there may be more questions. For
> > instance:
> > > > > What do we do when we want to customize the behavior of an existing
> > rule?
> > > > > * Use an existing config with different parameters.
> > > > > * Extend the config.
> > > > > * Extend the rule.
> > > > > * Extend the config and the rule.
> > > > > * Create a new rule (if yes under which interface RelOptRule or
> > > > > RelOptNewRule).
> > > > >
> > > > > As Haisheng mentioned, the fact that every rule can be configured
> > with any
> > > > > RelOptRuleOperand is also a point possibly deserving more discussion.
> > > > > Ideally, the API should be able to guide developers to pass the
> > correct
> > > > > configuration parameters and not fail at runtime.
> > > > >
> > > > > Overall, I have mixed feelings about the proposed refactoring
> > (definitely
> > > > > not something blocking), I guess cause I haven't seen as many
> > use-cases as
> > > > > Julian.
> > > > > I'm curious to see what others think about the changes. It's a pity
> > to take
> > > > > a decision based on the feedback of only two/three people.
> > > > >
> > > > > Best,
> > > > > Stamatis
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Jun 14, 2020 at 9:36 PM Haisheng Yuan <hy...@apache.org>
> > wrote:
> > > > >
> > > > > > Hi Julian,
> > > > > >
> > > > > > Thanks for working on this.
> > > > > >
> > > > > > We haven't reached a consensus yet.
> > > > > >
> > > > > > Frankly speaking, I agree with what Stamatis said earlier.
> > Flexibility
> > > > > > doesn't come with no cost. Admittedly, with this patch, any rule
> > > > > > constructor refactoring won't need to deprecate old constructors
> > and break
> > > > > > backward compatibility, however, it also makes the rule definition
> > much
> > > > > > more verbose, less readable and understandable. IMHO, it does more
> > harm
> > > > > > than good.
> > > > > >
> > > > > > Let's see how CassandraFilterRule becomes before and after the
> > change.
> > > > > >
> > > > > > Before this change:
> > > > > >
> > > > > >   private static class CassandraFilterRule extends RelOptRule {
> > > > > >     private static final CassandraFilterRule INSTANCE = new
> > > > > > CassandraFilterRule();
> > > > > >
> > > > > >     private CassandraFilterRule() {
> > > > > >       super(operand(LogicalFilter.class,
> > operand(CassandraTableScan.class,
> > > > > > none())),
> > > > > >           "CassandraFilterRule");
> > > > > >     }
> > > > > >   }
> > > > > >
> > > > > > After this change:
> > > > > >
> > > > > >   private static class CassandraFilterRule
> > > > > >       extends RelOptNewRule<CassandraFilterRule.Config> {
> > > > > >     private static final CassandraFilterRule INSTANCE =
> > > > > >         Config.EMPTY
> > > > > >             .withOperandSupplier(b0 ->
> > > > > >                 b0.operand(LogicalFilter.class)
> > > > > >                     .oneInput(b1 ->
> > b1.operand(CassandraTableScan.class)
> > > > > >                         .noInputs()))
> > > > > >             .as(Config.class)
> > > > > >             .toRule();
> > > > > >
> > > > > >     /** Creates a CassandraFilterRule. */
> > > > > >     protected CassandraFilterRule(Config config) {
> > > > > >       super(config);
> > > > > >     }
> > > > > >
> > > > > >     /** Rule configuration. */
> > > > > >     public interface Config extends RelOptNewRule.Config {
> > > > > >       @Override default CassandraFilterRule toRule() {
> > > > > >         return new CassandraFilterRule(this);
> > > > > >       }
> > > > > >     }
> > > > > >   }
> > > > > >
> > > > > >
> > > > > > Intuitively, if we want to check what does the rule generally
> > match or how
> > > > > > it is defined, we just check the constructor. Now we checkout the
> > > > > > constructor, only config is there, go to Config, there is still
> > nothing
> > > > > > interesting, we have to go to the INSTANCE. What is the difference
> > with
> > > > > > just using operand and optionConfig as the rule constructor?
> > > > > >
> > > > > >    protected CassandraFilterRule(RelOptRuleOperand operand, Config
> > config)
> > > > > > {
> > > > > >      super(operand, config);
> > > > > >    }
> > > > > >
> > > > > > Or even simply replace Config with int, with each bit represent an
> > option,
> > > > > > if all of them are boolean options.
> > > > > >
> > > > > > Nothing is more flexible than just using RelOptRuleOperand as the
> > > > > > parameter, just like the base class RelOptRule does. But do we
> > want it?
> > > > > >
> > > > > > At the same time, with the new approach, it is now legit to create
> > the
> > > > > > following instance:
> > > > > >
> > > > > >   private static final CassandraFilterRule INSTANCE2 =
> > > > > >         Config.EMPTY
> > > > > >             .withOperandSupplier(b0 ->
> > > > > >                 b0.operand(LogicalProject.class)  // Even the is
> > intended
> > > > > > to match Filter, but it compiles
> > > > > >                     .oneInput(b1 ->
> > b1.operand(CassandraTableScan.class)
> > > > > >                         .noInputs()))
> > > > > >             .as(Config.class)
> > > > > >             .toRule();
> > > > > >
> > > > > > Before we continue to the discussion and code review, we need to
> > go back
> > > > > > to the original problem, is it a real problem that is facing us?
> > Is there
> > > > > > any real demand or just artificial demand? How about we conduct a
> > Calcite
> > > > > > user survey to see how Calcite devs and users think? Then let's
> > see how to
> > > > > > move forward.
> > > > > >
> > > > > > [+1] Hell yeah, the new approach is awesome, let's go with it.
> > > > > > [+0] Meh, I am OK with current approach, I don't see any burden or
> > problem
> > > > > > with it.
> > > > > > [-1] Hell no, current approach is bad, the new one is even worse.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Haisheng
> > > > > >
> > > > > >
> > > > > > On 2020/06/12 23:51:56, Julian Hyde <jh...@apache.org> wrote:
> > > > > > > There is now a PR: https://github.com/apache/calcite/pull/2024.
> > Can
> > > > > > > people please review?
> > > > > > >
> > > > > > > Here's the TL;DR:
> > > > > > >
> > > > > > > Previously, it was not easy to customize, re-use or extend
> > planner
> > > > > > > rules. If you wished to customize a rule (i.e. create a new
> > instance
> > > > > > > of a rule with different properties) you would have to call the
> > rule's
> > > > > > > constructor. Frequently the required constructor did not exist,
> > so we
> > > > > > > would have to add a new constructor and deprecate the old one.
> > > > > > >
> > > > > > > After this change, you start off from an instance of the rule,
> > modify
> > > > > > > its configuration, and call toRule() on the configuration. (Rule
> > > > > > > constructors are now private, because only the configuration ever
> > > > > > > calls them.)
> > > > > > >
> > > > > > > A good illustration of this is DruidRules, which used to contain
> > many
> > > > > > > sub-classes. Those sub-classes are no longer needed. Old code:
> > > > > > >
> > > > > > >   public static final DruidSortProjectTransposeRule
> > > > > > SORT_PROJECT_TRANSPOSE =
> > > > > > >       new
> > DruidSortProjectTransposeRule(RelFactories.LOGICAL_BUILDER);
> > > > > > >
> > > > > > >     public static class DruidSortProjectTransposeRule
> > > > > > >         extends SortProjectTransposeRule {
> > > > > > >       public DruidSortProjectTransposeRule(RelBuilderFactory
> > > > > > > relBuilderFactory) {
> > > > > > >         super(
> > > > > > >             operand(Sort.class,
> > > > > > >                 operand(Project.class, operand(DruidQuery.class,
> > > > > > none()))),
> > > > > > >             relBuilderFactory, null);
> > > > > > >       }
> > > > > > >     }
> > > > > > >
> > > > > > > New code:
> > > > > > >
> > > > > > >   public static final SortProjectTransposeRule
> > SORT_PROJECT_TRANSPOSE =
> > > > > > >       SortProjectTransposeRule.INSTANCE.config
> > > > > > >           .withOperandFor(Sort.class, Project.class,
> > DruidQuery.class)
> > > > > > >           .toRule();
> > > > > > >
> > > > > > > The change maintains backwards compatibility to a large degree.
> > In a
> > > > > > > few places, I had to change rule instances from type RelOptRule
> > to
> > > > > > > Supplier<RelOptRule>, to avoid deadlocks during class loading.
> > For
> > > > > > > instance, instead of writing FilterJoinRule.FILTER_ON_JOIN you
> > must
> > > > > > > now write FilterJoinRule.FILTER_ON_JOIN.get().
> > > > > > >
> > > > > > > Julian
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Apr 16, 2020 at 12:08 PM Julian Hyde <jh...@apache.org>
> > wrote:
> > > > > > > >
> > > > > > > > I have now pushed a dev branch with a prototype. Please see
> > > > > > > > https://issues.apache.org/jira/browse/CALCITE-3923 for
> > details.
> > > > > > > >
> > > > > > > > Having built the prototype, I believe that this change is
> > beneficial
> > > > > > > > and we should do it. But I would like to get to consensus on
> > the
> > > > > > > > design before we pull the trigger.
> > > > > > > >
> > > > > > > > Julian
> > > > > > > >
> > > > > > > > On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <jh...@apache.org>
> > wrote:
> > > > > > > > >
> > > > > > > > > Haisheng,
> > > > > > > > >
> > > > > > > > > I hear you. I agree that major changes to rules will require
> > new rule
> > > > > > > > > classes (not merely sub-classes). People should copy-paste,
> > refactor,
> > > > > > > > > and all that good stuff. But I think there are a lot of
> > cases where
> > > > > > we
> > > > > > > > > need to make minor changes to rules (there are many of these
> > in the
> > > > > > > > > code base already), and this change will help.
> > > > > > > > >
> > > > > > > > > I have logged
> > https://issues.apache.org/jira/browse/CALCITE-3923 and
> > > > > > > > > am going to start working on a prototype. When we have a
> > prototype we
> > > > > > > > > will be able to assess how big an impact the API change will
> > have.
> > > > > > > > > (E.g. whether it will be a breaking change.)
> > > > > > > > >
> > > > > > > > > Julian
> > > > > > > > >
> > > > > > > > > On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <
> > > > > > h.yuan@alibaba-inc.com> wrote:
> > > > > > > > > >
> > > > > > > > > > I don't think it is worth the refactoring. People who want
> > to
> > > > > > customize the rule, in most cases, won't be satisfied by a
> > different
> > > > > > parameter, they most likely still need to rewrite (copy & paste)
> > the rule
> > > > > > with some slightly their own logic. For many Calcite users, the
> > rule is not
> > > > > > reusable even with flexible configurations.
> > > > > > > > > >
> > > > > > > > > > - Haisheng
> > > > > > > > > >
> > > > > > > > > >
> > ------------------------------------------------------------------
> > > > > > > > > > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > > > > > > > > > 日 期:2020年03月14日 22:54:04
> > > > > > > > > > 收件人:<de...@calcite.apache.org>
> > > > > > > > > > 主 题:Re: [DISCUSS] Refactor how planner rules are
> > parameterized
> > > > > > > > > >
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > Apologies for the late reply but I just realised that I had
> > > > > > written the
> > > > > > > > > > mail and never pressed the send button.
> > > > > > > > > >
> > > > > > > > > > I think it is a nice idea and certainly a problem worth
> > > > > > addressing. If I
> > > > > > > > > > understood well you're thinking something like the current
> > > > > > constructor of
> > > > > > > > > > the RelBuilder [1] that accepts a Context parameter.
> > Indeed it
> > > > > > seems that
> > > > > > > > > > with this change even rules that are not designed to be
> > configured
> > > > > > can be
> > > > > > > > > > changed much more gracefully (without adding new
> > constructors and
> > > > > > breaking
> > > > > > > > > > changes).
> > > > > > > > > >
> > > > > > > > > > On the other hand, some of the advantages that you mention
> > can
> > > > > > also be
> > > > > > > > > > turned into disadvantages. For instance, copying a rule
> > without
> > > > > > knowing the
> > > > > > > > > > values of the other parameters is a bit risky and might be
> > harder
> > > > > > to reason
> > > > > > > > > > about its correctness. Moreover, private constructors,
> > final
> > > > > > classes, etc.,
> > > > > > > > > > are primarily used for encapsulation purposes so allowing
> > the
> > > > > > state of the
> > > > > > > > > > rule escape somehow breaks the original design of the rule.
> > > > > > > > > >
> > > > > > > > > > Another problem with respect to rules is cross convention
> > matching
> > > > > > and
> > > > > > > > > > transformations [2]. Many rules should not fire for
> > operands that
> > > > > > are in
> > > > > > > > > > different conventions; a typical example that comes in my
> > mind is
> > > > > > > > > > FilterProjectTransposeRule [3]. In the same spirit most
> > rules
> > > > > > should not
> > > > > > > > > > generate mixed convention transformations. Although a
> > different
> > > > > > problem, I
> > > > > > > > > > am mentioning it here since it could affect the design of
> > the new
> > > > > > API.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Stamatis
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > >
> > https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> > > > > > > > > >
> > > > > > > > > > [2]
> > > > > > > > > >
> > > > > >
> > https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > > > > > > > > > [3]
> > > > > > > > > >
> > > > > >
> > https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> > > > > > > > > >
> > > > > > > > > > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <
> > mmior@apache.org>
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > This sounds reasonable to me. It also sounds like we
> > could make
> > > > > > this
> > > > > > > > > > > backwards compatible by retaining (but deprecating) the
> > existing
> > > > > > > > > > > constructors and factory methods that will no longer be
> > needed.
> > > > > > > > > > > --
> > > > > > > > > > > Michael Mior
> > > > > > > > > > > mmior@apache.org
> > > > > > > > > > >
> > > > > > > > > > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <
> > jhyde@apache.org> a
> > > > > > écrit :
> > > > > > > > > > > >
> > > > > > > > > > > > I have an idea for a refactoring to RelOptRule. I
> > haven’t
> > > > > > fully thought
> > > > > > > > > > > it through, but I’m going to sketch it out here to see
> > whether
> > > > > > folks agree
> > > > > > > > > > > about the problems/solutions.
> > > > > > > > > > > >
> > > > > > > > > > > > It will be a breaking change (in the sense that people
> > will
> > > > > > have to
> > > > > > > > > > > change their code in order to get it to compile) but
> > relatively
> > > > > > safe (in
> > > > > > > > > > > that once the code compiles, it will have the same
> > behavior as
> > > > > > before).
> > > > > > > > > > > Also it will give Calcite developers and users a lot more
> > > > > > flexibility going
> > > > > > > > > > > forward.
> > > > > > > > > > > >
> > > > > > > > > > > > The problem I see is that people often want different
> > variants
> > > > > > of
> > > > > > > > > > > planner rules. An example is FilterJoinRule, which has a
> > > > > > 'boolean smart’
> > > > > > > > > > > parameter, a predicate (which returns whether to pull up
> > filter
> > > > > > > > > > > conditions), operands (which determine the precise
> > sub-classes
> > > > > > of RelNode
> > > > > > > > > > > that the rule should match) and a relBuilderFactory
> > (which
> > > > > > controls the
> > > > > > > > > > > type of RelNode created by this rule).
> > > > > > > > > > > >
> > > > > > > > > > > > Suppose you have an instance of FilterJoinRule and you
> > want to
> > > > > > change
> > > > > > > > > > > ‘smart’ from true to false. The ‘smart’ parameter is
> > immutable
> > > > > > (good!) but
> > > > > > > > > > > you can’t easily create a clone of the rule because you
> > don’t
> > > > > > know the
> > > > > > > > > > > values of the other parameters. Your instance might even
> > be
> > > > > > (unbeknownst to
> > > > > > > > > > > you) a sub-class with extra parameters and a private
> > constructor.
> > > > > > > > > > > >
> > > > > > > > > > > > So, my proposal is to put all of the config
> > information of a
> > > > > > RelOptRule
> > > > > > > > > > > into a single ‘config’ parameter that contains all
> > relevant
> > > > > > properties.
> > > > > > > > > > > Each sub-class of RelOptRule would have one constructor
> > with
> > > > > > just a
> > > > > > > > > > > ‘config’ parameter. Each config knows which sub-class of
> > > > > > RelOptRule to
> > > > > > > > > > > create. Therefore it is easy to copy a config, change
> > one or more
> > > > > > > > > > > properties, and create a new rule instance.
> > > > > > > > > > > >
> > > > > > > > > > > > Adding a property to a rule’s config does not require
> > us to
> > > > > > add or
> > > > > > > > > > > deprecate any constructors.
> > > > > > > > > > > >
> > > > > > > > > > > > The operands are part of the config, so if you have a
> > rule
> > > > > > that matches
> > > > > > > > > > > a EnumerableFilter on an EnumerableJoin and you want to
> > make it
> > > > > > match an
> > > > > > > > > > > EnumerableFilter on an EnumerableNestedLoopJoin, you can
> > easily
> > > > > > create one
> > > > > > > > > > > with one changed operand.
> > > > > > > > > > > >
> > > > > > > > > > > > The config is immutable and self-describing, so we can
> > use it
> > > > > > to
> > > > > > > > > > > automatically generate a unique description for each rule
> > > > > > instance.
> > > > > > > > > > > >
> > > > > > > > > > > > Julian
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > >
> > > > > >
> > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > > > > <
> > > > > > > > > > >
> > > > > >
> > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Stamatis Zampetakis <za...@gmail.com>.
Indeed nice explanation Julian, thanks!

Regarding the transition one thing that might help is to try to put the new
model in a small release without other breaking changes.
We could get 1.24 out without the refactoring and soon afterwards release
1.25.

Given that the community has been rather silent on this I guess that people
don't really mind if it goes as is in the next release so I may be over
complicating things for no reason.
In the end, we have also committed features with bigger impact with no
special treatment.

Best,
Stamatis


On Fri, Jun 19, 2020 at 11:37 PM Haisheng Yuan <hy...@apache.org> wrote:

> Thank you for the detailed explanation, Julian.
>
> I will step aside, let you and other community members decide.
> Anyway, my comment is not blocking.
>
> Thanks,
> Haisheng
>
> On 2020/06/17 23:12:40, Julian Hyde <jh...@apache.org> wrote:
> > "I prefer the KISS principle" is a bit unfair. What I'm advocating is
> > also simple. But maybe from a different perspective.
> >
> > I am looking at it from the perspective of people who use rules, and
> > compose them into rule sets to perform particular optimizations. I
> > believe this is the most important perspective to focus on. These
> > people are the key audience to serve. The corpus of re-usable,
> > composable rules may be Calcite's most important contribution.
> >
> > To these people, a rule is not a class but an object. It has a
> > particular signature (in terms of the pattern of RelNodes that it
> > matches), maybe one or two configuration parameters (e.g.
> > FilterJoinRule.smart controls whether to automatically strengthen a
> > LEFT join to INNER), and it has code to generate a new RelNode when
> > matched.
> >
> > What do I mean by 'object'? Think of how you use regular expressions
> > in Java. You call java.util.regex.Pattern.compile("a[bc]*") and it
> > gives you a Pattern object. You do not know what fields are inside
> > that Pattern, you do not care whether the object you receive is a
> > Pattern or some sub-class of Pattern, but you know there are one or
> > two methods such as 'matcher(CharSequence)' that you can call.
> >
> > Our current API treats planner rules as classes. If you want to
> > customize a property of a rule (say make a FilterJoinRule with
> > smart=false, or change its RelBuilder, or make it match a
> > CassandraFilter rather than a LogicalFilter) then you have to make a
> > new rule instance by calling the constructor.
> >
> > When you call the constructor, you have to supply ALL of the
> > properties, including operands, and including properties that you
> > don't know or care about. If someone in three months adds a new
> > property, the signature of the constructor will change, and you will
> > have to go back and change your code. This is broken.
> >
> > Haisheng asked for evidence that the current system is broken.
> > https://issues.apache.org/jira/browse/CALCITE-3825 ("Split
> > AbstractMaterializedViewRule into multiple classes") is one example.
> > In my company, that change literally broke our code because we had
> > changed properties of some rules.
> >
> > Another example is https://issues.apache.org/jira/browse/CALCITE-3975
> > ("Add options to ProjectFilterTransposeRule to push down project and
> > filter expressions whole, not just field references"). With that
> > change, I broke my own code. I added two new arguments, 'boolean
> > wholeProject, boolean wholeFilter' to ProjectFilterTransposeRule's
> > constructor, deprecated the previous constructor, and created quite a
> > few conflicts in another dev branch that I was working on.
> >
> > These kinds of problems are not uncommon. Because it is difficult to
> > reconfigure rules, or evolve them by adding extra arguments, people
> > are more likely to copy-paste rule code into their own projects, and
> > less like to contribute back.
> >
> > In most rules, operands are created evaluating a complex expression -
> > made up of calls to static methods such as RelOptRule.operand - in a
> > constructor as an argument to the base class constructor. This is a
> > code smell. Such code is hard to move elsewhere.
> >
> > We treat operands as immutable, but actually they are mutable. Every
> > operand contains mutable fields 'solveOrder', 'ordinalInParent' and
> > 'ordinalInRule' that are assigned in the RelOptRule constructor. We
> > don't prevent you from passing the same operand to two different rule
> > instances, but if you did, bad things would happen. Actually I think
> > people would love to be able to say 'create a rule instance the same
> > as this, but replacing LogicalFilter with CassandraFilter' but we
> > don't make it easy, and copying operands from one rule to another is
> > certainly the wrong way to achieve it.
> >
> > Given all of these problems, the solution was to convert rules into
> > data objects. Those data objects are the new Config interface and its
> > sub-interfaces. (I used an interface because the ImmutableBeans
> > framework makes it easy to create immutable data objects without
> > generating code or writing a lot of boilerplate code by hand.)
> >
> > These Config interfaces mean one extra class per rule, and about 5
> > extra lines of source code. If you measure simplicity in lines of
> > code, then I suppose they make things a little less simple.
> >
> > The new way of building operands, using a builder that makes
> > call-backs for each nested operand, and has a method for each
> > attribute of an operand (e.g. predicate()) is also 'less simple' if
> > you count lines of code. But it is more elegant in that it can
> > (potentially) produce genuinely immutable operands, is strongly typed,
> > and offers helpful code-completion when you use it in an IDE.
> >
> > I hope you now understand the problems I am trying to solve, and why
> > this new approach is better. That said, this change isn't perfect. We
> > can evolve this change to make the code more concise, or easier to
> > understand.
> >
> > Let's discuss some possible improvements.
> >
> > One possible improvement would be to stop using rule instances
> > (RelOptRule) and instead use rule config instances
> > (RelOptRule.Config). The planner could easily call Config.toRule()
> > when building a program. The sub-classes of RelOptRule would be
> > largely invisible (private sub-classes of the Config classes), and
> > that is appropriate, because all they provide is the implementation of
> > the onMatch() method. There would be a one-time impact because a lot
> > of types would change from RelOptRule to RelOptRule.Config.
> >
> > Another improvement would be to move around the rule constants.
> > FilterCorrelateRule.INSTANCE and AggregateFilterTransposeRule.INSTANCE
> > would become the fields FILTER_CORRELATE and
> > AGGREGATE_FILTER_TRANSPOSE in a new class, LogicalRules. People would
> > no longer need to reference the names of rule classes in their code,
> > just the instances. And of course they can now easily customize those
> > instances.
> >
> > I hope I have convinced you that rules need to become 'data objects'
> > with a small amount of behavior, and that people need to be able to
> > customize them without calling their constructor or even knowing their
> > precise class.
> >
> > I would love to hear suggestions for how we can make the transition to
> > this new model as smooth as possible.
> >
> > Julian
> >
> > On Sun, Jun 14, 2020 at 9:15 PM Haisheng Yuan <hy...@apache.org> wrote:
> > >
> > > Druid adapter rules are not used by Druid. As far as I know, only Hive
> uses these rules, I even think they should be moved to Hive project. :) If
> any other projects other than Hive are using them, please let us know.
> Basically, these Druid rules are not generally applicable. The PR changed
> many built-in rules to specifically accommodate Druid adapter rules,
> especially with third operand that can match any operator, even
> EnumerableTableScan is legit.
> > >
> > > Most converter rules are not designed to be flexible, no one would
> extend them. RelOptRule#operand() methods are deprecated, which implies
> someday, they will be removed and migrated to the new Config. But most down
> stream projects don't need to worry about the their rules' extensibility
> and compatibility, because they can modify their own rules freely, anytime.
> > >
> > > At the end of the day, we may find that only a small fraction of
> transformation rules need refactoring, e.g. ProjectFilterTransposeRule.
> Note that even FilterProjectTransposeRule doesn't need refactoring, we can
> safely remove copyFilter and copyProject, it is safe to always copy them,
> do we really want to match physical operators and generate logical
> alternatives?
> > >
> > > Last but the most important, rule operands, IMO, should be out of
> Config's reach. How frequent do we need to change rule operands or its
> matching class? IMO, operands of rule constructor should remain unchanged,
> RelFactory and Rule description can also remain the same, or adapted by
> Config silently without changing the rule itself, if there are no other
> additional parameters. Other than that, everything can be put into Config.
> Therefore I don't think RelOptNewRule is needed, because RelOptRule should
> be able to integrate with Config seamlessly, without breaking anything.
> > >
> > > All in all, I prefer the KISS principle, keep it simple, stupid.
> > >
> > > So my opinion is +0.5.
> > >
> > > Thanks,
> > > Haisheng
> > >
> > > On 2020/06/14 23:36:21, Stamatis Zampetakis <za...@gmail.com> wrote:
> > > > Hello,
> > > >
> > > > Thanks for putting this together Julian.
> > > >
> > > > I went quickly over the PR just to grasp better the idea and here
> are some
> > > > first thoughts.
> > > >
> > > > I really like the fact that rules creation is now much more uniform
> than
> > > > before.
> > > > I also like the fact that in some cases subclassing can be avoided
> thanks
> > > > to the flexible configuration. As a fan of the Liskov
> > > > substitution principle I find it positive to be able to avoid
> classes such
> > > > as DruidProjectFilterTransposeRule although to be honest I don't
> know why
> > > > these Druid rules impose those additional restrictions.
> > > >
> > > > On the other hand, I also feel that the new approach is a bit harder
> to
> > > > follow than the previous one. The fact that we have more extension
> points
> > > > gives more flexibility but at the same time complicates the
> implementation
> > > > a bit. I guess regular committers will not have much trouble to
> adapt to
> > > > the changes but for newcomers there may be more questions. For
> instance:
> > > > What do we do when we want to customize the behavior of an existing
> rule?
> > > > * Use an existing config with different parameters.
> > > > * Extend the config.
> > > > * Extend the rule.
> > > > * Extend the config and the rule.
> > > > * Create a new rule (if yes under which interface RelOptRule or
> > > > RelOptNewRule).
> > > >
> > > > As Haisheng mentioned, the fact that every rule can be configured
> with any
> > > > RelOptRuleOperand is also a point possibly deserving more discussion.
> > > > Ideally, the API should be able to guide developers to pass the
> correct
> > > > configuration parameters and not fail at runtime.
> > > >
> > > > Overall, I have mixed feelings about the proposed refactoring
> (definitely
> > > > not something blocking), I guess cause I haven't seen as many
> use-cases as
> > > > Julian.
> > > > I'm curious to see what others think about the changes. It's a pity
> to take
> > > > a decision based on the feedback of only two/three people.
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > >
> > > >
> > > >
> > > > On Sun, Jun 14, 2020 at 9:36 PM Haisheng Yuan <hy...@apache.org>
> wrote:
> > > >
> > > > > Hi Julian,
> > > > >
> > > > > Thanks for working on this.
> > > > >
> > > > > We haven't reached a consensus yet.
> > > > >
> > > > > Frankly speaking, I agree with what Stamatis said earlier.
> Flexibility
> > > > > doesn't come with no cost. Admittedly, with this patch, any rule
> > > > > constructor refactoring won't need to deprecate old constructors
> and break
> > > > > backward compatibility, however, it also makes the rule definition
> much
> > > > > more verbose, less readable and understandable. IMHO, it does more
> harm
> > > > > than good.
> > > > >
> > > > > Let's see how CassandraFilterRule becomes before and after the
> change.
> > > > >
> > > > > Before this change:
> > > > >
> > > > >   private static class CassandraFilterRule extends RelOptRule {
> > > > >     private static final CassandraFilterRule INSTANCE = new
> > > > > CassandraFilterRule();
> > > > >
> > > > >     private CassandraFilterRule() {
> > > > >       super(operand(LogicalFilter.class,
> operand(CassandraTableScan.class,
> > > > > none())),
> > > > >           "CassandraFilterRule");
> > > > >     }
> > > > >   }
> > > > >
> > > > > After this change:
> > > > >
> > > > >   private static class CassandraFilterRule
> > > > >       extends RelOptNewRule<CassandraFilterRule.Config> {
> > > > >     private static final CassandraFilterRule INSTANCE =
> > > > >         Config.EMPTY
> > > > >             .withOperandSupplier(b0 ->
> > > > >                 b0.operand(LogicalFilter.class)
> > > > >                     .oneInput(b1 ->
> b1.operand(CassandraTableScan.class)
> > > > >                         .noInputs()))
> > > > >             .as(Config.class)
> > > > >             .toRule();
> > > > >
> > > > >     /** Creates a CassandraFilterRule. */
> > > > >     protected CassandraFilterRule(Config config) {
> > > > >       super(config);
> > > > >     }
> > > > >
> > > > >     /** Rule configuration. */
> > > > >     public interface Config extends RelOptNewRule.Config {
> > > > >       @Override default CassandraFilterRule toRule() {
> > > > >         return new CassandraFilterRule(this);
> > > > >       }
> > > > >     }
> > > > >   }
> > > > >
> > > > >
> > > > > Intuitively, if we want to check what does the rule generally
> match or how
> > > > > it is defined, we just check the constructor. Now we checkout the
> > > > > constructor, only config is there, go to Config, there is still
> nothing
> > > > > interesting, we have to go to the INSTANCE. What is the difference
> with
> > > > > just using operand and optionConfig as the rule constructor?
> > > > >
> > > > >    protected CassandraFilterRule(RelOptRuleOperand operand, Config
> config)
> > > > > {
> > > > >      super(operand, config);
> > > > >    }
> > > > >
> > > > > Or even simply replace Config with int, with each bit represent an
> option,
> > > > > if all of them are boolean options.
> > > > >
> > > > > Nothing is more flexible than just using RelOptRuleOperand as the
> > > > > parameter, just like the base class RelOptRule does. But do we
> want it?
> > > > >
> > > > > At the same time, with the new approach, it is now legit to create
> the
> > > > > following instance:
> > > > >
> > > > >   private static final CassandraFilterRule INSTANCE2 =
> > > > >         Config.EMPTY
> > > > >             .withOperandSupplier(b0 ->
> > > > >                 b0.operand(LogicalProject.class)  // Even the is
> intended
> > > > > to match Filter, but it compiles
> > > > >                     .oneInput(b1 ->
> b1.operand(CassandraTableScan.class)
> > > > >                         .noInputs()))
> > > > >             .as(Config.class)
> > > > >             .toRule();
> > > > >
> > > > > Before we continue to the discussion and code review, we need to
> go back
> > > > > to the original problem, is it a real problem that is facing us?
> Is there
> > > > > any real demand or just artificial demand? How about we conduct a
> Calcite
> > > > > user survey to see how Calcite devs and users think? Then let's
> see how to
> > > > > move forward.
> > > > >
> > > > > [+1] Hell yeah, the new approach is awesome, let's go with it.
> > > > > [+0] Meh, I am OK with current approach, I don't see any burden or
> problem
> > > > > with it.
> > > > > [-1] Hell no, current approach is bad, the new one is even worse.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Haisheng
> > > > >
> > > > >
> > > > > On 2020/06/12 23:51:56, Julian Hyde <jh...@apache.org> wrote:
> > > > > > There is now a PR: https://github.com/apache/calcite/pull/2024.
> Can
> > > > > > people please review?
> > > > > >
> > > > > > Here's the TL;DR:
> > > > > >
> > > > > > Previously, it was not easy to customize, re-use or extend
> planner
> > > > > > rules. If you wished to customize a rule (i.e. create a new
> instance
> > > > > > of a rule with different properties) you would have to call the
> rule's
> > > > > > constructor. Frequently the required constructor did not exist,
> so we
> > > > > > would have to add a new constructor and deprecate the old one.
> > > > > >
> > > > > > After this change, you start off from an instance of the rule,
> modify
> > > > > > its configuration, and call toRule() on the configuration. (Rule
> > > > > > constructors are now private, because only the configuration ever
> > > > > > calls them.)
> > > > > >
> > > > > > A good illustration of this is DruidRules, which used to contain
> many
> > > > > > sub-classes. Those sub-classes are no longer needed. Old code:
> > > > > >
> > > > > >   public static final DruidSortProjectTransposeRule
> > > > > SORT_PROJECT_TRANSPOSE =
> > > > > >       new
> DruidSortProjectTransposeRule(RelFactories.LOGICAL_BUILDER);
> > > > > >
> > > > > >     public static class DruidSortProjectTransposeRule
> > > > > >         extends SortProjectTransposeRule {
> > > > > >       public DruidSortProjectTransposeRule(RelBuilderFactory
> > > > > > relBuilderFactory) {
> > > > > >         super(
> > > > > >             operand(Sort.class,
> > > > > >                 operand(Project.class, operand(DruidQuery.class,
> > > > > none()))),
> > > > > >             relBuilderFactory, null);
> > > > > >       }
> > > > > >     }
> > > > > >
> > > > > > New code:
> > > > > >
> > > > > >   public static final SortProjectTransposeRule
> SORT_PROJECT_TRANSPOSE =
> > > > > >       SortProjectTransposeRule.INSTANCE.config
> > > > > >           .withOperandFor(Sort.class, Project.class,
> DruidQuery.class)
> > > > > >           .toRule();
> > > > > >
> > > > > > The change maintains backwards compatibility to a large degree.
> In a
> > > > > > few places, I had to change rule instances from type RelOptRule
> to
> > > > > > Supplier<RelOptRule>, to avoid deadlocks during class loading.
> For
> > > > > > instance, instead of writing FilterJoinRule.FILTER_ON_JOIN you
> must
> > > > > > now write FilterJoinRule.FILTER_ON_JOIN.get().
> > > > > >
> > > > > > Julian
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 16, 2020 at 12:08 PM Julian Hyde <jh...@apache.org>
> wrote:
> > > > > > >
> > > > > > > I have now pushed a dev branch with a prototype. Please see
> > > > > > > https://issues.apache.org/jira/browse/CALCITE-3923 for
> details.
> > > > > > >
> > > > > > > Having built the prototype, I believe that this change is
> beneficial
> > > > > > > and we should do it. But I would like to get to consensus on
> the
> > > > > > > design before we pull the trigger.
> > > > > > >
> > > > > > > Julian
> > > > > > >
> > > > > > > On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <jh...@apache.org>
> wrote:
> > > > > > > >
> > > > > > > > Haisheng,
> > > > > > > >
> > > > > > > > I hear you. I agree that major changes to rules will require
> new rule
> > > > > > > > classes (not merely sub-classes). People should copy-paste,
> refactor,
> > > > > > > > and all that good stuff. But I think there are a lot of
> cases where
> > > > > we
> > > > > > > > need to make minor changes to rules (there are many of these
> in the
> > > > > > > > code base already), and this change will help.
> > > > > > > >
> > > > > > > > I have logged
> https://issues.apache.org/jira/browse/CALCITE-3923 and
> > > > > > > > am going to start working on a prototype. When we have a
> prototype we
> > > > > > > > will be able to assess how big an impact the API change will
> have.
> > > > > > > > (E.g. whether it will be a breaking change.)
> > > > > > > >
> > > > > > > > Julian
> > > > > > > >
> > > > > > > > On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <
> > > > > h.yuan@alibaba-inc.com> wrote:
> > > > > > > > >
> > > > > > > > > I don't think it is worth the refactoring. People who want
> to
> > > > > customize the rule, in most cases, won't be satisfied by a
> different
> > > > > parameter, they most likely still need to rewrite (copy & paste)
> the rule
> > > > > with some slightly their own logic. For many Calcite users, the
> rule is not
> > > > > reusable even with flexible configurations.
> > > > > > > > >
> > > > > > > > > - Haisheng
> > > > > > > > >
> > > > > > > > >
> ------------------------------------------------------------------
> > > > > > > > > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > > > > > > > > 日 期:2020年03月14日 22:54:04
> > > > > > > > > 收件人:<de...@calcite.apache.org>
> > > > > > > > > 主 题:Re: [DISCUSS] Refactor how planner rules are
> parameterized
> > > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > Apologies for the late reply but I just realised that I had
> > > > > written the
> > > > > > > > > mail and never pressed the send button.
> > > > > > > > >
> > > > > > > > > I think it is a nice idea and certainly a problem worth
> > > > > addressing. If I
> > > > > > > > > understood well you're thinking something like the current
> > > > > constructor of
> > > > > > > > > the RelBuilder [1] that accepts a Context parameter.
> Indeed it
> > > > > seems that
> > > > > > > > > with this change even rules that are not designed to be
> configured
> > > > > can be
> > > > > > > > > changed much more gracefully (without adding new
> constructors and
> > > > > breaking
> > > > > > > > > changes).
> > > > > > > > >
> > > > > > > > > On the other hand, some of the advantages that you mention
> can
> > > > > also be
> > > > > > > > > turned into disadvantages. For instance, copying a rule
> without
> > > > > knowing the
> > > > > > > > > values of the other parameters is a bit risky and might be
> harder
> > > > > to reason
> > > > > > > > > about its correctness. Moreover, private constructors,
> final
> > > > > classes, etc.,
> > > > > > > > > are primarily used for encapsulation purposes so allowing
> the
> > > > > state of the
> > > > > > > > > rule escape somehow breaks the original design of the rule.
> > > > > > > > >
> > > > > > > > > Another problem with respect to rules is cross convention
> matching
> > > > > and
> > > > > > > > > transformations [2]. Many rules should not fire for
> operands that
> > > > > are in
> > > > > > > > > different conventions; a typical example that comes in my
> mind is
> > > > > > > > > FilterProjectTransposeRule [3]. In the same spirit most
> rules
> > > > > should not
> > > > > > > > > generate mixed convention transformations. Although a
> different
> > > > > problem, I
> > > > > > > > > am mentioning it here since it could affect the design of
> the new
> > > > > API.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Stamatis
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > >
> https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> > > > > > > > >
> > > > > > > > > [2]
> > > > > > > > >
> > > > >
> https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > > > > > > > > [3]
> > > > > > > > >
> > > > >
> https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> > > > > > > > >
> > > > > > > > > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <
> mmior@apache.org>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > This sounds reasonable to me. It also sounds like we
> could make
> > > > > this
> > > > > > > > > > backwards compatible by retaining (but deprecating) the
> existing
> > > > > > > > > > constructors and factory methods that will no longer be
> needed.
> > > > > > > > > > --
> > > > > > > > > > Michael Mior
> > > > > > > > > > mmior@apache.org
> > > > > > > > > >
> > > > > > > > > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <
> jhyde@apache.org> a
> > > > > écrit :
> > > > > > > > > > >
> > > > > > > > > > > I have an idea for a refactoring to RelOptRule. I
> haven’t
> > > > > fully thought
> > > > > > > > > > it through, but I’m going to sketch it out here to see
> whether
> > > > > folks agree
> > > > > > > > > > about the problems/solutions.
> > > > > > > > > > >
> > > > > > > > > > > It will be a breaking change (in the sense that people
> will
> > > > > have to
> > > > > > > > > > change their code in order to get it to compile) but
> relatively
> > > > > safe (in
> > > > > > > > > > that once the code compiles, it will have the same
> behavior as
> > > > > before).
> > > > > > > > > > Also it will give Calcite developers and users a lot more
> > > > > flexibility going
> > > > > > > > > > forward.
> > > > > > > > > > >
> > > > > > > > > > > The problem I see is that people often want different
> variants
> > > > > of
> > > > > > > > > > planner rules. An example is FilterJoinRule, which has a
> > > > > 'boolean smart’
> > > > > > > > > > parameter, a predicate (which returns whether to pull up
> filter
> > > > > > > > > > conditions), operands (which determine the precise
> sub-classes
> > > > > of RelNode
> > > > > > > > > > that the rule should match) and a relBuilderFactory
> (which
> > > > > controls the
> > > > > > > > > > type of RelNode created by this rule).
> > > > > > > > > > >
> > > > > > > > > > > Suppose you have an instance of FilterJoinRule and you
> want to
> > > > > change
> > > > > > > > > > ‘smart’ from true to false. The ‘smart’ parameter is
> immutable
> > > > > (good!) but
> > > > > > > > > > you can’t easily create a clone of the rule because you
> don’t
> > > > > know the
> > > > > > > > > > values of the other parameters. Your instance might even
> be
> > > > > (unbeknownst to
> > > > > > > > > > you) a sub-class with extra parameters and a private
> constructor.
> > > > > > > > > > >
> > > > > > > > > > > So, my proposal is to put all of the config
> information of a
> > > > > RelOptRule
> > > > > > > > > > into a single ‘config’ parameter that contains all
> relevant
> > > > > properties.
> > > > > > > > > > Each sub-class of RelOptRule would have one constructor
> with
> > > > > just a
> > > > > > > > > > ‘config’ parameter. Each config knows which sub-class of
> > > > > RelOptRule to
> > > > > > > > > > create. Therefore it is easy to copy a config, change
> one or more
> > > > > > > > > > properties, and create a new rule instance.
> > > > > > > > > > >
> > > > > > > > > > > Adding a property to a rule’s config does not require
> us to
> > > > > add or
> > > > > > > > > > deprecate any constructors.
> > > > > > > > > > >
> > > > > > > > > > > The operands are part of the config, so if you have a
> rule
> > > > > that matches
> > > > > > > > > > a EnumerableFilter on an EnumerableJoin and you want to
> make it
> > > > > match an
> > > > > > > > > > EnumerableFilter on an EnumerableNestedLoopJoin, you can
> easily
> > > > > create one
> > > > > > > > > > with one changed operand.
> > > > > > > > > > >
> > > > > > > > > > > The config is immutable and self-describing, so we can
> use it
> > > > > to
> > > > > > > > > > automatically generate a unique description for each rule
> > > > > instance.
> > > > > > > > > > >
> > > > > > > > > > > Julian
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > >
> > > > >
> https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > > > <
> > > > > > > > > >
> > > > >
> https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > >
> > > > >
> > > >
> >
>

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Haisheng Yuan <hy...@apache.org>.
Thank you for the detailed explanation, Julian.

I will step aside, let you and other community members decide.
Anyway, my comment is not blocking.

Thanks,
Haisheng

On 2020/06/17 23:12:40, Julian Hyde <jh...@apache.org> wrote: 
> "I prefer the KISS principle" is a bit unfair. What I'm advocating is
> also simple. But maybe from a different perspective.
> 
> I am looking at it from the perspective of people who use rules, and
> compose them into rule sets to perform particular optimizations. I
> believe this is the most important perspective to focus on. These
> people are the key audience to serve. The corpus of re-usable,
> composable rules may be Calcite's most important contribution.
> 
> To these people, a rule is not a class but an object. It has a
> particular signature (in terms of the pattern of RelNodes that it
> matches), maybe one or two configuration parameters (e.g.
> FilterJoinRule.smart controls whether to automatically strengthen a
> LEFT join to INNER), and it has code to generate a new RelNode when
> matched.
> 
> What do I mean by 'object'? Think of how you use regular expressions
> in Java. You call java.util.regex.Pattern.compile("a[bc]*") and it
> gives you a Pattern object. You do not know what fields are inside
> that Pattern, you do not care whether the object you receive is a
> Pattern or some sub-class of Pattern, but you know there are one or
> two methods such as 'matcher(CharSequence)' that you can call.
> 
> Our current API treats planner rules as classes. If you want to
> customize a property of a rule (say make a FilterJoinRule with
> smart=false, or change its RelBuilder, or make it match a
> CassandraFilter rather than a LogicalFilter) then you have to make a
> new rule instance by calling the constructor.
> 
> When you call the constructor, you have to supply ALL of the
> properties, including operands, and including properties that you
> don't know or care about. If someone in three months adds a new
> property, the signature of the constructor will change, and you will
> have to go back and change your code. This is broken.
> 
> Haisheng asked for evidence that the current system is broken.
> https://issues.apache.org/jira/browse/CALCITE-3825 ("Split
> AbstractMaterializedViewRule into multiple classes") is one example.
> In my company, that change literally broke our code because we had
> changed properties of some rules.
> 
> Another example is https://issues.apache.org/jira/browse/CALCITE-3975
> ("Add options to ProjectFilterTransposeRule to push down project and
> filter expressions whole, not just field references"). With that
> change, I broke my own code. I added two new arguments, 'boolean
> wholeProject, boolean wholeFilter' to ProjectFilterTransposeRule's
> constructor, deprecated the previous constructor, and created quite a
> few conflicts in another dev branch that I was working on.
> 
> These kinds of problems are not uncommon. Because it is difficult to
> reconfigure rules, or evolve them by adding extra arguments, people
> are more likely to copy-paste rule code into their own projects, and
> less like to contribute back.
> 
> In most rules, operands are created evaluating a complex expression -
> made up of calls to static methods such as RelOptRule.operand - in a
> constructor as an argument to the base class constructor. This is a
> code smell. Such code is hard to move elsewhere.
> 
> We treat operands as immutable, but actually they are mutable. Every
> operand contains mutable fields 'solveOrder', 'ordinalInParent' and
> 'ordinalInRule' that are assigned in the RelOptRule constructor. We
> don't prevent you from passing the same operand to two different rule
> instances, but if you did, bad things would happen. Actually I think
> people would love to be able to say 'create a rule instance the same
> as this, but replacing LogicalFilter with CassandraFilter' but we
> don't make it easy, and copying operands from one rule to another is
> certainly the wrong way to achieve it.
> 
> Given all of these problems, the solution was to convert rules into
> data objects. Those data objects are the new Config interface and its
> sub-interfaces. (I used an interface because the ImmutableBeans
> framework makes it easy to create immutable data objects without
> generating code or writing a lot of boilerplate code by hand.)
> 
> These Config interfaces mean one extra class per rule, and about 5
> extra lines of source code. If you measure simplicity in lines of
> code, then I suppose they make things a little less simple.
> 
> The new way of building operands, using a builder that makes
> call-backs for each nested operand, and has a method for each
> attribute of an operand (e.g. predicate()) is also 'less simple' if
> you count lines of code. But it is more elegant in that it can
> (potentially) produce genuinely immutable operands, is strongly typed,
> and offers helpful code-completion when you use it in an IDE.
> 
> I hope you now understand the problems I am trying to solve, and why
> this new approach is better. That said, this change isn't perfect. We
> can evolve this change to make the code more concise, or easier to
> understand.
> 
> Let's discuss some possible improvements.
> 
> One possible improvement would be to stop using rule instances
> (RelOptRule) and instead use rule config instances
> (RelOptRule.Config). The planner could easily call Config.toRule()
> when building a program. The sub-classes of RelOptRule would be
> largely invisible (private sub-classes of the Config classes), and
> that is appropriate, because all they provide is the implementation of
> the onMatch() method. There would be a one-time impact because a lot
> of types would change from RelOptRule to RelOptRule.Config.
> 
> Another improvement would be to move around the rule constants.
> FilterCorrelateRule.INSTANCE and AggregateFilterTransposeRule.INSTANCE
> would become the fields FILTER_CORRELATE and
> AGGREGATE_FILTER_TRANSPOSE in a new class, LogicalRules. People would
> no longer need to reference the names of rule classes in their code,
> just the instances. And of course they can now easily customize those
> instances.
> 
> I hope I have convinced you that rules need to become 'data objects'
> with a small amount of behavior, and that people need to be able to
> customize them without calling their constructor or even knowing their
> precise class.
> 
> I would love to hear suggestions for how we can make the transition to
> this new model as smooth as possible.
> 
> Julian
> 
> On Sun, Jun 14, 2020 at 9:15 PM Haisheng Yuan <hy...@apache.org> wrote:
> >
> > Druid adapter rules are not used by Druid. As far as I know, only Hive uses these rules, I even think they should be moved to Hive project. :) If any other projects other than Hive are using them, please let us know. Basically, these Druid rules are not generally applicable. The PR changed many built-in rules to specifically accommodate Druid adapter rules, especially with third operand that can match any operator, even EnumerableTableScan is legit.
> >
> > Most converter rules are not designed to be flexible, no one would extend them. RelOptRule#operand() methods are deprecated, which implies someday, they will be removed and migrated to the new Config. But most down stream projects don't need to worry about the their rules' extensibility and compatibility, because they can modify their own rules freely, anytime.
> >
> > At the end of the day, we may find that only a small fraction of transformation rules need refactoring, e.g. ProjectFilterTransposeRule. Note that even FilterProjectTransposeRule doesn't need refactoring, we can safely remove copyFilter and copyProject, it is safe to always copy them, do we really want to match physical operators and generate logical alternatives?
> >
> > Last but the most important, rule operands, IMO, should be out of Config's reach. How frequent do we need to change rule operands or its matching class? IMO, operands of rule constructor should remain unchanged, RelFactory and Rule description can also remain the same, or adapted by Config silently without changing the rule itself, if there are no other additional parameters. Other than that, everything can be put into Config. Therefore I don't think RelOptNewRule is needed, because RelOptRule should be able to integrate with Config seamlessly, without breaking anything.
> >
> > All in all, I prefer the KISS principle, keep it simple, stupid.
> >
> > So my opinion is +0.5.
> >
> > Thanks,
> > Haisheng
> >
> > On 2020/06/14 23:36:21, Stamatis Zampetakis <za...@gmail.com> wrote:
> > > Hello,
> > >
> > > Thanks for putting this together Julian.
> > >
> > > I went quickly over the PR just to grasp better the idea and here are some
> > > first thoughts.
> > >
> > > I really like the fact that rules creation is now much more uniform than
> > > before.
> > > I also like the fact that in some cases subclassing can be avoided thanks
> > > to the flexible configuration. As a fan of the Liskov
> > > substitution principle I find it positive to be able to avoid classes such
> > > as DruidProjectFilterTransposeRule although to be honest I don't know why
> > > these Druid rules impose those additional restrictions.
> > >
> > > On the other hand, I also feel that the new approach is a bit harder to
> > > follow than the previous one. The fact that we have more extension points
> > > gives more flexibility but at the same time complicates the implementation
> > > a bit. I guess regular committers will not have much trouble to adapt to
> > > the changes but for newcomers there may be more questions. For instance:
> > > What do we do when we want to customize the behavior of an existing rule?
> > > * Use an existing config with different parameters.
> > > * Extend the config.
> > > * Extend the rule.
> > > * Extend the config and the rule.
> > > * Create a new rule (if yes under which interface RelOptRule or
> > > RelOptNewRule).
> > >
> > > As Haisheng mentioned, the fact that every rule can be configured with any
> > > RelOptRuleOperand is also a point possibly deserving more discussion.
> > > Ideally, the API should be able to guide developers to pass the correct
> > > configuration parameters and not fail at runtime.
> > >
> > > Overall, I have mixed feelings about the proposed refactoring (definitely
> > > not something blocking), I guess cause I haven't seen as many use-cases as
> > > Julian.
> > > I'm curious to see what others think about the changes. It's a pity to take
> > > a decision based on the feedback of only two/three people.
> > >
> > > Best,
> > > Stamatis
> > >
> > >
> > >
> > >
> > > On Sun, Jun 14, 2020 at 9:36 PM Haisheng Yuan <hy...@apache.org> wrote:
> > >
> > > > Hi Julian,
> > > >
> > > > Thanks for working on this.
> > > >
> > > > We haven't reached a consensus yet.
> > > >
> > > > Frankly speaking, I agree with what Stamatis said earlier. Flexibility
> > > > doesn't come with no cost. Admittedly, with this patch, any rule
> > > > constructor refactoring won't need to deprecate old constructors and break
> > > > backward compatibility, however, it also makes the rule definition much
> > > > more verbose, less readable and understandable. IMHO, it does more harm
> > > > than good.
> > > >
> > > > Let's see how CassandraFilterRule becomes before and after the change.
> > > >
> > > > Before this change:
> > > >
> > > >   private static class CassandraFilterRule extends RelOptRule {
> > > >     private static final CassandraFilterRule INSTANCE = new
> > > > CassandraFilterRule();
> > > >
> > > >     private CassandraFilterRule() {
> > > >       super(operand(LogicalFilter.class, operand(CassandraTableScan.class,
> > > > none())),
> > > >           "CassandraFilterRule");
> > > >     }
> > > >   }
> > > >
> > > > After this change:
> > > >
> > > >   private static class CassandraFilterRule
> > > >       extends RelOptNewRule<CassandraFilterRule.Config> {
> > > >     private static final CassandraFilterRule INSTANCE =
> > > >         Config.EMPTY
> > > >             .withOperandSupplier(b0 ->
> > > >                 b0.operand(LogicalFilter.class)
> > > >                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
> > > >                         .noInputs()))
> > > >             .as(Config.class)
> > > >             .toRule();
> > > >
> > > >     /** Creates a CassandraFilterRule. */
> > > >     protected CassandraFilterRule(Config config) {
> > > >       super(config);
> > > >     }
> > > >
> > > >     /** Rule configuration. */
> > > >     public interface Config extends RelOptNewRule.Config {
> > > >       @Override default CassandraFilterRule toRule() {
> > > >         return new CassandraFilterRule(this);
> > > >       }
> > > >     }
> > > >   }
> > > >
> > > >
> > > > Intuitively, if we want to check what does the rule generally match or how
> > > > it is defined, we just check the constructor. Now we checkout the
> > > > constructor, only config is there, go to Config, there is still nothing
> > > > interesting, we have to go to the INSTANCE. What is the difference with
> > > > just using operand and optionConfig as the rule constructor?
> > > >
> > > >    protected CassandraFilterRule(RelOptRuleOperand operand, Config config)
> > > > {
> > > >      super(operand, config);
> > > >    }
> > > >
> > > > Or even simply replace Config with int, with each bit represent an option,
> > > > if all of them are boolean options.
> > > >
> > > > Nothing is more flexible than just using RelOptRuleOperand as the
> > > > parameter, just like the base class RelOptRule does. But do we want it?
> > > >
> > > > At the same time, with the new approach, it is now legit to create the
> > > > following instance:
> > > >
> > > >   private static final CassandraFilterRule INSTANCE2 =
> > > >         Config.EMPTY
> > > >             .withOperandSupplier(b0 ->
> > > >                 b0.operand(LogicalProject.class)  // Even the is intended
> > > > to match Filter, but it compiles
> > > >                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
> > > >                         .noInputs()))
> > > >             .as(Config.class)
> > > >             .toRule();
> > > >
> > > > Before we continue to the discussion and code review, we need to go back
> > > > to the original problem, is it a real problem that is facing us? Is there
> > > > any real demand or just artificial demand? How about we conduct a Calcite
> > > > user survey to see how Calcite devs and users think? Then let's see how to
> > > > move forward.
> > > >
> > > > [+1] Hell yeah, the new approach is awesome, let's go with it.
> > > > [+0] Meh, I am OK with current approach, I don't see any burden or problem
> > > > with it.
> > > > [-1] Hell no, current approach is bad, the new one is even worse.
> > > >
> > > >
> > > > Thanks,
> > > > Haisheng
> > > >
> > > >
> > > > On 2020/06/12 23:51:56, Julian Hyde <jh...@apache.org> wrote:
> > > > > There is now a PR: https://github.com/apache/calcite/pull/2024. Can
> > > > > people please review?
> > > > >
> > > > > Here's the TL;DR:
> > > > >
> > > > > Previously, it was not easy to customize, re-use or extend planner
> > > > > rules. If you wished to customize a rule (i.e. create a new instance
> > > > > of a rule with different properties) you would have to call the rule's
> > > > > constructor. Frequently the required constructor did not exist, so we
> > > > > would have to add a new constructor and deprecate the old one.
> > > > >
> > > > > After this change, you start off from an instance of the rule, modify
> > > > > its configuration, and call toRule() on the configuration. (Rule
> > > > > constructors are now private, because only the configuration ever
> > > > > calls them.)
> > > > >
> > > > > A good illustration of this is DruidRules, which used to contain many
> > > > > sub-classes. Those sub-classes are no longer needed. Old code:
> > > > >
> > > > >   public static final DruidSortProjectTransposeRule
> > > > SORT_PROJECT_TRANSPOSE =
> > > > >       new DruidSortProjectTransposeRule(RelFactories.LOGICAL_BUILDER);
> > > > >
> > > > >     public static class DruidSortProjectTransposeRule
> > > > >         extends SortProjectTransposeRule {
> > > > >       public DruidSortProjectTransposeRule(RelBuilderFactory
> > > > > relBuilderFactory) {
> > > > >         super(
> > > > >             operand(Sort.class,
> > > > >                 operand(Project.class, operand(DruidQuery.class,
> > > > none()))),
> > > > >             relBuilderFactory, null);
> > > > >       }
> > > > >     }
> > > > >
> > > > > New code:
> > > > >
> > > > >   public static final SortProjectTransposeRule SORT_PROJECT_TRANSPOSE =
> > > > >       SortProjectTransposeRule.INSTANCE.config
> > > > >           .withOperandFor(Sort.class, Project.class, DruidQuery.class)
> > > > >           .toRule();
> > > > >
> > > > > The change maintains backwards compatibility to a large degree. In a
> > > > > few places, I had to change rule instances from type RelOptRule to
> > > > > Supplier<RelOptRule>, to avoid deadlocks during class loading. For
> > > > > instance, instead of writing FilterJoinRule.FILTER_ON_JOIN you must
> > > > > now write FilterJoinRule.FILTER_ON_JOIN.get().
> > > > >
> > > > > Julian
> > > > >
> > > > >
> > > > > On Thu, Apr 16, 2020 at 12:08 PM Julian Hyde <jh...@apache.org> wrote:
> > > > > >
> > > > > > I have now pushed a dev branch with a prototype. Please see
> > > > > > https://issues.apache.org/jira/browse/CALCITE-3923 for details.
> > > > > >
> > > > > > Having built the prototype, I believe that this change is beneficial
> > > > > > and we should do it. But I would like to get to consensus on the
> > > > > > design before we pull the trigger.
> > > > > >
> > > > > > Julian
> > > > > >
> > > > > > On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <jh...@apache.org> wrote:
> > > > > > >
> > > > > > > Haisheng,
> > > > > > >
> > > > > > > I hear you. I agree that major changes to rules will require new rule
> > > > > > > classes (not merely sub-classes). People should copy-paste, refactor,
> > > > > > > and all that good stuff. But I think there are a lot of cases where
> > > > we
> > > > > > > need to make minor changes to rules (there are many of these in the
> > > > > > > code base already), and this change will help.
> > > > > > >
> > > > > > > I have logged https://issues.apache.org/jira/browse/CALCITE-3923 and
> > > > > > > am going to start working on a prototype. When we have a prototype we
> > > > > > > will be able to assess how big an impact the API change will have.
> > > > > > > (E.g. whether it will be a breaking change.)
> > > > > > >
> > > > > > > Julian
> > > > > > >
> > > > > > > On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <
> > > > h.yuan@alibaba-inc.com> wrote:
> > > > > > > >
> > > > > > > > I don't think it is worth the refactoring. People who want to
> > > > customize the rule, in most cases, won't be satisfied by a different
> > > > parameter, they most likely still need to rewrite (copy & paste) the rule
> > > > with some slightly their own logic. For many Calcite users, the rule is not
> > > > reusable even with flexible configurations.
> > > > > > > >
> > > > > > > > - Haisheng
> > > > > > > >
> > > > > > > > ------------------------------------------------------------------
> > > > > > > > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > > > > > > > 日 期:2020年03月14日 22:54:04
> > > > > > > > 收件人:<de...@calcite.apache.org>
> > > > > > > > 主 题:Re: [DISCUSS] Refactor how planner rules are parameterized
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > Apologies for the late reply but I just realised that I had
> > > > written the
> > > > > > > > mail and never pressed the send button.
> > > > > > > >
> > > > > > > > I think it is a nice idea and certainly a problem worth
> > > > addressing. If I
> > > > > > > > understood well you're thinking something like the current
> > > > constructor of
> > > > > > > > the RelBuilder [1] that accepts a Context parameter. Indeed it
> > > > seems that
> > > > > > > > with this change even rules that are not designed to be configured
> > > > can be
> > > > > > > > changed much more gracefully (without adding new constructors and
> > > > breaking
> > > > > > > > changes).
> > > > > > > >
> > > > > > > > On the other hand, some of the advantages that you mention can
> > > > also be
> > > > > > > > turned into disadvantages. For instance, copying a rule without
> > > > knowing the
> > > > > > > > values of the other parameters is a bit risky and might be harder
> > > > to reason
> > > > > > > > about its correctness. Moreover, private constructors, final
> > > > classes, etc.,
> > > > > > > > are primarily used for encapsulation purposes so allowing the
> > > > state of the
> > > > > > > > rule escape somehow breaks the original design of the rule.
> > > > > > > >
> > > > > > > > Another problem with respect to rules is cross convention matching
> > > > and
> > > > > > > > transformations [2]. Many rules should not fire for operands that
> > > > are in
> > > > > > > > different conventions; a typical example that comes in my mind is
> > > > > > > > FilterProjectTransposeRule [3]. In the same spirit most rules
> > > > should not
> > > > > > > > generate mixed convention transformations. Although a different
> > > > problem, I
> > > > > > > > am mentioning it here since it could affect the design of the new
> > > > API.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stamatis
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> > > > > > > >
> > > > > > > > [2]
> > > > > > > >
> > > > https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > > > > > > > [3]
> > > > > > > >
> > > > https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> > > > > > > >
> > > > > > > > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <mm...@apache.org>
> > > > wrote:
> > > > > > > >
> > > > > > > > > This sounds reasonable to me. It also sounds like we could make
> > > > this
> > > > > > > > > backwards compatible by retaining (but deprecating) the existing
> > > > > > > > > constructors and factory methods that will no longer be needed.
> > > > > > > > > --
> > > > > > > > > Michael Mior
> > > > > > > > > mmior@apache.org
> > > > > > > > >
> > > > > > > > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org> a
> > > > écrit :
> > > > > > > > > >
> > > > > > > > > > I have an idea for a refactoring to RelOptRule. I haven’t
> > > > fully thought
> > > > > > > > > it through, but I’m going to sketch it out here to see whether
> > > > folks agree
> > > > > > > > > about the problems/solutions.
> > > > > > > > > >
> > > > > > > > > > It will be a breaking change (in the sense that people will
> > > > have to
> > > > > > > > > change their code in order to get it to compile) but relatively
> > > > safe (in
> > > > > > > > > that once the code compiles, it will have the same behavior as
> > > > before).
> > > > > > > > > Also it will give Calcite developers and users a lot more
> > > > flexibility going
> > > > > > > > > forward.
> > > > > > > > > >
> > > > > > > > > > The problem I see is that people often want different variants
> > > > of
> > > > > > > > > planner rules. An example is FilterJoinRule, which has a
> > > > 'boolean smart’
> > > > > > > > > parameter, a predicate (which returns whether to pull up filter
> > > > > > > > > conditions), operands (which determine the precise sub-classes
> > > > of RelNode
> > > > > > > > > that the rule should match) and a relBuilderFactory (which
> > > > controls the
> > > > > > > > > type of RelNode created by this rule).
> > > > > > > > > >
> > > > > > > > > > Suppose you have an instance of FilterJoinRule and you want to
> > > > change
> > > > > > > > > ‘smart’ from true to false. The ‘smart’ parameter is immutable
> > > > (good!) but
> > > > > > > > > you can’t easily create a clone of the rule because you don’t
> > > > know the
> > > > > > > > > values of the other parameters. Your instance might even be
> > > > (unbeknownst to
> > > > > > > > > you) a sub-class with extra parameters and a private constructor.
> > > > > > > > > >
> > > > > > > > > > So, my proposal is to put all of the config information of a
> > > > RelOptRule
> > > > > > > > > into a single ‘config’ parameter that contains all relevant
> > > > properties.
> > > > > > > > > Each sub-class of RelOptRule would have one constructor with
> > > > just a
> > > > > > > > > ‘config’ parameter. Each config knows which sub-class of
> > > > RelOptRule to
> > > > > > > > > create. Therefore it is easy to copy a config, change one or more
> > > > > > > > > properties, and create a new rule instance.
> > > > > > > > > >
> > > > > > > > > > Adding a property to a rule’s config does not require us to
> > > > add or
> > > > > > > > > deprecate any constructors.
> > > > > > > > > >
> > > > > > > > > > The operands are part of the config, so if you have a rule
> > > > that matches
> > > > > > > > > a EnumerableFilter on an EnumerableJoin and you want to make it
> > > > match an
> > > > > > > > > EnumerableFilter on an EnumerableNestedLoopJoin, you can easily
> > > > create one
> > > > > > > > > with one changed operand.
> > > > > > > > > >
> > > > > > > > > > The config is immutable and self-describing, so we can use it
> > > > to
> > > > > > > > > automatically generate a unique description for each rule
> > > > instance.
> > > > > > > > > >
> > > > > > > > > > Julian
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > >
> > > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > > <
> > > > > > > > >
> > > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > >
> > > >
> > >
> 

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Julian Hyde <jh...@apache.org>.
"I prefer the KISS principle" is a bit unfair. What I'm advocating is
also simple. But maybe from a different perspective.

I am looking at it from the perspective of people who use rules, and
compose them into rule sets to perform particular optimizations. I
believe this is the most important perspective to focus on. These
people are the key audience to serve. The corpus of re-usable,
composable rules may be Calcite's most important contribution.

To these people, a rule is not a class but an object. It has a
particular signature (in terms of the pattern of RelNodes that it
matches), maybe one or two configuration parameters (e.g.
FilterJoinRule.smart controls whether to automatically strengthen a
LEFT join to INNER), and it has code to generate a new RelNode when
matched.

What do I mean by 'object'? Think of how you use regular expressions
in Java. You call java.util.regex.Pattern.compile("a[bc]*") and it
gives you a Pattern object. You do not know what fields are inside
that Pattern, you do not care whether the object you receive is a
Pattern or some sub-class of Pattern, but you know there are one or
two methods such as 'matcher(CharSequence)' that you can call.

Our current API treats planner rules as classes. If you want to
customize a property of a rule (say make a FilterJoinRule with
smart=false, or change its RelBuilder, or make it match a
CassandraFilter rather than a LogicalFilter) then you have to make a
new rule instance by calling the constructor.

When you call the constructor, you have to supply ALL of the
properties, including operands, and including properties that you
don't know or care about. If someone in three months adds a new
property, the signature of the constructor will change, and you will
have to go back and change your code. This is broken.

Haisheng asked for evidence that the current system is broken.
https://issues.apache.org/jira/browse/CALCITE-3825 ("Split
AbstractMaterializedViewRule into multiple classes") is one example.
In my company, that change literally broke our code because we had
changed properties of some rules.

Another example is https://issues.apache.org/jira/browse/CALCITE-3975
("Add options to ProjectFilterTransposeRule to push down project and
filter expressions whole, not just field references"). With that
change, I broke my own code. I added two new arguments, 'boolean
wholeProject, boolean wholeFilter' to ProjectFilterTransposeRule's
constructor, deprecated the previous constructor, and created quite a
few conflicts in another dev branch that I was working on.

These kinds of problems are not uncommon. Because it is difficult to
reconfigure rules, or evolve them by adding extra arguments, people
are more likely to copy-paste rule code into their own projects, and
less like to contribute back.

In most rules, operands are created evaluating a complex expression -
made up of calls to static methods such as RelOptRule.operand - in a
constructor as an argument to the base class constructor. This is a
code smell. Such code is hard to move elsewhere.

We treat operands as immutable, but actually they are mutable. Every
operand contains mutable fields 'solveOrder', 'ordinalInParent' and
'ordinalInRule' that are assigned in the RelOptRule constructor. We
don't prevent you from passing the same operand to two different rule
instances, but if you did, bad things would happen. Actually I think
people would love to be able to say 'create a rule instance the same
as this, but replacing LogicalFilter with CassandraFilter' but we
don't make it easy, and copying operands from one rule to another is
certainly the wrong way to achieve it.

Given all of these problems, the solution was to convert rules into
data objects. Those data objects are the new Config interface and its
sub-interfaces. (I used an interface because the ImmutableBeans
framework makes it easy to create immutable data objects without
generating code or writing a lot of boilerplate code by hand.)

These Config interfaces mean one extra class per rule, and about 5
extra lines of source code. If you measure simplicity in lines of
code, then I suppose they make things a little less simple.

The new way of building operands, using a builder that makes
call-backs for each nested operand, and has a method for each
attribute of an operand (e.g. predicate()) is also 'less simple' if
you count lines of code. But it is more elegant in that it can
(potentially) produce genuinely immutable operands, is strongly typed,
and offers helpful code-completion when you use it in an IDE.

I hope you now understand the problems I am trying to solve, and why
this new approach is better. That said, this change isn't perfect. We
can evolve this change to make the code more concise, or easier to
understand.

Let's discuss some possible improvements.

One possible improvement would be to stop using rule instances
(RelOptRule) and instead use rule config instances
(RelOptRule.Config). The planner could easily call Config.toRule()
when building a program. The sub-classes of RelOptRule would be
largely invisible (private sub-classes of the Config classes), and
that is appropriate, because all they provide is the implementation of
the onMatch() method. There would be a one-time impact because a lot
of types would change from RelOptRule to RelOptRule.Config.

Another improvement would be to move around the rule constants.
FilterCorrelateRule.INSTANCE and AggregateFilterTransposeRule.INSTANCE
would become the fields FILTER_CORRELATE and
AGGREGATE_FILTER_TRANSPOSE in a new class, LogicalRules. People would
no longer need to reference the names of rule classes in their code,
just the instances. And of course they can now easily customize those
instances.

I hope I have convinced you that rules need to become 'data objects'
with a small amount of behavior, and that people need to be able to
customize them without calling their constructor or even knowing their
precise class.

I would love to hear suggestions for how we can make the transition to
this new model as smooth as possible.

Julian

On Sun, Jun 14, 2020 at 9:15 PM Haisheng Yuan <hy...@apache.org> wrote:
>
> Druid adapter rules are not used by Druid. As far as I know, only Hive uses these rules, I even think they should be moved to Hive project. :) If any other projects other than Hive are using them, please let us know. Basically, these Druid rules are not generally applicable. The PR changed many built-in rules to specifically accommodate Druid adapter rules, especially with third operand that can match any operator, even EnumerableTableScan is legit.
>
> Most converter rules are not designed to be flexible, no one would extend them. RelOptRule#operand() methods are deprecated, which implies someday, they will be removed and migrated to the new Config. But most down stream projects don't need to worry about the their rules' extensibility and compatibility, because they can modify their own rules freely, anytime.
>
> At the end of the day, we may find that only a small fraction of transformation rules need refactoring, e.g. ProjectFilterTransposeRule. Note that even FilterProjectTransposeRule doesn't need refactoring, we can safely remove copyFilter and copyProject, it is safe to always copy them, do we really want to match physical operators and generate logical alternatives?
>
> Last but the most important, rule operands, IMO, should be out of Config's reach. How frequent do we need to change rule operands or its matching class? IMO, operands of rule constructor should remain unchanged, RelFactory and Rule description can also remain the same, or adapted by Config silently without changing the rule itself, if there are no other additional parameters. Other than that, everything can be put into Config. Therefore I don't think RelOptNewRule is needed, because RelOptRule should be able to integrate with Config seamlessly, without breaking anything.
>
> All in all, I prefer the KISS principle, keep it simple, stupid.
>
> So my opinion is +0.5.
>
> Thanks,
> Haisheng
>
> On 2020/06/14 23:36:21, Stamatis Zampetakis <za...@gmail.com> wrote:
> > Hello,
> >
> > Thanks for putting this together Julian.
> >
> > I went quickly over the PR just to grasp better the idea and here are some
> > first thoughts.
> >
> > I really like the fact that rules creation is now much more uniform than
> > before.
> > I also like the fact that in some cases subclassing can be avoided thanks
> > to the flexible configuration. As a fan of the Liskov
> > substitution principle I find it positive to be able to avoid classes such
> > as DruidProjectFilterTransposeRule although to be honest I don't know why
> > these Druid rules impose those additional restrictions.
> >
> > On the other hand, I also feel that the new approach is a bit harder to
> > follow than the previous one. The fact that we have more extension points
> > gives more flexibility but at the same time complicates the implementation
> > a bit. I guess regular committers will not have much trouble to adapt to
> > the changes but for newcomers there may be more questions. For instance:
> > What do we do when we want to customize the behavior of an existing rule?
> > * Use an existing config with different parameters.
> > * Extend the config.
> > * Extend the rule.
> > * Extend the config and the rule.
> > * Create a new rule (if yes under which interface RelOptRule or
> > RelOptNewRule).
> >
> > As Haisheng mentioned, the fact that every rule can be configured with any
> > RelOptRuleOperand is also a point possibly deserving more discussion.
> > Ideally, the API should be able to guide developers to pass the correct
> > configuration parameters and not fail at runtime.
> >
> > Overall, I have mixed feelings about the proposed refactoring (definitely
> > not something blocking), I guess cause I haven't seen as many use-cases as
> > Julian.
> > I'm curious to see what others think about the changes. It's a pity to take
> > a decision based on the feedback of only two/three people.
> >
> > Best,
> > Stamatis
> >
> >
> >
> >
> > On Sun, Jun 14, 2020 at 9:36 PM Haisheng Yuan <hy...@apache.org> wrote:
> >
> > > Hi Julian,
> > >
> > > Thanks for working on this.
> > >
> > > We haven't reached a consensus yet.
> > >
> > > Frankly speaking, I agree with what Stamatis said earlier. Flexibility
> > > doesn't come with no cost. Admittedly, with this patch, any rule
> > > constructor refactoring won't need to deprecate old constructors and break
> > > backward compatibility, however, it also makes the rule definition much
> > > more verbose, less readable and understandable. IMHO, it does more harm
> > > than good.
> > >
> > > Let's see how CassandraFilterRule becomes before and after the change.
> > >
> > > Before this change:
> > >
> > >   private static class CassandraFilterRule extends RelOptRule {
> > >     private static final CassandraFilterRule INSTANCE = new
> > > CassandraFilterRule();
> > >
> > >     private CassandraFilterRule() {
> > >       super(operand(LogicalFilter.class, operand(CassandraTableScan.class,
> > > none())),
> > >           "CassandraFilterRule");
> > >     }
> > >   }
> > >
> > > After this change:
> > >
> > >   private static class CassandraFilterRule
> > >       extends RelOptNewRule<CassandraFilterRule.Config> {
> > >     private static final CassandraFilterRule INSTANCE =
> > >         Config.EMPTY
> > >             .withOperandSupplier(b0 ->
> > >                 b0.operand(LogicalFilter.class)
> > >                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
> > >                         .noInputs()))
> > >             .as(Config.class)
> > >             .toRule();
> > >
> > >     /** Creates a CassandraFilterRule. */
> > >     protected CassandraFilterRule(Config config) {
> > >       super(config);
> > >     }
> > >
> > >     /** Rule configuration. */
> > >     public interface Config extends RelOptNewRule.Config {
> > >       @Override default CassandraFilterRule toRule() {
> > >         return new CassandraFilterRule(this);
> > >       }
> > >     }
> > >   }
> > >
> > >
> > > Intuitively, if we want to check what does the rule generally match or how
> > > it is defined, we just check the constructor. Now we checkout the
> > > constructor, only config is there, go to Config, there is still nothing
> > > interesting, we have to go to the INSTANCE. What is the difference with
> > > just using operand and optionConfig as the rule constructor?
> > >
> > >    protected CassandraFilterRule(RelOptRuleOperand operand, Config config)
> > > {
> > >      super(operand, config);
> > >    }
> > >
> > > Or even simply replace Config with int, with each bit represent an option,
> > > if all of them are boolean options.
> > >
> > > Nothing is more flexible than just using RelOptRuleOperand as the
> > > parameter, just like the base class RelOptRule does. But do we want it?
> > >
> > > At the same time, with the new approach, it is now legit to create the
> > > following instance:
> > >
> > >   private static final CassandraFilterRule INSTANCE2 =
> > >         Config.EMPTY
> > >             .withOperandSupplier(b0 ->
> > >                 b0.operand(LogicalProject.class)  // Even the is intended
> > > to match Filter, but it compiles
> > >                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
> > >                         .noInputs()))
> > >             .as(Config.class)
> > >             .toRule();
> > >
> > > Before we continue to the discussion and code review, we need to go back
> > > to the original problem, is it a real problem that is facing us? Is there
> > > any real demand or just artificial demand? How about we conduct a Calcite
> > > user survey to see how Calcite devs and users think? Then let's see how to
> > > move forward.
> > >
> > > [+1] Hell yeah, the new approach is awesome, let's go with it.
> > > [+0] Meh, I am OK with current approach, I don't see any burden or problem
> > > with it.
> > > [-1] Hell no, current approach is bad, the new one is even worse.
> > >
> > >
> > > Thanks,
> > > Haisheng
> > >
> > >
> > > On 2020/06/12 23:51:56, Julian Hyde <jh...@apache.org> wrote:
> > > > There is now a PR: https://github.com/apache/calcite/pull/2024. Can
> > > > people please review?
> > > >
> > > > Here's the TL;DR:
> > > >
> > > > Previously, it was not easy to customize, re-use or extend planner
> > > > rules. If you wished to customize a rule (i.e. create a new instance
> > > > of a rule with different properties) you would have to call the rule's
> > > > constructor. Frequently the required constructor did not exist, so we
> > > > would have to add a new constructor and deprecate the old one.
> > > >
> > > > After this change, you start off from an instance of the rule, modify
> > > > its configuration, and call toRule() on the configuration. (Rule
> > > > constructors are now private, because only the configuration ever
> > > > calls them.)
> > > >
> > > > A good illustration of this is DruidRules, which used to contain many
> > > > sub-classes. Those sub-classes are no longer needed. Old code:
> > > >
> > > >   public static final DruidSortProjectTransposeRule
> > > SORT_PROJECT_TRANSPOSE =
> > > >       new DruidSortProjectTransposeRule(RelFactories.LOGICAL_BUILDER);
> > > >
> > > >     public static class DruidSortProjectTransposeRule
> > > >         extends SortProjectTransposeRule {
> > > >       public DruidSortProjectTransposeRule(RelBuilderFactory
> > > > relBuilderFactory) {
> > > >         super(
> > > >             operand(Sort.class,
> > > >                 operand(Project.class, operand(DruidQuery.class,
> > > none()))),
> > > >             relBuilderFactory, null);
> > > >       }
> > > >     }
> > > >
> > > > New code:
> > > >
> > > >   public static final SortProjectTransposeRule SORT_PROJECT_TRANSPOSE =
> > > >       SortProjectTransposeRule.INSTANCE.config
> > > >           .withOperandFor(Sort.class, Project.class, DruidQuery.class)
> > > >           .toRule();
> > > >
> > > > The change maintains backwards compatibility to a large degree. In a
> > > > few places, I had to change rule instances from type RelOptRule to
> > > > Supplier<RelOptRule>, to avoid deadlocks during class loading. For
> > > > instance, instead of writing FilterJoinRule.FILTER_ON_JOIN you must
> > > > now write FilterJoinRule.FILTER_ON_JOIN.get().
> > > >
> > > > Julian
> > > >
> > > >
> > > > On Thu, Apr 16, 2020 at 12:08 PM Julian Hyde <jh...@apache.org> wrote:
> > > > >
> > > > > I have now pushed a dev branch with a prototype. Please see
> > > > > https://issues.apache.org/jira/browse/CALCITE-3923 for details.
> > > > >
> > > > > Having built the prototype, I believe that this change is beneficial
> > > > > and we should do it. But I would like to get to consensus on the
> > > > > design before we pull the trigger.
> > > > >
> > > > > Julian
> > > > >
> > > > > On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <jh...@apache.org> wrote:
> > > > > >
> > > > > > Haisheng,
> > > > > >
> > > > > > I hear you. I agree that major changes to rules will require new rule
> > > > > > classes (not merely sub-classes). People should copy-paste, refactor,
> > > > > > and all that good stuff. But I think there are a lot of cases where
> > > we
> > > > > > need to make minor changes to rules (there are many of these in the
> > > > > > code base already), and this change will help.
> > > > > >
> > > > > > I have logged https://issues.apache.org/jira/browse/CALCITE-3923 and
> > > > > > am going to start working on a prototype. When we have a prototype we
> > > > > > will be able to assess how big an impact the API change will have.
> > > > > > (E.g. whether it will be a breaking change.)
> > > > > >
> > > > > > Julian
> > > > > >
> > > > > > On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <
> > > h.yuan@alibaba-inc.com> wrote:
> > > > > > >
> > > > > > > I don't think it is worth the refactoring. People who want to
> > > customize the rule, in most cases, won't be satisfied by a different
> > > parameter, they most likely still need to rewrite (copy & paste) the rule
> > > with some slightly their own logic. For many Calcite users, the rule is not
> > > reusable even with flexible configurations.
> > > > > > >
> > > > > > > - Haisheng
> > > > > > >
> > > > > > > ------------------------------------------------------------------
> > > > > > > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > > > > > > 日 期:2020年03月14日 22:54:04
> > > > > > > 收件人:<de...@calcite.apache.org>
> > > > > > > 主 题:Re: [DISCUSS] Refactor how planner rules are parameterized
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > Apologies for the late reply but I just realised that I had
> > > written the
> > > > > > > mail and never pressed the send button.
> > > > > > >
> > > > > > > I think it is a nice idea and certainly a problem worth
> > > addressing. If I
> > > > > > > understood well you're thinking something like the current
> > > constructor of
> > > > > > > the RelBuilder [1] that accepts a Context parameter. Indeed it
> > > seems that
> > > > > > > with this change even rules that are not designed to be configured
> > > can be
> > > > > > > changed much more gracefully (without adding new constructors and
> > > breaking
> > > > > > > changes).
> > > > > > >
> > > > > > > On the other hand, some of the advantages that you mention can
> > > also be
> > > > > > > turned into disadvantages. For instance, copying a rule without
> > > knowing the
> > > > > > > values of the other parameters is a bit risky and might be harder
> > > to reason
> > > > > > > about its correctness. Moreover, private constructors, final
> > > classes, etc.,
> > > > > > > are primarily used for encapsulation purposes so allowing the
> > > state of the
> > > > > > > rule escape somehow breaks the original design of the rule.
> > > > > > >
> > > > > > > Another problem with respect to rules is cross convention matching
> > > and
> > > > > > > transformations [2]. Many rules should not fire for operands that
> > > are in
> > > > > > > different conventions; a typical example that comes in my mind is
> > > > > > > FilterProjectTransposeRule [3]. In the same spirit most rules
> > > should not
> > > > > > > generate mixed convention transformations. Although a different
> > > problem, I
> > > > > > > am mentioning it here since it could affect the design of the new
> > > API.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stamatis
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> > > > > > >
> > > > > > > [2]
> > > > > > >
> > > https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > > > > > > [3]
> > > > > > >
> > > https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> > > > > > >
> > > > > > > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <mm...@apache.org>
> > > wrote:
> > > > > > >
> > > > > > > > This sounds reasonable to me. It also sounds like we could make
> > > this
> > > > > > > > backwards compatible by retaining (but deprecating) the existing
> > > > > > > > constructors and factory methods that will no longer be needed.
> > > > > > > > --
> > > > > > > > Michael Mior
> > > > > > > > mmior@apache.org
> > > > > > > >
> > > > > > > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org> a
> > > écrit :
> > > > > > > > >
> > > > > > > > > I have an idea for a refactoring to RelOptRule. I haven’t
> > > fully thought
> > > > > > > > it through, but I’m going to sketch it out here to see whether
> > > folks agree
> > > > > > > > about the problems/solutions.
> > > > > > > > >
> > > > > > > > > It will be a breaking change (in the sense that people will
> > > have to
> > > > > > > > change their code in order to get it to compile) but relatively
> > > safe (in
> > > > > > > > that once the code compiles, it will have the same behavior as
> > > before).
> > > > > > > > Also it will give Calcite developers and users a lot more
> > > flexibility going
> > > > > > > > forward.
> > > > > > > > >
> > > > > > > > > The problem I see is that people often want different variants
> > > of
> > > > > > > > planner rules. An example is FilterJoinRule, which has a
> > > 'boolean smart’
> > > > > > > > parameter, a predicate (which returns whether to pull up filter
> > > > > > > > conditions), operands (which determine the precise sub-classes
> > > of RelNode
> > > > > > > > that the rule should match) and a relBuilderFactory (which
> > > controls the
> > > > > > > > type of RelNode created by this rule).
> > > > > > > > >
> > > > > > > > > Suppose you have an instance of FilterJoinRule and you want to
> > > change
> > > > > > > > ‘smart’ from true to false. The ‘smart’ parameter is immutable
> > > (good!) but
> > > > > > > > you can’t easily create a clone of the rule because you don’t
> > > know the
> > > > > > > > values of the other parameters. Your instance might even be
> > > (unbeknownst to
> > > > > > > > you) a sub-class with extra parameters and a private constructor.
> > > > > > > > >
> > > > > > > > > So, my proposal is to put all of the config information of a
> > > RelOptRule
> > > > > > > > into a single ‘config’ parameter that contains all relevant
> > > properties.
> > > > > > > > Each sub-class of RelOptRule would have one constructor with
> > > just a
> > > > > > > > ‘config’ parameter. Each config knows which sub-class of
> > > RelOptRule to
> > > > > > > > create. Therefore it is easy to copy a config, change one or more
> > > > > > > > properties, and create a new rule instance.
> > > > > > > > >
> > > > > > > > > Adding a property to a rule’s config does not require us to
> > > add or
> > > > > > > > deprecate any constructors.
> > > > > > > > >
> > > > > > > > > The operands are part of the config, so if you have a rule
> > > that matches
> > > > > > > > a EnumerableFilter on an EnumerableJoin and you want to make it
> > > match an
> > > > > > > > EnumerableFilter on an EnumerableNestedLoopJoin, you can easily
> > > create one
> > > > > > > > with one changed operand.
> > > > > > > > >
> > > > > > > > > The config is immutable and self-describing, so we can use it
> > > to
> > > > > > > > automatically generate a unique description for each rule
> > > instance.
> > > > > > > > >
> > > > > > > > > Julian
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > >
> > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > <
> > > > > > > >
> > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > >
> > >
> >

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Haisheng Yuan <hy...@apache.org>.
Druid adapter rules are not used by Druid. As far as I know, only Hive uses these rules, I even think they should be moved to Hive project. :) If any other projects other than Hive are using them, please let us know. Basically, these Druid rules are not generally applicable. The PR changed many built-in rules to specifically accommodate Druid adapter rules, especially with third operand that can match any operator, even EnumerableTableScan is legit.

Most converter rules are not designed to be flexible, no one would extend them. RelOptRule#operand() methods are deprecated, which implies someday, they will be removed and migrated to the new Config. But most down stream projects don't need to worry about the their rules' extensibility and compatibility, because they can modify their own rules freely, anytime.

At the end of the day, we may find that only a small fraction of transformation rules need refactoring, e.g. ProjectFilterTransposeRule. Note that even FilterProjectTransposeRule doesn't need refactoring, we can safely remove copyFilter and copyProject, it is safe to always copy them, do we really want to match physical operators and generate logical alternatives?

Last but the most important, rule operands, IMO, should be out of Config's reach. How frequent do we need to change rule operands or its matching class? IMO, operands of rule constructor should remain unchanged, RelFactory and Rule description can also remain the same, or adapted by Config silently without changing the rule itself, if there are no other additional parameters. Other than that, everything can be put into Config. Therefore I don't think RelOptNewRule is needed, because RelOptRule should be able to integrate with Config seamlessly, without breaking anything.

All in all, I prefer the KISS principle, keep it simple, stupid.

So my opinion is +0.5.

Thanks,
Haisheng

On 2020/06/14 23:36:21, Stamatis Zampetakis <za...@gmail.com> wrote: 
> Hello,
> 
> Thanks for putting this together Julian.
> 
> I went quickly over the PR just to grasp better the idea and here are some
> first thoughts.
> 
> I really like the fact that rules creation is now much more uniform than
> before.
> I also like the fact that in some cases subclassing can be avoided thanks
> to the flexible configuration. As a fan of the Liskov
> substitution principle I find it positive to be able to avoid classes such
> as DruidProjectFilterTransposeRule although to be honest I don't know why
> these Druid rules impose those additional restrictions.
> 
> On the other hand, I also feel that the new approach is a bit harder to
> follow than the previous one. The fact that we have more extension points
> gives more flexibility but at the same time complicates the implementation
> a bit. I guess regular committers will not have much trouble to adapt to
> the changes but for newcomers there may be more questions. For instance:
> What do we do when we want to customize the behavior of an existing rule?
> * Use an existing config with different parameters.
> * Extend the config.
> * Extend the rule.
> * Extend the config and the rule.
> * Create a new rule (if yes under which interface RelOptRule or
> RelOptNewRule).
> 
> As Haisheng mentioned, the fact that every rule can be configured with any
> RelOptRuleOperand is also a point possibly deserving more discussion.
> Ideally, the API should be able to guide developers to pass the correct
> configuration parameters and not fail at runtime.
> 
> Overall, I have mixed feelings about the proposed refactoring (definitely
> not something blocking), I guess cause I haven't seen as many use-cases as
> Julian.
> I'm curious to see what others think about the changes. It's a pity to take
> a decision based on the feedback of only two/three people.
> 
> Best,
> Stamatis
> 
> 
> 
> 
> On Sun, Jun 14, 2020 at 9:36 PM Haisheng Yuan <hy...@apache.org> wrote:
> 
> > Hi Julian,
> >
> > Thanks for working on this.
> >
> > We haven't reached a consensus yet.
> >
> > Frankly speaking, I agree with what Stamatis said earlier. Flexibility
> > doesn't come with no cost. Admittedly, with this patch, any rule
> > constructor refactoring won't need to deprecate old constructors and break
> > backward compatibility, however, it also makes the rule definition much
> > more verbose, less readable and understandable. IMHO, it does more harm
> > than good.
> >
> > Let's see how CassandraFilterRule becomes before and after the change.
> >
> > Before this change:
> >
> >   private static class CassandraFilterRule extends RelOptRule {
> >     private static final CassandraFilterRule INSTANCE = new
> > CassandraFilterRule();
> >
> >     private CassandraFilterRule() {
> >       super(operand(LogicalFilter.class, operand(CassandraTableScan.class,
> > none())),
> >           "CassandraFilterRule");
> >     }
> >   }
> >
> > After this change:
> >
> >   private static class CassandraFilterRule
> >       extends RelOptNewRule<CassandraFilterRule.Config> {
> >     private static final CassandraFilterRule INSTANCE =
> >         Config.EMPTY
> >             .withOperandSupplier(b0 ->
> >                 b0.operand(LogicalFilter.class)
> >                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
> >                         .noInputs()))
> >             .as(Config.class)
> >             .toRule();
> >
> >     /** Creates a CassandraFilterRule. */
> >     protected CassandraFilterRule(Config config) {
> >       super(config);
> >     }
> >
> >     /** Rule configuration. */
> >     public interface Config extends RelOptNewRule.Config {
> >       @Override default CassandraFilterRule toRule() {
> >         return new CassandraFilterRule(this);
> >       }
> >     }
> >   }
> >
> >
> > Intuitively, if we want to check what does the rule generally match or how
> > it is defined, we just check the constructor. Now we checkout the
> > constructor, only config is there, go to Config, there is still nothing
> > interesting, we have to go to the INSTANCE. What is the difference with
> > just using operand and optionConfig as the rule constructor?
> >
> >    protected CassandraFilterRule(RelOptRuleOperand operand, Config config)
> > {
> >      super(operand, config);
> >    }
> >
> > Or even simply replace Config with int, with each bit represent an option,
> > if all of them are boolean options.
> >
> > Nothing is more flexible than just using RelOptRuleOperand as the
> > parameter, just like the base class RelOptRule does. But do we want it?
> >
> > At the same time, with the new approach, it is now legit to create the
> > following instance:
> >
> >   private static final CassandraFilterRule INSTANCE2 =
> >         Config.EMPTY
> >             .withOperandSupplier(b0 ->
> >                 b0.operand(LogicalProject.class)  // Even the is intended
> > to match Filter, but it compiles
> >                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
> >                         .noInputs()))
> >             .as(Config.class)
> >             .toRule();
> >
> > Before we continue to the discussion and code review, we need to go back
> > to the original problem, is it a real problem that is facing us? Is there
> > any real demand or just artificial demand? How about we conduct a Calcite
> > user survey to see how Calcite devs and users think? Then let's see how to
> > move forward.
> >
> > [+1] Hell yeah, the new approach is awesome, let's go with it.
> > [+0] Meh, I am OK with current approach, I don't see any burden or problem
> > with it.
> > [-1] Hell no, current approach is bad, the new one is even worse.
> >
> >
> > Thanks,
> > Haisheng
> >
> >
> > On 2020/06/12 23:51:56, Julian Hyde <jh...@apache.org> wrote:
> > > There is now a PR: https://github.com/apache/calcite/pull/2024. Can
> > > people please review?
> > >
> > > Here's the TL;DR:
> > >
> > > Previously, it was not easy to customize, re-use or extend planner
> > > rules. If you wished to customize a rule (i.e. create a new instance
> > > of a rule with different properties) you would have to call the rule's
> > > constructor. Frequently the required constructor did not exist, so we
> > > would have to add a new constructor and deprecate the old one.
> > >
> > > After this change, you start off from an instance of the rule, modify
> > > its configuration, and call toRule() on the configuration. (Rule
> > > constructors are now private, because only the configuration ever
> > > calls them.)
> > >
> > > A good illustration of this is DruidRules, which used to contain many
> > > sub-classes. Those sub-classes are no longer needed. Old code:
> > >
> > >   public static final DruidSortProjectTransposeRule
> > SORT_PROJECT_TRANSPOSE =
> > >       new DruidSortProjectTransposeRule(RelFactories.LOGICAL_BUILDER);
> > >
> > >     public static class DruidSortProjectTransposeRule
> > >         extends SortProjectTransposeRule {
> > >       public DruidSortProjectTransposeRule(RelBuilderFactory
> > > relBuilderFactory) {
> > >         super(
> > >             operand(Sort.class,
> > >                 operand(Project.class, operand(DruidQuery.class,
> > none()))),
> > >             relBuilderFactory, null);
> > >       }
> > >     }
> > >
> > > New code:
> > >
> > >   public static final SortProjectTransposeRule SORT_PROJECT_TRANSPOSE =
> > >       SortProjectTransposeRule.INSTANCE.config
> > >           .withOperandFor(Sort.class, Project.class, DruidQuery.class)
> > >           .toRule();
> > >
> > > The change maintains backwards compatibility to a large degree. In a
> > > few places, I had to change rule instances from type RelOptRule to
> > > Supplier<RelOptRule>, to avoid deadlocks during class loading. For
> > > instance, instead of writing FilterJoinRule.FILTER_ON_JOIN you must
> > > now write FilterJoinRule.FILTER_ON_JOIN.get().
> > >
> > > Julian
> > >
> > >
> > > On Thu, Apr 16, 2020 at 12:08 PM Julian Hyde <jh...@apache.org> wrote:
> > > >
> > > > I have now pushed a dev branch with a prototype. Please see
> > > > https://issues.apache.org/jira/browse/CALCITE-3923 for details.
> > > >
> > > > Having built the prototype, I believe that this change is beneficial
> > > > and we should do it. But I would like to get to consensus on the
> > > > design before we pull the trigger.
> > > >
> > > > Julian
> > > >
> > > > On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <jh...@apache.org> wrote:
> > > > >
> > > > > Haisheng,
> > > > >
> > > > > I hear you. I agree that major changes to rules will require new rule
> > > > > classes (not merely sub-classes). People should copy-paste, refactor,
> > > > > and all that good stuff. But I think there are a lot of cases where
> > we
> > > > > need to make minor changes to rules (there are many of these in the
> > > > > code base already), and this change will help.
> > > > >
> > > > > I have logged https://issues.apache.org/jira/browse/CALCITE-3923 and
> > > > > am going to start working on a prototype. When we have a prototype we
> > > > > will be able to assess how big an impact the API change will have.
> > > > > (E.g. whether it will be a breaking change.)
> > > > >
> > > > > Julian
> > > > >
> > > > > On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <
> > h.yuan@alibaba-inc.com> wrote:
> > > > > >
> > > > > > I don't think it is worth the refactoring. People who want to
> > customize the rule, in most cases, won't be satisfied by a different
> > parameter, they most likely still need to rewrite (copy & paste) the rule
> > with some slightly their own logic. For many Calcite users, the rule is not
> > reusable even with flexible configurations.
> > > > > >
> > > > > > - Haisheng
> > > > > >
> > > > > > ------------------------------------------------------------------
> > > > > > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > > > > > 日 期:2020年03月14日 22:54:04
> > > > > > 收件人:<de...@calcite.apache.org>
> > > > > > 主 题:Re: [DISCUSS] Refactor how planner rules are parameterized
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > Apologies for the late reply but I just realised that I had
> > written the
> > > > > > mail and never pressed the send button.
> > > > > >
> > > > > > I think it is a nice idea and certainly a problem worth
> > addressing. If I
> > > > > > understood well you're thinking something like the current
> > constructor of
> > > > > > the RelBuilder [1] that accepts a Context parameter. Indeed it
> > seems that
> > > > > > with this change even rules that are not designed to be configured
> > can be
> > > > > > changed much more gracefully (without adding new constructors and
> > breaking
> > > > > > changes).
> > > > > >
> > > > > > On the other hand, some of the advantages that you mention can
> > also be
> > > > > > turned into disadvantages. For instance, copying a rule without
> > knowing the
> > > > > > values of the other parameters is a bit risky and might be harder
> > to reason
> > > > > > about its correctness. Moreover, private constructors, final
> > classes, etc.,
> > > > > > are primarily used for encapsulation purposes so allowing the
> > state of the
> > > > > > rule escape somehow breaks the original design of the rule.
> > > > > >
> > > > > > Another problem with respect to rules is cross convention matching
> > and
> > > > > > transformations [2]. Many rules should not fire for operands that
> > are in
> > > > > > different conventions; a typical example that comes in my mind is
> > > > > > FilterProjectTransposeRule [3]. In the same spirit most rules
> > should not
> > > > > > generate mixed convention transformations. Although a different
> > problem, I
> > > > > > am mentioning it here since it could affect the design of the new
> > API.
> > > > > >
> > > > > > Best,
> > > > > > Stamatis
> > > > > >
> > > > > > [1]
> > > > > >
> > https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> > > > > >
> > > > > > [2]
> > > > > >
> > https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > > > > > [3]
> > > > > >
> > https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> > > > > >
> > > > > > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <mm...@apache.org>
> > wrote:
> > > > > >
> > > > > > > This sounds reasonable to me. It also sounds like we could make
> > this
> > > > > > > backwards compatible by retaining (but deprecating) the existing
> > > > > > > constructors and factory methods that will no longer be needed.
> > > > > > > --
> > > > > > > Michael Mior
> > > > > > > mmior@apache.org
> > > > > > >
> > > > > > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org> a
> > écrit :
> > > > > > > >
> > > > > > > > I have an idea for a refactoring to RelOptRule. I haven’t
> > fully thought
> > > > > > > it through, but I’m going to sketch it out here to see whether
> > folks agree
> > > > > > > about the problems/solutions.
> > > > > > > >
> > > > > > > > It will be a breaking change (in the sense that people will
> > have to
> > > > > > > change their code in order to get it to compile) but relatively
> > safe (in
> > > > > > > that once the code compiles, it will have the same behavior as
> > before).
> > > > > > > Also it will give Calcite developers and users a lot more
> > flexibility going
> > > > > > > forward.
> > > > > > > >
> > > > > > > > The problem I see is that people often want different variants
> > of
> > > > > > > planner rules. An example is FilterJoinRule, which has a
> > 'boolean smart’
> > > > > > > parameter, a predicate (which returns whether to pull up filter
> > > > > > > conditions), operands (which determine the precise sub-classes
> > of RelNode
> > > > > > > that the rule should match) and a relBuilderFactory (which
> > controls the
> > > > > > > type of RelNode created by this rule).
> > > > > > > >
> > > > > > > > Suppose you have an instance of FilterJoinRule and you want to
> > change
> > > > > > > ‘smart’ from true to false. The ‘smart’ parameter is immutable
> > (good!) but
> > > > > > > you can’t easily create a clone of the rule because you don’t
> > know the
> > > > > > > values of the other parameters. Your instance might even be
> > (unbeknownst to
> > > > > > > you) a sub-class with extra parameters and a private constructor.
> > > > > > > >
> > > > > > > > So, my proposal is to put all of the config information of a
> > RelOptRule
> > > > > > > into a single ‘config’ parameter that contains all relevant
> > properties.
> > > > > > > Each sub-class of RelOptRule would have one constructor with
> > just a
> > > > > > > ‘config’ parameter. Each config knows which sub-class of
> > RelOptRule to
> > > > > > > create. Therefore it is easy to copy a config, change one or more
> > > > > > > properties, and create a new rule instance.
> > > > > > > >
> > > > > > > > Adding a property to a rule’s config does not require us to
> > add or
> > > > > > > deprecate any constructors.
> > > > > > > >
> > > > > > > > The operands are part of the config, so if you have a rule
> > that matches
> > > > > > > a EnumerableFilter on an EnumerableJoin and you want to make it
> > match an
> > > > > > > EnumerableFilter on an EnumerableNestedLoopJoin, you can easily
> > create one
> > > > > > > with one changed operand.
> > > > > > > >
> > > > > > > > The config is immutable and self-describing, so we can use it
> > to
> > > > > > > automatically generate a unique description for each rule
> > instance.
> > > > > > > >
> > > > > > > > Julian
> > > > > > > >
> > > > > > > > [1]
> > > > > > >
> > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > <
> > > > > > >
> > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > >
> >
> 

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hello,

Thanks for putting this together Julian.

I went quickly over the PR just to grasp better the idea and here are some
first thoughts.

I really like the fact that rules creation is now much more uniform than
before.
I also like the fact that in some cases subclassing can be avoided thanks
to the flexible configuration. As a fan of the Liskov
substitution principle I find it positive to be able to avoid classes such
as DruidProjectFilterTransposeRule although to be honest I don't know why
these Druid rules impose those additional restrictions.

On the other hand, I also feel that the new approach is a bit harder to
follow than the previous one. The fact that we have more extension points
gives more flexibility but at the same time complicates the implementation
a bit. I guess regular committers will not have much trouble to adapt to
the changes but for newcomers there may be more questions. For instance:
What do we do when we want to customize the behavior of an existing rule?
* Use an existing config with different parameters.
* Extend the config.
* Extend the rule.
* Extend the config and the rule.
* Create a new rule (if yes under which interface RelOptRule or
RelOptNewRule).

As Haisheng mentioned, the fact that every rule can be configured with any
RelOptRuleOperand is also a point possibly deserving more discussion.
Ideally, the API should be able to guide developers to pass the correct
configuration parameters and not fail at runtime.

Overall, I have mixed feelings about the proposed refactoring (definitely
not something blocking), I guess cause I haven't seen as many use-cases as
Julian.
I'm curious to see what others think about the changes. It's a pity to take
a decision based on the feedback of only two/three people.

Best,
Stamatis




On Sun, Jun 14, 2020 at 9:36 PM Haisheng Yuan <hy...@apache.org> wrote:

> Hi Julian,
>
> Thanks for working on this.
>
> We haven't reached a consensus yet.
>
> Frankly speaking, I agree with what Stamatis said earlier. Flexibility
> doesn't come with no cost. Admittedly, with this patch, any rule
> constructor refactoring won't need to deprecate old constructors and break
> backward compatibility, however, it also makes the rule definition much
> more verbose, less readable and understandable. IMHO, it does more harm
> than good.
>
> Let's see how CassandraFilterRule becomes before and after the change.
>
> Before this change:
>
>   private static class CassandraFilterRule extends RelOptRule {
>     private static final CassandraFilterRule INSTANCE = new
> CassandraFilterRule();
>
>     private CassandraFilterRule() {
>       super(operand(LogicalFilter.class, operand(CassandraTableScan.class,
> none())),
>           "CassandraFilterRule");
>     }
>   }
>
> After this change:
>
>   private static class CassandraFilterRule
>       extends RelOptNewRule<CassandraFilterRule.Config> {
>     private static final CassandraFilterRule INSTANCE =
>         Config.EMPTY
>             .withOperandSupplier(b0 ->
>                 b0.operand(LogicalFilter.class)
>                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
>                         .noInputs()))
>             .as(Config.class)
>             .toRule();
>
>     /** Creates a CassandraFilterRule. */
>     protected CassandraFilterRule(Config config) {
>       super(config);
>     }
>
>     /** Rule configuration. */
>     public interface Config extends RelOptNewRule.Config {
>       @Override default CassandraFilterRule toRule() {
>         return new CassandraFilterRule(this);
>       }
>     }
>   }
>
>
> Intuitively, if we want to check what does the rule generally match or how
> it is defined, we just check the constructor. Now we checkout the
> constructor, only config is there, go to Config, there is still nothing
> interesting, we have to go to the INSTANCE. What is the difference with
> just using operand and optionConfig as the rule constructor?
>
>    protected CassandraFilterRule(RelOptRuleOperand operand, Config config)
> {
>      super(operand, config);
>    }
>
> Or even simply replace Config with int, with each bit represent an option,
> if all of them are boolean options.
>
> Nothing is more flexible than just using RelOptRuleOperand as the
> parameter, just like the base class RelOptRule does. But do we want it?
>
> At the same time, with the new approach, it is now legit to create the
> following instance:
>
>   private static final CassandraFilterRule INSTANCE2 =
>         Config.EMPTY
>             .withOperandSupplier(b0 ->
>                 b0.operand(LogicalProject.class)  // Even the is intended
> to match Filter, but it compiles
>                     .oneInput(b1 -> b1.operand(CassandraTableScan.class)
>                         .noInputs()))
>             .as(Config.class)
>             .toRule();
>
> Before we continue to the discussion and code review, we need to go back
> to the original problem, is it a real problem that is facing us? Is there
> any real demand or just artificial demand? How about we conduct a Calcite
> user survey to see how Calcite devs and users think? Then let's see how to
> move forward.
>
> [+1] Hell yeah, the new approach is awesome, let's go with it.
> [+0] Meh, I am OK with current approach, I don't see any burden or problem
> with it.
> [-1] Hell no, current approach is bad, the new one is even worse.
>
>
> Thanks,
> Haisheng
>
>
> On 2020/06/12 23:51:56, Julian Hyde <jh...@apache.org> wrote:
> > There is now a PR: https://github.com/apache/calcite/pull/2024. Can
> > people please review?
> >
> > Here's the TL;DR:
> >
> > Previously, it was not easy to customize, re-use or extend planner
> > rules. If you wished to customize a rule (i.e. create a new instance
> > of a rule with different properties) you would have to call the rule's
> > constructor. Frequently the required constructor did not exist, so we
> > would have to add a new constructor and deprecate the old one.
> >
> > After this change, you start off from an instance of the rule, modify
> > its configuration, and call toRule() on the configuration. (Rule
> > constructors are now private, because only the configuration ever
> > calls them.)
> >
> > A good illustration of this is DruidRules, which used to contain many
> > sub-classes. Those sub-classes are no longer needed. Old code:
> >
> >   public static final DruidSortProjectTransposeRule
> SORT_PROJECT_TRANSPOSE =
> >       new DruidSortProjectTransposeRule(RelFactories.LOGICAL_BUILDER);
> >
> >     public static class DruidSortProjectTransposeRule
> >         extends SortProjectTransposeRule {
> >       public DruidSortProjectTransposeRule(RelBuilderFactory
> > relBuilderFactory) {
> >         super(
> >             operand(Sort.class,
> >                 operand(Project.class, operand(DruidQuery.class,
> none()))),
> >             relBuilderFactory, null);
> >       }
> >     }
> >
> > New code:
> >
> >   public static final SortProjectTransposeRule SORT_PROJECT_TRANSPOSE =
> >       SortProjectTransposeRule.INSTANCE.config
> >           .withOperandFor(Sort.class, Project.class, DruidQuery.class)
> >           .toRule();
> >
> > The change maintains backwards compatibility to a large degree. In a
> > few places, I had to change rule instances from type RelOptRule to
> > Supplier<RelOptRule>, to avoid deadlocks during class loading. For
> > instance, instead of writing FilterJoinRule.FILTER_ON_JOIN you must
> > now write FilterJoinRule.FILTER_ON_JOIN.get().
> >
> > Julian
> >
> >
> > On Thu, Apr 16, 2020 at 12:08 PM Julian Hyde <jh...@apache.org> wrote:
> > >
> > > I have now pushed a dev branch with a prototype. Please see
> > > https://issues.apache.org/jira/browse/CALCITE-3923 for details.
> > >
> > > Having built the prototype, I believe that this change is beneficial
> > > and we should do it. But I would like to get to consensus on the
> > > design before we pull the trigger.
> > >
> > > Julian
> > >
> > > On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <jh...@apache.org> wrote:
> > > >
> > > > Haisheng,
> > > >
> > > > I hear you. I agree that major changes to rules will require new rule
> > > > classes (not merely sub-classes). People should copy-paste, refactor,
> > > > and all that good stuff. But I think there are a lot of cases where
> we
> > > > need to make minor changes to rules (there are many of these in the
> > > > code base already), and this change will help.
> > > >
> > > > I have logged https://issues.apache.org/jira/browse/CALCITE-3923 and
> > > > am going to start working on a prototype. When we have a prototype we
> > > > will be able to assess how big an impact the API change will have.
> > > > (E.g. whether it will be a breaking change.)
> > > >
> > > > Julian
> > > >
> > > > On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <
> h.yuan@alibaba-inc.com> wrote:
> > > > >
> > > > > I don't think it is worth the refactoring. People who want to
> customize the rule, in most cases, won't be satisfied by a different
> parameter, they most likely still need to rewrite (copy & paste) the rule
> with some slightly their own logic. For many Calcite users, the rule is not
> reusable even with flexible configurations.
> > > > >
> > > > > - Haisheng
> > > > >
> > > > > ------------------------------------------------------------------
> > > > > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > > > > 日 期:2020年03月14日 22:54:04
> > > > > 收件人:<de...@calcite.apache.org>
> > > > > 主 题:Re: [DISCUSS] Refactor how planner rules are parameterized
> > > > >
> > > > > Hello,
> > > > >
> > > > > Apologies for the late reply but I just realised that I had
> written the
> > > > > mail and never pressed the send button.
> > > > >
> > > > > I think it is a nice idea and certainly a problem worth
> addressing. If I
> > > > > understood well you're thinking something like the current
> constructor of
> > > > > the RelBuilder [1] that accepts a Context parameter. Indeed it
> seems that
> > > > > with this change even rules that are not designed to be configured
> can be
> > > > > changed much more gracefully (without adding new constructors and
> breaking
> > > > > changes).
> > > > >
> > > > > On the other hand, some of the advantages that you mention can
> also be
> > > > > turned into disadvantages. For instance, copying a rule without
> knowing the
> > > > > values of the other parameters is a bit risky and might be harder
> to reason
> > > > > about its correctness. Moreover, private constructors, final
> classes, etc.,
> > > > > are primarily used for encapsulation purposes so allowing the
> state of the
> > > > > rule escape somehow breaks the original design of the rule.
> > > > >
> > > > > Another problem with respect to rules is cross convention matching
> and
> > > > > transformations [2]. Many rules should not fire for operands that
> are in
> > > > > different conventions; a typical example that comes in my mind is
> > > > > FilterProjectTransposeRule [3]. In the same spirit most rules
> should not
> > > > > generate mixed convention transformations. Although a different
> problem, I
> > > > > am mentioning it here since it could affect the design of the new
> API.
> > > > >
> > > > > Best,
> > > > > Stamatis
> > > > >
> > > > > [1]
> > > > >
> https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> > > > >
> > > > > [2]
> > > > >
> https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > > > > [3]
> > > > >
> https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> > > > >
> > > > > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <mm...@apache.org>
> wrote:
> > > > >
> > > > > > This sounds reasonable to me. It also sounds like we could make
> this
> > > > > > backwards compatible by retaining (but deprecating) the existing
> > > > > > constructors and factory methods that will no longer be needed.
> > > > > > --
> > > > > > Michael Mior
> > > > > > mmior@apache.org
> > > > > >
> > > > > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org> a
> écrit :
> > > > > > >
> > > > > > > I have an idea for a refactoring to RelOptRule. I haven’t
> fully thought
> > > > > > it through, but I’m going to sketch it out here to see whether
> folks agree
> > > > > > about the problems/solutions.
> > > > > > >
> > > > > > > It will be a breaking change (in the sense that people will
> have to
> > > > > > change their code in order to get it to compile) but relatively
> safe (in
> > > > > > that once the code compiles, it will have the same behavior as
> before).
> > > > > > Also it will give Calcite developers and users a lot more
> flexibility going
> > > > > > forward.
> > > > > > >
> > > > > > > The problem I see is that people often want different variants
> of
> > > > > > planner rules. An example is FilterJoinRule, which has a
> 'boolean smart’
> > > > > > parameter, a predicate (which returns whether to pull up filter
> > > > > > conditions), operands (which determine the precise sub-classes
> of RelNode
> > > > > > that the rule should match) and a relBuilderFactory (which
> controls the
> > > > > > type of RelNode created by this rule).
> > > > > > >
> > > > > > > Suppose you have an instance of FilterJoinRule and you want to
> change
> > > > > > ‘smart’ from true to false. The ‘smart’ parameter is immutable
> (good!) but
> > > > > > you can’t easily create a clone of the rule because you don’t
> know the
> > > > > > values of the other parameters. Your instance might even be
> (unbeknownst to
> > > > > > you) a sub-class with extra parameters and a private constructor.
> > > > > > >
> > > > > > > So, my proposal is to put all of the config information of a
> RelOptRule
> > > > > > into a single ‘config’ parameter that contains all relevant
> properties.
> > > > > > Each sub-class of RelOptRule would have one constructor with
> just a
> > > > > > ‘config’ parameter. Each config knows which sub-class of
> RelOptRule to
> > > > > > create. Therefore it is easy to copy a config, change one or more
> > > > > > properties, and create a new rule instance.
> > > > > > >
> > > > > > > Adding a property to a rule’s config does not require us to
> add or
> > > > > > deprecate any constructors.
> > > > > > >
> > > > > > > The operands are part of the config, so if you have a rule
> that matches
> > > > > > a EnumerableFilter on an EnumerableJoin and you want to make it
> match an
> > > > > > EnumerableFilter on an EnumerableNestedLoopJoin, you can easily
> create one
> > > > > > with one changed operand.
> > > > > > >
> > > > > > > The config is immutable and self-describing, so we can use it
> to
> > > > > > automatically generate a unique description for each rule
> instance.
> > > > > > >
> > > > > > > Julian
> > > > > > >
> > > > > > > [1]
> > > > > >
> https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > <
> > > > > >
> https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> >
>

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Haisheng Yuan <hy...@apache.org>.
Hi Julian,

Thanks for working on this.

We haven't reached a consensus yet.

Frankly speaking, I agree with what Stamatis said earlier. Flexibility doesn't come with no cost. Admittedly, with this patch, any rule constructor refactoring won't need to deprecate old constructors and break backward compatibility, however, it also makes the rule definition much more verbose, less readable and understandable. IMHO, it does more harm than good.

Let's see how CassandraFilterRule becomes before and after the change.

Before this change:

  private static class CassandraFilterRule extends RelOptRule {
    private static final CassandraFilterRule INSTANCE = new CassandraFilterRule();

    private CassandraFilterRule() {
      super(operand(LogicalFilter.class, operand(CassandraTableScan.class, none())),
          "CassandraFilterRule");
    }
  }

After this change:

  private static class CassandraFilterRule
      extends RelOptNewRule<CassandraFilterRule.Config> {
    private static final CassandraFilterRule INSTANCE =
        Config.EMPTY
            .withOperandSupplier(b0 ->
                b0.operand(LogicalFilter.class)
                    .oneInput(b1 -> b1.operand(CassandraTableScan.class)
                        .noInputs()))
            .as(Config.class)
            .toRule();

    /** Creates a CassandraFilterRule. */
    protected CassandraFilterRule(Config config) {
      super(config);
    }

    /** Rule configuration. */
    public interface Config extends RelOptNewRule.Config {
      @Override default CassandraFilterRule toRule() {
        return new CassandraFilterRule(this);
      }
    }
  }


Intuitively, if we want to check what does the rule generally match or how it is defined, we just check the constructor. Now we checkout the constructor, only config is there, go to Config, there is still nothing interesting, we have to go to the INSTANCE. What is the difference with just using operand and optionConfig as the rule constructor?

   protected CassandraFilterRule(RelOptRuleOperand operand, Config config) {
     super(operand, config);
   }

Or even simply replace Config with int, with each bit represent an option, if all of them are boolean options.

Nothing is more flexible than just using RelOptRuleOperand as the parameter, just like the base class RelOptRule does. But do we want it?

At the same time, with the new approach, it is now legit to create the following instance:

  private static final CassandraFilterRule INSTANCE2 =
        Config.EMPTY
            .withOperandSupplier(b0 ->
                b0.operand(LogicalProject.class)  // Even the is intended to match Filter, but it compiles
                    .oneInput(b1 -> b1.operand(CassandraTableScan.class)
                        .noInputs()))
            .as(Config.class)
            .toRule();

Before we continue to the discussion and code review, we need to go back to the original problem, is it a real problem that is facing us? Is there any real demand or just artificial demand? How about we conduct a Calcite user survey to see how Calcite devs and users think? Then let's see how to move forward.

[+1] Hell yeah, the new approach is awesome, let's go with it.
[+0] Meh, I am OK with current approach, I don't see any burden or problem with it.
[-1] Hell no, current approach is bad, the new one is even worse.


Thanks,
Haisheng


On 2020/06/12 23:51:56, Julian Hyde <jh...@apache.org> wrote: 
> There is now a PR: https://github.com/apache/calcite/pull/2024. Can
> people please review?
> 
> Here's the TL;DR:
> 
> Previously, it was not easy to customize, re-use or extend planner
> rules. If you wished to customize a rule (i.e. create a new instance
> of a rule with different properties) you would have to call the rule's
> constructor. Frequently the required constructor did not exist, so we
> would have to add a new constructor and deprecate the old one.
> 
> After this change, you start off from an instance of the rule, modify
> its configuration, and call toRule() on the configuration. (Rule
> constructors are now private, because only the configuration ever
> calls them.)
> 
> A good illustration of this is DruidRules, which used to contain many
> sub-classes. Those sub-classes are no longer needed. Old code:
> 
>   public static final DruidSortProjectTransposeRule SORT_PROJECT_TRANSPOSE =
>       new DruidSortProjectTransposeRule(RelFactories.LOGICAL_BUILDER);
> 
>     public static class DruidSortProjectTransposeRule
>         extends SortProjectTransposeRule {
>       public DruidSortProjectTransposeRule(RelBuilderFactory
> relBuilderFactory) {
>         super(
>             operand(Sort.class,
>                 operand(Project.class, operand(DruidQuery.class, none()))),
>             relBuilderFactory, null);
>       }
>     }
> 
> New code:
> 
>   public static final SortProjectTransposeRule SORT_PROJECT_TRANSPOSE =
>       SortProjectTransposeRule.INSTANCE.config
>           .withOperandFor(Sort.class, Project.class, DruidQuery.class)
>           .toRule();
> 
> The change maintains backwards compatibility to a large degree. In a
> few places, I had to change rule instances from type RelOptRule to
> Supplier<RelOptRule>, to avoid deadlocks during class loading. For
> instance, instead of writing FilterJoinRule.FILTER_ON_JOIN you must
> now write FilterJoinRule.FILTER_ON_JOIN.get().
> 
> Julian
> 
> 
> On Thu, Apr 16, 2020 at 12:08 PM Julian Hyde <jh...@apache.org> wrote:
> >
> > I have now pushed a dev branch with a prototype. Please see
> > https://issues.apache.org/jira/browse/CALCITE-3923 for details.
> >
> > Having built the prototype, I believe that this change is beneficial
> > and we should do it. But I would like to get to consensus on the
> > design before we pull the trigger.
> >
> > Julian
> >
> > On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <jh...@apache.org> wrote:
> > >
> > > Haisheng,
> > >
> > > I hear you. I agree that major changes to rules will require new rule
> > > classes (not merely sub-classes). People should copy-paste, refactor,
> > > and all that good stuff. But I think there are a lot of cases where we
> > > need to make minor changes to rules (there are many of these in the
> > > code base already), and this change will help.
> > >
> > > I have logged https://issues.apache.org/jira/browse/CALCITE-3923 and
> > > am going to start working on a prototype. When we have a prototype we
> > > will be able to assess how big an impact the API change will have.
> > > (E.g. whether it will be a breaking change.)
> > >
> > > Julian
> > >
> > > On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <h....@alibaba-inc.com> wrote:
> > > >
> > > > I don't think it is worth the refactoring. People who want to customize the rule, in most cases, won't be satisfied by a different parameter, they most likely still need to rewrite (copy & paste) the rule with some slightly their own logic. For many Calcite users, the rule is not reusable even with flexible configurations.
> > > >
> > > > - Haisheng
> > > >
> > > > ------------------------------------------------------------------
> > > > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > > > 日 期:2020年03月14日 22:54:04
> > > > 收件人:<de...@calcite.apache.org>
> > > > 主 题:Re: [DISCUSS] Refactor how planner rules are parameterized
> > > >
> > > > Hello,
> > > >
> > > > Apologies for the late reply but I just realised that I had written the
> > > > mail and never pressed the send button.
> > > >
> > > > I think it is a nice idea and certainly a problem worth addressing. If I
> > > > understood well you're thinking something like the current constructor of
> > > > the RelBuilder [1] that accepts a Context parameter. Indeed it seems that
> > > > with this change even rules that are not designed to be configured can be
> > > > changed much more gracefully (without adding new constructors and breaking
> > > > changes).
> > > >
> > > > On the other hand, some of the advantages that you mention can also be
> > > > turned into disadvantages. For instance, copying a rule without knowing the
> > > > values of the other parameters is a bit risky and might be harder to reason
> > > > about its correctness. Moreover, private constructors, final classes, etc.,
> > > > are primarily used for encapsulation purposes so allowing the state of the
> > > > rule escape somehow breaks the original design of the rule.
> > > >
> > > > Another problem with respect to rules is cross convention matching and
> > > > transformations [2]. Many rules should not fire for operands that are in
> > > > different conventions; a typical example that comes in my mind is
> > > > FilterProjectTransposeRule [3]. In the same spirit most rules should not
> > > > generate mixed convention transformations. Although a different problem, I
> > > > am mentioning it here since it could affect the design of the new API.
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > > [1]
> > > > https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> > > >
> > > > [2]
> > > > https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > > > [3]
> > > > https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> > > >
> > > > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <mm...@apache.org> wrote:
> > > >
> > > > > This sounds reasonable to me. It also sounds like we could make this
> > > > > backwards compatible by retaining (but deprecating) the existing
> > > > > constructors and factory methods that will no longer be needed.
> > > > > --
> > > > > Michael Mior
> > > > > mmior@apache.org
> > > > >
> > > > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org> a écrit :
> > > > > >
> > > > > > I have an idea for a refactoring to RelOptRule. I haven’t fully thought
> > > > > it through, but I’m going to sketch it out here to see whether folks agree
> > > > > about the problems/solutions.
> > > > > >
> > > > > > It will be a breaking change (in the sense that people will have to
> > > > > change their code in order to get it to compile) but relatively safe (in
> > > > > that once the code compiles, it will have the same behavior as before).
> > > > > Also it will give Calcite developers and users a lot more flexibility going
> > > > > forward.
> > > > > >
> > > > > > The problem I see is that people often want different variants of
> > > > > planner rules. An example is FilterJoinRule, which has a 'boolean smart’
> > > > > parameter, a predicate (which returns whether to pull up filter
> > > > > conditions), operands (which determine the precise sub-classes of RelNode
> > > > > that the rule should match) and a relBuilderFactory (which controls the
> > > > > type of RelNode created by this rule).
> > > > > >
> > > > > > Suppose you have an instance of FilterJoinRule and you want to change
> > > > > ‘smart’ from true to false. The ‘smart’ parameter is immutable (good!) but
> > > > > you can’t easily create a clone of the rule because you don’t know the
> > > > > values of the other parameters. Your instance might even be (unbeknownst to
> > > > > you) a sub-class with extra parameters and a private constructor.
> > > > > >
> > > > > > So, my proposal is to put all of the config information of a RelOptRule
> > > > > into a single ‘config’ parameter that contains all relevant properties.
> > > > > Each sub-class of RelOptRule would have one constructor with just a
> > > > > ‘config’ parameter. Each config knows which sub-class of RelOptRule to
> > > > > create. Therefore it is easy to copy a config, change one or more
> > > > > properties, and create a new rule instance.
> > > > > >
> > > > > > Adding a property to a rule’s config does not require us to add or
> > > > > deprecate any constructors.
> > > > > >
> > > > > > The operands are part of the config, so if you have a rule that matches
> > > > > a EnumerableFilter on an EnumerableJoin and you want to make it match an
> > > > > EnumerableFilter on an EnumerableNestedLoopJoin, you can easily create one
> > > > > with one changed operand.
> > > > > >
> > > > > > The config is immutable and self-describing, so we can use it to
> > > > > automatically generate a unique description for each rule instance.
> > > > > >
> > > > > > Julian
> > > > > >
> > > > > > [1]
> > > > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > <
> > > > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > > >
> > > > > >
> > > > >
> > > >
> 

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Julian Hyde <jh...@apache.org>.
There is now a PR: https://github.com/apache/calcite/pull/2024. Can
people please review?

Here's the TL;DR:

Previously, it was not easy to customize, re-use or extend planner
rules. If you wished to customize a rule (i.e. create a new instance
of a rule with different properties) you would have to call the rule's
constructor. Frequently the required constructor did not exist, so we
would have to add a new constructor and deprecate the old one.

After this change, you start off from an instance of the rule, modify
its configuration, and call toRule() on the configuration. (Rule
constructors are now private, because only the configuration ever
calls them.)

A good illustration of this is DruidRules, which used to contain many
sub-classes. Those sub-classes are no longer needed. Old code:

  public static final DruidSortProjectTransposeRule SORT_PROJECT_TRANSPOSE =
      new DruidSortProjectTransposeRule(RelFactories.LOGICAL_BUILDER);

    public static class DruidSortProjectTransposeRule
        extends SortProjectTransposeRule {
      public DruidSortProjectTransposeRule(RelBuilderFactory
relBuilderFactory) {
        super(
            operand(Sort.class,
                operand(Project.class, operand(DruidQuery.class, none()))),
            relBuilderFactory, null);
      }
    }

New code:

  public static final SortProjectTransposeRule SORT_PROJECT_TRANSPOSE =
      SortProjectTransposeRule.INSTANCE.config
          .withOperandFor(Sort.class, Project.class, DruidQuery.class)
          .toRule();

The change maintains backwards compatibility to a large degree. In a
few places, I had to change rule instances from type RelOptRule to
Supplier<RelOptRule>, to avoid deadlocks during class loading. For
instance, instead of writing FilterJoinRule.FILTER_ON_JOIN you must
now write FilterJoinRule.FILTER_ON_JOIN.get().

Julian


On Thu, Apr 16, 2020 at 12:08 PM Julian Hyde <jh...@apache.org> wrote:
>
> I have now pushed a dev branch with a prototype. Please see
> https://issues.apache.org/jira/browse/CALCITE-3923 for details.
>
> Having built the prototype, I believe that this change is beneficial
> and we should do it. But I would like to get to consensus on the
> design before we pull the trigger.
>
> Julian
>
> On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <jh...@apache.org> wrote:
> >
> > Haisheng,
> >
> > I hear you. I agree that major changes to rules will require new rule
> > classes (not merely sub-classes). People should copy-paste, refactor,
> > and all that good stuff. But I think there are a lot of cases where we
> > need to make minor changes to rules (there are many of these in the
> > code base already), and this change will help.
> >
> > I have logged https://issues.apache.org/jira/browse/CALCITE-3923 and
> > am going to start working on a prototype. When we have a prototype we
> > will be able to assess how big an impact the API change will have.
> > (E.g. whether it will be a breaking change.)
> >
> > Julian
> >
> > On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <h....@alibaba-inc.com> wrote:
> > >
> > > I don't think it is worth the refactoring. People who want to customize the rule, in most cases, won't be satisfied by a different parameter, they most likely still need to rewrite (copy & paste) the rule with some slightly their own logic. For many Calcite users, the rule is not reusable even with flexible configurations.
> > >
> > > - Haisheng
> > >
> > > ------------------------------------------------------------------
> > > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > > 日 期:2020年03月14日 22:54:04
> > > 收件人:<de...@calcite.apache.org>
> > > 主 题:Re: [DISCUSS] Refactor how planner rules are parameterized
> > >
> > > Hello,
> > >
> > > Apologies for the late reply but I just realised that I had written the
> > > mail and never pressed the send button.
> > >
> > > I think it is a nice idea and certainly a problem worth addressing. If I
> > > understood well you're thinking something like the current constructor of
> > > the RelBuilder [1] that accepts a Context parameter. Indeed it seems that
> > > with this change even rules that are not designed to be configured can be
> > > changed much more gracefully (without adding new constructors and breaking
> > > changes).
> > >
> > > On the other hand, some of the advantages that you mention can also be
> > > turned into disadvantages. For instance, copying a rule without knowing the
> > > values of the other parameters is a bit risky and might be harder to reason
> > > about its correctness. Moreover, private constructors, final classes, etc.,
> > > are primarily used for encapsulation purposes so allowing the state of the
> > > rule escape somehow breaks the original design of the rule.
> > >
> > > Another problem with respect to rules is cross convention matching and
> > > transformations [2]. Many rules should not fire for operands that are in
> > > different conventions; a typical example that comes in my mind is
> > > FilterProjectTransposeRule [3]. In the same spirit most rules should not
> > > generate mixed convention transformations. Although a different problem, I
> > > am mentioning it here since it could affect the design of the new API.
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1]
> > > https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> > >
> > > [2]
> > > https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > > [3]
> > > https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> > >
> > > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <mm...@apache.org> wrote:
> > >
> > > > This sounds reasonable to me. It also sounds like we could make this
> > > > backwards compatible by retaining (but deprecating) the existing
> > > > constructors and factory methods that will no longer be needed.
> > > > --
> > > > Michael Mior
> > > > mmior@apache.org
> > > >
> > > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org> a écrit :
> > > > >
> > > > > I have an idea for a refactoring to RelOptRule. I haven’t fully thought
> > > > it through, but I’m going to sketch it out here to see whether folks agree
> > > > about the problems/solutions.
> > > > >
> > > > > It will be a breaking change (in the sense that people will have to
> > > > change their code in order to get it to compile) but relatively safe (in
> > > > that once the code compiles, it will have the same behavior as before).
> > > > Also it will give Calcite developers and users a lot more flexibility going
> > > > forward.
> > > > >
> > > > > The problem I see is that people often want different variants of
> > > > planner rules. An example is FilterJoinRule, which has a 'boolean smart’
> > > > parameter, a predicate (which returns whether to pull up filter
> > > > conditions), operands (which determine the precise sub-classes of RelNode
> > > > that the rule should match) and a relBuilderFactory (which controls the
> > > > type of RelNode created by this rule).
> > > > >
> > > > > Suppose you have an instance of FilterJoinRule and you want to change
> > > > ‘smart’ from true to false. The ‘smart’ parameter is immutable (good!) but
> > > > you can’t easily create a clone of the rule because you don’t know the
> > > > values of the other parameters. Your instance might even be (unbeknownst to
> > > > you) a sub-class with extra parameters and a private constructor.
> > > > >
> > > > > So, my proposal is to put all of the config information of a RelOptRule
> > > > into a single ‘config’ parameter that contains all relevant properties.
> > > > Each sub-class of RelOptRule would have one constructor with just a
> > > > ‘config’ parameter. Each config knows which sub-class of RelOptRule to
> > > > create. Therefore it is easy to copy a config, change one or more
> > > > properties, and create a new rule instance.
> > > > >
> > > > > Adding a property to a rule’s config does not require us to add or
> > > > deprecate any constructors.
> > > > >
> > > > > The operands are part of the config, so if you have a rule that matches
> > > > a EnumerableFilter on an EnumerableJoin and you want to make it match an
> > > > EnumerableFilter on an EnumerableNestedLoopJoin, you can easily create one
> > > > with one changed operand.
> > > > >
> > > > > The config is immutable and self-describing, so we can use it to
> > > > automatically generate a unique description for each rule instance.
> > > > >
> > > > > Julian
> > > > >
> > > > > [1]
> > > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > <
> > > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > > >
> > > > >
> > > >
> > >

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Julian Hyde <jh...@apache.org>.
I have now pushed a dev branch with a prototype. Please see
https://issues.apache.org/jira/browse/CALCITE-3923 for details.

Having built the prototype, I believe that this change is beneficial
and we should do it. But I would like to get to consensus on the
design before we pull the trigger.

Julian

On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde <jh...@apache.org> wrote:
>
> Haisheng,
>
> I hear you. I agree that major changes to rules will require new rule
> classes (not merely sub-classes). People should copy-paste, refactor,
> and all that good stuff. But I think there are a lot of cases where we
> need to make minor changes to rules (there are many of these in the
> code base already), and this change will help.
>
> I have logged https://issues.apache.org/jira/browse/CALCITE-3923 and
> am going to start working on a prototype. When we have a prototype we
> will be able to assess how big an impact the API change will have.
> (E.g. whether it will be a breaking change.)
>
> Julian
>
> On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <h....@alibaba-inc.com> wrote:
> >
> > I don't think it is worth the refactoring. People who want to customize the rule, in most cases, won't be satisfied by a different parameter, they most likely still need to rewrite (copy & paste) the rule with some slightly their own logic. For many Calcite users, the rule is not reusable even with flexible configurations.
> >
> > - Haisheng
> >
> > ------------------------------------------------------------------
> > 发件人:Stamatis Zampetakis<za...@gmail.com>
> > 日 期:2020年03月14日 22:54:04
> > 收件人:<de...@calcite.apache.org>
> > 主 题:Re: [DISCUSS] Refactor how planner rules are parameterized
> >
> > Hello,
> >
> > Apologies for the late reply but I just realised that I had written the
> > mail and never pressed the send button.
> >
> > I think it is a nice idea and certainly a problem worth addressing. If I
> > understood well you're thinking something like the current constructor of
> > the RelBuilder [1] that accepts a Context parameter. Indeed it seems that
> > with this change even rules that are not designed to be configured can be
> > changed much more gracefully (without adding new constructors and breaking
> > changes).
> >
> > On the other hand, some of the advantages that you mention can also be
> > turned into disadvantages. For instance, copying a rule without knowing the
> > values of the other parameters is a bit risky and might be harder to reason
> > about its correctness. Moreover, private constructors, final classes, etc.,
> > are primarily used for encapsulation purposes so allowing the state of the
> > rule escape somehow breaks the original design of the rule.
> >
> > Another problem with respect to rules is cross convention matching and
> > transformations [2]. Many rules should not fire for operands that are in
> > different conventions; a typical example that comes in my mind is
> > FilterProjectTransposeRule [3]. In the same spirit most rules should not
> > generate mixed convention transformations. Although a different problem, I
> > am mentioning it here since it could affect the design of the new API.
> >
> > Best,
> > Stamatis
> >
> > [1]
> > https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> >
> > [2]
> > https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > [3]
> > https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> >
> > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <mm...@apache.org> wrote:
> >
> > > This sounds reasonable to me. It also sounds like we could make this
> > > backwards compatible by retaining (but deprecating) the existing
> > > constructors and factory methods that will no longer be needed.
> > > --
> > > Michael Mior
> > > mmior@apache.org
> > >
> > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org> a écrit :
> > > >
> > > > I have an idea for a refactoring to RelOptRule. I haven’t fully thought
> > > it through, but I’m going to sketch it out here to see whether folks agree
> > > about the problems/solutions.
> > > >
> > > > It will be a breaking change (in the sense that people will have to
> > > change their code in order to get it to compile) but relatively safe (in
> > > that once the code compiles, it will have the same behavior as before).
> > > Also it will give Calcite developers and users a lot more flexibility going
> > > forward.
> > > >
> > > > The problem I see is that people often want different variants of
> > > planner rules. An example is FilterJoinRule, which has a 'boolean smart’
> > > parameter, a predicate (which returns whether to pull up filter
> > > conditions), operands (which determine the precise sub-classes of RelNode
> > > that the rule should match) and a relBuilderFactory (which controls the
> > > type of RelNode created by this rule).
> > > >
> > > > Suppose you have an instance of FilterJoinRule and you want to change
> > > ‘smart’ from true to false. The ‘smart’ parameter is immutable (good!) but
> > > you can’t easily create a clone of the rule because you don’t know the
> > > values of the other parameters. Your instance might even be (unbeknownst to
> > > you) a sub-class with extra parameters and a private constructor.
> > > >
> > > > So, my proposal is to put all of the config information of a RelOptRule
> > > into a single ‘config’ parameter that contains all relevant properties.
> > > Each sub-class of RelOptRule would have one constructor with just a
> > > ‘config’ parameter. Each config knows which sub-class of RelOptRule to
> > > create. Therefore it is easy to copy a config, change one or more
> > > properties, and create a new rule instance.
> > > >
> > > > Adding a property to a rule’s config does not require us to add or
> > > deprecate any constructors.
> > > >
> > > > The operands are part of the config, so if you have a rule that matches
> > > a EnumerableFilter on an EnumerableJoin and you want to make it match an
> > > EnumerableFilter on an EnumerableNestedLoopJoin, you can easily create one
> > > with one changed operand.
> > > >
> > > > The config is immutable and self-describing, so we can use it to
> > > automatically generate a unique description for each rule instance.
> > > >
> > > > Julian
> > > >
> > > > [1]
> > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > <
> > > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > > >
> > > >
> > >
> >

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Julian Hyde <jh...@apache.org>.
Haisheng,

I hear you. I agree that major changes to rules will require new rule
classes (not merely sub-classes). People should copy-paste, refactor,
and all that good stuff. But I think there are a lot of cases where we
need to make minor changes to rules (there are many of these in the
code base already), and this change will help.

I have logged https://issues.apache.org/jira/browse/CALCITE-3923 and
am going to start working on a prototype. When we have a prototype we
will be able to assess how big an impact the API change will have.
(E.g. whether it will be a breaking change.)

Julian

On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan <h....@alibaba-inc.com> wrote:
>
> I don't think it is worth the refactoring. People who want to customize the rule, in most cases, won't be satisfied by a different parameter, they most likely still need to rewrite (copy & paste) the rule with some slightly their own logic. For many Calcite users, the rule is not reusable even with flexible configurations.
>
> - Haisheng
>
> ------------------------------------------------------------------
> 发件人:Stamatis Zampetakis<za...@gmail.com>
> 日 期:2020年03月14日 22:54:04
> 收件人:<de...@calcite.apache.org>
> 主 题:Re: [DISCUSS] Refactor how planner rules are parameterized
>
> Hello,
>
> Apologies for the late reply but I just realised that I had written the
> mail and never pressed the send button.
>
> I think it is a nice idea and certainly a problem worth addressing. If I
> understood well you're thinking something like the current constructor of
> the RelBuilder [1] that accepts a Context parameter. Indeed it seems that
> with this change even rules that are not designed to be configured can be
> changed much more gracefully (without adding new constructors and breaking
> changes).
>
> On the other hand, some of the advantages that you mention can also be
> turned into disadvantages. For instance, copying a rule without knowing the
> values of the other parameters is a bit risky and might be harder to reason
> about its correctness. Moreover, private constructors, final classes, etc.,
> are primarily used for encapsulation purposes so allowing the state of the
> rule escape somehow breaks the original design of the rule.
>
> Another problem with respect to rules is cross convention matching and
> transformations [2]. Many rules should not fire for operands that are in
> different conventions; a typical example that comes in my mind is
> FilterProjectTransposeRule [3]. In the same spirit most rules should not
> generate mixed convention transformations. Although a different problem, I
> am mentioning it here since it could affect the design of the new API.
>
> Best,
> Stamatis
>
> [1]
> https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
>
> [2]
> https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> [3]
> https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
>
> On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <mm...@apache.org> wrote:
>
> > This sounds reasonable to me. It also sounds like we could make this
> > backwards compatible by retaining (but deprecating) the existing
> > constructors and factory methods that will no longer be needed.
> > --
> > Michael Mior
> > mmior@apache.org
> >
> > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org> a écrit :
> > >
> > > I have an idea for a refactoring to RelOptRule. I haven’t fully thought
> > it through, but I’m going to sketch it out here to see whether folks agree
> > about the problems/solutions.
> > >
> > > It will be a breaking change (in the sense that people will have to
> > change their code in order to get it to compile) but relatively safe (in
> > that once the code compiles, it will have the same behavior as before).
> > Also it will give Calcite developers and users a lot more flexibility going
> > forward.
> > >
> > > The problem I see is that people often want different variants of
> > planner rules. An example is FilterJoinRule, which has a 'boolean smart’
> > parameter, a predicate (which returns whether to pull up filter
> > conditions), operands (which determine the precise sub-classes of RelNode
> > that the rule should match) and a relBuilderFactory (which controls the
> > type of RelNode created by this rule).
> > >
> > > Suppose you have an instance of FilterJoinRule and you want to change
> > ‘smart’ from true to false. The ‘smart’ parameter is immutable (good!) but
> > you can’t easily create a clone of the rule because you don’t know the
> > values of the other parameters. Your instance might even be (unbeknownst to
> > you) a sub-class with extra parameters and a private constructor.
> > >
> > > So, my proposal is to put all of the config information of a RelOptRule
> > into a single ‘config’ parameter that contains all relevant properties.
> > Each sub-class of RelOptRule would have one constructor with just a
> > ‘config’ parameter. Each config knows which sub-class of RelOptRule to
> > create. Therefore it is easy to copy a config, change one or more
> > properties, and create a new rule instance.
> > >
> > > Adding a property to a rule’s config does not require us to add or
> > deprecate any constructors.
> > >
> > > The operands are part of the config, so if you have a rule that matches
> > a EnumerableFilter on an EnumerableJoin and you want to make it match an
> > EnumerableFilter on an EnumerableNestedLoopJoin, you can easily create one
> > with one changed operand.
> > >
> > > The config is immutable and self-describing, so we can use it to
> > automatically generate a unique description for each rule instance.
> > >
> > > Julian
> > >
> > > [1]
> > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > <
> > https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> > >
> > >
> >
>

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Haisheng Yuan <h....@alibaba-inc.com>.
I don't think it is worth the refactoring. People who want to customize the rule, in most cases, won't be satisfied by a different parameter, they most likely still need to rewrite (copy & paste) the rule with some slightly their own logic. For many Calcite users, the rule is not reusable even with flexible configurations.

- Haisheng

------------------------------------------------------------------
发件人:Stamatis Zampetakis<za...@gmail.com>
日 期:2020年03月14日 22:54:04
收件人:<de...@calcite.apache.org>
主 题:Re: [DISCUSS] Refactor how planner rules are parameterized

Hello,

Apologies for the late reply but I just realised that I had written the
mail and never pressed the send button.

I think it is a nice idea and certainly a problem worth addressing. If I
understood well you're thinking something like the current constructor of
the RelBuilder [1] that accepts a Context parameter. Indeed it seems that
with this change even rules that are not designed to be configured can be
changed much more gracefully (without adding new constructors and breaking
changes).

On the other hand, some of the advantages that you mention can also be
turned into disadvantages. For instance, copying a rule without knowing the
values of the other parameters is a bit risky and might be harder to reason
about its correctness. Moreover, private constructors, final classes, etc.,
are primarily used for encapsulation purposes so allowing the state of the
rule escape somehow breaks the original design of the rule.

Another problem with respect to rules is cross convention matching and
transformations [2]. Many rules should not fire for operands that are in
different conventions; a typical example that comes in my mind is
FilterProjectTransposeRule [3]. In the same spirit most rules should not
generate mixed convention transformations. Although a different problem, I
am mentioning it here since it could affect the design of the new API.

Best,
Stamatis

[1]
https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155

[2]
https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
[3]
https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57

On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <mm...@apache.org> wrote:

> This sounds reasonable to me. It also sounds like we could make this
> backwards compatible by retaining (but deprecating) the existing
> constructors and factory methods that will no longer be needed.
> --
> Michael Mior
> mmior@apache.org
>
> Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org> a écrit :
> >
> > I have an idea for a refactoring to RelOptRule. I haven’t fully thought
> it through, but I’m going to sketch it out here to see whether folks agree
> about the problems/solutions.
> >
> > It will be a breaking change (in the sense that people will have to
> change their code in order to get it to compile) but relatively safe (in
> that once the code compiles, it will have the same behavior as before).
> Also it will give Calcite developers and users a lot more flexibility going
> forward.
> >
> > The problem I see is that people often want different variants of
> planner rules. An example is FilterJoinRule, which has a 'boolean smart’
> parameter, a predicate (which returns whether to pull up filter
> conditions), operands (which determine the precise sub-classes of RelNode
> that the rule should match) and a relBuilderFactory (which controls the
> type of RelNode created by this rule).
> >
> > Suppose you have an instance of FilterJoinRule and you want to change
> ‘smart’ from true to false. The ‘smart’ parameter is immutable (good!) but
> you can’t easily create a clone of the rule because you don’t know the
> values of the other parameters. Your instance might even be (unbeknownst to
> you) a sub-class with extra parameters and a private constructor.
> >
> > So, my proposal is to put all of the config information of a RelOptRule
> into a single ‘config’ parameter that contains all relevant properties.
> Each sub-class of RelOptRule would have one constructor with just a
> ‘config’ parameter. Each config knows which sub-class of RelOptRule to
> create. Therefore it is easy to copy a config, change one or more
> properties, and create a new rule instance.
> >
> > Adding a property to a rule’s config does not require us to add or
> deprecate any constructors.
> >
> > The operands are part of the config, so if you have a rule that matches
> a EnumerableFilter on an EnumerableJoin and you want to make it match an
> EnumerableFilter on an EnumerableNestedLoopJoin, you can easily create one
> with one changed operand.
> >
> > The config is immutable and self-describing, so we can use it to
> automatically generate a unique description for each rule instance.
> >
> > Julian
> >
> > [1]
> https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> <
> https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> >
> >
>


Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hello,

Apologies for the late reply but I just realised that I had written the
mail and never pressed the send button.

I think it is a nice idea and certainly a problem worth addressing. If I
understood well you're thinking something like the current constructor of
the RelBuilder [1] that accepts a Context parameter. Indeed it seems that
with this change even rules that are not designed to be configured can be
changed much more gracefully (without adding new constructors and breaking
changes).

On the other hand, some of the advantages that you mention can also be
turned into disadvantages. For instance, copying a rule without knowing the
values of the other parameters is a bit risky and might be harder to reason
about its correctness. Moreover, private constructors, final classes, etc.,
are primarily used for encapsulation purposes so allowing the state of the
rule escape somehow breaks the original design of the rule.

Another problem with respect to rules is cross convention matching and
transformations [2]. Many rules should not fire for operands that are in
different conventions; a typical example that comes in my mind is
FilterProjectTransposeRule [3]. In the same spirit most rules should not
generate mixed convention transformations. Although a different problem, I
am mentioning it here since it could affect the design of the new API.

Best,
Stamatis

[1]
https://github.com/apache/calcite/blob/f11115a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155

[2]
https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
[3]
https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57

On Thu, Feb 20, 2020 at 9:20 PM Michael Mior <mm...@apache.org> wrote:

> This sounds reasonable to me. It also sounds like we could make this
> backwards compatible by retaining (but deprecating) the existing
> constructors and factory methods that will no longer be needed.
> --
> Michael Mior
> mmior@apache.org
>
> Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org> a écrit :
> >
> > I have an idea for a refactoring to RelOptRule. I haven’t fully thought
> it through, but I’m going to sketch it out here to see whether folks agree
> about the problems/solutions.
> >
> > It will be a breaking change (in the sense that people will have to
> change their code in order to get it to compile) but relatively safe (in
> that once the code compiles, it will have the same behavior as before).
> Also it will give Calcite developers and users a lot more flexibility going
> forward.
> >
> > The problem I see is that people often want different variants of
> planner rules. An example is FilterJoinRule, which has a 'boolean smart’
> parameter, a predicate (which returns whether to pull up filter
> conditions), operands (which determine the precise sub-classes of RelNode
> that the rule should match) and a relBuilderFactory (which controls the
> type of RelNode created by this rule).
> >
> > Suppose you have an instance of FilterJoinRule and you want to change
> ‘smart’ from true to false. The ‘smart’ parameter is immutable (good!) but
> you can’t easily create a clone of the rule because you don’t know the
> values of the other parameters. Your instance might even be (unbeknownst to
> you) a sub-class with extra parameters and a private constructor.
> >
> > So, my proposal is to put all of the config information of a RelOptRule
> into a single ‘config’ parameter that contains all relevant properties.
> Each sub-class of RelOptRule would have one constructor with just a
> ‘config’ parameter. Each config knows which sub-class of RelOptRule to
> create. Therefore it is easy to copy a config, change one or more
> properties, and create a new rule instance.
> >
> > Adding a property to a rule’s config does not require us to add or
> deprecate any constructors.
> >
> > The operands are part of the config, so if you have a rule that matches
> a EnumerableFilter on an EnumerableJoin and you want to make it match an
> EnumerableFilter on an EnumerableNestedLoopJoin, you can easily create one
> with one changed operand.
> >
> > The config is immutable and self-describing, so we can use it to
> automatically generate a unique description for each rule instance.
> >
> > Julian
> >
> > [1]
> https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> <
> https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93
> >
> >
>

Re: [DISCUSS] Refactor how planner rules are parameterized

Posted by Michael Mior <mm...@apache.org>.
This sounds reasonable to me. It also sounds like we could make this
backwards compatible by retaining (but deprecating) the existing
constructors and factory methods that will no longer be needed.
--
Michael Mior
mmior@apache.org

Le jeu. 20 févr. 2020 à 13:11, Julian Hyde <jh...@apache.org> a écrit :
>
> I have an idea for a refactoring to RelOptRule. I haven’t fully thought it through, but I’m going to sketch it out here to see whether folks agree about the problems/solutions.
>
> It will be a breaking change (in the sense that people will have to change their code in order to get it to compile) but relatively safe (in that once the code compiles, it will have the same behavior as before). Also it will give Calcite developers and users a lot more flexibility going forward.
>
> The problem I see is that people often want different variants of planner rules. An example is FilterJoinRule, which has a 'boolean smart’ parameter, a predicate (which returns whether to pull up filter conditions), operands (which determine the precise sub-classes of RelNode that the rule should match) and a relBuilderFactory (which controls the type of RelNode created by this rule).
>
> Suppose you have an instance of FilterJoinRule and you want to change ‘smart’ from true to false. The ‘smart’ parameter is immutable (good!) but you can’t easily create a clone of the rule because you don’t know the values of the other parameters. Your instance might even be (unbeknownst to you) a sub-class with extra parameters and a private constructor.
>
> So, my proposal is to put all of the config information of a RelOptRule into a single ‘config’ parameter that contains all relevant properties. Each sub-class of RelOptRule would have one constructor with just a ‘config’ parameter. Each config knows which sub-class of RelOptRule to create. Therefore it is easy to copy a config, change one or more properties, and create a new rule instance.
>
> Adding a property to a rule’s config does not require us to add or deprecate any constructors.
>
> The operands are part of the config, so if you have a rule that matches a EnumerableFilter on an EnumerableJoin and you want to make it match an EnumerableFilter on an EnumerableNestedLoopJoin, you can easily create one with one changed operand.
>
> The config is immutable and self-describing, so we can use it to automatically generate a unique description for each rule instance.
>
> Julian
>
> [1] https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93 <https://github.com/apache/calcite/blob/5fa41609cb0fe310a0a11d86319d861423850a36/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L93>
>