You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Vladimir Ozerov <pp...@gmail.com> on 2022/04/12 10:24:24 UTC

Changes to the rule pattern interface

Hi folks,

Rules are an essential part of the Calcite-based query optimizers. A
typical optimizer may require dozens of custom rules that are created by
extending some Apache Calcite interfaces.

During the last two years, there were two major revisions of how rules are
created:

   1. In early 1.2x versions, the typical approach was to use
   RelOptRuleOperand with a set of helper methods in a builder-like
   pattern.
   2. Then, we switched to the runtime code generation.
   3. Finally, we switched to the compile-time code generation with the
   Immutables framework.

Every such change requires the downstream projects to rewrite all their
rules. Not only does this require time to understand the new approach, but
it may also compromise the correctness of the downstream optimizer because
the regression tracking in query optimizers is not trivial.

I had the privilege to try all three approaches, and I cannot get rid of
the feeling that every new approach is more complicated than the previous
one. I understand that this is a highly subjective statement, but when I
just started using Apache Calcite and knew very little about it, I was able
to write rule patterns by simply looking at the IDE JavaDoc pop-ups and
code completion. When the RuleConfig was introduced, every new rule always
required me to look at some other rule as an example, yet it was doable.
Now we also need to configure the project build system to write a single
custom rule.

At the same time, a significant fraction of the rules are pretty simple.
E.g., "operator A on top of operator B". If some additional configuration
is required, it could be added via plain rules fields, because at the end
of the day the rule instance is not more than a plain Java object.

A good example is the FilterProjectTransposeRule. What now takes tens of
lines of code in the Config subclass [1] (that you hardly could write
without a reference example), and ~500 LOC in the generated code that you
get through additional plugin configuration [2] in your build system, could
have been expressed in a dozen lines of code [3] in Apache Calcite 1.22.0.

My question is - are we sure we are going in the right direction in terms
of complexity and the entry bar for the newcomers? Wouldn't it be better to
follow the 80/20 rule, when simple rules could be easily created
programmatically with no external dependencies, while more advanced
facilities like Immutables are used only for the complex rules?

Regards,
Vladimir.

[1]
https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L208-L260
[2]
https://github.com/apache/calcite/blob/calcite-1.30.0/core/build.gradle.kts#L215-L224
[3]
https://github.com/apache/calcite/blob/calcite-1.22.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L99-L110

Re: Changes to the rule pattern interface

Posted by Vladimir Ozerov <pp...@gmail.com>.
Hi,

Thanks, everybody, for your comments. I want to clarify that my concern is
not the ideas behind the changes (which are completely valid and valuable)
but that users are _forced_ to use the new approach even for simple
patterns. Walking through the peculiarities of RelRule.Config and
Immutables look pretty complicated to define a simple pattern like "A on
top of B".

The previous infrastructure with manually defined operand requirements is
still available via the RelOptRule, but some of the relevant methods are
deprecated, and the operand builder is hidden from the direct usage by the
Java visibility rules. What do you think if we refractor OperandBuilderImpl
and related classes to make them publicly available without the Config
object? This could allow for straightforward Rule construction without
RelRule.Config and static code-generation that could be used for the cases
when complex configuration is not needed, which is a common pattern for
physical rules in the downstream projects?

Regards,
Vladimir.

чт, 14 апр. 2022 г. в 16:06, Julian Hyde <jh...@apache.org>:

> Yanjing,
>
> That is a valid point. The API could perhaps allow people to define rules
> with nodes that are optional. Can you please log a JIRA for this? Include
> the code that you would like to write in order to define a rule and handle
> a match.
>
> Julian
>
> On Thu, Apr 14, 2022 at 2:27 AM Yanjing Wang <zh...@gmail.com>
> wrote:
>
> > Hi, all,
> >
> > I think the rule configuration is not convenient for optional operand,
> such
> > as a match operand tree
> >               Project
> >                   |
> >               Union
> >               /        \
> >             Filter    Filter
> > If I want to compose a rule that the Project operand is optional, I must
> > create two configuration and new two rule instance, if we can make the
> > operand optional, I think I just need one rule and configuration.
> >
> >
> > Viliam Durina <vi...@hazelcast.com.invalid> 于2022年4月14日周四 14:55写道:
> >
> > > I can say for myself that though the migration was a nuisance, after we
> > > figured out what to do, it was straightforward and the current approach
> > > with Immutables is more readable and easier to write than the previous
> > with
> > > `operand` and `operandJ`.
> > >
> > > Viliam
> > >
> > > On Wed, 13 Apr 2022 at 19:23, Julian Hyde <jh...@gmail.com>
> > wrote:
> > >
> > > > 1. I agree with Jacques. Thanks, Vladimir, for asking the question
> ‘is
> > > > progress really progress?’. We may not back out these changes
> (probably
> > > > won’t) but it will inform future discussions. It’s valid to say ‘when
> > > did X
> > > > as part of the rule refactoring, that didn’t deliver on promises’.
> > > >
> > > > 2. Thanks, Thomas, for sending the link to the original discussion.
> It
> > is
> > > > still my position that many more people will use existing rules than
> > > write
> > > > their own.
> > > >
> > > > 3. We have effectively created a DSL [1] (especially for operand
> > patterns
> > > > [2]) embedded in Java. Java 8 is not a great language for writing
> DSLs:
> > > > Kotlin, Scala, Lisp and even later versions of Java are superior.
> But a
> > > > design option we should consider is a DSL with its own syntax and
> > parser.
> > > > CockroachDB’s OptGen [3] is an example of this, and there is much to
> > > like.
> > > >
> > > > Compared to DSLs embedded in other languages, languages with their
> own
> > > > parser are more intuitive for users, and give better error messages.
> It
> > > is
> > > > also possible to support multiple versions simultaneously.
> > > >
> > > > Julian
> > > >
> > > > [1] https://en.wikipedia.org/wiki/Domain-specific_language <
> > > > https://en.wikipedia.org/wiki/Domain-specific_language>
> > > >
> > > > [2] https://issues.apache.org/jira/browse/CALCITE-3923 <
> > > > https://issues.apache.org/jira/browse/CALCITE-3923>
> > > >
> > > > [3]
> > > >
> > >
> >
> https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optgen/lang/doc.go
> > > > <
> > > >
> > >
> >
> https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optgen/lang/doc.go
> > > >
> > > >
> > > >
> > > >
> > > > > On Apr 13, 2022, at 9:56 AM, Jacques Nadeau <ja...@apache.org>
> > > wrote:
> > > > >
> > > > > Vladimir,
> > > > >
> > > > > It's always good to ask these kinds of questions. e.g. "Is progress
> > > > really
> > > > > progress?"
> > > > >
> > > > > I think moving to a configuration object was a substantial
> > improvement.
> > > > > Until Java has default parameters and named parameters, I don't
> know
> > of
> > > > > another good way to make rules sufficiently configurable for a wide
> > > range
> > > > > of users.
> > > > >
> > > > > To me, one of the main benefits of the new configuration objects
> is a
> > > > much
> > > > > less frequent need to subclass a rule. Back when we built original
> > > Drill
> > > > > and Dremio code, we were constantly pained by the need to control
> > more
> > > > > pieces of a rule than were exposed by public constructors. When
> > trying
> > > to
> > > > > introduce those options, we had to both: (1) update calcite and
> wait
> > > for
> > > > a
> > > > > release cycle and (2) create another constructor overload,
> sometimes
> > > > > leading to a large number of overloads. The config object almost
> > > entirely
> > > > > eliminated this challenge.
> > > > >
> > > > > I do agree that having config objects is more complex for the
> casual
> > > user
> > > > > so in some ways we're adding advanced functionality at the
> potential
> > > cost
> > > > > of usability for new users. That's not something great. I'm not
> sure
> > > it's
> > > > > really bad but it is something to be aware of. Having default
> configs
> > > for
> > > > > rules seems to make this a rather small additional challenge.
> > > > >
> > > > > For your comments on downstream projects and the examples you
> give: I
> > > > think
> > > > > that sounds more like advanced usage. My experience is somewhat
> > > different
> > > > > from yours: these changes have been net positive on the stuff I've
> > been
> > > > > working on.
> > > > >
> > > > > Two last thoughts wrt downstream projects:
> > > > >
> > > > >   - I think it is important to separate how to create config
> objects
> > > from
> > > > >   the way that Calcite does it. We did change from proxies to
> > > immutables
> > > > >   within Caclite but neither has ever been the required way to
> > > implement
> > > > >   config objects. Config objects are always defined as plain java
> > > > interfaces.
> > > > >   If a particular subproject wants to expose configuration in a
> > > > completely
> > > > >   different way, I don't know of any constraint the rules interface
> > > > forces.
> > > > >   - A nice thing that was done with the initial implementation is
> now
> > > of
> > > > >   the configuration concepts are exposed on the RelOptRule abstract
> > > > class. A
> > > > >   user must opt in to RelRule for them to have to worry about any
> of
> > > > these
> > > > >   concerns.
> > > > >
> > > > >
> > > > > In working with the config concepts and extensibility, I do think a
> > > > couple
> > > > > of things are relatively painful. I don't
> > > > >
> > > > >   1. The definition of configs as interfaces inside the rules
> causes
> > > some
> > > > >   issues with static initialization order. I think Julian even
> > > mentioned
> > > > some
> > > > >   of this in one of the original docs config docs.
> > > > >   2. The generics of RelRule specifically can be challenging when
> > > > extended
> > > > >   rules in some cases due to how generics and inheritance work.
> > > > >   3. The class definition of the config interfaces are not
> > > > >   self-referencing (one of my shelved reworks actually modified to
> > > > >   support this) meaning that there are a lot of ugly (my subjective
> > > > opinion)
> > > > >   uses of Config.as() in the code. This last one could be resolved
> by
> > > > >   exposing the concrete builders that Immutables generates but for
> at
> > > > least
> > > > >   the initial changes, I wanted to avoid exposing these as new
> public
> > > > >   interfaces (and thus all the immutables classes are marked
> package
> > > > private).
> > > > >   4. Merging construction and use feels like an anti-pattern (my
> > > > >   subjective opinion). The most common patterns I've seen treat a
> > > builder
> > > > >   separate from an immutable constructed object (and have something
> > > akin
> > > > to a
> > > > >   toBuilder() method to take an immutable config back to a
> builder).
> > In
> > > > the
> > > > >   Calcite config, these two concepts are merged. In some ways it
> > makes
> > > > things
> > > > >   simpler for trivial cases. However,  it is less formal and causes
> > > pain
> > > > when
> > > > >   you have required properties that have no defaults since there is
> > no
> > > > formal
> > > > >   model for "I'm done building, now check everything is complete".
> > This
> > > > means
> > > > >   in several places we have to put default values that are actually
> > > > invalid
> > > > >   rather than just rely on a builder's build validation step.
> > > > >
> > > > > One other note, I think if someone is working in Java 14+ (records)
> > or
> > > > > Kotlin, there are also several easy ways to produce config impls
> that
> > > are
> > > > > easy to use (without proxies and/or immutables).
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Apr 12, 2022 at 9:29 AM Thomas Rebele
> > > <trebele@tibco.com.invalid
> > > > >
> > > > > wrote:
> > > > >
> > > > >> Hello,
> > > > >>
> > > > >> The reasons for the planner rules configuration can be found here:
> > > > >> CALCITE-3923 <https://issues.apache.org/jira/browse/CALCITE-3923
> >.
> > > See
> > > > >> also
> > > > >> the email thread [DISCUSS] Refactor how planner rules are
> > > parameterized
> > > > >> <
> > > > >>
> > > >
> > >
> >
> https://lists.apache.org/thread.html/rfdf6f9b7821988bdd92b0377e3d293443a6376f4773c4c658c891cf9%40%3Cdev.calcite.apache.org%3E
> > > > >>>
> > > > >> .
> > > > >>
> > > > >> Cordialement / Best Regards,
> > > > >> *Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com
> > > > >>
> > > > >>
> > > > >> On Tue, Apr 12, 2022 at 6:10 PM Gavin Ray <ra...@gmail.com>
> > > > wrote:
> > > > >>
> > > > >>> I don't have any weight behind my opinion or experience,
> > > > >>> but anything that lowers the barrier to entry to Calcite for
> > > newcomers
> > > > >> is a
> > > > >>> huge win in my mind.
> > > > >>>
> > > > >>> I assume the reason for the changes was because codegen improved
> > > > >>> performance?
> > > > >>>
> > > > >>> Could it make sense to allow both options, the
> easy/less-performant
> > > way
> > > > >> for
> > > > >>> people who want to experiment and learn the ropes,
> > > > >>> and the codegen path for productionizing the final rules you come
> > up
> > > > >> with?
> > > > >>>
> > > > >>> Or does this make matters worse, trying to support two API's
> > > > >>>
> > > > >>> On Tue, Apr 12, 2022 at 6:25 AM Vladimir Ozerov <
> > ppozerov@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> Hi folks,
> > > > >>>>
> > > > >>>> Rules are an essential part of the Calcite-based query
> > optimizers. A
> > > > >>>> typical optimizer may require dozens of custom rules that are
> > > created
> > > > >> by
> > > > >>>> extending some Apache Calcite interfaces.
> > > > >>>>
> > > > >>>> During the last two years, there were two major revisions of how
> > > rules
> > > > >>> are
> > > > >>>> created:
> > > > >>>>
> > > > >>>>   1. In early 1.2x versions, the typical approach was to use
> > > > >>>>   RelOptRuleOperand with a set of helper methods in a
> builder-like
> > > > >>>>   pattern.
> > > > >>>>   2. Then, we switched to the runtime code generation.
> > > > >>>>   3. Finally, we switched to the compile-time code generation
> with
> > > the
> > > > >>>>   Immutables framework.
> > > > >>>>
> > > > >>>> Every such change requires the downstream projects to rewrite
> all
> > > > their
> > > > >>>> rules. Not only does this require time to understand the new
> > > approach,
> > > > >>> but
> > > > >>>> it may also compromise the correctness of the downstream
> optimizer
> > > > >>> because
> > > > >>>> the regression tracking in query optimizers is not trivial.
> > > > >>>>
> > > > >>>> I had the privilege to try all three approaches, and I cannot
> get
> > > rid
> > > > >> of
> > > > >>>> the feeling that every new approach is more complicated than the
> > > > >> previous
> > > > >>>> one. I understand that this is a highly subjective statement,
> but
> > > when
> > > > >> I
> > > > >>>> just started using Apache Calcite and knew very little about
> it, I
> > > was
> > > > >>> able
> > > > >>>> to write rule patterns by simply looking at the IDE JavaDoc
> > pop-ups
> > > > and
> > > > >>>> code completion. When the RuleConfig was introduced, every new
> > rule
> > > > >>> always
> > > > >>>> required me to look at some other rule as an example, yet it was
> > > > >> doable.
> > > > >>>> Now we also need to configure the project build system to write
> a
> > > > >> single
> > > > >>>> custom rule.
> > > > >>>>
> > > > >>>> At the same time, a significant fraction of the rules are pretty
> > > > >> simple.
> > > > >>>> E.g., "operator A on top of operator B". If some additional
> > > > >> configuration
> > > > >>>> is required, it could be added via plain rules fields, because
> at
> > > the
> > > > >> end
> > > > >>>> of the day the rule instance is not more than a plain Java
> object.
> > > > >>>>
> > > > >>>> A good example is the FilterProjectTransposeRule. What now takes
> > > tens
> > > > >> of
> > > > >>>> lines of code in the Config subclass [1] (that you hardly could
> > > write
> > > > >>>> without a reference example), and ~500 LOC in the generated code
> > > that
> > > > >> you
> > > > >>>> get through additional plugin configuration [2] in your build
> > > system,
> > > > >>> could
> > > > >>>> have been expressed in a dozen lines of code [3] in Apache
> Calcite
> > > > >>> 1.22.0.
> > > > >>>>
> > > > >>>> My question is - are we sure we are going in the right direction
> > in
> > > > >> terms
> > > > >>>> of complexity and the entry bar for the newcomers? Wouldn't it
> be
> > > > >> better
> > > > >>> to
> > > > >>>> follow the 80/20 rule, when simple rules could be easily created
> > > > >>>> programmatically with no external dependencies, while more
> > advanced
> > > > >>>> facilities like Immutables are used only for the complex rules?
> > > > >>>>
> > > > >>>> Regards,
> > > > >>>> Vladimir.
> > > > >>>>
> > > > >>>> [1]
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L208-L260
> > > > >>>> [2]
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/build.gradle.kts#L215-L224
> > > > >>>> [3]
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/calcite-1.22.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L99-L110
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> > > --
> > > This message contains confidential information and is intended only for
> > > the
> > > individuals named. If you are not the named addressee you should not
> > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > immediately by e-mail if you have received this e-mail by mistake and
> > > delete this e-mail from your system. E-mail transmission cannot be
> > > guaranteed to be secure or error-free as information could be
> > intercepted,
> > > corrupted, lost, destroyed, arrive late or incomplete, or contain
> > viruses.
> > > The sender therefore does not accept liability for any errors or
> > omissions
> > > in the contents of this message, which arise as a result of e-mail
> > > transmission. If verification is required, please request a hard-copy
> > > version. -Hazelcast
> > >
> >
>

Re: Changes to the rule pattern interface

Posted by Julian Hyde <jh...@apache.org>.
Yanjing,

That is a valid point. The API could perhaps allow people to define rules
with nodes that are optional. Can you please log a JIRA for this? Include
the code that you would like to write in order to define a rule and handle
a match.

Julian

On Thu, Apr 14, 2022 at 2:27 AM Yanjing Wang <zh...@gmail.com>
wrote:

> Hi, all,
>
> I think the rule configuration is not convenient for optional operand, such
> as a match operand tree
>               Project
>                   |
>               Union
>               /        \
>             Filter    Filter
> If I want to compose a rule that the Project operand is optional, I must
> create two configuration and new two rule instance, if we can make the
> operand optional, I think I just need one rule and configuration.
>
>
> Viliam Durina <vi...@hazelcast.com.invalid> 于2022年4月14日周四 14:55写道:
>
> > I can say for myself that though the migration was a nuisance, after we
> > figured out what to do, it was straightforward and the current approach
> > with Immutables is more readable and easier to write than the previous
> with
> > `operand` and `operandJ`.
> >
> > Viliam
> >
> > On Wed, 13 Apr 2022 at 19:23, Julian Hyde <jh...@gmail.com>
> wrote:
> >
> > > 1. I agree with Jacques. Thanks, Vladimir, for asking the question ‘is
> > > progress really progress?’. We may not back out these changes (probably
> > > won’t) but it will inform future discussions. It’s valid to say ‘when
> > did X
> > > as part of the rule refactoring, that didn’t deliver on promises’.
> > >
> > > 2. Thanks, Thomas, for sending the link to the original discussion. It
> is
> > > still my position that many more people will use existing rules than
> > write
> > > their own.
> > >
> > > 3. We have effectively created a DSL [1] (especially for operand
> patterns
> > > [2]) embedded in Java. Java 8 is not a great language for writing DSLs:
> > > Kotlin, Scala, Lisp and even later versions of Java are superior. But a
> > > design option we should consider is a DSL with its own syntax and
> parser.
> > > CockroachDB’s OptGen [3] is an example of this, and there is much to
> > like.
> > >
> > > Compared to DSLs embedded in other languages, languages with their own
> > > parser are more intuitive for users, and give better error messages. It
> > is
> > > also possible to support multiple versions simultaneously.
> > >
> > > Julian
> > >
> > > [1] https://en.wikipedia.org/wiki/Domain-specific_language <
> > > https://en.wikipedia.org/wiki/Domain-specific_language>
> > >
> > > [2] https://issues.apache.org/jira/browse/CALCITE-3923 <
> > > https://issues.apache.org/jira/browse/CALCITE-3923>
> > >
> > > [3]
> > >
> >
> https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optgen/lang/doc.go
> > > <
> > >
> >
> https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optgen/lang/doc.go
> > >
> > >
> > >
> > >
> > > > On Apr 13, 2022, at 9:56 AM, Jacques Nadeau <ja...@apache.org>
> > wrote:
> > > >
> > > > Vladimir,
> > > >
> > > > It's always good to ask these kinds of questions. e.g. "Is progress
> > > really
> > > > progress?"
> > > >
> > > > I think moving to a configuration object was a substantial
> improvement.
> > > > Until Java has default parameters and named parameters, I don't know
> of
> > > > another good way to make rules sufficiently configurable for a wide
> > range
> > > > of users.
> > > >
> > > > To me, one of the main benefits of the new configuration objects is a
> > > much
> > > > less frequent need to subclass a rule. Back when we built original
> > Drill
> > > > and Dremio code, we were constantly pained by the need to control
> more
> > > > pieces of a rule than were exposed by public constructors. When
> trying
> > to
> > > > introduce those options, we had to both: (1) update calcite and wait
> > for
> > > a
> > > > release cycle and (2) create another constructor overload, sometimes
> > > > leading to a large number of overloads. The config object almost
> > entirely
> > > > eliminated this challenge.
> > > >
> > > > I do agree that having config objects is more complex for the casual
> > user
> > > > so in some ways we're adding advanced functionality at the potential
> > cost
> > > > of usability for new users. That's not something great. I'm not sure
> > it's
> > > > really bad but it is something to be aware of. Having default configs
> > for
> > > > rules seems to make this a rather small additional challenge.
> > > >
> > > > For your comments on downstream projects and the examples you give: I
> > > think
> > > > that sounds more like advanced usage. My experience is somewhat
> > different
> > > > from yours: these changes have been net positive on the stuff I've
> been
> > > > working on.
> > > >
> > > > Two last thoughts wrt downstream projects:
> > > >
> > > >   - I think it is important to separate how to create config objects
> > from
> > > >   the way that Calcite does it. We did change from proxies to
> > immutables
> > > >   within Caclite but neither has ever been the required way to
> > implement
> > > >   config objects. Config objects are always defined as plain java
> > > interfaces.
> > > >   If a particular subproject wants to expose configuration in a
> > > completely
> > > >   different way, I don't know of any constraint the rules interface
> > > forces.
> > > >   - A nice thing that was done with the initial implementation is now
> > of
> > > >   the configuration concepts are exposed on the RelOptRule abstract
> > > class. A
> > > >   user must opt in to RelRule for them to have to worry about any of
> > > these
> > > >   concerns.
> > > >
> > > >
> > > > In working with the config concepts and extensibility, I do think a
> > > couple
> > > > of things are relatively painful. I don't
> > > >
> > > >   1. The definition of configs as interfaces inside the rules causes
> > some
> > > >   issues with static initialization order. I think Julian even
> > mentioned
> > > some
> > > >   of this in one of the original docs config docs.
> > > >   2. The generics of RelRule specifically can be challenging when
> > > extended
> > > >   rules in some cases due to how generics and inheritance work.
> > > >   3. The class definition of the config interfaces are not
> > > >   self-referencing (one of my shelved reworks actually modified to
> > > >   support this) meaning that there are a lot of ugly (my subjective
> > > opinion)
> > > >   uses of Config.as() in the code. This last one could be resolved by
> > > >   exposing the concrete builders that Immutables generates but for at
> > > least
> > > >   the initial changes, I wanted to avoid exposing these as new public
> > > >   interfaces (and thus all the immutables classes are marked package
> > > private).
> > > >   4. Merging construction and use feels like an anti-pattern (my
> > > >   subjective opinion). The most common patterns I've seen treat a
> > builder
> > > >   separate from an immutable constructed object (and have something
> > akin
> > > to a
> > > >   toBuilder() method to take an immutable config back to a builder).
> In
> > > the
> > > >   Calcite config, these two concepts are merged. In some ways it
> makes
> > > things
> > > >   simpler for trivial cases. However,  it is less formal and causes
> > pain
> > > when
> > > >   you have required properties that have no defaults since there is
> no
> > > formal
> > > >   model for "I'm done building, now check everything is complete".
> This
> > > means
> > > >   in several places we have to put default values that are actually
> > > invalid
> > > >   rather than just rely on a builder's build validation step.
> > > >
> > > > One other note, I think if someone is working in Java 14+ (records)
> or
> > > > Kotlin, there are also several easy ways to produce config impls that
> > are
> > > > easy to use (without proxies and/or immutables).
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Apr 12, 2022 at 9:29 AM Thomas Rebele
> > <trebele@tibco.com.invalid
> > > >
> > > > wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> The reasons for the planner rules configuration can be found here:
> > > >> CALCITE-3923 <https://issues.apache.org/jira/browse/CALCITE-3923>.
> > See
> > > >> also
> > > >> the email thread [DISCUSS] Refactor how planner rules are
> > parameterized
> > > >> <
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/rfdf6f9b7821988bdd92b0377e3d293443a6376f4773c4c658c891cf9%40%3Cdev.calcite.apache.org%3E
> > > >>>
> > > >> .
> > > >>
> > > >> Cordialement / Best Regards,
> > > >> *Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com
> > > >>
> > > >>
> > > >> On Tue, Apr 12, 2022 at 6:10 PM Gavin Ray <ra...@gmail.com>
> > > wrote:
> > > >>
> > > >>> I don't have any weight behind my opinion or experience,
> > > >>> but anything that lowers the barrier to entry to Calcite for
> > newcomers
> > > >> is a
> > > >>> huge win in my mind.
> > > >>>
> > > >>> I assume the reason for the changes was because codegen improved
> > > >>> performance?
> > > >>>
> > > >>> Could it make sense to allow both options, the easy/less-performant
> > way
> > > >> for
> > > >>> people who want to experiment and learn the ropes,
> > > >>> and the codegen path for productionizing the final rules you come
> up
> > > >> with?
> > > >>>
> > > >>> Or does this make matters worse, trying to support two API's
> > > >>>
> > > >>> On Tue, Apr 12, 2022 at 6:25 AM Vladimir Ozerov <
> ppozerov@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> Hi folks,
> > > >>>>
> > > >>>> Rules are an essential part of the Calcite-based query
> optimizers. A
> > > >>>> typical optimizer may require dozens of custom rules that are
> > created
> > > >> by
> > > >>>> extending some Apache Calcite interfaces.
> > > >>>>
> > > >>>> During the last two years, there were two major revisions of how
> > rules
> > > >>> are
> > > >>>> created:
> > > >>>>
> > > >>>>   1. In early 1.2x versions, the typical approach was to use
> > > >>>>   RelOptRuleOperand with a set of helper methods in a builder-like
> > > >>>>   pattern.
> > > >>>>   2. Then, we switched to the runtime code generation.
> > > >>>>   3. Finally, we switched to the compile-time code generation with
> > the
> > > >>>>   Immutables framework.
> > > >>>>
> > > >>>> Every such change requires the downstream projects to rewrite all
> > > their
> > > >>>> rules. Not only does this require time to understand the new
> > approach,
> > > >>> but
> > > >>>> it may also compromise the correctness of the downstream optimizer
> > > >>> because
> > > >>>> the regression tracking in query optimizers is not trivial.
> > > >>>>
> > > >>>> I had the privilege to try all three approaches, and I cannot get
> > rid
> > > >> of
> > > >>>> the feeling that every new approach is more complicated than the
> > > >> previous
> > > >>>> one. I understand that this is a highly subjective statement, but
> > when
> > > >> I
> > > >>>> just started using Apache Calcite and knew very little about it, I
> > was
> > > >>> able
> > > >>>> to write rule patterns by simply looking at the IDE JavaDoc
> pop-ups
> > > and
> > > >>>> code completion. When the RuleConfig was introduced, every new
> rule
> > > >>> always
> > > >>>> required me to look at some other rule as an example, yet it was
> > > >> doable.
> > > >>>> Now we also need to configure the project build system to write a
> > > >> single
> > > >>>> custom rule.
> > > >>>>
> > > >>>> At the same time, a significant fraction of the rules are pretty
> > > >> simple.
> > > >>>> E.g., "operator A on top of operator B". If some additional
> > > >> configuration
> > > >>>> is required, it could be added via plain rules fields, because at
> > the
> > > >> end
> > > >>>> of the day the rule instance is not more than a plain Java object.
> > > >>>>
> > > >>>> A good example is the FilterProjectTransposeRule. What now takes
> > tens
> > > >> of
> > > >>>> lines of code in the Config subclass [1] (that you hardly could
> > write
> > > >>>> without a reference example), and ~500 LOC in the generated code
> > that
> > > >> you
> > > >>>> get through additional plugin configuration [2] in your build
> > system,
> > > >>> could
> > > >>>> have been expressed in a dozen lines of code [3] in Apache Calcite
> > > >>> 1.22.0.
> > > >>>>
> > > >>>> My question is - are we sure we are going in the right direction
> in
> > > >> terms
> > > >>>> of complexity and the entry bar for the newcomers? Wouldn't it be
> > > >> better
> > > >>> to
> > > >>>> follow the 80/20 rule, when simple rules could be easily created
> > > >>>> programmatically with no external dependencies, while more
> advanced
> > > >>>> facilities like Immutables are used only for the complex rules?
> > > >>>>
> > > >>>> Regards,
> > > >>>> Vladimir.
> > > >>>>
> > > >>>> [1]
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L208-L260
> > > >>>> [2]
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/build.gradle.kts#L215-L224
> > > >>>> [3]
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> https://github.com/apache/calcite/blob/calcite-1.22.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L99-L110
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
> > --
> > This message contains confidential information and is intended only for
> > the
> > individuals named. If you are not the named addressee you should not
> > disseminate, distribute or copy this e-mail. Please notify the sender
> > immediately by e-mail if you have received this e-mail by mistake and
> > delete this e-mail from your system. E-mail transmission cannot be
> > guaranteed to be secure or error-free as information could be
> intercepted,
> > corrupted, lost, destroyed, arrive late or incomplete, or contain
> viruses.
> > The sender therefore does not accept liability for any errors or
> omissions
> > in the contents of this message, which arise as a result of e-mail
> > transmission. If verification is required, please request a hard-copy
> > version. -Hazelcast
> >
>

Re: Changes to the rule pattern interface

Posted by Yanjing Wang <zh...@gmail.com>.
Hi, all,

I think the rule configuration is not convenient for optional operand, such
as a match operand tree
              Project
                  |
              Union
              /        \
            Filter    Filter
If I want to compose a rule that the Project operand is optional, I must
create two configuration and new two rule instance, if we can make the
operand optional, I think I just need one rule and configuration.


Viliam Durina <vi...@hazelcast.com.invalid> 于2022年4月14日周四 14:55写道:

> I can say for myself that though the migration was a nuisance, after we
> figured out what to do, it was straightforward and the current approach
> with Immutables is more readable and easier to write than the previous with
> `operand` and `operandJ`.
>
> Viliam
>
> On Wed, 13 Apr 2022 at 19:23, Julian Hyde <jh...@gmail.com> wrote:
>
> > 1. I agree with Jacques. Thanks, Vladimir, for asking the question ‘is
> > progress really progress?’. We may not back out these changes (probably
> > won’t) but it will inform future discussions. It’s valid to say ‘when
> did X
> > as part of the rule refactoring, that didn’t deliver on promises’.
> >
> > 2. Thanks, Thomas, for sending the link to the original discussion. It is
> > still my position that many more people will use existing rules than
> write
> > their own.
> >
> > 3. We have effectively created a DSL [1] (especially for operand patterns
> > [2]) embedded in Java. Java 8 is not a great language for writing DSLs:
> > Kotlin, Scala, Lisp and even later versions of Java are superior. But a
> > design option we should consider is a DSL with its own syntax and parser.
> > CockroachDB’s OptGen [3] is an example of this, and there is much to
> like.
> >
> > Compared to DSLs embedded in other languages, languages with their own
> > parser are more intuitive for users, and give better error messages. It
> is
> > also possible to support multiple versions simultaneously.
> >
> > Julian
> >
> > [1] https://en.wikipedia.org/wiki/Domain-specific_language <
> > https://en.wikipedia.org/wiki/Domain-specific_language>
> >
> > [2] https://issues.apache.org/jira/browse/CALCITE-3923 <
> > https://issues.apache.org/jira/browse/CALCITE-3923>
> >
> > [3]
> >
> https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optgen/lang/doc.go
> > <
> >
> https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optgen/lang/doc.go
> >
> >
> >
> >
> > > On Apr 13, 2022, at 9:56 AM, Jacques Nadeau <ja...@apache.org>
> wrote:
> > >
> > > Vladimir,
> > >
> > > It's always good to ask these kinds of questions. e.g. "Is progress
> > really
> > > progress?"
> > >
> > > I think moving to a configuration object was a substantial improvement.
> > > Until Java has default parameters and named parameters, I don't know of
> > > another good way to make rules sufficiently configurable for a wide
> range
> > > of users.
> > >
> > > To me, one of the main benefits of the new configuration objects is a
> > much
> > > less frequent need to subclass a rule. Back when we built original
> Drill
> > > and Dremio code, we were constantly pained by the need to control more
> > > pieces of a rule than were exposed by public constructors. When trying
> to
> > > introduce those options, we had to both: (1) update calcite and wait
> for
> > a
> > > release cycle and (2) create another constructor overload, sometimes
> > > leading to a large number of overloads. The config object almost
> entirely
> > > eliminated this challenge.
> > >
> > > I do agree that having config objects is more complex for the casual
> user
> > > so in some ways we're adding advanced functionality at the potential
> cost
> > > of usability for new users. That's not something great. I'm not sure
> it's
> > > really bad but it is something to be aware of. Having default configs
> for
> > > rules seems to make this a rather small additional challenge.
> > >
> > > For your comments on downstream projects and the examples you give: I
> > think
> > > that sounds more like advanced usage. My experience is somewhat
> different
> > > from yours: these changes have been net positive on the stuff I've been
> > > working on.
> > >
> > > Two last thoughts wrt downstream projects:
> > >
> > >   - I think it is important to separate how to create config objects
> from
> > >   the way that Calcite does it. We did change from proxies to
> immutables
> > >   within Caclite but neither has ever been the required way to
> implement
> > >   config objects. Config objects are always defined as plain java
> > interfaces.
> > >   If a particular subproject wants to expose configuration in a
> > completely
> > >   different way, I don't know of any constraint the rules interface
> > forces.
> > >   - A nice thing that was done with the initial implementation is now
> of
> > >   the configuration concepts are exposed on the RelOptRule abstract
> > class. A
> > >   user must opt in to RelRule for them to have to worry about any of
> > these
> > >   concerns.
> > >
> > >
> > > In working with the config concepts and extensibility, I do think a
> > couple
> > > of things are relatively painful. I don't
> > >
> > >   1. The definition of configs as interfaces inside the rules causes
> some
> > >   issues with static initialization order. I think Julian even
> mentioned
> > some
> > >   of this in one of the original docs config docs.
> > >   2. The generics of RelRule specifically can be challenging when
> > extended
> > >   rules in some cases due to how generics and inheritance work.
> > >   3. The class definition of the config interfaces are not
> > >   self-referencing (one of my shelved reworks actually modified to
> > >   support this) meaning that there are a lot of ugly (my subjective
> > opinion)
> > >   uses of Config.as() in the code. This last one could be resolved by
> > >   exposing the concrete builders that Immutables generates but for at
> > least
> > >   the initial changes, I wanted to avoid exposing these as new public
> > >   interfaces (and thus all the immutables classes are marked package
> > private).
> > >   4. Merging construction and use feels like an anti-pattern (my
> > >   subjective opinion). The most common patterns I've seen treat a
> builder
> > >   separate from an immutable constructed object (and have something
> akin
> > to a
> > >   toBuilder() method to take an immutable config back to a builder). In
> > the
> > >   Calcite config, these two concepts are merged. In some ways it makes
> > things
> > >   simpler for trivial cases. However,  it is less formal and causes
> pain
> > when
> > >   you have required properties that have no defaults since there is no
> > formal
> > >   model for "I'm done building, now check everything is complete". This
> > means
> > >   in several places we have to put default values that are actually
> > invalid
> > >   rather than just rely on a builder's build validation step.
> > >
> > > One other note, I think if someone is working in Java 14+ (records) or
> > > Kotlin, there are also several easy ways to produce config impls that
> are
> > > easy to use (without proxies and/or immutables).
> > >
> > >
> > >
> > >
> > > On Tue, Apr 12, 2022 at 9:29 AM Thomas Rebele
> <trebele@tibco.com.invalid
> > >
> > > wrote:
> > >
> > >> Hello,
> > >>
> > >> The reasons for the planner rules configuration can be found here:
> > >> CALCITE-3923 <https://issues.apache.org/jira/browse/CALCITE-3923>.
> See
> > >> also
> > >> the email thread [DISCUSS] Refactor how planner rules are
> parameterized
> > >> <
> > >>
> >
> https://lists.apache.org/thread.html/rfdf6f9b7821988bdd92b0377e3d293443a6376f4773c4c658c891cf9%40%3Cdev.calcite.apache.org%3E
> > >>>
> > >> .
> > >>
> > >> Cordialement / Best Regards,
> > >> *Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com
> > >>
> > >>
> > >> On Tue, Apr 12, 2022 at 6:10 PM Gavin Ray <ra...@gmail.com>
> > wrote:
> > >>
> > >>> I don't have any weight behind my opinion or experience,
> > >>> but anything that lowers the barrier to entry to Calcite for
> newcomers
> > >> is a
> > >>> huge win in my mind.
> > >>>
> > >>> I assume the reason for the changes was because codegen improved
> > >>> performance?
> > >>>
> > >>> Could it make sense to allow both options, the easy/less-performant
> way
> > >> for
> > >>> people who want to experiment and learn the ropes,
> > >>> and the codegen path for productionizing the final rules you come up
> > >> with?
> > >>>
> > >>> Or does this make matters worse, trying to support two API's
> > >>>
> > >>> On Tue, Apr 12, 2022 at 6:25 AM Vladimir Ozerov <pp...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Hi folks,
> > >>>>
> > >>>> Rules are an essential part of the Calcite-based query optimizers. A
> > >>>> typical optimizer may require dozens of custom rules that are
> created
> > >> by
> > >>>> extending some Apache Calcite interfaces.
> > >>>>
> > >>>> During the last two years, there were two major revisions of how
> rules
> > >>> are
> > >>>> created:
> > >>>>
> > >>>>   1. In early 1.2x versions, the typical approach was to use
> > >>>>   RelOptRuleOperand with a set of helper methods in a builder-like
> > >>>>   pattern.
> > >>>>   2. Then, we switched to the runtime code generation.
> > >>>>   3. Finally, we switched to the compile-time code generation with
> the
> > >>>>   Immutables framework.
> > >>>>
> > >>>> Every such change requires the downstream projects to rewrite all
> > their
> > >>>> rules. Not only does this require time to understand the new
> approach,
> > >>> but
> > >>>> it may also compromise the correctness of the downstream optimizer
> > >>> because
> > >>>> the regression tracking in query optimizers is not trivial.
> > >>>>
> > >>>> I had the privilege to try all three approaches, and I cannot get
> rid
> > >> of
> > >>>> the feeling that every new approach is more complicated than the
> > >> previous
> > >>>> one. I understand that this is a highly subjective statement, but
> when
> > >> I
> > >>>> just started using Apache Calcite and knew very little about it, I
> was
> > >>> able
> > >>>> to write rule patterns by simply looking at the IDE JavaDoc pop-ups
> > and
> > >>>> code completion. When the RuleConfig was introduced, every new rule
> > >>> always
> > >>>> required me to look at some other rule as an example, yet it was
> > >> doable.
> > >>>> Now we also need to configure the project build system to write a
> > >> single
> > >>>> custom rule.
> > >>>>
> > >>>> At the same time, a significant fraction of the rules are pretty
> > >> simple.
> > >>>> E.g., "operator A on top of operator B". If some additional
> > >> configuration
> > >>>> is required, it could be added via plain rules fields, because at
> the
> > >> end
> > >>>> of the day the rule instance is not more than a plain Java object.
> > >>>>
> > >>>> A good example is the FilterProjectTransposeRule. What now takes
> tens
> > >> of
> > >>>> lines of code in the Config subclass [1] (that you hardly could
> write
> > >>>> without a reference example), and ~500 LOC in the generated code
> that
> > >> you
> > >>>> get through additional plugin configuration [2] in your build
> system,
> > >>> could
> > >>>> have been expressed in a dozen lines of code [3] in Apache Calcite
> > >>> 1.22.0.
> > >>>>
> > >>>> My question is - are we sure we are going in the right direction in
> > >> terms
> > >>>> of complexity and the entry bar for the newcomers? Wouldn't it be
> > >> better
> > >>> to
> > >>>> follow the 80/20 rule, when simple rules could be easily created
> > >>>> programmatically with no external dependencies, while more advanced
> > >>>> facilities like Immutables are used only for the complex rules?
> > >>>>
> > >>>> Regards,
> > >>>> Vladimir.
> > >>>>
> > >>>> [1]
> > >>>>
> > >>>>
> > >>>
> > >>
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L208-L260
> > >>>> [2]
> > >>>>
> > >>>>
> > >>>
> > >>
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/build.gradle.kts#L215-L224
> > >>>> [3]
> > >>>>
> > >>>>
> > >>>
> > >>
> >
> https://github.com/apache/calcite/blob/calcite-1.22.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L99-L110
> > >>>>
> > >>>
> > >>
> >
> >
>
> --
> This message contains confidential information and is intended only for
> the
> individuals named. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately by e-mail if you have received this e-mail by mistake and
> delete this e-mail from your system. E-mail transmission cannot be
> guaranteed to be secure or error-free as information could be intercepted,
> corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
> The sender therefore does not accept liability for any errors or omissions
> in the contents of this message, which arise as a result of e-mail
> transmission. If verification is required, please request a hard-copy
> version. -Hazelcast
>

Re: Changes to the rule pattern interface

Posted by Viliam Durina <vi...@hazelcast.com.INVALID>.
I can say for myself that though the migration was a nuisance, after we
figured out what to do, it was straightforward and the current approach
with Immutables is more readable and easier to write than the previous with
`operand` and `operandJ`.

Viliam

On Wed, 13 Apr 2022 at 19:23, Julian Hyde <jh...@gmail.com> wrote:

> 1. I agree with Jacques. Thanks, Vladimir, for asking the question ‘is
> progress really progress?’. We may not back out these changes (probably
> won’t) but it will inform future discussions. It’s valid to say ‘when did X
> as part of the rule refactoring, that didn’t deliver on promises’.
>
> 2. Thanks, Thomas, for sending the link to the original discussion. It is
> still my position that many more people will use existing rules than write
> their own.
>
> 3. We have effectively created a DSL [1] (especially for operand patterns
> [2]) embedded in Java. Java 8 is not a great language for writing DSLs:
> Kotlin, Scala, Lisp and even later versions of Java are superior. But a
> design option we should consider is a DSL with its own syntax and parser.
> CockroachDB’s OptGen [3] is an example of this, and there is much to like.
>
> Compared to DSLs embedded in other languages, languages with their own
> parser are more intuitive for users, and give better error messages. It is
> also possible to support multiple versions simultaneously.
>
> Julian
>
> [1] https://en.wikipedia.org/wiki/Domain-specific_language <
> https://en.wikipedia.org/wiki/Domain-specific_language>
>
> [2] https://issues.apache.org/jira/browse/CALCITE-3923 <
> https://issues.apache.org/jira/browse/CALCITE-3923>
>
> [3]
> https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optgen/lang/doc.go
> <
> https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optgen/lang/doc.go>
>
>
>
> > On Apr 13, 2022, at 9:56 AM, Jacques Nadeau <ja...@apache.org> wrote:
> >
> > Vladimir,
> >
> > It's always good to ask these kinds of questions. e.g. "Is progress
> really
> > progress?"
> >
> > I think moving to a configuration object was a substantial improvement.
> > Until Java has default parameters and named parameters, I don't know of
> > another good way to make rules sufficiently configurable for a wide range
> > of users.
> >
> > To me, one of the main benefits of the new configuration objects is a
> much
> > less frequent need to subclass a rule. Back when we built original Drill
> > and Dremio code, we were constantly pained by the need to control more
> > pieces of a rule than were exposed by public constructors. When trying to
> > introduce those options, we had to both: (1) update calcite and wait for
> a
> > release cycle and (2) create another constructor overload, sometimes
> > leading to a large number of overloads. The config object almost entirely
> > eliminated this challenge.
> >
> > I do agree that having config objects is more complex for the casual user
> > so in some ways we're adding advanced functionality at the potential cost
> > of usability for new users. That's not something great. I'm not sure it's
> > really bad but it is something to be aware of. Having default configs for
> > rules seems to make this a rather small additional challenge.
> >
> > For your comments on downstream projects and the examples you give: I
> think
> > that sounds more like advanced usage. My experience is somewhat different
> > from yours: these changes have been net positive on the stuff I've been
> > working on.
> >
> > Two last thoughts wrt downstream projects:
> >
> >   - I think it is important to separate how to create config objects from
> >   the way that Calcite does it. We did change from proxies to immutables
> >   within Caclite but neither has ever been the required way to implement
> >   config objects. Config objects are always defined as plain java
> interfaces.
> >   If a particular subproject wants to expose configuration in a
> completely
> >   different way, I don't know of any constraint the rules interface
> forces.
> >   - A nice thing that was done with the initial implementation is now of
> >   the configuration concepts are exposed on the RelOptRule abstract
> class. A
> >   user must opt in to RelRule for them to have to worry about any of
> these
> >   concerns.
> >
> >
> > In working with the config concepts and extensibility, I do think a
> couple
> > of things are relatively painful. I don't
> >
> >   1. The definition of configs as interfaces inside the rules causes some
> >   issues with static initialization order. I think Julian even mentioned
> some
> >   of this in one of the original docs config docs.
> >   2. The generics of RelRule specifically can be challenging when
> extended
> >   rules in some cases due to how generics and inheritance work.
> >   3. The class definition of the config interfaces are not
> >   self-referencing (one of my shelved reworks actually modified to
> >   support this) meaning that there are a lot of ugly (my subjective
> opinion)
> >   uses of Config.as() in the code. This last one could be resolved by
> >   exposing the concrete builders that Immutables generates but for at
> least
> >   the initial changes, I wanted to avoid exposing these as new public
> >   interfaces (and thus all the immutables classes are marked package
> private).
> >   4. Merging construction and use feels like an anti-pattern (my
> >   subjective opinion). The most common patterns I've seen treat a builder
> >   separate from an immutable constructed object (and have something akin
> to a
> >   toBuilder() method to take an immutable config back to a builder). In
> the
> >   Calcite config, these two concepts are merged. In some ways it makes
> things
> >   simpler for trivial cases. However,  it is less formal and causes pain
> when
> >   you have required properties that have no defaults since there is no
> formal
> >   model for "I'm done building, now check everything is complete". This
> means
> >   in several places we have to put default values that are actually
> invalid
> >   rather than just rely on a builder's build validation step.
> >
> > One other note, I think if someone is working in Java 14+ (records) or
> > Kotlin, there are also several easy ways to produce config impls that are
> > easy to use (without proxies and/or immutables).
> >
> >
> >
> >
> > On Tue, Apr 12, 2022 at 9:29 AM Thomas Rebele <trebele@tibco.com.invalid
> >
> > wrote:
> >
> >> Hello,
> >>
> >> The reasons for the planner rules configuration can be found here:
> >> CALCITE-3923 <https://issues.apache.org/jira/browse/CALCITE-3923>. See
> >> also
> >> the email thread [DISCUSS] Refactor how planner rules are parameterized
> >> <
> >>
> https://lists.apache.org/thread.html/rfdf6f9b7821988bdd92b0377e3d293443a6376f4773c4c658c891cf9%40%3Cdev.calcite.apache.org%3E
> >>>
> >> .
> >>
> >> Cordialement / Best Regards,
> >> *Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com
> >>
> >>
> >> On Tue, Apr 12, 2022 at 6:10 PM Gavin Ray <ra...@gmail.com>
> wrote:
> >>
> >>> I don't have any weight behind my opinion or experience,
> >>> but anything that lowers the barrier to entry to Calcite for newcomers
> >> is a
> >>> huge win in my mind.
> >>>
> >>> I assume the reason for the changes was because codegen improved
> >>> performance?
> >>>
> >>> Could it make sense to allow both options, the easy/less-performant way
> >> for
> >>> people who want to experiment and learn the ropes,
> >>> and the codegen path for productionizing the final rules you come up
> >> with?
> >>>
> >>> Or does this make matters worse, trying to support two API's
> >>>
> >>> On Tue, Apr 12, 2022 at 6:25 AM Vladimir Ozerov <pp...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi folks,
> >>>>
> >>>> Rules are an essential part of the Calcite-based query optimizers. A
> >>>> typical optimizer may require dozens of custom rules that are created
> >> by
> >>>> extending some Apache Calcite interfaces.
> >>>>
> >>>> During the last two years, there were two major revisions of how rules
> >>> are
> >>>> created:
> >>>>
> >>>>   1. In early 1.2x versions, the typical approach was to use
> >>>>   RelOptRuleOperand with a set of helper methods in a builder-like
> >>>>   pattern.
> >>>>   2. Then, we switched to the runtime code generation.
> >>>>   3. Finally, we switched to the compile-time code generation with the
> >>>>   Immutables framework.
> >>>>
> >>>> Every such change requires the downstream projects to rewrite all
> their
> >>>> rules. Not only does this require time to understand the new approach,
> >>> but
> >>>> it may also compromise the correctness of the downstream optimizer
> >>> because
> >>>> the regression tracking in query optimizers is not trivial.
> >>>>
> >>>> I had the privilege to try all three approaches, and I cannot get rid
> >> of
> >>>> the feeling that every new approach is more complicated than the
> >> previous
> >>>> one. I understand that this is a highly subjective statement, but when
> >> I
> >>>> just started using Apache Calcite and knew very little about it, I was
> >>> able
> >>>> to write rule patterns by simply looking at the IDE JavaDoc pop-ups
> and
> >>>> code completion. When the RuleConfig was introduced, every new rule
> >>> always
> >>>> required me to look at some other rule as an example, yet it was
> >> doable.
> >>>> Now we also need to configure the project build system to write a
> >> single
> >>>> custom rule.
> >>>>
> >>>> At the same time, a significant fraction of the rules are pretty
> >> simple.
> >>>> E.g., "operator A on top of operator B". If some additional
> >> configuration
> >>>> is required, it could be added via plain rules fields, because at the
> >> end
> >>>> of the day the rule instance is not more than a plain Java object.
> >>>>
> >>>> A good example is the FilterProjectTransposeRule. What now takes tens
> >> of
> >>>> lines of code in the Config subclass [1] (that you hardly could write
> >>>> without a reference example), and ~500 LOC in the generated code that
> >> you
> >>>> get through additional plugin configuration [2] in your build system,
> >>> could
> >>>> have been expressed in a dozen lines of code [3] in Apache Calcite
> >>> 1.22.0.
> >>>>
> >>>> My question is - are we sure we are going in the right direction in
> >> terms
> >>>> of complexity and the entry bar for the newcomers? Wouldn't it be
> >> better
> >>> to
> >>>> follow the 80/20 rule, when simple rules could be easily created
> >>>> programmatically with no external dependencies, while more advanced
> >>>> facilities like Immutables are used only for the complex rules?
> >>>>
> >>>> Regards,
> >>>> Vladimir.
> >>>>
> >>>> [1]
> >>>>
> >>>>
> >>>
> >>
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L208-L260
> >>>> [2]
> >>>>
> >>>>
> >>>
> >>
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/build.gradle.kts#L215-L224
> >>>> [3]
> >>>>
> >>>>
> >>>
> >>
> https://github.com/apache/calcite/blob/calcite-1.22.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L99-L110
> >>>>
> >>>
> >>
>
>

-- 
This message contains confidential information and is intended only for the 
individuals named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and 
delete this e-mail from your system. E-mail transmission cannot be 
guaranteed to be secure or error-free as information could be intercepted, 
corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. 
The sender therefore does not accept liability for any errors or omissions 
in the contents of this message, which arise as a result of e-mail 
transmission. If verification is required, please request a hard-copy 
version. -Hazelcast

Re: Changes to the rule pattern interface

Posted by Julian Hyde <jh...@gmail.com>.
1. I agree with Jacques. Thanks, Vladimir, for asking the question ‘is progress really progress?’. We may not back out these changes (probably won’t) but it will inform future discussions. It’s valid to say ‘when did X as part of the rule refactoring, that didn’t deliver on promises’.

2. Thanks, Thomas, for sending the link to the original discussion. It is still my position that many more people will use existing rules than write their own.

3. We have effectively created a DSL [1] (especially for operand patterns [2]) embedded in Java. Java 8 is not a great language for writing DSLs: Kotlin, Scala, Lisp and even later versions of Java are superior. But a design option we should consider is a DSL with its own syntax and parser. CockroachDB’s OptGen [3] is an example of this, and there is much to like.

Compared to DSLs embedded in other languages, languages with their own parser are more intuitive for users, and give better error messages. It is also possible to support multiple versions simultaneously. 

Julian

[1] https://en.wikipedia.org/wiki/Domain-specific_language <https://en.wikipedia.org/wiki/Domain-specific_language> 

[2] https://issues.apache.org/jira/browse/CALCITE-3923 <https://issues.apache.org/jira/browse/CALCITE-3923>

[3] https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optgen/lang/doc.go <https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optgen/lang/doc.go> 


> On Apr 13, 2022, at 9:56 AM, Jacques Nadeau <ja...@apache.org> wrote:
> 
> Vladimir,
> 
> It's always good to ask these kinds of questions. e.g. "Is progress really
> progress?"
> 
> I think moving to a configuration object was a substantial improvement.
> Until Java has default parameters and named parameters, I don't know of
> another good way to make rules sufficiently configurable for a wide range
> of users.
> 
> To me, one of the main benefits of the new configuration objects is a much
> less frequent need to subclass a rule. Back when we built original Drill
> and Dremio code, we were constantly pained by the need to control more
> pieces of a rule than were exposed by public constructors. When trying to
> introduce those options, we had to both: (1) update calcite and wait for a
> release cycle and (2) create another constructor overload, sometimes
> leading to a large number of overloads. The config object almost entirely
> eliminated this challenge.
> 
> I do agree that having config objects is more complex for the casual user
> so in some ways we're adding advanced functionality at the potential cost
> of usability for new users. That's not something great. I'm not sure it's
> really bad but it is something to be aware of. Having default configs for
> rules seems to make this a rather small additional challenge.
> 
> For your comments on downstream projects and the examples you give: I think
> that sounds more like advanced usage. My experience is somewhat different
> from yours: these changes have been net positive on the stuff I've been
> working on.
> 
> Two last thoughts wrt downstream projects:
> 
>   - I think it is important to separate how to create config objects from
>   the way that Calcite does it. We did change from proxies to immutables
>   within Caclite but neither has ever been the required way to implement
>   config objects. Config objects are always defined as plain java interfaces.
>   If a particular subproject wants to expose configuration in a completely
>   different way, I don't know of any constraint the rules interface forces.
>   - A nice thing that was done with the initial implementation is now of
>   the configuration concepts are exposed on the RelOptRule abstract class. A
>   user must opt in to RelRule for them to have to worry about any of these
>   concerns.
> 
> 
> In working with the config concepts and extensibility, I do think a couple
> of things are relatively painful. I don't
> 
>   1. The definition of configs as interfaces inside the rules causes some
>   issues with static initialization order. I think Julian even mentioned some
>   of this in one of the original docs config docs.
>   2. The generics of RelRule specifically can be challenging when extended
>   rules in some cases due to how generics and inheritance work.
>   3. The class definition of the config interfaces are not
>   self-referencing (one of my shelved reworks actually modified to
>   support this) meaning that there are a lot of ugly (my subjective opinion)
>   uses of Config.as() in the code. This last one could be resolved by
>   exposing the concrete builders that Immutables generates but for at least
>   the initial changes, I wanted to avoid exposing these as new public
>   interfaces (and thus all the immutables classes are marked package private).
>   4. Merging construction and use feels like an anti-pattern (my
>   subjective opinion). The most common patterns I've seen treat a builder
>   separate from an immutable constructed object (and have something akin to a
>   toBuilder() method to take an immutable config back to a builder). In the
>   Calcite config, these two concepts are merged. In some ways it makes things
>   simpler for trivial cases. However,  it is less formal and causes pain when
>   you have required properties that have no defaults since there is no formal
>   model for "I'm done building, now check everything is complete". This means
>   in several places we have to put default values that are actually invalid
>   rather than just rely on a builder's build validation step.
> 
> One other note, I think if someone is working in Java 14+ (records) or
> Kotlin, there are also several easy ways to produce config impls that are
> easy to use (without proxies and/or immutables).
> 
> 
> 
> 
> On Tue, Apr 12, 2022 at 9:29 AM Thomas Rebele <tr...@tibco.com.invalid>
> wrote:
> 
>> Hello,
>> 
>> The reasons for the planner rules configuration can be found here:
>> CALCITE-3923 <https://issues.apache.org/jira/browse/CALCITE-3923>. See
>> also
>> the email thread [DISCUSS] Refactor how planner rules are parameterized
>> <
>> https://lists.apache.org/thread.html/rfdf6f9b7821988bdd92b0377e3d293443a6376f4773c4c658c891cf9%40%3Cdev.calcite.apache.org%3E
>>> 
>> .
>> 
>> Cordialement / Best Regards,
>> *Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com
>> 
>> 
>> On Tue, Apr 12, 2022 at 6:10 PM Gavin Ray <ra...@gmail.com> wrote:
>> 
>>> I don't have any weight behind my opinion or experience,
>>> but anything that lowers the barrier to entry to Calcite for newcomers
>> is a
>>> huge win in my mind.
>>> 
>>> I assume the reason for the changes was because codegen improved
>>> performance?
>>> 
>>> Could it make sense to allow both options, the easy/less-performant way
>> for
>>> people who want to experiment and learn the ropes,
>>> and the codegen path for productionizing the final rules you come up
>> with?
>>> 
>>> Or does this make matters worse, trying to support two API's
>>> 
>>> On Tue, Apr 12, 2022 at 6:25 AM Vladimir Ozerov <pp...@gmail.com>
>>> wrote:
>>> 
>>>> Hi folks,
>>>> 
>>>> Rules are an essential part of the Calcite-based query optimizers. A
>>>> typical optimizer may require dozens of custom rules that are created
>> by
>>>> extending some Apache Calcite interfaces.
>>>> 
>>>> During the last two years, there were two major revisions of how rules
>>> are
>>>> created:
>>>> 
>>>>   1. In early 1.2x versions, the typical approach was to use
>>>>   RelOptRuleOperand with a set of helper methods in a builder-like
>>>>   pattern.
>>>>   2. Then, we switched to the runtime code generation.
>>>>   3. Finally, we switched to the compile-time code generation with the
>>>>   Immutables framework.
>>>> 
>>>> Every such change requires the downstream projects to rewrite all their
>>>> rules. Not only does this require time to understand the new approach,
>>> but
>>>> it may also compromise the correctness of the downstream optimizer
>>> because
>>>> the regression tracking in query optimizers is not trivial.
>>>> 
>>>> I had the privilege to try all three approaches, and I cannot get rid
>> of
>>>> the feeling that every new approach is more complicated than the
>> previous
>>>> one. I understand that this is a highly subjective statement, but when
>> I
>>>> just started using Apache Calcite and knew very little about it, I was
>>> able
>>>> to write rule patterns by simply looking at the IDE JavaDoc pop-ups and
>>>> code completion. When the RuleConfig was introduced, every new rule
>>> always
>>>> required me to look at some other rule as an example, yet it was
>> doable.
>>>> Now we also need to configure the project build system to write a
>> single
>>>> custom rule.
>>>> 
>>>> At the same time, a significant fraction of the rules are pretty
>> simple.
>>>> E.g., "operator A on top of operator B". If some additional
>> configuration
>>>> is required, it could be added via plain rules fields, because at the
>> end
>>>> of the day the rule instance is not more than a plain Java object.
>>>> 
>>>> A good example is the FilterProjectTransposeRule. What now takes tens
>> of
>>>> lines of code in the Config subclass [1] (that you hardly could write
>>>> without a reference example), and ~500 LOC in the generated code that
>> you
>>>> get through additional plugin configuration [2] in your build system,
>>> could
>>>> have been expressed in a dozen lines of code [3] in Apache Calcite
>>> 1.22.0.
>>>> 
>>>> My question is - are we sure we are going in the right direction in
>> terms
>>>> of complexity and the entry bar for the newcomers? Wouldn't it be
>> better
>>> to
>>>> follow the 80/20 rule, when simple rules could be easily created
>>>> programmatically with no external dependencies, while more advanced
>>>> facilities like Immutables are used only for the complex rules?
>>>> 
>>>> Regards,
>>>> Vladimir.
>>>> 
>>>> [1]
>>>> 
>>>> 
>>> 
>> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L208-L260
>>>> [2]
>>>> 
>>>> 
>>> 
>> https://github.com/apache/calcite/blob/calcite-1.30.0/core/build.gradle.kts#L215-L224
>>>> [3]
>>>> 
>>>> 
>>> 
>> https://github.com/apache/calcite/blob/calcite-1.22.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L99-L110
>>>> 
>>> 
>> 


Re: Changes to the rule pattern interface

Posted by Jacques Nadeau <ja...@apache.org>.
Vladimir,

It's always good to ask these kinds of questions. e.g. "Is progress really
progress?"

I think moving to a configuration object was a substantial improvement.
Until Java has default parameters and named parameters, I don't know of
another good way to make rules sufficiently configurable for a wide range
of users.

To me, one of the main benefits of the new configuration objects is a much
less frequent need to subclass a rule. Back when we built original Drill
and Dremio code, we were constantly pained by the need to control more
pieces of a rule than were exposed by public constructors. When trying to
introduce those options, we had to both: (1) update calcite and wait for a
release cycle and (2) create another constructor overload, sometimes
leading to a large number of overloads. The config object almost entirely
eliminated this challenge.

I do agree that having config objects is more complex for the casual user
so in some ways we're adding advanced functionality at the potential cost
of usability for new users. That's not something great. I'm not sure it's
really bad but it is something to be aware of. Having default configs for
rules seems to make this a rather small additional challenge.

For your comments on downstream projects and the examples you give: I think
that sounds more like advanced usage. My experience is somewhat different
from yours: these changes have been net positive on the stuff I've been
working on.

Two last thoughts wrt downstream projects:

   - I think it is important to separate how to create config objects from
   the way that Calcite does it. We did change from proxies to immutables
   within Caclite but neither has ever been the required way to implement
   config objects. Config objects are always defined as plain java interfaces.
   If a particular subproject wants to expose configuration in a completely
   different way, I don't know of any constraint the rules interface forces.
   - A nice thing that was done with the initial implementation is now of
   the configuration concepts are exposed on the RelOptRule abstract class. A
   user must opt in to RelRule for them to have to worry about any of these
   concerns.


In working with the config concepts and extensibility, I do think a couple
of things are relatively painful. I don't

   1. The definition of configs as interfaces inside the rules causes some
   issues with static initialization order. I think Julian even mentioned some
   of this in one of the original docs config docs.
   2. The generics of RelRule specifically can be challenging when extended
   rules in some cases due to how generics and inheritance work.
   3. The class definition of the config interfaces are not
   self-referencing (one of my shelved reworks actually modified to
   support this) meaning that there are a lot of ugly (my subjective opinion)
   uses of Config.as() in the code. This last one could be resolved by
   exposing the concrete builders that Immutables generates but for at least
   the initial changes, I wanted to avoid exposing these as new public
   interfaces (and thus all the immutables classes are marked package private).
   4. Merging construction and use feels like an anti-pattern (my
   subjective opinion). The most common patterns I've seen treat a builder
   separate from an immutable constructed object (and have something akin to a
   toBuilder() method to take an immutable config back to a builder). In the
   Calcite config, these two concepts are merged. In some ways it makes things
   simpler for trivial cases. However,  it is less formal and causes pain when
   you have required properties that have no defaults since there is no formal
   model for "I'm done building, now check everything is complete". This means
   in several places we have to put default values that are actually invalid
   rather than just rely on a builder's build validation step.

One other note, I think if someone is working in Java 14+ (records) or
Kotlin, there are also several easy ways to produce config impls that are
easy to use (without proxies and/or immutables).




On Tue, Apr 12, 2022 at 9:29 AM Thomas Rebele <tr...@tibco.com.invalid>
wrote:

> Hello,
>
> The reasons for the planner rules configuration can be found here:
> CALCITE-3923 <https://issues.apache.org/jira/browse/CALCITE-3923>. See
> also
> the email thread [DISCUSS] Refactor how planner rules are parameterized
> <
> https://lists.apache.org/thread.html/rfdf6f9b7821988bdd92b0377e3d293443a6376f4773c4c658c891cf9%40%3Cdev.calcite.apache.org%3E
> >
> .
>
> Cordialement / Best Regards,
> *Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com
>
>
> On Tue, Apr 12, 2022 at 6:10 PM Gavin Ray <ra...@gmail.com> wrote:
>
> > I don't have any weight behind my opinion or experience,
> > but anything that lowers the barrier to entry to Calcite for newcomers
> is a
> > huge win in my mind.
> >
> > I assume the reason for the changes was because codegen improved
> > performance?
> >
> > Could it make sense to allow both options, the easy/less-performant way
> for
> > people who want to experiment and learn the ropes,
> > and the codegen path for productionizing the final rules you come up
> with?
> >
> > Or does this make matters worse, trying to support two API's
> >
> > On Tue, Apr 12, 2022 at 6:25 AM Vladimir Ozerov <pp...@gmail.com>
> > wrote:
> >
> > > Hi folks,
> > >
> > > Rules are an essential part of the Calcite-based query optimizers. A
> > > typical optimizer may require dozens of custom rules that are created
> by
> > > extending some Apache Calcite interfaces.
> > >
> > > During the last two years, there were two major revisions of how rules
> > are
> > > created:
> > >
> > >    1. In early 1.2x versions, the typical approach was to use
> > >    RelOptRuleOperand with a set of helper methods in a builder-like
> > >    pattern.
> > >    2. Then, we switched to the runtime code generation.
> > >    3. Finally, we switched to the compile-time code generation with the
> > >    Immutables framework.
> > >
> > > Every such change requires the downstream projects to rewrite all their
> > > rules. Not only does this require time to understand the new approach,
> > but
> > > it may also compromise the correctness of the downstream optimizer
> > because
> > > the regression tracking in query optimizers is not trivial.
> > >
> > > I had the privilege to try all three approaches, and I cannot get rid
> of
> > > the feeling that every new approach is more complicated than the
> previous
> > > one. I understand that this is a highly subjective statement, but when
> I
> > > just started using Apache Calcite and knew very little about it, I was
> > able
> > > to write rule patterns by simply looking at the IDE JavaDoc pop-ups and
> > > code completion. When the RuleConfig was introduced, every new rule
> > always
> > > required me to look at some other rule as an example, yet it was
> doable.
> > > Now we also need to configure the project build system to write a
> single
> > > custom rule.
> > >
> > > At the same time, a significant fraction of the rules are pretty
> simple.
> > > E.g., "operator A on top of operator B". If some additional
> configuration
> > > is required, it could be added via plain rules fields, because at the
> end
> > > of the day the rule instance is not more than a plain Java object.
> > >
> > > A good example is the FilterProjectTransposeRule. What now takes tens
> of
> > > lines of code in the Config subclass [1] (that you hardly could write
> > > without a reference example), and ~500 LOC in the generated code that
> you
> > > get through additional plugin configuration [2] in your build system,
> > could
> > > have been expressed in a dozen lines of code [3] in Apache Calcite
> > 1.22.0.
> > >
> > > My question is - are we sure we are going in the right direction in
> terms
> > > of complexity and the entry bar for the newcomers? Wouldn't it be
> better
> > to
> > > follow the 80/20 rule, when simple rules could be easily created
> > > programmatically with no external dependencies, while more advanced
> > > facilities like Immutables are used only for the complex rules?
> > >
> > > Regards,
> > > Vladimir.
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L208-L260
> > > [2]
> > >
> > >
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/build.gradle.kts#L215-L224
> > > [3]
> > >
> > >
> >
> https://github.com/apache/calcite/blob/calcite-1.22.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L99-L110
> > >
> >
>

Re: Changes to the rule pattern interface

Posted by Thomas Rebele <tr...@tibco.com.INVALID>.
Hello,

The reasons for the planner rules configuration can be found here:
CALCITE-3923 <https://issues.apache.org/jira/browse/CALCITE-3923>. See also
the email thread [DISCUSS] Refactor how planner rules are parameterized
<https://lists.apache.org/thread.html/rfdf6f9b7821988bdd92b0377e3d293443a6376f4773c4c658c891cf9%40%3Cdev.calcite.apache.org%3E>
.

Cordialement / Best Regards,
*Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com


On Tue, Apr 12, 2022 at 6:10 PM Gavin Ray <ra...@gmail.com> wrote:

> I don't have any weight behind my opinion or experience,
> but anything that lowers the barrier to entry to Calcite for newcomers is a
> huge win in my mind.
>
> I assume the reason for the changes was because codegen improved
> performance?
>
> Could it make sense to allow both options, the easy/less-performant way for
> people who want to experiment and learn the ropes,
> and the codegen path for productionizing the final rules you come up with?
>
> Or does this make matters worse, trying to support two API's
>
> On Tue, Apr 12, 2022 at 6:25 AM Vladimir Ozerov <pp...@gmail.com>
> wrote:
>
> > Hi folks,
> >
> > Rules are an essential part of the Calcite-based query optimizers. A
> > typical optimizer may require dozens of custom rules that are created by
> > extending some Apache Calcite interfaces.
> >
> > During the last two years, there were two major revisions of how rules
> are
> > created:
> >
> >    1. In early 1.2x versions, the typical approach was to use
> >    RelOptRuleOperand with a set of helper methods in a builder-like
> >    pattern.
> >    2. Then, we switched to the runtime code generation.
> >    3. Finally, we switched to the compile-time code generation with the
> >    Immutables framework.
> >
> > Every such change requires the downstream projects to rewrite all their
> > rules. Not only does this require time to understand the new approach,
> but
> > it may also compromise the correctness of the downstream optimizer
> because
> > the regression tracking in query optimizers is not trivial.
> >
> > I had the privilege to try all three approaches, and I cannot get rid of
> > the feeling that every new approach is more complicated than the previous
> > one. I understand that this is a highly subjective statement, but when I
> > just started using Apache Calcite and knew very little about it, I was
> able
> > to write rule patterns by simply looking at the IDE JavaDoc pop-ups and
> > code completion. When the RuleConfig was introduced, every new rule
> always
> > required me to look at some other rule as an example, yet it was doable.
> > Now we also need to configure the project build system to write a single
> > custom rule.
> >
> > At the same time, a significant fraction of the rules are pretty simple.
> > E.g., "operator A on top of operator B". If some additional configuration
> > is required, it could be added via plain rules fields, because at the end
> > of the day the rule instance is not more than a plain Java object.
> >
> > A good example is the FilterProjectTransposeRule. What now takes tens of
> > lines of code in the Config subclass [1] (that you hardly could write
> > without a reference example), and ~500 LOC in the generated code that you
> > get through additional plugin configuration [2] in your build system,
> could
> > have been expressed in a dozen lines of code [3] in Apache Calcite
> 1.22.0.
> >
> > My question is - are we sure we are going in the right direction in terms
> > of complexity and the entry bar for the newcomers? Wouldn't it be better
> to
> > follow the 80/20 rule, when simple rules could be easily created
> > programmatically with no external dependencies, while more advanced
> > facilities like Immutables are used only for the complex rules?
> >
> > Regards,
> > Vladimir.
> >
> > [1]
> >
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L208-L260
> > [2]
> >
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/build.gradle.kts#L215-L224
> > [3]
> >
> >
> https://github.com/apache/calcite/blob/calcite-1.22.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L99-L110
> >
>

Re: Changes to the rule pattern interface

Posted by Gavin Ray <ra...@gmail.com>.
I don't have any weight behind my opinion or experience,
but anything that lowers the barrier to entry to Calcite for newcomers is a
huge win in my mind.

I assume the reason for the changes was because codegen improved
performance?

Could it make sense to allow both options, the easy/less-performant way for
people who want to experiment and learn the ropes,
and the codegen path for productionizing the final rules you come up with?

Or does this make matters worse, trying to support two API's

On Tue, Apr 12, 2022 at 6:25 AM Vladimir Ozerov <pp...@gmail.com> wrote:

> Hi folks,
>
> Rules are an essential part of the Calcite-based query optimizers. A
> typical optimizer may require dozens of custom rules that are created by
> extending some Apache Calcite interfaces.
>
> During the last two years, there were two major revisions of how rules are
> created:
>
>    1. In early 1.2x versions, the typical approach was to use
>    RelOptRuleOperand with a set of helper methods in a builder-like
>    pattern.
>    2. Then, we switched to the runtime code generation.
>    3. Finally, we switched to the compile-time code generation with the
>    Immutables framework.
>
> Every such change requires the downstream projects to rewrite all their
> rules. Not only does this require time to understand the new approach, but
> it may also compromise the correctness of the downstream optimizer because
> the regression tracking in query optimizers is not trivial.
>
> I had the privilege to try all three approaches, and I cannot get rid of
> the feeling that every new approach is more complicated than the previous
> one. I understand that this is a highly subjective statement, but when I
> just started using Apache Calcite and knew very little about it, I was able
> to write rule patterns by simply looking at the IDE JavaDoc pop-ups and
> code completion. When the RuleConfig was introduced, every new rule always
> required me to look at some other rule as an example, yet it was doable.
> Now we also need to configure the project build system to write a single
> custom rule.
>
> At the same time, a significant fraction of the rules are pretty simple.
> E.g., "operator A on top of operator B". If some additional configuration
> is required, it could be added via plain rules fields, because at the end
> of the day the rule instance is not more than a plain Java object.
>
> A good example is the FilterProjectTransposeRule. What now takes tens of
> lines of code in the Config subclass [1] (that you hardly could write
> without a reference example), and ~500 LOC in the generated code that you
> get through additional plugin configuration [2] in your build system, could
> have been expressed in a dozen lines of code [3] in Apache Calcite 1.22.0.
>
> My question is - are we sure we are going in the right direction in terms
> of complexity and the entry bar for the newcomers? Wouldn't it be better to
> follow the 80/20 rule, when simple rules could be easily created
> programmatically with no external dependencies, while more advanced
> facilities like Immutables are used only for the complex rules?
>
> Regards,
> Vladimir.
>
> [1]
>
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L208-L260
> [2]
>
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/build.gradle.kts#L215-L224
> [3]
>
> https://github.com/apache/calcite/blob/calcite-1.22.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L99-L110
>