You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Julian Hyde <jh...@apache.org> on 2015/09/23 02:14:43 UTC

Transitioning rules from ProjectFactory etc. to RelBuilder

I have started work on
https://issues.apache.org/jira/browse/CALCITE-828 and my
work-in-progress is in
https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder.

This will be a big change to teams such as Hive and Drill. Now would
be a good time to chime in.

The plan is that every rule instance has a ProtoRelBuilder field from
which a RelBuilder can be created to be used in an onMatch method. The
RelBuilder contains factories for every kind of RelNode, so we won't
have to keep plumbing new factories in to existing rules.

There will be other advantages. RelBuilder is an easier and more
concise way of creating relational expressions. It does some useful
stuff for free, like flattening ANDs and eliminating identity
projections. I'm seeing the volume of code decrease and a few minor
plan improvements.

I haven't yet found a good way to deal with the pattern where if, say,
ProjectFactory is null the rule is to use Project.copy to create a
Project of the same type.

I'm keeping & deprecating the old rule constructors that take a
variety of XxxFactory arguments.

As always, we try to keep API compatibility and mark the old API

  @Deprecated // to be removed before 2.0

but the deprecated APIs are no longer tested and are likely to be
flaky. We strongly suggest that you fix calls to deprecated APIs as
soon as you upgrade to a more recent version of Calcite.

Julian

Re: Transitioning rules from ProjectFactory etc. to RelBuilder

Posted by Jesus Camacho Rodriguez <jc...@hortonworks.com>.
I created the pull request at
https://github.com/apache/incubator-calcite/pull/139

I will take a look at it later today.

Thanks,
Jesús


On 9/28/15, 7:57 AM, "Julian Hyde" <jh...@apache.org> wrote:

>I have completed work on this change. Can someone please review and +1?
>
>https://github.com/julianhyde/incubator-calcite/tree/828-rule-builder
>
>https://issues.apache.org/jira/browse/CALCITE-828
>
>Julian
>
>> On Sep 25, 2015, at 1:07 PM, Julian Hyde <jh...@apache.org> wrote:
>> 
>> The problem of making rules extensible is not straightforward. The
>>solution has to be a mixture of API changes and best practices. In this
>>change I am making API changes and fixing up the existing rules to
>>match, but I am not revisiting the rules to introduce best practices.
>> 
>> The API change is that every RelOptRule has a relBuilderFactory. We
>>strongly recommend that if a rule needs to create a relational
>>expression it either matches the type of its inputs (calling the copy
>>method) or it uses the factory. That means removing calls in rules to
>>methods such as LogicalProject.create or RelOptUtil.project. I¹ve done
>>quite a lot of this as part of this change.
>> 
>> The best practice is that a rule¹s constructor should expose
>>RelBuilderFactory and Class parameters so that the rule can be re-used.
>>Quite a few rules do this, but a lot do not. Quite a few rules are
>>re-usable in principle, but have not been tested against other RelNode
>>sub-classes, and quite a few rules are just not re-usable. So I propose
>>that we enforce best practices when reviewing contributions but don¹t
>>spend a lot of effort upgrading the existing rules.
>> 
>> In some cases (e.g. WindowedAggRelSplitter in ProjectToWindowRule) the
>>rule uses the default factory but I¹ve changed the code to use a
>>RelBuilder. So, there¹s no change in behavior, but if someone were to
>>add a RelBuilderFactory to the rule¹s constructor, they would reap the
>>benefit.
>> 
>> Julian
>> 
>> 
>> 
>>> On Sep 24, 2015, at 6:29 AM, Jesus Camacho Rodriguez
>>><jc...@hortonworks.com> wrote:
>>> 
>>> In our case, we had to do some copy-pasting of rules in the past in
>>>Hive.
>>> Although the reasons have been already discussed in the thread, just to
>>> confirm:
>>> - Rules that use a default factory to create operators e.g.
>>> LogicalProject, instead of being able to provide our own factory. An
>>> example is AggregateProjectMergeRule. (Instead of copy/pasting the
>>>rule,
>>> we could apply another rule to transform LogicalProject into
>>>HiveProject,
>>> but this is not the right way either).
>>> - Rules that match directly on Logical instances of the operators. An
>>> example is SortProjectTransposeRule.
>>> 
>>> I see that with the solutions that we are discussing, those two
>>>problems
>>> would be solved.
>>> 
>>> About including the matching classes in RelBuilder, it can be a good
>>>idea
>>> if we still make the rules extensible i.e. we can provide our own
>>>matching
>>> classes if we want to. Assume a project that might subclass the Project
>>> operator into two different operators, then would like to match a
>>>certain
>>> rule only to one of them. Further, I guess using the copy methods
>>>provided
>>> by the operators when possible, instead of calling directly to the
>>> Factory, would make the rules more extensible for these cases.
>>> 
>>> --
>>> Jesús
>>> 
>>> 
>>> On 9/23/15, 11:08 PM, "Jinfeng Ni" <ji...@gmail.com> wrote:
>>> 
>>>> I did a quick check of Drill side rule, and did not find the
>>>>copy/paste
>>>> code
>>>> simply because we have to specify the rule's operands.
>>>> 
>>>> The reason I'm asking this is because many Calcite rules use core
>>>> Filter.class,
>>>> Project.class etc.  Currently, Drill simply uses some of Calcite rules
>>>> directly.
>>>> However, that means the rule will fire against both LogicalFilter, and
>>>> DrillFilter,
>>>> which seems to be redundant.  I understand we could extend Calcite
>>>>rule to
>>>> make sure it matches only one of them. But I'm looking for a bit more
>>>> direct way
>>>> to do that.
>>>> 
>>>> For instance, in [2] we uses Calcite's FilterMergeRule directly, which
>>>> would match
>>>> both LogicalFilter and DrillFilter.
>>>> 
>>>> Also, I feel the rule patten specification can be regarded as the
>>>> input type for the
>>>> rule, while RelBuilder is to address the output type from the rule's
>>>> firing. That's why
>>>> I feel they had better be specified together in some way.
>>>> 
>>>> 
>>>> [1] 
>>>> 
>>>>https://github.com/apache/incubator-calcite/blob/master/core/src/main/j
>>>>ava
>>>> /org/apache/calcite/rel/rules/FilterMergeRule.java#L46
>>>> 
>>>> [2] 
>>>> 
>>>>https://github.com/apache/drill/blob/master/exec/java-exec/src/main/jav
>>>>a/o
>>>> rg/apache/drill/exec/planner/logical/DrillRuleSets.java#L121
>>>> 
>>>> On Wed, Sep 23, 2015 at 2:17 PM, Jacques Nadeau <ja...@apache.org>
>>>> wrote:
>>>>> I think this would be a good place to have a little bit more design,
>>>>> discussion and validation. Julian, it seems like you've done a lot of
>>>>> thinking and it would be good to have a better understanding of the
>>>>> choices
>>>>> you're making. (Your response below seems to be just scratching the
>>>>> surface).
>>>>> 
>>>>> If the goal is to minimize copy and paste rules, it would be useful
>>>>>to
>>>>> test
>>>>> the proposal against example rules that have been copy/pasted and see
>>>>> what
>>>>> other patterns might emerge. At first glance, I'm inclined more with
>>>>> Jinfeng's proposal regarding operands than yours. Your decision about
>>>>> copy/not behavior should also be discussed further.
>>>>> 
>>>>> Drill (Jinfeng?) and Hive (Jesus?) guys, can you provide an example
>>>>>set
>>>>> of
>>>>> rules that we've needed to copy/paste due to constraints around rule
>>>>> configuration to make sure the outcome of these changes is as
>>>>>desired?
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Sep 23, 2015 at 1:28 PM, Julian Hyde <jh...@apache.org>
>>>>>wrote:
>>>>> 
>>>>>> Great question.
>>>>>> 
>>>>>> For that case, I think you should carry on making a separate rule
>>>>>> instance with operands that contain classes. RelBuilder is a good
>>>>>> single point to put a lot of the configuration, but it is taking it
>>>>>> too far to put the target classes in there. I don't want its role as
>>>>>> "point for configuration for rules" to overwhelm its main role "an
>>>>>> easy way to build relational algebra expressions".
>>>>>> 
>>>>>> And besides, operands are powerful and descriptive, and because the
>>>>>> engine understands them it can short-cut which rules are fired.
>>>>>> 
>>>>>> For what it's worth, there's another piece of configuration that I
>>>>>> considered putting into RelBuilder but decided not to. That is,
>>>>>> whether a rule instance should call RelNode.copy to automatically
>>>>>> create RelNodes the same type as its inputs. See I introduced
>>>>>> copyFilter and copyProject arguments in FilterProjectTransposeRule:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>https://github.com/julianhyde/incubator-calcite/blob/8099cf3151a462b0
>>>>>>6a4
>>>>>> 
>>>>>>cbb1c0cad8f7e90881ca5/core/src/main/java/org/apache/calcite/rel/rules
>>>>>>/Fi
>>>>>> lterProjectTransposeRule.java#L67
>>>>>> 
>>>>>> rather than introduce new methods RelBuilder.shouldCopyProject() and
>>>>>> .shouldCopyFilter().
>>>>>> 
>>>>>> And by the way I also considered adding methods to RelBuilderFactory
>>>>>> (formerly known as ProtoRelBuilder). Adding methods there would not
>>>>>> "pollute" RelBuilder as badly.
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> 
>>>>>> On Wed, Sep 23, 2015 at 12:18 PM, Jinfeng Ni <ji...@gmail.com>
>>>>>> wrote:
>>>>>>> +1.
>>>>>>> 
>>>>>>> I have one question.  RelBuilder seems to be used in onMatch()
>>>>>> method,
>>>>>>> when a particular type of RelNode has to be created.  What about
>>>>>>> RelOptRuleOperand in the constructor of rule? Let's say if we want
>>>>>>>to
>>>>>>> rule to match only DrillProject, or HiveFilter. The current way
>>>>>> seems to
>>>>>>> be that we extend the rule by providing a different
>>>>>> RelOptRuleOperand.
>>>>>>> 
>>>>>>> Can RelBuilder also have getFilterClass(), getProjectClass() etc,
>>>>>> which
>>>>>>> would be used to specify the rule patten matching?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Sep 22, 2015 at 6:41 PM, Jacques Nadeau
>>>>>>><ja...@apache.org>
>>>>>> wrote:
>>>>>>>> Agree on the utility of this. +1.
>>>>>>>> 
>>>>>>>> On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran <
>>>>>>>> jpullokkaran@hortonworks.com> wrote:
>>>>>>>> 
>>>>>>>>> This is useful, though Hive will have to do some refactoring.
>>>>>>>>> 
>>>>>>>>> In past Hive had to copy bunch of rules to work around these
>>>>>> issues.
>>>>>>>>> Rules & relnodes for org.apache.calcite.rel.logical.* is a good
>>>>>> example of
>>>>>>>>> this.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> John
>>>>>>>>> 
>>>>>>>>> On 9/22/15, 5:14 PM, "Julian Hyde" <jh...@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>>> I have started work on
>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-828 and my
>>>>>>>>>> work-in-progress is in
>>>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>https://github.com/julianhyde/incubator-calcite/commits/828-rule-buil
>>>>>>der
>>>>>> .
>>>>>>>>>> 
>>>>>>>>>> This will be a big change to teams such as Hive and Drill. Now
>>>>>> would
>>>>>>>>>> be a good time to chime in.
>>>>>>>>>> 
>>>>>>>>>> The plan is that every rule instance has a ProtoRelBuilder field
>>>>>> from
>>>>>>>>>> which a RelBuilder can be created to be used in an onMatch
>>>>>> method. The
>>>>>>>>>> RelBuilder contains factories for every kind of RelNode, so we
>>>>>> won't
>>>>>>>>>> have to keep plumbing new factories in to existing rules.
>>>>>>>>>> 
>>>>>>>>>> There will be other advantages. RelBuilder is an easier and more
>>>>>>>>>> concise way of creating relational expressions. It does some
>>>>>> useful
>>>>>>>>>> stuff for free, like flattening ANDs and eliminating identity
>>>>>>>>>> projections. I'm seeing the volume of code decrease and a few
>>>>>> minor
>>>>>>>>>> plan improvements.
>>>>>>>>>> 
>>>>>>>>>> I haven't yet found a good way to deal with the pattern where
>>>>>>>>>>if,
>>>>>> say,
>>>>>>>>>> ProjectFactory is null the rule is to use Project.copy to
>>>>>>>>>>create a
>>>>>>>>>> Project of the same type.
>>>>>>>>>> 
>>>>>>>>>> I'm keeping & deprecating the old rule constructors that take a
>>>>>>>>>> variety of XxxFactory arguments.
>>>>>>>>>> 
>>>>>>>>>> As always, we try to keep API compatibility and mark the old API
>>>>>>>>>> 
>>>>>>>>>> @Deprecated // to be removed before 2.0
>>>>>>>>>> 
>>>>>>>>>> but the deprecated APIs are no longer tested and are likely to
>>>>>>>>>>be
>>>>>>>>>> flaky. We strongly suggest that you fix calls to deprecated APIs
>>>>>> as
>>>>>>>>>> soon as you upgrade to a more recent version of Calcite.
>>>>>>>>>> 
>>>>>>>>>> Julian
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
>
>


Re: Transitioning rules from ProjectFactory etc. to RelBuilder

Posted by Julian Hyde <jh...@apache.org>.
I have completed work on this change. Can someone please review and +1?

https://github.com/julianhyde/incubator-calcite/tree/828-rule-builder

https://issues.apache.org/jira/browse/CALCITE-828

Julian

> On Sep 25, 2015, at 1:07 PM, Julian Hyde <jh...@apache.org> wrote:
> 
> The problem of making rules extensible is not straightforward. The solution has to be a mixture of API changes and best practices. In this change I am making API changes and fixing up the existing rules to match, but I am not revisiting the rules to introduce best practices.
> 
> The API change is that every RelOptRule has a relBuilderFactory. We strongly recommend that if a rule needs to create a relational expression it either matches the type of its inputs (calling the copy method) or it uses the factory. That means removing calls in rules to methods such as LogicalProject.create or RelOptUtil.project. I’ve done quite a lot of this as part of this change.
> 
> The best practice is that a rule’s constructor should expose RelBuilderFactory and Class parameters so that the rule can be re-used. Quite a few rules do this, but a lot do not. Quite a few rules are re-usable in principle, but have not been tested against other RelNode sub-classes, and quite a few rules are just not re-usable. So I propose that we enforce best practices when reviewing contributions but don’t spend a lot of effort upgrading the existing rules.
> 
> In some cases (e.g. WindowedAggRelSplitter in ProjectToWindowRule) the rule uses the default factory but I’ve changed the code to use a RelBuilder. So, there’s no change in behavior, but if someone were to add a RelBuilderFactory to the rule’s constructor, they would reap the benefit.
> 
> Julian
> 
> 
> 
>> On Sep 24, 2015, at 6:29 AM, Jesus Camacho Rodriguez <jc...@hortonworks.com> wrote:
>> 
>> In our case, we had to do some copy-pasting of rules in the past in Hive.
>> Although the reasons have been already discussed in the thread, just to
>> confirm:
>> - Rules that use a default factory to create operators e.g.
>> LogicalProject, instead of being able to provide our own factory. An
>> example is AggregateProjectMergeRule. (Instead of copy/pasting the rule,
>> we could apply another rule to transform LogicalProject into HiveProject,
>> but this is not the right way either).
>> - Rules that match directly on Logical instances of the operators. An
>> example is SortProjectTransposeRule.
>> 
>> I see that with the solutions that we are discussing, those two problems
>> would be solved.
>> 
>> About including the matching classes in RelBuilder, it can be a good idea
>> if we still make the rules extensible i.e. we can provide our own matching
>> classes if we want to. Assume a project that might subclass the Project
>> operator into two different operators, then would like to match a certain
>> rule only to one of them. Further, I guess using the copy methods provided
>> by the operators when possible, instead of calling directly to the
>> Factory, would make the rules more extensible for these cases.
>> 
>> --
>> Jesús
>> 
>> 
>> On 9/23/15, 11:08 PM, "Jinfeng Ni" <ji...@gmail.com> wrote:
>> 
>>> I did a quick check of Drill side rule, and did not find the copy/paste
>>> code
>>> simply because we have to specify the rule's operands.
>>> 
>>> The reason I'm asking this is because many Calcite rules use core
>>> Filter.class,
>>> Project.class etc.  Currently, Drill simply uses some of Calcite rules
>>> directly.
>>> However, that means the rule will fire against both LogicalFilter, and
>>> DrillFilter,
>>> which seems to be redundant.  I understand we could extend Calcite rule to
>>> make sure it matches only one of them. But I'm looking for a bit more
>>> direct way
>>> to do that.
>>> 
>>> For instance, in [2] we uses Calcite's FilterMergeRule directly, which
>>> would match
>>> both LogicalFilter and DrillFilter.
>>> 
>>> Also, I feel the rule patten specification can be regarded as the
>>> input type for the
>>> rule, while RelBuilder is to address the output type from the rule's
>>> firing. That's why
>>> I feel they had better be specified together in some way.
>>> 
>>> 
>>> [1] 
>>> https://github.com/apache/incubator-calcite/blob/master/core/src/main/java
>>> /org/apache/calcite/rel/rules/FilterMergeRule.java#L46
>>> 
>>> [2] 
>>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/o
>>> rg/apache/drill/exec/planner/logical/DrillRuleSets.java#L121
>>> 
>>> On Wed, Sep 23, 2015 at 2:17 PM, Jacques Nadeau <ja...@apache.org>
>>> wrote:
>>>> I think this would be a good place to have a little bit more design,
>>>> discussion and validation. Julian, it seems like you've done a lot of
>>>> thinking and it would be good to have a better understanding of the
>>>> choices
>>>> you're making. (Your response below seems to be just scratching the
>>>> surface).
>>>> 
>>>> If the goal is to minimize copy and paste rules, it would be useful to
>>>> test
>>>> the proposal against example rules that have been copy/pasted and see
>>>> what
>>>> other patterns might emerge. At first glance, I'm inclined more with
>>>> Jinfeng's proposal regarding operands than yours. Your decision about
>>>> copy/not behavior should also be discussed further.
>>>> 
>>>> Drill (Jinfeng?) and Hive (Jesus?) guys, can you provide an example set
>>>> of
>>>> rules that we've needed to copy/paste due to constraints around rule
>>>> configuration to make sure the outcome of these changes is as desired?
>>>> 
>>>> 
>>>> 
>>>> On Wed, Sep 23, 2015 at 1:28 PM, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>>> Great question.
>>>>> 
>>>>> For that case, I think you should carry on making a separate rule
>>>>> instance with operands that contain classes. RelBuilder is a good
>>>>> single point to put a lot of the configuration, but it is taking it
>>>>> too far to put the target classes in there. I don't want its role as
>>>>> "point for configuration for rules" to overwhelm its main role "an
>>>>> easy way to build relational algebra expressions".
>>>>> 
>>>>> And besides, operands are powerful and descriptive, and because the
>>>>> engine understands them it can short-cut which rules are fired.
>>>>> 
>>>>> For what it's worth, there's another piece of configuration that I
>>>>> considered putting into RelBuilder but decided not to. That is,
>>>>> whether a rule instance should call RelNode.copy to automatically
>>>>> create RelNodes the same type as its inputs. See I introduced
>>>>> copyFilter and copyProject arguments in FilterProjectTransposeRule:
>>>>> 
>>>>> 
>>>>> 
>>>>> https://github.com/julianhyde/incubator-calcite/blob/8099cf3151a462b06a4
>>>>> cbb1c0cad8f7e90881ca5/core/src/main/java/org/apache/calcite/rel/rules/Fi
>>>>> lterProjectTransposeRule.java#L67
>>>>> 
>>>>> rather than introduce new methods RelBuilder.shouldCopyProject() and
>>>>> .shouldCopyFilter().
>>>>> 
>>>>> And by the way I also considered adding methods to RelBuilderFactory
>>>>> (formerly known as ProtoRelBuilder). Adding methods there would not
>>>>> "pollute" RelBuilder as badly.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>> On Wed, Sep 23, 2015 at 12:18 PM, Jinfeng Ni <ji...@gmail.com>
>>>>> wrote:
>>>>>> +1.
>>>>>> 
>>>>>> I have one question.  RelBuilder seems to be used in onMatch()
>>>>> method,
>>>>>> when a particular type of RelNode has to be created.  What about
>>>>>> RelOptRuleOperand in the constructor of rule? Let's say if we want to
>>>>>> rule to match only DrillProject, or HiveFilter. The current way
>>>>> seems to
>>>>>> be that we extend the rule by providing a different
>>>>> RelOptRuleOperand.
>>>>>> 
>>>>>> Can RelBuilder also have getFilterClass(), getProjectClass() etc,
>>>>> which
>>>>>> would be used to specify the rule patten matching?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tue, Sep 22, 2015 at 6:41 PM, Jacques Nadeau <ja...@apache.org>
>>>>> wrote:
>>>>>>> Agree on the utility of this. +1.
>>>>>>> 
>>>>>>> On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran <
>>>>>>> jpullokkaran@hortonworks.com> wrote:
>>>>>>> 
>>>>>>>> This is useful, though Hive will have to do some refactoring.
>>>>>>>> 
>>>>>>>> In past Hive had to copy bunch of rules to work around these
>>>>> issues.
>>>>>>>> Rules & relnodes for org.apache.calcite.rel.logical.* is a good
>>>>> example of
>>>>>>>> this.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> John
>>>>>>>> 
>>>>>>>> On 9/22/15, 5:14 PM, "Julian Hyde" <jh...@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> I have started work on
>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-828 and my
>>>>>>>>> work-in-progress is in
>>>>>>>>> 
>>>>> 
>>>>> https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder
>>>>> .
>>>>>>>>> 
>>>>>>>>> This will be a big change to teams such as Hive and Drill. Now
>>>>> would
>>>>>>>>> be a good time to chime in.
>>>>>>>>> 
>>>>>>>>> The plan is that every rule instance has a ProtoRelBuilder field
>>>>> from
>>>>>>>>> which a RelBuilder can be created to be used in an onMatch
>>>>> method. The
>>>>>>>>> RelBuilder contains factories for every kind of RelNode, so we
>>>>> won't
>>>>>>>>> have to keep plumbing new factories in to existing rules.
>>>>>>>>> 
>>>>>>>>> There will be other advantages. RelBuilder is an easier and more
>>>>>>>>> concise way of creating relational expressions. It does some
>>>>> useful
>>>>>>>>> stuff for free, like flattening ANDs and eliminating identity
>>>>>>>>> projections. I'm seeing the volume of code decrease and a few
>>>>> minor
>>>>>>>>> plan improvements.
>>>>>>>>> 
>>>>>>>>> I haven't yet found a good way to deal with the pattern where if,
>>>>> say,
>>>>>>>>> ProjectFactory is null the rule is to use Project.copy to create a
>>>>>>>>> Project of the same type.
>>>>>>>>> 
>>>>>>>>> I'm keeping & deprecating the old rule constructors that take a
>>>>>>>>> variety of XxxFactory arguments.
>>>>>>>>> 
>>>>>>>>> As always, we try to keep API compatibility and mark the old API
>>>>>>>>> 
>>>>>>>>> @Deprecated // to be removed before 2.0
>>>>>>>>> 
>>>>>>>>> but the deprecated APIs are no longer tested and are likely to be
>>>>>>>>> flaky. We strongly suggest that you fix calls to deprecated APIs
>>>>> as
>>>>>>>>> soon as you upgrade to a more recent version of Calcite.
>>>>>>>>> 
>>>>>>>>> Julian
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> 
>> 
> 


Re: Transitioning rules from ProjectFactory etc. to RelBuilder

Posted by Julian Hyde <jh...@apache.org>.
The problem of making rules extensible is not straightforward. The solution has to be a mixture of API changes and best practices. In this change I am making API changes and fixing up the existing rules to match, but I am not revisiting the rules to introduce best practices.

The API change is that every RelOptRule has a relBuilderFactory. We strongly recommend that if a rule needs to create a relational expression it either matches the type of its inputs (calling the copy method) or it uses the factory. That means removing calls in rules to methods such as LogicalProject.create or RelOptUtil.project. I’ve done quite a lot of this as part of this change.

The best practice is that a rule’s constructor should expose RelBuilderFactory and Class parameters so that the rule can be re-used. Quite a few rules do this, but a lot do not. Quite a few rules are re-usable in principle, but have not been tested against other RelNode sub-classes, and quite a few rules are just not re-usable. So I propose that we enforce best practices when reviewing contributions but don’t spend a lot of effort upgrading the existing rules.

In some cases (e.g. WindowedAggRelSplitter in ProjectToWindowRule) the rule uses the default factory but I’ve changed the code to use a RelBuilder. So, there’s no change in behavior, but if someone were to add a RelBuilderFactory to the rule’s constructor, they would reap the benefit.

Julian


 
> On Sep 24, 2015, at 6:29 AM, Jesus Camacho Rodriguez <jc...@hortonworks.com> wrote:
> 
> In our case, we had to do some copy-pasting of rules in the past in Hive.
> Although the reasons have been already discussed in the thread, just to
> confirm:
> - Rules that use a default factory to create operators e.g.
> LogicalProject, instead of being able to provide our own factory. An
> example is AggregateProjectMergeRule. (Instead of copy/pasting the rule,
> we could apply another rule to transform LogicalProject into HiveProject,
> but this is not the right way either).
> - Rules that match directly on Logical instances of the operators. An
> example is SortProjectTransposeRule.
> 
> I see that with the solutions that we are discussing, those two problems
> would be solved.
> 
> About including the matching classes in RelBuilder, it can be a good idea
> if we still make the rules extensible i.e. we can provide our own matching
> classes if we want to. Assume a project that might subclass the Project
> operator into two different operators, then would like to match a certain
> rule only to one of them. Further, I guess using the copy methods provided
> by the operators when possible, instead of calling directly to the
> Factory, would make the rules more extensible for these cases.
> 
> --
> Jesús
> 
> 
> On 9/23/15, 11:08 PM, "Jinfeng Ni" <ji...@gmail.com> wrote:
> 
>> I did a quick check of Drill side rule, and did not find the copy/paste
>> code
>> simply because we have to specify the rule's operands.
>> 
>> The reason I'm asking this is because many Calcite rules use core
>> Filter.class,
>> Project.class etc.  Currently, Drill simply uses some of Calcite rules
>> directly.
>> However, that means the rule will fire against both LogicalFilter, and
>> DrillFilter,
>> which seems to be redundant.  I understand we could extend Calcite rule to
>> make sure it matches only one of them. But I'm looking for a bit more
>> direct way
>> to do that.
>> 
>> For instance, in [2] we uses Calcite's FilterMergeRule directly, which
>> would match
>> both LogicalFilter and DrillFilter.
>> 
>> Also, I feel the rule patten specification can be regarded as the
>> input type for the
>> rule, while RelBuilder is to address the output type from the rule's
>> firing. That's why
>> I feel they had better be specified together in some way.
>> 
>> 
>> [1] 
>> https://github.com/apache/incubator-calcite/blob/master/core/src/main/java
>> /org/apache/calcite/rel/rules/FilterMergeRule.java#L46
>> 
>> [2] 
>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/o
>> rg/apache/drill/exec/planner/logical/DrillRuleSets.java#L121
>> 
>> On Wed, Sep 23, 2015 at 2:17 PM, Jacques Nadeau <ja...@apache.org>
>> wrote:
>>> I think this would be a good place to have a little bit more design,
>>> discussion and validation. Julian, it seems like you've done a lot of
>>> thinking and it would be good to have a better understanding of the
>>> choices
>>> you're making. (Your response below seems to be just scratching the
>>> surface).
>>> 
>>> If the goal is to minimize copy and paste rules, it would be useful to
>>> test
>>> the proposal against example rules that have been copy/pasted and see
>>> what
>>> other patterns might emerge. At first glance, I'm inclined more with
>>> Jinfeng's proposal regarding operands than yours. Your decision about
>>> copy/not behavior should also be discussed further.
>>> 
>>> Drill (Jinfeng?) and Hive (Jesus?) guys, can you provide an example set
>>> of
>>> rules that we've needed to copy/paste due to constraints around rule
>>> configuration to make sure the outcome of these changes is as desired?
>>> 
>>> 
>>> 
>>> On Wed, Sep 23, 2015 at 1:28 PM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>>> Great question.
>>>> 
>>>> For that case, I think you should carry on making a separate rule
>>>> instance with operands that contain classes. RelBuilder is a good
>>>> single point to put a lot of the configuration, but it is taking it
>>>> too far to put the target classes in there. I don't want its role as
>>>> "point for configuration for rules" to overwhelm its main role "an
>>>> easy way to build relational algebra expressions".
>>>> 
>>>> And besides, operands are powerful and descriptive, and because the
>>>> engine understands them it can short-cut which rules are fired.
>>>> 
>>>> For what it's worth, there's another piece of configuration that I
>>>> considered putting into RelBuilder but decided not to. That is,
>>>> whether a rule instance should call RelNode.copy to automatically
>>>> create RelNodes the same type as its inputs. See I introduced
>>>> copyFilter and copyProject arguments in FilterProjectTransposeRule:
>>>> 
>>>> 
>>>> 
>>>> https://github.com/julianhyde/incubator-calcite/blob/8099cf3151a462b06a4
>>>> cbb1c0cad8f7e90881ca5/core/src/main/java/org/apache/calcite/rel/rules/Fi
>>>> lterProjectTransposeRule.java#L67
>>>> 
>>>> rather than introduce new methods RelBuilder.shouldCopyProject() and
>>>> .shouldCopyFilter().
>>>> 
>>>> And by the way I also considered adding methods to RelBuilderFactory
>>>> (formerly known as ProtoRelBuilder). Adding methods there would not
>>>> "pollute" RelBuilder as badly.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>> On Wed, Sep 23, 2015 at 12:18 PM, Jinfeng Ni <ji...@gmail.com>
>>>> wrote:
>>>>> +1.
>>>>> 
>>>>> I have one question.  RelBuilder seems to be used in onMatch()
>>>> method,
>>>>> when a particular type of RelNode has to be created.  What about
>>>>> RelOptRuleOperand in the constructor of rule? Let's say if we want to
>>>>> rule to match only DrillProject, or HiveFilter. The current way
>>>> seems to
>>>>> be that we extend the rule by providing a different
>>>> RelOptRuleOperand.
>>>>> 
>>>>> Can RelBuilder also have getFilterClass(), getProjectClass() etc,
>>>> which
>>>>> would be used to specify the rule patten matching?
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Sep 22, 2015 at 6:41 PM, Jacques Nadeau <ja...@apache.org>
>>>> wrote:
>>>>>> Agree on the utility of this. +1.
>>>>>> 
>>>>>> On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran <
>>>>>> jpullokkaran@hortonworks.com> wrote:
>>>>>> 
>>>>>>> This is useful, though Hive will have to do some refactoring.
>>>>>>> 
>>>>>>> In past Hive had to copy bunch of rules to work around these
>>>> issues.
>>>>>>> Rules & relnodes for org.apache.calcite.rel.logical.* is a good
>>>> example of
>>>>>>> this.
>>>>>>> 
>>>>>>> 
>>>>>>> John
>>>>>>> 
>>>>>>> On 9/22/15, 5:14 PM, "Julian Hyde" <jh...@apache.org> wrote:
>>>>>>> 
>>>>>>>> I have started work on
>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-828 and my
>>>>>>>> work-in-progress is in
>>>>>>>> 
>>>> 
>>>> https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder
>>>> .
>>>>>>>> 
>>>>>>>> This will be a big change to teams such as Hive and Drill. Now
>>>> would
>>>>>>>> be a good time to chime in.
>>>>>>>> 
>>>>>>>> The plan is that every rule instance has a ProtoRelBuilder field
>>>> from
>>>>>>>> which a RelBuilder can be created to be used in an onMatch
>>>> method. The
>>>>>>>> RelBuilder contains factories for every kind of RelNode, so we
>>>> won't
>>>>>>>> have to keep plumbing new factories in to existing rules.
>>>>>>>> 
>>>>>>>> There will be other advantages. RelBuilder is an easier and more
>>>>>>>> concise way of creating relational expressions. It does some
>>>> useful
>>>>>>>> stuff for free, like flattening ANDs and eliminating identity
>>>>>>>> projections. I'm seeing the volume of code decrease and a few
>>>> minor
>>>>>>>> plan improvements.
>>>>>>>> 
>>>>>>>> I haven't yet found a good way to deal with the pattern where if,
>>>> say,
>>>>>>>> ProjectFactory is null the rule is to use Project.copy to create a
>>>>>>>> Project of the same type.
>>>>>>>> 
>>>>>>>> I'm keeping & deprecating the old rule constructors that take a
>>>>>>>> variety of XxxFactory arguments.
>>>>>>>> 
>>>>>>>> As always, we try to keep API compatibility and mark the old API
>>>>>>>> 
>>>>>>>> @Deprecated // to be removed before 2.0
>>>>>>>> 
>>>>>>>> but the deprecated APIs are no longer tested and are likely to be
>>>>>>>> flaky. We strongly suggest that you fix calls to deprecated APIs
>>>> as
>>>>>>>> soon as you upgrade to a more recent version of Calcite.
>>>>>>>> 
>>>>>>>> Julian
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>> 
>> 
> 


Re: Transitioning rules from ProjectFactory etc. to RelBuilder

Posted by Jesus Camacho Rodriguez <jc...@hortonworks.com>.
In our case, we had to do some copy-pasting of rules in the past in Hive.
Although the reasons have been already discussed in the thread, just to
confirm:
- Rules that use a default factory to create operators e.g.
LogicalProject, instead of being able to provide our own factory. An
example is AggregateProjectMergeRule. (Instead of copy/pasting the rule,
we could apply another rule to transform LogicalProject into HiveProject,
but this is not the right way either).
- Rules that match directly on Logical instances of the operators. An
example is SortProjectTransposeRule.

I see that with the solutions that we are discussing, those two problems
would be solved.

About including the matching classes in RelBuilder, it can be a good idea
if we still make the rules extensible i.e. we can provide our own matching
classes if we want to. Assume a project that might subclass the Project
operator into two different operators, then would like to match a certain
rule only to one of them. Further, I guess using the copy methods provided
by the operators when possible, instead of calling directly to the
Factory, would make the rules more extensible for these cases.

--
Jesús


On 9/23/15, 11:08 PM, "Jinfeng Ni" <ji...@gmail.com> wrote:

>I did a quick check of Drill side rule, and did not find the copy/paste
>code
>simply because we have to specify the rule's operands.
>
>The reason I'm asking this is because many Calcite rules use core
>Filter.class,
>Project.class etc.  Currently, Drill simply uses some of Calcite rules
>directly.
>However, that means the rule will fire against both LogicalFilter, and
>DrillFilter,
>which seems to be redundant.  I understand we could extend Calcite rule to
>make sure it matches only one of them. But I'm looking for a bit more
>direct way
>to do that.
>
>For instance, in [2] we uses Calcite's FilterMergeRule directly, which
>would match
>both LogicalFilter and DrillFilter.
>
>Also, I feel the rule patten specification can be regarded as the
>input type for the
>rule, while RelBuilder is to address the output type from the rule's
>firing. That's why
>I feel they had better be specified together in some way.
>
>
>[1] 
>https://github.com/apache/incubator-calcite/blob/master/core/src/main/java
>/org/apache/calcite/rel/rules/FilterMergeRule.java#L46
>
>[2] 
>https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/o
>rg/apache/drill/exec/planner/logical/DrillRuleSets.java#L121
>
>On Wed, Sep 23, 2015 at 2:17 PM, Jacques Nadeau <ja...@apache.org>
>wrote:
>> I think this would be a good place to have a little bit more design,
>> discussion and validation. Julian, it seems like you've done a lot of
>> thinking and it would be good to have a better understanding of the
>>choices
>> you're making. (Your response below seems to be just scratching the
>> surface).
>>
>> If the goal is to minimize copy and paste rules, it would be useful to
>>test
>> the proposal against example rules that have been copy/pasted and see
>>what
>> other patterns might emerge. At first glance, I'm inclined more with
>> Jinfeng's proposal regarding operands than yours. Your decision about
>> copy/not behavior should also be discussed further.
>>
>> Drill (Jinfeng?) and Hive (Jesus?) guys, can you provide an example set
>>of
>> rules that we've needed to copy/paste due to constraints around rule
>> configuration to make sure the outcome of these changes is as desired?
>>
>>
>>
>> On Wed, Sep 23, 2015 at 1:28 PM, Julian Hyde <jh...@apache.org> wrote:
>>
>>> Great question.
>>>
>>> For that case, I think you should carry on making a separate rule
>>> instance with operands that contain classes. RelBuilder is a good
>>> single point to put a lot of the configuration, but it is taking it
>>> too far to put the target classes in there. I don't want its role as
>>> "point for configuration for rules" to overwhelm its main role "an
>>> easy way to build relational algebra expressions".
>>>
>>> And besides, operands are powerful and descriptive, and because the
>>> engine understands them it can short-cut which rules are fired.
>>>
>>> For what it's worth, there's another piece of configuration that I
>>> considered putting into RelBuilder but decided not to. That is,
>>> whether a rule instance should call RelNode.copy to automatically
>>> create RelNodes the same type as its inputs. See I introduced
>>> copyFilter and copyProject arguments in FilterProjectTransposeRule:
>>>
>>>
>>> 
>>>https://github.com/julianhyde/incubator-calcite/blob/8099cf3151a462b06a4
>>>cbb1c0cad8f7e90881ca5/core/src/main/java/org/apache/calcite/rel/rules/Fi
>>>lterProjectTransposeRule.java#L67
>>>
>>> rather than introduce new methods RelBuilder.shouldCopyProject() and
>>> .shouldCopyFilter().
>>>
>>> And by the way I also considered adding methods to RelBuilderFactory
>>> (formerly known as ProtoRelBuilder). Adding methods there would not
>>> "pollute" RelBuilder as badly.
>>>
>>> Julian
>>>
>>>
>>> On Wed, Sep 23, 2015 at 12:18 PM, Jinfeng Ni <ji...@gmail.com>
>>> wrote:
>>> > +1.
>>> >
>>> > I have one question.  RelBuilder seems to be used in onMatch()
>>>method,
>>> > when a particular type of RelNode has to be created.  What about
>>> > RelOptRuleOperand in the constructor of rule? Let's say if we want to
>>> > rule to match only DrillProject, or HiveFilter. The current way
>>>seems to
>>> > be that we extend the rule by providing a different
>>>RelOptRuleOperand.
>>> >
>>> > Can RelBuilder also have getFilterClass(), getProjectClass() etc,
>>>which
>>> > would be used to specify the rule patten matching?
>>> >
>>> >
>>> >
>>> > On Tue, Sep 22, 2015 at 6:41 PM, Jacques Nadeau <ja...@apache.org>
>>> wrote:
>>> >> Agree on the utility of this. +1.
>>> >>
>>> >> On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran <
>>> >> jpullokkaran@hortonworks.com> wrote:
>>> >>
>>> >>> This is useful, though Hive will have to do some refactoring.
>>> >>>
>>> >>> In past Hive had to copy bunch of rules to work around these
>>>issues.
>>> >>> Rules & relnodes for org.apache.calcite.rel.logical.* is a good
>>> example of
>>> >>> this.
>>> >>>
>>> >>>
>>> >>> John
>>> >>>
>>> >>> On 9/22/15, 5:14 PM, "Julian Hyde" <jh...@apache.org> wrote:
>>> >>>
>>> >>> >I have started work on
>>> >>> >https://issues.apache.org/jira/browse/CALCITE-828 and my
>>> >>> >work-in-progress is in
>>> >>> >
>>> 
>>>https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder
>>>.
>>> >>> >
>>> >>> >This will be a big change to teams such as Hive and Drill. Now
>>>would
>>> >>> >be a good time to chime in.
>>> >>> >
>>> >>> >The plan is that every rule instance has a ProtoRelBuilder field
>>>from
>>> >>> >which a RelBuilder can be created to be used in an onMatch
>>>method. The
>>> >>> >RelBuilder contains factories for every kind of RelNode, so we
>>>won't
>>> >>> >have to keep plumbing new factories in to existing rules.
>>> >>> >
>>> >>> >There will be other advantages. RelBuilder is an easier and more
>>> >>> >concise way of creating relational expressions. It does some
>>>useful
>>> >>> >stuff for free, like flattening ANDs and eliminating identity
>>> >>> >projections. I'm seeing the volume of code decrease and a few
>>>minor
>>> >>> >plan improvements.
>>> >>> >
>>> >>> >I haven't yet found a good way to deal with the pattern where if,
>>>say,
>>> >>> >ProjectFactory is null the rule is to use Project.copy to create a
>>> >>> >Project of the same type.
>>> >>> >
>>> >>> >I'm keeping & deprecating the old rule constructors that take a
>>> >>> >variety of XxxFactory arguments.
>>> >>> >
>>> >>> >As always, we try to keep API compatibility and mark the old API
>>> >>> >
>>> >>> >  @Deprecated // to be removed before 2.0
>>> >>> >
>>> >>> >but the deprecated APIs are no longer tested and are likely to be
>>> >>> >flaky. We strongly suggest that you fix calls to deprecated APIs
>>>as
>>> >>> >soon as you upgrade to a more recent version of Calcite.
>>> >>> >
>>> >>> >Julian
>>> >>> >
>>> >>>
>>> >>>
>>>
>


Re: Transitioning rules from ProjectFactory etc. to RelBuilder

Posted by Jinfeng Ni <ji...@gmail.com>.
I did a quick check of Drill side rule, and did not find the copy/paste code
simply because we have to specify the rule's operands.

The reason I'm asking this is because many Calcite rules use core Filter.class,
Project.class etc.  Currently, Drill simply uses some of Calcite rules directly.
However, that means the rule will fire against both LogicalFilter, and
DrillFilter,
which seems to be redundant.  I understand we could extend Calcite rule to
make sure it matches only one of them. But I'm looking for a bit more direct way
to do that.

For instance, in [2] we uses Calcite's FilterMergeRule directly, which
would match
both LogicalFilter and DrillFilter.

Also, I feel the rule patten specification can be regarded as the
input type for the
rule, while RelBuilder is to address the output type from the rule's
firing. That's why
I feel they had better be specified together in some way.


[1] https://github.com/apache/incubator-calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/FilterMergeRule.java#L46

[2] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java#L121

On Wed, Sep 23, 2015 at 2:17 PM, Jacques Nadeau <ja...@apache.org> wrote:
> I think this would be a good place to have a little bit more design,
> discussion and validation. Julian, it seems like you've done a lot of
> thinking and it would be good to have a better understanding of the choices
> you're making. (Your response below seems to be just scratching the
> surface).
>
> If the goal is to minimize copy and paste rules, it would be useful to test
> the proposal against example rules that have been copy/pasted and see what
> other patterns might emerge. At first glance, I'm inclined more with
> Jinfeng's proposal regarding operands than yours. Your decision about
> copy/not behavior should also be discussed further.
>
> Drill (Jinfeng?) and Hive (Jesus?) guys, can you provide an example set of
> rules that we've needed to copy/paste due to constraints around rule
> configuration to make sure the outcome of these changes is as desired?
>
>
>
> On Wed, Sep 23, 2015 at 1:28 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> Great question.
>>
>> For that case, I think you should carry on making a separate rule
>> instance with operands that contain classes. RelBuilder is a good
>> single point to put a lot of the configuration, but it is taking it
>> too far to put the target classes in there. I don't want its role as
>> "point for configuration for rules" to overwhelm its main role "an
>> easy way to build relational algebra expressions".
>>
>> And besides, operands are powerful and descriptive, and because the
>> engine understands them it can short-cut which rules are fired.
>>
>> For what it's worth, there's another piece of configuration that I
>> considered putting into RelBuilder but decided not to. That is,
>> whether a rule instance should call RelNode.copy to automatically
>> create RelNodes the same type as its inputs. See I introduced
>> copyFilter and copyProject arguments in FilterProjectTransposeRule:
>>
>>
>> https://github.com/julianhyde/incubator-calcite/blob/8099cf3151a462b06a4cbb1c0cad8f7e90881ca5/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L67
>>
>> rather than introduce new methods RelBuilder.shouldCopyProject() and
>> .shouldCopyFilter().
>>
>> And by the way I also considered adding methods to RelBuilderFactory
>> (formerly known as ProtoRelBuilder). Adding methods there would not
>> "pollute" RelBuilder as badly.
>>
>> Julian
>>
>>
>> On Wed, Sep 23, 2015 at 12:18 PM, Jinfeng Ni <ji...@gmail.com>
>> wrote:
>> > +1.
>> >
>> > I have one question.  RelBuilder seems to be used in onMatch() method,
>> > when a particular type of RelNode has to be created.  What about
>> > RelOptRuleOperand in the constructor of rule? Let's say if we want to
>> > rule to match only DrillProject, or HiveFilter. The current way seems to
>> > be that we extend the rule by providing a different RelOptRuleOperand.
>> >
>> > Can RelBuilder also have getFilterClass(), getProjectClass() etc, which
>> > would be used to specify the rule patten matching?
>> >
>> >
>> >
>> > On Tue, Sep 22, 2015 at 6:41 PM, Jacques Nadeau <ja...@apache.org>
>> wrote:
>> >> Agree on the utility of this. +1.
>> >>
>> >> On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran <
>> >> jpullokkaran@hortonworks.com> wrote:
>> >>
>> >>> This is useful, though Hive will have to do some refactoring.
>> >>>
>> >>> In past Hive had to copy bunch of rules to work around these issues.
>> >>> Rules & relnodes for org.apache.calcite.rel.logical.* is a good
>> example of
>> >>> this.
>> >>>
>> >>>
>> >>> John
>> >>>
>> >>> On 9/22/15, 5:14 PM, "Julian Hyde" <jh...@apache.org> wrote:
>> >>>
>> >>> >I have started work on
>> >>> >https://issues.apache.org/jira/browse/CALCITE-828 and my
>> >>> >work-in-progress is in
>> >>> >
>> https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder.
>> >>> >
>> >>> >This will be a big change to teams such as Hive and Drill. Now would
>> >>> >be a good time to chime in.
>> >>> >
>> >>> >The plan is that every rule instance has a ProtoRelBuilder field from
>> >>> >which a RelBuilder can be created to be used in an onMatch method. The
>> >>> >RelBuilder contains factories for every kind of RelNode, so we won't
>> >>> >have to keep plumbing new factories in to existing rules.
>> >>> >
>> >>> >There will be other advantages. RelBuilder is an easier and more
>> >>> >concise way of creating relational expressions. It does some useful
>> >>> >stuff for free, like flattening ANDs and eliminating identity
>> >>> >projections. I'm seeing the volume of code decrease and a few minor
>> >>> >plan improvements.
>> >>> >
>> >>> >I haven't yet found a good way to deal with the pattern where if, say,
>> >>> >ProjectFactory is null the rule is to use Project.copy to create a
>> >>> >Project of the same type.
>> >>> >
>> >>> >I'm keeping & deprecating the old rule constructors that take a
>> >>> >variety of XxxFactory arguments.
>> >>> >
>> >>> >As always, we try to keep API compatibility and mark the old API
>> >>> >
>> >>> >  @Deprecated // to be removed before 2.0
>> >>> >
>> >>> >but the deprecated APIs are no longer tested and are likely to be
>> >>> >flaky. We strongly suggest that you fix calls to deprecated APIs as
>> >>> >soon as you upgrade to a more recent version of Calcite.
>> >>> >
>> >>> >Julian
>> >>> >
>> >>>
>> >>>
>>

Re: Transitioning rules from ProjectFactory etc. to RelBuilder

Posted by Jacques Nadeau <ja...@apache.org>.
I think this would be a good place to have a little bit more design,
discussion and validation. Julian, it seems like you've done a lot of
thinking and it would be good to have a better understanding of the choices
you're making. (Your response below seems to be just scratching the
surface).

If the goal is to minimize copy and paste rules, it would be useful to test
the proposal against example rules that have been copy/pasted and see what
other patterns might emerge. At first glance, I'm inclined more with
Jinfeng's proposal regarding operands than yours. Your decision about
copy/not behavior should also be discussed further.

Drill (Jinfeng?) and Hive (Jesus?) guys, can you provide an example set of
rules that we've needed to copy/paste due to constraints around rule
configuration to make sure the outcome of these changes is as desired?



On Wed, Sep 23, 2015 at 1:28 PM, Julian Hyde <jh...@apache.org> wrote:

> Great question.
>
> For that case, I think you should carry on making a separate rule
> instance with operands that contain classes. RelBuilder is a good
> single point to put a lot of the configuration, but it is taking it
> too far to put the target classes in there. I don't want its role as
> "point for configuration for rules" to overwhelm its main role "an
> easy way to build relational algebra expressions".
>
> And besides, operands are powerful and descriptive, and because the
> engine understands them it can short-cut which rules are fired.
>
> For what it's worth, there's another piece of configuration that I
> considered putting into RelBuilder but decided not to. That is,
> whether a rule instance should call RelNode.copy to automatically
> create RelNodes the same type as its inputs. See I introduced
> copyFilter and copyProject arguments in FilterProjectTransposeRule:
>
>
> https://github.com/julianhyde/incubator-calcite/blob/8099cf3151a462b06a4cbb1c0cad8f7e90881ca5/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L67
>
> rather than introduce new methods RelBuilder.shouldCopyProject() and
> .shouldCopyFilter().
>
> And by the way I also considered adding methods to RelBuilderFactory
> (formerly known as ProtoRelBuilder). Adding methods there would not
> "pollute" RelBuilder as badly.
>
> Julian
>
>
> On Wed, Sep 23, 2015 at 12:18 PM, Jinfeng Ni <ji...@gmail.com>
> wrote:
> > +1.
> >
> > I have one question.  RelBuilder seems to be used in onMatch() method,
> > when a particular type of RelNode has to be created.  What about
> > RelOptRuleOperand in the constructor of rule? Let's say if we want to
> > rule to match only DrillProject, or HiveFilter. The current way seems to
> > be that we extend the rule by providing a different RelOptRuleOperand.
> >
> > Can RelBuilder also have getFilterClass(), getProjectClass() etc, which
> > would be used to specify the rule patten matching?
> >
> >
> >
> > On Tue, Sep 22, 2015 at 6:41 PM, Jacques Nadeau <ja...@apache.org>
> wrote:
> >> Agree on the utility of this. +1.
> >>
> >> On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran <
> >> jpullokkaran@hortonworks.com> wrote:
> >>
> >>> This is useful, though Hive will have to do some refactoring.
> >>>
> >>> In past Hive had to copy bunch of rules to work around these issues.
> >>> Rules & relnodes for org.apache.calcite.rel.logical.* is a good
> example of
> >>> this.
> >>>
> >>>
> >>> John
> >>>
> >>> On 9/22/15, 5:14 PM, "Julian Hyde" <jh...@apache.org> wrote:
> >>>
> >>> >I have started work on
> >>> >https://issues.apache.org/jira/browse/CALCITE-828 and my
> >>> >work-in-progress is in
> >>> >
> https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder.
> >>> >
> >>> >This will be a big change to teams such as Hive and Drill. Now would
> >>> >be a good time to chime in.
> >>> >
> >>> >The plan is that every rule instance has a ProtoRelBuilder field from
> >>> >which a RelBuilder can be created to be used in an onMatch method. The
> >>> >RelBuilder contains factories for every kind of RelNode, so we won't
> >>> >have to keep plumbing new factories in to existing rules.
> >>> >
> >>> >There will be other advantages. RelBuilder is an easier and more
> >>> >concise way of creating relational expressions. It does some useful
> >>> >stuff for free, like flattening ANDs and eliminating identity
> >>> >projections. I'm seeing the volume of code decrease and a few minor
> >>> >plan improvements.
> >>> >
> >>> >I haven't yet found a good way to deal with the pattern where if, say,
> >>> >ProjectFactory is null the rule is to use Project.copy to create a
> >>> >Project of the same type.
> >>> >
> >>> >I'm keeping & deprecating the old rule constructors that take a
> >>> >variety of XxxFactory arguments.
> >>> >
> >>> >As always, we try to keep API compatibility and mark the old API
> >>> >
> >>> >  @Deprecated // to be removed before 2.0
> >>> >
> >>> >but the deprecated APIs are no longer tested and are likely to be
> >>> >flaky. We strongly suggest that you fix calls to deprecated APIs as
> >>> >soon as you upgrade to a more recent version of Calcite.
> >>> >
> >>> >Julian
> >>> >
> >>>
> >>>
>

Re: Transitioning rules from ProjectFactory etc. to RelBuilder

Posted by Julian Hyde <jh...@apache.org>.
Great question.

For that case, I think you should carry on making a separate rule
instance with operands that contain classes. RelBuilder is a good
single point to put a lot of the configuration, but it is taking it
too far to put the target classes in there. I don't want its role as
"point for configuration for rules" to overwhelm its main role "an
easy way to build relational algebra expressions".

And besides, operands are powerful and descriptive, and because the
engine understands them it can short-cut which rules are fired.

For what it's worth, there's another piece of configuration that I
considered putting into RelBuilder but decided not to. That is,
whether a rule instance should call RelNode.copy to automatically
create RelNodes the same type as its inputs. See I introduced
copyFilter and copyProject arguments in FilterProjectTransposeRule:

https://github.com/julianhyde/incubator-calcite/blob/8099cf3151a462b06a4cbb1c0cad8f7e90881ca5/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L67

rather than introduce new methods RelBuilder.shouldCopyProject() and
.shouldCopyFilter().

And by the way I also considered adding methods to RelBuilderFactory
(formerly known as ProtoRelBuilder). Adding methods there would not
"pollute" RelBuilder as badly.

Julian


On Wed, Sep 23, 2015 at 12:18 PM, Jinfeng Ni <ji...@gmail.com> wrote:
> +1.
>
> I have one question.  RelBuilder seems to be used in onMatch() method,
> when a particular type of RelNode has to be created.  What about
> RelOptRuleOperand in the constructor of rule? Let's say if we want to
> rule to match only DrillProject, or HiveFilter. The current way seems to
> be that we extend the rule by providing a different RelOptRuleOperand.
>
> Can RelBuilder also have getFilterClass(), getProjectClass() etc, which
> would be used to specify the rule patten matching?
>
>
>
> On Tue, Sep 22, 2015 at 6:41 PM, Jacques Nadeau <ja...@apache.org> wrote:
>> Agree on the utility of this. +1.
>>
>> On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran <
>> jpullokkaran@hortonworks.com> wrote:
>>
>>> This is useful, though Hive will have to do some refactoring.
>>>
>>> In past Hive had to copy bunch of rules to work around these issues.
>>> Rules & relnodes for org.apache.calcite.rel.logical.* is a good example of
>>> this.
>>>
>>>
>>> John
>>>
>>> On 9/22/15, 5:14 PM, "Julian Hyde" <jh...@apache.org> wrote:
>>>
>>> >I have started work on
>>> >https://issues.apache.org/jira/browse/CALCITE-828 and my
>>> >work-in-progress is in
>>> >https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder.
>>> >
>>> >This will be a big change to teams such as Hive and Drill. Now would
>>> >be a good time to chime in.
>>> >
>>> >The plan is that every rule instance has a ProtoRelBuilder field from
>>> >which a RelBuilder can be created to be used in an onMatch method. The
>>> >RelBuilder contains factories for every kind of RelNode, so we won't
>>> >have to keep plumbing new factories in to existing rules.
>>> >
>>> >There will be other advantages. RelBuilder is an easier and more
>>> >concise way of creating relational expressions. It does some useful
>>> >stuff for free, like flattening ANDs and eliminating identity
>>> >projections. I'm seeing the volume of code decrease and a few minor
>>> >plan improvements.
>>> >
>>> >I haven't yet found a good way to deal with the pattern where if, say,
>>> >ProjectFactory is null the rule is to use Project.copy to create a
>>> >Project of the same type.
>>> >
>>> >I'm keeping & deprecating the old rule constructors that take a
>>> >variety of XxxFactory arguments.
>>> >
>>> >As always, we try to keep API compatibility and mark the old API
>>> >
>>> >  @Deprecated // to be removed before 2.0
>>> >
>>> >but the deprecated APIs are no longer tested and are likely to be
>>> >flaky. We strongly suggest that you fix calls to deprecated APIs as
>>> >soon as you upgrade to a more recent version of Calcite.
>>> >
>>> >Julian
>>> >
>>>
>>>

Re: Transitioning rules from ProjectFactory etc. to RelBuilder

Posted by Jinfeng Ni <ji...@gmail.com>.
+1.

I have one question.  RelBuilder seems to be used in onMatch() method,
when a particular type of RelNode has to be created.  What about
RelOptRuleOperand in the constructor of rule? Let's say if we want to
rule to match only DrillProject, or HiveFilter. The current way seems to
be that we extend the rule by providing a different RelOptRuleOperand.

Can RelBuilder also have getFilterClass(), getProjectClass() etc, which
would be used to specify the rule patten matching?



On Tue, Sep 22, 2015 at 6:41 PM, Jacques Nadeau <ja...@apache.org> wrote:
> Agree on the utility of this. +1.
>
> On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran <
> jpullokkaran@hortonworks.com> wrote:
>
>> This is useful, though Hive will have to do some refactoring.
>>
>> In past Hive had to copy bunch of rules to work around these issues.
>> Rules & relnodes for org.apache.calcite.rel.logical.* is a good example of
>> this.
>>
>>
>> John
>>
>> On 9/22/15, 5:14 PM, "Julian Hyde" <jh...@apache.org> wrote:
>>
>> >I have started work on
>> >https://issues.apache.org/jira/browse/CALCITE-828 and my
>> >work-in-progress is in
>> >https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder.
>> >
>> >This will be a big change to teams such as Hive and Drill. Now would
>> >be a good time to chime in.
>> >
>> >The plan is that every rule instance has a ProtoRelBuilder field from
>> >which a RelBuilder can be created to be used in an onMatch method. The
>> >RelBuilder contains factories for every kind of RelNode, so we won't
>> >have to keep plumbing new factories in to existing rules.
>> >
>> >There will be other advantages. RelBuilder is an easier and more
>> >concise way of creating relational expressions. It does some useful
>> >stuff for free, like flattening ANDs and eliminating identity
>> >projections. I'm seeing the volume of code decrease and a few minor
>> >plan improvements.
>> >
>> >I haven't yet found a good way to deal with the pattern where if, say,
>> >ProjectFactory is null the rule is to use Project.copy to create a
>> >Project of the same type.
>> >
>> >I'm keeping & deprecating the old rule constructors that take a
>> >variety of XxxFactory arguments.
>> >
>> >As always, we try to keep API compatibility and mark the old API
>> >
>> >  @Deprecated // to be removed before 2.0
>> >
>> >but the deprecated APIs are no longer tested and are likely to be
>> >flaky. We strongly suggest that you fix calls to deprecated APIs as
>> >soon as you upgrade to a more recent version of Calcite.
>> >
>> >Julian
>> >
>>
>>

Re: Transitioning rules from ProjectFactory etc. to RelBuilder

Posted by Jacques Nadeau <ja...@apache.org>.
Agree on the utility of this. +1.

On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran <
jpullokkaran@hortonworks.com> wrote:

> This is useful, though Hive will have to do some refactoring.
>
> In past Hive had to copy bunch of rules to work around these issues.
> Rules & relnodes for org.apache.calcite.rel.logical.* is a good example of
> this.
>
>
> John
>
> On 9/22/15, 5:14 PM, "Julian Hyde" <jh...@apache.org> wrote:
>
> >I have started work on
> >https://issues.apache.org/jira/browse/CALCITE-828 and my
> >work-in-progress is in
> >https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder.
> >
> >This will be a big change to teams such as Hive and Drill. Now would
> >be a good time to chime in.
> >
> >The plan is that every rule instance has a ProtoRelBuilder field from
> >which a RelBuilder can be created to be used in an onMatch method. The
> >RelBuilder contains factories for every kind of RelNode, so we won't
> >have to keep plumbing new factories in to existing rules.
> >
> >There will be other advantages. RelBuilder is an easier and more
> >concise way of creating relational expressions. It does some useful
> >stuff for free, like flattening ANDs and eliminating identity
> >projections. I'm seeing the volume of code decrease and a few minor
> >plan improvements.
> >
> >I haven't yet found a good way to deal with the pattern where if, say,
> >ProjectFactory is null the rule is to use Project.copy to create a
> >Project of the same type.
> >
> >I'm keeping & deprecating the old rule constructors that take a
> >variety of XxxFactory arguments.
> >
> >As always, we try to keep API compatibility and mark the old API
> >
> >  @Deprecated // to be removed before 2.0
> >
> >but the deprecated APIs are no longer tested and are likely to be
> >flaky. We strongly suggest that you fix calls to deprecated APIs as
> >soon as you upgrade to a more recent version of Calcite.
> >
> >Julian
> >
>
>

Re: Transitioning rules from ProjectFactory etc. to RelBuilder

Posted by John Pullokkaran <jp...@hortonworks.com>.
This is useful, though Hive will have to do some refactoring.

In past Hive had to copy bunch of rules to work around these issues.
Rules & relnodes for org.apache.calcite.rel.logical.* is a good example of
this.


John

On 9/22/15, 5:14 PM, "Julian Hyde" <jh...@apache.org> wrote:

>I have started work on
>https://issues.apache.org/jira/browse/CALCITE-828 and my
>work-in-progress is in
>https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder.
>
>This will be a big change to teams such as Hive and Drill. Now would
>be a good time to chime in.
>
>The plan is that every rule instance has a ProtoRelBuilder field from
>which a RelBuilder can be created to be used in an onMatch method. The
>RelBuilder contains factories for every kind of RelNode, so we won't
>have to keep plumbing new factories in to existing rules.
>
>There will be other advantages. RelBuilder is an easier and more
>concise way of creating relational expressions. It does some useful
>stuff for free, like flattening ANDs and eliminating identity
>projections. I'm seeing the volume of code decrease and a few minor
>plan improvements.
>
>I haven't yet found a good way to deal with the pattern where if, say,
>ProjectFactory is null the rule is to use Project.copy to create a
>Project of the same type.
>
>I'm keeping & deprecating the old rule constructors that take a
>variety of XxxFactory arguments.
>
>As always, we try to keep API compatibility and mark the old API
>
>  @Deprecated // to be removed before 2.0
>
>but the deprecated APIs are no longer tested and are likely to be
>flaky. We strongly suggest that you fix calls to deprecated APIs as
>soon as you upgrade to a more recent version of Calcite.
>
>Julian
>