Posted to dev@calcite.apache.org by Γιώργος Θεοδωράκης <gi...@gmail.com> on 2017/05/11 11:16:46 UTC

Best way to merge a set of Operators in a single Operator

I am trying to "separate" certain subsets of operators in a query tree and
transform them into a more "general" RelNode implementation that holds the
information required to rebuild them. I want to implement something more
general than Calc (covering more types of operators) that works like this:

Operator1 -> Operator2 -> Operator3   ===Enforcing rules under certain
conditions==>

Operator1 -> (Operator2 -> Operator3) == Operator1 -> MergedOperators2,3
(we can tell that this operator is built from Operators 2 and 3)

Can anyone suggest a possible starting point?

Re: Best way to merge a set of Operators in a single Operator

Posted by Julian Hyde <jh...@apache.org>.
Yes, sometimes you have to do this. I agree that it’s a pain. You have to satisfy the Volcano planner by supplying an identical row-type (field names can be different, but types must be the same down to nullability, precision and scale). But the uniformity demanded by Volcano has benefits elsewhere.
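As a rough sketch of what that check amounts to (plain Java, illustrative names only, not Calcite's actual API): two row types are interchangeable when their field types match positionally, down to nullability, precision and scale, while field names are ignored.

```java
import java.util.List;

// Toy model of a relational field type. Names may differ between
// equivalent operators, but the type itself must match exactly,
// down to nullability, precision and scale. The class names here
// are illustrative, not Calcite's API.
final class FieldType {
    final String typeName;   // e.g. "DECIMAL", "VARCHAR"
    final boolean nullable;
    final int precision;
    final int scale;

    FieldType(String typeName, boolean nullable, int precision, int scale) {
        this.typeName = typeName;
        this.nullable = nullable;
        this.precision = precision;
        this.scale = scale;
    }

    // Field names are deliberately NOT part of this comparison.
    boolean sameTypeAs(FieldType other) {
        return typeName.equals(other.typeName)
            && nullable == other.nullable
            && precision == other.precision
            && scale == other.scale;
    }
}

public class RowTypeCheck {
    // Two row types are interchangeable in an equivalence set iff
    // they have the same field types in the same order.
    public static boolean sameRowType(List<FieldType> a, List<FieldType> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).sameTypeAs(b.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

If the merged operator you register fails this kind of comparison against the node it replaces, the planner will reject it with exactly the "Cannot add expression of different type to set" assertion seen earlier in the thread.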

Julian




Re: Best way to merge a set of Operators in a single Operator

Posted by Γιώργος Θεοδωράκης <gi...@gmail.com>.
In order to make it work, I had to explicitly define the row type in the
core operator I created:

public abstract class AggrCalc extends SingleRel {
  ...
  // constructor
  protected AggrCalc(...) {
    super(cluster, traits, child);
    this.rowType = wantedRowType;
  }
}

I am not sure if this is a good way to do it from the beginning (e.g. I
lose wanted information about the original operators), but I think it
serves my purpose, as this rule is enforced before execution.
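To make the intent concrete, here is a minimal sketch (plain Java, illustrative names only, not Calcite's API) of the idea: a merged operator advertises exactly the row type of the root of the subtree it replaces, while keeping the original operators around so they can be rebuilt later.

```java
import java.util.List;

// Toy sketch: a merged operator must report the row type of the
// subtree root it replaces, so the planner can treat the two as
// interchangeable. All names here are illustrative.
class Operator {
    final String name;
    final List<String> rowType; // simplified: a row type as a list of type names

    Operator(String name, List<String> rowType) {
        this.name = name;
        this.rowType = rowType;
    }
}

class MergedOperator extends Operator {
    final List<Operator> originals; // kept so the subtree can be rebuilt

    // The list is ordered root-first; the merged node copies the row
    // type of that root (the node the planner sees being replaced).
    MergedOperator(List<Operator> subtreeRootFirst) {
        super("Merged", subtreeRootFirst.get(0).rowType);
        this.originals = subtreeRootFirst;
    }
}
```

Setting this.rowType in the constructor, as above, is one way to guarantee the equality; another is to override deriveRowType() to return the wanted type.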


Re: Best way to merge a set of Operators in a single Operator

Posted by Γιώργος Θεοδωράκης <gi...@gmail.com>.
Hello,
I tried to write something by myself, and your example helped. However, I
am stuck with this error:

I have created an operator that holds the proper info (program1, program2,
AggList, GroupList, ...) about two Calcs and one Aggregate in the following
formation:
(Calc1->Aggregate->Calc2) => CustomOperator

and I get this error when I try to transform them to my custom operator:

Exception in thread "main" java.lang.AssertionError: Cannot add expression
of different type to set:
set type is RecordType(Calc1 output schema)
expression type is RecordType(Calc2 output schema)
...

I have seen that there is a check when calling the transformTo() method. Is
there any straightforward way to overcome this error?

In addition, in your example, DruidQuery starts from the bottom of the
operators' tree, so you only have to worry about the row type of the last
RelNode in the stack. Would I have a problem with the kind of rule I am
trying to create? If I had only (Calc->Aggregate), would it be easier?



Re: Best way to merge a set of Operators in a single Operator

Posted by Julian Hyde <jh...@apache.org>.
It seems that "composite" operators crop up quite often. Having kept
the operators separate in older adapters like the Mongo adapter, I
took a different approach in the Druid adapter, and I'm quite pleased
with it.

DruidQuery contains a "stack" of internal relational operators. They
are held in the field

  final ImmutableList<RelNode> rels;

The initial DruidQuery contains just a TableScan. It grows as other
operators (Filter, Project, Aggregate) are pushed down to Druid, one
at a time.

The internal RelNodes are not visible to the planner but are
nevertheless set up pretty much as they would be if they were outside
the DruidQuery. The row type of the query is the row type of the last
RelNode in the stack.

The "signature()" method helps figure out whether an operation can be
pushed onto a DruidQuery. It returns a string that indicates the
sequence of operations. For example, a TableScan followed by a Filter
followed by a Project returns "sfp", and the rule to push an Aggregate
into Druid knows that it can succeed because "sfpa" is in the list of
valid signatures.

Julian
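The signature idea above can be sketched in a few lines of plain Java. The letter codes follow the description (s = TableScan, f = Filter, p = Project, a = Aggregate), but the set of valid signatures and the method names are illustrative, not the Druid adapter's actual code.

```java
import java.util.List;
import java.util.Set;

public class Signatures {
    // One letter per operator kind, in bottom-up order:
    // s = TableScan, f = Filter, p = Project, a = Aggregate.
    // This set is illustrative; the real adapter defines its own.
    static final Set<String> VALID =
        Set.of("s", "sf", "sp", "sfp", "sa", "sfa", "spa", "sfpa");

    // Build the signature of a stack of operators, bottom-up.
    static String signature(List<Character> stack) {
        StringBuilder sb = new StringBuilder();
        for (char c : stack) {
            sb.append(c);
        }
        return sb.toString();
    }

    // A rule may push a new operator onto the stack only if the
    // resulting signature is still in the valid set.
    static boolean canPush(List<Character> stack, char op) {
        return VALID.contains(signature(stack) + op);
    }
}
```

So a stack of scan-filter-project has signature "sfp", and the aggregate-pushdown rule can fire because "sfpa" is still a valid signature.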


