You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Stamatis Zampetakis <za...@gmail.com> on 2018/09/06 16:33:56 UTC

Correctness of SortJoinTransposeRule

Hello,

I noticed that there is a Calcite rule (i.e., SortJoinTransposeRule) that
pushes a LogicalSort past a LogicalJoin if the join is either left outer or
right outer.

Who guarantees that the left and right outer joins are preserving the order
of the inputs?
Does the SQL standard requires that these types of joins are order
preserving?

Since we are working with logical operators, I would tend to think that we
cannot assume anything about the physical equivalent.

Best,
Stamatis

Re: Correctness of SortJoinTransposeRule

Posted by Stamatis Zampetakis <za...@gmail.com>.
Thanks Julian and Jesus for your feedback.

Jesus> If I remember correctly, the rule pushes the Sort through the Join
(if possible), but it also preserves the Sort on top of the Join to ensure
correctness.

Indeed, I was mislead by the name of the rule and thought that the Sort is
completely pushed below the join.

Julian> Or perhaps we leave the Sort and have a rule that notices the
output order of the Join and, based on that, weakens[1] or removes the Sort.

There is a rule for removing the Sort if its input is already ordered
(i.e., SortRemoveRule).

Julian> And the rule could also be extended to deal with Sort operators
that do not have a limit. Limits seriously constrain what the rule can
safely do, and if there is no limit, we can safely push through inner join.

In cases where there is a Sort with a limit can't we push only the sort and
leave the limit as is if both cannot be pushed together?

Ideally, what I want to achieve is push the sort down as much as possible
in order to exploit an index with an interesting order and remove the sort
if the join (and other intermediate operators are ordered in the same way).
For the moment, I am relying a lot on the Enumerable operators so  based on
the discussion so far and by looking in the existing code, I guess what is
missing for my use-case is:
(i) to enrich the RelTraitSet of the Enumerable join variants to properly
declare if they preserve the order of their input(s) (only
EnumerableMergeJoin provides such information at the moment);
(ii) extend the SortJoinTransposeRule to be able to handle more cases
notably the INNER join which is very frequent;
(iii) possibly add a rule that uses nested loops instead of hash join to
perform equijoins (since queries with index scans and limit can be much
faster if the join does not need to bring the whole relation in memory).

Regarding (i) I have the impression (still needs to be verified) that:
(a) the hash join implementation of Calcite (EnumerableDefaults#_join)
retains the order of the outer/probing relation;
(b) the nested loop join implementation of Calcite
(EnumerableDefaults#correlateJoin) retains the order of the outer relation.
If anybody else has already an idea of what exactly holds for the physical
join algorithms of Calcite, please share it here.

Best,
Stamatis

Στις Πέμ, 6 Σεπ 2018 στις 8:07 μ.μ., ο/η Julian Hyde <jh...@apache.org>
έγραψε:

> I’d forgotten about LIMIT.
>
> The rule could be extended to push limit through inner join if there is a
> foreign key (e.g. we know that the join is a lookup that does not
> increase/decrease the number of rows).
>
> And the rule could also be extended to deal with Sort operators that do
> not have a limit. Limits seriously constrain what the rule can safely do,
> and if there is no limit, we can safely push through inner join.
>
> Julian
>
>
> > On Sep 6, 2018, at 10:39 AM, Jesus Camacho Rodriguez <
> jcamachorodriguez@hortonworks.com> wrote:
> >
> > The idea for that rule was to be able to exploit the limit/fetch spec of
> the Sort operator to reduce the number of rows that needed to be joined,
> that is why it was only applied to LEFT/RIGHT outer join.
> >
> > I think option 2 below sounds better than creating a new rule variant.
> >
> > Thanks,
> > Jesús
> >
> >
> > On 9/6/18, 10:28 AM, "Julian Hyde" <jh...@apache.org> wrote:
> >
> >    Ah, that makes sense.
> >
> >    Reading the code, I couldn’t figure out why it applies to LEFT and
> RIGHT but not to INNER. (For some kinds of join, for example inner merge
> join, it could push the sort to both sides, as long as the sort was
> compatible with what is needed to ensure that the keys arrive at the right
> time.)
> >
> >    If needed, we could have a variant of the rule that omits the Sort
> after the Join. Or perhaps we leave the Sort and have a rule that notices
> the output order of the Join and, based on that, weakens[1] or removes the
> Sort.
> >
> >    Julian
> >
> >    [1] https://issues.apache.org/jira/browse/CALCITE-2540 <
> https://issues.apache.org/jira/browse/CALCITE-2540>
> >
> >
> >
> >> On Sep 6, 2018, at 10:08 AM, Jesus Camacho Rodriguez <
> jcamachorodriguez@hortonworks.com> wrote:
> >>
> >> If I remember correctly, the rule pushes the Sort through the Join (if
> possible), but it also preserves the Sort on top of the Join to ensure
> correctness.
> >>
> >> -Jesús
> >>
> >>
> >> On 9/6/18, 9:57 AM, "Julian Hyde" <jh...@apache.org> wrote:
> >>
> >>   Yes, it depends very much on the operator. Some examples:
> >>   Merge join typically requires inputs to be sorted, and preserves that
> order. (But some outer joins may throw in null values out of order.)
> >>   Map join typically preserves the order of the probing side, not the
> build side.
> >>   Hash join typically destroys the order of both sides.
> >>   Use the rule with caution.
> >>
> >>   Julian
> >>
> >>
> >>> On Sep 6, 2018, at 9:33 AM, Stamatis Zampetakis <za...@gmail.com>
> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I noticed that there is a Calcite rule (i.e., SortJoinTransposeRule)
> that
> >>> pushes a LogicalSort past a LogicalJoin if the join is either left
> outer or
> >>> right outer.
> >>>
> >>> Who guarantees that the left and right outer joins are preserving the
> order
> >>> of the inputs?
> >>> Does the SQL standard requires that these types of joins are order
> >>> preserving?
> >>>
> >>> Since we are working with logical operators, I would tend to think
> that we
> >>> cannot assume anything about the physical equivalent.
> >>>
> >>> Best,
> >>> Stamatis
> >>
> >>
> >>
> >
> >
> >
>
>

Re: Correctness of SortJoinTransposeRule

Posted by Julian Hyde <jh...@apache.org>.
I’d forgotten about LIMIT. 

The rule could be extended to push limit through inner join if there is a foreign key (e.g. we know that the join is a lookup that does not increase/decrease the number of rows).

And the rule could also be extended to deal with Sort operators that do not have a limit. Limits seriously constrain what the rule can safely do, and if there is no limit, we can safely push through inner join.

Julian


> On Sep 6, 2018, at 10:39 AM, Jesus Camacho Rodriguez <jc...@hortonworks.com> wrote:
> 
> The idea for that rule was to be able to exploit the limit/fetch spec of the Sort operator to reduce the number of rows that needed to be joined, that is why it was only applied to LEFT/RIGHT outer join.
> 
> I think option 2 below sounds better than creating a new rule variant.
> 
> Thanks,
> Jesús
> 
> 
> On 9/6/18, 10:28 AM, "Julian Hyde" <jh...@apache.org> wrote:
> 
>    Ah, that makes sense.
> 
>    Reading the code, I couldn’t figure out why it applies to LEFT and RIGHT but not to INNER. (For some kinds of join, for example inner merge join, it could push the sort to both sides, as long as the sort was compatible with what is needed to ensure that the keys arrive at the right time.)
> 
>    If needed, we could have a variant of the rule that omits the Sort after the Join. Or perhaps we leave the Sort and have a rule that notices the output order of the Join and, based on that, weakens[1] or removes the Sort.
> 
>    Julian
> 
>    [1] https://issues.apache.org/jira/browse/CALCITE-2540 <https://issues.apache.org/jira/browse/CALCITE-2540>
> 
> 
> 
>> On Sep 6, 2018, at 10:08 AM, Jesus Camacho Rodriguez <jc...@hortonworks.com> wrote:
>> 
>> If I remember correctly, the rule pushes the Sort through the Join (if possible), but it also preserves the Sort on top of the Join to ensure correctness.
>> 
>> -Jesús
>> 
>> 
>> On 9/6/18, 9:57 AM, "Julian Hyde" <jh...@apache.org> wrote:
>> 
>>   Yes, it depends very much on the operator. Some examples:
>>   Merge join typically requires inputs to be sorted, and preserves that order. (But some outer joins may throw in null values out of order.)
>>   Map join typically preserves the order of the probing side, not the build side.
>>   Hash join typically destroys the order of both sides.
>>   Use the rule with caution.
>> 
>>   Julian
>> 
>> 
>>> On Sep 6, 2018, at 9:33 AM, Stamatis Zampetakis <za...@gmail.com> wrote:
>>> 
>>> Hello,
>>> 
>>> I noticed that there is a Calcite rule (i.e., SortJoinTransposeRule) that
>>> pushes a LogicalSort past a LogicalJoin if the join is either left outer or
>>> right outer.
>>> 
>>> Who guarantees that the left and right outer joins are preserving the order
>>> of the inputs?
>>> Does the SQL standard requires that these types of joins are order
>>> preserving?
>>> 
>>> Since we are working with logical operators, I would tend to think that we
>>> cannot assume anything about the physical equivalent.
>>> 
>>> Best,
>>> Stamatis
>> 
>> 
>> 
> 
> 
> 


Re: Correctness of SortJoinTransposeRule

Posted by Jesus Camacho Rodriguez <jc...@hortonworks.com>.
The idea for that rule was to be able to exploit the limit/fetch spec of the Sort operator to reduce the number of rows that needed to be joined, that is why it was only applied to LEFT/RIGHT outer join.

I think option 2 below sounds better than creating a new rule variant.

Thanks,
Jesús


On 9/6/18, 10:28 AM, "Julian Hyde" <jh...@apache.org> wrote:

    Ah, that makes sense.
    
    Reading the code, I couldn’t figure out why it applies to LEFT and RIGHT but not to INNER. (For some kinds of join, for example inner merge join, it could push the sort to both sides, as long as the sort was compatible with what is needed to ensure that the keys arrive at the right time.)
    
    If needed, we could have a variant of the rule that omits the Sort after the Join. Or perhaps we leave the Sort and have a rule that notices the output order of the Join and, based on that, weakens[1] or removes the Sort.
    
    Julian
    
    [1] https://issues.apache.org/jira/browse/CALCITE-2540 <https://issues.apache.org/jira/browse/CALCITE-2540>
    
    
    
    > On Sep 6, 2018, at 10:08 AM, Jesus Camacho Rodriguez <jc...@hortonworks.com> wrote:
    > 
    > If I remember correctly, the rule pushes the Sort through the Join (if possible), but it also preserves the Sort on top of the Join to ensure correctness.
    > 
    > -Jesús
    > 
    > 
    > On 9/6/18, 9:57 AM, "Julian Hyde" <jh...@apache.org> wrote:
    > 
    >    Yes, it depends very much on the operator. Some examples:
    >    Merge join typically requires inputs to be sorted, and preserves that order. (But some outer joins may throw in null values out of order.)
    >    Map join typically preserves the order of the probing side, not the build side.
    >    Hash join typically destroys the order of both sides.
    >    Use the rule with caution.
    > 
    >    Julian
    > 
    > 
    >> On Sep 6, 2018, at 9:33 AM, Stamatis Zampetakis <za...@gmail.com> wrote:
    >> 
    >> Hello,
    >> 
    >> I noticed that there is a Calcite rule (i.e., SortJoinTransposeRule) that
    >> pushes a LogicalSort past a LogicalJoin if the join is either left outer or
    >> right outer.
    >> 
    >> Who guarantees that the left and right outer joins are preserving the order
    >> of the inputs?
    >> Does the SQL standard requires that these types of joins are order
    >> preserving?
    >> 
    >> Since we are working with logical operators, I would tend to think that we
    >> cannot assume anything about the physical equivalent.
    >> 
    >> Best,
    >> Stamatis
    > 
    > 
    > 
    
    


Re: Correctness of SortJoinTransposeRule

Posted by Julian Hyde <jh...@apache.org>.
Ah, that makes sense.

Reading the code, I couldn’t figure out why it applies to LEFT and RIGHT but not to INNER. (For some kinds of join, for example inner merge join, it could push the sort to both sides, as long as the sort was compatible with what is needed to ensure that the keys arrive at the right time.)

If needed, we could have a variant of the rule that omits the Sort after the Join. Or perhaps we leave the Sort and have a rule that notices the output order of the Join and, based on that, weakens[1] or removes the Sort.

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-2540 <https://issues.apache.org/jira/browse/CALCITE-2540>



> On Sep 6, 2018, at 10:08 AM, Jesus Camacho Rodriguez <jc...@hortonworks.com> wrote:
> 
> If I remember correctly, the rule pushes the Sort through the Join (if possible), but it also preserves the Sort on top of the Join to ensure correctness.
> 
> -Jesús
> 
> 
> On 9/6/18, 9:57 AM, "Julian Hyde" <jh...@apache.org> wrote:
> 
>    Yes, it depends very much on the operator. Some examples:
>    Merge join typically requires inputs to be sorted, and preserves that order. (But some outer joins may throw in null values out of order.)
>    Map join typically preserves the order of the probing side, not the build side.
>    Hash join typically destroys the order of both sides.
>    Use the rule with caution.
> 
>    Julian
> 
> 
>> On Sep 6, 2018, at 9:33 AM, Stamatis Zampetakis <za...@gmail.com> wrote:
>> 
>> Hello,
>> 
>> I noticed that there is a Calcite rule (i.e., SortJoinTransposeRule) that
>> pushes a LogicalSort past a LogicalJoin if the join is either left outer or
>> right outer.
>> 
>> Who guarantees that the left and right outer joins are preserving the order
>> of the inputs?
>> Does the SQL standard requires that these types of joins are order
>> preserving?
>> 
>> Since we are working with logical operators, I would tend to think that we
>> cannot assume anything about the physical equivalent.
>> 
>> Best,
>> Stamatis
> 
> 
> 


Re: Correctness of SortJoinTransposeRule

Posted by Jesus Camacho Rodriguez <jc...@hortonworks.com>.
If I remember correctly, the rule pushes the Sort through the Join (if possible), but it also preserves the Sort on top of the Join to ensure correctness.

-Jesús


On 9/6/18, 9:57 AM, "Julian Hyde" <jh...@apache.org> wrote:

    Yes, it depends very much on the operator. Some examples:
    Merge join typically requires inputs to be sorted, and preserves that order. (But some outer joins may throw in null values out of order.)
    Map join typically preserves the order of the probing side, not the build side.
    Hash join typically destroys the order of both sides.
    Use the rule with caution.
    
    Julian
    
    
    > On Sep 6, 2018, at 9:33 AM, Stamatis Zampetakis <za...@gmail.com> wrote:
    > 
    > Hello,
    > 
    > I noticed that there is a Calcite rule (i.e., SortJoinTransposeRule) that
    > pushes a LogicalSort past a LogicalJoin if the join is either left outer or
    > right outer.
    > 
    > Who guarantees that the left and right outer joins are preserving the order
    > of the inputs?
    > Does the SQL standard requires that these types of joins are order
    > preserving?
    > 
    > Since we are working with logical operators, I would tend to think that we
    > cannot assume anything about the physical equivalent.
    > 
    > Best,
    > Stamatis
    
    


Re: Correctness of SortJoinTransposeRule

Posted by Julian Hyde <jh...@apache.org>.
Yes, it depends very much on the operator. Some examples:
Merge join typically requires inputs to be sorted, and preserves that order. (But some outer joins may throw in null values out of order.)
Map join typically preserves the order of the probing side, not the build side.
Hash join typically destroys the order of both sides.
Use the rule with caution.

Julian


> On Sep 6, 2018, at 9:33 AM, Stamatis Zampetakis <za...@gmail.com> wrote:
> 
> Hello,
> 
> I noticed that there is a Calcite rule (i.e., SortJoinTransposeRule) that
> pushes a LogicalSort past a LogicalJoin if the join is either left outer or
> right outer.
> 
> Who guarantees that the left and right outer joins are preserving the order
> of the inputs?
> Does the SQL standard requires that these types of joins are order
> preserving?
> 
> Since we are working with logical operators, I would tend to think that we
> cannot assume anything about the physical equivalent.
> 
> Best,
> Stamatis