You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Maryann Xue <ma...@gmail.com> on 2015/03/20 20:10:03 UTC

About JoinCommuteRule

Hi Julian,

Noticed that JoinCommuteRule does not swap outer joins by default. Is there
any specific reason to avoid swapping outer joins? And is there a way to
enable swapping outer joins?

Phoenix hash join output columns from left to right, so we prefer left
outer join over right outer join if hash is possible. So unlike
EnumerableJoin, PhoenixJoin assigns the left hand side with a lower cost
than the right hand side when hashjoin is doable.



Thanks,
Maryann

Re: About JoinCommuteRule

Posted by Maryann Xue <ma...@gmail.com>.
Tried with swapOuterJoins = true and got two failures in the calcite/core
tests, which were both caused by switching left join to right join. And I
think this is further because EnumerableJoin gives most cost to the left
side (by mistake). So I went ahead to change "computeSelfCost()" in
EnumerableJoin, and it turned out I got 10 failures this time (all due
explain plan difference").

I'll file JIRAs for the above two issues respectively. But I would like to
get around this quickly by adding a SWAP_OUTER static instance to
JoinCommuteRule. Would that be OK?


Thanks,
Maryann


On Mon, Mar 23, 2015 at 1:56 PM, Julian Hyde <ju...@gmail.com> wrote:

> On Mar 20, 2015, at 3:02 PM, Maryann Xue <ma...@gmail.com> wrote:
>
> >>> I can't think of a good reason why JoinCommuteRule doesn't swap outer
> > joins.
> >
> > But right now the only call to swap() is with swapOuterJoins set to
> false.
> > So I thought it might have some reason to do so. Can we change that?
>
> I don’t remember why. Can you investigate, by running the test suite, and
> make a recommendation?
>
> >
> >>> EnumerableJoin originally built the left, probed the right, and
> > therefore had a smaller cost if the smaller input were on the left.
> >
> > Phoenix actually builds the right and probes the left.
> >
> >>> But we changed it, because the convention in the optimizer world is to
> > build left-deep trees, with the largest input on the left, and smaller,
> > hopefully selective, inputs on the right.
> >
> > So I assume EnumerableJoin now should give LHS a cheaper cost, right? It
> > does not look like so in the code.
>
> Oops, you’re right. EnumerableJoin is more expensive if the larger input
> is placed on the left. I think that is a mistake.
>
> > Don't know if my understanding is correct, but I think a left-deep tree
> > with largest relation on the left would most likely benefit nested loop
> > joins. Phoenix is not able to do NL join, so either a left-deep tree with
> > largest on the right or, if memory limit allows, a right-deep tree with
> > largest on the left is preferable.
>
> Although it would be nice if each join algorithm could choose its cost
> model, I think it would make it a lot more complicated to build re-usable
> rules.
>
> You should consider changing your join to match the convention. (And yes
> we need to change EnumerableJoin also.)
>
> Julian
>
>

Re: About JoinCommuteRule

Posted by Julian Hyde <ju...@gmail.com>.
On Mar 20, 2015, at 3:02 PM, Maryann Xue <ma...@gmail.com> wrote:

>>> I can't think of a good reason why JoinCommuteRule doesn't swap outer
> joins.
> 
> But right now the only call to swap() is with swapOuterJoins set to false.
> So I thought it might have some reason to do so. Can we change that?

I don’t remember why. Can you investigate, by running the test suite, and make a recommendation?

> 
>>> EnumerableJoin originally built the left, probed the right, and
> therefore had a smaller cost if the smaller input were on the left.
> 
> Phoenix actually builds the right and probes the left.
> 
>>> But we changed it, because the convention in the optimizer world is to
> build left-deep trees, with the largest input on the left, and smaller,
> hopefully selective, inputs on the right.
> 
> So I assume EnumerableJoin now should give LHS a cheaper cost, right? It
> does not look like so in the code.

Oops, you’re right. EnumerableJoin is more expensive if the larger input is placed on the left. I think that is a mistake.

> Don't know if my understanding is correct, but I think a left-deep tree
> with largest relation on the left would most likely benefit nested loop
> joins. Phoenix is not able to do NL join, so either a left-deep tree with
> largest on the right or, if memory limit allows, a right-deep tree with
> largest on the left is preferable.

Although it would be nice if each join algorithm could choose its cost model, I think it would make it a lot more complicated to build re-usable rules.

You should consider changing your join to match the convention. (And yes we need to change EnumerableJoin also.)

Julian


Re: About JoinCommuteRule

Posted by Maryann Xue <ma...@gmail.com>.
>> I can't think of a good reason why JoinCommuteRule doesn't swap outer
joins.

But right now the only call to swap() is with swapOuterJoins set to false.
So I thought it might have some reason to do so. Can we change that?

>> EnumerableJoin originally built the left, probed the right, and
therefore had a smaller cost if the smaller input were on the left.

Phoenix actually builds the right and probes the left.

>> But we changed it, because the convention in the optimizer world is to
build left-deep trees, with the largest input on the left, and smaller,
hopefully selective, inputs on the right.

So I assume EnumerableJoin now should give LHS a cheaper cost, right? It
does not look like so in the code.

Don't know if my understanding is correct, but I think a left-deep tree
with largest relation on the left would most likely benefit nested loop
joins. Phoenix is not able to do NL join, so either a left-deep tree with
largest on the right or, if memory limit allows, a right-deep tree with
largest on the left is preferable.

Thanks for the idea of adding a Project. It would make a lot of things
easier.


Thanks,

Maryann


On Fri, Mar 20, 2015 at 3:28 PM, Julian Hyde <ju...@gmail.com> wrote:

> I can't think of a good reason why JoinCommuteRule doesn't swap outer
> joins. It would need to flip LEFT -> RIGHT etc. Run through the test
> suite and see what happens.
>
> Regarding which side of a hash-join is the build side and the probe
> side. Like Phoenix hash join, EnumerableJoin originally built the
> left, probed the right, and therefore had a smaller cost if the
> smaller input were on the left. But we changed it, because the
> convention in the optimizer world is to build left-deep trees, with
> the largest input on the left, and smaller, hopefully selective,
> inputs on the right. The heuristic join ordering rule follows this
> convention.
>
> I think you'll encounter fewer problems in the long run if you swap
> over to the same convention. Maybe you can achieve it with a quick
> hack like introducing a project just before you implement.
>
> On Fri, Mar 20, 2015 at 12:10 PM, Maryann Xue <ma...@gmail.com>
> wrote:
> > Hi Julian,
> >
> > Noticed that JoinCommuteRule does not swap outer joins by default. Is
> there
> > any specific reason to avoid swapping outer joins? And is there a way to
> > enable swapping outer joins?
> >
> > Phoenix hash join output columns from left to right, so we prefer left
> > outer join over right outer join if hash is possible. So unlike
> > EnumerableJoin, PhoenixJoin assigns the left hand side with a lower cost
> > than the right hand side when hashjoin is doable.
> >
> >
> >
> > Thanks,
> > Maryann
>

Re: About JoinCommuteRule

Posted by Julian Hyde <ju...@gmail.com>.
I can't think of a good reason why JoinCommuteRule doesn't swap outer
joins. It would need to flip LEFT -> RIGHT etc. Run through the test
suite and see what happens.

Regarding which side of a hash-join is the build side and the probe
side. Like Phoenix hash join, EnumerableJoin originally built the
left, probed the right, and therefore had a smaller cost if the
smaller input were on the left. But we changed it, because the
convention in the optimizer world is to build left-deep trees, with
the largest input on the left, and smaller, hopefully selective,
inputs on the right. The heuristic join ordering rule follows this
convention.

I think you'll encounter fewer problems in the long run if you swap
over to the same convention. Maybe you can achieve it with a quick
hack like introducing a project just before you implement.

On Fri, Mar 20, 2015 at 12:10 PM, Maryann Xue <ma...@gmail.com> wrote:
> Hi Julian,
>
> Noticed that JoinCommuteRule does not swap outer joins by default. Is there
> any specific reason to avoid swapping outer joins? And is there a way to
> enable swapping outer joins?
>
> Phoenix hash join output columns from left to right, so we prefer left
> outer join over right outer join if hash is possible. So unlike
> EnumerableJoin, PhoenixJoin assigns the left hand side with a lower cost
> than the right hand side when hashjoin is doable.
>
>
>
> Thanks,
> Maryann