You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Yuzhao Chen <yu...@gmail.com> on 2019/04/15 09:40:43 UTC

What is the exactly definition as an equi join ?

If we checkout the java doc for Calcite EuqiJoin, there is definition for it:
> for any join whose condition is based on column equality

But what about if there are function calls in the equi condition operands ? For example:
Should we consider

Select A.a, B.b from A join B on cast(A.a as int) = B.b

as an equi join ?

Now Calcite think it is not, which I think will lost some possibilities for sql plan promotion, e.g. join condition push down.

Best,
Danny Chan

Re: What is the exactly definition as an equi join ?

Posted by Yuzhao Chen <yu...@gmail.com>.
For the example I gave, the join condition is push down with RelOptUtil.pushDownJoinConditions(), but when we new a join info [1], we only decide if the join is EquiJoin if all the operands of operator <=> is inputRef, which is the question i ask here.

[1] https://github.com/apache/calcite/blob/9538374e8fae5cec7d6f7b270850f5dfb4c1fc06/core/src/main/java/org/apache/calcite/plan/RelOptUtil.java#L1435

Best,
Danny Chan
在 2019年4月15日 +0800 PM7:25,Xiening Dai <xn...@live.com>,写道:
> I think Calcite always pushes down equal join conditions. In SqlToRelConverter.createJoin(), before ruction returns, it calls RelOptUtil.pushDownJoinConditions(). So in your example, the cast expression will be pushed down and it will still be an equal join.
>
> > On Apr 15, 2019, at 5:40 PM, Yuzhao Chen <yu...@gmail.com> wrote:
> >
> > If we checkout the java doc for Calcite EuqiJoin, there is definition for it:
> > > for any join whose condition is based on column equality
> >
> > But what about if there are function calls in the equi condition operands ? For example:
> > Should we consider
> >
> > Select A.a, B.b from A join B on cast(A.a as int) = B.b
> >
> > as an equi join ?
> >
> > Now Calcite think it is not, which I think will lost some possibilities for sql plan promotion, e.g. join condition push down.
> >
> > Best,
> > Danny Chan
>

Re: What is the exactly definition as an equi join ?

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hello,

In many research papers and database books the equi join is defined as an
equality operator between two attribute names (columns etc.) which more or
less corresponds to the current definition in Calcite thus I don't see an
obvious reason to change this definition. If necessary, there could be
rules/methods which create the necessary projections below the join to
satisfy the requirement of an equi join.

I wouldn't consider Enumerable operators as an implementation detail that
is specific to Calcite.
Enumerable operators is good way to provide full SQL support to a project
that does not naturally support every operation (think of various NoSQL
databases which do not support joins for instance).
This is exactly what we do in our adapters and I guess it will be similar
for other people building adapters over various data sources.

Even for projects where the plan is to provide native operators for
everything having Enumerables at the beginning is convenient.
It can help to kick-off the project fast without waiting for every operator
to get implemented.

Best,
Stamatis



On Tue, Apr 16, 2019 at 5:53 AM Yuzhao Chen <yu...@gmail.com> wrote:

> Now many compute engine do not use Calcite EnumerableXXXs and only use the
> logical node for planning, after all, the Enumerables are implementations
> are only specific to Calcite, I still think Calcite need to give more
> accurate definitions for what equi join is.
>
> Best,
> Danny Chan
> 在 2019年4月16日 +0800 AM12:19,Ruben Q L <ru...@gmail.com>,写道:
> > Danny,
> > I have seen the full picture and I have actually changed mind:
> >
> > If I am not mistaken, currently the way to make your example (and mine)
> to
> > work as an EquiJoin is using intermediate projections (so that RexCall
> > / RexFieldAccess "becomes" RexInputRef):
> >
> > Select A.a, B.b from A join B on cast(A.a as int) = B.b
> >
> > => option 1 (analyzed as equijoin)
> > Project($0, $2)
> > Join(condition: $1 = $2) -- i.e. cast(A.a as int) = B.b
> > Project($0=a; $1=cast($0 as int))
> > Scan(A)
> > Scan(B)
> >
> > => option 2 (analyzed as non-equijoin)
> > Project($0, $1)
> > Join(condition: cast($0 as int) = $1) -- i.e. cast(A.a as int) = B.b
> > Scan(A)
> > Scan(B)
> >
> > It might seem "wrong", but the thing is, the Enumerable implementations
> > that extend EquiJoin (i.e. EnumerableJoin, EnumerableMergeJoin,
> > EnumerableSemiJoin) are based on the EquiJoin fields:
> > public final ImmutableIntList leftKeys;
> > public final ImmutableIntList rightKeys;
> >
> > And rely on the the fact that they are representing an equality on
> leftKeys
> > and rightKeys field indices, and that we can directly generate accessors
> > for these fields without any extra computation (i.e. without any extra
> > call). That's the reason why EquiJoin cannot support RexCall
> > / RexFieldAccess, because they cannot be translatable to a key (i.e. to a
> > field index).
> >
> > With this situation, we could improve this logic to support more complex
> > equijoin conditions; but I think this will not be worth it, because the
> > alternative is quite simple: add a projection for the RexCall
> > / RexFieldAccess and keep the existing (simple) logic.
> >
> > For this reason, I think we should stick to the current logic *an
> equi-join
> > is "field = field", not "expression = field" *and I should abandon and
> > close https://issues.apache.org/jira/browse/CALCITE-2898
> >
> > Best,
> > Ruben
> >
> >
> > Le lun. 15 avr. 2019 à 14:13, Yuzhao Chen <yu...@gmail.com> a
> écrit :
> >
> > > Thx Ruben, the issue really answer my questions, I encounter this when
> > > dong CALCITE-2969, when I refactor SemiJoinRule, I think not only
> > > RexFieldAccess, any RexCall should fit into this case, only if the
> RexCall
> > > function is deterministic, what do you think ?
> > >
> > > Best,
> > > Danny Chan
> > > 在 2019年4月15日 +0800 PM7:48,Ruben Q L <ru...@gmail.com>,写道:
> > > > Danny,
> > > > In the context of https://issues.apache.org/jira/browse/CALCITE-2898,
> a
> > > > discussion about this topic was started. In that ticket I pointed out
> > > that
> > > > Calcite does not recognize "RexFieldAccess = RexInputRef" as an
> EquiJoin
> > > > condition (even though the RexFieldAccess itself is referencing a
> > > > RexInputRef); which is somewhat similar to the situation that you
> propose
> > > > "RexCall = RexInputRef". According to Julian Hyde's comment on that
> > > > ticket: *'For
> > > > our purposes, an equi-join is "field = field", not "expression =
> field".
> > > > Even if that expression is a reference to sub-field'. *However, I
> agree
> > > > with you and maybe this definition should be reviewed (I believe your
> > > > example and my example should be valid cases of EquiJoin), but
> possibly
> > > > this will break some pieces of the current code, so the modification
> > > might
> > > > not be straightforward.
> > > >
> > > > Best,
> > > > Ruben
> > > >
> > > >
> > > > Le lun. 15 avr. 2019 à 13:25, Xiening Dai <xn...@live.com> a
> écrit :
> > > >
> > > > > I think Calcite always pushes down equal join conditions. In
> > > > > SqlToRelConverter.createJoin(), before ruction returns, it calls
> > > > > RelOptUtil.pushDownJoinConditions(). So in your example, the cast
> > > > > expression will be pushed down and it will still be an equal join.
> > > > >
> > > > > > On Apr 15, 2019, at 5:40 PM, Yuzhao Chen <yu...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > If we checkout the java doc for Calcite EuqiJoin, there is
> definition
> > > > > for it:
> > > > > > > for any join whose condition is based on column equality
> > > > > >
> > > > > > But what about if there are function calls in the equi condition
> > > > > operands ? For example:
> > > > > > Should we consider
> > > > > >
> > > > > > Select A.a, B.b from A join B on cast(A.a as int) = B.b
> > > > > >
> > > > > > as an equi join ?
> > > > > >
> > > > > > Now Calcite think it is not, which I think will lost some
> > > possibilities
> > > > > for sql plan promotion, e.g. join condition push down.
> > > > > >
> > > > > > Best,
> > > > > > Danny Chan
> > > > >
> > > > >
> > >
>

Re: What is the exactly definition as an equi join ?

Posted by Yuzhao Chen <yu...@gmail.com>.
Now many compute engine do not use Calcite EnumerableXXXs and only use the logical node for planning, after all, the Enumerables are implementations are only specific to Calcite, I still think Calcite need to give more accurate definitions for what equi join is.

Best,
Danny Chan
在 2019年4月16日 +0800 AM12:19,Ruben Q L <ru...@gmail.com>,写道:
> Danny,
> I have seen the full picture and I have actually changed mind:
>
> If I am not mistaken, currently the way to make your example (and mine) to
> work as an EquiJoin is using intermediate projections (so that RexCall
> / RexFieldAccess "becomes" RexInputRef):
>
> Select A.a, B.b from A join B on cast(A.a as int) = B.b
>
> => option 1 (analyzed as equijoin)
> Project($0, $2)
> Join(condition: $1 = $2) -- i.e. cast(A.a as int) = B.b
> Project($0=a; $1=cast($0 as int))
> Scan(A)
> Scan(B)
>
> => option 2 (analyzed as non-equijoin)
> Project($0, $1)
> Join(condition: cast($0 as int) = $1) -- i.e. cast(A.a as int) = B.b
> Scan(A)
> Scan(B)
>
> It might seem "wrong", but the thing is, the Enumerable implementations
> that extend EquiJoin (i.e. EnumerableJoin, EnumerableMergeJoin,
> EnumerableSemiJoin) are based on the EquiJoin fields:
> public final ImmutableIntList leftKeys;
> public final ImmutableIntList rightKeys;
>
> And rely on the the fact that they are representing an equality on leftKeys
> and rightKeys field indices, and that we can directly generate accessors
> for these fields without any extra computation (i.e. without any extra
> call). That's the reason why EquiJoin cannot support RexCall
> / RexFieldAccess, because they cannot be translatable to a key (i.e. to a
> field index).
>
> With this situation, we could improve this logic to support more complex
> equijoin conditions; but I think this will not be worth it, because the
> alternative is quite simple: add a projection for the RexCall
> / RexFieldAccess and keep the existing (simple) logic.
>
> For this reason, I think we should stick to the current logic *an equi-join
> is "field = field", not "expression = field" *and I should abandon and
> close https://issues.apache.org/jira/browse/CALCITE-2898
>
> Best,
> Ruben
>
>
> Le lun. 15 avr. 2019 à 14:13, Yuzhao Chen <yu...@gmail.com> a écrit :
>
> > Thx Ruben, the issue really answer my questions, I encounter this when
> > dong CALCITE-2969, when I refactor SemiJoinRule, I think not only
> > RexFieldAccess, any RexCall should fit into this case, only if the RexCall
> > function is deterministic, what do you think ?
> >
> > Best,
> > Danny Chan
> > 在 2019年4月15日 +0800 PM7:48,Ruben Q L <ru...@gmail.com>,写道:
> > > Danny,
> > > In the context of https://issues.apache.org/jira/browse/CALCITE-2898, a
> > > discussion about this topic was started. In that ticket I pointed out
> > that
> > > Calcite does not recognize "RexFieldAccess = RexInputRef" as an EquiJoin
> > > condition (even though the RexFieldAccess itself is referencing a
> > > RexInputRef); which is somewhat similar to the situation that you propose
> > > "RexCall = RexInputRef". According to Julian Hyde's comment on that
> > > ticket: *'For
> > > our purposes, an equi-join is "field = field", not "expression = field".
> > > Even if that expression is a reference to sub-field'. *However, I agree
> > > with you and maybe this definition should be reviewed (I believe your
> > > example and my example should be valid cases of EquiJoin), but possibly
> > > this will break some pieces of the current code, so the modification
> > might
> > > not be straightforward.
> > >
> > > Best,
> > > Ruben
> > >
> > >
> > > Le lun. 15 avr. 2019 à 13:25, Xiening Dai <xn...@live.com> a écrit :
> > >
> > > > I think Calcite always pushes down equal join conditions. In
> > > > SqlToRelConverter.createJoin(), before ruction returns, it calls
> > > > RelOptUtil.pushDownJoinConditions(). So in your example, the cast
> > > > expression will be pushed down and it will still be an equal join.
> > > >
> > > > > On Apr 15, 2019, at 5:40 PM, Yuzhao Chen <yu...@gmail.com>
> > wrote:
> > > > >
> > > > > If we checkout the java doc for Calcite EuqiJoin, there is definition
> > > > for it:
> > > > > > for any join whose condition is based on column equality
> > > > >
> > > > > But what about if there are function calls in the equi condition
> > > > operands ? For example:
> > > > > Should we consider
> > > > >
> > > > > Select A.a, B.b from A join B on cast(A.a as int) = B.b
> > > > >
> > > > > as an equi join ?
> > > > >
> > > > > Now Calcite think it is not, which I think will lost some
> > possibilities
> > > > for sql plan promotion, e.g. join condition push down.
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > >
> > > >
> >

Re: What is the exactly definition as an equi join ?

Posted by Ruben Q L <ru...@gmail.com>.
Danny,
I have seen the full picture and I have actually changed mind:

If I am not mistaken, currently the way to make your example (and mine) to
work as an EquiJoin is using intermediate projections (so that RexCall
/ RexFieldAccess "becomes" RexInputRef):

Select A.a, B.b from A join B on cast(A.a as int) = B.b

=> option 1 (analyzed as equijoin)
Project($0, $2)
  Join(condition: $1 = $2)  --  i.e. cast(A.a as int) = B.b
    Project($0=a; $1=cast($0 as int))
      Scan(A)
    Scan(B)

=> option 2 (analyzed as non-equijoin)
Project($0, $1)
  Join(condition: cast($0 as int) = $1)  --  i.e. cast(A.a as int) = B.b
    Scan(A)
    Scan(B)

It might seem "wrong", but the thing is, the Enumerable implementations
that extend EquiJoin (i.e. EnumerableJoin, EnumerableMergeJoin,
EnumerableSemiJoin) are based on the EquiJoin fields:
public final ImmutableIntList leftKeys;
public final ImmutableIntList rightKeys;

And rely on the the fact that they are representing an equality on leftKeys
and  rightKeys  field indices, and that we can directly generate accessors
for these fields without any extra computation (i.e. without any extra
call). That's the reason why EquiJoin cannot support RexCall
/ RexFieldAccess, because they cannot be translatable to a key (i.e. to a
field index).

With this situation, we could improve this logic to support more complex
equijoin conditions; but I think this will not be worth it, because the
alternative is quite simple: add a projection for the RexCall
/ RexFieldAccess and keep the existing (simple) logic.

For this reason, I think we should stick to the current logic *an equi-join
is "field = field", not "expression = field"  *and I should abandon and
close https://issues.apache.org/jira/browse/CALCITE-2898

Best,
Ruben


Le lun. 15 avr. 2019 à 14:13, Yuzhao Chen <yu...@gmail.com> a écrit :

> Thx Ruben, the issue really answer my questions, I encounter this when
> dong CALCITE-2969, when I refactor SemiJoinRule, I think not only
> RexFieldAccess, any RexCall should fit into this case, only if the RexCall
> function is deterministic, what do you think ?
>
> Best,
> Danny Chan
> 在 2019年4月15日 +0800 PM7:48,Ruben Q L <ru...@gmail.com>,写道:
> > Danny,
> > In the context of https://issues.apache.org/jira/browse/CALCITE-2898, a
> > discussion about this topic was started. In that ticket I pointed out
> that
> > Calcite does not recognize "RexFieldAccess = RexInputRef" as an EquiJoin
> > condition (even though the RexFieldAccess itself is referencing a
> > RexInputRef); which is somewhat similar to the situation that you propose
> > "RexCall = RexInputRef". According to Julian Hyde's comment on that
> > ticket: *'For
> > our purposes, an equi-join is "field = field", not "expression = field".
> > Even if that expression is a reference to sub-field'. *However, I agree
> > with you and maybe this definition should be reviewed (I believe your
> > example and my example should be valid cases of EquiJoin), but possibly
> > this will break some pieces of the current code, so the modification
> might
> > not be straightforward.
> >
> > Best,
> > Ruben
> >
> >
> > Le lun. 15 avr. 2019 à 13:25, Xiening Dai <xn...@live.com> a écrit :
> >
> > > I think Calcite always pushes down equal join conditions. In
> > > SqlToRelConverter.createJoin(), before ruction returns, it calls
> > > RelOptUtil.pushDownJoinConditions(). So in your example, the cast
> > > expression will be pushed down and it will still be an equal join.
> > >
> > > > On Apr 15, 2019, at 5:40 PM, Yuzhao Chen <yu...@gmail.com>
> wrote:
> > > >
> > > > If we checkout the java doc for Calcite EuqiJoin, there is definition
> > > for it:
> > > > > for any join whose condition is based on column equality
> > > >
> > > > But what about if there are function calls in the equi condition
> > > operands ? For example:
> > > > Should we consider
> > > >
> > > > Select A.a, B.b from A join B on cast(A.a as int) = B.b
> > > >
> > > > as an equi join ?
> > > >
> > > > Now Calcite think it is not, which I think will lost some
> possibilities
> > > for sql plan promotion, e.g. join condition push down.
> > > >
> > > > Best,
> > > > Danny Chan
> > >
> > >
>

Re: What is the exactly definition as an equi join ?

Posted by Yuzhao Chen <yu...@gmail.com>.
Thx Ruben, the issue really answer my questions, I encounter this when dong CALCITE-2969, when I refactor SemiJoinRule, I think not only RexFieldAccess, any RexCall should fit into this case, only if the RexCall function is deterministic, what do you think ?

Best,
Danny Chan
在 2019年4月15日 +0800 PM7:48,Ruben Q L <ru...@gmail.com>,写道:
> Danny,
> In the context of https://issues.apache.org/jira/browse/CALCITE-2898, a
> discussion about this topic was started. In that ticket I pointed out that
> Calcite does not recognize "RexFieldAccess = RexInputRef" as an EquiJoin
> condition (even though the RexFieldAccess itself is referencing a
> RexInputRef); which is somewhat similar to the situation that you propose
> "RexCall = RexInputRef". According to Julian Hyde's comment on that
> ticket: *'For
> our purposes, an equi-join is "field = field", not "expression = field".
> Even if that expression is a reference to sub-field'. *However, I agree
> with you and maybe this definition should be reviewed (I believe your
> example and my example should be valid cases of EquiJoin), but possibly
> this will break some pieces of the current code, so the modification might
> not be straightforward.
>
> Best,
> Ruben
>
>
> Le lun. 15 avr. 2019 à 13:25, Xiening Dai <xn...@live.com> a écrit :
>
> > I think Calcite always pushes down equal join conditions. In
> > SqlToRelConverter.createJoin(), before ruction returns, it calls
> > RelOptUtil.pushDownJoinConditions(). So in your example, the cast
> > expression will be pushed down and it will still be an equal join.
> >
> > > On Apr 15, 2019, at 5:40 PM, Yuzhao Chen <yu...@gmail.com> wrote:
> > >
> > > If we checkout the java doc for Calcite EuqiJoin, there is definition
> > for it:
> > > > for any join whose condition is based on column equality
> > >
> > > But what about if there are function calls in the equi condition
> > operands ? For example:
> > > Should we consider
> > >
> > > Select A.a, B.b from A join B on cast(A.a as int) = B.b
> > >
> > > as an equi join ?
> > >
> > > Now Calcite think it is not, which I think will lost some possibilities
> > for sql plan promotion, e.g. join condition push down.
> > >
> > > Best,
> > > Danny Chan
> >
> >

Re: What is the exactly definition as an equi join ?

Posted by Ruben Q L <ru...@gmail.com>.
Danny,
In the context of https://issues.apache.org/jira/browse/CALCITE-2898, a
discussion about this topic was started. In that ticket I pointed out that
Calcite does not recognize "RexFieldAccess = RexInputRef" as an EquiJoin
condition (even though the RexFieldAccess itself is referencing a
RexInputRef); which is somewhat similar to the situation that you propose
"RexCall = RexInputRef". According to Julian Hyde's comment on that
ticket: *'For
our purposes, an equi-join is "field = field", not "expression = field".
Even if that expression is a reference to sub-field'. *However, I agree
with you and maybe this definition should be reviewed (I believe your
example and my example should be valid cases of EquiJoin), but possibly
this will break some pieces of the current code, so the modification might
not be straightforward.

Best,
Ruben


Le lun. 15 avr. 2019 à 13:25, Xiening Dai <xn...@live.com> a écrit :

> I think Calcite always pushes down equal join conditions. In
> SqlToRelConverter.createJoin(), before ruction returns, it calls
> RelOptUtil.pushDownJoinConditions(). So in your example, the cast
> expression will be pushed down and it will still be an equal join.
>
> > On Apr 15, 2019, at 5:40 PM, Yuzhao Chen <yu...@gmail.com> wrote:
> >
> > If we checkout the java doc for Calcite EuqiJoin, there is definition
> for it:
> >>  for any join whose condition is based on column equality
> >
> > But what about if there are function calls in the equi condition
> operands ? For example:
> > Should we consider
> >
> > Select A.a, B.b from A join B on cast(A.a as int) = B.b
> >
> > as an equi join ?
> >
> > Now Calcite think it is not, which I think will lost some possibilities
> for sql plan promotion, e.g. join condition push down.
> >
> > Best,
> > Danny Chan
>
>

Re: What is the exactly definition as an equi join ?

Posted by Xiening Dai <xn...@live.com>.
I think Calcite always pushes down equal join conditions. In SqlToRelConverter.createJoin(), before ruction returns, it calls RelOptUtil.pushDownJoinConditions(). So in your example, the cast expression will be pushed down and it will still be an equal join.

> On Apr 15, 2019, at 5:40 PM, Yuzhao Chen <yu...@gmail.com> wrote:
> 
> If we checkout the java doc for Calcite EuqiJoin, there is definition for it:
>>  for any join whose condition is based on column equality
> 
> But what about if there are function calls in the equi condition operands ? For example:
> Should we consider
> 
> Select A.a, B.b from A join B on cast(A.a as int) = B.b
> 
> as an equi join ?
> 
> Now Calcite think it is not, which I think will lost some possibilities for sql plan promotion, e.g. join condition push down.
> 
> Best,
> Danny Chan