You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@calcite.apache.org by GitBox <gi...@apache.org> on 2020/08/24 08:11:34 UTC

[GitHub] [calcite] rubenada opened a new pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

rubenada opened a new pull request #2107:
URL: https://github.com/apache/calcite/pull/2107


   [CALCITE-4113] Support LEFT join in EnumerableMergeJoin


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] chunweilei commented on a change in pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
chunweilei commented on a change in pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#discussion_r471875498



##########
File path: linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java
##########
@@ -4223,8 +4227,10 @@ private boolean advanceLeft(TSource left, TKey leftKey) {
         TKey leftKey2 = outerKeySelector.apply(left);
         if (leftKey2 == null) {
           // mergeJoin assumes inputs sorted in ascending order with nulls last,
-          // if we reach a null key, we are done
-          break;
+          // if we reach a null key, we are done (except LEFT join, that needs to process LHS fully)
+          if (joinType != JoinType.LEFT) {
+            break;
+          }
         }

Review comment:
       Could we merge these two if conditions?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] eolivelli commented on pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#issuecomment-678433122


   @amaliujia
   We are using Enumerable Merge Join implementation so we will have to simply wire this new option.
   
   See
   
   https://github.com/diennea/herddb/blob/6a2b38093043bb20da729d84719fce18ad91f2d3/herddb-core/src/main/java/herddb/model/planner/JoinOp.java#L158


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#discussion_r474691265



##########
File path: core/src/test/resources/sql/sub-query.iq
##########
@@ -1761,9 +1761,10 @@ select sal from "scott".emp e
 
 !ok
 EnumerableCalc(expr#0..4=[{inputs}], expr#5=[NOT($t4)], expr#6=[IS NOT NULL($t4)], expr#7=[OR($t5, $t6)], expr#8=[IS NOT TRUE($t7)], SAL=[$t1], $condition=[$t8])
-  EnumerableHashJoin(condition=[=($2, $3)], joinType=[left])
-    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], SAL=[$t5], DEPTNO=[$t7])
-      EnumerableTableScan(table=[[scott, EMP]])
+  EnumerableMergeJoin(condition=[=($2, $3)], joinType=[left])

Review comment:
       I have looked in the debugger, and everything seems ok.
   It seems that "scott" tables (like `scott.emp `and `scott.dep` in this example) are already sorted by default by their keys (their Scan operator contain already the trait Collation[0]).
   Since Collation[0] is the required collation for the right input of this EnumerableMergeJoin, there is no need to use a sort operator on the right. However, since the left input requires Collation[2], an EnumerableSort is required on the left.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#discussion_r474857174



##########
File path: core/src/test/resources/sql/sub-query.iq
##########
@@ -1761,9 +1761,10 @@ select sal from "scott".emp e
 
 !ok
 EnumerableCalc(expr#0..4=[{inputs}], expr#5=[NOT($t4)], expr#6=[IS NOT NULL($t4)], expr#7=[OR($t5, $t6)], expr#8=[IS NOT TRUE($t7)], SAL=[$t1], $condition=[$t8])
-  EnumerableHashJoin(condition=[=($2, $3)], joinType=[left])
-    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], SAL=[$t5], DEPTNO=[$t7])
-      EnumerableTableScan(table=[[scott, EMP]])
+  EnumerableMergeJoin(condition=[=($2, $3)], joinType=[left])

Review comment:
       Thanks for the investigation. So no sort on the right side is an optimization of Calcite (thus intended behavior). If so it is no longer a concern. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#issuecomment-678562794


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#issuecomment-678426993


   The PR overall looks good to me. I will take a closer look on those changes of merge join Enumerable implementation. After that I can sign off this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#discussion_r474301415



##########
File path: core/src/test/resources/sql/sub-query.iq
##########
@@ -1761,9 +1761,10 @@ select sal from "scott".emp e
 
 !ok
 EnumerableCalc(expr#0..4=[{inputs}], expr#5=[NOT($t4)], expr#6=[IS NOT NULL($t4)], expr#7=[OR($t5, $t6)], expr#8=[IS NOT TRUE($t7)], SAL=[$t1], $condition=[$t8])
-  EnumerableHashJoin(condition=[=($2, $3)], joinType=[left])
-    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], SAL=[$t5], DEPTNO=[$t7])
-      EnumerableTableScan(table=[[scott, EMP]])
+  EnumerableMergeJoin(condition=[=($2, $3)], joinType=[left])

Review comment:
       Top-down optimization could fix this problem thought (which is not enabled by default). The top-down opt can enforce collations by inserting Sort.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#discussion_r474486590



##########
File path: core/src/test/resources/sql/blank.iq
##########
@@ -90,16 +90,20 @@ insert into table2 values (NULL, 1), (2, 1);
 !set lateDecorrelate true
 select i, j from table1 where table1.j NOT IN (select i from table2 where table1.i=table2.j);
 EnumerableCalc(expr#0..7=[{inputs}], expr#8=[0], expr#9=[=($t3, $t8)], expr#10=[IS NULL($t1)], expr#11=[IS NOT NULL($t7)], expr#12=[<($t4, $t3)], expr#13=[OR($t10, $t11, $t12)], expr#14=[IS NOT TRUE($t13)], expr#15=[OR($t9, $t14)], proj#0..1=[{exprs}], $condition=[$t15])
-  EnumerableHashJoin(condition=[AND(=($0, $6), =($1, $5))], joinType=[left])
-    EnumerableHashJoin(condition=[=($0, $2)], joinType=[left])
-      EnumerableTableScan(table=[[BLANK, TABLE1]])
-      EnumerableAggregate(group=[{1}], c=[COUNT()], ck=[COUNT($0)])
-        EnumerableCalc(expr#0..1=[{inputs}], expr#2=[IS NOT NULL($t1)], proj#0..1=[{exprs}], $condition=[$t2])
-          EnumerableTableScan(table=[[BLANK, TABLE2]])
-    EnumerableCalc(expr#0..1=[{inputs}], expr#2=[true], proj#0..2=[{exprs}])
-      EnumerableAggregate(group=[{0, 1}])
-        EnumerableCalc(expr#0..1=[{inputs}], expr#2=[IS NOT NULL($t1)], expr#3=[IS NOT NULL($t0)], expr#4=[AND($t2, $t3)], proj#0..1=[{exprs}], $condition=[$t4])
-          EnumerableTableScan(table=[[BLANK, TABLE2]])
+  EnumerableMergeJoin(condition=[AND(=($0, $6), =($1, $5))], joinType=[left])

Review comment:
       Yes, you are correct, that is the cost-based model decision (same thing happens already with INNER joins).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#discussion_r474299358



##########
File path: core/src/test/resources/sql/sub-query.iq
##########
@@ -1761,9 +1761,10 @@ select sal from "scott".emp e
 
 !ok
 EnumerableCalc(expr#0..4=[{inputs}], expr#5=[NOT($t4)], expr#6=[IS NOT NULL($t4)], expr#7=[OR($t5, $t6)], expr#8=[IS NOT TRUE($t7)], SAL=[$t1], $condition=[$t8])
-  EnumerableHashJoin(condition=[=($2, $3)], joinType=[left])
-    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], SAL=[$t5], DEPTNO=[$t7])
-      EnumerableTableScan(table=[[scott, EMP]])
+  EnumerableMergeJoin(condition=[=($2, $3)], joinType=[left])

Review comment:
       Agreed. In fact, I am actually thinking this is a case of wrong plan (more serious than a "concern"). Because for merge join implementation (not only Enumerables but other system's implementation), I think assuming input is sorted is reasonable. So for systems that only use Calcite as a planner, they might see surprising plans   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
rubenada commented on pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#issuecomment-677645841


   @hsyuan @amaliujia maybe you could help reviewing this one, please?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#discussion_r474299358



##########
File path: core/src/test/resources/sql/sub-query.iq
##########
@@ -1761,9 +1761,10 @@ select sal from "scott".emp e
 
 !ok
 EnumerableCalc(expr#0..4=[{inputs}], expr#5=[NOT($t4)], expr#6=[IS NOT NULL($t4)], expr#7=[OR($t5, $t6)], expr#8=[IS NOT TRUE($t7)], SAL=[$t1], $condition=[$t8])
-  EnumerableHashJoin(condition=[=($2, $3)], joinType=[left])
-    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], SAL=[$t5], DEPTNO=[$t7])
-      EnumerableTableScan(table=[[scott, EMP]])
+  EnumerableMergeJoin(condition=[=($2, $3)], joinType=[left])

Review comment:
       Agreed. In fact, I am actually thinking this is a case of wrong plan (more serious than a "concern"). Because for merge join implementation (not only Enumerables but other system's implementation), I think assuming sorted inputs is reasonable. So for systems that only use Calcite as a planner, they might see surprising plans   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada merged pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
rubenada merged pull request #2107:
URL: https://github.com/apache/calcite/pull/2107


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#issuecomment-677800301


   No problem. I will give a code review tomorrow (pretty busy today and also need to pick up the context of this JIRA).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#discussion_r474301415



##########
File path: core/src/test/resources/sql/sub-query.iq
##########
@@ -1761,9 +1761,10 @@ select sal from "scott".emp e
 
 !ok
 EnumerableCalc(expr#0..4=[{inputs}], expr#5=[NOT($t4)], expr#6=[IS NOT NULL($t4)], expr#7=[OR($t5, $t6)], expr#8=[IS NOT TRUE($t7)], SAL=[$t1], $condition=[$t8])
-  EnumerableHashJoin(condition=[=($2, $3)], joinType=[left])
-    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], SAL=[$t5], DEPTNO=[$t7])
-      EnumerableTableScan(table=[[scott, EMP]])
+  EnumerableMergeJoin(condition=[=($2, $3)], joinType=[left])

Review comment:
       Top-down optimization could fix this problem though (which is not enabled by default). The top-down opt can enforce collations by inserting Sort.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#discussion_r471963381



##########
File path: linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java
##########
@@ -4223,8 +4227,10 @@ private boolean advanceLeft(TSource left, TKey leftKey) {
         TKey leftKey2 = outerKeySelector.apply(left);
         if (leftKey2 == null) {
           // mergeJoin assumes inputs sorted in ascending order with nulls last,
-          // if we reach a null key, we are done
-          break;
+          // if we reach a null key, we are done (except LEFT join, that needs to process LHS fully)
+          if (joinType != JoinType.LEFT) {
+            break;
+          }
         }

Review comment:
       well spotted, thanks.
   done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#issuecomment-678431304


   @eolivelli 
   
   Out of curiosity:  Can you share what kind of impact it could have on HerdDB?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#discussion_r471985036



##########
File path: core/src/test/resources/sql/sub-query.iq
##########
@@ -1761,9 +1761,10 @@ select sal from "scott".emp e
 
 !ok
 EnumerableCalc(expr#0..4=[{inputs}], expr#5=[NOT($t4)], expr#6=[IS NOT NULL($t4)], expr#7=[OR($t5, $t6)], expr#8=[IS NOT TRUE($t7)], SAL=[$t1], $condition=[$t8])
-  EnumerableHashJoin(condition=[=($2, $3)], joinType=[left])
-    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], SAL=[$t5], DEPTNO=[$t7])
-      EnumerableTableScan(table=[[scott, EMP]])
+  EnumerableMergeJoin(condition=[=($2, $3)], joinType=[left])

Review comment:
       I am bit concerned about some of the "new" plans in `sub-query.iq`, which introduce `EnumerableMergeJoin` (with `joinType=left`), but only one of the join inputs is sorted with an `EnumerableSort` (in this example the left input). The other input (in this case, the right) does not contain the Sort operator, so this could be a problem.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2107:
URL: https://github.com/apache/calcite/pull/2107#discussion_r474304854



##########
File path: core/src/test/resources/sql/blank.iq
##########
@@ -90,16 +90,20 @@ insert into table2 values (NULL, 1), (2, 1);
 !set lateDecorrelate true
 select i, j from table1 where table1.j NOT IN (select i from table2 where table1.i=table2.j);
 EnumerableCalc(expr#0..7=[{inputs}], expr#8=[0], expr#9=[=($t3, $t8)], expr#10=[IS NULL($t1)], expr#11=[IS NOT NULL($t7)], expr#12=[<($t4, $t3)], expr#13=[OR($t10, $t11, $t12)], expr#14=[IS NOT TRUE($t13)], expr#15=[OR($t9, $t14)], proj#0..1=[{exprs}], $condition=[$t15])
-  EnumerableHashJoin(condition=[AND(=($0, $6), =($1, $5))], joinType=[left])
-    EnumerableHashJoin(condition=[=($0, $2)], joinType=[left])
-      EnumerableTableScan(table=[[BLANK, TABLE1]])
-      EnumerableAggregate(group=[{1}], c=[COUNT()], ck=[COUNT($0)])
-        EnumerableCalc(expr#0..1=[{inputs}], expr#2=[IS NOT NULL($t1)], proj#0..1=[{exprs}], $condition=[$t2])
-          EnumerableTableScan(table=[[BLANK, TABLE2]])
-    EnumerableCalc(expr#0..1=[{inputs}], expr#2=[true], proj#0..2=[{exprs}])
-      EnumerableAggregate(group=[{0, 1}])
-        EnumerableCalc(expr#0..1=[{inputs}], expr#2=[IS NOT NULL($t1)], expr#3=[IS NOT NULL($t0)], expr#4=[AND($t2, $t3)], proj#0..1=[{exprs}], $condition=[$t4])
-          EnumerableTableScan(table=[[BLANK, TABLE2]])
+  EnumerableMergeJoin(condition=[AND(=($0, $6), =($1, $5))], joinType=[left])

Review comment:
       Just an observation, looks like in many cases, Calcite believes merge join are better than hash join. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada closed pull request #2107: [CALCITE-4113] Support LEFT join in EnumerableMergeJoin

Posted by GitBox <gi...@apache.org>.
rubenada closed pull request #2107:
URL: https://github.com/apache/calcite/pull/2107


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org