You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@calcite.apache.org by GitBox <gi...@apache.org> on 2020/05/31 09:19:34 UTC

[GitHub] [calcite] amaliujia opened a new pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

amaliujia opened a new pull request #1995:
URL: https://github.com/apache/calcite/pull/1995


   … EnumerableNestedLoopJoin


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#issuecomment-639895051


   @hsyuan 
   Thanks for your suggestion. Added traits derivation and properly tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r432927229



##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -648,4 +648,80 @@ EnumerableMergeJoin(condition=[AND(=($1, $5), =($0, $4))], joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testHashJoinInnerJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinFullOuterJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinLeftOuterJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[select * from
+        sales.emp r left outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+        order by r.job desc nulls last, r.ename nulls first]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+  EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])

Review comment:
       @hsyuan 
   
   I want to see this plan (sort on left join input for this left outer join), however its cost is too high(`{67.94680261461362 rows, 1346.0848941260901 cpu, 0.0 io}`)
   
   So Calcite always chooses 
   ```
   EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
     EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
       EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
       EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
   ```
   whose cost is `{54.94680261461362 rows, 68.0 cpu, 0.0 io}`.
   
   It could be make sense if EnumerableHashJoin reduces number of tuples thus Sort at the end could have less cost to sort joined result.
   
   
   Do you have a suggestion how to deterministically test trait pass down for EnumerableHashJoin?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r435005777



##########
File path: core/src/test/java/org/apache/calcite/test/TopDownOptTest.java
##########
@@ -288,6 +290,44 @@
         .removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE)
         .check();
   }
+
+  // Not push down sort for hash join in inner join case.
+  @Test void testHashJoinInnerJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Not push down sort for hash join in full outer join case.
+  @Test void testHashJoinFullOuterJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Push down sort to left input.
+  @Test void testHashJoinLeftOuterJoinPushDownSort() {
+    final String sql = "select * from\n"
+        + "(select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp) r left outer join\n"
+        + "(select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus) s\n"
+        + "on r.ename=s.ename\n"
+        + "order by r.ename nulls first";

Review comment:
       Problem is solved. I noticed that there are several large tables in `customers` schema. Those large tables can generate plans I want.And if there are future needs, we can always mock new tables for testing purpose.
   
   I pushed a new commit to illustrate which table I am using.
   
   With good input data, I will continue implementation work.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r434049773



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {

Review comment:
       Thanks @rubenada for the reminder. It's surprising though. Let me check the implementation of EnumerableHashJoin to see how it make right to build hash table on right input for **RIGHT OUTER JOIN**.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r433701229



##########
File path: core/src/test/java/org/apache/calcite/test/TopDownOptTest.java
##########
@@ -288,6 +290,44 @@
         .removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE)
         .check();
   }
+
+  // Not push down sort for hash join in inner join case.
+  @Test void testHashJoinInnerJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Not push down sort for hash join in full outer join case.
+  @Test void testHashJoinFullOuterJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Push down sort to left input.
+  @Test void testHashJoinLeftOuterJoinPushDownSort() {
+    final String sql = "select * from\n"
+        + "(select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp) r left outer join\n"
+        + "(select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus) s\n"
+        + "on r.ename=s.ename\n"
+        + "order by r.ename nulls first";

Review comment:
       @hsyuan 
   Finally I find a way to "fool" planner to choose the plan I want, basically the plan that pushes sort down to left child of hash join.
   
   The reason that simple join's cheapest plan has a Sort on top of HashJoin, is because the Join's estimated is computed as simple as `left_row_count * right_row_count * 0.15`, 0.15 is selectively. When inputs's row count is small, the selectively plays a bigger role, thus HashJoin gives a small cost, thus Sort is put after Join because Join gives a very small row count estimation.
   
   So I intentionally use UNION to increase input size, which make the  ``left_row_count * right_row_count` plays a bigger role for Join, and then Join produces larger row count estimation. Then finally planner believes push down Sort is cheaper, thus gives the plan I want to see. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#issuecomment-639279517


   @hsyuan 
   
   I have push commits for collation pushdown for both EnumerableHashJoin and EnumerableNestedLoopJoin.
   
   Regarding traits derivation, I gave some thoughts on it and now I think it is not useful. It is because for both EnumerableHashJoin and EnumerableNestedLoopJoin, they can perserve ordering from left join input. Based on examples of MergeJoin, seems like the goal of trait derivation was to enforce a sort on right input, which does not work for hashjoin and nested loop join in this case.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia edited a comment on pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia edited a comment on pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#issuecomment-640872951






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r432927229



##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -648,4 +648,80 @@ EnumerableMergeJoin(condition=[AND(=($1, $5), =($0, $4))], joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testHashJoinInnerJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinFullOuterJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinLeftOuterJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[select * from
+        sales.emp r left outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+        order by r.job desc nulls last, r.ename nulls first]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+  EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])

Review comment:
       @hsyuan 
   
   I want to see this plan (sort on left join input for this left outer join), however its cost is too high(`{67.94680261461362 rows, 1346.0848941260901 cpu, 0.0 io}`)
   
   So Calcite always chooses 
   EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
     EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
       EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
       EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
   
   whose cost is `{54.94680261461362 rows, 68.0 cpu, 0.0 io}`.
   
   It could be make sense if EnumerableHashJoin reduces number of tuples thus Sort at the end could have less cost to sort joined result.
   
   
   Do you have a suggestion how to deterministically test trait pass down for EnumerableHashJoin?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r436290293



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableNestedLoopJoin.java
##########
@@ -118,6 +129,104 @@ public static EnumerableNestedLoopJoin create(
     return cost;
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    // EnumerableNestedLoopJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableNestedLoopJoin always uses left input as the outer loop,
+    // thus only left input can preserve ordering.
+    // Push sort both to left and right inputs does not help right outer join. It's because in
+    // implementation, EnumerableNestedLoopJoin produces (null, right_unmatched) all together,
+    // which does not preserve ordering from right side.
+
+
+    Set<Integer> fields = extractTrivialFieldOfLeftInputFromJoinCondition();
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      if (!fields.contains(fc.getFieldIndex())) {
+        return null;
+      }
+    }
+
+    RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+    return Pair.of(passthroughTraitSet,
+        ImmutableList.of(
+            passthroughTraitSet,
+            passthroughTraitSet.replace(RelCollations.EMPTY)));
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    assert childId == 0;
+
+    RelCollation collation = childTraits.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    Set<Integer> fields = extractTrivialFieldOfLeftInputFromJoinCondition();
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      if (!fields.contains(fc.getFieldIndex())) {
+        return null;
+      }
+    }
+
+    List<RelFieldCollation> collationFieldsToDerive = new ArrayList<>();
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      if (fields.contains(fc.getFieldIndex())) {
+        collationFieldsToDerive.add(fc);
+      } else {
+        break;
+      }
+    }
+
+    if (collationFieldsToDerive.size() > 0) {
+      final RelCollation newCollation = RelCollations
+          .of(collationFieldsToDerive);
+      return Pair.of(getTraitSet().replace(newCollation),
+          ImmutableList.of(childTraits, right.getTraitSet()));
+    } else {
+      return null;
+    }
+  }
+
+  @Override public DeriveMode getDeriveMode() {
+    if (joinType == JoinRelType.FULL || joinType == JoinRelType.RIGHT) {
+      return DeriveMode.PROHIBITED;
+    }
+
+    return DeriveMode.LEFT_FIRST;
+  }
+
+  private Set<Integer> extractTrivialFieldOfLeftInputFromJoinCondition() {

Review comment:
       Correct.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r432953832



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {

Review comment:
       On the other hand, it shows exactly that trait propagation should only be done on physical operators, because it is physical implementation dependent.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r436306805



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableNestedLoopJoin.java
##########
@@ -118,6 +124,64 @@ public static EnumerableNestedLoopJoin create(
     return cost;
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    // EnumerableNestedLoopJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableNestedLoopJoin always uses left input as the outer loop,
+    // thus only left input can preserve ordering.
+    // Push sort both to left and right inputs does not help right outer join. It's because in
+    // implementation, EnumerableNestedLoopJoin produces (null, right_unmatched) all together,
+    // which does not preserve ordering from right side.
+
+
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      // If field collation belongs to right input: cannot push down collation.
+      if (fc.getFieldIndex() >= getLeft().getRowType().getFieldCount()) {
+        return null;
+      }
+    }
+
+    RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+    return Pair.of(passthroughTraitSet,
+        ImmutableList.of(
+            passthroughTraitSet,
+            passthroughTraitSet.replace(RelCollations.EMPTY)));
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    assert childId == 0;
+
+    RelCollation collation = childTraits.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    return Pair.of(
+        getTraitSet().replace(collation),
+        ImmutableList.of(childTraits, right.getTraitSet()));

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r434983920



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {
+      // Only consider exact key match for now
+      if (requiredKeySet.equals(leftKeySet)) {
+        return Pair.of(
+            required,
+            ImmutableList.of(
+                required,
+                required.replace(RelCollations.EMPTY)));
+      }
+    } else if (joinType == JoinRelType.RIGHT) {
+      // Only consider exact key match for now
+      if (requiredKeySet.equals(rightKeySet)) {
+        RelCollation rightCollation = RelCollations.shift(collation,
+            -left.getRowType().getFieldCount());
+        return Pair.of(
+            required,
+            ImmutableList.of(
+                required.replace(RelCollations.EMPTY),
+                required.replace(rightCollation)));
+      }
+    }
+
+    return null;
+  }
+
+//  @Override public DeriveMode getDeriveMode() {
+//    return DeriveMode.BOTH;

Review comment:
       Sounds good for full outer join (will use `DeriveMode.PROHIBITED` to dis-allow it)
   
   Still understanding why RIGHT OUTER JOIN should be forbidon (why EnumerableHashJoin always builds hash table on right input).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r436290152



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableNestedLoopJoin.java
##########
@@ -118,6 +129,104 @@ public static EnumerableNestedLoopJoin create(
     return cost;
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    // EnumerableNestedLoopJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableNestedLoopJoin always uses left input as the outer loop,
+    // thus only left input can preserve ordering.
+    // Push sort both to left and right inputs does not help right outer join. It's because in
+    // implementation, EnumerableNestedLoopJoin produces (null, right_unmatched) all together,
+    // which does not preserve ordering from right side.
+
+
+    Set<Integer> fields = extractTrivialFieldOfLeftInputFromJoinCondition();
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      if (!fields.contains(fc.getFieldIndex())) {
+        return null;
+      }
+    }
+
+    RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+    return Pair.of(passthroughTraitSet,
+        ImmutableList.of(
+            passthroughTraitSet,
+            passthroughTraitSet.replace(RelCollations.EMPTY)));
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    assert childId == 0;
+
+    RelCollation collation = childTraits.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    Set<Integer> fields = extractTrivialFieldOfLeftInputFromJoinCondition();
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      if (!fields.contains(fc.getFieldIndex())) {
+        return null;
+      }
+    }
+
+    List<RelFieldCollation> collationFieldsToDerive = new ArrayList<>();
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      if (fields.contains(fc.getFieldIndex())) {
+        collationFieldsToDerive.add(fc);
+      } else {
+        break;
+      }
+    }
+
+    if (collationFieldsToDerive.size() > 0) {
+      final RelCollation newCollation = RelCollations
+          .of(collationFieldsToDerive);
+      return Pair.of(getTraitSet().replace(newCollation),
+          ImmutableList.of(childTraits, right.getTraitSet()));
+    } else {
+      return null;
+    }
+  }
+
+  @Override public DeriveMode getDeriveMode() {
+    if (joinType == JoinRelType.FULL || joinType == JoinRelType.RIGHT) {
+      return DeriveMode.PROHIBITED;
+    }
+
+    return DeriveMode.LEFT_FIRST;
+  }
+
+  private Set<Integer> extractTrivialFieldOfLeftInputFromJoinCondition() {

Review comment:
       In fact, I should also change NestedLoopJoin because it also does not require to match with join keys.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#issuecomment-639850789


   How about this:
   ```sql
   select * from (select a,b,c from foo oder by a desc, b desc limit 10) foo
   join bar on foo.a = bar.a
   order by foo.a desc, foo.b desc;
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r436240823



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,81 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {
+      RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+      return Pair.of(passthroughTraitSet,
+          ImmutableList.of(
+              passthroughTraitSet,
+              passthroughTraitSet.replace(RelCollations.EMPTY)));
+    }
+
+    // TODO: consider super keyset and sub keyset
+    return null;
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    if (childId > 0) {

Review comment:
       Ah yes. I think it is because this code block
   
   ```
           if (mode == DeriveMode.LEFT_FIRST
               || mode == DeriveMode.RIGHT_FIRST) {
             break;
           }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#issuecomment-640761358






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r436302600



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableNestedLoopJoin.java
##########
@@ -118,6 +124,64 @@ public static EnumerableNestedLoopJoin create(
     return cost;
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    // EnumerableNestedLoopJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableNestedLoopJoin always uses left input as the outer loop,
+    // thus only left input can preserve ordering.
+    // Push sort both to left and right inputs does not help right outer join. It's because in
+    // implementation, EnumerableNestedLoopJoin produces (null, right_unmatched) all together,
+    // which does not preserve ordering from right side.
+
+
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      // If field collation belongs to right input: cannot push down collation.
+      if (fc.getFieldIndex() >= getLeft().getRowType().getFieldCount()) {
+        return null;
+      }
+    }
+
+    RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+    return Pair.of(passthroughTraitSet,
+        ImmutableList.of(
+            passthroughTraitSet,
+            passthroughTraitSet.replace(RelCollations.EMPTY)));
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    assert childId == 0;
+
+    RelCollation collation = childTraits.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    return Pair.of(
+        getTraitSet().replace(collation),
+        ImmutableList.of(childTraits, right.getTraitSet()));

Review comment:
       We also need to replace childTraits with enumerable convention. also hashjoin.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r436387179



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableNestedLoopJoin.java
##########
@@ -118,6 +124,65 @@ public static EnumerableNestedLoopJoin create(
     return cost;
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    // EnumerableNestedLoopJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableNestedLoopJoin always uses left input as the outer loop,
+    // thus only left input can preserve ordering.
+    // Push sort both to left and right inputs does not help right outer join. It's because in
+    // implementation, EnumerableNestedLoopJoin produces (null, right_unmatched) all together,
+    // which does not preserve ordering from right side.
+
+
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      // If field collation belongs to right input: cannot push down collation.
+      if (fc.getFieldIndex() >= getLeft().getRowType().getFieldCount()) {
+        return null;
+      }
+    }
+
+    RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+    return Pair.of(passthroughTraitSet,
+        ImmutableList.of(
+            passthroughTraitSet,
+            passthroughTraitSet.replace(RelCollations.EMPTY)));
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    assert childId == 0;
+
+    RelCollation collation = childTraits.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {

Review comment:
       Ah I see. The `getDeriveMode` method is used for traits derivation.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r436215966



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,81 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {

Review comment:
       hash join / nested loopjoin are different with MergeJoin. The required collation keys can be any keys other than join keys. 
   ```sql
   select * from foo join bar using (a) order by foo.b;
   ```
   We can still pass down the collation request to HashJoin's left child, because hashjoin doesn't require any collation from its left child.
   But for mergejoin, we can't pass down. because the join's parent requires collation on `b`, but the mergejoin requires its children to be sorted on `a`.

##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,45 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {
+      RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+      return Pair.of(passthroughTraitSet,
+          ImmutableList.of(
+              passthroughTraitSet,
+              passthroughTraitSet.replace(RelCollations.EMPTY)));
+    }
+
+    // TODO: consider super keyset and sub keyset

Review comment:
       so this comment should be removed.

##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,81 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {
+      RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+      return Pair.of(passthroughTraitSet,
+          ImmutableList.of(
+              passthroughTraitSet,
+              passthroughTraitSet.replace(RelCollations.EMPTY)));
+    }
+
+    // TODO: consider super keyset and sub keyset
+    return null;
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    if (childId > 0) {
+      return null;
+    }
+
+    RelCollation collation = childTraits.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    final int colCount = collation.getFieldCollations().size();
+    final int keyCount = joinInfo.leftKeys.size();
+    if (colCount < keyCount || keyCount == 0) {
+      return null;
+    }

Review comment:
       this is not needed. we can derive any collation from its left child. no matter what are the join keys.

##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,81 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {
+      RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+      return Pair.of(passthroughTraitSet,
+          ImmutableList.of(
+              passthroughTraitSet,
+              passthroughTraitSet.replace(RelCollations.EMPTY)));
+    }
+
+    // TODO: consider super keyset and sub keyset
+    return null;
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    if (childId > 0) {
+      return null;
+    }
+
+    RelCollation collation = childTraits.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    final int colCount = collation.getFieldCollations().size();
+    final int keyCount = joinInfo.leftKeys.size();
+    if (colCount < keyCount || keyCount == 0) {
+      return null;
+    }
+
+    if (colCount > keyCount) {
+      collation = RelCollations.of(collation.getFieldCollations().subList(0, keyCount));
+    }

Review comment:
       same here.

##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,81 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {
+      RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+      return Pair.of(passthroughTraitSet,
+          ImmutableList.of(
+              passthroughTraitSet,
+              passthroughTraitSet.replace(RelCollations.EMPTY)));
+    }
+
+    // TODO: consider super keyset and sub keyset
+    return null;
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    if (childId > 0) {

Review comment:
       should be `assert childId == 0`, it is illegal to be greater than 0




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r435711254



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,45 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {
+      RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+      return Pair.of(passthroughTraitSet,
+          ImmutableList.of(
+              passthroughTraitSet,
+              passthroughTraitSet.replace(RelCollations.EMPTY)));
+    }
+
+    // TODO: consider super keyset and sub keyset

Review comment:
       How about this query in which it uses HashJoin:
   ```
   select * from
   (select * from) r join
   (select * from) s 
   on r.a = s.b and r.b = s.b
   order by r.a desc
   ```
   
   I think we can still pass down collation `r.a desc` to left input?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r435005777



##########
File path: core/src/test/java/org/apache/calcite/test/TopDownOptTest.java
##########
@@ -288,6 +290,44 @@
         .removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE)
         .check();
   }
+
+  // Not push down sort for hash join in inner join case.
+  @Test void testHashJoinInnerJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Not push down sort for hash join in full outer join case.
+  @Test void testHashJoinFullOuterJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Push down sort to left input.
+  @Test void testHashJoinLeftOuterJoinPushDownSort() {
+    final String sql = "select * from\n"
+        + "(select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp) r left outer join\n"
+        + "(select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus) s\n"
+        + "on r.ename=s.ename\n"
+        + "order by r.ename nulls first";

Review comment:
       Problem is solved. I noticed that there are several large tables in `customers` schema. Those large tables can generate plans I want.And if there are future needs, we can always mock new tables for optimization testing.
   
   I pushed a new commit to illustrate which table I am using.
   
   With good input data, I will continue implementation work.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r432958466



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {
+      // Only consider exact key match for now
+      if (requiredKeySet.equals(leftKeySet)) {
+        return Pair.of(
+            required,

Review comment:
       should not use required directly, because required may have different convention. Itself and children's traits should all have EnumerableConvention. See how EnumerableFilter does.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#issuecomment-639283407


   Ok then. I can go ahead to support trait derivation.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r435711527



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,45 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {
+      RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+      return Pair.of(passthroughTraitSet,
+          ImmutableList.of(
+              passthroughTraitSet,
+              passthroughTraitSet.replace(RelCollations.EMPTY)));
+    }
+
+    // TODO: consider super keyset and sub keyset
+    return null;
+  }
+
+  @Override public DeriveMode getDeriveMode() {
+    if (joinType == JoinRelType.FULL || joinType == JoinRelType.RIGHT) {
+      return DeriveMode.PROHIBITED;
+    }
+
+    return DeriveMode.BOTH;

Review comment:
       Unless you want to support distribution, otherwise, `LEFT_FIRST` should be the only choice.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r436379664



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,57 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      // If field collation belongs to right input: cannot push down collation.
+      if (fc.getFieldIndex() >= getLeft().getRowType().getFieldCount()) {
+        return null;
+      }
+    }
+
+    RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+    return Pair.of(passthroughTraitSet,
+        ImmutableList.of(
+            passthroughTraitSet,
+            passthroughTraitSet.replace(RelCollations.EMPTY)));
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    assert childId == 0;
+
+    RelCollation collation = childTraits.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {

Review comment:
       and here

##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -648,4 +648,472 @@ EnumerableMergeJoin(condition=[AND(=($1, $5), =($0, $4))], joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testHashJoinFullOuterJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinLeftOuterJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+select * from
+(select contactno, email from customer.contact_peek) r left outer join
+(select acctno, type from customer.account) s
+on r.contactno=s.acctno and r.email=s.type
+order by r.contactno desc, r.email desc
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$0], sort1=[$1], dir0=[DESC], dir1=[DESC])
+  LogicalProject(CONTACTNO=[$0], EMAIL=[$1], ACCTNO=[$2], TYPE=[$3])
+    LogicalJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[left])
+      LogicalProject(CONTACTNO=[$0], EMAIL=[$3])
+        LogicalTableScan(table=[[CATALOG, CUSTOMER, CONTACT_PEEK]])
+      LogicalProject(ACCTNO=[$0], TYPE=[$1])
+        LogicalTableScan(table=[[CATALOG, CUSTOMER, ACCOUNT]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[left])
+  EnumerableProject(CONTACTNO=[$0], EMAIL=[$3])
+    EnumerableSort(sort0=[$0], sort1=[$3], dir0=[DESC], dir1=[DESC])
+      EnumerableTableScan(table=[[CATALOG, CUSTOMER, CONTACT_PEEK]])
+  EnumerableProject(ACCTNO=[$0], TYPE=[$1])
+    EnumerableTableScan(table=[[CATALOG, CUSTOMER, ACCOUNT]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinLeftOuterJoinPushDownSort2">
+        <Resource name="sql">
+            <![CDATA[
+select * from
+customer.contact_peek r left outer join customer.account s
+on r.contactno=s.acctno and r.email=s.type
+order by r.fname desc
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalProject(CONTACTNO=[$0], FNAME=[$1], LNAME=[$2], EMAIL=[$3], X=[$4], Y=[$5], unit=[$6], COORD_NE=[ROW($7, ROW($8, $9))], ACCTNO=[$10], TYPE=[$11], BALANCE=[$12])
+  LogicalSort(sort0=[$1], dir0=[DESC])
+    LogicalProject(CONTACTNO=[$0], FNAME=[$1], LNAME=[$2], EMAIL=[$3], X=[$4], Y=[$5], unit=[$6], COORD_NE=[$7], COORD_NE8=[$8], COORD_NE9=[$9], ACCTNO=[$10], TYPE=[$11], BALANCE=[$12])
+      LogicalJoin(condition=[AND(=($0, $10), =($3, $11))], joinType=[left])
+        LogicalProject(CONTACTNO=[$0], FNAME=[$1], LNAME=[$2], EMAIL=[$3], X=[$4.X], Y=[$4.Y], unit=[$4.unit], M=[$5.M], A=[$5.SUB.A], B=[$5.SUB.B])
+          LogicalTableScan(table=[[CATALOG, CUSTOMER, CONTACT_PEEK]])
+        LogicalTableScan(table=[[CATALOG, CUSTOMER, ACCOUNT]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableProject(CONTACTNO=[$0], FNAME=[$1], LNAME=[$2], EMAIL=[$3], X=[$4], Y=[$5], unit=[$6], COORD_NE=[ROW($7, ROW($8, $9))], ACCTNO=[$10], TYPE=[$11], BALANCE=[$12])
+  EnumerableHashJoin(condition=[AND(=($0, $10), =($3, $11))], joinType=[left])
+    EnumerableProject(CONTACTNO=[$0], FNAME=[$1], LNAME=[$2], EMAIL=[$3], X=[$4.X], Y=[$4.Y], unit=[$4.unit], M=[$5.M], A=[$5.SUB.A], B=[$5.SUB.B])
+      EnumerableSort(sort0=[$1], dir0=[DESC])
+        EnumerableTableScan(table=[[CATALOG, CUSTOMER, CONTACT_PEEK]])
+    EnumerableTableScan(table=[[CATALOG, CUSTOMER, ACCOUNT]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinInnerJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+select * from
+(select contactno, email from customer.contact_peek) r inner join
+(select acctno, type from customer.account) s
+on r.contactno=s.acctno and r.email=s.type
+order by r.contactno desc, r.email desc
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$0], sort1=[$1], dir0=[DESC], dir1=[DESC])
+  LogicalProject(CONTACTNO=[$0], EMAIL=[$1], ACCTNO=[$2], TYPE=[$3])
+    LogicalJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
+      LogicalProject(CONTACTNO=[$0], EMAIL=[$3])
+        LogicalTableScan(table=[[CATALOG, CUSTOMER, CONTACT_PEEK]])
+      LogicalProject(ACCTNO=[$0], TYPE=[$1])
+        LogicalTableScan(table=[[CATALOG, CUSTOMER, ACCOUNT]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
+  EnumerableProject(CONTACTNO=[$0], EMAIL=[$3])
+    EnumerableSort(sort0=[$0], sort1=[$3], dir0=[DESC], dir1=[DESC])
+      EnumerableTableScan(table=[[CATALOG, CUSTOMER, CONTACT_PEEK]])
+  EnumerableProject(ACCTNO=[$0], TYPE=[$1])
+    EnumerableTableScan(table=[[CATALOG, CUSTOMER, ACCOUNT]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinRightOuterJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+select * from
+(select contactno, email from customer.contact_peek) r right outer join
+(select acctno, type from customer.account) s
+on r.contactno=s.acctno and r.email=s.type
+order by s.acctno desc, s.type desc
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$3], dir0=[DESC], dir1=[DESC])
+  LogicalProject(CONTACTNO=[$0], EMAIL=[$1], ACCTNO=[$2], TYPE=[$3])
+    LogicalJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[right])
+      LogicalProject(CONTACTNO=[$0], EMAIL=[$3])
+        LogicalTableScan(table=[[CATALOG, CUSTOMER, CONTACT_PEEK]])
+      LogicalProject(ACCTNO=[$0], TYPE=[$1])
+        LogicalTableScan(table=[[CATALOG, CUSTOMER, ACCOUNT]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$3], dir0=[DESC], dir1=[DESC])
+  EnumerableHashJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[right])
+    EnumerableProject(CONTACTNO=[$0], EMAIL=[$3])
+      EnumerableTableScan(table=[[CATALOG, CUSTOMER, CONTACT_PEEK]])
+    EnumerableProject(ACCTNO=[$0], TYPE=[$1])
+      EnumerableTableScan(table=[[CATALOG, CUSTOMER, ACCOUNT]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinTraitDerivation">
+        <Resource name="sql">
+            <![CDATA[
+select * from
+(select ename, job, mgr from sales.emp order by ename desc, job desc, mgr limit 10) r
+join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.ename desc, r.job desc
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$0], sort1=[$1], dir0=[DESC], dir1=[DESC])
+  LogicalProject(ENAME=[$0], JOB=[$1], MGR=[$2], ENAME0=[$3], JOB0=[$4], SAL=[$5], COMM=[$6])
+    LogicalJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[inner])
+      LogicalSort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[DESC], dir1=[DESC], dir2=[ASC], fetch=[10])
+        LogicalProject(ENAME=[$1], JOB=[$2], MGR=[$3])
+          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[inner])
+  EnumerableLimit(fetch=[10])
+    EnumerableProject(ENAME=[$1], JOB=[$2], MGR=[$3])
+      EnumerableSort(sort0=[$1], sort1=[$2], sort2=[$3], dir0=[DESC], dir1=[DESC], dir2=[ASC])
+        EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+  EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinTraitDerivation2">
+        <Resource name="sql">
+            <![CDATA[
+select * from
+(select ename, job, mgr from sales.emp order by mgr desc limit 10) r
+join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.mgr desc
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], dir0=[DESC])
+  LogicalProject(ENAME=[$0], JOB=[$1], MGR=[$2], ENAME0=[$3], JOB0=[$4], SAL=[$5], COMM=[$6])
+    LogicalJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[inner])
+      LogicalSort(sort0=[$2], dir0=[DESC], fetch=[10])
+        LogicalProject(ENAME=[$1], JOB=[$2], MGR=[$3])
+          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[inner])
+  EnumerableLimit(fetch=[10])
+    EnumerableProject(ENAME=[$1], JOB=[$2], MGR=[$3])
+      EnumerableSort(sort0=[$3], dir0=[DESC])
+        EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+  EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinTraitDerivationNegativeCase">
+        <Resource name="sql">
+            <![CDATA[
+select * from
+(select ename, job, mgr from sales.emp order by mgr desc limit 10) r
+join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.mgr
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], dir0=[ASC])
+  LogicalProject(ENAME=[$0], JOB=[$1], MGR=[$2], ENAME0=[$3], JOB0=[$4], SAL=[$5], COMM=[$6])
+    LogicalJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[inner])
+      LogicalSort(sort0=[$2], dir0=[DESC], fetch=[10])
+        LogicalProject(ENAME=[$1], JOB=[$2], MGR=[$3])
+          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], dir0=[ASC])
+  EnumerableHashJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[inner])
+    EnumerableLimit(fetch=[10])
+      EnumerableProject(ENAME=[$1], JOB=[$2], MGR=[$3])
+        EnumerableSort(sort0=[$3], dir0=[DESC])
+          EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testNestedLoopJoinTraitDerivation">
+        <Resource name="sql">
+            <![CDATA[
+select * from
+(select ename, job, mgr from sales.emp order by ename desc, job desc, mgr limit 10) r
+join sales.bonus s on r.ename>s.ename and r.job<s.job
+order by r.ename desc, r.job desc
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$0], sort1=[$1], dir0=[DESC], dir1=[DESC])
+  LogicalProject(ENAME=[$0], JOB=[$1], MGR=[$2], ENAME0=[$3], JOB0=[$4], SAL=[$5], COMM=[$6])
+    LogicalJoin(condition=[AND(>($0, $3), <($1, $4))], joinType=[inner])
+      LogicalSort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[DESC], dir1=[DESC], dir2=[ASC], fetch=[10])
+        LogicalProject(ENAME=[$1], JOB=[$2], MGR=[$3])
+          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableNestedLoopJoin(condition=[AND(>($0, $3), <($1, $4))], joinType=[inner])
+  EnumerableLimit(fetch=[10])
+    EnumerableProject(ENAME=[$1], JOB=[$2], MGR=[$3])
+      EnumerableSort(sort0=[$1], sort1=[$2], sort2=[$3], dir0=[DESC], dir1=[DESC], dir2=[ASC])
+        EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+  EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testNestedLoopJoinTraitDerivation2">
+        <Resource name="sql">
+            <![CDATA[
+select * from
+(select ename, job, mgr from sales.emp order by mgr limit 10) r
+join sales.bonus s on r.ename>s.ename and r.job<s.job
+order by r.mgr
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], dir0=[ASC])
+  LogicalProject(ENAME=[$0], JOB=[$1], MGR=[$2], ENAME0=[$3], JOB0=[$4], SAL=[$5], COMM=[$6])
+    LogicalJoin(condition=[AND(>($0, $3), <($1, $4))], joinType=[inner])
+      LogicalSort(sort0=[$2], dir0=[ASC], fetch=[10])
+        LogicalProject(ENAME=[$1], JOB=[$2], MGR=[$3])
+          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableNestedLoopJoin(condition=[AND(>($0, $3), <($1, $4))], joinType=[inner])
+  EnumerableLimit(fetch=[10])
+    EnumerableProject(ENAME=[$1], JOB=[$2], MGR=[$3])
+      EnumerableSort(sort0=[$3], dir0=[ASC])
+        EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+  EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testNestedLoopJoinTraitDerivationNegativeCase">
+        <Resource name="sql">
+            <![CDATA[
+select * from
+(select ename, job, mgr from sales.emp order by mgr limit 10) r
+join sales.bonus s on r.ename>s.ename and r.job<s.job
+order by r.mgr desc
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], dir0=[DESC])
+  LogicalProject(ENAME=[$0], JOB=[$1], MGR=[$2], ENAME0=[$3], JOB0=[$4], SAL=[$5], COMM=[$6])
+    LogicalJoin(condition=[AND(>($0, $3), <($1, $4))], joinType=[inner])
+      LogicalSort(sort0=[$2], dir0=[ASC], fetch=[10])
+        LogicalProject(ENAME=[$1], JOB=[$2], MGR=[$3])
+          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], dir0=[DESC])
+  EnumerableNestedLoopJoin(condition=[AND(>($0, $3), <($1, $4))], joinType=[inner])
+    EnumerableLimit(fetch=[10])
+      EnumerableProject(ENAME=[$1], JOB=[$2], MGR=[$3])
+        EnumerableSort(sort0=[$3], dir0=[ASC])
+          EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testNestedLoopJoinLeftOuterJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+select * from
+customer.contact_peek r left outer join
+customer.account s
+on r.contactno>s.acctno and r.email<s.type
+order by r.contactno desc, r.email desc
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalProject(CONTACTNO=[$0], FNAME=[$1], LNAME=[$2], EMAIL=[$3], X=[$4], Y=[$5], unit=[$6], COORD_NE=[ROW($7, ROW($8, $9))], ACCTNO=[$10], TYPE=[$11], BALANCE=[$12])
+  LogicalSort(sort0=[$0], sort1=[$3], dir0=[DESC], dir1=[DESC])
+    LogicalProject(CONTACTNO=[$0], FNAME=[$1], LNAME=[$2], EMAIL=[$3], X=[$4], Y=[$5], unit=[$6], COORD_NE=[$7], COORD_NE8=[$8], COORD_NE9=[$9], ACCTNO=[$10], TYPE=[$11], BALANCE=[$12])
+      LogicalJoin(condition=[AND(>($0, $10), <($3, $11))], joinType=[left])
+        LogicalProject(CONTACTNO=[$0], FNAME=[$1], LNAME=[$2], EMAIL=[$3], X=[$4.X], Y=[$4.Y], unit=[$4.unit], M=[$5.M], A=[$5.SUB.A], B=[$5.SUB.B])
+          LogicalTableScan(table=[[CATALOG, CUSTOMER, CONTACT_PEEK]])
+        LogicalTableScan(table=[[CATALOG, CUSTOMER, ACCOUNT]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableProject(CONTACTNO=[$0], FNAME=[$1], LNAME=[$2], EMAIL=[$3], X=[$4], Y=[$5], unit=[$6], COORD_NE=[ROW($7, ROW($8, $9))], ACCTNO=[$10], TYPE=[$11], BALANCE=[$12])
+  EnumerableNestedLoopJoin(condition=[AND(>($0, $10), <($3, $11))], joinType=[left])
+    EnumerableProject(CONTACTNO=[$0], FNAME=[$1], LNAME=[$2], EMAIL=[$3], X=[$4.X], Y=[$4.Y], unit=[$4.unit], M=[$5.M], A=[$5.SUB.A], B=[$5.SUB.B])
+      EnumerableSort(sort0=[$0], sort1=[$3], dir0=[DESC], dir1=[DESC])
+        EnumerableTableScan(table=[[CATALOG, CUSTOMER, CONTACT_PEEK]])
+    EnumerableTableScan(table=[[CATALOG, CUSTOMER, ACCOUNT]])

Review comment:
       Left outer join's cardinality is always greater equal with its left input's cardinality. So this plan is always better than sorting on join's output. 👍 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r436387179



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableNestedLoopJoin.java
##########
@@ -118,6 +124,65 @@ public static EnumerableNestedLoopJoin create(
     return cost;
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    // EnumerableNestedLoopJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableNestedLoopJoin always uses left input as the outer loop,
+    // thus only left input can preserve ordering.
+    // Push sort both to left and right inputs does not help right outer join. It's because in
+    // implementation, EnumerableNestedLoopJoin produces (null, right_unmatched) all together,
+    // which does not preserve ordering from right side.
+
+
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      // If field collation belongs to right input: cannot push down collation.
+      if (fc.getFieldIndex() >= getLeft().getRowType().getFieldCount()) {
+        return null;
+      }
+    }
+
+    RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+    return Pair.of(passthroughTraitSet,
+        ImmutableList.of(
+            passthroughTraitSet,
+            passthroughTraitSet.replace(RelCollations.EMPTY)));
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    assert childId == 0;
+
+    RelCollation collation = childTraits.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {

Review comment:
       Ah I see. The deriveMode method is used for traits derivation.

##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,57 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      // If field collation belongs to right input: cannot push down collation.
+      if (fc.getFieldIndex() >= getLeft().getRowType().getFieldCount()) {
+        return null;
+      }
+    }
+
+    RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+    return Pair.of(passthroughTraitSet,
+        ImmutableList.of(
+            passthroughTraitSet,
+            passthroughTraitSet.replace(RelCollations.EMPTY)));
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    assert childId == 0;
+
+    RelCollation collation = childTraits.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#issuecomment-640256318


   Comments addressed! Thank you!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#issuecomment-639988304


   It is a great point that HashJoin does not care about collations, thus it can pass through or derive collations. Updated this PR and also added a few test cases.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r435712438



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,45 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {
+      RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+      return Pair.of(passthroughTraitSet,
+          ImmutableList.of(
+              passthroughTraitSet,
+              passthroughTraitSet.replace(RelCollations.EMPTY)));
+    }
+
+    // TODO: consider super keyset and sub keyset
+    return null;
+  }
+
+  @Override public DeriveMode getDeriveMode() {
+    if (joinType == JoinRelType.FULL || joinType == JoinRelType.RIGHT) {
+      return DeriveMode.PROHIBITED;
+    }
+
+    return DeriveMode.BOTH;

Review comment:
       Sure. let me make a change.  I haven't fully understood the differnet of `LEFT_FIRST`, `RIGHT_FIRST` and `BOTH`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#issuecomment-639281013


   > Based on examples of MergeJoin, seems like the goal of trait derivation was to enforce a sort on right input, which does not work for hashjoin and nested loop join in this case.
   
   This is incorrect. `derive()` is used to derived new traits from child trait, just like project, filter. MergeJoin is different, because it requires both sides sorted on the same keys.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r432953656



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {

Review comment:
       Only right outer join uses right child as the probe side, I guess. Need to check the implementation of HashJoin. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r433731666



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {

Review comment:
       Please note that currently EnumerableHashJoin implementation builds the hash table always on the right input.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r432977096



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {

Review comment:
       Agreed. I will check EnumerableHashJoin. Basically collation passes down will depend on which side to build the hash table.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r435711254



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,45 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {
+      RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+      return Pair.of(passthroughTraitSet,
+          ImmutableList.of(
+              passthroughTraitSet,
+              passthroughTraitSet.replace(RelCollations.EMPTY)));
+    }
+
+    // TODO: consider super keyset and sub keyset

Review comment:
       How about this query in which it uses HashJoin:
   
    "select * from\n"
           + "(select contactno, email from customer.contact_peek) r left outer join\n"
           + "(select acctno, type from customer.account) s\n"
           + "on r.contactno=s.acctno and r.email=s.type\n"
           + "order by r.contactno desc";
   
   I think we can still pass collation `r.contactno desc`?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r433631323



##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -648,4 +648,80 @@ EnumerableMergeJoin(condition=[AND(=($1, $5), =($0, $4))], joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testHashJoinInnerJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinFullOuterJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinLeftOuterJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[select * from
+        sales.emp r left outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+        order by r.job desc nulls last, r.ename nulls first]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+  EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])

Review comment:
       I am testing trait pass through so adding aggregation to child does not solve the problem.
   
   I tried to dig into the cost computation processing. It turns out that for this plan:
   
   ```
   EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
     EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
       EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
       EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
   ```
   
   The top Sort's row count is just 2.1, which was is very surprising cause my debug point at the `computeSelfCost` in `EnumerableHashJoin` hasn't caught 2.1 row count, but ~50 row counts.
   
   As a comparison, the plan I am expecting:
   ```
   EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
     EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
       EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
       EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
   ```
   The Sort gets 14 row counts, which is from `EnumerableTableScan(table=[[CATALOG, SALES, EMP]])`, which is correct, and that's why this Sort's sort is update to `1346.0848941260901`, which is understandable.
   
   
   So either I didn't fully understand how Calcite propagate row counts, or something wrong for top-down opt code (e.g. all debug point's stops didn't give me a 2.1 row count at EnumerableHashJoin while the EnumerableSort on top of HashJoin gets that 2.1 row count).
   
   Will continue investigating the root cause.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r434053876



##########
File path: core/src/test/java/org/apache/calcite/test/TopDownOptTest.java
##########
@@ -288,6 +290,44 @@
         .removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE)
         .check();
   }
+
+  // Not push down sort for hash join in inner join case.
+  @Test void testHashJoinInnerJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Not push down sort for hash join in full outer join case.
+  @Test void testHashJoinFullOuterJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Push down sort to left input.
+  @Test void testHashJoinLeftOuterJoinPushDownSort() {
+    final String sql = "select * from\n"
+        + "(select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp) r left outer join\n"
+        + "(select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus) s\n"
+        + "on r.ename=s.ename\n"
+        + "order by r.ename nulls first";

Review comment:
       It's best actually if current testing suites can allow us mock data, due to cost-based model, its's better to use data rather than wired query to play with it.
   
   I could also check if I can modify the testing framework.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan closed pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan closed pull request #1995:
URL: https://github.com/apache/calcite/pull/1995


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r434983306



##########
File path: core/src/test/java/org/apache/calcite/test/TopDownOptTest.java
##########
@@ -288,6 +290,44 @@
         .removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE)
         .check();
   }
+
+  // Not push down sort for hash join in inner join case.
+  @Test void testHashJoinInnerJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Not push down sort for hash join in full outer join case.
+  @Test void testHashJoinFullOuterJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Push down sort to left input.
+  @Test void testHashJoinLeftOuterJoinPushDownSort() {
+    final String sql = "select * from\n"
+        + "(select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp) r left outer join\n"
+        + "(select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus) s\n"
+        + "on r.ename=s.ename\n"
+        + "order by r.ename nulls first";

Review comment:
       Add an aggregation is not helping. The hash join still uses `left_row_count * right_row_count * selectivity` to estimate cost while the aggreagtion cannot give a big enough `left_row_count`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#issuecomment-640128410


   Also rebased to master and squashed commits to one commit with better description.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r433701742



##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -648,4 +648,80 @@ EnumerableMergeJoin(condition=[AND(=($1, $5), =($0, $4))], joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testHashJoinInnerJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinFullOuterJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinLeftOuterJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[select * from
+        sales.emp r left outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+        order by r.job desc nulls last, r.ename nulls first]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+  EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])

Review comment:
       Figured out a way to generate expected plan. See my new comment.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia edited a comment on pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia edited a comment on pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#issuecomment-639849246


   @hsyuan any suggestions on how hash join (maybe also nested loop join) traits derivation should be tested?
   
   I am thinking doing this
   ```
   select * from
   (select distinct a, b from foo) foo
   join bar on foo.c = bar.c 
   ```
   
   Then the produced plan might add a Sort on top of Select.
   
   Do you think it is right?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r433657423



##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -648,4 +648,80 @@ EnumerableMergeJoin(condition=[AND(=($1, $5), =($0, $4))], joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testHashJoinInnerJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinFullOuterJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinLeftOuterJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[select * from
+        sales.emp r left outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+        order by r.job desc nulls last, r.ename nulls first]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+  EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])

Review comment:
       Ok after trace down the optimizer, I realize that Calcite does an estimated row count computation at [1]. in the estimated rowcount, cause of formula of how JOIN estimated row count is computed (selectivity), the input for EnumerableSort becomes as low as 2.1.
   
   [1]: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdRowCount.java#L144




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r432953227



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {

Review comment:
       INNER join, SEMI/ANTI SEMI, LEFT all can pass request to left child, and derive from left child.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#issuecomment-639849246


   @hsyuan any suggestions on how hash join traits derivation should be tested?
   
   I am thinking doing this
   ```
   select * from
   (select distinct a, b from foo) foo
   join bar on foo.c = bar.c 
   ```
   
   Then the produced plan might add a Sort on top of Select.
   
   Do you think it is right?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r435711254



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,45 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {
+      RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+      return Pair.of(passthroughTraitSet,
+          ImmutableList.of(
+              passthroughTraitSet,
+              passthroughTraitSet.replace(RelCollations.EMPTY)));
+    }
+
+    // TODO: consider super keyset and sub keyset

Review comment:
       How about this query in which it uses HashJoin:
   ```
   select * from
   (select * from) r join
   (select * from) s 
   on r.a = s.b and r.b = s.b
   order by r.a desc
   ```
   
   I think we can still pass collation `r.a desc`?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r432952670



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {
+      // Only consider exact key match for now
+      if (requiredKeySet.equals(leftKeySet)) {
+        return Pair.of(
+            required,
+            ImmutableList.of(
+                required,
+                required.replace(RelCollations.EMPTY)));
+      }
+    } else if (joinType == JoinRelType.RIGHT) {
+      // Only consider exact key match for now
+      if (requiredKeySet.equals(rightKeySet)) {
+        RelCollation rightCollation = RelCollations.shift(collation,
+            -left.getRowType().getFieldCount());
+        return Pair.of(
+            required,
+            ImmutableList.of(
+                required.replace(RelCollations.EMPTY),
+                required.replace(rightCollation)));
+      }
+    }
+
+    return null;
+  }
+
+//  @Override public DeriveMode getDeriveMode() {
+//    return DeriveMode.BOTH;

Review comment:
       should prohibit for full outer join. 

##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -648,4 +648,80 @@ EnumerableMergeJoin(condition=[AND(=($1, $5), =($0, $4))], joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testHashJoinInnerJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinFullOuterJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinLeftOuterJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[select * from
+        sales.emp r left outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+        order by r.job desc nulls last, r.ename nulls first]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+  EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])

Review comment:
       Don't test it directly. Try to add a aggregate in the left child.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r434623429



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {

Review comment:
       Then for FULL/RIGHT OUTER JOIN, we should prohibit trait propagations. 

##########
File path: core/src/test/java/org/apache/calcite/test/TopDownOptTest.java
##########
@@ -288,6 +290,44 @@
         .removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE)
         .check();
   }
+
+  // Not push down sort for hash join in inner join case.
+  @Test void testHashJoinInnerJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Not push down sort for hash join in full outer join case.
+  @Test void testHashJoinFullOuterJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Push down sort to left input.
+  @Test void testHashJoinLeftOuterJoinPushDownSort() {
+    final String sql = "select * from\n"
+        + "(select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp) r left outer join\n"
+        + "(select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus) s\n"
+        + "on r.ename=s.ename\n"
+        + "order by r.ename nulls first";

Review comment:
       How about this:
   ```sql
   select * from
   (select distinct a, b from foo) foo
   join bar on foo.c = bar.c
   order by b desc, a desc; 
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r433657423



##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -648,4 +648,80 @@ EnumerableMergeJoin(condition=[AND(=($1, $5), =($0, $4))], joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testHashJoinInnerJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinFullOuterJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinLeftOuterJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[select * from
+        sales.emp r left outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+        order by r.job desc nulls last, r.ename nulls first]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+  EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])

Review comment:
       Ok after trace down the optimizer, I realize that Calcite does an estimated row count computation at [1]. in the estimated rowcount, cause of formula of how JOIN estimated row count is computed, the input for EnumerableSort becomes as low as 2.1.
   
   [1]: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdRowCount.java#L144




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r436240840



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,81 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {

Review comment:
       This is a greta point.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r435710128



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,45 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {
+      RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+      return Pair.of(passthroughTraitSet,
+          ImmutableList.of(
+              passthroughTraitSet,
+              passthroughTraitSet.replace(RelCollations.EMPTY)));
+    }
+
+    // TODO: consider super keyset and sub keyset

Review comment:
       HashJoin/NestloopJoin doesn't need this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r435018987



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {
+      // Only consider exact key match for now
+      if (requiredKeySet.equals(leftKeySet)) {
+        return Pair.of(
+            required,
+            ImmutableList.of(
+                required,
+                required.replace(RelCollations.EMPTY)));
+      }
+    } else if (joinType == JoinRelType.RIGHT) {
+      // Only consider exact key match for now
+      if (requiredKeySet.equals(rightKeySet)) {
+        RelCollation rightCollation = RelCollations.shift(collation,
+            -left.getRowType().getFieldCount());
+        return Pair.of(
+            required,
+            ImmutableList.of(
+                required.replace(RelCollations.EMPTY),
+                required.replace(rightCollation)));
+      }
+    }
+
+    return null;
+  }
+
+//  @Override public DeriveMode getDeriveMode() {
+//    return DeriveMode.BOTH;

Review comment:
       I confirmed that indeed EnumerableHashJoin always builds hash table on right side. 
   
   For right/full outer join, to make result right, the implementation keeps a set of keys from right side. During probing it also removes keys from that set. At the end it will know how many keys from right side does not have a match with left side, thus it can produces null with right tuples. 
   
   
   So I will prohibit traits propogation for right/full outer join.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r434983306



##########
File path: core/src/test/java/org/apache/calcite/test/TopDownOptTest.java
##########
@@ -288,6 +290,44 @@
         .removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE)
         .check();
   }
+
+  // Not push down sort for hash join in inner join case.
+  @Test void testHashJoinInnerJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Not push down sort for hash join in full outer join case.
+  @Test void testHashJoinFullOuterJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Push down sort to left input.
+  @Test void testHashJoinLeftOuterJoinPushDownSort() {
+    final String sql = "select * from\n"
+        + "(select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp) r left outer join\n"
+        + "(select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus) s\n"
+        + "on r.ename=s.ename\n"
+        + "order by r.ename nulls first";

Review comment:
       Add an aggregation is not helping. The hash join still uses `left_row_count * right_row_count * selectivity` to estimate cost while the aggreagtion cannot give a big enough `left_row_count`. (which means sort on top of hash join is cheaper cause less row count to sort)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#issuecomment-640764842






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r433657423



##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -648,4 +648,80 @@ EnumerableMergeJoin(condition=[AND(=($1, $5), =($0, $4))], joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testHashJoinInnerJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinFullOuterJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinLeftOuterJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[select * from
+        sales.emp r left outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+        order by r.job desc nulls last, r.ename nulls first]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+  EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])

Review comment:
       Ok after trace down the optimizer, I realize that Calcite does an estimated row count computation at [1]. in the estimated rowcount, cause of formula of how JOIN estimated row count is computed, the input for EnumerableSort becomes as low as 2.1.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r435714138



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,45 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+
+    // EnumerableHashJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableHashJoin always builds hash table on right input,
+    // and only non-hashed side can preserve ordering.
+
+    // Only consider exact key match for now
+    if (requiredKeySet.equals(leftKeySet)) {
+      RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+      return Pair.of(passthroughTraitSet,
+          ImmutableList.of(
+              passthroughTraitSet,
+              passthroughTraitSet.replace(RelCollations.EMPTY)));
+    }
+
+    // TODO: consider super keyset and sub keyset
+    return null;
+  }
+
+  @Override public DeriveMode getDeriveMode() {
+    if (joinType == JoinRelType.FULL || joinType == JoinRelType.RIGHT) {
+      return DeriveMode.PROHIBITED;
+    }
+
+    return DeriveMode.BOTH;

Review comment:
       I need to consider refining the java doc to make it easier to understand.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r436376618



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableNestedLoopJoin.java
##########
@@ -118,6 +124,65 @@ public static EnumerableNestedLoopJoin create(
     return cost;
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    // EnumerableNestedLoopJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableNestedLoopJoin always uses left input as the outer loop,
+    // thus only left input can preserve ordering.
+    // Push sort both to left and right inputs does not help right outer join. It's because in
+    // implementation, EnumerableNestedLoopJoin produces (null, right_unmatched) all together,
+    // which does not preserve ordering from right side.
+
+
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      // If field collation belongs to right input: cannot push down collation.
+      if (fc.getFieldIndex() >= getLeft().getRowType().getFieldCount()) {
+        return null;
+      }
+    }
+
+    RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+    return Pair.of(passthroughTraitSet,
+        ImmutableList.of(
+            passthroughTraitSet,
+            passthroughTraitSet.replace(RelCollations.EMPTY)));
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    assert childId == 0;
+
+    RelCollation collation = childTraits.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {

Review comment:
       These are not needed for hashjoin and nestedloop join, because they are prohibited for derivation. 
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r433701431



##########
File path: core/src/test/java/org/apache/calcite/test/TopDownOptTest.java
##########
@@ -288,6 +290,44 @@
         .removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE)
         .check();
   }
+
+  // Not push down sort for hash join in inner join case.
+  @Test void testHashJoinInnerJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Not push down sort for hash join in full outer join case.
+  @Test void testHashJoinFullOuterJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Push down sort to left input.
+  @Test void testHashJoinLeftOuterJoinPushDownSort() {
+    final String sql = "select * from\n"
+        + "(select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp) r left outer join\n"
+        + "(select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus) s\n"
+        + "on r.ename=s.ename\n"
+        + "order by r.ename nulls first";

Review comment:
       I know the test query becomes ugly and not easy to check, but I don't know if there is a better idea.

##########
File path: core/src/test/java/org/apache/calcite/test/TopDownOptTest.java
##########
@@ -288,6 +290,44 @@
         .removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE)
         .check();
   }
+
+  // Not push down sort for hash join in inner join case.
+  @Test void testHashJoinInnerJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Not push down sort for hash join in full outer join case.
+  @Test void testHashJoinFullOuterJoinNotPushDownSort() {
+    final String sql = "select * from\n"
+        + "sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job\n"
+        + "order by r.job desc nulls last, r.ename nulls first";
+    Query.create(sql)
+        .removeRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
+        .removeRule(EnumerableRules.ENUMERABLE_BATCH_NESTED_LOOP_JOIN_RULE)
+        .check();
+  }
+
+  // Push down sort to left input.
+  @Test void testHashJoinLeftOuterJoinPushDownSort() {
+    final String sql = "select * from\n"
+        + "(select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp UNION ALL select ename from sales.emp) r left outer join\n"
+        + "(select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus UNION ALL select ename from sales.bonus) s\n"
+        + "on r.ename=s.ename\n"
+        + "order by r.ename nulls first";

Review comment:
       @hsyuan 
   Finally I find a way to "fool" planner to choose the plan I want, basically the plan that pushes sort down to left child of hash join.
   
   The reason that simple join's cheapest plan has a Sort on top of HashJoin, is because the Join's estimated is computed as simple as `left_row_count * right_row_count * 0.15`, 0.15 is selectively. When inputs's row count is small, the selectively plays a bigger role, thus HashJoin gives a small cost, thus Sort is put after Join because Join gives a very small row count estimation.
   
   So I intentionally use UNION to increase input size, which make the  `left_row_count * right_row_count` plays a bigger role for Join, and then Join produces larger row count estimation. Then finally planner believes push down Sort is cheaper, thus gives the plan I want to see. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r434985641



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {
+      // Only consider exact key match for now
+      if (requiredKeySet.equals(leftKeySet)) {
+        return Pair.of(
+            required,
+            ImmutableList.of(
+                required,
+                required.replace(RelCollations.EMPTY)));
+      }
+    } else if (joinType == JoinRelType.RIGHT) {
+      // Only consider exact key match for now
+      if (requiredKeySet.equals(rightKeySet)) {
+        RelCollation rightCollation = RelCollations.shift(collation,
+            -left.getRowType().getFieldCount());
+        return Pair.of(
+            required,
+            ImmutableList.of(
+                required.replace(RelCollations.EMPTY),
+                required.replace(rightCollation)));
+      }
+    }
+
+    return null;
+  }
+
+//  @Override public DeriveMode getDeriveMode() {
+//    return DeriveMode.BOTH;

Review comment:
       I am a little bit skeptical about it. It shouldn't work that way. If there isn't a match, it should generate nulls on left, but building hash table on right side will make it impossible. Can you write a query and execute it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r436301143



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableNestedLoopJoin.java
##########
@@ -118,6 +129,104 @@ public static EnumerableNestedLoopJoin create(
     return cost;
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    // EnumerableNestedLoopJoin traits passdown shall only pass through collation to left input.
+    // It is because for EnumerableNestedLoopJoin always uses left input as the outer loop,
+    // thus only left input can preserve ordering.
+    // Push sort both to left and right inputs does not help right outer join. It's because in
+    // implementation, EnumerableNestedLoopJoin produces (null, right_unmatched) all together,
+    // which does not preserve ordering from right side.
+
+
+    Set<Integer> fields = extractTrivialFieldOfLeftInputFromJoinCondition();
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      if (!fields.contains(fc.getFieldIndex())) {
+        return null;
+      }
+    }
+
+    RelTraitSet passthroughTraitSet = traitSet.replace(collation);
+    return Pair.of(passthroughTraitSet,
+        ImmutableList.of(
+            passthroughTraitSet,
+            passthroughTraitSet.replace(RelCollations.EMPTY)));
+  }
+
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> deriveTraits(
+      final RelTraitSet childTraits, final int childId) {
+    // should only derive traits (limited to collation for now) from left join input.
+    assert childId == 0;
+
+    RelCollation collation = childTraits.getCollation();
+    if (collation == null
+        || collation == RelCollations.EMPTY
+        || joinType == JoinRelType.FULL
+        || joinType == JoinRelType.RIGHT) {
+      return null;
+    }
+
+    Set<Integer> fields = extractTrivialFieldOfLeftInputFromJoinCondition();
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      if (!fields.contains(fc.getFieldIndex())) {
+        return null;
+      }
+    }
+
+    List<RelFieldCollation> collationFieldsToDerive = new ArrayList<>();
+    for (RelFieldCollation fc : collation.getFieldCollations()) {
+      if (fields.contains(fc.getFieldIndex())) {
+        collationFieldsToDerive.add(fc);
+      } else {
+        break;
+      }
+    }
+
+    if (collationFieldsToDerive.size() > 0) {
+      final RelCollation newCollation = RelCollations
+          .of(collationFieldsToDerive);
+      return Pair.of(getTraitSet().replace(newCollation),
+          ImmutableList.of(childTraits, right.getTraitSet()));
+    } else {
+      return null;
+    }
+  }
+
+  @Override public DeriveMode getDeriveMode() {
+    if (joinType == JoinRelType.FULL || joinType == JoinRelType.RIGHT) {
+      return DeriveMode.PROHIBITED;
+    }
+
+    return DeriveMode.LEFT_FIRST;
+  }
+
+  private Set<Integer> extractTrivialFieldOfLeftInputFromJoinCondition() {

Review comment:
       Done. Updated NestedLoopJoin with several extra tests.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r434989695



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {
+      // Only consider exact key match for now
+      if (requiredKeySet.equals(leftKeySet)) {
+        return Pair.of(
+            required,
+            ImmutableList.of(
+                required,
+                required.replace(RelCollations.EMPTY)));
+      }
+    } else if (joinType == JoinRelType.RIGHT) {
+      // Only consider exact key match for now
+      if (requiredKeySet.equals(rightKeySet)) {
+        RelCollation rightCollation = RelCollations.shift(collation,
+            -left.getRowType().getFieldCount());
+        return Pair.of(
+            required,
+            ImmutableList.of(
+                required.replace(RelCollations.EMPTY),
+                required.replace(rightCollation)));
+      }
+    }
+
+    return null;
+  }
+
+//  @Override public DeriveMode getDeriveMode() {
+//    return DeriveMode.BOTH;

Review comment:
       Will do.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r435711908



##########
File path: core/src/test/java/org/apache/calcite/test/TopDownOptTest.java
##########
@@ -287,6 +289,119 @@
         .removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE)
         .check();
   }
+
+  // Not push down sort for hash join in inner join case.
+  @Test void testHashJoinInnerJoinNotPushDownSort() {

Review comment:
       In fact, this test case is not valid (sort not pushdown due to cost-baed optimization).
   
   I will remove this one.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r433658770



##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -648,4 +648,80 @@ EnumerableMergeJoin(condition=[AND(=($1, $5), =($0, $4))], joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testHashJoinInnerJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinFullOuterJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinLeftOuterJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[select * from
+        sales.emp r left outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+        order by r.job desc nulls last, r.ename nulls first]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+  EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])

Review comment:
       The estimated cost (from `computeSelfCost`) is certainly different from `bestCost` in `RelSubset` though.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #1995: [CALCITE-4012] Implement trait propagation for EnumerableHashJoin and…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r435230877



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java
##########
@@ -100,6 +106,53 @@ public static EnumerableHashJoin create(
         condition, variablesSet, joinType);
   }
 
+  @Override public Pair<RelTraitSet, List<RelTraitSet>> passThroughTraits(
+      final RelTraitSet required) {
+    RelCollation collation = required.getCollation();
+    if (collation == null || collation == RelCollations.EMPTY) {
+      return null;
+    }
+
+    List<Integer> requiredKeys = RelCollations.ordinals(collation);
+    ImmutableBitSet requiredKeySet = ImmutableBitSet.of(requiredKeys);
+
+    ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
+    ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
+        .shift(left.getRowType().getFieldCount());
+
+    // HashJoin traits passdown shall only consider left/right outer join.
+    // It is because for a hash-based implementation, only non-hashed side can
+    // preserve ordering, thus only for left/right outer join, we are sure which
+    // side is non-hashed side (the outer child).
+    if (joinType == JoinRelType.LEFT) {
+      // Only consider exact key match for now
+      if (requiredKeySet.equals(leftKeySet)) {
+        return Pair.of(
+            required,
+            ImmutableList.of(
+                required,
+                required.replace(RelCollations.EMPTY)));
+      }
+    } else if (joinType == JoinRelType.RIGHT) {
+      // Only consider exact key match for now
+      if (requiredKeySet.equals(rightKeySet)) {
+        RelCollation rightCollation = RelCollations.shift(collation,
+            -left.getRowType().getFieldCount());
+        return Pair.of(
+            required,
+            ImmutableList.of(
+                required.replace(RelCollations.EMPTY),
+                required.replace(rightCollation)));
+      }
+    }
+
+    return null;
+  }
+
+//  @Override public DeriveMode getDeriveMode() {
+//    return DeriveMode.BOTH;

Review comment:
       Cool, thanks for checking.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org