You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@calcite.apache.org by GitBox <gi...@apache.org> on 2020/06/07 09:01:52 UTC

[GitHub] [calcite] amaliujia opened a new pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

amaliujia opened a new pull request #2006:
URL: https://github.com/apache/calcite/pull/2006


   …erset of join keys for EnumerableMergeJoin.
   
   
   see: https://issues.apache.org/jira/browse/CALCITE-4015


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-655585431


   @amaliujia Sorry for my late reply. I will take a look. We will get it into 1.24.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r437886885



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -131,8 +160,35 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
           required, ImmutableList.of(
           required.replace(leftCollation),
           required.replace(rightCollation)));
+    } else if (isSubset(reqKeys, rightKeys)) {
+      // if sort keys are subset of right join keys, we can extend collations to make sure all join
+      // keys are sorted.
+      collation = extendCollation(collation, rightKeys);
+      RelCollation rightCollation = RelCollations.shift(collation,
+          -left.getRowType().getFieldCount());
+      Mappings.TargetMapping invMapping = mapping.inverse();
+      RelCollation leftCollation = RexUtil.apply(invMapping, rightCollation);
+      return Pair.of(
+          required, ImmutableList.of(
+              required.replace(leftCollation),
+              required.replace(rightCollation)));
+    } else if (isPrefixOrderingNotRequired(rightKeys, reqKeys)
+        && allElementsGe(reqKeys, getLeft().getRowType().getFieldCount())) {

Review comment:
       Done! Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-643500881


   Friendly ping~


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-676720960


   @rubenada
   
   Have tried to address your comments. PTAL, thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r475767057



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);
+    for (RelFieldCollation rf : collation.getFieldCollations()) {
+      keySet.remove(rf.getFieldIndex());
+    }
+    for (Integer i : keySet) {
+      fieldsForNewCollation.add(new RelFieldCollation(i));
+    }
+    return RelCollations.of(fieldsForNewCollation);
+  }
+
+  private RelCollation removeCollationFieldsNotOnJoinKey(

Review comment:
       `intersectCollation AndJoinKey` is a reasonable name for me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439784121



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);

Review comment:
       No. Only ordering on collations defined on key need to preserve. For example, required collation [foo.a, foo.b, foo.c ,foo.d] defined on `foo.a=bar.a and foo.b=bar.b`, either pass down [foo.a, foo.b, foo.c, foo.d] or pass down [foo.a, foo.b, foo.d, foo.c] will give correct answer (but [a, b]'s ordering must remain).
   
   It is because, for MergeJoin implementation, the pointers moves when left tuple compares to right tuple, and the only requirement is next tuple should be bigger than previous one.
   
   Still use example above:
   left join input could be sorted by [a, b, d, c], and right join input is sorted by [a, b]. So 
   1) `left_tuple.compare(right_tuple) < 0`, move left pointer, of course next tuple will be both bigger than a) the previous left tuple b) the current right tuple (due to they are all sorted by [a, b]).
   2) same for ``left_tuple.compare(right_tuple) > 0` case.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-676721751


   Have also rebased this PR against latest master.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r437889032



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -101,28 +103,55 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getTrait(RelCollationTraitDef.INSTANCE);
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    // need to copy reqKeys because it is not sortable.
+    reqKeys = new ArrayList<>(reqKeys);
+    List<Integer> leftKeys = immutableIntListToList(joinInfo.leftKeys, 0);
+    List<Integer> rightKeys = immutableIntListToList(joinInfo.rightKeys,
+        left.getRowType().getFieldCount());
+    List<Integer> rightKeysNotShifted = immutableIntListToList(joinInfo.rightKeys, 0);
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
         .shift(left.getRowType().getFieldCount());
 
     Map<Integer, Integer> keyMap = new HashMap<>();
-    final int keyCount = leftKeySet.cardinality();
+    final int keyCount = leftKeys.size();
     for (int i = 0; i < keyCount; i++) {
       keyMap.put(joinInfo.leftKeys.get(i), joinInfo.rightKeys.get(i));
     }
     Mappings.TargetMapping mapping = Mappings.target(keyMap,
         left.getRowType().getFieldCount(),
         right.getRowType().getFieldCount());
 
-    // Only consider exact key match for now
+
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       RelCollation rightCollation = RexUtil.apply(mapping, collation);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (isSubset(reqKeys, leftKeys)) {
+      // if sort keys are subset of left join keys, we can extend collations to make sure all join
+      // keys are sorted.
+      collation = extendCollation(collation, leftKeys);
+      RelCollation rightCollation = RexUtil.apply(mapping, collation);
+      return Pair.of(
+          required, ImmutableList.of(required.replace(collation),
+              required.replace(rightCollation)));
+    } else if (isPrefixOrderingNotRequired(leftKeys, reqKeys)

Review comment:
       `isPrefixOrderingNotRequired` probably is the one I cannot use `RelCollations.containsOrderless`. It is designed for superset case. In which join keys should be 1) an orderless subset 2) prefix of required collations.
   
   `RelCollations.containsOrderless` does not consider prefix requirement.
   
   Alternatively, I am ok to move `isPrefixOrderingNotRequired` to `RelCollations` as a util function.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439784770



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +137,83 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
+    List<Integer> rightKeysNotShifted = joinInfo.rightKeys.toIntegerList();

Review comment:
       Let me check if this process can be simplified somehow.
   
   For the example above, I am only looking for a way to 
   ```
   leftCollation = requiredCollation.apply(join keys) // only need collation defined on keys for left input,
   ``` 
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-643773213


   So, To answer your question, yes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r475767057



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);
+    for (RelFieldCollation rf : collation.getFieldCollations()) {
+      keySet.remove(rf.getFieldIndex());
+    }
+    for (Integer i : keySet) {
+      fieldsForNewCollation.add(new RelFieldCollation(i));
+    }
+    return RelCollations.of(fieldsForNewCollation);
+  }
+
+  private RelCollation removeCollationFieldsNotOnJoinKey(

Review comment:
       `intersectCollationAndJoinKey` is a reasonable name for me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r473316727



##########
File path: core/src/main/java/org/apache/calcite/rel/RelCollations.java
##########
@@ -199,25 +199,45 @@ public static boolean contains(List<RelCollation> collations,
    * @param keys List of keys
    * @return Whether the collection contains the given keys
    */
-  private static boolean containsOrderless(RelCollation collation,
+  public static boolean containsOrderless(RelCollation collation,
       List<Integer> keys) {
     final List<Integer> distinctKeys = Util.distinctList(keys);
     final ImmutableBitSet keysBitSet = ImmutableBitSet.of(distinctKeys);
     List<Integer> colKeys = Util.distinctList(collation.getKeys());
+
     if (colKeys.size() < distinctKeys.size()) {
       return false;
+    } else {
+      ImmutableBitSet bitset = ImmutableBitSet.of(
+          colKeys.subList(0, distinctKeys.size()));
+      return bitset.equals(keysBitSet);
+    }
+  }
+
+  /** Returns whether a collation is contained by a given list of keys regardless ordering.
+   *
+   * @param collation Collation
+   * @param keys List of keys
+   * @return Whether the collection contains the given keys
+   */
+  public static boolean containsOrderless(
+      List<Integer> keys, RelCollation collation) {
+    final List<Integer> distinctKeys = Util.distinctList(keys);
+    List<Integer> colKeys = Util.distinctList(collation.getKeys());
+
+    if (colKeys.size() > distinctKeys.size()) {
+      return false;
+    } else {
+      return colKeys.stream().allMatch(i -> distinctKeys.contains(i));
     }
-    ImmutableBitSet bitset = ImmutableBitSet.of(
-        colKeys.subList(0, distinctKeys.size()));
-    return bitset.equals(keysBitSet);
   }
 
   /**
    * Returns whether one of a list of collations contains the given list of keys
    * regardless the order.
    */
-  public static boolean containsOrderless(List<RelCollation> collations,
-      List<Integer> keys) {
+  public static boolean collationsContainKeysOrderless(

Review comment:
       The reasons that I didn't use overloaded approach
   1. these names are in the format of `A verb B`, means the first parameter contains the second parameter. And arguments of these functions have an ordering of A and B. I think this is more readable.
   2. For Java, List<T> is the same type. For example, List<Integer>  and List<Collation> are both List<T> thus List<Object>. So we cannot overload, for example `collationsContainKeysOrderless` and `keysContainCollationsOrderless`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439783577



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +137,83 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
+    List<Integer> rightKeysNotShifted = joinInfo.rightKeys.toIntegerList();

Review comment:
       To this specific case, `toIntegerList` to get a list of key indexes (not shifted, so index from 0). Then, later, in superset cases, we can use this list to remove collations that are not on keys. Use this query as an example:
   `select * from foo join bar on foo.a=bar.a and foo.b=bar.b order by bar.a, bar.b, bar.c;
   `
   
   we can push
   `collation a b to left`
   `collation a, b, c to right`
   Thus the output of join should still be sorted by a, b, c.
   
   
   Then, the `toIntegerList` helps to to that only `a, b` needs to pushed to left.  It works along with 
   ```
   RelCollation rightCollation = RelCollations.shift(collation, -leftInputFieldCount);
   Mappings.TargetMapping mapping = buildMapping(false);
   RelCollation leftCollation =
             RexUtil.apply(
                 mapping,
                 removeCollationFieldsNotOnJoinKey(rightCollation, rightKeysNotShifted));
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439791902



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +137,83 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
+    List<Integer> rightKeysNotShifted = joinInfo.rightKeys.toIntegerList();
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
-        .shift(left.getRowType().getFieldCount());
+        .shift(leftInputFieldCount);
 
-    // Only consider exact key match for now
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       Mappings.TargetMapping mapping = buildMapping(true);
       RelCollation rightCollation = collation.apply(mapping);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (containsOrderless(collation, leftKeys)) {

Review comment:
       This breaks the definition of the method name. `containsOrderless` means the former contains the latter. We can create another method `containsOrderless(List<Integer>, List<Integer>)` if necessary.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439791727



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);

Review comment:
       Will check how to use `bitset.except`. 
   
   Meanwhile, in fact, the ordering does matter for EnumerableMergeJoin, it is because key selector and key comparator are constructed by sort key ordering.
   
   Thus we have to follow sort key ordering to extend required collations.  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439783735



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +137,83 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
+    List<Integer> rightKeysNotShifted = joinInfo.rightKeys.toIntegerList();

Review comment:
       why ImmutableIntList is not suitable?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439928463



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);

Review comment:
       Use `bitset.except` now to compute diff.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r473006265



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);
+    for (RelFieldCollation rf : collation.getFieldCollations()) {
+      keySet.remove(rf.getFieldIndex());
+    }
+    for (Integer i : keySet) {
+      fieldsForNewCollation.add(new RelFieldCollation(i));
+    }
+    return RelCollations.of(fieldsForNewCollation);
+  }
+
+  private RelCollation removeCollationFieldsNotOnJoinKey(

Review comment:
       I also think the method name is counter-intuitive. It is actually not removing anything, it just builds a new collation based on a "source" collation, considering only the fields which are part of the joinKeys. I would propose as name e.g. `reduceCollation`, `intersect`, or something like that..




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r475759626



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);
+    for (RelFieldCollation rf : collation.getFieldCollations()) {
+      keySet.remove(rf.getFieldIndex());
+    }
+    for (Integer i : keySet) {
+      fieldsForNewCollation.add(new RelFieldCollation(i));
+    }
+    return RelCollations.of(fieldsForNewCollation);
+  }
+
+  private RelCollation removeCollationFieldsNotOnJoinKey(

Review comment:
       oops. I have missed this comment. 
   
   Will address this one and another comment.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r473001005



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +138,82 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
-        .shift(left.getRowType().getFieldCount());
+        .shift(leftInputFieldCount);
 
-    // Only consider exact key match for now
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       Mappings.TargetMapping mapping = buildMapping(true);
       RelCollation rightCollation = collation.apply(mapping);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (containsOrderless(leftKeys, collation)) {

Review comment:
       I think it would be helpful to describe the general logic and all the different cases in a javadoc on `passThroughTraits`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439784248



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);

Review comment:
       to add a bit explanation on the example above, the only usefulness of pushed collocations on [d, c], is just to maintain increasing order when move left pointer on left join input during merging. Thus [c, d] will achieve the same thing.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-683150884


   @rubenada maybe we can tag this PR as `LGTM merging soon` to attract people to review, if anyone has an interest?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439928527



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +137,83 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
+    List<Integer> rightKeysNotShifted = joinInfo.rightKeys.toIntegerList();
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
-        .shift(left.getRowType().getFieldCount());
+        .shift(leftInputFieldCount);
 
-    // Only consider exact key match for now
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       Mappings.TargetMapping mapping = buildMapping(true);
       RelCollation rightCollation = collation.apply(mapping);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (containsOrderless(collation, leftKeys)) {

Review comment:
       Add `containsOrderless(List<Integer>, List<Integer>)` now to obey the naming convention.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439789787



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +137,83 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
+    List<Integer> rightKeysNotShifted = joinInfo.rightKeys.toIntegerList();

Review comment:
       I still don't understand, why can't just use `joinInfo.rightKeys`, instead use `joinInfo.rightKeys. toIntegerList`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439784146



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);
+    for (RelFieldCollation rf : collation.getFieldCollations()) {
+      keySet.remove(rf.getFieldIndex());
+    }
+    for (Integer i : keySet) {
+      fieldsForNewCollation.add(new RelFieldCollation(i));
+    }
+    return RelCollations.of(fieldsForNewCollation);
+  }
+
+  private RelCollation removeCollationFieldsNotOnJoinKey(

Review comment:
       Let me add some comments with an example (a little bit complicated but let me try).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r475449330



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);
+    for (RelFieldCollation rf : collation.getFieldCollations()) {
+      keySet.remove(rf.getFieldIndex());
+    }
+    for (Integer i : keySet) {
+      fieldsForNewCollation.add(new RelFieldCollation(i));
+    }
+    return RelCollations.of(fieldsForNewCollation);
+  }
+
+  private RelCollation removeCollationFieldsNotOnJoinKey(

Review comment:
       friendly reminder about these comments




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r475448937



##########
File path: core/src/test/java/org/apache/calcite/rel/RelCollationTest.java
##########
@@ -84,18 +84,36 @@
         is(true));
   }
 
-  /** Unit test for {@link RelCollations#containsOrderless(List, List)}. */
+  /** Unit test for {@link RelCollations#collationsContainKeysOrderless(List, List)}. */

Review comment:
       maybe we could also add some unit tests for the new method `keysContainCollationsOrderless`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-675781844


   @chunweilei @rubenada Can you help review this pull request?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r473316727



##########
File path: core/src/main/java/org/apache/calcite/rel/RelCollations.java
##########
@@ -199,25 +199,45 @@ public static boolean contains(List<RelCollation> collations,
    * @param keys List of keys
    * @return Whether the collection contains the given keys
    */
-  private static boolean containsOrderless(RelCollation collation,
+  public static boolean containsOrderless(RelCollation collation,
       List<Integer> keys) {
     final List<Integer> distinctKeys = Util.distinctList(keys);
     final ImmutableBitSet keysBitSet = ImmutableBitSet.of(distinctKeys);
     List<Integer> colKeys = Util.distinctList(collation.getKeys());
+
     if (colKeys.size() < distinctKeys.size()) {
       return false;
+    } else {
+      ImmutableBitSet bitset = ImmutableBitSet.of(
+          colKeys.subList(0, distinctKeys.size()));
+      return bitset.equals(keysBitSet);
+    }
+  }
+
+  /** Returns whether a collation is contained by a given list of keys regardless ordering.
+   *
+   * @param collation Collation
+   * @param keys List of keys
+   * @return Whether the collection contains the given keys
+   */
+  public static boolean containsOrderless(
+      List<Integer> keys, RelCollation collation) {
+    final List<Integer> distinctKeys = Util.distinctList(keys);
+    List<Integer> colKeys = Util.distinctList(collation.getKeys());
+
+    if (colKeys.size() > distinctKeys.size()) {
+      return false;
+    } else {
+      return colKeys.stream().allMatch(i -> distinctKeys.contains(i));
     }
-    ImmutableBitSet bitset = ImmutableBitSet.of(
-        colKeys.subList(0, distinctKeys.size()));
-    return bitset.equals(keysBitSet);
   }
 
   /**
    * Returns whether one of a list of collations contains the given list of keys
    * regardless the order.
    */
-  public static boolean containsOrderless(List<RelCollation> collations,
-      List<Integer> keys) {
+  public static boolean collationsContainKeysOrderless(

Review comment:
       The reasons that I didn't use overloaded approach
   1. these names are in the format of `A verb B`, means the first parameter contains the second parameter. And arguments of these functions have an ordering of A and B. I think this is more readable.
   2. For Java, List<T> is the same type. For example, `List<Integer>`  and `List<Collation>` are both `List<T>` thus `List<Object>`. So we cannot overload, for example `collationsContainKeysOrderless` and `keysContainCollationsOrderless`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-647042133


   Tried to rebase this PR, squashed commits and added a detailed commit message.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439782434



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);

Review comment:
       Do we need to keep the keys order?

##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +137,83 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
+    List<Integer> rightKeysNotShifted = joinInfo.rightKeys.toIntegerList();

Review comment:
       Why `toIntegerList`?

##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);
+    for (RelFieldCollation rf : collation.getFieldCollations()) {
+      keySet.remove(rf.getFieldIndex());
+    }
+    for (Integer i : keySet) {
+      fieldsForNewCollation.add(new RelFieldCollation(i));
+    }
+    return RelCollations.of(fieldsForNewCollation);
+  }
+
+  private RelCollation removeCollationFieldsNotOnJoinKey(

Review comment:
       It is not obvious to get the meaning from method name. An example and comment will help. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada merged pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada merged pull request #2006:
URL: https://github.com/apache/calcite/pull/2006


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r472865245



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,48 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  /**
+   * This function extends collation by appending new collation fields defiend on keys.

Review comment:
       minor: typo in 'defiend'




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r475829864



##########
File path: core/src/test/java/org/apache/calcite/rel/RelCollationTest.java
##########
@@ -84,18 +84,36 @@
         is(true));
   }
 
-  /** Unit test for {@link RelCollations#containsOrderless(List, List)}. */
+  /** Unit test for {@link RelCollations#collationsContainKeysOrderless(List, List)}. */

Review comment:
       Makes sense. Added the test.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439791959



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +137,83 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
+    List<Integer> rightKeysNotShifted = joinInfo.rightKeys.toIntegerList();
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
-        .shift(left.getRowType().getFieldCount());
+        .shift(leftInputFieldCount);
 
-    // Only consider exact key match for now
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       Mappings.TargetMapping mapping = buildMapping(true);
       RelCollation rightCollation = collation.apply(mapping);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (containsOrderless(collation, leftKeys)) {

Review comment:
       I need to update `containsOrderlessSubset` to make it consider list ordering.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r438537972



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -101,28 +103,55 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getTrait(RelCollationTraitDef.INSTANCE);
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    // need to copy reqKeys because it is not sortable.
+    reqKeys = new ArrayList<>(reqKeys);
+    List<Integer> leftKeys = immutableIntListToList(joinInfo.leftKeys, 0);
+    List<Integer> rightKeys = immutableIntListToList(joinInfo.rightKeys,
+        left.getRowType().getFieldCount());
+    List<Integer> rightKeysNotShifted = immutableIntListToList(joinInfo.rightKeys, 0);
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
         .shift(left.getRowType().getFieldCount());
 
     Map<Integer, Integer> keyMap = new HashMap<>();
-    final int keyCount = leftKeySet.cardinality();
+    final int keyCount = leftKeys.size();
     for (int i = 0; i < keyCount; i++) {
       keyMap.put(joinInfo.leftKeys.get(i), joinInfo.rightKeys.get(i));
     }
     Mappings.TargetMapping mapping = Mappings.target(keyMap,
         left.getRowType().getFieldCount(),
         right.getRowType().getFieldCount());
 
-    // Only consider exact key match for now
+
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       RelCollation rightCollation = RexUtil.apply(mapping, collation);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (isSubset(reqKeys, leftKeys)) {
+      // if sort keys are subset of left join keys, we can extend collations to make sure all join
+      // keys are sorted.
+      collation = extendCollation(collation, leftKeys);
+      RelCollation rightCollation = RexUtil.apply(mapping, collation);
+      return Pair.of(
+          required, ImmutableList.of(required.replace(collation),
+              required.replace(rightCollation)));
+    } else if (isPrefixOrderingNotRequired(leftKeys, reqKeys)

Review comment:
       In fact, I mis-read how `containsOrderless` is implemented. It actually meets the need here.
   
   Have changed to `containsOrderless` and dropped unused code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-640180682


   R: @hsyuan 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439791727



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);

Review comment:
       Will check how to use `bitset.except`. 
   
   Meanwhile, in fact, the ordering does matter for EnumerableMergeJoin, it is because key selector and key comparator are constructed by join key ordering.
   
   Thus we have to follow join key ordering to extend required collations.  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-643822485


   O nice. Didn't realize there is already a JIRA to refine the code generation implementation of MergeJoin.
   
   Ok then in this PR I will make the assumption that physical implementation will utilize collations thus is correct. So I won't consider not necessary ordering of sort keys too much. 
   
   Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r438448447



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -101,28 +103,55 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getTrait(RelCollationTraitDef.INSTANCE);
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    // need to copy reqKeys because it is not sortable.
+    reqKeys = new ArrayList<>(reqKeys);
+    List<Integer> leftKeys = immutableIntListToList(joinInfo.leftKeys, 0);
+    List<Integer> rightKeys = immutableIntListToList(joinInfo.rightKeys,
+        left.getRowType().getFieldCount());
+    List<Integer> rightKeysNotShifted = immutableIntListToList(joinInfo.rightKeys, 0);
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
         .shift(left.getRowType().getFieldCount());
 
     Map<Integer, Integer> keyMap = new HashMap<>();
-    final int keyCount = leftKeySet.cardinality();
+    final int keyCount = leftKeys.size();
     for (int i = 0; i < keyCount; i++) {
       keyMap.put(joinInfo.leftKeys.get(i), joinInfo.rightKeys.get(i));
     }
     Mappings.TargetMapping mapping = Mappings.target(keyMap,
         left.getRowType().getFieldCount(),
         right.getRowType().getFieldCount());
 
-    // Only consider exact key match for now
+
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       RelCollation rightCollation = RexUtil.apply(mapping, collation);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (isSubset(reqKeys, leftKeys)) {
+      // if sort keys are subset of left join keys, we can extend collations to make sure all join
+      // keys are sorted.
+      collation = extendCollation(collation, leftKeys);
+      RelCollation rightCollation = RexUtil.apply(mapping, collation);
+      return Pair.of(
+          required, ImmutableList.of(required.replace(collation),
+              required.replace(rightCollation)));
+    } else if (isPrefixOrderingNotRequired(leftKeys, reqKeys)

Review comment:
       can you explain `RelCollations.containsOrderless does not consider prefix requirement.`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia edited a comment on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia edited a comment on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-643722774


   Ok, in fact, I need to refactor a lot on this PR. 
   
   I ignored one biggest fact, which caused my implementation wrong in some places.
   
   Basically, I believe, per reading EnumerableMergeJoin implementation, it will
   1) construct key selector based on ordering of left join keys and right join keys.
   2) construct key comparator by `keys must be sorted in ascending order, nulls last`: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java#L306.
   
   This implementation limits traits propagation in EnumerableMergeJoin to 
   1. Collations that is constructed for join (not the required one, but the one we want to construct) must have the same ordering as join keys.
   2. Collations must be in ascending order, null last
   
   So I think based on this understanding, even the exact key matching is not correct:
   1. violates the second requirement. 
   2. just consider orderless case so not following join key's ordering.
   
   However, the most useful way is, EnumerableMergeJoin construct key selector and key comparator based on collations. 
   
   @hsyuan do you agree?
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-683808303


   @hsyuan the current PR LGTM, I plan to merge it in the next 24h, if nobody objects


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439784121



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);

Review comment:
       No. Only ordering on collations defined on key need to preserve. For example, required collation [foo.a, foo.b, foo.c ,foo.d] defined on `foo.a=bar.a and foo.b=bar.b`, either pass down [foo.a, foo.b, foo.c, foo.d] or pass down [foo.a, foo.b, foo.d, foo.c] will give correct answer (but [a, b]'s ordering must remain).
   
   It is because, for MergeJoin implementation, the pointers moves when left tuple not equals to right tuple, and the only requirement is next tuple should be bigger than previous ones.
   
   Still use example above:
   left join input could be sorted by [a, b, d, c], and right join input is sorted by [a, b]. So 
   1) `left_tuple.compare(right_tuple) < 0`, move left pointer, of course next tuple will be both bigger than a) the previous left tuple b) the current right tuple (due to they are all sorted by [a, b]).
   2) same for ``left_tuple.compare(right_tuple) > 0` case.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439791483



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +137,83 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
+    List<Integer> rightKeysNotShifted = joinInfo.rightKeys.toIntegerList();
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
-        .shift(left.getRowType().getFieldCount());
+        .shift(leftInputFieldCount);
 
-    // Only consider exact key match for now
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       Mappings.TargetMapping mapping = buildMapping(true);
       RelCollation rightCollation = collation.apply(mapping);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (containsOrderless(collation, leftKeys)) {

Review comment:
       yeah I guess I have made this wrong. 
   
   The current MergeJoin implementation construct key comparator just by sort key. If keep code here unchanged, I think I need to update at least this line: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java#L310
   
   Basically if there are collations defined (on join keys), the key comparator (and key selector) will need to be constructed on collations's ordering.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-643722774


   Ok, in fact, I need to refactor a lot on this PR. 
   
   I ignored one biggest fact, which caused my implementation wrong in some places.
   
   Basically, I believe, per reading EnumerableMergeJoin implementation, it will
   1) construct key selector based on left join keys and right join keys ordering.
   2) construct key comparator by `keys must be sorted in ascending order, nulls last`.
   
   This implementation limits traits propagation in EnumerableMergeJoin to 
   1. Collations that is constructed for join (not the required one, but the one we want to construct) must have the same ordering as join keys.
   2. Collations must be in ascending order, null last: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java#L306.
   
   So I think based on this understanding, even the exact key matching is not correct (violates the second requirement.)
   
   However, the most useful way is, EnumerableMergeJoin construct key selector and key comparator based on collations. 
   
   @hsyuan do you agree?
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439784121



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);

Review comment:
       No for subset cases. For example, required collation [foo.a, foo.b] defined on `foo.a=bar.a and foo.b=bar.b and foo.c=bar.c and foo.d=bar.d`, either pass down [foo.a, foo.b, foo.c, foo.d] or pass down [foo.a, foo.b, foo.d, foo.c] will give correct answer (same to push the same collations to right join input).
   
   It is because, for MergeJoin implementation, the pointers moves when left tuple not equals to right tuple, and the only requirement is next tuple should be bigger than previous ones.
   
   Still use example above:
   left join input and right join input both are sorted by [a, b, d, c]. So 
   1) `left_tuple.compare(right_tuple) < 0`, move left pointer, of course next tuple will be both bigger than a) the previous left tuple b) the current right tuple (due to they are all sorted by [a, b]).
   2) same for ``left_tuple.compare(right_tuple) > 0` case.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439790064



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +137,83 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
+    List<Integer> rightKeysNotShifted = joinInfo.rightKeys.toIntegerList();
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
-        .shift(left.getRowType().getFieldCount());
+        .shift(leftInputFieldCount);
 
-    // Only consider exact key match for now
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       Mappings.TargetMapping mapping = buildMapping(true);
       RelCollation rightCollation = collation.apply(mapping);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (containsOrderless(collation, leftKeys)) {

Review comment:
       How can it be true if sort keys are subset of left join keys?

##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);

Review comment:
       Why not just use bitset.except to compute the diff keys?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r473931233



##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -491,6 +491,81 @@ EnumerableMergeJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testSortMergeJoinSubsetKey">
+        <Resource name="sql">
+            <![CDATA[select * from
+        sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job
+        order by r.job desc nulls last]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], dir0=[DESC-nulls-last])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableMergeJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+  EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC])

Review comment:
       I agree, this will be addressed by CALCITE-4010, I just wanted to make sure that we were aware of this current limitation




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-679008064


   @amaliujia thanks for your work. There are some remaining comments about some minor details, but overall PR LGTM
   @hsyuan do you want to take a final look?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r436386235



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -194,6 +250,81 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return DeriveMode.BOTH;
   }
 
+  private List<Integer> immutableIntListToList(ImmutableIntList intList, int offset) {

Review comment:
       I will probably add more comments to these new methods to explain what they are doing.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-675901741


   Sure @hsyuan I will take a look


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439784121



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);

Review comment:
       No for subset cases. For example, required collation [foo.a, foo.b] defined on `foo.a=bar.a and foo.b=bar.b and foo.c=bar.d and foo.d=bar.d`, either pass down [foo.a, foo.b, foo.c, foo.d] or pass down [foo.a, foo.b, foo.d, foo.c] will give correct answer (same to push the same collations to right join input).
   
   It is because, for MergeJoin implementation, the pointers moves when left tuple not equals to right tuple, and the only requirement is next tuple should be bigger than previous ones.
   
   Still use example above:
   left join input and right join input both are sorted by [a, b, d, c], and right join input is sorted by [a, b]. So 
   1) `left_tuple.compare(right_tuple) < 0`, move left pointer, of course next tuple will be both bigger than a) the previous left tuple b) the current right tuple (due to they are all sorted by [a, b]).
   2) same for ``left_tuple.compare(right_tuple) > 0` case.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia edited a comment on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia edited a comment on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-643722774


   Ok, in fact, I need to refactor a lot on this PR. 
   
   I ignored one biggest fact, which caused my implementation wrong in some places.
   
   Basically, I believe, per reading EnumerableMergeJoin implementation, it will
   1) construct key selector based on ordering of left join keys and right join keys.
   2) construct key comparator by `keys must be sorted in ascending order, nulls last`: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java#L306.
   
   This implementation limits traits propagation in EnumerableMergeJoin to 
   1. Collations that is constructed for join (not the required one, but the one we want to construct) must have the same ordering as join keys.
   2. Collations must be in ascending order, null last
   
   So I think based on this understanding, even the exact key matching is not correct (violates the second requirement.)
   
   However, the most useful way is, EnumerableMergeJoin construct key selector and key comparator based on collations. 
   
   @hsyuan do you agree?
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-643900365


   Updated this PR (but assume that CALCITE-4010 will be addressed) 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r473930318



##########
File path: core/src/main/java/org/apache/calcite/rel/RelCollations.java
##########
@@ -199,25 +199,45 @@ public static boolean contains(List<RelCollation> collations,
    * @param keys List of keys
    * @return Whether the collection contains the given keys
    */
-  private static boolean containsOrderless(RelCollation collation,
+  public static boolean containsOrderless(RelCollation collation,
       List<Integer> keys) {
     final List<Integer> distinctKeys = Util.distinctList(keys);
     final ImmutableBitSet keysBitSet = ImmutableBitSet.of(distinctKeys);
     List<Integer> colKeys = Util.distinctList(collation.getKeys());
+
     if (colKeys.size() < distinctKeys.size()) {
       return false;
+    } else {
+      ImmutableBitSet bitset = ImmutableBitSet.of(
+          colKeys.subList(0, distinctKeys.size()));
+      return bitset.equals(keysBitSet);
+    }
+  }
+
+  /** Returns whether a collation is contained by a given list of keys regardless ordering.
+   *
+   * @param collation Collation
+   * @param keys List of keys
+   * @return Whether the collection contains the given keys
+   */
+  public static boolean containsOrderless(
+      List<Integer> keys, RelCollation collation) {
+    final List<Integer> distinctKeys = Util.distinctList(keys);
+    List<Integer> colKeys = Util.distinctList(collation.getKeys());
+
+    if (colKeys.size() > distinctKeys.size()) {
+      return false;
+    } else {
+      return colKeys.stream().allMatch(i -> distinctKeys.contains(i));
     }
-    ImmutableBitSet bitset = ImmutableBitSet.of(
-        colKeys.subList(0, distinctKeys.size()));
-    return bitset.equals(keysBitSet);
   }
 
   /**
    * Returns whether one of a list of collations contains the given list of keys
    * regardless the order.
    */
-  public static boolean containsOrderless(List<RelCollation> collations,
-      List<Integer> keys) {
+  public static boolean collationsContainKeysOrderless(

Review comment:
       Ok, I understand, thanks for the explanation.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439791866



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +137,83 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
+    List<Integer> rightKeysNotShifted = joinInfo.rightKeys.toIntegerList();

Review comment:
       yes `joinInfo.rightKeys` can be used because `ImmutableIntList` implements `IndexOf`. I will change it to `joinInfo.rightKeys`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-675902415


   Thanks @rubenada for your help!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r472941179



##########
File path: core/src/main/java/org/apache/calcite/rel/RelCollations.java
##########
@@ -227,6 +247,21 @@ public static boolean containsOrderless(List<RelCollation> collations,
     return false;
   }
 
+  /**
+   * Returns whether one of a list of collations is contained by the given list of keys
+   * regardless the order.
+   */
+  public static boolean keysContainCollationsOrderless(
+      List<Integer> keys,  List<RelCollation> collations) {
+    final List<Integer> distinctKeys = Util.distinctList(keys);

Review comment:
       Given that `containsOrderless` already does a `Util.distinctList(keys)`, we could skip it here.
   The same applies to `collationsContainKeysOrderless`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r437062358



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -194,6 +250,81 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return DeriveMode.BOTH;
   }
 
+  private List<Integer> immutableIntListToList(ImmutableIntList intList, int offset) {

Review comment:
       Can you rebase on master? I just updated this file.
   Use `ImmutableIntList.incr` instead.

##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -194,6 +250,81 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return DeriveMode.BOTH;
   }
 
+  private List<Integer> immutableIntListToList(ImmutableIntList intList, int offset) {
+    ArrayList<Integer> arrayList = new ArrayList<>(intList.size());
+    for (int i : intList) {
+      arrayList.add(i + offset);
+    }
+    return arrayList;
+  }
+
+  private boolean isSubset(List<Integer> a, List<Integer> b) {
+    if (a.size() > b.size()) {
+      return false;
+    }
+    Set<Integer> set = new HashSet<>(b);
+    for (int i = 0; i < a.size(); i++) {
+      if (!set.contains(a.get(i))) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  private boolean isPrefixOrderingNotRequired(List<Integer> a, List<Integer> b) {

Review comment:
       See `RelCollations.containsOrderless`, you can make the private one public is necessary.

##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -101,28 +103,55 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getTrait(RelCollationTraitDef.INSTANCE);
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    // need to copy reqKeys because it is not sortable.
+    reqKeys = new ArrayList<>(reqKeys);
+    List<Integer> leftKeys = immutableIntListToList(joinInfo.leftKeys, 0);
+    List<Integer> rightKeys = immutableIntListToList(joinInfo.rightKeys,
+        left.getRowType().getFieldCount());
+    List<Integer> rightKeysNotShifted = immutableIntListToList(joinInfo.rightKeys, 0);
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
         .shift(left.getRowType().getFieldCount());
 
     Map<Integer, Integer> keyMap = new HashMap<>();
-    final int keyCount = leftKeySet.cardinality();
+    final int keyCount = leftKeys.size();

Review comment:
       nice catch. in case the keys are `1,1,2,2`. IMHO, we shouldn't see duplicate join keys in physical merge/hash join operators. They should be optimized away, because that means the predicate is not pushed down at all.
   like foo.a = bar.b and foo.a=bar.c. The predicate bar.b=bar.c should be pushed down for table bar.

##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -101,28 +103,55 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getTrait(RelCollationTraitDef.INSTANCE);
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    // need to copy reqKeys because it is not sortable.
+    reqKeys = new ArrayList<>(reqKeys);

Review comment:
       why do you want to sort it?

##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -101,28 +103,55 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getTrait(RelCollationTraitDef.INSTANCE);
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    // need to copy reqKeys because it is not sortable.
+    reqKeys = new ArrayList<>(reqKeys);
+    List<Integer> leftKeys = immutableIntListToList(joinInfo.leftKeys, 0);
+    List<Integer> rightKeys = immutableIntListToList(joinInfo.rightKeys,
+        left.getRowType().getFieldCount());
+    List<Integer> rightKeysNotShifted = immutableIntListToList(joinInfo.rightKeys, 0);
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
         .shift(left.getRowType().getFieldCount());
 
     Map<Integer, Integer> keyMap = new HashMap<>();
-    final int keyCount = leftKeySet.cardinality();
+    final int keyCount = leftKeys.size();
     for (int i = 0; i < keyCount; i++) {
       keyMap.put(joinInfo.leftKeys.get(i), joinInfo.rightKeys.get(i));
     }
     Mappings.TargetMapping mapping = Mappings.target(keyMap,
         left.getRowType().getFieldCount(),
         right.getRowType().getFieldCount());
 
-    // Only consider exact key match for now
+
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       RelCollation rightCollation = RexUtil.apply(mapping, collation);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (isSubset(reqKeys, leftKeys)) {
+      // if sort keys are subset of left join keys, we can extend collations to make sure all join
+      // keys are sorted.
+      collation = extendCollation(collation, leftKeys);
+      RelCollation rightCollation = RexUtil.apply(mapping, collation);
+      return Pair.of(
+          required, ImmutableList.of(required.replace(collation),
+              required.replace(rightCollation)));
+    } else if (isPrefixOrderingNotRequired(leftKeys, reqKeys)

Review comment:
       You can use RelCollations.containsOrderless




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439791684



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +137,83 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
+    List<Integer> rightKeysNotShifted = joinInfo.rightKeys.toIntegerList();
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
-        .shift(left.getRowType().getFieldCount());
+        .shift(leftInputFieldCount);
 
-    // Only consider exact key match for now
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       Mappings.TargetMapping mapping = buildMapping(true);
       RelCollation rightCollation = collation.apply(mapping);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (containsOrderless(collation, leftKeys)) {

Review comment:
       You might not notice, I have changed the code of `containsOrderless` and added this
   ```
       if (colKeys.size() < distinctKeys.size()) {
         return containsOrderlessSubset(collation.getKeys().toIntegerList(), distinctKeys);
       } else {
   ```
   So firstly, collations contains fields less than join keys is allowed. Then, it is only legal if collations follows the ordering of sort keys (and later at runtime, key selector and key comparator are constructed by sort key ordering).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439784121



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);

Review comment:
       No for subset cases. For example, required collation [foo.a, foo.b] defined on `foo.a=bar.a and foo.b=bar.b and foo.c=bar.c and foo.d=bar.d`, either pass down [foo.a, foo.b, foo.c, foo.d] or pass down [foo.a, foo.b, foo.d, foo.c] will give correct answer (same to push the same collations to right join input).
   
   It is because, for MergeJoin implementation, the pointers moves when left tuple not equals to right tuple, and the only requirement is next tuple should be bigger than previous ones.
   
   Still use example above:
   left join input and right join input both are sorted by [a, b, d, c], and right join input is sorted by [a, b]. So 
   1) `left_tuple.compare(right_tuple) < 0`, move left pointer, of course next tuple will be both bigger than a) the previous left tuple b) the current right tuple (due to they are all sorted by [a, b]).
   2) same for ``left_tuple.compare(right_tuple) > 0` case.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r473192513



##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -491,6 +491,81 @@ EnumerableMergeJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testSortMergeJoinSubsetKey">
+        <Resource name="sql">
+            <![CDATA[select * from
+        sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job
+        order by r.job desc nulls last]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], dir0=[DESC-nulls-last])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableMergeJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+  EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC])

Review comment:
       Ack! Indeed it is. 
   
   We have created a JIRA: https://issues.apache.org/jira/browse/CALCITE-4010. After this PR is done, I will ask Feng whether he still has time to work on CALCITE-4010. If not I can take it over to update the EnumerableMergeJoin implementation. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r473040182



##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -491,6 +491,81 @@ EnumerableMergeJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testSortMergeJoinSubsetKey">
+        <Resource name="sql">
+            <![CDATA[select * from
+        sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job
+        order by r.job desc nulls last]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], dir0=[DESC-nulls-last])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableMergeJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+  EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC])

Review comment:
       Warning: the current implementation of EnumerableMergeJoin only supports keys sorted in ascending order nulls last, this will be a limitation if we ever want to actually run a plan like this one.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r437126855



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -101,28 +103,55 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getTrait(RelCollationTraitDef.INSTANCE);
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    // need to copy reqKeys because it is not sortable.
+    reqKeys = new ArrayList<>(reqKeys);

Review comment:
       `isPrefixOrderingNotRequired` sorts its input, and reqKeys is used as this function's parameter below.
   
   But since you have suggested `RelCollations.containsOrderless`, I might not need this anymore.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r438534299



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -101,28 +103,55 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getTrait(RelCollationTraitDef.INSTANCE);
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    // need to copy reqKeys because it is not sortable.
+    reqKeys = new ArrayList<>(reqKeys);
+    List<Integer> leftKeys = immutableIntListToList(joinInfo.leftKeys, 0);
+    List<Integer> rightKeys = immutableIntListToList(joinInfo.rightKeys,
+        left.getRowType().getFieldCount());
+    List<Integer> rightKeysNotShifted = immutableIntListToList(joinInfo.rightKeys, 0);
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
         .shift(left.getRowType().getFieldCount());
 
     Map<Integer, Integer> keyMap = new HashMap<>();
-    final int keyCount = leftKeySet.cardinality();
+    final int keyCount = leftKeys.size();
     for (int i = 0; i < keyCount; i++) {
       keyMap.put(joinInfo.leftKeys.get(i), joinInfo.rightKeys.get(i));
     }
     Mappings.TargetMapping mapping = Mappings.target(keyMap,
         left.getRowType().getFieldCount(),
         right.getRowType().getFieldCount());
 
-    // Only consider exact key match for now
+
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       RelCollation rightCollation = RexUtil.apply(mapping, collation);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (isSubset(reqKeys, leftKeys)) {
+      // if sort keys are subset of left join keys, we can extend collations to make sure all join
+      // keys are sorted.
+      collation = extendCollation(collation, leftKeys);
+      RelCollation rightCollation = RexUtil.apply(mapping, collation);
+      return Pair.of(
+          required, ImmutableList.of(required.replace(collation),
+              required.replace(rightCollation)));
+    } else if (isPrefixOrderingNotRequired(leftKeys, reqKeys)

Review comment:
       `containsOrderless` only consider if one is another's subset, for example:
   [1, 2, 3] is [2, 3, 4, 1]'s subset
   [3, 2, 1] is [2, 1, 3, 4] 's subset
   
   The prefix requirement is, still use examples above, but apply sublist:
   [2,3,4,1] -> [2,3,4], then containsOrderless([1,2,3]) = false.
   [2,1,3,4] ->[2,1,3], then containsOrderless([3,2,1]) = true.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r473338842



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +138,82 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
-        .shift(left.getRowType().getFieldCount());
+        .shift(leftInputFieldCount);
 
-    // Only consider exact key match for now
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       Mappings.TargetMapping mapping = buildMapping(true);
       RelCollation rightCollation = collation.apply(mapping);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (containsOrderless(leftKeys, collation)) {

Review comment:
       Added a summary.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-654598360


   @hsyuan 
   
   As people has started to discuss 1.24.0 release, do you think if we could make this change in 1.24.0?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r475767057



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);
+    for (RelFieldCollation rf : collation.getFieldCollations()) {
+      keySet.remove(rf.getFieldIndex());
+    }
+    for (Integer i : keySet) {
+      fieldsForNewCollation.add(new RelFieldCollation(i));
+    }
+    return RelCollations.of(fieldsForNewCollation);
+  }
+
+  private RelCollation removeCollationFieldsNotOnJoinKey(

Review comment:
       `intersectCollecationAndJoinKey` is a reasonable name for me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r473193678



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -134,31 +138,82 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
       final RelTraitSet required) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getCollation();
+    int leftInputFieldCount = left.getRowType().getFieldCount();
+
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    List<Integer> leftKeys = joinInfo.leftKeys.toIntegerList();
+    List<Integer> rightKeys =
+        joinInfo.rightKeys.incr(leftInputFieldCount).toIntegerList();
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
-        .shift(left.getRowType().getFieldCount());
+        .shift(leftInputFieldCount);
 
-    // Only consider exact key match for now
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       Mappings.TargetMapping mapping = buildMapping(true);
       RelCollation rightCollation = collation.apply(mapping);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (containsOrderless(leftKeys, collation)) {

Review comment:
       Got it. Will give a summary as java doc.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
hsyuan commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r437685863



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -131,8 +160,35 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
           required, ImmutableList.of(
           required.replace(leftCollation),
           required.replace(rightCollation)));
+    } else if (isSubset(reqKeys, rightKeys)) {
+      // if sort keys are subset of right join keys, we can extend collations to make sure all join
+      // keys are sorted.
+      collation = extendCollation(collation, rightKeys);
+      RelCollation rightCollation = RelCollations.shift(collation,
+          -left.getRowType().getFieldCount());
+      Mappings.TargetMapping invMapping = mapping.inverse();
+      RelCollation leftCollation = RexUtil.apply(invMapping, rightCollation);
+      return Pair.of(
+          required, ImmutableList.of(
+              required.replace(leftCollation),
+              required.replace(rightCollation)));
+    } else if (isPrefixOrderingNotRequired(rightKeys, reqKeys)
+        && allElementsGe(reqKeys, getLeft().getRowType().getFieldCount())) {

Review comment:
       can we use `reqKeys.stream().allMatch()`?

##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -131,8 +160,35 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
           required, ImmutableList.of(
           required.replace(leftCollation),
           required.replace(rightCollation)));
+    } else if (isSubset(reqKeys, rightKeys)) {
+      // if sort keys are subset of right join keys, we can extend collations to make sure all join
+      // keys are sorted.
+      collation = extendCollation(collation, rightKeys);
+      RelCollation rightCollation = RelCollations.shift(collation,
+          -left.getRowType().getFieldCount());
+      Mappings.TargetMapping invMapping = mapping.inverse();
+      RelCollation leftCollation = RexUtil.apply(invMapping, rightCollation);
+      return Pair.of(
+          required, ImmutableList.of(
+              required.replace(leftCollation),
+              required.replace(rightCollation)));
+    } else if (isPrefixOrderingNotRequired(rightKeys, reqKeys)
+        && allElementsGe(reqKeys, getLeft().getRowType().getFieldCount())) {
+      // if sort keys are superset of right join keys, and right join keys is prefix of sort keys
+      // (order not matter), also sort keys are all from right join input.
+      RelCollation rightCollation = RelCollations.shift(collation,
+          -left.getRowType().getFieldCount());

Review comment:
       let's use a var to store `left.getRowType().getFieldCount()`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r472863817



##########
File path: core/src/main/java/org/apache/calcite/rel/RelCollations.java
##########
@@ -199,25 +199,45 @@ public static boolean contains(List<RelCollation> collations,
    * @param keys List of keys
    * @return Whether the collection contains the given keys
    */
-  private static boolean containsOrderless(RelCollation collation,
+  public static boolean containsOrderless(RelCollation collation,
       List<Integer> keys) {
     final List<Integer> distinctKeys = Util.distinctList(keys);
     final ImmutableBitSet keysBitSet = ImmutableBitSet.of(distinctKeys);
     List<Integer> colKeys = Util.distinctList(collation.getKeys());
+
     if (colKeys.size() < distinctKeys.size()) {
       return false;
+    } else {
+      ImmutableBitSet bitset = ImmutableBitSet.of(
+          colKeys.subList(0, distinctKeys.size()));
+      return bitset.equals(keysBitSet);
+    }
+  }
+
+  /** Returns whether a collation is contained by a given list of keys regardless ordering.
+   *
+   * @param collation Collation
+   * @param keys List of keys
+   * @return Whether the collection contains the given keys
+   */
+  public static boolean containsOrderless(
+      List<Integer> keys, RelCollation collation) {
+    final List<Integer> distinctKeys = Util.distinctList(keys);
+    List<Integer> colKeys = Util.distinctList(collation.getKeys());
+
+    if (colKeys.size() > distinctKeys.size()) {
+      return false;
+    } else {
+      return colKeys.stream().allMatch(i -> distinctKeys.contains(i));
     }
-    ImmutableBitSet bitset = ImmutableBitSet.of(
-        colKeys.subList(0, distinctKeys.size()));
-    return bitset.equals(keysBitSet);
   }
 
   /**
    * Returns whether one of a list of collations contains the given list of keys
    * regardless the order.
    */
-  public static boolean containsOrderless(List<RelCollation> collations,
-      List<Integer> keys) {
+  public static boolean collationsContainKeysOrderless(

Review comment:
       I would name this method (and the following one) just `containsOrderless`, i.e. provide 4 overloaded methods with that name.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] rubenada commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
rubenada commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-683266172


   I agree @amaliujia , let's do that


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r437890122



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -101,28 +103,55 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     // Required collation keys can be subset or superset of merge join keys.
     RelCollation collation = required.getTrait(RelCollationTraitDef.INSTANCE);
     List<Integer> reqKeys = RelCollations.ordinals(collation);
-    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
+    // need to copy reqKeys because it is not sortable.
+    reqKeys = new ArrayList<>(reqKeys);
+    List<Integer> leftKeys = immutableIntListToList(joinInfo.leftKeys, 0);
+    List<Integer> rightKeys = immutableIntListToList(joinInfo.rightKeys,
+        left.getRowType().getFieldCount());
+    List<Integer> rightKeysNotShifted = immutableIntListToList(joinInfo.rightKeys, 0);
 
+    ImmutableBitSet reqKeySet = ImmutableBitSet.of(reqKeys);
     ImmutableBitSet leftKeySet = ImmutableBitSet.of(joinInfo.leftKeys);
     ImmutableBitSet rightKeySet = ImmutableBitSet.of(joinInfo.rightKeys)
         .shift(left.getRowType().getFieldCount());
 
     Map<Integer, Integer> keyMap = new HashMap<>();
-    final int keyCount = leftKeySet.cardinality();
+    final int keyCount = leftKeys.size();
     for (int i = 0; i < keyCount; i++) {
       keyMap.put(joinInfo.leftKeys.get(i), joinInfo.rightKeys.get(i));
     }
     Mappings.TargetMapping mapping = Mappings.target(keyMap,
         left.getRowType().getFieldCount(),
         right.getRowType().getFieldCount());
 
-    // Only consider exact key match for now
+
     if (reqKeySet.equals(leftKeySet)) {
+      // if sort keys equal to left join keys, we can pass through all collations directly.
       RelCollation rightCollation = RexUtil.apply(mapping, collation);
       return Pair.of(
           required, ImmutableList.of(required,
           required.replace(rightCollation)));
+    } else if (isSubset(reqKeys, leftKeys)) {
+      // if sort keys are subset of left join keys, we can extend collations to make sure all join
+      // keys are sorted.
+      collation = extendCollation(collation, leftKeys);
+      RelCollation rightCollation = RexUtil.apply(mapping, collation);
+      return Pair.of(
+          required, ImmutableList.of(required.replace(collation),
+              required.replace(rightCollation)));
+    } else if (isPrefixOrderingNotRequired(leftKeys, reqKeys)

Review comment:
       For supset case, for example if required collation is [d, b, a ,c]
   then join keys [d, b, a] is valid 
   join keys [d,a,c] is not (if push down [d,b,a,c], it does not guarantee the ordering of [d,a,c]




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on a change in pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#discussion_r439791727



##########
File path: core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java
##########
@@ -222,6 +277,30 @@ public static boolean isMergeJoinSupported(JoinRelType joinType) {
     return mapping;
   }
 
+  private RelCollation extendCollation(RelCollation collation, List<Integer> keys) {
+    List<RelFieldCollation> fieldsForNewCollation = new ArrayList<>(keys.size());
+    fieldsForNewCollation.addAll(collation.getFieldCollations());
+    Set<Integer> keySet = new HashSet<>(keys);

Review comment:
       Will check how to use `bitset.except`. 
   
   Meanwhile, in fact, the ordering does matter for EnumerableMergeJoin, it is because key selector and key comparator are constructed by join key ordering.
   
   Thus we have to follow sort key ordering to extend required collations.  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-641756134


   Rebased and comments addressed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] amaliujia commented on pull request #2006: [CALCITE-4015] Pass through parent collation request on subset or sup…

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2006:
URL: https://github.com/apache/calcite/pull/2006#issuecomment-679262806


   Ok I should have addressed existing comments. Thanks @rubenada for the review!
   
   @hsyuan do you want to take a final look? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org