You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/09 18:29:13 UTC

[GitHub] [druid] jihoonson commented on a change in pull request #9648: SQL: More straightforward handling of join planning.

jihoonson commented on a change in pull request #9648: SQL: More straightforward handling of join planning.
URL: https://github.com/apache/druid/pull/9648#discussion_r406396533
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidJoinRule.java
 ##########
 @@ -148,4 +246,148 @@ static boolean canHandleCondition(final RexNode condition, final RelDataType lef
 
     return retVal;
   }
+
+  private static boolean isLeftExpression(final RexNode rexNode, final int numLeftFields)
+  {
+    return ImmutableBitSet.range(numLeftFields).contains(RelOptUtil.InputFinder.bits(rexNode));
+  }
+
+  private static boolean isRightInputRef(final RexNode rexNode, final int numLeftFields)
+  {
+    return rexNode.isA(SqlKind.INPUT_REF) && ((RexInputRef) rexNode).getIndex() >= numLeftFields;
+  }
+
+  @VisibleForTesting
+  static class ConditionAnalysis
+  {
+    /**
+     * Number of fields on the left-hand side. Useful for identifying if a particular field is from on the left
+     * or right side of a join.
+     */
+    private final int numLeftFields;
+
+    /**
+     * Each equality subcondition is an equality of the form f(LeftRel) = g(RightRel).
+     */
+    private final List<Pair<RexNode, RexInputRef>> equalitySubConditions;
+
+    /**
+     * Each literal subcondition is... a literal.
+     */
+    private final List<RexLiteral> literalSubConditions;
+
+    ConditionAnalysis(
+        int numLeftFields,
+        List<Pair<RexNode, RexInputRef>> equalitySubConditions,
+        List<RexLiteral> literalSubConditions
+    )
+    {
+      this.numLeftFields = numLeftFields;
+      this.equalitySubConditions = equalitySubConditions;
+      this.literalSubConditions = literalSubConditions;
+    }
+
+    public ConditionAnalysis pushThroughLeftProject(final Project leftProject)
+    {
+      // Pushing through the project will shift right-hand field references by this amount.
+      final int rhsShift =
+          leftProject.getInput().getRowType().getFieldCount() - leftProject.getRowType().getFieldCount();
+
+      return new ConditionAnalysis(
+          leftProject.getInput().getRowType().getFieldCount(),
+          equalitySubConditions
+              .stream()
 
 Review comment:
   I have some benchmark results for stream ([StreamBenchmark](https://github.com/jihoonson/druid/blob/stream-benchmark/benchmarks/src/test/java/org/apache/druid/StreamBenchmark.java) and [HighlyNestedStreamBenchmark](https://github.com/jihoonson/druid/blob/stream-benchmark/benchmarks/src/test/java/org/apache/druid/HighlyNestedStreamBenchmark.java)).
   
   ```Benchmark                                  Mode  Cnt   Score   Error  Units
   StreamBenchmark.flatMapIterator            avgt   10   9.584 ± 0.055  ms/op
   StreamBenchmark.flatMapSequenceAccumulate  avgt   10   5.908 ± 0.041  ms/op
   StreamBenchmark.flatMapSequenceYielder     avgt   10  22.976 ± 0.064  ms/op
   StreamBenchmark.flatMapStream              avgt   10   6.249 ± 0.050  ms/op
   StreamBenchmark.fluentIteratorToList       avgt   10  18.455 ± 0.254  ms/op
   StreamBenchmark.sequenceToList             avgt   10  14.326 ± 0.662  ms/op
   StreamBenchmark.streamToList               avgt   10  17.720 ± 0.173  ms/op
   StreamBenchmark.sumIterator                avgt   10   2.648 ± 0.023  ms/op  <-- native for loop
   StreamBenchmark.sumIteratorFlatMap         avgt   10   7.270 ± 0.095  ms/op
   StreamBenchmark.sumSequence                avgt   10   6.654 ± 0.137  ms/op
   StreamBenchmark.sumStream                  avgt   10   2.162 ± 0.048  ms/op
   ```
   
   ```
   Benchmark                                              Mode  Cnt     Score    Error  Units
   HighlyNestedStreamBenchmark.flatMapIterator            avgt   10  2413.571 ± 15.151  ms/op
   HighlyNestedStreamBenchmark.flatMapSequenceAccumulate  avgt   10  1157.817 ± 13.575  ms/op
   HighlyNestedStreamBenchmark.flatMapSequenceYielder     avgt   10  6308.657 ± 28.802  ms/op
   HighlyNestedStreamBenchmark.flatMapStream              avgt   10   953.539 ±  6.755  ms/op
   HighlyNestedStreamBenchmark.sumIteratorFlatMap         avgt   10  2129.499 ± 20.541  ms/op
   HighlyNestedStreamBenchmark.sumNestedFor               avgt   10   297.307 ±  0.626  ms/op <-- native for loop
   HighlyNestedStreamBenchmark.sumSequence                avgt   10  1503.816 ±  5.720  ms/op
   HighlyNestedStreamBenchmark.sumStream                  avgt   10  1136.636 ± 12.353  ms/op
   ```
   
   I did notice that stream is sometimes bad and these benchmarks were to see what makes it bad. Unfortunately, these benchmarks show that stream is pretty good for computing a sum, it can compete with even native for loop when the stream is not highly nested. For highly nested stream, it's worse than native for loop, but still best among others. For `toList`, stream seems worse than others which I'm not sure how much it matters.
   
   I'm suspecting that stream could be very efficient in these benchmarks because the benchmark code could be easily inlined with stream. I was planning to do another benchmark with a `map` function which is complex enough to hinder inlining, but haven't done yet.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org