You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/02/27 05:18:46 UTC

[GitHub] [druid] clintropolis opened a new pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression

clintropolis opened a new pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression
URL: https://github.com/apache/druid/pull/9429
 
 
   
   Fixes issue uncovered in #9412.
   
   ### Description
   As an optimization added in #6902, Druid SQL re-uses the same expression virtual column whenever possible when constructing a native query, to take advantage of things like the caching selectors, etc.
   
   These virtual columns are translated into `DimensionExpression` when taking part in a grouping, using the virtual column name as the output name and the input dimension on the resulting `DimensionSpec`. However, if 2 dimensions taking part in a grouping were using the same expression, the output signature would incorrectly duplicate the expression causing an error of the form:
   
   ```
   ...
   Caused by: org.apache.druid.java.util.common.ISE: Duplicate field name: v0
   	at org.apache.druid.sql.calcite.rel.Grouping.<init>(Grouping.java:71)
   	at org.apache.druid.sql.calcite.rel.Grouping.create(Grouping.java:100)
   	at org.apache.druid.sql.calcite.rel.DruidQuery.computeGrouping(DruidQuery.java:346)
   	at org.apache.druid.sql.calcite.rel.DruidQuery.fromPartialQuery(DruidQuery.java:193)
   	at org.apache.druid.sql.calcite.rel.PartialDruidQuery.build(PartialDruidQuery.java:306)
   ...
   ```
   
   This PR fixes this issue by checking for repeated usage of virtual columns in dimensions, and uses normal dimension output names rather than the virtual column name in case a duplicate is encountered. The added test and modified existing tests illustrate the effects of the new behavior.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added unit tests or modified existing tests to cover new code paths.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `DruidQuery`
    * `DimensionExpression`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression
URL: https://github.com/apache/druid/pull/9429#discussion_r385253790
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/DimensionExpression.java
 ##########
 @@ -28,6 +28,7 @@
 
 public class DimensionExpression
 {
+  private final String inputDimension;
 
 Review comment:
   How about `inputColumn`? (I think it's a bit weird to call columns in the table "dimensions" and would like to get away from that in new code.)
   
   It would also be good to include a comment about what the relationship is between `inputColumn` and `expression`, with some examples. It might get confusing to have both. Additionally: if the expression is a simple column reference, is it required that `expression.getDirectColumn().equals(inputColumn)`? If so it would be good to validate that in the constructor.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression
URL: https://github.com/apache/druid/pull/9429#discussion_r389201178
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/DimensionExpression.java
 ##########
 @@ -72,21 +107,23 @@ public boolean equals(final Object o)
       return false;
     }
     final DimensionExpression that = (DimensionExpression) o;
-    return Objects.equals(outputName, that.outputName) &&
+    return Objects.equals(virtualColumn, that.virtualColumn) &&
+           Objects.equals(outputName, that.outputName) &&
            Objects.equals(expression, that.expression) &&
            outputType == that.outputType;
   }
 
   @Override
   public int hashCode()
   {
-    return Objects.hash(outputName, expression, outputType);
+    return Objects.hash(virtualColumn, outputName, expression, outputType);
   }
 
   @Override
   public String toString()
   {
     return "DimensionExpression{" +
+           "virtualColumn='" + virtualColumn + '\'' +
            "outputName='" + outputName + '\'' +
 
 Review comment:
   Missing a `, ` here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression
URL: https://github.com/apache/druid/pull/9429
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression
URL: https://github.com/apache/druid/pull/9429#discussion_r385254202
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/DimensionExpression.java
 ##########
 @@ -38,11 +39,27 @@ public DimensionExpression(
       final ValueType outputType
   )
   {
+    this(outputName, outputName, expression, outputType);
+  }
+
+  public DimensionExpression(
 
 Review comment:
   It's not immediately clear when each constructor should be used. They need javadocs, or to be made private and replaced with nicely-named static factories, or (IMO ideally) both.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression
URL: https://github.com/apache/druid/pull/9429#discussion_r385425359
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/DimensionExpression.java
 ##########
 @@ -28,6 +28,7 @@
 
 public class DimensionExpression
 {
+  private final String inputDimension;
 
 Review comment:
   Heh, I initially had it named `inputColumn` but changed to `inputDimension` to match the `DimensionSpec` which it creates.
   
   However, since this field _actually_ is only used when creating a `DimensionSpec` when the `DimensionExpression` is for a virtual column, I have renamed the field to `virtualColumn`. When a `DimensionExpression` is a direct access or simple extraction, we use `expression.getSimpleExtraction().toDimensionSpec` to craft the `DimensionSpec` instead, so this field is ignored.
   
   To help clarify this, i collapsed the constructors into a single private constructor and made static methods `DimensionExpression.ofSimpleColumn` and `DimensionExpression.ofVirtualColumn` for creating the two types of `DimensionExpression` we have, as per your other comment, which I think helps clear up usage.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #9429: fix issue when distinct grouping dimensions are optimized into the same virtual column expression
URL: https://github.com/apache/druid/pull/9429#discussion_r389201201
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/DimensionExpression.java
 ##########
 @@ -28,21 +29,55 @@
 
 public class DimensionExpression
 {
+  /**
+   * Create a dimension expresison for direct column access or simple extractions
 
 Review comment:
   "expression" (spelling)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org