You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/06/04 10:33:43 UTC

[GitHub] [druid] maytasm opened a new pull request #9986: Fix groupBy with literal in subquery grouping

maytasm opened a new pull request #9986:
URL: https://github.com/apache/druid/pull/9986


   Fix groupBy with literal in subquery grouping
   
   ### Description
   
   I believe this is a regression/bug caused by https://github.com/apache/druid/pull/9122
   When we have a query such as:
   ```
   SELECT 
   	t1, t2
   FROM
   	( SELECT
   		'm' as t1,
   		CASE
   	  	WHEN 
   	  		cityName = 'Egypt'
   	  	THEN cityName
         	ELSE NULL
       END AS t2
   	  FROM
   		wikipedia
   	  GROUP BY
   		cityName
   	)
   GROUP BY
   	t1,t2
   ```
   Calcite will creates a Rel `2020-06-04T08:59:34,977 TRACE [sql[0840a1a8-ff59-4986-ad69-be2ce27a5e68]] org.apache.calcite.plan.RelOptPlanner - Register rel#5424:DruidQueryRel.NONE.[](query={"queryType":"groupBy","dataSource":{"type":"table","name":"wikipedia"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns":[],"filter":null,"granularity":{"type":"all"},"dimensions":[{"type":"default","dimension":"cityName","outputName":"d0","outputType":"STRING"}],"aggregations":[],"postAggregations":[],"having":null,"limitSpec":{"type":"NoopLimitSpec"},"context":{"populateCache":false,"sqlQueryId":"0840a1a8-ff59-4986-ad69-be2ce27a5e68","useCache":false},"descending":false},signature={d0:STRING}) in rel#5415:Subset#2.NONE.[]` for part of the subquery. This Rel signature is only a single field. The other field which is a literal is not part of the groupBy and is part of the projection on top of this. 
   
   Now the problem is when we try to build new Rel, in computeSubtotals with the above Rel as the sourceRel and a partialQuery with aggregate looking like `aggregate=rel#5418:LogicalAggregate.NONE.[](input=RelSubset#5417,group={0, 1}), havingFilter=null, aggregateProject=null, sort=null, sortProject=null}`. This aggregate has two bits in the groupSet. This fails without taking into account the projection mentioned above. Hence, we should skip and not build this new Rel and handle this case with CannotBuildQueryException so that we can move on to other Rels.
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on pull request #9986: Fix groupBy with literal in subquery grouping

Posted by GitBox <gi...@apache.org>.
maytasm commented on pull request #9986:
URL: https://github.com/apache/druid/pull/9986#issuecomment-639102247


   @gianm 
   Thank you for your review.
   Initially, I thought that was a valid case where we can have positions in aggregate.getGroupSet() not matching fields in the rowSignature. But as you pointed out I think that is reasonable too and there is a bug in how we determine the rowSignature. You mentioned that rowSignature comes from computeOutputRowSignature(sourceRowSignature, selectProjection, null, null), which is the source data plus any pre-aggregation projections (selectProjection). However, selectProjection in computeOutputRowSignature(sourceRowSignature, selectProjection, null, null) will always be null. So rowSignature that is pass to computeGrouping for computeGrouping is only source data excludes any pre-aggregation projections (selectProjection). I did see that other methods in computeGrouping such as computeDimensions and computeAggregations respect selectProjection by getting the selectProjection from partialDruidQuery. I think computeSubtotals should also respect selectProjection in partialDruidQuery if  selectProjection exists. 
   
   I have modify computeSubtotals to get the fields from selectProjection inside partialDruidQuery and ignore the original rowSignature passed into the method. Getting the fieldCount from selectProjection inside partialDruidQuery is done similar to Projection.preAggregation (in computeSelectProjection), this should make sure that each projection is valid. 
   
   Please let me know what you think. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9986: Fix groupBy with literal in subquery grouping

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9986:
URL: https://github.com/apache/druid/pull/9986#discussion_r435574098



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -424,6 +424,8 @@ private static Grouping computeGrouping(
 
   /**
    * Builds a {@link Subtotals} object based on {@link Aggregate#getGroupSets()}.
+   *
+   * @throws CannotBuildQueryException if subtotals cannot be computed

Review comment:
       stale javadoc. remove?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9986: Fix groupBy with literal in subquery grouping

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9986:
URL: https://github.com/apache/druid/pull/9986#discussion_r435539109



##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -13419,6 +13420,73 @@ public void testNvlColumns() throws Exception
     );
   }
 
+  @Test
+  public void testGroupByWithLiteralInSubqueryGrouping() throws Exception
+  {
+    testQuery(
+        "SELECT \n"
+        + "   t1, t2\n"
+        + "  FROM\n"
+        + "   ( SELECT\n"
+        + "     'dummy' as t1,\n"
+        + "     CASE\n"
+        + "       WHEN \n"
+        + "         dim4 = 'b'\n"
+        + "       THEN dim4\n"
+        + "       ELSE NULL\n"
+        + "     END AS t2\n"
+        + "     FROM\n"
+        + "       numfoo\n"
+        + "     GROUP BY\n"
+        + "       dim4\n"
+        + "   )\n"
+        + " GROUP BY\n"
+        + "   t1,t2\n",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(
+                            GroupByQuery.builder()
+                                        .setDataSource(CalciteTests.DATASOURCE3)
+                                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                                        .setGranularity(Granularities.ALL)
+                                        .setDimensions(new DefaultDimensionSpec("dim4", "_d0", ValueType.STRING))
+                                        .setContext(QUERY_CONTEXT_DEFAULT)
+                                        .build()
+                        )
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "\'dummy\'",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "case_searched((\"_d0\" == 'b'),\"_d0\",null)",
+                                ValueType.STRING
+                            )
+                        )                        .setInterval(querySegmentSpec(Filtration.eternity()))

Review comment:
       Oops. Fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm merged pull request #9986: Fix groupBy with literal in subquery grouping

Posted by GitBox <gi...@apache.org>.
maytasm merged pull request #9986:
URL: https://github.com/apache/druid/pull/9986


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #9986: Fix groupBy with literal in subquery grouping

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #9986:
URL: https://github.com/apache/druid/pull/9986#discussion_r435531102



##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -13419,6 +13420,73 @@ public void testNvlColumns() throws Exception
     );
   }
 
+  @Test
+  public void testGroupByWithLiteralInSubqueryGrouping() throws Exception
+  {
+    testQuery(
+        "SELECT \n"
+        + "   t1, t2\n"
+        + "  FROM\n"
+        + "   ( SELECT\n"
+        + "     'dummy' as t1,\n"
+        + "     CASE\n"
+        + "       WHEN \n"
+        + "         dim4 = 'b'\n"
+        + "       THEN dim4\n"
+        + "       ELSE NULL\n"
+        + "     END AS t2\n"
+        + "     FROM\n"
+        + "       numfoo\n"
+        + "     GROUP BY\n"
+        + "       dim4\n"
+        + "   )\n"
+        + " GROUP BY\n"
+        + "   t1,t2\n",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(
+                            GroupByQuery.builder()
+                                        .setDataSource(CalciteTests.DATASOURCE3)
+                                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                                        .setGranularity(Granularities.ALL)
+                                        .setDimensions(new DefaultDimensionSpec("dim4", "_d0", ValueType.STRING))
+                                        .setContext(QUERY_CONTEXT_DEFAULT)
+                                        .build()
+                        )
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "\'dummy\'",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "case_searched((\"_d0\" == 'b'),\"_d0\",null)",
+                                ValueType.STRING
+                            )
+                        )                        .setInterval(querySegmentSpec(Filtration.eternity()))

Review comment:
       This indentation is kind of weird. I'm not sure if checkstyle will flag it, but watch out.

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -13419,6 +13420,73 @@ public void testNvlColumns() throws Exception
     );
   }
 
+  @Test
+  public void testGroupByWithLiteralInSubqueryGrouping() throws Exception
+  {
+    testQuery(
+        "SELECT \n"
+        + "   t1, t2\n"
+        + "  FROM\n"
+        + "   ( SELECT\n"
+        + "     'dummy' as t1,\n"
+        + "     CASE\n"
+        + "       WHEN \n"
+        + "         dim4 = 'b'\n"
+        + "       THEN dim4\n"
+        + "       ELSE NULL\n"
+        + "     END AS t2\n"
+        + "     FROM\n"
+        + "       numfoo\n"
+        + "     GROUP BY\n"
+        + "       dim4\n"
+        + "   )\n"
+        + " GROUP BY\n"
+        + "   t1,t2\n",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(
+                            GroupByQuery.builder()
+                                        .setDataSource(CalciteTests.DATASOURCE3)
+                                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                                        .setGranularity(Granularities.ALL)
+                                        .setDimensions(new DefaultDimensionSpec("dim4", "_d0", ValueType.STRING))
+                                        .setContext(QUERY_CONTEXT_DEFAULT)
+                                        .build()
+                        )
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "\'dummy\'",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "case_searched((\"_d0\" == 'b'),\"_d0\",null)",
+                                ValueType.STRING
+                            )
+                        )                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "d0", ValueType.STRING),
+                                new DefaultDimensionSpec("v1", "d1", ValueType.STRING)
+                            )
+                        )
+                        .setGranularity(Granularities.ALL)
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        NullHandling.replaceWithDefault() ?
+        ImmutableList.of(
+            new Object[]{"dummy", ""},
+            new Object[]{"dummy", "b"}
+        ) :
+        ImmutableList.of(
+            new Object[]{"dummy", null},

Review comment:
       FYI: You could simplify this by using `NULL_STRING` for the null / "" and avoiding the ternary.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -424,16 +425,39 @@ private static Grouping computeGrouping(
 
   /**
    * Builds a {@link Subtotals} object based on {@link Aggregate#getGroupSets()}.
+   *
+   * @throws CannotBuildQueryException if subtotals cannot be computed
    */
   private static Subtotals computeSubtotals(
       final PartialDruidQuery partialQuery,
+      final PlannerContext plannerContext,
       final RowSignature rowSignature
   )
   {
     final Aggregate aggregate = partialQuery.getAggregate();
 
     // dimBitMapping maps from input field position to group set position (dimension number).
-    final int[] dimBitMapping = new int[rowSignature.size()];
+    final int[] dimBitMapping;
+    if (partialQuery.getSelectProject() != null) {
+      int fieldCount = 0;
+      for (final RexNode rexNode : partialQuery.getSelectProject().getChildExps()) {
+        final DruidExpression expression = Expressions.toDruidExpression(
+            plannerContext,
+            rowSignature,
+            rexNode
+        );
+
+        if (expression == null) {
+          throw new CannotBuildQueryException(partialQuery.getSelectProject(), rexNode);

Review comment:
       I think there's no need to do this check, because it isn't needed to build the Subtotals object, and `computeDimensions` will handle the case where we're grouping on something where dimensions are based on expressions that are not translatable.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9986: Fix groupBy with literal in subquery grouping

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9986:
URL: https://github.com/apache/druid/pull/9986#discussion_r435538248



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -424,16 +425,39 @@ private static Grouping computeGrouping(
 
   /**
    * Builds a {@link Subtotals} object based on {@link Aggregate#getGroupSets()}.
+   *
+   * @throws CannotBuildQueryException if subtotals cannot be computed
    */
   private static Subtotals computeSubtotals(
       final PartialDruidQuery partialQuery,
+      final PlannerContext plannerContext,
       final RowSignature rowSignature
   )
   {
     final Aggregate aggregate = partialQuery.getAggregate();
 
     // dimBitMapping maps from input field position to group set position (dimension number).
-    final int[] dimBitMapping = new int[rowSignature.size()];
+    final int[] dimBitMapping;
+    if (partialQuery.getSelectProject() != null) {
+      int fieldCount = 0;
+      for (final RexNode rexNode : partialQuery.getSelectProject().getChildExps()) {
+        final DruidExpression expression = Expressions.toDruidExpression(
+            plannerContext,
+            rowSignature,
+            rexNode
+        );
+
+        if (expression == null) {
+          throw new CannotBuildQueryException(partialQuery.getSelectProject(), rexNode);

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #9986: Fix groupBy with literal in subquery grouping

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #9986:
URL: https://github.com/apache/druid/pull/9986#issuecomment-638905062


   This seems like a logic error somewhere. Throwing a CannotBuildQueryException abandons the current partial query and, if there's another path available, will cause that other path to be gone down instead. This doesn't seem like an ideal solution. There might be some queries where there is no alternate path. The logic error might cause other problems besides exceptions. So it would be better to identify and correct the logic error.
   
   The loop you modified is assuming that the positions in `aggregate.getGroupSet()` are all referring to fields in the `rowSignature` (`dimBitMapping` was initialized to `rowSignature.size()`). This sounds reasonable to me, because the group set is all the input fields that we are grouping on, and the `rowSignature` is the input row signature. It comes from `computeOutputRowSignature(sourceRowSignature, selectProjection, null, null)`, which is the source data plus any pre-aggregation projections (`selectProjection`).
   
   But there must be something wrong somewhere if we're getting dimBits that are lying outside the `rowSignature`. The error might be in `computeSubtotals` or perhaps in whatever created the `partialQuery` and `rowSignature` that it is being given. Maybe there is something, somewhere, that is eliminating literals, and reflecting that in `rowSignature` but not in `partialQuery.getAggregate()`. If I was you I would look for this kind of thing first.
   
   Please look a little deeper and see if you can find the place that the logic breaks down.
   
   Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9986: Fix groupBy with literal in subquery grouping

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9986:
URL: https://github.com/apache/druid/pull/9986#discussion_r435578056



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -424,6 +424,8 @@ private static Grouping computeGrouping(
 
   /**
    * Builds a {@link Subtotals} object based on {@link Aggregate#getGroupSets()}.
+   *
+   * @throws CannotBuildQueryException if subtotals cannot be computed

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9986: Fix groupBy with literal in subquery grouping

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9986:
URL: https://github.com/apache/druid/pull/9986#discussion_r435538934



##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -13419,6 +13420,73 @@ public void testNvlColumns() throws Exception
     );
   }
 
+  @Test
+  public void testGroupByWithLiteralInSubqueryGrouping() throws Exception
+  {
+    testQuery(
+        "SELECT \n"
+        + "   t1, t2\n"
+        + "  FROM\n"
+        + "   ( SELECT\n"
+        + "     'dummy' as t1,\n"
+        + "     CASE\n"
+        + "       WHEN \n"
+        + "         dim4 = 'b'\n"
+        + "       THEN dim4\n"
+        + "       ELSE NULL\n"
+        + "     END AS t2\n"
+        + "     FROM\n"
+        + "       numfoo\n"
+        + "     GROUP BY\n"
+        + "       dim4\n"
+        + "   )\n"
+        + " GROUP BY\n"
+        + "   t1,t2\n",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(
+                            GroupByQuery.builder()
+                                        .setDataSource(CalciteTests.DATASOURCE3)
+                                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                                        .setGranularity(Granularities.ALL)
+                                        .setDimensions(new DefaultDimensionSpec("dim4", "_d0", ValueType.STRING))
+                                        .setContext(QUERY_CONTEXT_DEFAULT)
+                                        .build()
+                        )
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "\'dummy\'",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "case_searched((\"_d0\" == 'b'),\"_d0\",null)",
+                                ValueType.STRING
+                            )
+                        )                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "d0", ValueType.STRING),
+                                new DefaultDimensionSpec("v1", "d1", ValueType.STRING)
+                            )
+                        )
+                        .setGranularity(Granularities.ALL)
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        NullHandling.replaceWithDefault() ?
+        ImmutableList.of(
+            new Object[]{"dummy", ""},
+            new Object[]{"dummy", "b"}
+        ) :
+        ImmutableList.of(
+            new Object[]{"dummy", null},

Review comment:
       Nice!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org