You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/06/04 10:33:43 UTC
[GitHub] [druid] maytasm opened a new pull request #9986: Fix groupBy with literal in subquery grouping
maytasm opened a new pull request #9986:
URL: https://github.com/apache/druid/pull/9986
Fix groupBy with literal in subquery grouping
### Description
I believe this is a regression/bug caused by https://github.com/apache/druid/pull/9122
When we have a query such as:
```
SELECT
t1, t2
FROM
( SELECT
'm' as t1,
CASE
WHEN
cityName = 'Egypt'
THEN cityName
ELSE NULL
END AS t2
FROM
wikipedia
GROUP BY
cityName
)
GROUP BY
t1,t2
```
Calcite will creates a Rel `2020-06-04T08:59:34,977 TRACE [sql[0840a1a8-ff59-4986-ad69-be2ce27a5e68]] org.apache.calcite.plan.RelOptPlanner - Register rel#5424:DruidQueryRel.NONE.[](query={"queryType":"groupBy","dataSource":{"type":"table","name":"wikipedia"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns":[],"filter":null,"granularity":{"type":"all"},"dimensions":[{"type":"default","dimension":"cityName","outputName":"d0","outputType":"STRING"}],"aggregations":[],"postAggregations":[],"having":null,"limitSpec":{"type":"NoopLimitSpec"},"context":{"populateCache":false,"sqlQueryId":"0840a1a8-ff59-4986-ad69-be2ce27a5e68","useCache":false},"descending":false},signature={d0:STRING}) in rel#5415:Subset#2.NONE.[]` for part of the subquery. This Rel signature is only a single field. The other field which is a literal is not part of the groupBy and is part of the projection on top of this.
Now the problem is when we try to build new Rel, in computeSubtotals with the above Rel as the sourceRel and a partialQuery with aggregate looking like `aggregate=rel#5418:LogicalAggregate.NONE.[](input=RelSubset#5417,group={0, 1}), havingFilter=null, aggregateProject=null, sort=null, sortProject=null}`. This aggregate has two bits in the groupSet. This fails without taking into account the projection mentioned above. Hence, we should skip and not build this new Rel and handle this case with CannotBuildQueryException so that we can move on to other Rels.
This PR has:
- [x] been self-reviewed.
- [ ] added documentation for new or modified features or behaviors.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
- [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths.
- [ ] added integration tests.
- [x] been tested in a test Druid cluster.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] maytasm commented on pull request #9986: Fix groupBy with literal in subquery grouping
Posted by GitBox <gi...@apache.org>.
maytasm commented on pull request #9986:
URL: https://github.com/apache/druid/pull/9986#issuecomment-639102247
@gianm
Thank you for your review.
Initially, I thought that was a valid case where we can have positions in aggregate.getGroupSet() not matching fields in the rowSignature. But as you pointed out I think that is reasonable too and there is a bug in how we determine the rowSignature. You mentioned that rowSignature comes from computeOutputRowSignature(sourceRowSignature, selectProjection, null, null), which is the source data plus any pre-aggregation projections (selectProjection). However, selectProjection in computeOutputRowSignature(sourceRowSignature, selectProjection, null, null) will always be null. So rowSignature that is pass to computeGrouping for computeGrouping is only source data excludes any pre-aggregation projections (selectProjection). I did see that other methods in computeGrouping such as computeDimensions and computeAggregations respect selectProjection by getting the selectProjection from partialDruidQuery. I think computeSubtotals should also respect selectProjection in partialDruidQuery if selectProjection exists.
I have modify computeSubtotals to get the fields from selectProjection inside partialDruidQuery and ignore the original rowSignature passed into the method. Getting the fieldCount from selectProjection inside partialDruidQuery is done similar to Projection.preAggregation (in computeSelectProjection), this should make sure that each projection is valid.
Please let me know what you think. Thanks!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] suneet-s commented on a change in pull request #9986: Fix groupBy with literal in subquery grouping
Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9986:
URL: https://github.com/apache/druid/pull/9986#discussion_r435574098
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -424,6 +424,8 @@ private static Grouping computeGrouping(
/**
* Builds a {@link Subtotals} object based on {@link Aggregate#getGroupSets()}.
+ *
+ * @throws CannotBuildQueryException if subtotals cannot be computed
Review comment:
stale javadoc. remove?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] maytasm commented on a change in pull request #9986: Fix groupBy with literal in subquery grouping
Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9986:
URL: https://github.com/apache/druid/pull/9986#discussion_r435539109
##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -13419,6 +13420,73 @@ public void testNvlColumns() throws Exception
);
}
+ @Test
+ public void testGroupByWithLiteralInSubqueryGrouping() throws Exception
+ {
+ testQuery(
+ "SELECT \n"
+ + " t1, t2\n"
+ + " FROM\n"
+ + " ( SELECT\n"
+ + " 'dummy' as t1,\n"
+ + " CASE\n"
+ + " WHEN \n"
+ + " dim4 = 'b'\n"
+ + " THEN dim4\n"
+ + " ELSE NULL\n"
+ + " END AS t2\n"
+ + " FROM\n"
+ + " numfoo\n"
+ + " GROUP BY\n"
+ + " dim4\n"
+ + " )\n"
+ + " GROUP BY\n"
+ + " t1,t2\n",
+ ImmutableList.of(
+ GroupByQuery.builder()
+ .setDataSource(
+ GroupByQuery.builder()
+ .setDataSource(CalciteTests.DATASOURCE3)
+ .setInterval(querySegmentSpec(Filtration.eternity()))
+ .setGranularity(Granularities.ALL)
+ .setDimensions(new DefaultDimensionSpec("dim4", "_d0", ValueType.STRING))
+ .setContext(QUERY_CONTEXT_DEFAULT)
+ .build()
+ )
+ .setVirtualColumns(
+ expressionVirtualColumn(
+ "v0",
+ "\'dummy\'",
+ ValueType.STRING
+ ),
+ expressionVirtualColumn(
+ "v1",
+ "case_searched((\"_d0\" == 'b'),\"_d0\",null)",
+ ValueType.STRING
+ )
+ ) .setInterval(querySegmentSpec(Filtration.eternity()))
Review comment:
Oops. Fixed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] maytasm merged pull request #9986: Fix groupBy with literal in subquery grouping
Posted by GitBox <gi...@apache.org>.
maytasm merged pull request #9986:
URL: https://github.com/apache/druid/pull/9986
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] gianm commented on a change in pull request #9986: Fix groupBy with literal in subquery grouping
Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #9986:
URL: https://github.com/apache/druid/pull/9986#discussion_r435531102
##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -13419,6 +13420,73 @@ public void testNvlColumns() throws Exception
);
}
+ @Test
+ public void testGroupByWithLiteralInSubqueryGrouping() throws Exception
+ {
+ testQuery(
+ "SELECT \n"
+ + " t1, t2\n"
+ + " FROM\n"
+ + " ( SELECT\n"
+ + " 'dummy' as t1,\n"
+ + " CASE\n"
+ + " WHEN \n"
+ + " dim4 = 'b'\n"
+ + " THEN dim4\n"
+ + " ELSE NULL\n"
+ + " END AS t2\n"
+ + " FROM\n"
+ + " numfoo\n"
+ + " GROUP BY\n"
+ + " dim4\n"
+ + " )\n"
+ + " GROUP BY\n"
+ + " t1,t2\n",
+ ImmutableList.of(
+ GroupByQuery.builder()
+ .setDataSource(
+ GroupByQuery.builder()
+ .setDataSource(CalciteTests.DATASOURCE3)
+ .setInterval(querySegmentSpec(Filtration.eternity()))
+ .setGranularity(Granularities.ALL)
+ .setDimensions(new DefaultDimensionSpec("dim4", "_d0", ValueType.STRING))
+ .setContext(QUERY_CONTEXT_DEFAULT)
+ .build()
+ )
+ .setVirtualColumns(
+ expressionVirtualColumn(
+ "v0",
+ "\'dummy\'",
+ ValueType.STRING
+ ),
+ expressionVirtualColumn(
+ "v1",
+ "case_searched((\"_d0\" == 'b'),\"_d0\",null)",
+ ValueType.STRING
+ )
+ ) .setInterval(querySegmentSpec(Filtration.eternity()))
Review comment:
This indentation is kind of weird. I'm not sure if checkstyle will flag it, but watch out.
##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -13419,6 +13420,73 @@ public void testNvlColumns() throws Exception
);
}
+ @Test
+ public void testGroupByWithLiteralInSubqueryGrouping() throws Exception
+ {
+ testQuery(
+ "SELECT \n"
+ + " t1, t2\n"
+ + " FROM\n"
+ + " ( SELECT\n"
+ + " 'dummy' as t1,\n"
+ + " CASE\n"
+ + " WHEN \n"
+ + " dim4 = 'b'\n"
+ + " THEN dim4\n"
+ + " ELSE NULL\n"
+ + " END AS t2\n"
+ + " FROM\n"
+ + " numfoo\n"
+ + " GROUP BY\n"
+ + " dim4\n"
+ + " )\n"
+ + " GROUP BY\n"
+ + " t1,t2\n",
+ ImmutableList.of(
+ GroupByQuery.builder()
+ .setDataSource(
+ GroupByQuery.builder()
+ .setDataSource(CalciteTests.DATASOURCE3)
+ .setInterval(querySegmentSpec(Filtration.eternity()))
+ .setGranularity(Granularities.ALL)
+ .setDimensions(new DefaultDimensionSpec("dim4", "_d0", ValueType.STRING))
+ .setContext(QUERY_CONTEXT_DEFAULT)
+ .build()
+ )
+ .setVirtualColumns(
+ expressionVirtualColumn(
+ "v0",
+ "\'dummy\'",
+ ValueType.STRING
+ ),
+ expressionVirtualColumn(
+ "v1",
+ "case_searched((\"_d0\" == 'b'),\"_d0\",null)",
+ ValueType.STRING
+ )
+ ) .setInterval(querySegmentSpec(Filtration.eternity()))
+ .setDimensions(
+ dimensions(
+ new DefaultDimensionSpec("v0", "d0", ValueType.STRING),
+ new DefaultDimensionSpec("v1", "d1", ValueType.STRING)
+ )
+ )
+ .setGranularity(Granularities.ALL)
+ .setContext(QUERY_CONTEXT_DEFAULT)
+ .build()
+ ),
+ NullHandling.replaceWithDefault() ?
+ ImmutableList.of(
+ new Object[]{"dummy", ""},
+ new Object[]{"dummy", "b"}
+ ) :
+ ImmutableList.of(
+ new Object[]{"dummy", null},
Review comment:
FYI: You could simplify this by using `NULL_STRING` for the null / "" and avoiding the ternary.
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -424,16 +425,39 @@ private static Grouping computeGrouping(
/**
* Builds a {@link Subtotals} object based on {@link Aggregate#getGroupSets()}.
+ *
+ * @throws CannotBuildQueryException if subtotals cannot be computed
*/
private static Subtotals computeSubtotals(
final PartialDruidQuery partialQuery,
+ final PlannerContext plannerContext,
final RowSignature rowSignature
)
{
final Aggregate aggregate = partialQuery.getAggregate();
// dimBitMapping maps from input field position to group set position (dimension number).
- final int[] dimBitMapping = new int[rowSignature.size()];
+ final int[] dimBitMapping;
+ if (partialQuery.getSelectProject() != null) {
+ int fieldCount = 0;
+ for (final RexNode rexNode : partialQuery.getSelectProject().getChildExps()) {
+ final DruidExpression expression = Expressions.toDruidExpression(
+ plannerContext,
+ rowSignature,
+ rexNode
+ );
+
+ if (expression == null) {
+ throw new CannotBuildQueryException(partialQuery.getSelectProject(), rexNode);
Review comment:
I think there's no need to do this check, because it isn't needed to build the Subtotals object, and `computeDimensions` will handle the case where we're grouping on something where dimensions are based on expressions that are not translatable.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] maytasm commented on a change in pull request #9986: Fix groupBy with literal in subquery grouping
Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9986:
URL: https://github.com/apache/druid/pull/9986#discussion_r435538248
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -424,16 +425,39 @@ private static Grouping computeGrouping(
/**
* Builds a {@link Subtotals} object based on {@link Aggregate#getGroupSets()}.
+ *
+ * @throws CannotBuildQueryException if subtotals cannot be computed
*/
private static Subtotals computeSubtotals(
final PartialDruidQuery partialQuery,
+ final PlannerContext plannerContext,
final RowSignature rowSignature
)
{
final Aggregate aggregate = partialQuery.getAggregate();
// dimBitMapping maps from input field position to group set position (dimension number).
- final int[] dimBitMapping = new int[rowSignature.size()];
+ final int[] dimBitMapping;
+ if (partialQuery.getSelectProject() != null) {
+ int fieldCount = 0;
+ for (final RexNode rexNode : partialQuery.getSelectProject().getChildExps()) {
+ final DruidExpression expression = Expressions.toDruidExpression(
+ plannerContext,
+ rowSignature,
+ rexNode
+ );
+
+ if (expression == null) {
+ throw new CannotBuildQueryException(partialQuery.getSelectProject(), rexNode);
Review comment:
Done
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] gianm commented on pull request #9986: Fix groupBy with literal in subquery grouping
Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #9986:
URL: https://github.com/apache/druid/pull/9986#issuecomment-638905062
This seems like a logic error somewhere. Throwing a CannotBuildQueryException abandons the current partial query and, if there's another path available, will cause that other path to be gone down instead. This doesn't seem like an ideal solution. There might be some queries where there is no alternate path. The logic error might cause other problems besides exceptions. So it would be better to identify and correct the logic error.
The loop you modified is assuming that the positions in `aggregate.getGroupSet()` are all referring to fields in the `rowSignature` (`dimBitMapping` was initialized to `rowSignature.size()`). This sounds reasonable to me, because the group set is all the input fields that we are grouping on, and the `rowSignature` is the input row signature. It comes from `computeOutputRowSignature(sourceRowSignature, selectProjection, null, null)`, which is the source data plus any pre-aggregation projections (`selectProjection`).
But there must be something wrong somewhere if we're getting dimBits that are lying outside the `rowSignature`. The error might be in `computeSubtotals` or perhaps in whatever created the `partialQuery` and `rowSignature` that it is being given. Maybe there is something, somewhere, that is eliminating literals, and reflecting that in `rowSignature` but not in `partialQuery.getAggregate()`. If I was you I would look for this kind of thing first.
Please look a little deeper and see if you can find the place that the logic breaks down.
Thanks!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] maytasm commented on a change in pull request #9986: Fix groupBy with literal in subquery grouping
Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9986:
URL: https://github.com/apache/druid/pull/9986#discussion_r435578056
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -424,6 +424,8 @@ private static Grouping computeGrouping(
/**
* Builds a {@link Subtotals} object based on {@link Aggregate#getGroupSets()}.
+ *
+ * @throws CannotBuildQueryException if subtotals cannot be computed
Review comment:
Done
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] maytasm commented on a change in pull request #9986: Fix groupBy with literal in subquery grouping
Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9986:
URL: https://github.com/apache/druid/pull/9986#discussion_r435538934
##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -13419,6 +13420,73 @@ public void testNvlColumns() throws Exception
);
}
+ @Test
+ public void testGroupByWithLiteralInSubqueryGrouping() throws Exception
+ {
+ testQuery(
+ "SELECT \n"
+ + " t1, t2\n"
+ + " FROM\n"
+ + " ( SELECT\n"
+ + " 'dummy' as t1,\n"
+ + " CASE\n"
+ + " WHEN \n"
+ + " dim4 = 'b'\n"
+ + " THEN dim4\n"
+ + " ELSE NULL\n"
+ + " END AS t2\n"
+ + " FROM\n"
+ + " numfoo\n"
+ + " GROUP BY\n"
+ + " dim4\n"
+ + " )\n"
+ + " GROUP BY\n"
+ + " t1,t2\n",
+ ImmutableList.of(
+ GroupByQuery.builder()
+ .setDataSource(
+ GroupByQuery.builder()
+ .setDataSource(CalciteTests.DATASOURCE3)
+ .setInterval(querySegmentSpec(Filtration.eternity()))
+ .setGranularity(Granularities.ALL)
+ .setDimensions(new DefaultDimensionSpec("dim4", "_d0", ValueType.STRING))
+ .setContext(QUERY_CONTEXT_DEFAULT)
+ .build()
+ )
+ .setVirtualColumns(
+ expressionVirtualColumn(
+ "v0",
+ "\'dummy\'",
+ ValueType.STRING
+ ),
+ expressionVirtualColumn(
+ "v1",
+ "case_searched((\"_d0\" == 'b'),\"_d0\",null)",
+ ValueType.STRING
+ )
+ ) .setInterval(querySegmentSpec(Filtration.eternity()))
+ .setDimensions(
+ dimensions(
+ new DefaultDimensionSpec("v0", "d0", ValueType.STRING),
+ new DefaultDimensionSpec("v1", "d1", ValueType.STRING)
+ )
+ )
+ .setGranularity(Granularities.ALL)
+ .setContext(QUERY_CONTEXT_DEFAULT)
+ .build()
+ ),
+ NullHandling.replaceWithDefault() ?
+ ImmutableList.of(
+ new Object[]{"dummy", ""},
+ new Object[]{"dummy", "b"}
+ ) :
+ ImmutableList.of(
+ new Object[]{"dummy", null},
Review comment:
Nice!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org