Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/06/30 04:33:12 UTC

[GitHub] [druid] jihoonson commented on a change in pull request #11379: improve groupBy query granularity translation with 2x query performance improve when issued from sql layer

jihoonson commented on a change in pull request #11379:
URL: https://github.com/apache/druid/pull/11379#discussion_r661121700



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -977,6 +977,33 @@ public GroupByQuery toGroupByQuery()
       // Cannot handle zero or negative limits.
       return null;
     }
+    Map<String, Object> theContext = plannerContext.getQueryContext();
+
+    Granularity queryGranularity = null;
+
+    if (!grouping.getDimensions().isEmpty()) {
+      for (DimensionExpression dimensionExpression : grouping.getDimensions()) {

Review comment:
       Should we exclude the `dimensionExpression` from the dimensionSpecs that are passed to the constructor of `GroupByQuery` if it's a time_floor function? It seems redundant since we can now pass the right granularity.
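
       For illustration only, a minimal sketch of that exclusion, assuming the `DimensionSpec#getOutputName` accessor shown in the diff above; `timeFloorOutputName` is a hypothetical variable holding the output name of the `time_floor` dimension, and this is not the PR's actual code:

```java
import java.util.List;
import java.util.stream.Collectors;

import org.apache.druid.query.dimension.DimensionSpec;

// Hypothetical helper: once the time_floor granularity is carried by the query
// itself, the corresponding dimension no longer needs to be grouped on explicitly.
static List<DimensionSpec> withoutTimeFloorDimension(
    List<DimensionSpec> dimensionSpecs,
    String timeFloorOutputName  // e.g. "d1" for time_floor(__time TO DAY)
)
{
  return dimensionSpecs.stream()
                       .filter(spec -> !spec.getOutputName().equals(timeFloorOutputName))
                       .collect(Collectors.toList());
}
```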

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/strategy/GroupByStrategyV2.java
##########
@@ -213,8 +215,34 @@ public boolean doMergeResults(final GroupByQuery query)
     context.put("finalize", false);
     context.put(GroupByQueryConfig.CTX_KEY_STRATEGY, GroupByStrategySelector.STRATEGY_V2);
     context.put(CTX_KEY_OUTERMOST, false);
-    if (query.getUniversalTimestamp() != null) {
-      context.put(CTX_KEY_FUDGE_TIMESTAMP, String.valueOf(query.getUniversalTimestamp().getMillis()));
+
+    DateTime universalTimestamp = query.getUniversalTimestamp();
+    Granularity granularity = query.getGranularity();
+    List<DimensionSpec> dimensionSpecs = query.getDimensions();
+    final String timestampResultField = query.getContextValue(GroupByQuery.CTX_TIMESTAMP_RESULT_FIELD);
+    final boolean hasTimestampResultField = (timestampResultField != null && timestampResultField.length() != 0)
+                                            && query.getContextBoolean(CTX_KEY_OUTERMOST, true)
+                                            && !query.isApplyLimitPushDown();
+    if (hasTimestampResultField) {
+      // sql like "group by city_id,time_floor(__time to day)",
+      // the original translated query is granularity=all and dimensions:[d0, d1]
+      // the better plan is granularity=day and dimensions:[d0]
+      // but the ResultRow structure is changed from [d0, d1] to [__time, d0]
+      // this structure should be fixed as [d0, d1] (actually it is [d0, __time]) before postAggs are called
+      final Granularity timestampResultFieldGranularity
+          = query.getContextValue(GroupByQuery.CTX_TIMESTAMP_RESULT_FIELD_GRANULARITY);
+      dimensionSpecs =
+          query.getDimensions()
+               .stream()
+               .filter(dimensionSpec -> !dimensionSpec.getOutputName().equals(timestampResultField))
+               .collect(Collectors.toList());
+      granularity = timestampResultFieldGranularity;
+      universalTimestamp = null;
+      int timestampResultFieldIndexInOriginalDimensions = query.getContextValue(GroupByQuery.CTX_TIMESTAMP_RESULT_FIELD_INDEX);
+      context.put(GroupByQuery.CTX_KEY_SORT_BY_DIMS_FIRST, timestampResultFieldIndexInOriginalDimensions > 0);

Review comment:
       This query planning would be better done in `DruidQuery`. Can we move it there?
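
       For illustration, a hedged sketch of making those decisions at plan time in the SQL layer instead of at merge time; the method name, parameters, and literal context keys below are stand-ins (real code would use the `GroupByQuery.CTX_*` constants from the diff), not the actual `DruidQuery` code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plan-time helper: record how the time_floor dimension maps onto
// the query's granularity while the groupBy query is being built, so the merge
// path in GroupByStrategyV2 only has to read these values back.
static Map<String, Object> planTimestampResultField(
    Map<String, Object> baseContext,
    String timestampResultFieldName,  // output name of the time_floor dimension
    int indexInOriginalDimensions     // its position in the original grouping
)
{
  Map<String, Object> context = new HashMap<>(baseContext);
  context.put("timestampResultField", timestampResultFieldName);
  context.put("timestampResultFieldInOriginalDimensions", indexInOriginalDimensions);
  // When the time dimension is not the leading one, sort by the other
  // dimensions first so the original result ordering is preserved.
  context.put("sortByDimsFirst", indexInOriginalDimensions > 0);
  return context;
}
```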

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/strategy/GroupByStrategyV2.java
##########
@@ -285,6 +342,34 @@ public boolean doMergeResults(final GroupByQuery query)
     }
   }
 
+  private void fixResultRowWithTimestampResultField(

Review comment:
       This remapping should be done in the SQL layer. See `QueryMaker.remapFields()`.
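
       As a self-contained illustration of the kind of positional remapping meant here (this is not `QueryMaker.remapFields()` itself; the row layouts in the comments are made up for the example):

```java
// Illustrative only: the engine may lay a row out as [__time, d0] while the
// SQL projection expects [d0, __time]; a precomputed index mapping puts each
// engine column back into the position the projection asks for.
static Object[] remapRow(Object[] engineRow, int[] projectionToEngineIndex)
{
  Object[] remapped = new Object[projectionToEngineIndex.length];
  for (int i = 0; i < projectionToEngineIndex.length; i++) {
    remapped[i] = engineRow[projectionToEngineIndex[i]];
  }
  return remapped;
}

// Example: remapRow(new Object[]{timestamp, "beijing"}, new int[]{1, 0})
// returns {"beijing", timestamp}.
```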

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/GroupByQuery.java
##########
@@ -207,9 +210,17 @@ private GroupByQuery(
       Preconditions.checkArgument(spec != null, "dimensions has null DimensionSpec");
     }
 
+    List<String> dimensionNames = this.dimensions
+        .stream()
+        .map(DimensionSpec::getOutputName)
+        .collect(Collectors.toList());
+    String timestampField = getContextValue(CTX_TIMESTAMP_RESULT_FIELD);
+    if (timestampField != null) {
+      dimensionNames.add(timestampField);

Review comment:
       I don't think the groupBy query engine should know about the timestamp column remapping at all. It should be able to use the `__time` column as is, and the SQL layer should handle the column remapping.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -977,6 +977,33 @@ public GroupByQuery toGroupByQuery()
       // Cannot handle zero or negative limits.
       return null;
     }
+    Map<String, Object> theContext = plannerContext.getQueryContext();
+
+    Granularity queryGranularity = null;
+
+    if (!grouping.getDimensions().isEmpty()) {
+      for (DimensionExpression dimensionExpression : grouping.getDimensions()) {
+        Granularity granularity = Expressions.toQueryGranularity(
+            dimensionExpression.getDruidExpression(),
+            plannerContext.getExprMacroTable()
+        );
+        if (granularity == null) {
+          continue;
+        }
+        if (queryGranularity != null) {
+          // group by more than one timestamp_floor
+          // eg: group by timestamp_floor(__time to DAY),timestamp_floor(__time, to HOUR)
+          theContext = plannerContext.getQueryContext();
+          break;
+        }
+        queryGranularity = granularity;
+        int timestampDimensionIndexInDimensions = grouping.getDimensions().indexOf(dimensionExpression);
+        theContext = new HashMap<>(plannerContext.getQueryContext());
+        theContext.put(GroupByQuery.CTX_TIMESTAMP_RESULT_FIELD, dimensionExpression.getOutputName());
+        theContext.put(GroupByQuery.CTX_TIMESTAMP_RESULT_FIELD_GRANULARITY, queryGranularity);

Review comment:
       Why not pass `queryGranularity` to the constructor of `GroupByQuery` directly?
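
       For illustration, a rough sketch of that alternative using the standard `GroupByQuery.builder()` pattern rather than the exact constructor call in `toGroupByQuery()`; the variables referenced are assumed to be in scope, and the whole block is a sketch, not the PR's code:

```java
import org.apache.druid.java.util.common.granularity.Granularities;
import org.apache.druid.query.groupby.GroupByQuery;

// Hypothetical sketch: set the extracted granularity on the query directly
// instead of passing it through the CTX_TIMESTAMP_RESULT_FIELD_GRANULARITY
// context entry.
GroupByQuery query = GroupByQuery
    .builder()
    .setDataSource(dataSource)                      // assumed in scope
    .setQuerySegmentSpec(querySegmentSpec)          // assumed in scope
    .setDimensions(dimensionSpecsWithoutTimeFloor)  // time_floor dimension removed
    .setAggregatorSpecs(aggregatorFactories)        // assumed in scope
    .setGranularity(queryGranularity != null ? queryGranularity : Granularities.ALL)
    .setContext(theContext)
    .build();
```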




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


