You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "gianm (via GitHub)" <gi...@apache.org> on 2023/03/13 20:20:14 UTC

[GitHub] [druid] gianm commented on a diff in pull request #13179: Disabling rollup on post agg operators for MSQ sql based ingestion.

gianm commented on code in PR #13179:
URL: https://github.com/apache/druid/pull/13179#discussion_r1134572518


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1841,7 +1841,16 @@ private static Pair<List<DimensionSchema>, List<AggregatorFactory>> makeDimensio
     if (isRollupQuery) {
       // Populate aggregators from the native query when doing an ingest in rollup mode.
       for (AggregatorFactory aggregatorFactory : ((GroupByQuery) query).getAggregatorSpecs()) {
-        String outputColumn = Iterables.getOnlyElement(columnMappings.getOutputColumnsForQueryColumn(aggregatorFactory.getName()));
+        List<String> outputColumns = columnMappings.getOutputColumnsForQueryColumn(aggregatorFactory.getName());
+        if (outputColumns == null || outputColumns.size() != 1) {
+          throw new ISE(
+              "Cannot use aggregator [%s] with input fields [%s] in the rollup mode. It might be using a post aggregator. Please check the native plan to figure out more information about the aggregator. You can disable the rollup mode or use a different aggregator. Please refer to SQL-based ingestion docs for more details.",

Review Comment:
   This error should really be in the SQL layer, for a couple reasons:
   
   1. It's a validation-style error that doesn't require actually executing the query. So if we can detect this problem in the SQL layer, users will get their error faster, and won't have tasks launched needlessly.
   2. We can refer to SQL concepts and SQL names rather than native concepts like "aggregator" and native names like `a0`.
   
   There's also a couple of issues with actionability that should be fixed when moving this error to the SQL layer:
   
   - It suggests "disable the rollup mode" as a way to fix the problem, but "rollup mode" is not a thing in MSQ. As we mention in `multi-stage-query/concepts.md`, rollup is something that MSQ does automatically when certain conditions are met. The user would need to switch on aggregation finalization, which may not be what they want.
   - It suggests "refer to SQL-based ingestion docs", but doesn't provide a link or a hint on what to look for. Doc references should be URLs.
   - It suggests "check the native plan", but doesn't say how to do that. (Although I think when we move this to SQL layer, we won't need this part.)
   
   There's three places we can check things in the SQL layer:
   
   1. In the SQL validation phase (after parsing, prior to optimizing), i.e. in `validate` in `DruidPlanner` or `IngestHandler`.
   2. Immediately prior to translation from logical plan to Druid plan, i.e. in `MSQTaskSqlEngine#buildQueryMakerForInsert`
   3. Immediately prior to execution, i.e. in `MSQTaskQueryMaker` (the latest possible time to validate something before a controller task is launched)
   
   Ideally we validate as much as possible as early as possible. Also, ideally, we validate it in a place where we can know the SQL name of the agg function (e.g. from an instance of `SqlAggFunction`). That'll let us include the SQL name in the error message.
   
   I'm not totally sure which place is best for this. My guess is (1) or (2), since by (3) we've gone pretty much fully native. Perhaps @paul-rogers would have some advice as he has some experience with the validation stack.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org