You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "cryptoe (via GitHub)" <gi...@apache.org> on 2023/04/10 14:08:09 UTC

[GitHub] [druid] cryptoe commented on a diff in pull request #14025: MSQ: Support multiple result columns with the same name.

cryptoe commented on code in PR #14025:
URL: https://github.com/apache/druid/pull/14025#discussion_r1161739137


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1841,13 +1856,15 @@ private static Pair<List<DimensionSchema>, List<AggregatorFactory>> makeDimensio
     if (isRollupQuery) {
       // Populate aggregators from the native query when doing an ingest in rollup mode.
       for (AggregatorFactory aggregatorFactory : ((GroupByQuery) query).getAggregatorSpecs()) {
-        String outputColumn = Iterables.getOnlyElement(columnMappings.getOutputColumnsForQueryColumn(aggregatorFactory.getName()));
-        if (outputColumnAggregatorFactories.containsKey(outputColumn)) {
-          throw new ISE("There can only be one aggregator factory for column [%s].", outputColumn);
+        final int outputColumn =
+            Iterables.getOnlyElement(columnMappings.getOutputColumnsForQueryColumn(aggregatorFactory.getName()));

Review Comment:
   nit: `Iterables.getOnlyElement` fails with a non intuitive user error. Cam we remove that call here and do a size check. 
   
   If the check fails throw an appropriate user error. 
   



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1856,35 +1873,38 @@ private static Pair<List<DimensionSchema>, List<AggregatorFactory>> makeDimensio
     // Each column can be of either time, dimension, aggregator. For this method. we can ignore the time column.
     // For non-complex columns, If the aggregator factory of the column is not available, we treat the column as
     // a dimension. For complex columns, certains hacks are in place.
-    for (final String outputColumn : outputColumnsInOrder) {
-      final String queryColumn = columnMappings.getQueryColumnForOutputColumn(outputColumn);
+    for (final String outputColumnName : outputColumnsInOrder) {
+      // Iterables.getOnlyElement because this method is only called during ingestion, where we require
+      // that output names be unique.
+      final int outputColumn = Iterables.getOnlyElement(columnMappings.getOutputColumnsByName(outputColumnName));

Review Comment:
   Same comment here 



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/ColumnMappings.java:
##########
@@ -66,34 +74,80 @@ public static ColumnMappings identity(final RowSignature signature)
     );
   }
 
+  /**
+   * Number of output columns.
+   */
+  public int size()
+  {
+    return mappings.size();
+  }
+
+  /**
+   * All output column names, in order. Some names may appear more than once, unless
+   * {@link #hasUniqueOutputColumnNames()} is true.
+   */
   public List<String> getOutputColumnNames()
   {
     return mappings.stream().map(ColumnMapping::getOutputColumn).collect(Collectors.toList());
   }
 
+  /**
+   * Whether output column names from {@link #getOutputColumnNames()} are all unique.
+   */
+  public boolean hasUniqueOutputColumnNames()
+  {
+    final Set<String> encountered = new HashSet<>();
+
+    for (final ColumnMapping mapping : mappings) {
+      if (!encountered.add(mapping.getOutputColumn())) {
+        return false;
+      }
+    }
+
+    return true;
+  }
+
+  /**
+   * Whether a particular output column name exists.
+   */
   public boolean hasOutputColumn(final String columnName)
   {
-    return outputToQueryColumnMap.containsKey(columnName);
+    return outputNameToPositionMap.containsKey(columnName);
   }
 
-  public String getQueryColumnForOutputColumn(final String outputColumn)
+  /**
+   * Query column name for a particular output column position.
+   */
+  public String getQueryColumnName(final int outputColumn)
   {
-    final String queryColumn = outputToQueryColumnMap.get(outputColumn);
-    if (queryColumn != null) {
-      return queryColumn;
-    } else {
-      throw new IAE("No such output column [%s]", outputColumn);
-    }
+    return mappings.get(outputColumn).getQueryColumn();
   }
 
-  public List<String> getOutputColumnsForQueryColumn(final String queryColumn)
+  /**
+   * Output column name for a particular output column position.
+   */
+  public String getOutputColumnName(final int outputColumn)
   {
-    final List<String> outputColumns = queryToOutputColumnsMap.get(queryColumn);
-    if (outputColumns != null) {
-      return outputColumns;
-    } else {
-      return Collections.emptyList();
+    return mappings.get(outputColumn).getOutputColumn();

Review Comment:
   Same defensive coding comment here. 



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/ColumnMappings.java:
##########
@@ -66,34 +74,80 @@ public static ColumnMappings identity(final RowSignature signature)
     );
   }
 
+  /**
+   * Number of output columns.
+   */
+  public int size()
+  {
+    return mappings.size();
+  }
+
+  /**
+   * All output column names, in order. Some names may appear more than once, unless
+   * {@link #hasUniqueOutputColumnNames()} is true.
+   */
   public List<String> getOutputColumnNames()
   {
     return mappings.stream().map(ColumnMapping::getOutputColumn).collect(Collectors.toList());
   }
 
+  /**
+   * Whether output column names from {@link #getOutputColumnNames()} are all unique.
+   */
+  public boolean hasUniqueOutputColumnNames()
+  {
+    final Set<String> encountered = new HashSet<>();
+
+    for (final ColumnMapping mapping : mappings) {
+      if (!encountered.add(mapping.getOutputColumn())) {
+        return false;
+      }
+    }
+
+    return true;
+  }
+
+  /**
+   * Whether a particular output column name exists.
+   */
   public boolean hasOutputColumn(final String columnName)
   {
-    return outputToQueryColumnMap.containsKey(columnName);
+    return outputNameToPositionMap.containsKey(columnName);
   }
 
-  public String getQueryColumnForOutputColumn(final String outputColumn)
+  /**
+   * Query column name for a particular output column position.
+   */
+  public String getQueryColumnName(final int outputColumn)
   {
-    final String queryColumn = outputToQueryColumnMap.get(outputColumn);
-    if (queryColumn != null) {
-      return queryColumn;
-    } else {
-      throw new IAE("No such output column [%s]", outputColumn);
-    }
+    return mappings.get(outputColumn).getQueryColumn();

Review Comment:
   Shouldn't we have defensive coding here since this is a public method. 



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/report/MSQResultsReport.java:
##########
@@ -63,7 +68,7 @@ static MSQResultsReport fromJson(
   }
 
   @JsonProperty("signature")
-  public RowSignature getSignature()
+  public List<ColumnAndType> getSignature()

Review Comment:
   Nice. I guess we would not require any json changes with this. 
   +1



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskSqlEngine.java:
##########
@@ -257,7 +256,7 @@ private static void validateNoDuplicateAliases(final List<Pair<Integer, String>>
 
     for (final Pair<Integer, String> field : fieldMappings) {
       if (!aliasesSeen.add(field.right)) {
-        throw new ValidationException("Duplicate field in SELECT: " + field.right);
+        throw new ValidationException("Duplicate field in SELECT: [" + field.right + "]");

Review Comment:
   ```suggestion
           throw new ValidationException("Duplicate field in INSERT/REPLACE: [" + field.right + "]");
   ```



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/ColumnMappings.java:
##########
@@ -36,23 +35,32 @@
 import java.util.Set;
 import java.util.stream.Collectors;
 
+/**
+ * Maps column names from {@link MSQSpec#getQuery()} to output names desired by the user, in the order
+ * desired by the user.
+ *
+ * The {@link MSQSpec#getQuery()} is translated by {@link org.apache.druid.msq.querykit.QueryKit} into
+ * a {@link org.apache.druid.msq.kernel.QueryDefinition}. So, this class also represents mappings from
+ * {@link org.apache.druid.msq.kernel.QueryDefinition#getFinalStageDefinition()} into the output names desired
+ * by the user.
+ */
 public class ColumnMappings
 {
   private final List<ColumnMapping> mappings;
-  private final Map<String, String> outputToQueryColumnMap;
-  private final Map<String, List<String>> queryToOutputColumnsMap;
+  private final Map<String, IntList> outputNameToPositionMap;
+  private final Map<String, IntList> queryNameToPositionMap;

Review Comment:
   ```suggestion
     private final Map<String, IntList> queryColNameToPositionMap;
   ```
   
   Nit: rename



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org