Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/12/17 11:19:22 UTC

[GitHub] [druid] cryptoe opened a new pull request #12078: Grouping on arrays as arrays

cryptoe opened a new pull request #12078:
URL: https://github.com/apache/druid/pull/12078


   
   ### Description
   Currently, grouping on multi-value columns inside Druid works by exploding each value. From the Druid [docs](https://druid.apache.org/docs/latest/querying/multi-value-dimensions.html#grouping):
   
   For example, given the following datasource and groupBy query:
   ```json
   {"timestamp": "2011-01-12T00:00:00.000Z", "tags": ["t1","t2","t3"]}  #row1
   {"timestamp": "2011-01-13T00:00:00.000Z", "tags": ["t3","t4","t5"]}  #row2
   {"timestamp": "2011-01-14T00:00:00.000Z", "tags": ["t5","t6","t7"]}  #row3
   {"timestamp": "2011-01-14T00:00:00.000Z", "tags": []}                #row4
   ```
   
   ```json
   {
     "queryType": "groupBy",
     "dataSource": "test",
     "intervals": [
       "1970-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
     ],
     "granularity": {
       "type": "all"
     },
     "dimensions": [
       {
         "type": "default",
         "dimension": "tags",
         "outputName": "tags"
       }
     ],
     "aggregations": [
       {
         "type": "count",
         "name": "count"
       }
     ]
   }
   ```
   
   This query returns the following result:
   
   ```json
   [
     {
       "timestamp": "1970-01-01T00:00:00.000Z",
       "event": {
         "count": 1,
         "tags": "t1"
       }
     },
     {
       "timestamp": "1970-01-01T00:00:00.000Z",
       "event": {
         "count": 1,
         "tags": "t2"
       }
     },
     {
       "timestamp": "1970-01-01T00:00:00.000Z",
       "event": {
         "count": 2,
         "tags": "t3"
       }
     },
     {
       "timestamp": "1970-01-01T00:00:00.000Z",
       "event": {
         "count": 1,
         "tags": "t4"
       }
     },
     {
       "timestamp": "1970-01-01T00:00:00.000Z",
       "event": {
         "count": 2,
         "tags": "t5"
       }
     },
     {
       "timestamp": "1970-01-01T00:00:00.000Z",
       "event": {
         "count": 1,
         "tags": "t6"
       }
     },
     {
       "timestamp": "1970-01-01T00:00:00.000Z",
       "event": {
         "count": 1,
         "tags": "t7"
       }
     }
   ]
   ```
   
   Notice that the original rows are "**exploded**" into multiple rows and then merged.
   
   
   
   The goal of this PR is to group on multi-value columns without exploding them. We set the **outputType**
   in the dimensionSpec to **STRING_ARRAY** to activate this new behavior. For example:
   
   ```json
   {
     "queryType": "groupBy",
     "dataSource": "test",
     "intervals": [
       "1970-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
     ],
     "granularity": {
       "type": "all"
     },
     "dimensions": [
       {
         "type": "default",
         "dimension": "tags",
         "outputName": "tags",
         "outputType": "STRING_ARRAY"
       }
     ],
     "aggregations": [
       {
         "type": "count",
         "name": "count"
       }
     ]
   }
   ```
   will return the following results:
   
   ```json
    [
      {
        "timestamp": "1970-01-01T00:00:00.000Z",
        "event": {
          "count": 1,
          "tags": []
        }
      },
      {
        "timestamp": "1970-01-01T00:00:00.000Z",
        "event": {
          "count": 1,
          "tags": ["t1", "t2", "t3"]
        }
      },
      {
        "timestamp": "1970-01-01T00:00:00.000Z",
        "event": {
          "count": 1,
          "tags": ["t3", "t4", "t5"]
        }
      },
      {
        "timestamp": "1970-01-01T00:00:00.000Z",
        "event": {
          "count": 1,
          "tags": ["t5", "t6", "t7"]
        }
      }
    ]
   ```
   
   
   The idea was to copy concepts from how `DictionaryBuildingStringGroupByColumnSelectorStrategy` generates an int <-> String dictionary on the fly.
   The dictionary is generated so that each string is stored only once on the heap and referenced by an integer. Since we are dealing with arrays, the encoded int array is in turn referenced by yet another int, so that the least amount of heap is used when string cardinality is low.
   So if we have two rows such as:
   ```
    [a,b]
    [b,c]
   ```
   the global dictionary would look like:
   ```
   "a" <-> 1
   "b" <-> 2
   "c" <-> 3
   ```
   and the corresponding array dictionary would look like:
   ```
   [1,2] <-> 1
   [2,3] <-> 2
   ```
   
   The in-memory structures also need to implement `Comparable` and are invoked per row. Hence `ComparableStringArray` and `ComparableIntArray` are coded in such a way that they avoid the overhead of `List` and boxed `Integer`.
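   
   Purely as an illustration (not code from this PR), here is a minimal Java sketch of the two-level dictionary idea. The class name `TwoLevelArrayDictionarySketch` and the plain `HashMap`/`List` structures are hypothetical stand-ins for the `BiMap`-based `ComparableIntArray`/`ComparableStringArray` handling in `ArrayGroupByColumnSelectorStrategy`:
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   
   // Sketch of the two-level dictionary: each string is stored once and mapped to an int,
   // and each encoded int[] row is in turn mapped to a single int used as the grouping key.
   // This is a simplified outline, not the actual ArrayGroupByColumnSelectorStrategy code.
   public class TwoLevelArrayDictionarySketch
   {
     private final Map<String, Integer> stringToId = new HashMap<>();
     private final List<String> idToString = new ArrayList<>();
   
     private final Map<List<Integer>, Integer> arrayToId = new HashMap<>();
     private final List<int[]> idToArray = new ArrayList<>();
   
     /** Encode one multi-value row such as ["a", "b"] into a single int grouping key. */
     public int encode(String[] row)
     {
       int[] encoded = new int[row.length];
       for (int i = 0; i < row.length; i++) {
         encoded[i] = stringToId.computeIfAbsent(row[i], s -> {
           idToString.add(s);
           return idToString.size() - 1;
         });
       }
       // Box into a List so it can act as a map key; the PR uses ComparableIntArray for this instead.
       List<Integer> key = new ArrayList<>(encoded.length);
       for (int v : encoded) {
         key.add(v);
       }
       return arrayToId.computeIfAbsent(key, k -> {
         idToArray.add(encoded);
         return idToArray.size() - 1;
       });
     }
   
     /** Decode a grouping key back into the original string array for the result row. */
     public String[] decode(int id)
     {
       int[] encoded = idToArray.get(id);
       String[] out = new String[encoded.length];
       for (int i = 0; i < encoded.length; i++) {
         out[i] = idToString.get(encoded[i]);
       }
       return out;
     }
   
     public static void main(String[] args)
     {
       TwoLevelArrayDictionarySketch dict = new TwoLevelArrayDictionarySketch();
       int k1 = dict.encode(new String[]{"a", "b"});
       int k2 = dict.encode(new String[]{"b", "c"});
       System.out.println(k1 + " -> " + Arrays.toString(dict.decode(k1))); // 0 -> [a, b]
       System.out.println(k2 + " -> " + Arrays.toString(dict.decode(k2))); // 1 -> [b, c]
     }
   }
   ```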
   
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `ComparableStringArray.java`
    * `ComparableIntArray.java`
    * `ResultRowDeserializer.java`
    * `RowBasedGrouperHelper.java`
    * `ArrayGroupByColumnSelectorStrategy.java`
   
   <hr>
   
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   




[GitHub] [druid] clintropolis commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780203919



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
##########
@@ -128,27 +128,10 @@ public static String escapeStringLiteral(final String s)
   }
 
   /**
-   * Convert {@link RelDataType} to the most appropriate {@link ValueType}, coercing all ARRAY types to STRING (until
-   * the time is right and we are more comfortable handling Druid ARRAY types in all parts of the engine).
-   *
-   * Callers who are not scared of ARRAY types should isntead call {@link #getValueTypeForRelDataTypeFull(RelDataType)},
-   * which returns the most accurate conversion of {@link RelDataType} to {@link ValueType}.
+   * Convert {@link RelDataType} to the most appropriate {@link ValueType}}.
    */
   @Nullable
   public static ColumnType getColumnTypeForRelDataType(final RelDataType type)

Review comment:
       the engine has been waiting on array grouping for this coercion to go away, though there is a missing piece I think where we need to ensure that when grouping on a virtual column we never plan to a topn query in `DruidQuery`.
   
    I think most of the `MV_` prefixed functions have their return type as `VARCHAR` rather than `ARRAY`, so in theory the "documented" multi-value string functions shouldn't actually be affected, and any that is affected is a bug that we should fix. The undocumented `ARRAY_` prefixed functions will now behave as intended and validate/operate as true SQL arrays (and so group correctly as arrays). If anyone was using the undocumented `ARRAY_` prefix functions, then I guess that behavior will change, since grouping on those expressions would previously have been grouping as `STRING` due to the implicit unnest.
   
    The only reason I can think of to add a feature flag is if we need a way to hide bugs that this change causes, and that _should_ only happen if any of the `MV_` functions have bugs in their SQL output types, or if people were using the undocumented `ARRAY_` functions.

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
##########
@@ -143,54 +144,6 @@ public void testSelectNonConstantArrayExpressionFromTable() throws Exception
     );
   }
 
-  @Test
-  public void testSelectNonConstantArrayExpressionFromTableForMultival() throws Exception

Review comment:
       heh, i wrote this test originally to document moderately strange current behavior. This PR changes the behavior of the `array` constructor function to no longer "map" multi-value string inputs and instead treat them as arrays, so it should probably be repurposed to test whatever the new behavior should be.
   
    The map behavior was not very intuitive, since it would make a nested array where the elements are single-element arrays from the multi-value string, so idk that it needs to be preserved.
   
    We might want to consider making the use of a multi-value string inside the array constructor an error, since i'm not sure there is an intuitive behavior: it is ambiguous whether the user is acknowledging the column as multi-value, or doesn't know and thinks it is single-valued. Most of the other functions don't have this ambiguity, though it probably doesn't need to be changed in this PR.

##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -2994,8 +2994,14 @@ public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
       Object[] out = new Object[length];
 
       ExpressionType arrayType = null;
+
       for (int i = 0; i < length; i++) {
         ExprEval<?> evaluated = args.get(i).eval(bindings);
+        // short circuit the case where you have one input argument whose input type is array. So array function has

Review comment:
       i have a concern with this change, best illustrated by the implication that now an expression like this
   
   ```
   array(array('x','y','z'))
   ```
   will now produce `['x','y','z']` instead of `[['x','y','z']]` as one would expect, or would happen if the expression were this instead:
   
   ```
   array(array('x','y','z'),array('a','b','c'))
   ```
   which will still produce `[['x','y','z'],['a','b','c']]`
   
   Instead of hijacking this function to accommodate multi-value string behavior, I think we should instead add a `MV_TO_ARRAY` function in SQL that only accepts a string, and then in the native expression layer either just use `cast(x, 'ARRAY<STRING>')` or also add a `mv_to_array` function that basically does the same thing as cast. This way we are explicit on when we intend to treat a multi-value column directly as an array 

##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -2994,8 +2994,14 @@ public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
       Object[] out = new Object[length];
 
       ExpressionType arrayType = null;
+
       for (int i = 0; i < length; i++) {
         ExprEval<?> evaluated = args.get(i).eval(bindings);
+        // short circuit the case where you have one input argument whose input type is array. So array function has

Review comment:
       i have a concern with this change, best illustrated by the implication that now an expression like this
   
   ```
   array(array('x','y','z'))
   ```
   will now produce `['x','y','z']` instead of `[['x','y','z']]` as one would expect, or would happen if the expression were this instead:
   
   ```
   array(array('x','y','z'),array('a','b','c'))
   ```
   which will still produce `[['x','y','z'],['a','b','c']]`
   
   Instead of hijacking this function to accommodate multi-value string behavior, I think we should instead add a `MV_TO_ARRAY` function in SQL that only accepts a string, and then in the native expression layer either just use `cast(x, 'ARRAY<STRING>')` or also add a `mv_to_array` function that basically does the same thing as cast. This way we are explicit on when we intend to treat a multi-value column directly as an array.
   
   I think the changes to prevent the array constructor from being mapped for multi-value strings are fine to leave as is though, since the mapping is sort of strange






[GitHub] [druid] dbardbar commented on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
dbardbar commented on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-998812329


   @cryptoe,
   
   The use-case is that some of the values in the MV field are not interesting for a specific query, and we would like to ignore them for the purpose of the GROUP BY. They are kept there because those ignored tags might be used for filtering, or might be used for GROUP BY when performing a different query.
   
   An example query, using the example data appearing at the top of this PR:
   
    SELECT MV_FILTER_ONLY(tags, ARRAY['t3', 't4']), COUNT(*) FROM test GROUP BY 1
   
   with your new code enabled, I would expect the following to return:
   ```
   ["t3"],       1            (from row1)
   ["t3", "t4"], 1            (from row2)
   [],           2            (from row3+row4)
   ```
   
   
   
   
   
   




[GitHub] [druid] cryptoe commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r778013977



##########
File path: processing/src/main/java/org/apache/druid/query/groupby/ResultRowDeserializer.java
##########
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby;
+
+import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.core.JsonToken;
+import com.fasterxml.jackson.databind.DeserializationContext;
+import com.fasterxml.jackson.databind.JsonDeserializer;
+import org.apache.druid.data.input.Row;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.data.ComparableStringArray;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+
+public class ResultRowDeserializer extends JsonDeserializer<ResultRow>
+{
+  final List<ColumnType> types;
+  final GroupByQuery query;
+
+  public ResultRowDeserializer(final List<ColumnType> types, final GroupByQuery query)
+  {
+    this.types = types;
+    this.query = query;
+  }
+
+  public static ResultRowDeserializer fromQuery(
+      final GroupByQuery query
+  )
+  {
+    RowSignature rowSignature = query.getResultRowSignature();
+    final List<ColumnType> types = new ArrayList<>(rowSignature.size());
+
+    for (String name : rowSignature.getColumnNames()) {
+      final ColumnType type = rowSignature.getColumnType(name)
+                                          .orElseThrow(() -> new ISE("No type for column [%s]", name));
+
+      types.add(type);
+    }
+
+    return new ResultRowDeserializer(types, query);
+
+  }
+
+  @Override
+  public ResultRow deserialize(JsonParser jp, DeserializationContext ctxt) throws IOException
+  {
+    // Deserializer that can deserialize either array- or map-based rows.
+    if (jp.isExpectedStartObjectToken()) {
+      final Row row = jp.readValueAs(Row.class);
+      return ResultRow.fromLegacyRow(row, query);
+    } else if (jp.isExpectedStartArrayToken()) {
+      final Object[] retVal = new Object[types.size()];
+
+      for (int i = 0; i < types.size(); i++) {
+        final JsonToken token = jp.nextToken();
+        switch (types.get(i).getType()) {
+          case STRING:
+            if (token == JsonToken.VALUE_NULL) {
+              retVal[i] = null;
+            } else if (token == JsonToken.VALUE_STRING) {
+              retVal[i] = jp.getText();
+            } else {
+              throw ctxt.instantiationException(
+                  ResultRow.class,
+                  StringUtils.format("Unexpected token [%s] when reading string", token)
+              );
+            }
+            break;
+
+          case LONG:
+            retVal[i] = token == JsonToken.VALUE_NULL ? null : jp.getLongValue();
+            break;
+          case DOUBLE:
+            retVal[i] = token == JsonToken.VALUE_NULL ? null : jp.getDoubleValue();
+            break;
+          case FLOAT:
+            retVal[i] = token == JsonToken.VALUE_NULL ? null : jp.getFloatValue();
+            break;
+          case ARRAY:
+            if (types.get(i).equals(ColumnType.STRING_ARRAY)) {
+              final List<String> strings = new ArrayList<>();
+              while (jp.nextToken() != JsonToken.END_ARRAY) {
+                strings.add(jp.getText());
+              }
+              retVal[i] = ComparableStringArray.of(strings.toArray(new String[0]));
+              break;
+            }
+
+          default:
+            throw new ISE("Can't handle type [%s]", types.get(i).asTypeString());

Review comment:
       Fair enough. I can remove this class altogether. I added it initially to have stricter types while deserializing, but if we have unknown types then falling back on Jackson seems much better.

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/column/ArrayGroupByColumnSelectorStrategy.java
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby.epinephelinae.column;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.BiMap;
+import com.google.common.collect.HashBiMap;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.groupby.ResultRow;
+import org.apache.druid.query.groupby.epinephelinae.Grouper;
+import org.apache.druid.query.ordering.StringComparator;
+import org.apache.druid.query.ordering.StringComparators;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.DimensionSelector;
+import org.apache.druid.segment.data.ComparableIntArray;
+import org.apache.druid.segment.data.ComparableStringArray;
+import org.apache.druid.segment.data.IndexedInts;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class ArrayGroupByColumnSelectorStrategy
+    implements GroupByColumnSelectorStrategy
+{
+  private static final int GROUP_BY_MISSING_VALUE = -1;
+
+
+  // contains string <-> id for each element of the multi value grouping column
+  // for eg : [a,b,c] is the col value. dictionaryToInt will contain { a <-> 1, b <-> 2, c <-> 3}
+  private final BiMap<String, Integer> dictionaryToInt;
+
+  // stores each row as an integer array where each int represents the value in dictionaryToInt
+  // for eg : [a,b,c] would be converted to [1,2,3] and assigned an integer value 1.
+  // [1,2,3] <-> 1
+  private final BiMap<ComparableIntArray, Integer> intListToInt;
+
+  @Override
+  public int getGroupingKeySize()
+  {
+    return Integer.BYTES;
+  }
+
+  public ArrayGroupByColumnSelectorStrategy()
+  {
+    dictionaryToInt = HashBiMap.create();
+    intListToInt = HashBiMap.create();
+  }
+
+  @VisibleForTesting
+  ArrayGroupByColumnSelectorStrategy(
+      BiMap<String, Integer> dictionaryToInt,
+      BiMap<ComparableIntArray, Integer> intArrayToInt
+  )
+  {
+    this.dictionaryToInt = dictionaryToInt;
+    this.intListToInt = intArrayToInt;
+  }
+
+  @Override
+  public void processValueFromGroupingKey(
+      GroupByColumnSelectorPlus selectorPlus,
+      ByteBuffer key,
+      ResultRow resultRow,
+      int keyBufferPosition
+  )
+  {
+    final int id = key.getInt(keyBufferPosition);
+
+    // GROUP_BY_MISSING_VALUE is used to indicate empty rows
+    if (id != GROUP_BY_MISSING_VALUE) {
+      final int[] intRepresentation = intListToInt.inverse()
+                                                  .get(id).getDelegate();
+      final String[] stringRepresentaion = new String[intRepresentation.length];
+      for (int i = 0; i < intRepresentation.length; i++) {
+        stringRepresentaion[i] = dictionaryToInt.inverse().get(intRepresentation[i]);
+      }
+      resultRow.set(selectorPlus.getResultRowPosition(), ComparableStringArray.of(stringRepresentaion));
+    } else {
+      resultRow.set(selectorPlus.getResultRowPosition(), ComparableStringArray.of(NullHandling.defaultStringValues()));

Review comment:
       The thought initially was to make it similar to StringGroupByColumnSelectorStrategy, but thinking about it now, arrays need not have a default value and can just return null.
   Acked.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/builtin/ArrayConstructorOperatorConversion.java
##########
@@ -45,6 +53,22 @@ public DruidExpression toDruidExpression(
       final RexNode rexNode
   )
   {
+    // Check if array needs to be unnested
+    if (plannerContext.getQueryContext()
+                      .getOrDefault(
+                          QueryContexts.ENABLE_UNNESTED_ARRAYS_KEY,
+                          QueryContexts.DEFAULT_ENABLE_UNNESTED_ARRAYS
+                      ).equals(Boolean.FALSE)) {
+      List<RexNode> nodes = ((RexCall) rexNode).getOperands();
+      Preconditions.checkArgument(
+          nodes == null || nodes.size() != 1,
+          "ARRAY[] should have exactly one argument"
+      );
+      if (nodes.get(0).getKind() == SqlKind.LITERAL) {
+        throw new UOE("ARRAY[] support for literals not implemented");
+      }
+      return Expressions.toDruidExpression(plannerContext, rowSignature, nodes.get(0));
+    }

Review comment:
       Makes sense. I have removed the array hook from here and pushed the logic to ArrayConstructorFunction. 

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java
##########
@@ -386,13 +393,16 @@ public static boolean canPushDownLimit(ColumnSelectorFactory columnSelectorFacto
     @Override
     public GroupByColumnSelectorStrategy makeColumnSelectorStrategy(
         ColumnCapabilities capabilities,
-        ColumnValueSelector selector
+        ColumnValueSelector selector,
+        DimensionSpec dimensionSpec
     )
     {
       switch (capabilities.getType()) {
         case STRING:

Review comment:
       I will be breaking up the PR into two phases:
   1. Add the string implementation first, but also have the generic building blocks in place.
   2. Add the implementation for the rest of the types.
   At this point, I am not sure if one or two PRs will be necessary.

##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##########
@@ -376,11 +373,37 @@ public static Float convertObjectToFloat(@Nullable Object valObj, boolean report
         return convertObjectToDouble(obj, reportParseExceptions);
       case STRING:
         return convertObjectToString(obj);
+      case ARRAY:
+        return convertToArray(obj);

Review comment:
       acked. Working on it

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java
##########
@@ -1357,6 +1373,12 @@ private RowBasedKeySerdeHelper makeSerdeHelper(
     )
     {
       switch (valueType.getType()) {
+        case ARRAY:
+          return new ArrayRowBasedKeySerdeHelper(

Review comment:
       Acked. Working on it.

##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryQueryToolChestTest.java
##########
@@ -573,7 +572,7 @@ public void testResultSerde() throws Exception
         .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
         .setDimensions(Collections.singletonList(DefaultDimensionSpec.of("test")))
         .setAggregatorSpecs(Collections.singletonList(QueryRunnerTestHelper.ROWS_COUNT))
-        .setPostAggregatorSpecs(Collections.singletonList(new ConstantPostAggregator("post", 10)))
+        .setPostAggregatorSpecs(Collections.singletonList(new ConstantPostAggregator("post", 10.0)))

Review comment:
       Yup.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -403,7 +404,17 @@ private static Grouping computeGrouping(
       }
 
       final RelDataType dataType = rexNode.getType();
-      final ColumnType outputType = Calcites.getColumnTypeForRelDataType(dataType);
+      final ColumnType outputType;
+      if (plannerContext.getQueryContext()
+                        .getOrDefault(
+                            QueryContexts.ENABLE_UNNESTED_ARRAYS_KEY,
+                            QueryContexts.DEFAULT_ENABLE_UNNESTED_ARRAYS
+                        ).equals(Boolean.FALSE)) {
+        outputType = Calcites.getValueTypeForRelDataTypeFull(dataType);
+      } else {
+        outputType = Calcites.getColumnTypeForRelDataType(dataType);
+      }

Review comment:
       acked.






[GitHub] [druid] dbardbar edited a comment on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
dbardbar edited a comment on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-998812329


   @cryptoe,
   
   The use-case is that some of the values in the MV field are not interesting for a specific query, and we would like to ignore them for the purpose of the GROUP BY. They are kept there because those ignored tags might be used for filtering, or might be used for GROUP BY when performing a different query.
   
   An example query, using the example data appearing at the top of this PR:
   
   SELECT MV_FILTER_ONLY(tags, ARRAY['t3', 't4']), COUNT(*) FROM test GROUP BY 1
   
   with your new code enabled, I would expect the following to return:
   ```
   ["t3"],       1            (from row1)
   ["t3", "t4"], 1            (from row2)
   null,         2            (from row3+row4)
   ```
   
   
   
   
   
   




[GitHub] [druid] dbardbar commented on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
dbardbar commented on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-997874710


   Just wondering - can't you achieve the same thing with MV_TO_STRING? 
   Also, have you given any thought about how your code will be generated from SQL?




[GitHub] [druid] cryptoe commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780896568



##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
##########
@@ -1306,6 +1308,321 @@ public void testMultiValueDimension()
     TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim");
   }
 
+  @Test
+  public void testMultiValueDimensionAsArray()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      return;
+    }
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn("v0", "array(placementish)", ColumnType.STRING_ARRAY,
+                                                       ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY))
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("a", "preferred"), "rows", 2L, "idx", 282L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("b", "preferred"), "rows", 2L, "idx", 230L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("e", "preferred"), "rows", 2L, "idx", 324L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("h", "preferred"), "rows", 2L, "idx", 233L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("m", "preferred"), "rows", 6L, "idx", 5317L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("n", "preferred"), "rows", 2L, "idx", 235L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("p", "preferred"), "rows", 6L, "idx", 5405L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("preferred", "t"), "rows", 4L, "idx", 420L)
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim-groupby-arrays");
+  }
+
+  @Test
+  public void testMultiValueDimensionAsArrayWithOtherDims()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      return;
+    }
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn("v0", "array(placementish)", ColumnType.STRING_ARRAY,
+                                                       ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY),
+            new DefaultDimensionSpec("quality", "quality")
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("a", "preferred"),
+            "quality",
+            "automotive",
+            "rows",
+            2L,
+            "idx",
+            282L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("b", "preferred"),
+            "quality",
+            "business",
+            "rows",
+            2L,
+            "idx",
+            230L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("e", "preferred"),
+            "quality",
+            "entertainment",
+            "rows",
+            2L,
+            "idx",
+            324L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("h", "preferred"),
+            "quality",
+            "health",
+            "rows",
+            2L,
+            "idx",
+            233L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("m", "preferred"),
+            "quality",
+            "mezzanine",
+            "rows",
+            6L,
+            "idx",
+            5317L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("n", "preferred"),
+            "quality",
+            "news",
+            "rows",
+            2L,
+            "idx",
+            235L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("p", "preferred"),
+            "quality",
+            "premium",
+            "rows",
+            6L,
+            "idx",
+            5405L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred", "t"),
+            "quality",
+            "technology",
+            "rows",
+            2L,
+            "idx",
+            175L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred", "t"),
+            "quality",
+            "travel",
+            "rows",
+            2L,
+            "idx",
+            245L
+        )
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dims-groupby-arrays");
+  }
+
+  @Test
+  public void testMultiValueDimensionAsArrayWithVirtualDim()

Review comment:
       Good catch. Looks like a duplicate test case got added during refactoring.






[GitHub] [druid] cryptoe edited a comment on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe edited a comment on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-1012731768


   Sure. Collating all the next steps:
   
   - [X] Getting ordering on array grouping key to work
    - [ ] Null coercion
    - [ ] Optimized Limit pushdown in array comparators
    - [ ] mv_to_array to support expressions. Move to native cast. 
    - [ ] Query context flag to not allow grouping on multi value columns
    - [ ] Refactor stuff to remove DimensionHandlerUtils returning a comparable
    - [ ] Tighter validation on matching dimension spec with column type




[GitHub] [druid] cryptoe commented on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-997437279


   > 
   
    Hey @FrankChen021, thanks for looking into it. I am not sure that I understand your concern. We generally put the column type in the dimension spec, no? Check out the class `DefaultDimensionSpec`; I am using that.
    
    The PR extends the functionality of the groupBy engine. The grouping keys themselves are changing, so I am not sure how `resultFormat` would help here.




[GitHub] [druid] cryptoe commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780624974



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
##########
@@ -128,27 +128,10 @@ public static String escapeStringLiteral(final String s)
   }
 
   /**
-   * Convert {@link RelDataType} to the most appropriate {@link ValueType}, coercing all ARRAY types to STRING (until
-   * the time is right and we are more comfortable handling Druid ARRAY types in all parts of the engine).
-   *
-   * Callers who are not scared of ARRAY types should isntead call {@link #getValueTypeForRelDataTypeFull(RelDataType)},
-   * which returns the most accurate conversion of {@link RelDataType} to {@link ValueType}.
+   * Convert {@link RelDataType} to the most appropriate {@link ValueType}}.
    */
   @Nullable
   public static ColumnType getColumnTypeForRelDataType(final RelDataType type)

Review comment:
       
   
   
   
   > As part of this patch, then, let's check all the MV functions to make sure the return types are what we expect them to be.
   
   Looks like all mv functions are overriding the output signature of the corresponding array functions to return a VARCHAR instead of an array
   
   > 
   > As to the array stuff, one of them is documented: ARRAY_AGG. And you could do that in a subquery and group on the result. Some questions about that:
   > 
   >     * Do we have tests for the case of grouping on the result of an ARRAY_AGG? If not we should add it.
    Were you looking for [this](https://github.com/apache/druid/pull/12078/files#diff-8bc53aec44924e671b3c6f6f34c6f0c499f873d9000649c47c237444707aea4bL1640) test case?
   
   > 
   >     * Does the behavior change or does this introduce excess risk due to potential bugs? If so, a feature flag would be a good idea.
   
   Still thinking about this 






[GitHub] [druid] cryptoe edited a comment on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe edited a comment on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-1012731768


   Sure. Collating all the next steps:
   
   - [X] Getting ordering on array grouping key to work
   - [X] Null coercion
   - [ ] Optimized Limit pushdown in array comparators
   - [ ] mv_to_array to support expressions. Move to native cast. 
   - [ ] Query context flag to not allow grouping on multi value columns
   - [ ] Refactor stuff to remove DimensionHandlerUtils returning a comparable
   - [ ] Tighter validation on matching dimension spec with column type




[GitHub] [druid] cryptoe edited a comment on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe edited a comment on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-997437279


   > 
   
    Hey @FrankChen021, thanks for looking into it. I am not sure that I understand your concern. We generally put the column type in the dimension spec, no? Check out the class `DefaultDimensionSpec`; I am using that.
    
    The PR extends the functionality of the groupBy engine. The grouping keys themselves are changing, so I am not sure how `resultFormat` would help here.




[GitHub] [druid] cryptoe edited a comment on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe edited a comment on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-1009068625


   
   > The use-case is that some of the values in the MV field are not interesting for a specific query, and we would like to ignore them for the purpose of the GROUP BY. They are kept there because those ignored tags might be used for filtering, or might be used for GROUP BY when performing a different query.
   > 
   > An example query, using the example data appearing at the top of this PR:
   > 
   > SELECT MV_FILTER_ONLY(tags, ARRAY['t3', 't4']), COUNT(*) FROM test GROUP BY 1
   > 
   > with your new code enabled, I would expect the following to return:
   > 
   > ```
   > ["t3"],       1            (from row1)
   > ["t3", "t4"], 1            (from row2)
   > null,         2            (from row3+row4)
   > ```
   
    For the first cut, I have not enabled expressions as arguments to MV_TO_ARRAY(); it has to be a native multi-value string / string column. 
    FWIW @dbardbar you can query like this:
   ```
   SELECT MV_TO_ARRAY(dim3), SUM(cnt) FROM druid.numfoo where MV_CONTAINS(dim3, ARRAY['b']) GROUP BY 1 
   ```
   




[GitHub] [druid] cryptoe edited a comment on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe edited a comment on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-1009068625


   > The use-case is that some of the values in the MV field are not interesting for a specific query, and we would like to ignore them for the purpose of the GROUP BY. They are kept there because those ignored tags might be used for filtering, or might be used for GROUP BY when performing a different query.
   > 
   > An example query, using the example data appearing at the top of this PR:
   > 
   > SELECT MV_FILTER_ONLY(tags, ARRAY['t3', 't4']), COUNT(*) FROM test GROUP BY 1
   > 
   > with your new code enabled, I would expect the following to return:
   > 
   > ```
   > ["t3"],       1            (from row1)
   > ["t3", "t4"], 1            (from row2)
   > null,         2            (from row3+row4)
   > ```
   
    For the first cut, I have not enabled expressions as arguments to MV_TO_ARRAY(); it has to be a native multi-value string / string column. 
    @dbardbar you can query like this:
   ```
   SELECT MV_TO_ARRAY(dim3), SUM(cnt) FROM druid.numfoo where MV_CONTAINS(dim3, ARRAY['b']) GROUP BY 1 
   ```
   




[GitHub] [druid] cryptoe commented on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-1009068625


   For the first 
   
   > The use-case is that some of the values in the MV field are not interesting for a specific query, and we would like to ignore them for the purpose of the GROUP BY. They are kept there because those ignored tags might be used for filtering, or might be used for GROUP BY when performing a different query.
   > 
   > An example query, using the example data appearing at the top of this PR:
   > 
   > SELECT MV_FILTER_ONLY(tags, ARRAY['t3', 't4']), COUNT(*) FROM test GROUP BY 1
   > 
   > with your new code enabled, I would expect the following to return:
   > 
   > ```
   > ["t3"],       1            (from row1)
   > ["t3", "t4"], 1            (from row2)
   > null,         2            (from row3+row4)
   > ```
   
    For the first cut, I have not enabled expressions as arguments to MV_TO_ARRAY(); it has to be a native multi-value string / string column. 
    FWIW @dbardbar you can query like this:
   ```
   SELECT MV_TO_ARRAY(dim3), SUM(cnt) FROM druid.numfoo where MV_CONTAINS(dim3, ARRAY['b']) GROUP BY 1 
   ```
   




[GitHub] [druid] cryptoe commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r782133152



##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
##########
@@ -1306,6 +1309,531 @@ public void testMultiValueDimension()
     TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim");
   }
 
+  @Test
+  public void testMultiValueDimensionAsArray()
+  {
+    // array types don't work with group by v1
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage(
+          "GroupBy v1 only supports dimensions with an outputType of STRING.");
+    }
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placementish)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY))
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("a", "preferred"), "rows", 2L, "idx", 282L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("b", "preferred"), "rows", 2L, "idx", 230L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("e", "preferred"), "rows", 2L, "idx", 324L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("h", "preferred"), "rows", 2L, "idx", 233L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("m", "preferred"), "rows", 6L, "idx", 5317L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("n", "preferred"), "rows", 2L, "idx", 235L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("p", "preferred"), "rows", 6L, "idx", 5405L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("preferred", "t"), "rows", 4L, "idx", 420L)
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim-groupby-arrays");
+  }
+
+  @Test
+  public void testSingleValueDimensionAsArray()
+  {
+    // array types don't work with group by v1
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage(
+          "GroupBy v1 only supports dimensions with an outputType of STRING");
+    }
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placement)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY))
+        .setAggregatorSpecs(
+            QueryRunnerTestHelper.ROWS_COUNT,
+            new LongSumAggregatorFactory("idx", "index")
+        )
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = ImmutableList.of(
+        makeRow(query, "2011-04-01", "alias",
+                ComparableStringArray.of("preferred"), "rows", 26L, "idx", 12446L
+        )
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim-groupby-arrays");
+  }
+
+  @Test
+  public void testMultiValueDimensionAsArrayWithOtherDims()
+  {
+    // array types don't work with group by v1
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage(
+          "GroupBy v1 only supports dimensions with an outputType of STRING");
+    }
+
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placementish)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY),
+            new DefaultDimensionSpec("quality", "quality")
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("a", "preferred"),
+            "quality",
+            "automotive",
+            "rows",
+            2L,
+            "idx",
+            282L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("b", "preferred"),
+            "quality",
+            "business",
+            "rows",
+            2L,
+            "idx",
+            230L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("e", "preferred"),
+            "quality",
+            "entertainment",
+            "rows",
+            2L,
+            "idx",
+            324L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("h", "preferred"),
+            "quality",
+            "health",
+            "rows",
+            2L,
+            "idx",
+            233L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("m", "preferred"),
+            "quality",
+            "mezzanine",
+            "rows",
+            6L,
+            "idx",
+            5317L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("n", "preferred"),
+            "quality",
+            "news",
+            "rows",
+            2L,
+            "idx",
+            235L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("p", "preferred"),
+            "quality",
+            "premium",
+            "rows",
+            6L,
+            "idx",
+            5405L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred", "t"),
+            "quality",
+            "technology",
+            "rows",
+            2L,
+            "idx",
+            175L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred", "t"),
+            "quality",
+            "travel",
+            "rows",
+            2L,
+            "idx",
+            245L
+        )
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dims-groupby-arrays");
+  }
+
+  @Test
+  public void testMultiValueDimensionAsStringArrayWithoutExpression()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage("Not supported for multi-value dimensions");
+    }
+
+    cannotVectorize();
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setDimensions(
+            new DefaultDimensionSpec("placementish", "alias", ColumnType.STRING_ARRAY)
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+  }
+
+  @Test
+  public void testSingleValueDimensionAsStringArrayWithoutExpression()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage("java.lang.String cannot be cast to [Ljava.util.Objects");
+    }
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setDimensions(
+            new DefaultDimensionSpec("placement", "alias", ColumnType.STRING_ARRAY)
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    List<ResultRow> expectedResults = ImmutableList.of(
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred"),
+            "rows",
+            26L,
+            "idx",
+            12446L
+        ));
+    TestHelper.assertExpectedObjects(
+        expectedResults,
+        results,
+        "single-value-dims-groupby-arrays-as-string-arrays"
+    );
+  }
+
+
+  @Test
+  public void testNumericDimAsStringArrayWithoutExpression()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage("java.lang.Double cannot be cast to [Ljava.util.Objects");
+    }
+
+    cannotVectorize();
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setDimensions(
+            new DefaultDimensionSpec("index", "alias", ColumnType.STRING_ARRAY)
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+  }
+
+
+  @Test
+  public void testMultiValueVirtualDimAsString()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage("java.lang.Double cannot be cast to [Ljava.util.Objects");
+    }
+
+    cannotVectorize();
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placementish)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("vo", "alias", ColumnType.STRING)
+        )
+        .setDimensions(
+            new DefaultDimensionSpec("index", "alias", ColumnType.STRING_ARRAY)
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+  }
+
+  @Test
+  public void testExtractionStringSpecWithMultiValueVirtualDimAsInput()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 does not support dimension selectors with unknown cardinality");
+    }
+    cannotVectorize();
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placementish)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new ExtractionDimensionSpec("v0", "alias", ColumnType.STRING,
+                                        new SubstringDimExtractionFn(1, 1)
+            )
+        )
+
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            null,
+            "rows",
+            26L,
+            "idx",
+            12446L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            "r",
+            "rows",
+            26L,
+            "idx",
+            12446L
+        )
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(
+        expectedResults,
+        results,
+        "multi-value-extraction-spec-as-string-dim-groupby-arrays"
+    );
+  }
+
+
+  @Test
+  public void testExtractionStringArraySpecWithMultiValueVirtualDimAsInput()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage("Not supported for multi-value dimensions");
+    }
+
+    cannotVectorize();
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placementish)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new ExtractionDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY,
+                                        new SubstringDimExtractionFn(1, 1)
+            )
+        )
+
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+  }
+
+  @Test
+  public void testVirtualColumnNumericTypeAsStringArray()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage(
+          "org.apache.druid.segment.data.ComparableList cannot be cast to [Ljava.util.Objects");
+    }
+
+    cannotVectorize();
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "array(index)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY
+            )
+        )
+
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT)
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+  }
+
+  @Test
+  public void testNestedGroupByWithArrays()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else {
+      expectedException.expect(IAE.class);
+      expectedException.expectMessage("Cannot create query type helper from invalid type [ARRAY<STRING>]");

Review comment:
       fixed






[GitHub] [druid] cryptoe commented on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-1012731768


   Sure. Collating all the next steps:
   - Getting ordering on array grouping key to work
   - Null coercion
   - mv_to_array to support expressions 
   - Query context flag to not allow grouping on multi value columns
   - Refactor stuff to remove DimensionHandlerUtils returning a comparable
   - Tighter validation on matching dimension spec with column type




[GitHub] [druid] clintropolis merged pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #12078:
URL: https://github.com/apache/druid/pull/12078


   




[GitHub] [druid] cryptoe edited a comment on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe edited a comment on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-1012731768


   Sure. Collating all the next steps:
   - Getting ordering on array grouping key to work
   - Null coercion
   - Limit pushdown in array comparators
   - mv_to_array to support expressions 
   - Query context flag to not allow grouping on multi value columns
   - Refactor stuff to remove DimensionHandlerUtils returning a comparable
   - Tighter validation on matching dimension spec with column type




[GitHub] [druid] abhishekagarwal87 closed pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 closed pull request #12078:
URL: https://github.com/apache/druid/pull/12078


   




[GitHub] [druid] FrankChen021 commented on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-996704668


   I don't think putting the output format into the data source spec is a good design. Is it possible to extend the current `ResultFormat` to meet your need?




[GitHub] [druid] cryptoe commented on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-997933103


   @dbardbar Good question. The goal here is to make the Druid SQL layer similar to other SQL layers. In traditional databases like Postgres, the SQL construct for declaring an array is ARRAY. I am using that construct itself to hook the SQL layer into the native layer. Just finishing up some test cases for that.
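   
   For illustration, the construct surfaces in queries along the lines of the test cases added in this patch; a sketch, using the druid.numfoo test datasource and its dim3/cnt columns:
   
   ```
   -- Sketch: the SQL ARRAY constructor produces an ARRAY-typed grouping key,
   -- planned as a native groupBy on an ARRAY<STRING> virtual column.
   SELECT ARRAY[dim3], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC
   ```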




[GitHub] [druid] cryptoe commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r779334172



##########
File path: processing/src/main/java/org/apache/druid/query/dimension/ColumnSelectorStrategyFactory.java
##########
@@ -24,5 +24,9 @@
 
 public interface ColumnSelectorStrategyFactory<ColumnSelectorStrategyClass extends ColumnSelectorStrategy>
 {
-  ColumnSelectorStrategyClass makeColumnSelectorStrategy(ColumnCapabilities capabilities, ColumnValueSelector selector);
+  ColumnSelectorStrategyClass makeColumnSelectorStrategy(
+      ColumnCapabilities capabilities,
+      ColumnValueSelector selector,
+      DimensionSpec dimensionSpec

Review comment:
       Fixed!






[GitHub] [druid] gianm commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780495521



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
##########
@@ -128,27 +128,10 @@ public static String escapeStringLiteral(final String s)
   }
 
   /**
-   * Convert {@link RelDataType} to the most appropriate {@link ValueType}, coercing all ARRAY types to STRING (until
-   * the time is right and we are more comfortable handling Druid ARRAY types in all parts of the engine).
-   *
-   * Callers who are not scared of ARRAY types should isntead call {@link #getValueTypeForRelDataTypeFull(RelDataType)},
-   * which returns the most accurate conversion of {@link RelDataType} to {@link ValueType}.
+   * Convert {@link RelDataType} to the most appropriate {@link ValueType}}.
    */
   @Nullable
   public static ColumnType getColumnTypeForRelDataType(final RelDataType type)

Review comment:
       As part of this patch, then, let's check all the MV functions to make sure the return types are what we expect them to be.
   
   As to the array stuff, one of them is documented: ARRAY_AGG. And you could do that in a subquery and group on the result. Some questions about that:
   
   - Do we have tests for the case of grouping on the result of an ARRAY_AGG? If not we should add it.
   - Does the behavior change or does this introduce excess risk due to potential bugs? If so, a feature flag would be a good idea.
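   
   For clarity, a sketch of what grouping on the result of an ARRAY_AGG could look like; the column names here are illustrative only, not necessarily what the patch tests:
   
   ```
   -- Illustrative only: build an array per group in a subquery, then group on it.
   SELECT arr, COUNT(*)
   FROM (
     SELECT l1, ARRAY_AGG(dim3) AS arr
     FROM druid.numfoo
     GROUP BY l1
   ) t
   GROUP BY arr
   ```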






[GitHub] [druid] gianm commented on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-1010489198


   @cryptoe could you please post a current list of what you intend to do as near-term follow-up PRs? I'm asking because, while reading through the latest state of the patch, I had a hard time keeping track of what was "done", what was "not done", and what missing stuff is "soon to come".




[GitHub] [druid] cryptoe commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r781391716



##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
##########
@@ -1306,6 +1308,321 @@ public void testMultiValueDimension()
     TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim");
   }
 
+  @Test
+  public void testMultiValueDimensionAsArray()

Review comment:
       Added the above-mentioned test cases.






[GitHub] [druid] cryptoe edited a comment on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe edited a comment on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-1012731768


   Sure. Collating all the next steps:
   - Getting ordering on array grouping key to work
   - Null coercion
   - Limit pushdown in comparators
   - mv_to_array to support expressions 
   - Query context flag to not allow grouping on multi value columns
   - Refactor stuff to remove DimensionHandlerUtils returning a comparable
   - Tighter validation on matching dimension spec with column type




[GitHub] [druid] clintropolis commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r788384963



##########
File path: core/src/main/java/org/apache/druid/math/expr/ExpressionProcessingConfig.java
##########
@@ -28,17 +28,22 @@
 {
   public static final String NESTED_ARRAYS_CONFIG_STRING = "druid.expressions.allowNestedArrays";
   public static final String NULL_HANDLING_LEGACY_LOGICAL_OPS_STRING = "druid.expressions.useStrictBooleans";
+  public static final String CAST_ARRAY_TO_STRING_CONFIG_STRING = "druid.expressions.allowArrayToStringCast";
 
   @JsonProperty("allowNestedArrays")
   private final boolean allowNestedArrays;
 
   @JsonProperty("useStrictBooleans")
   private final boolean useStrictBooleans;
 
+  @JsonProperty("allowArrayToStringCast")
+  private final boolean allowArrayToStringCast;
+
   @JsonCreator
   public ExpressionProcessingConfig(
       @JsonProperty("allowNestedArrays") @Nullable Boolean allowNestedArrays,
-      @JsonProperty("useStrictBooleans") @Nullable Boolean useStrictBooleans
+      @JsonProperty("useStrictBooleans") @Nullable Boolean useStrictBooleans,
+      @JsonProperty("allowArrayToStringCast") @Nullable Boolean allowArrayToStringCast

Review comment:
       this name seems too positive, like something someone might want to enable to allow permissive casting of stuff. How about `processArraysAsMultiValueStrings`?

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/ArraySqlAggregator.java
##########
@@ -114,7 +113,7 @@ public Aggregation toDruidAggregation(
 
     final String fieldName;
     final String initialvalue;
-    final ColumnType druidType = Calcites.getValueTypeForRelDataTypeFull(aggregateCall.getType());
+    final ColumnType druidType = Calcites.getColumnTypeForRelDataType(aggregateCall.getType());

Review comment:
       we should maybe still call the old version of this method here, so that this call site's behavior doesn't change when the flag is set to the old behavior (the flag changes what this method returns)

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java
##########
@@ -403,6 +413,20 @@ public GroupByColumnSelectorStrategy makeColumnSelectorStrategy(
           return makeNullableNumericStrategy(new FloatGroupByColumnSelectorStrategy());
         case DOUBLE:
           return makeNullableNumericStrategy(new DoubleGroupByColumnSelectorStrategy());
+        case ARRAY:
+          switch (capabilities.getElementType().getType()) {
+            case LONG:
+              return new ArrayLongGroupByColumnSelectorStrategy();
+            case STRING:
+              return new ArrayStringGroupByColumnSelectorStrategy();
+            case DOUBLE:
+              return new ArrayDoubleGroupByColumnSelectorStrategy();
+            case FLOAT:
+              // Array<Float> not supported in expressions, ingestion

Review comment:
       note to self: double-check whether this is actually true at this layer (if it is not, it could possibly be handled with the Double strategy). While it is definitely true that `FLOAT` doesn't exist in expressions, so no float array exists within expressions, this type might still be specified by the SQL planner, for example whenever some float column is added into an array. I'm unsure whether the expression selector's column capabilities would report ARRAY<FLOAT> or ARRAY<DOUBLE> as the type of the virtual column; I know it coerces DOUBLE back to FLOAT when the planner requests FLOAT types, but I don't think it does the same thing for `ARRAY`, so this is _probably_ true.
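   
   For what it's worth, the SQL-layer test added in this patch (testArrayGroupAsFloatArray in CalciteArraysQueryTest) plans ARRAY[f1] over a float column as an ARRAY<DOUBLE> virtual column, which lines up with the "probably true" above. The query shape, as a sketch:
   
   ```
   -- From the CalciteArraysQueryTest case in this patch: array("f1") is planned with
   -- a DOUBLE_ARRAY type, so the FLOAT element-type branch is likely not hit from SQL.
   SELECT ARRAY[f1], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC
   ```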






[GitHub] [druid] cryptoe edited a comment on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe edited a comment on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-1012731768


   Sure. Collating all the next steps:
   
   - [X] Getting ordering on array grouping key to work
   - [ ] Null coercion
   - [ ] Optimized Limit pushdown in array comparators
   - [ ] mv_to_array to support expressions. Move to native cast. 
   - [ ] Query context flag to not allow grouping on multi value columns
   - [ ] Refactor stuff to remove DimensionHandlerUtils returning a comparable
   - [ ] Tighter validation on matching dimension spec with column type




[GitHub] [druid] cryptoe closed pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe closed pull request #12078:
URL: https://github.com/apache/druid/pull/12078


   




[GitHub] [druid] cryptoe commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780362210



##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java
##########
@@ -403,6 +412,20 @@ public GroupByColumnSelectorStrategy makeColumnSelectorStrategy(
           return makeNullableNumericStrategy(new FloatGroupByColumnSelectorStrategy());
         case DOUBLE:
           return makeNullableNumericStrategy(new DoubleGroupByColumnSelectorStrategy());
+        case ARRAY:
+          switch (capabilities.getElementType().getType()) {
+            case LONG:
+              return new ListLongGroupByColumnSelectorStrategy();

Review comment:
       The reason was that ListLong internally works on a list as the underlying data structure, hence the name, but it does look weird here.
   Changed everything to ArrayX, as I think that's a better name for signifying what the selector strategy is for.

##########
File path: processing/src/main/java/org/apache/druid/segment/data/ComparableIntArray.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonValue;
+import com.google.common.hash.Hashing;
+
+import java.util.Arrays;
+
+public class ComparableIntArray implements Comparable<ComparableIntArray>

Review comment:
       acked

##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##########
@@ -376,11 +374,58 @@ public static Float convertObjectToFloat(@Nullable Object valObj, boolean report
         return convertObjectToDouble(obj, reportParseExceptions);
       case STRING:
         return convertObjectToString(obj);
+      case ARRAY:
+        switch (type.getElementType().getType()) {
+          case STRING:
+            return convertToComparableStringArray(obj);
+          default:
+            return convertToList(obj);
+        }
+
       default:
         throw new IAE("Type[%s] is not supported for dimensions!", type);
     }
   }
 
+  private static ComparableList convertToList(Object obj)
+  {
+    if (obj == null) {
+      return null;
+    }
+    if (obj instanceof List) {
+      return new ComparableList((List) obj);
+    }
+    if (obj instanceof ComparableList) {
+      return (ComparableList) obj;
+    }
+    throw new ISE("Unable to convert type %s to %s", obj.getClass().getName(), ComparableList.class.getName());
+  }
+
+
+  @Nullable
+  public static ComparableStringArray convertToComparableStringArray(Object obj)
+  {
+    if (obj == null) {
+      return null;
+    }
+    if (obj instanceof ComparableStringArray) {
+      return (ComparableStringArray) obj;
+    }
+    if (obj instanceof String[]) {
+      return ComparableStringArray.of((String[]) obj);
+    }
+    // Jackson converts the serialized array into a string. Converting it back to a string array

Review comment:
       Thanks for the catch. 

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java
##########
@@ -1426,6 +1464,186 @@ private RowBasedKeySerdeHelper makeNumericSerdeHelper(
       }
     }
 
+    private class ListRowBasedKeySerdeHelper implements RowBasedKeySerdeHelper
+    {
+      final int keyBufferPosition;
+      final BufferComparator bufferComparator;
+
+      public ListRowBasedKeySerdeHelper(
+          int keyBufferPosition,
+          @Nullable StringComparator stringComparator
+      )
+      {
+        this.keyBufferPosition = keyBufferPosition;
+        final StringComparator comparator = stringComparator == null
+                                            ? StringComparators.LEXICOGRAPHIC
+                                            : stringComparator;
+
+        this.bufferComparator = (lhsBuffer, rhsBuffer, lhsPosition, rhsPosition) -> {
+          final ComparableList lhsComparableArray = listDictionary.get(lhsBuffer.getInt(lhsPosition
+                                                                                        + keyBufferPosition));
+          final ComparableList rhsComparableArray = listDictionary.get(rhsBuffer.getInt(rhsPosition
+                                                                                        + keyBufferPosition));
+          if (lhsComparableArray == null && rhsComparableArray == null) {
+            return 0;
+          } else if (lhsComparableArray == null) {
+            return -1;
+          } else if (rhsComparableArray == null) {
+            return 1;
+          }
+
+          List lhs = lhsComparableArray.getDelegate();
+          List rhs = rhsComparableArray.getDelegate();
+
+          int minLength = Math.min(lhs.size(), rhs.size());
+
+          //noinspection ArrayEquality
+          if (lhs == rhs) {
+            return 0;
+          }
+          for (int i = 0; i < minLength; i++) {
+            final int cmp = comparator.compare(String.valueOf(lhs.get(i)), String.valueOf(rhs.get(i)));
+            if (cmp == 0) {
+              continue;
+            }
+            return cmp;
+          }
+          if (lhs.size() == rhs.size()) {
+            return 0;
+          } else if (lhs.size() < rhs.size()) {
+            return -1;
+          }
+          return 1;
+        };
+      }
+
+      @Override
+      public int getKeyBufferValueSize()
+      {
+        return Integer.BYTES;
+      }
+
+      @Override
+      public boolean putToKeyBuffer(RowBasedKey key, int idx)
+      {
+        final ComparableList comparableList = (ComparableList) key.getKey()[idx];
+        int id = reverseDictionary.getInt(comparableList);
+        if (id == UNKNOWN_DICTIONARY_ID) {
+          id = listDictionary.size();
+          reverseListDictionary.put(comparableList, id);
+          listDictionary.add(comparableList);
+        }
+        keyBuffer.putInt(id);
+        return true;
+      }
+
+      @Override
+      public void getFromByteBuffer(ByteBuffer buffer, int initialOffset, int dimValIdx, Comparable[] dimValues)
+      {
+        dimValues[dimValIdx] = listDictionary.get(buffer.getInt(initialOffset + keyBufferPosition));
+      }
+
+      @Override
+      public BufferComparator getBufferComparator()
+      {
+        return bufferComparator;
+      }
+    }
+    private class ArrayRowBasedKeySerdeHelper implements RowBasedKeySerdeHelper
+    {
+      final int keyBufferPosition;
+      final BufferComparator bufferComparator;
+
+      ArrayRowBasedKeySerdeHelper(
+          int keyBufferPosition,
+          @Nullable StringComparator stringComparator
+      )
+      {
+        //TODO: karan : add pushLimitDown ?

Review comment:
       Acked

##########
File path: processing/src/main/java/org/apache/druid/segment/data/ComparableList.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.fasterxml.jackson.annotation.JsonValue;
+
+import java.util.List;
+
+
+public class ComparableList<T extends Comparable> implements Comparable<ComparableList>

Review comment:
       Done

##########
File path: processing/src/main/java/org/apache/druid/segment/data/ComparableStringArray.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonValue;
+import com.google.common.hash.Hashing;
+
+import java.util.Arrays;
+
+public class ComparableStringArray implements Comparable<ComparableStringArray>

Review comment:
       Done

##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -2994,8 +2994,14 @@ public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
       Object[] out = new Object[length];
 
       ExpressionType arrayType = null;
+
       for (int i = 0; i < length; i++) {
         ExprEval<?> evaluated = args.get(i).eval(bindings);
+        // short circuit the case where you have one input argument whose input type is array. So array function has

Review comment:
       The original intention was to keep it close to SQL, but I realized SQL does not have multi-value columns :). So a new function may be necessary.
   Cool, I am adding a new `MV_TO_ARRAY` function to both the native and the SQL layer.
   Thanks for the tip.

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
##########
@@ -143,54 +144,6 @@ public void testSelectNonConstantArrayExpressionFromTable() throws Exception
     );
   }
 
-  @Test
-  public void testSelectNonConstantArrayExpressionFromTableForMultival() throws Exception

Review comment:
       I have added this test back, as the array-constructor changes made in this PR are being reverted in favour of the mv_to_array function.

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
##########
@@ -923,6 +876,277 @@ public void testArrayOffset() throws Exception
     );
   }
 
+
+  @Test
+  public void testArrayGroupAsArray() throws Exception
+  {
+    // Cannot vectorize as we do not have support in the native query subsystem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[dim3], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"dim3\")",
+                            ColumnType.STRING_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.STRING_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{
+                new ArrayList<String>()
+                {
+                  {
+                    add(null);
+                  }
+                }, 3L
+            },
+            new Object[]{ImmutableList.of("a", "b"), 1L},
+            new Object[]{ImmutableList.of("b", "c"), 1L},
+            new Object[]{ImmutableList.of("d"), 1L}
+        )
+    );
+  }
+
+  @Test
+  public void testArrayGroupAsLongArray() throws Exception
+  {
+    // Cannot vectorize as we do not have support in the native query subsystem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[l1], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"l1\")",
+                            ColumnType.LONG_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.LONG_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{ImmutableList.of(0L), 4L},
+            new Object[]{ImmutableList.of(325323L), 1L},
+            new Object[]{ImmutableList.of(7L), 1L}
+        )
+    );
+  }
+
+
+  @Test
+  public void testArrayGroupAsDoubleArray() throws Exception
+  {
+    // Cannot vectorize as we do not have support in the native query subsystem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[d1], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"d1\")",
+                            ColumnType.DOUBLE_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.DOUBLE_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{ImmutableList.of(0.0), 4L},
+            new Object[]{ImmutableList.of(1.0), 1L},
+            new Object[]{ImmutableList.of(1.7), 1L}
+        )
+    );
+  }
+
+  @Test
+  public void testArrayGroupAsFloatArray() throws Exception
+  {
+    // Cannot vectorize as we do not have support in the native query subsystem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[f1], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"f1\")",
+                            ColumnType.DOUBLE_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.DOUBLE_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{ImmutableList.of(0.0), 4L},
+            new Object[]{ImmutableList.of(0.10000000149011612), 1L},
+            new Object[]{ImmutableList.of(1.0), 1L}
+        )
+    );
+  }
+
+  @Test

Review comment:
       As we are no longer changing the array constructor, I will skip adding these UTs.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
##########
@@ -128,27 +128,10 @@ public static String escapeStringLiteral(final String s)
   }
 
   /**
-   * Convert {@link RelDataType} to the most appropriate {@link ValueType}, coercing all ARRAY types to STRING (until
-   * the time is right and we are more comfortable handling Druid ARRAY types in all parts of the engine).
-   *
-   * Callers who are not scared of ARRAY types should isntead call {@link #getValueTypeForRelDataTypeFull(RelDataType)},
-   * which returns the most accurate conversion of {@link RelDataType} to {@link ValueType}.
+   * Convert {@link RelDataType} to the most appropriate {@link ValueType}}.
    */
   @Nullable
   public static ColumnType getColumnTypeForRelDataType(final RelDataType type)

Review comment:
       
   
   
   
   > As part of this patch, then, let's check all the MV functions to make sure the return types are what we expect them to be.
   
   Looks like all mv functions are overriding the output signature of the corresponding array functions to return a VARCHAR instead of an array
   
   > 
   > As to the array stuff, one of them is documented: ARRAY_AGG. And you could do that in a subquery and group on the result. Some questions about that:
   > 
   >     * Do we have tests for the case of grouping on the result of an ARRAY_AGG? If not we should add it.
   Were you looking for [this](https://github.com/apache/druid/pull/12078/files#diff-8bc53aec44924e671b3c6f6f34c6f0c499f873d9000649c47c237444707aea4bL1640) test case?
   
   > 
   >     * Does the behavior change or does this introduce excess risk due to potential bugs? If so, a feature flag would be a good idea.
   
   Still thinking about this.
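   
   For reference, the "group on the result of an ARRAY_AGG" shape asked about in the first question above looks roughly like the sketch below. The column names are placeholders from the foo/numfoo test schema, and this is not the exact SQL of the linked test case.
   
   ```java
   // Illustrative sketch only; not the exact SQL of the linked test case.
   // Shape: aggregate values into an array in a subquery, then group the
   // outer query on that array-typed column.
   final String sql =
       "SELECT arr, COUNT(*)\n"
       + "FROM (SELECT dim2, ARRAY_AGG(dim1) AS arr FROM druid.numfoo GROUP BY dim2) t\n"
       + "GROUP BY 1";
   ```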

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java
##########
@@ -340,7 +348,8 @@ public static boolean isAllSingleValueDims(
 
               // Now check column capabilities, which must be present and explicitly not multi-valued
               final ColumnCapabilities columnCapabilities = inspector.getColumnCapabilities(dimension.getDimension());
-              return columnCapabilities != null && columnCapabilities.hasMultipleValues().isFalse();
+              return dimension.getOutputType().equals(ColumnType.STRING_ARRAY)

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r791331710



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/ArraySqlAggregator.java
##########
@@ -114,7 +113,7 @@ public Aggregation toDruidAggregation(
 
     final String fieldName;
     final String initialvalue;
-    final ColumnType druidType = Calcites.getValueTypeForRelDataTypeFull(aggregateCall.getType());
+    final ColumnType druidType = Calcites.getColumnTypeForRelDataType(aggregateCall.getType());

Review comment:
       good catch. Fixed. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe edited a comment on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe edited a comment on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-1012731768


   Sure. Collating all the next steps:
   
   - [X] Getting ordering on array grouping key to work
   - [X] Null coercion
   - [ ] Optimized Limit pushdown in array comparators
   - [ ] mv_to_array to support expressions. Move to native cast. 
   - [ ] Query context flag to not allow grouping on multi value columns
   - [ ] Refactor stuff to remove DimensionHandlerUtils returning a comparable
   - [ ] Tighter validation on matching dimension spec with column type


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780107136



##########
File path: processing/src/main/java/org/apache/druid/segment/data/ComparableIntArray.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonValue;
+import com.google.common.hash.Hashing;
+
+import java.util.Arrays;
+
+public class ComparableIntArray implements Comparable<ComparableIntArray>

Review comment:
       It'd be good to have unit tests specifically for this class. Especially for hashCode, equals, and compareTo.
   
   Also I suggest implementing toString, since it'll make unit tests & debugging easier.

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java
##########
@@ -340,7 +348,8 @@ public static boolean isAllSingleValueDims(
 
               // Now check column capabilities, which must be present and explicitly not multi-valued
               final ColumnCapabilities columnCapabilities = inspector.getColumnCapabilities(dimension.getDimension());
-              return columnCapabilities != null && columnCapabilities.hasMultipleValues().isFalse();
+              return dimension.getOutputType().equals(ColumnType.STRING_ARRAY)

Review comment:
       This is confusing; it's changing the meaning of the function away from its name. It's called `isAllSingleValueDims`, but it'll return true if some columns are STRING_ARRAY? Seems weird.
   
   How about renaming the function to `hasNoExplodingDimensions` and updating the javadocs accordingly?

##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
##########
@@ -1306,6 +1308,321 @@ public void testMultiValueDimension()
     TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim");
   }
 
+  @Test
+  public void testMultiValueDimensionAsArray()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      return;
+    }
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn("v0", "array(placementish)", ColumnType.STRING_ARRAY,
+                                                       ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY))
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("a", "preferred"), "rows", 2L, "idx", 282L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("b", "preferred"), "rows", 2L, "idx", 230L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("e", "preferred"), "rows", 2L, "idx", 324L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("h", "preferred"), "rows", 2L, "idx", 233L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("m", "preferred"), "rows", 6L, "idx", 5317L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("n", "preferred"), "rows", 2L, "idx", 235L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("p", "preferred"), "rows", 6L, "idx", 5405L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("preferred", "t"), "rows", 4L, "idx", 420L)
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim-groupby-arrays");
+  }
+
+  @Test
+  public void testMultiValueDimensionAsArrayWithOtherDims()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      return;
+    }
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn("v0", "array(placementish)", ColumnType.STRING_ARRAY,
+                                                       ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY),
+            new DefaultDimensionSpec("quality", "quality")
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("a", "preferred"),
+            "quality",
+            "automotive",
+            "rows",
+            2L,
+            "idx",
+            282L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("b", "preferred"),
+            "quality",
+            "business",
+            "rows",
+            2L,
+            "idx",
+            230L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("e", "preferred"),
+            "quality",
+            "entertainment",
+            "rows",
+            2L,
+            "idx",
+            324L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("h", "preferred"),
+            "quality",
+            "health",
+            "rows",
+            2L,
+            "idx",
+            233L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("m", "preferred"),
+            "quality",
+            "mezzanine",
+            "rows",
+            6L,
+            "idx",
+            5317L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("n", "preferred"),
+            "quality",
+            "news",
+            "rows",
+            2L,
+            "idx",
+            235L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("p", "preferred"),
+            "quality",
+            "premium",
+            "rows",
+            6L,
+            "idx",
+            5405L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred", "t"),
+            "quality",
+            "technology",
+            "rows",
+            2L,
+            "idx",
+            175L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred", "t"),
+            "quality",
+            "travel",
+            "rows",
+            2L,
+            "idx",
+            245L
+        )
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dims-groupby-arrays");
+  }
+
+  @Test
+  public void testMultiValueDimensionAsArrayWithVirtualDim()

Review comment:
       Seems like the same test as testMultiValueDimensionAsArrayWithOtherDims.

##########
File path: processing/src/main/java/org/apache/druid/segment/data/ComparableStringArray.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonValue;
+import com.google.common.hash.Hashing;
+
+import java.util.Arrays;
+
+public class ComparableStringArray implements Comparable<ComparableStringArray>

Review comment:
       It'd be good to have unit tests specifically for this class. Especially for hashCode, equals, and compareTo.
   
   Also I suggest implementing toString, since it'll make unit tests & debugging easier.
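   
   For what it's worth, a minimal sketch of such a test could look like the following. It assumes the `ComparableStringArray.of(...)` factory used by the tests in this PR, value-based equals/hashCode, and element-wise ordering with shorter arrays sorting first; those assumptions should be checked against the actual class.
   
   ```java
   import org.apache.druid.segment.data.ComparableStringArray;
   import org.junit.Assert;
   import org.junit.Test;
   
   public class ComparableStringArrayTest
   {
     @Test
     public void testEqualsAndHashCode()
     {
       // equal contents should compare equal and hash identically
       Assert.assertEquals(ComparableStringArray.of("a", "b"), ComparableStringArray.of("a", "b"));
       Assert.assertEquals(
           ComparableStringArray.of("a", "b").hashCode(),
           ComparableStringArray.of("a", "b").hashCode()
       );
       Assert.assertNotEquals(ComparableStringArray.of("a", "b"), ComparableStringArray.of("a", "c"));
     }
   
     @Test
     public void testCompareTo()
     {
       // element-wise comparison; shorter arrays are assumed to sort first
       Assert.assertEquals(0, ComparableStringArray.of("a", "b").compareTo(ComparableStringArray.of("a", "b")));
       Assert.assertTrue(ComparableStringArray.of("a", "b").compareTo(ComparableStringArray.of("a", "c")) < 0);
       Assert.assertTrue(ComparableStringArray.of("a").compareTo(ComparableStringArray.of("a", "b")) < 0);
     }
   }
   ```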

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java
##########
@@ -1426,6 +1464,186 @@ private RowBasedKeySerdeHelper makeNumericSerdeHelper(
       }
     }
 
+    private class ListRowBasedKeySerdeHelper implements RowBasedKeySerdeHelper
+    {
+      final int keyBufferPosition;
+      final BufferComparator bufferComparator;
+
+      public ListRowBasedKeySerdeHelper(
+          int keyBufferPosition,
+          @Nullable StringComparator stringComparator
+      )
+      {
+        this.keyBufferPosition = keyBufferPosition;
+        final StringComparator comparator = stringComparator == null
+                                            ? StringComparators.LEXICOGRAPHIC
+                                            : stringComparator;
+
+        this.bufferComparator = (lhsBuffer, rhsBuffer, lhsPosition, rhsPosition) -> {
+          final ComparableList lhsComparableArray = listDictionary.get(lhsBuffer.getInt(lhsPosition
+                                                                                        + keyBufferPosition));
+          final ComparableList rhsComparableArray = listDictionary.get(rhsBuffer.getInt(rhsPosition
+                                                                                        + keyBufferPosition));
+          if (lhsComparableArray == null && rhsComparableArray == null) {
+            return 0;
+          } else if (lhsComparableArray == null) {
+            return -1;
+          } else if (rhsComparableArray == null) {
+            return 1;
+          }
+
+          List lhs = lhsComparableArray.getDelegate();
+          List rhs = rhsComparableArray.getDelegate();
+
+          int minLength = Math.min(lhs.size(), rhs.size());
+
+          //noinspection ArrayEquality
+          if (lhs == rhs) {
+            return 0;
+          }
+          for (int i = 0; i < minLength; i++) {
+            final int cmp = comparator.compare(String.valueOf(lhs.get(i)), String.valueOf(rhs.get(i)));
+            if (cmp == 0) {
+              continue;
+            }
+            return cmp;
+          }
+          if (lhs.size() == rhs.size()) {
+            return 0;
+          } else if (lhs.size() < rhs.size()) {
+            return -1;
+          }
+          return 1;
+        };
+      }
+
+      @Override
+      public int getKeyBufferValueSize()
+      {
+        return Integer.BYTES;
+      }
+
+      @Override
+      public boolean putToKeyBuffer(RowBasedKey key, int idx)
+      {
+        final ComparableList comparableList = (ComparableList) key.getKey()[idx];
+        int id = reverseDictionary.getInt(comparableList);
+        if (id == UNKNOWN_DICTIONARY_ID) {
+          id = listDictionary.size();
+          reverseListDictionary.put(comparableList, id);
+          listDictionary.add(comparableList);
+        }
+        keyBuffer.putInt(id);
+        return true;
+      }
+
+      @Override
+      public void getFromByteBuffer(ByteBuffer buffer, int initialOffset, int dimValIdx, Comparable[] dimValues)
+      {
+        dimValues[dimValIdx] = listDictionary.get(buffer.getInt(initialOffset + keyBufferPosition));
+      }
+
+      @Override
+      public BufferComparator getBufferComparator()
+      {
+        return bufferComparator;
+      }
+    }
+    private class ArrayRowBasedKeySerdeHelper implements RowBasedKeySerdeHelper
+    {
+      final int keyBufferPosition;
+      final BufferComparator bufferComparator;
+
+      ArrayRowBasedKeySerdeHelper(
+          int keyBufferPosition,
+          @Nullable StringComparator stringComparator
+      )
+      {
+        //TODO: karan : add pushLimitDown ?

Review comment:
       Please either resolve or remove this TODO before committing, as appropriate.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
##########
@@ -128,27 +128,10 @@ public static String escapeStringLiteral(final String s)
   }
 
   /**
-   * Convert {@link RelDataType} to the most appropriate {@link ValueType}, coercing all ARRAY types to STRING (until
-   * the time is right and we are more comfortable handling Druid ARRAY types in all parts of the engine).
-   *
-   * Callers who are not scared of ARRAY types should isntead call {@link #getValueTypeForRelDataTypeFull(RelDataType)},
-   * which returns the most accurate conversion of {@link RelDataType} to {@link ValueType}.
+   * Convert {@link RelDataType} to the most appropriate {@link ValueType}.
    */
   @Nullable
   public static ColumnType getColumnTypeForRelDataType(final RelDataType type)

Review comment:
       I take it that no callers are "scared of ARRAY types" anymore.
   
   1. What are the main risks of making this change, & are they mitigated by testing?
   2. Do we need a feature flag?

##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##########
@@ -376,11 +374,58 @@ public static Float convertObjectToFloat(@Nullable Object valObj, boolean report
         return convertObjectToDouble(obj, reportParseExceptions);
       case STRING:
         return convertObjectToString(obj);
+      case ARRAY:
+        switch (type.getElementType().getType()) {
+          case STRING:
+            return convertToComparableStringArray(obj);
+          default:
+            return convertToList(obj);
+        }
+
       default:
         throw new IAE("Type[%s] is not supported for dimensions!", type);
     }
   }
 
+  private static ComparableList convertToList(Object obj)
+  {
+    if (obj == null) {
+      return null;
+    }
+    if (obj instanceof List) {
+      return new ComparableList((List) obj);
+    }
+    if (obj instanceof ComparableList) {
+      return (ComparableList) obj;
+    }
+    throw new ISE("Unable to convert type %s to %s", obj.getClass().getName(), ComparableList.class.getName());
+  }
+
+
+  @Nullable
+  public static ComparableStringArray convertToComparableStringArray(Object obj)
+  {
+    if (obj == null) {
+      return null;
+    }
+    if (obj instanceof ComparableStringArray) {
+      return (ComparableStringArray) obj;
+    }
+    if (obj instanceof String[]) {
+      return ComparableStringArray.of((String[]) obj);
+    }
+    // Jackson converts the serialized array into a string. Converting it back to a string array

Review comment:
       Do you mean "Jackson converts the serialized array into a list"?

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java
##########
@@ -403,6 +412,20 @@ public GroupByColumnSelectorStrategy makeColumnSelectorStrategy(
           return makeNullableNumericStrategy(new FloatGroupByColumnSelectorStrategy());
         case DOUBLE:
           return makeNullableNumericStrategy(new DoubleGroupByColumnSelectorStrategy());
+        case ARRAY:
+          switch (capabilities.getElementType().getType()) {
+            case LONG:
+              return new ListLongGroupByColumnSelectorStrategy();

Review comment:
       Any reason why it's ListLong but ArrayString?

##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
##########
@@ -1306,6 +1308,321 @@ public void testMultiValueDimension()
     TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim");
   }
 
+  @Test
+  public void testMultiValueDimensionAsArray()

Review comment:
       Please include other tests for:
   
   - Dimension type STRING_ARRAY on an underlying column of type STRING that happens to be single-valued, like `placement`
   - Dimension type STRING_ARRAY on an underlying column of type STRING that happens to be multi-valued, like `placementish`
   - Dimension type STRING_ARRAY on an underlying column of numeric type, like `index`
   - Dimension type STRING on an underlying column of type STRING_ARRAY (such as a virtual column)
   - An extraction dimension spec of type STRING on a column of type STRING_ARRAY, like `new ExtractionDimensionSpec("v0", "alias", ColumnType.STRING, new SubstringDimExtractionFn(1, 1))`
   - An extraction dimension spec of type STRING_ARRAY on a column of type STRING_ARRAY, like `new ExtractionDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY, new SubstringDimExtractionFn(1, 1))`
   - A virtual column that builds an array from a numeric type, like `array(index)`, and declares its output type as STRING_ARRAY
   - A nested groupBy query, where the inner and outer queries both have dimensions of type STRING_ARRAY
   
   If any of these aren't supposed to work, use `expectedException` to verify that the exception message is good. Otherwise, verify that the results are what you expect to see.

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
##########
@@ -923,6 +876,277 @@ public void testArrayOffset() throws Exception
     );
   }
 
+
+  @Test
+  public void testArrayGroupAsArray() throws Exception
+  {
+    // Cannot vectorize as we do not have support in the native query subsystem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[dim3], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"dim3\")",
+                            ColumnType.STRING_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.STRING_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{
+                new ArrayList<String>()
+                {
+                  {
+                    add(null);
+                  }
+                }, 3L
+            },
+            new Object[]{ImmutableList.of("a", "b"), 1L},
+            new Object[]{ImmutableList.of("b", "c"), 1L},
+            new Object[]{ImmutableList.of("d"), 1L}
+        )
+    );
+  }
+
+  @Test
+  public void testArrayGroupAsLongArray() throws Exception
+  {
+    // Cannot vectorize as we do not have support in the native query subsystem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[l1], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"l1\")",
+                            ColumnType.LONG_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.LONG_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{ImmutableList.of(0L), 4L},
+            new Object[]{ImmutableList.of(325323L), 1L},
+            new Object[]{ImmutableList.of(7L), 1L}
+        )
+    );
+  }
+
+
+  @Test
+  public void testArrayGroupAsDoubleArray() throws Exception
+  {
+    // Cannot vectorize as we do not have support in the native query subsystem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[d1], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"d1\")",
+                            ColumnType.DOUBLE_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.DOUBLE_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{ImmutableList.of(0.0), 4L},
+            new Object[]{ImmutableList.of(1.0), 1L},
+            new Object[]{ImmutableList.of(1.7), 1L}
+        )
+    );
+  }
+
+  @Test
+  public void testArrayGroupAsFloatArray() throws Exception
+  {
+    // Cannot vectorize as we do not have support in the native query subsystem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[f1], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"f1\")",
+                            ColumnType.DOUBLE_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.DOUBLE_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{ImmutableList.of(0.0), 4L},
+            new Object[]{ImmutableList.of(0.10000000149011612), 1L},
+            new Object[]{ImmutableList.of(1.0), 1L}
+        )
+    );
+  }
+
+  @Test

Review comment:
       Please include tests for the `ARRAY` constructor with multiple arguments, like:
   
   - two multi-valued columns: `ARRAY[dim3, dim3]`
   - multi-valued plus regular column: `ARRAY[dim3, dim1]`

##########
File path: processing/src/main/java/org/apache/druid/segment/data/ComparableList.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.fasterxml.jackson.annotation.JsonValue;
+
+import java.util.List;
+
+
+public class ComparableList<T extends Comparable> implements Comparable<ComparableList>

Review comment:
       It'd be good to have unit tests specifically for this class. Especially for hashCode, equals, and compareTo.
   
   Also I suggest implementing toString, since it'll make unit tests & debugging easier.

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
##########
@@ -143,54 +144,6 @@ public void testSelectNonConstantArrayExpressionFromTable() throws Exception
     );
   }
 
-  @Test
-  public void testSelectNonConstantArrayExpressionFromTableForMultival() throws Exception

Review comment:
       Why did you delete this test?

##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
##########
@@ -1306,6 +1308,321 @@ public void testMultiValueDimension()
     TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim");
   }
 
+  @Test
+  public void testMultiValueDimensionAsArray()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      return;

Review comment:
       What happens if run with v1? If it's an error, better to use `expectedException` here.
   
   Same comment for the other tests too.
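   
   For concreteness, such a guard could look like this (a sketch only; the exception type and message would need to match whatever groupBy v1 actually throws):
   
   ```java
   // Expect the failure up front when running with the groupBy v1 strategy.
   if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
     expectedException.expect(UnsupportedOperationException.class);
     expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
   }
   ```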

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -403,7 +402,9 @@ private static Grouping computeGrouping(
       }
 
       final RelDataType dataType = rexNode.getType();
-      final ColumnType outputType = Calcites.getColumnTypeForRelDataType(dataType);
+      final ColumnType outputType;
+      outputType = Calcites.getColumnTypeForRelDataType(dataType);

Review comment:
       Needless change?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r781661764



##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -2977,6 +2977,67 @@ public ExpressionType getOutputType(Expr.InputBindingInspector inspector, List<E
     }
   }
 
+  class MVToArrayFunction implements Function
+  {
+    @Override
+    public String name()
+    {
+      return "mv_to_array";
+    }
+
+    @Override
+    public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
+    {
+      return args.get(0).eval(bindings);
+    }
+
+    @Override
+    public void validateArguments(List<Expr> args)
+    {
+      if (args.size() != 1) {
+        throw new IAE("Function[%s] needs exactly 1 argument of type String", name());
+      }
+      IdentifierExpr expr = args.get(0).getIdentifierExprIfIdentifierExpr();
+
+      if (expr == null) {
+        throw new IAE(
+            "Arg %s should be an identifier expression ie refer to columns directally. Use array[] instead",

Review comment:
       ```suggestion
               "Arg %s should be an identifier expression ie refer to columns directly. Use array[] instead",
   ```
   
   also, in the native expression layer the syntax is actually `array(...)`

##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -2977,6 +2977,67 @@ public ExpressionType getOutputType(Expr.InputBindingInspector inspector, List<E
     }
   }
 
+  class MVToArrayFunction implements Function
+  {
+    @Override
+    public String name()
+    {
+      return "mv_to_array";
+    }
+
+    @Override
+    public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
+    {
+      return args.get(0).eval(bindings);

Review comment:
       does this need to cast the output to string array for the cases where the input was not multi-valued?
   
   ```suggestion
         return args.get(0).eval(bindings).castTo(ExpressionType.STRING_ARRAY);
   ```
   
   assertArrayExpr in the tests calls `asArray` on the output before comparing, but I'm not sure all possible selectors that could be built on top of this expression would do the same. Since we explicitly want to always return an array, it would probably be safer to do the cast here (or just eliminate this expression and use cast directly).

##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
##########
@@ -1306,6 +1309,531 @@ public void testMultiValueDimension()
     TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim");
   }
 
+  @Test
+  public void testMultiValueDimensionAsArray()
+  {
+    // array types don't work with group by v1
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage(
+          "GroupBy v1 only supports dimensions with an outputType of STRING.");
+    }
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placementish)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY))
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("a", "preferred"), "rows", 2L, "idx", 282L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("b", "preferred"), "rows", 2L, "idx", 230L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("e", "preferred"), "rows", 2L, "idx", 324L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("h", "preferred"), "rows", 2L, "idx", 233L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("m", "preferred"), "rows", 6L, "idx", 5317L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("n", "preferred"), "rows", 2L, "idx", 235L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("p", "preferred"), "rows", 6L, "idx", 5405L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("preferred", "t"), "rows", 4L, "idx", 420L)
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim-groupby-arrays");
+  }
+
+  @Test
+  public void testSingleValueDimensionAsArray()
+  {
+    // array types don't work with group by v1
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage(
+          "GroupBy v1 only supports dimensions with an outputType of STRING");
+    }
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placement)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY))
+        .setAggregatorSpecs(
+            QueryRunnerTestHelper.ROWS_COUNT,
+            new LongSumAggregatorFactory("idx", "index")
+        )
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = ImmutableList.of(
+        makeRow(query, "2011-04-01", "alias",
+                ComparableStringArray.of("preferred"), "rows", 26L, "idx", 12446L
+        )
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim-groupby-arrays");
+  }
+
+  @Test
+  public void testMultiValueDimensionAsArrayWithOtherDims()
+  {
+    // array types don't work with group by v1
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage(
+          "GroupBy v1 only supports dimensions with an outputType of STRING");
+    }
+
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placementish)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY),
+            new DefaultDimensionSpec("quality", "quality")
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("a", "preferred"),
+            "quality",
+            "automotive",
+            "rows",
+            2L,
+            "idx",
+            282L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("b", "preferred"),
+            "quality",
+            "business",
+            "rows",
+            2L,
+            "idx",
+            230L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("e", "preferred"),
+            "quality",
+            "entertainment",
+            "rows",
+            2L,
+            "idx",
+            324L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("h", "preferred"),
+            "quality",
+            "health",
+            "rows",
+            2L,
+            "idx",
+            233L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("m", "preferred"),
+            "quality",
+            "mezzanine",
+            "rows",
+            6L,
+            "idx",
+            5317L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("n", "preferred"),
+            "quality",
+            "news",
+            "rows",
+            2L,
+            "idx",
+            235L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("p", "preferred"),
+            "quality",
+            "premium",
+            "rows",
+            6L,
+            "idx",
+            5405L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred", "t"),
+            "quality",
+            "technology",
+            "rows",
+            2L,
+            "idx",
+            175L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred", "t"),
+            "quality",
+            "travel",
+            "rows",
+            2L,
+            "idx",
+            245L
+        )
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dims-groupby-arrays");
+  }
+
+  @Test
+  public void testMultiValueDimensionAsStringArrayWithoutExpression()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage("Not supported for multi-value dimensions");
+    }
+
+    cannotVectorize();
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setDimensions(
+            new DefaultDimensionSpec("placementish", "alias", ColumnType.STRING_ARRAY)
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+  }
+
+  @Test
+  public void testSingleValueDimensionAsStringArrayWithoutExpression()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage("java.lang.String cannot be cast to [Ljava.util.Objects");
+    }
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setDimensions(
+            new DefaultDimensionSpec("placement", "alias", ColumnType.STRING_ARRAY)
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    List<ResultRow> expectedResults = ImmutableList.of(
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred"),
+            "rows",
+            26L,
+            "idx",
+            12446L
+        ));
+    TestHelper.assertExpectedObjects(
+        expectedResults,
+        results,
+        "single-value-dims-groupby-arrays-as-string-arrays"
+    );
+  }
+
+
+  @Test
+  public void testNumericDimAsStringArrayWithoutExpression()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage("java.lang.Double cannot be cast to [Ljava.util.Objects");
+    }
+
+    cannotVectorize();
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setDimensions(
+            new DefaultDimensionSpec("index", "alias", ColumnType.STRING_ARRAY)
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+  }
+
+
+  @Test
+  public void testMultiValueVirtualDimAsString()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage("java.lang.Double cannot be cast to [Ljava.util.Objects");
+    }
+
+    cannotVectorize();
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placementish)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("vo", "alias", ColumnType.STRING)
+        )
+        .setDimensions(
+            new DefaultDimensionSpec("index", "alias", ColumnType.STRING_ARRAY)
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+  }
+
+  @Test
+  public void testExtractionStringSpecWithMultiValueVirtualDimAsInput()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 does not support dimension selectors with unknown cardinality");
+    }
+    cannotVectorize();
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placementish)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new ExtractionDimensionSpec("v0", "alias", ColumnType.STRING,
+                                        new SubstringDimExtractionFn(1, 1)
+            )
+        )
+
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            null,
+            "rows",
+            26L,
+            "idx",
+            12446L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            "r",
+            "rows",
+            26L,
+            "idx",
+            12446L
+        )
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(
+        expectedResults,
+        results,
+        "multi-value-extraction-spec-as-string-dim-groupby-arrays"
+    );
+  }
+
+
+  @Test
+  public void testExtractionStringArraySpecWithMultiValueVirtualDimAsInput()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage("Not supported for multi-value dimensions");
+    }
+
+    cannotVectorize();
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placementish)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new ExtractionDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY,
+                                        new SubstringDimExtractionFn(1, 1)
+            )
+        )
+
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+  }
+
+  @Test
+  public void testVirtualColumnNumericTypeAsStringArray()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage(
+          "org.apache.druid.segment.data.ComparableList cannot be cast to [Ljava.util.Objects");
+    }
+
+    cannotVectorize();
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "array(index)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY
+            )
+        )
+
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT)
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+  }
+
+  @Test
+  public void testNestedGroupByWithArrays()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else {
+      expectedException.expect(IAE.class);
+      expectedException.expectMessage("Cannot create query type helper from invalid type [ARRAY<STRING>]");

Review comment:
       why doesn't this query work?

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/BaseCalciteQueryTest.java
##########
@@ -138,6 +138,12 @@ public static void setupNullValues()
     NULL_STRING = NullHandling.defaultStringValue();
     NULL_FLOAT = NullHandling.defaultFloatValue();
     NULL_LONG = NullHandling.defaultLongValue();
+    NULL_LIST = new ArrayList<Object>()
+    {
+      {
+        add(null);
+      }
+    };

Review comment:
       hmm, I know this is because of the coercion behavior in `ExprEval`, but I'm having some regret about the decision to homogenize `null`, `[]`, and `[null]` to `[null]`. That made the most sense when mapping back to the `STRING` type, but the coercion logic should probably be different when we are grouping on a multi-value string as an ARRAY<STRING> instead of a STRING, and in that case the input bindings should homogenize the value to `null` instead.
   
   We don't need to change this right now, since this PR still isn't documenting the functionality, but it's something to think about for the future: `[null]` is a strange value on the result side once results are left as actual arrays.

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryQueryToolChest.java
##########
@@ -432,7 +432,6 @@ public void serialize(
         }
       }
     };
-

Review comment:
       nit: unnecessary change

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -404,6 +403,7 @@ private static Grouping computeGrouping(
 
       final RelDataType dataType = rexNode.getType();
       final ColumnType outputType = Calcites.getColumnTypeForRelDataType(dataType);
+

Review comment:
       nit: unnecessary changes

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
##########
@@ -1640,13 +1834,65 @@ public void testArrayAggAsArrayFromJoin() throws Exception
   public void testArrayAggGroupByArrayAggFromSubquery() throws Exception
   {
     cannotVectorize();
-    // yo, can't group on array types right now so expect failure
-    expectedException.expect(RuntimeException.class);
-    expectedException.expectMessage("Cannot create query type helper from invalid type [ARRAY<STRING>]");

Review comment:
       :rocket:

##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
##########
@@ -1306,6 +1309,531 @@ public void testMultiValueDimension()
     TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim");
   }
 
+  @Test
+  public void testMultiValueDimensionAsArray()
+  {
+    // array types don't work with group by v1
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage(
+          "GroupBy v1 only supports dimensions with an outputType of STRING.");
+    }
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placementish)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY))
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("a", "preferred"), "rows", 2L, "idx", 282L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("b", "preferred"), "rows", 2L, "idx", 230L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("e", "preferred"), "rows", 2L, "idx", 324L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("h", "preferred"), "rows", 2L, "idx", 233L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("m", "preferred"), "rows", 6L, "idx", 5317L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("n", "preferred"), "rows", 2L, "idx", 235L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("p", "preferred"), "rows", 6L, "idx", 5405L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("preferred", "t"), "rows", 4L, "idx", 420L)
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim-groupby-arrays");
+  }
+
+  @Test
+  public void testSingleValueDimensionAsArray()
+  {
+    // array types don't work with group by v1
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage(
+          "GroupBy v1 only supports dimensions with an outputType of STRING");
+    }
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placement)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY))
+        .setAggregatorSpecs(
+            QueryRunnerTestHelper.ROWS_COUNT,
+            new LongSumAggregatorFactory("idx", "index")
+        )
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = ImmutableList.of(
+        makeRow(query, "2011-04-01", "alias",
+                ComparableStringArray.of("preferred"), "rows", 26L, "idx", 12446L
+        )
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim-groupby-arrays");
+  }
+
+  @Test
+  public void testMultiValueDimensionAsArrayWithOtherDims()
+  {
+    // array types don't work with group by v1
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage(
+          "GroupBy v1 only supports dimensions with an outputType of STRING");
+    }
+
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn(
+            "v0",
+            "mv_to_array(placementish)",
+            ColumnType.STRING_ARRAY,
+            ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY),
+            new DefaultDimensionSpec("quality", "quality")
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("a", "preferred"),
+            "quality",
+            "automotive",
+            "rows",
+            2L,
+            "idx",
+            282L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("b", "preferred"),
+            "quality",
+            "business",
+            "rows",
+            2L,
+            "idx",
+            230L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("e", "preferred"),
+            "quality",
+            "entertainment",
+            "rows",
+            2L,
+            "idx",
+            324L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("h", "preferred"),
+            "quality",
+            "health",
+            "rows",
+            2L,
+            "idx",
+            233L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("m", "preferred"),
+            "quality",
+            "mezzanine",
+            "rows",
+            6L,
+            "idx",
+            5317L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("n", "preferred"),
+            "quality",
+            "news",
+            "rows",
+            2L,
+            "idx",
+            235L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("p", "preferred"),
+            "quality",
+            "premium",
+            "rows",
+            6L,
+            "idx",
+            5405L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred", "t"),
+            "quality",
+            "technology",
+            "rows",
+            2L,
+            "idx",
+            175L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred", "t"),
+            "quality",
+            "travel",
+            "rows",
+            2L,
+            "idx",
+            245L
+        )
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dims-groupby-arrays");
+  }
+
+  @Test
+  public void testMultiValueDimensionAsStringArrayWithoutExpression()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage("Not supported for multi-value dimensions");
+    }
+
+    cannotVectorize();
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setDimensions(
+            new DefaultDimensionSpec("placementish", "alias", ColumnType.STRING_ARRAY)
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+  }
+
+  @Test
+  public void testSingleValueDimensionAsStringArrayWithoutExpression()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage("java.lang.String cannot be cast to [Ljava.util.Objects");
+    }
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setDimensions(
+            new DefaultDimensionSpec("placement", "alias", ColumnType.STRING_ARRAY)
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    List<ResultRow> expectedResults = ImmutableList.of(
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred"),
+            "rows",
+            26L,
+            "idx",
+            12446L
+        ));
+    TestHelper.assertExpectedObjects(
+        expectedResults,
+        results,
+        "single-value-dims-groupby-arrays-as-string-arrays"
+    );
+  }
+
+
+  @Test
+  public void testNumericDimAsStringArrayWithoutExpression()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      expectedException.expect(UnsupportedOperationException.class);
+      expectedException.expectMessage("GroupBy v1 only supports dimensions with an outputType of STRING");
+    } else if (!vectorize) {
+      expectedException.expect(RuntimeException.class);
+      expectedException.expectMessage("java.lang.Double cannot be cast to [Ljava.util.Objects");

Review comment:
       I wonder if we should have tighter validation that the dimension spec matches the column type, if we intend to treat the dimension spec output type as a sort of cast, instead of letting it hit raw class cast exceptions.
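
   As an illustration of the kind of validation being suggested here (the method name, where it would live, and the exact checks below are guesses, not part of this patch):

   ```java
   // Purely illustrative sketch, not part of the patch: reject a dimension spec whose
   // declared output type cannot plausibly be produced from the underlying column,
   // instead of letting a ClassCastException surface later during grouping.
   static void validateDimensionSpecOutputType(DimensionSpec spec, @Nullable ColumnCapabilities capabilities)
   {
     if (capabilities == null) {
       return; // unknown column; nothing to validate against
     }
     final ColumnType outputType = spec.getOutputType();
     if (ColumnType.STRING_ARRAY.equals(outputType)
         && capabilities.getType() != ValueType.STRING
         && capabilities.getType() != ValueType.ARRAY) {
       throw new IAE(
           "Cannot group on column [%s] of type [%s] with dimension output type [%s]",
           spec.getDimension(),
           capabilities.getType(),
           outputType
       );
     }
   }
   ```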





[GitHub] [druid] cryptoe commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780629673



##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java
##########
@@ -340,7 +348,8 @@ public static boolean isAllSingleValueDims(
 
               // Now check column capabilities, which must be present and explicitly not multi-valued
               final ColumnCapabilities columnCapabilities = inspector.getColumnCapabilities(dimension.getDimension());
-              return columnCapabilities != null && columnCapabilities.hasMultipleValues().isFalse();
+              return dimension.getOutputType().equals(ColumnType.STRING_ARRAY)

Review comment:
       Done





[GitHub] [druid] gianm commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780107136



##########
File path: processing/src/main/java/org/apache/druid/segment/data/ComparableIntArray.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonValue;
+import com.google.common.hash.Hashing;
+
+import java.util.Arrays;
+
+public class ComparableIntArray implements Comparable<ComparableIntArray>

Review comment:
       It'd be good to have unit tests specifically for this class. Especially for hashCode, equals, and compareTo.
   
   Also I suggest implementing toString, since it'll make unit tests & debugging easier.
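
   For example, the toString could be as simple as the sketch below (assuming the class wraps a plain int[]; the field name `delegate` is a guess, not taken from the patch, and `java.util.Arrays` is already imported in this file):

   ```java
   // Sketch only: delegate is assumed to be the wrapped int[] backing this class.
   @Override
   public String toString()
   {
     return Arrays.toString(delegate);
   }
   ```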

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java
##########
@@ -340,7 +348,8 @@ public static boolean isAllSingleValueDims(
 
               // Now check column capabilities, which must be present and explicitly not multi-valued
               final ColumnCapabilities columnCapabilities = inspector.getColumnCapabilities(dimension.getDimension());
-              return columnCapabilities != null && columnCapabilities.hasMultipleValues().isFalse();
+              return dimension.getOutputType().equals(ColumnType.STRING_ARRAY)

Review comment:
       This is confusing; it's changing the meaning of the function away from its name. It's called `isAllSingleValueDims`, but it'll return true if some columns are STRING_ARRAY? Seems weird.
   
   How about renaming the function `hasNoExplodingDimensions` and updating the javadocs accordingly?
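
   For illustration, the renamed check might read something like the sketch below (not the actual patch; the helper-method extraction and its name are made up, and it assumes the `dimension` and `columnCapabilities` variables from the surrounding method):

   ```java
   // Sketch only, not the actual patch: the per-dimension check behind the suggested
   // hasNoExplodingDimensions() name. A dimension whose output type is STRING_ARRAY is
   // grouped as a single array-valued key, so it does not "explode" a row into multiple
   // grouping rows even when the underlying column is multi-valued.
   private static boolean dimensionDoesNotExplode(DimensionSpec dimension, @Nullable ColumnCapabilities columnCapabilities)
   {
     return dimension.getOutputType().equals(ColumnType.STRING_ARRAY)
            || (columnCapabilities != null && columnCapabilities.hasMultipleValues().isFalse());
   }
   ```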

##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
##########
@@ -1306,6 +1308,321 @@ public void testMultiValueDimension()
     TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim");
   }
 
+  @Test
+  public void testMultiValueDimensionAsArray()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      return;
+    }
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn("v0", "array(placementish)", ColumnType.STRING_ARRAY,
+                                                       ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY))
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("a", "preferred"), "rows", 2L, "idx", 282L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("b", "preferred"), "rows", 2L, "idx", 230L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("e", "preferred"), "rows", 2L, "idx", 324L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("h", "preferred"), "rows", 2L, "idx", 233L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("m", "preferred"), "rows", 6L, "idx", 5317L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("n", "preferred"), "rows", 2L, "idx", 235L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("p", "preferred"), "rows", 6L, "idx", 5405L),
+        makeRow(query, "2011-04-01", "alias", ComparableStringArray.of("preferred", "t"), "rows", 4L, "idx", 420L)
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim-groupby-arrays");
+  }
+
+  @Test
+  public void testMultiValueDimensionAsArrayWithOtherDims()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      return;
+    }
+
+    // Cannot vectorize due to multi-value dimensions.
+    cannotVectorize();
+
+    GroupByQuery query = makeQueryBuilder()
+        .setDataSource(QueryRunnerTestHelper.DATA_SOURCE)
+        .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
+        .setVirtualColumns(new ExpressionVirtualColumn("v0", "array(placementish)", ColumnType.STRING_ARRAY,
+                                                       ExprMacroTable.nil()
+        ))
+        .setDimensions(
+            new DefaultDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY),
+            new DefaultDimensionSpec("quality", "quality")
+        )
+        .setAggregatorSpecs(QueryRunnerTestHelper.ROWS_COUNT, new LongSumAggregatorFactory("idx", "index"))
+        .setGranularity(QueryRunnerTestHelper.ALL_GRAN)
+        .build();
+
+    List<ResultRow> expectedResults = Arrays.asList(
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("a", "preferred"),
+            "quality",
+            "automotive",
+            "rows",
+            2L,
+            "idx",
+            282L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("b", "preferred"),
+            "quality",
+            "business",
+            "rows",
+            2L,
+            "idx",
+            230L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("e", "preferred"),
+            "quality",
+            "entertainment",
+            "rows",
+            2L,
+            "idx",
+            324L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("h", "preferred"),
+            "quality",
+            "health",
+            "rows",
+            2L,
+            "idx",
+            233L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("m", "preferred"),
+            "quality",
+            "mezzanine",
+            "rows",
+            6L,
+            "idx",
+            5317L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("n", "preferred"),
+            "quality",
+            "news",
+            "rows",
+            2L,
+            "idx",
+            235L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("p", "preferred"),
+            "quality",
+            "premium",
+            "rows",
+            6L,
+            "idx",
+            5405L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred", "t"),
+            "quality",
+            "technology",
+            "rows",
+            2L,
+            "idx",
+            175L
+        ),
+        makeRow(
+            query,
+            "2011-04-01",
+            "alias",
+            ComparableStringArray.of("preferred", "t"),
+            "quality",
+            "travel",
+            "rows",
+            2L,
+            "idx",
+            245L
+        )
+    );
+
+    Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
+    TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dims-groupby-arrays");
+  }
+
+  @Test
+  public void testMultiValueDimensionAsArrayWithVirtualDim()

Review comment:
       Seems like the same test as testMultiValueDimensionAsArrayWithOtherDims.

##########
File path: processing/src/main/java/org/apache/druid/segment/data/ComparableStringArray.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonValue;
+import com.google.common.hash.Hashing;
+
+import java.util.Arrays;
+
+public class ComparableStringArray implements Comparable<ComparableStringArray>

Review comment:
       It'd be good to have unit tests specifically for this class. Especially for hashCode, equals, and compareTo.
   
   Also I suggest implementing toString, since it'll make unit tests & debugging easier.
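
   As a concrete illustration, a minimal test along the requested lines might look like the sketch below. It is only a sketch: the test class name is made up, and it assumes `ComparableStringArray.of(String...)` (used elsewhere in this PR) plus plain JUnit 4 assertions.

   ```java
   // Minimal sketch of the kind of unit test being requested; not part of the patch.
   import org.apache.druid.segment.data.ComparableStringArray;
   import org.junit.Assert;
   import org.junit.Test;

   public class ComparableStringArrayTest
   {
     @Test
     public void testEqualsHashCodeAndCompareTo()
     {
       final ComparableStringArray ab = ComparableStringArray.of("a", "b");
       final ComparableStringArray ab2 = ComparableStringArray.of("a", "b");
       final ComparableStringArray ac = ComparableStringArray.of("a", "c");

       // equals/hashCode must agree so the arrays behave correctly as grouping keys
       Assert.assertEquals(ab, ab2);
       Assert.assertEquals(ab.hashCode(), ab2.hashCode());
       Assert.assertNotEquals(ab, ac);

       // compareTo should be consistent with equals and order element-wise
       Assert.assertEquals(0, ab.compareTo(ab2));
       Assert.assertTrue(ab.compareTo(ac) < 0);
       Assert.assertTrue(ac.compareTo(ab) > 0);
     }
   }
   ```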

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java
##########
@@ -1426,6 +1464,186 @@ private RowBasedKeySerdeHelper makeNumericSerdeHelper(
       }
     }
 
+    private class ListRowBasedKeySerdeHelper implements RowBasedKeySerdeHelper
+    {
+      final int keyBufferPosition;
+      final BufferComparator bufferComparator;
+
+      public ListRowBasedKeySerdeHelper(
+          int keyBufferPosition,
+          @Nullable StringComparator stringComparator
+      )
+      {
+        this.keyBufferPosition = keyBufferPosition;
+        final StringComparator comparator = stringComparator == null
+                                            ? StringComparators.LEXICOGRAPHIC
+                                            : stringComparator;
+
+        this.bufferComparator = (lhsBuffer, rhsBuffer, lhsPosition, rhsPosition) -> {
+          final ComparableList lhsComparableArray = listDictionary.get(lhsBuffer.getInt(lhsPosition
+                                                                                        + keyBufferPosition));
+          final ComparableList rhsComparableArray = listDictionary.get(rhsBuffer.getInt(rhsPosition
+                                                                                        + keyBufferPosition));
+          if (lhsComparableArray == null && rhsComparableArray == null) {
+            return 0;
+          } else if (lhsComparableArray == null) {
+            return -1;
+          } else if (rhsComparableArray == null) {
+            return 1;
+          }
+
+          List lhs = lhsComparableArray.getDelegate();
+          List rhs = rhsComparableArray.getDelegate();
+
+          int minLength = Math.min(lhs.size(), rhs.size());
+
+          //noinspection ArrayEquality
+          if (lhs == rhs) {
+            return 0;
+          }
+          for (int i = 0; i < minLength; i++) {
+            final int cmp = comparator.compare(String.valueOf(lhs.get(i)), String.valueOf(rhs.get(i)));
+            if (cmp == 0) {
+              continue;
+            }
+            return cmp;
+          }
+          if (lhs.size() == rhs.size()) {
+            return 0;
+          } else if (lhs.size() < rhs.size()) {
+            return -1;
+          }
+          return 1;
+        };
+      }
+
+      @Override
+      public int getKeyBufferValueSize()
+      {
+        return Integer.BYTES;
+      }
+
+      @Override
+      public boolean putToKeyBuffer(RowBasedKey key, int idx)
+      {
+        final ComparableList comparableList = (ComparableList) key.getKey()[idx];
+        int id = reverseDictionary.getInt(comparableList);
+        if (id == UNKNOWN_DICTIONARY_ID) {
+          id = listDictionary.size();
+          reverseListDictionary.put(comparableList, id);
+          listDictionary.add(comparableList);
+        }
+        keyBuffer.putInt(id);
+        return true;
+      }
+
+      @Override
+      public void getFromByteBuffer(ByteBuffer buffer, int initialOffset, int dimValIdx, Comparable[] dimValues)
+      {
+        dimValues[dimValIdx] = listDictionary.get(buffer.getInt(initialOffset + keyBufferPosition));
+      }
+
+      @Override
+      public BufferComparator getBufferComparator()
+      {
+        return bufferComparator;
+      }
+    }
+    private class ArrayRowBasedKeySerdeHelper implements RowBasedKeySerdeHelper
+    {
+      final int keyBufferPosition;
+      final BufferComparator bufferComparator;
+
+      ArrayRowBasedKeySerdeHelper(
+          int keyBufferPosition,
+          @Nullable StringComparator stringComparator
+      )
+      {
+        //TODO: karan : add pushLimitDown ?

Review comment:
       Please either resolve or remove this TODO before committing, as appropriate.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
##########
@@ -128,27 +128,10 @@ public static String escapeStringLiteral(final String s)
   }
 
   /**
-   * Convert {@link RelDataType} to the most appropriate {@link ValueType}, coercing all ARRAY types to STRING (until
-   * the time is right and we are more comfortable handling Druid ARRAY types in all parts of the engine).
-   *
-   * Callers who are not scared of ARRAY types should isntead call {@link #getValueTypeForRelDataTypeFull(RelDataType)},
-   * which returns the most accurate conversion of {@link RelDataType} to {@link ValueType}.
+   * Convert {@link RelDataType} to the most appropriate {@link ValueType}}.
    */
   @Nullable
   public static ColumnType getColumnTypeForRelDataType(final RelDataType type)

Review comment:
       I take it that no callers are "scared of ARRAY types" anymore.
   
   1. What are the main risks of making this change, & are they mitigated by testing?
   2. Do we need a feature flag?

##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##########
@@ -376,11 +374,58 @@ public static Float convertObjectToFloat(@Nullable Object valObj, boolean report
         return convertObjectToDouble(obj, reportParseExceptions);
       case STRING:
         return convertObjectToString(obj);
+      case ARRAY:
+        switch (type.getElementType().getType()) {
+          case STRING:
+            return convertToComparableStringArray(obj);
+          default:
+            return convertToList(obj);
+        }
+
       default:
         throw new IAE("Type[%s] is not supported for dimensions!", type);
     }
   }
 
+  private static ComparableList convertToList(Object obj)
+  {
+    if (obj == null) {
+      return null;
+    }
+    if (obj instanceof List) {
+      return new ComparableList((List) obj);
+    }
+    if (obj instanceof ComparableList) {
+      return (ComparableList) obj;
+    }
+    throw new ISE("Unable to convert type %s to %s", obj.getClass().getName(), ComparableList.class.getName());
+  }
+
+
+  @Nullable
+  public static ComparableStringArray convertToComparableStringArray(Object obj)
+  {
+    if (obj == null) {
+      return null;
+    }
+    if (obj instanceof ComparableStringArray) {
+      return (ComparableStringArray) obj;
+    }
+    if (obj instanceof String[]) {
+      return ComparableStringArray.of((String[]) obj);
+    }
+    // Jackson converts the serialized array into a string. Converting it back to a string array

Review comment:
       Do you mean "Jackson converts the serialized array into a list"?
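
   The diff hunk is cut off below that comment; as a rough illustration only (a guess at the shape, not the actual patch code), the List-handling branch might look like:

   ```java
   // Hypothetical sketch of how the List case might be handled inside
   // convertToComparableStringArray; treat this as illustration only.
   if (obj instanceof List) {
     final List<?> list = (List<?>) obj;
     final String[] strings = new String[list.size()];
     for (int i = 0; i < list.size(); i++) {
       final Object element = list.get(i);
       strings[i] = element == null ? null : String.valueOf(element);
     }
     return ComparableStringArray.of(strings);
   }
   throw new ISE("Unable to convert type %s to %s", obj.getClass().getName(), ComparableStringArray.class.getName());
   ```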

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java
##########
@@ -403,6 +412,20 @@ public GroupByColumnSelectorStrategy makeColumnSelectorStrategy(
           return makeNullableNumericStrategy(new FloatGroupByColumnSelectorStrategy());
         case DOUBLE:
           return makeNullableNumericStrategy(new DoubleGroupByColumnSelectorStrategy());
+        case ARRAY:
+          switch (capabilities.getElementType().getType()) {
+            case LONG:
+              return new ListLongGroupByColumnSelectorStrategy();

Review comment:
       Any reason why it's ListLong but ArrayString?

##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
##########
@@ -1306,6 +1308,321 @@ public void testMultiValueDimension()
     TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim");
   }
 
+  @Test
+  public void testMultiValueDimensionAsArray()

Review comment:
       Please include other tests for:
   
   - Dimension type STRING_ARRAY on an underlying column of type STRING that happens to be single-valued, like `placement`
   - Dimension type STRING_ARRAY on an underlying column of type STRING that happens to be multi-valued, like `placementish`
   - Dimension type STRING_ARRAY on an underlying column of numeric type, like `index`
   - Dimension type STRING on an underlying column of type STRING_ARRAY (such as a virtual column)
   - An extraction dimension spec of type STRING on a column of type STRING_ARRAY, like `new ExtractionDimensionSpec("v0", "alias", ColumnType.STRING, new SubstringDimExtractionFn(1, 1))`
   - An extraction dimension spec of type STRING_ARRAY on a column of type STRING_ARRAY, like `new ExtractionDimensionSpec("v0", "alias", ColumnType.STRING_ARRAY, new SubstringDimExtractionFn(1, 1))`
   - A virtual column that builds an array from a numeric type, like `array(index)`, and declares its output type as STRING_ARRAY
   - A nested groupBy query, where the inner and outer queries both have dimensions of type STRING_ARRAY
   
   If any of these aren't supposed to work, use `expectedException` to verify that the exception message is good. Otherwise, verify that the results are what you expect to see.

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
##########
@@ -923,6 +876,277 @@ public void testArrayOffset() throws Exception
     );
   }
 
+
+  @Test
+  public void testArrayGroupAsArray() throws Exception
+  {
+    // Cannot vectorize as we do not have support in the native query subsystem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[dim3], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"dim3\")",
+                            ColumnType.STRING_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.STRING_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{
+                new ArrayList<String>()
+                {
+                  {
+                    add(null);
+                  }
+                }, 3L
+            },
+            new Object[]{ImmutableList.of("a", "b"), 1L},
+            new Object[]{ImmutableList.of("b", "c"), 1L},
+            new Object[]{ImmutableList.of("d"), 1L}
+        )
+    );
+  }
+
+  @Test
+  public void testArrayGroupAsLongArray() throws Exception
+  {
+    // Cannot vectorize as we do not have support in the native query subsystem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[l1], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"l1\")",
+                            ColumnType.LONG_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.LONG_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{ImmutableList.of(0L), 4L},
+            new Object[]{ImmutableList.of(325323L), 1L},
+            new Object[]{ImmutableList.of(7L), 1L}
+        )
+    );
+  }
+
+
+  @Test
+  public void testArrayGroupAsDoubleArray() throws Exception
+  {
+    // Cannot vectorize as we do not have support in the native query subsystem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[d1], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"d1\")",
+                            ColumnType.DOUBLE_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.DOUBLE_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{ImmutableList.of(0.0), 4L},
+            new Object[]{ImmutableList.of(1.0), 1L},
+            new Object[]{ImmutableList.of(1.7), 1L}
+        )
+    );
+  }
+
+  @Test
+  public void testArrayGroupAsFloatArray() throws Exception
+  {
+    // Cannot vectorize as we do not have support in the native query subsystem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[f1], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"f1\")",
+                            ColumnType.DOUBLE_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.DOUBLE_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{ImmutableList.of(0.0), 4L},
+            new Object[]{ImmutableList.of(0.10000000149011612), 1L},
+            new Object[]{ImmutableList.of(1.0), 1L}
+        )
+    );
+  }
+
+  @Test

Review comment:
       Please include tests for the `ARRAY` constructor with multiple arguments, like:
   
   - two multi-valued columns: `ARRAY[dim3, dim3]`
   - multi-valued plus regular column: `ARRAY[dim3, dim1]`

##########
File path: processing/src/main/java/org/apache/druid/segment/data/ComparableList.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.fasterxml.jackson.annotation.JsonValue;
+
+import java.util.List;
+
+
+public class ComparableList<T extends Comparable> implements Comparable<ComparableList>

Review comment:
       It'd be good to have unit tests specifically for this class. Especially for hashCode, equals, and compareTo.
   
   Also I suggest implementing toString, since it'll make unit tests & debugging easier.

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
##########
@@ -143,54 +144,6 @@ public void testSelectNonConstantArrayExpressionFromTable() throws Exception
     );
   }
 
-  @Test
-  public void testSelectNonConstantArrayExpressionFromTableForMultival() throws Exception

Review comment:
       Why did you delete this test?

##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
##########
@@ -1306,6 +1308,321 @@ public void testMultiValueDimension()
     TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim");
   }
 
+  @Test
+  public void testMultiValueDimensionAsArray()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      return;

Review comment:
       What happens if run with v1? If it's an error, better to use `expectedException` here.
   
   Same comment for the other tests too.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -403,7 +402,9 @@ private static Grouping computeGrouping(
       }
 
       final RelDataType dataType = rexNode.getType();
-      final ColumnType outputType = Calcites.getColumnTypeForRelDataType(dataType);
+      final ColumnType outputType;
+      outputType = Calcites.getColumnTypeForRelDataType(dataType);

Review comment:
       Needless change?

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
##########
@@ -128,27 +128,10 @@ public static String escapeStringLiteral(final String s)
   }
 
   /**
-   * Convert {@link RelDataType} to the most appropriate {@link ValueType}, coercing all ARRAY types to STRING (until
-   * the time is right and we are more comfortable handling Druid ARRAY types in all parts of the engine).
-   *
-   * Callers who are not scared of ARRAY types should isntead call {@link #getValueTypeForRelDataTypeFull(RelDataType)},
-   * which returns the most accurate conversion of {@link RelDataType} to {@link ValueType}.
+   * Convert {@link RelDataType} to the most appropriate {@link ValueType}}.
    */
   @Nullable
   public static ColumnType getColumnTypeForRelDataType(final RelDataType type)

Review comment:
       As part of this patch, then, let's check all the MV functions to make sure the return types are what we expect them to be.
   
   As to the array stuff, one of them is documented: ARRAY_AGG. And you could do that in a subquery and group on the result. Some questions about that:
   
   - Do we have tests for the case of grouping on the result of an ARRAY_AGG? If not we should add it.
   - Does the behavior change or does this introduce excess risk due to potential bugs? If so, a feature flag would be a good idea.

##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -2994,8 +2994,14 @@ public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
       Object[] out = new Object[length];
 
       ExpressionType arrayType = null;
+
       for (int i = 0; i < length; i++) {
         ExprEval<?> evaluated = args.get(i).eval(bindings);
+        // short circuit the case where you have one input argument whose input type is array. So array function has

Review comment:
       Makes sense to me too. MV_TO_ARRAY would be simpler than ARRAY and have fewer cases to worry about. We can start there.





[GitHub] [druid] cryptoe edited a comment on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe edited a comment on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-997933103


   @dbardbar Good question. The goal here is to make the Druid SQL layer behave like other SQL layers. In traditional databases like Postgres, the SQL construct for declaring an array is `ARRAY`, and I am using that same construct to hook the SQL layer up to the native layer. I'm just finishing up some test cases for that.



[GitHub] [druid] cryptoe commented on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-998784656


   >     1. Would this work also for a VirtualColumn which produces a multi-value? Specifically, VirtualColumn of type "mv-filtered".
    Could you share a sample query here? I am trying to understand the use case: do you want to group by on array<array>?
    https://github.com/apache/druid/pull/12078/files#diff-8bc53aec44924e671b3c6f6f34c6f0c499f873d9000649c47c237444707aea4bR975 . This PR does support virtual columns.
   
   
   >     2. We needed the capability in this PR, but since it was missing we created another string dimension during ingestion, which holds the values of the MV column as a simple string. We then used that extra column in our GROUP BY, to achieve the same functionality as this PR. Do you think that your method would be comparable in performance?
   
    If the original string dimension is of low cardinality, then IMHO this implementation will be slightly more performant in terms of memory usage across the Historicals and Brokers, since we optimize the way lookups are done.
   
   



[GitHub] [druid] cryptoe commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r781753286



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
##########
@@ -128,27 +128,10 @@ public static String escapeStringLiteral(final String s)
   }
 
   /**
-   * Convert {@link RelDataType} to the most appropriate {@link ValueType}, coercing all ARRAY types to STRING (until
-   * the time is right and we are more comfortable handling Druid ARRAY types in all parts of the engine).
-   *
-   * Callers who are not scared of ARRAY types should isntead call {@link #getValueTypeForRelDataTypeFull(RelDataType)},
-   * which returns the most accurate conversion of {@link RelDataType} to {@link ValueType}.
+   * Convert {@link RelDataType} to the most appropriate {@link ValueType}}.
    */
   @Nullable
   public static ColumnType getColumnTypeForRelDataType(final RelDataType type)

Review comment:
       Adding a flag in the query context and deciding based on it may not be trivial, as the code would need extensive refactoring.
   For example, see `RowSignature#fromRelDataType`: https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/table/RowSignatures.java
   I am not sure what effect it would have on the query engine if we only changed the few places where the QueryContext is available, while the places where it is not available kept converting to arrays without honoring the flag. That would be especially confusing when the user has set the flag but only half of the code paths honor it.
   
   I am therefore adding a runtime property in `ExpressionProcessingConfig` instead.
   
   





[GitHub] [druid] clintropolis commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780203919



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
##########
@@ -128,27 +128,10 @@ public static String escapeStringLiteral(final String s)
   }
 
   /**
-   * Convert {@link RelDataType} to the most appropriate {@link ValueType}, coercing all ARRAY types to STRING (until
-   * the time is right and we are more comfortable handling Druid ARRAY types in all parts of the engine).
-   *
-   * Callers who are not scared of ARRAY types should isntead call {@link #getValueTypeForRelDataTypeFull(RelDataType)},
-   * which returns the most accurate conversion of {@link RelDataType} to {@link ValueType}.
+   * Convert {@link RelDataType} to the most appropriate {@link ValueType}}.
    */
   @Nullable
   public static ColumnType getColumnTypeForRelDataType(final RelDataType type)

Review comment:
       the engine has been waiting on array grouping for this coercion to go away, though there is a missing piece I think where we need to ensure that when grouping on a virtual column we never plan to a topn query in `DruidQuery`.
   
   I think most of the `MV_` prefixed functions have their return type as `VARCHAR` rather than `ARRAY`, so in theory the "documented" multi-value string functions shouldn't actually be affected, and any that are would be a bug we should fix. The undocumented `ARRAY_` prefixed functions will now behave as intended and validate/operate as true SQL arrays (and so group correctly as arrays). If anyone was using the undocumented `ARRAY_` prefixed functions, their behavior will change, since grouping on those expressions previously grouped as `STRING` due to the implicit unnest.
   
   The only reason I can think of to add a feature flag is if we need a way to hide bugs that this change causes, and that _should_ only happen if any of the `MV_` functions have bugs in their SQL output types, or if people were using the undocumented `ARRAY_` functions.





[GitHub] [druid] cryptoe commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r779335429



##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/column/ArrayGroupByColumnSelectorStrategy.java
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby.epinephelinae.column;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.BiMap;
+import com.google.common.collect.HashBiMap;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.groupby.ResultRow;
+import org.apache.druid.query.groupby.epinephelinae.Grouper;
+import org.apache.druid.query.ordering.StringComparator;
+import org.apache.druid.query.ordering.StringComparators;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.DimensionSelector;
+import org.apache.druid.segment.data.ComparableIntArray;
+import org.apache.druid.segment.data.ComparableStringArray;
+import org.apache.druid.segment.data.IndexedInts;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class ArrayGroupByColumnSelectorStrategy

Review comment:
       Done!!






[GitHub] [druid] cryptoe commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780778209



##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
##########
@@ -1306,6 +1308,321 @@ public void testMultiValueDimension()
     TestHelper.assertExpectedObjects(expectedResults, results, "multi-value-dim");
   }
 
+  @Test
+  public void testMultiValueDimensionAsArray()
+  {
+    if (config.getDefaultStrategy().equals(GroupByStrategySelector.STRATEGY_V1)) {
+      return;

Review comment:
       Added expected exception






[GitHub] [druid] gianm commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780521061



##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -2994,8 +2994,14 @@ public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
       Object[] out = new Object[length];
 
       ExpressionType arrayType = null;
+
       for (int i = 0; i < length; i++) {
         ExprEval<?> evaluated = args.get(i).eval(bindings);
+        // short circuit the case where you have one input argument whose input type is array. So array function has

Review comment:
       Makes sense to me too. MV_TO_ARRAY would be simpler than ARRAY and have fewer cases to worry about. We can start there.






[GitHub] [druid] cryptoe commented on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-1009957373


   Ordering on array grouping keys is not fully supported. I was hoping to finish that in a subsequent PR.  




[GitHub] [druid] FrankChen021 commented on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-1000730228


   @cryptoe It's my fault for misunderstanding your design. Please just ignore the comment I left before.




[GitHub] [druid] clintropolis commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r773583393



##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/column/ArrayGroupByColumnSelectorStrategy.java
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby.epinephelinae.column;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.BiMap;
+import com.google.common.collect.HashBiMap;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.groupby.ResultRow;
+import org.apache.druid.query.groupby.epinephelinae.Grouper;
+import org.apache.druid.query.ordering.StringComparator;
+import org.apache.druid.query.ordering.StringComparators;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.DimensionSelector;
+import org.apache.druid.segment.data.ComparableIntArray;
+import org.apache.druid.segment.data.ComparableStringArray;
+import org.apache.druid.segment.data.IndexedInts;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class ArrayGroupByColumnSelectorStrategy

Review comment:
       this class only handles strings, so its name is perhaps incorrect, or it should be made more generic to handle other types of arrays

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/column/ArrayGroupByColumnSelectorStrategy.java
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby.epinephelinae.column;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.BiMap;
+import com.google.common.collect.HashBiMap;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.groupby.ResultRow;
+import org.apache.druid.query.groupby.epinephelinae.Grouper;
+import org.apache.druid.query.ordering.StringComparator;
+import org.apache.druid.query.ordering.StringComparators;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.DimensionSelector;
+import org.apache.druid.segment.data.ComparableIntArray;
+import org.apache.druid.segment.data.ComparableStringArray;
+import org.apache.druid.segment.data.IndexedInts;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class ArrayGroupByColumnSelectorStrategy
+    implements GroupByColumnSelectorStrategy
+{
+  private static final int GROUP_BY_MISSING_VALUE = -1;
+
+
+  // contains string <-> id for each element of the multi value grouping column
+  // for eg : [a,b,c] is the col value. dictionaryToInt will contain { a <-> 1, b <-> 2, c <-> 3}
+  private final BiMap<String, Integer> dictionaryToInt;
+
+  // stores each row as a integer array where the int represents the value in dictionaryToInt
+  // for eg : [a,b,c] would be converted to [1,2,3] and assigned a integer value 1.
+  // [1,2,3] <-> 1
+  private final BiMap<ComparableIntArray, Integer> intListToInt;
+
+  @Override
+  public int getGroupingKeySize()
+  {
+    return Integer.BYTES;
+  }
+
+  public ArrayGroupByColumnSelectorStrategy()
+  {
+    dictionaryToInt = HashBiMap.create();
+    intListToInt = HashBiMap.create();
+  }
+
+  @VisibleForTesting
+  ArrayGroupByColumnSelectorStrategy(
+      BiMap<String, Integer> dictionaryToInt,
+      BiMap<ComparableIntArray, Integer> intArrayToInt
+  )
+  {
+    this.dictionaryToInt = dictionaryToInt;
+    this.intListToInt = intArrayToInt;
+  }
+
+  @Override
+  public void processValueFromGroupingKey(
+      GroupByColumnSelectorPlus selectorPlus,
+      ByteBuffer key,
+      ResultRow resultRow,
+      int keyBufferPosition
+  )
+  {
+    final int id = key.getInt(keyBufferPosition);
+
+    // GROUP_BY_MISSING_VALUE is used to indicate empty rows
+    if (id != GROUP_BY_MISSING_VALUE) {
+      final int[] intRepresentation = intListToInt.inverse()
+                                                  .get(id).getDelegate();
+      final String[] stringRepresentaion = new String[intRepresentation.length];
+      for (int i = 0; i < intRepresentation.length; i++) {
+        stringRepresentaion[i] = dictionaryToInt.inverse().get(intRepresentation[i]);
+      }
+      resultRow.set(selectorPlus.getResultRowPosition(), ComparableStringArray.of(stringRepresentaion));
+    } else {
+      resultRow.set(selectorPlus.getResultRowPosition(), ComparableStringArray.of(NullHandling.defaultStringValues()));

Review comment:
       why wouldn't we use null for null? it seems incorrect to use an empty array here, and comparators down the line seem to put nulls first, so having nulls seems like it would be ok. am i missing something?

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/ResultRowDeserializer.java
##########
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby;
+
+import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.core.JsonToken;
+import com.fasterxml.jackson.databind.DeserializationContext;
+import com.fasterxml.jackson.databind.JsonDeserializer;
+import org.apache.druid.data.input.Row;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.data.ComparableStringArray;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+
+public class ResultRowDeserializer extends JsonDeserializer<ResultRow>
+{
+  final List<ColumnType> types;
+  final GroupByQuery query;
+
+  public ResultRowDeserializer(final List<ColumnType> types, final GroupByQuery query)
+  {
+    this.types = types;
+    this.query = query;
+  }
+
+  public static ResultRowDeserializer fromQuery(
+      final GroupByQuery query
+  )
+  {
+    RowSignature rowSignature = query.getResultRowSignature();
+    final List<ColumnType> types = new ArrayList<>(rowSignature.size());
+
+    for (String name : rowSignature.getColumnNames()) {
+      final ColumnType type = rowSignature.getColumnType(name)
+                                          .orElseThrow(() -> new ISE("No type for column [%s]", name));
+
+      types.add(type);
+    }
+
+    return new ResultRowDeserializer(types, query);
+
+  }
+
+  @Override
+  public ResultRow deserialize(JsonParser jp, DeserializationContext ctxt) throws IOException
+  {
+    // Deserializer that can deserialize either array- or map-based rows.
+    if (jp.isExpectedStartObjectToken()) {
+      final Row row = jp.readValueAs(Row.class);
+      return ResultRow.fromLegacyRow(row, query);
+    } else if (jp.isExpectedStartArrayToken()) {
+      final Object[] retVal = new Object[types.size()];
+
+      for (int i = 0; i < types.size(); i++) {
+        final JsonToken token = jp.nextToken();
+        switch (types.get(i).getType()) {
+          case STRING:
+            if (token == JsonToken.VALUE_NULL) {
+              retVal[i] = null;
+            } else if (token == JsonToken.VALUE_STRING) {
+              retVal[i] = jp.getText();
+            } else {
+              throw ctxt.instantiationException(
+                  ResultRow.class,
+                  StringUtils.format("Unexpected token [%s] when reading string", token)
+              );
+            }
+            break;
+
+          case LONG:
+            retVal[i] = token == JsonToken.VALUE_NULL ? null : jp.getLongValue();
+            break;
+          case DOUBLE:
+            retVal[i] = token == JsonToken.VALUE_NULL ? null : jp.getDoubleValue();
+            break;
+          case FLOAT:
+            retVal[i] = token == JsonToken.VALUE_NULL ? null : jp.getFloatValue();
+            break;
+          case ARRAY:
+            if (types.get(i).equals(ColumnType.STRING_ARRAY)) {
+              final List<String> strings = new ArrayList<>();
+              while (jp.nextToken() != JsonToken.END_ARRAY) {
+                strings.add(jp.getText());
+              }
+              retVal[i] = ComparableStringArray.of(strings.toArray(new String[0]));
+              break;
+            }
+
+          default:
+            throw new ISE("Can't handle type [%s]", types.get(i).asTypeString());

Review comment:
       I'm a bit worried about this method. On the one hand it is nice because it preserves the types exactly, but what about complex types? They often serialize as base64 strings, but could register potentially anything with the mapper. And what about other types of arrays, including arrays of arrays? Additionally, the type is still allowed to be null if it is unknown, afaik.
    
    I'm also worried there might be performance implications here; this change should be measured if we really want to do it. I can't help but wonder whether something down the line should just wrap arrays to perform comparisons on them, lazily and only when comparisons are actually needed, instead of trying to do it here, and whether deserializing directly into `ComparableStringArray` will trip up other things down the line that expect to run into a `List`.
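    A minimal sketch of that lazy-wrapping alternative, assuming rows keep plain `List` values after deserialization and an element-wise comparator is only applied where a comparison is actually needed (names here are hypothetical, not classes in this PR):
    
    ```java
    import java.util.Comparator;
    import java.util.List;
    
    // Hypothetical helper: compare two deserialized array values lazily,
    // element by element, instead of eagerly wrapping them at deserialization time.
    class LazyListComparisons
    {
      static int compareStringLists(List<String> lhs, List<String> rhs, Comparator<String> comparator)
      {
        final int minLength = Math.min(lhs.size(), rhs.size());
        for (int i = 0; i < minLength; i++) {
          final int cmp = comparator.compare(lhs.get(i), rhs.get(i));
          if (cmp != 0) {
            return cmp;
          }
        }
        // when one list is a prefix of the other, the shorter list sorts first
        return Integer.compare(lhs.size(), rhs.size());
      }
    }
    ```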
   
   

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java
##########
@@ -1357,6 +1373,12 @@ private RowBasedKeySerdeHelper makeSerdeHelper(
     )
     {
       switch (valueType.getType()) {
+        case ARRAY:
+          return new ArrayRowBasedKeySerdeHelper(

Review comment:
       this doesn't look like it handles all arrays, though using the `TypeStrategy` added in #11888 would maybe give the necessary comparators to handle the other types of arrays, but `arrayDictionary` would need to be able to hold any type of array, not just strings.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/builtin/ArrayConstructorOperatorConversion.java
##########
@@ -45,6 +53,22 @@ public DruidExpression toDruidExpression(
       final RexNode rexNode
   )
   {
+    // Check if array needs to be unnested
+    if (plannerContext.getQueryContext()
+                      .getOrDefault(
+                          QueryContexts.ENABLE_UNNESTED_ARRAYS_KEY,
+                          QueryContexts.DEFAULT_ENABLE_UNNESTED_ARRAYS
+                      ).equals(Boolean.FALSE)) {
+      List<RexNode> nodes = ((RexCall) rexNode).getOperands();
+      Preconditions.checkArgument(
+          nodes == null || nodes.size() != 1,
+          "ARRAY[] should have exactly one argument"
+      );
+      if (nodes.get(0).getKind() == SqlKind.LITERAL) {
+        throw new UOE("ARRAY[] support for literals not implemented");
+      }
+      return Expressions.toDruidExpression(plannerContext, rowSignature, nodes.get(0));
+    }

Review comment:
       I don't think this needs a flag or any code change here if we allow arrays in SQL to remain arrays to the native layer (see other comment on consolidating `Calcites.getColumnTypeForRelDataType` and `Calcites.getValueTypeForRelDataTypeFull`)
   
    calling the array constructor on a multi-value column today doesn't work correctly anyway, because it actually attempts to map the array constructor function across each of the values of the multi-value string, resulting in a nested array of single-element string arrays instead of simply converting the multi-value string into a string array. So, changing that behavior doesn't seem problematic, since I'm not sure the mapping is very intuitive.
    
    Additionally, I think if this flag did exist and was set, it would break all other uses of the array constructor function, which _do_ work today.

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java
##########
@@ -386,13 +393,16 @@ public static boolean canPushDownLimit(ColumnSelectorFactory columnSelectorFacto
     @Override
     public GroupByColumnSelectorStrategy makeColumnSelectorStrategy(
         ColumnCapabilities capabilities,
-        ColumnValueSelector selector
+        ColumnValueSelector selector,
+        DimensionSpec dimensionSpec
     )
     {
       switch (capabilities.getType()) {
         case STRING:

Review comment:
       so, this PR doesn't _actually_ handle grouping on array typed columns yet, such as those produced by virtual columns today with array functions, which is sort of a bummer (and part of why, in a different comment, I'm pushing for a more generic implementation of the grouping strategy in the first pass). If you plan not to do this yet, the PR title and description should be modified to indicate that it allows grouping on multi-value `STRING` typed columns as arrays, or similar.

##########
File path: processing/src/main/java/org/apache/druid/query/dimension/ColumnSelectorStrategyFactory.java
##########
@@ -24,5 +24,9 @@
 
 public interface ColumnSelectorStrategyFactory<ColumnSelectorStrategyClass extends ColumnSelectorStrategy>
 {
-  ColumnSelectorStrategyClass makeColumnSelectorStrategy(ColumnCapabilities capabilities, ColumnValueSelector selector);
+  ColumnSelectorStrategyClass makeColumnSelectorStrategy(
+      ColumnCapabilities capabilities,
+      ColumnValueSelector selector,
+      DimensionSpec dimensionSpec

Review comment:
       it seems sort of strange to pass a `DimensionSpec` into this method; maybe the caller should modify the `ColumnCapabilities` instead of pushing this in?

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -403,7 +404,17 @@ private static Grouping computeGrouping(
       }
 
       final RelDataType dataType = rexNode.getType();
-      final ColumnType outputType = Calcites.getColumnTypeForRelDataType(dataType);
+      final ColumnType outputType;
+      if (plannerContext.getQueryContext()
+                        .getOrDefault(
+                            QueryContexts.ENABLE_UNNESTED_ARRAYS_KEY,
+                            QueryContexts.DEFAULT_ENABLE_UNNESTED_ARRAYS
+                        ).equals(Boolean.FALSE)) {
+        outputType = Calcites.getValueTypeForRelDataTypeFull(dataType);
+      } else {
+        outputType = Calcites.getColumnTypeForRelDataType(dataType);
+      }

Review comment:
       if grouping on arrays supported _all_ array types, the correct thing to do instead of a flag, I think, would just be to consolidate `getColumnTypeForRelDataType` and `getValueTypeForRelDataTypeFull` to remove the string coercion and let arrays stay as arrays. They are separate basically because we allow using array functions on string typed columns, but grouping on them requires that they remain string typed in the native layer; if that were no longer true, we wouldn't have to do this coercion any longer.
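    A rough sketch of what the consolidated conversion could look like for the ARRAY branch, assuming ARRAY `RelDataType`s map straight to the corresponding Druid array column types (illustrative only; nested arrays and some scalar types are glossed over, and this is not the exact code in Calcites.java):
    
    ```java
    import javax.annotation.Nullable;
    
    import org.apache.calcite.rel.type.RelDataType;
    import org.apache.calcite.sql.type.SqlTypeName;
    import org.apache.druid.segment.column.ColumnType;
    
    class CalcitesSketch
    {
      @Nullable
      static ColumnType getColumnTypeForRelDataType(final RelDataType type)
      {
        final SqlTypeName sqlTypeName = type.getSqlTypeName();
        if (sqlTypeName == SqlTypeName.ARRAY) {
          // let arrays stay arrays instead of coercing them to STRING
          final ColumnType elementType = getColumnTypeForRelDataType(type.getComponentType());
          if (ColumnType.STRING.equals(elementType)) {
            return ColumnType.STRING_ARRAY;
          } else if (ColumnType.LONG.equals(elementType)) {
            return ColumnType.LONG_ARRAY;
          } else if (ColumnType.DOUBLE.equals(elementType)) {
            return ColumnType.DOUBLE_ARRAY;
          }
          return null;
        } else if (SqlTypeName.CHAR_TYPES.contains(sqlTypeName)) {
          return ColumnType.STRING;
        } else if (SqlTypeName.INT_TYPES.contains(sqlTypeName)) {
          return ColumnType.LONG;
        } else if (SqlTypeName.APPROX_TYPES.contains(sqlTypeName)) {
          return ColumnType.DOUBLE;
        }
        // other scalar types (DECIMAL, TIMESTAMP, BOOLEAN, ...) would follow the existing rules
        return null;
      }
    }
    ```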

##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##########
@@ -376,11 +373,37 @@ public static Float convertObjectToFloat(@Nullable Object valObj, boolean report
         return convertObjectToDouble(obj, reportParseExceptions);
       case STRING:
         return convertObjectToString(obj);
+      case ARRAY:
+        return convertToArray(obj);

Review comment:
       Does this change mean that anything that deals with arrays going through here would need to know how to deal with `ComparableStringArray` instead of the actual arrays? Additionally, this method only seems to handle string arrays; it should either check that the element type is string and fail otherwise, or use a more generic type that can handle comparison of any type of array.

##########
File path: processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryQueryToolChestTest.java
##########
@@ -573,7 +572,7 @@ public void testResultSerde() throws Exception
         .setQuerySegmentSpec(QueryRunnerTestHelper.FIRST_TO_THIRD)
         .setDimensions(Collections.singletonList(DefaultDimensionSpec.of("test")))
         .setAggregatorSpecs(Collections.singletonList(QueryRunnerTestHelper.ROWS_COUNT))
-        .setPostAggregatorSpecs(Collections.singletonList(new ConstantPostAggregator("post", 10)))
+        .setPostAggregatorSpecs(Collections.singletonList(new ConstantPostAggregator("post", 10.0)))

Review comment:
       i assume these changes are from the serializer change?

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/column/ArrayGroupByColumnSelectorStrategy.java
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby.epinephelinae.column;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.BiMap;
+import com.google.common.collect.HashBiMap;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.groupby.ResultRow;
+import org.apache.druid.query.groupby.epinephelinae.Grouper;
+import org.apache.druid.query.ordering.StringComparator;
+import org.apache.druid.query.ordering.StringComparators;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.DimensionSelector;
+import org.apache.druid.segment.data.ComparableIntArray;
+import org.apache.druid.segment.data.ComparableStringArray;
+import org.apache.druid.segment.data.IndexedInts;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class ArrayGroupByColumnSelectorStrategy
+    implements GroupByColumnSelectorStrategy
+{
+  private static final int GROUP_BY_MISSING_VALUE = -1;
+
+
+  // contains string <-> id for each element of the multi value grouping column
+  // for eg : [a,b,c] is the col value. dictionaryToInt will contain { a <-> 1, b <-> 2, c <-> 3}
+  private final BiMap<String, Integer> dictionaryToInt;

Review comment:
       i wonder if maybe this is a premature optimization? I think we should consider making a more generic implementation first, one that just stores a lookup of int to full arrays instead of arrays of ints, and preferably one that can handle _all_ types of arrays. With that in hand, it would be a lot easier to measure various other optimizations against that baseline.
    
    Not to say that this isn't the best approach to handling arrays of strings; it might very well be. But I think there is also potentially a lot of overhead in this approach, since you'll need to be doing extra function calls, and it saves the most memory footprint when value cardinality is low, and to the largest extent only for arrays of variably sized types like strings. If all string values are unique, then it is potentially worse than just storing the string arrays directly.
    
    We also don't actually seem to be taking advantage of what I would think is the real situation in which to use this approach: the case where the underlying multi-value string column is already dictionary encoded, where we could use its dictionary ids directly instead of making a new lookup here. In that special case, when grouping on a multi-value string column from a segment as an array, I can maybe see it being advantageous to just convert the `IndexedInts` to an array of ints directly, so you can defer looking up the string value for the dictionary id until it is needed. But that _only_ works when the underlying type is dictionary encoded, which isn't always the case.
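    For reference, the more generic baseline could be as simple as a dictionary from whole arrays to ints, roughly like the sketch below (names are illustrative, not the classes in this PR); the int id is then what gets written into the grouping key buffer and looked up again when building the result row.
    
    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    
    // Hypothetical baseline: dictionary-encode entire grouping-key arrays of any
    // element type, rather than dictionary-encoding each element separately.
    class GenericArrayDictionary
    {
      private final List<Object[]> idToArray = new ArrayList<>();
      private final Map<List<Object>, Integer> arrayToId = new HashMap<>();
    
      int idFor(Object[] array)
      {
        // wrap as a List so equals/hashCode compare elements rather than identity
        final List<Object> key = Arrays.asList(array);
        Integer id = arrayToId.get(key);
        if (id == null) {
          id = idToArray.size();
          arrayToId.put(key, id);
          idToArray.add(array);
        }
        return id;
      }
    
      Object[] arrayFor(int id)
      {
        return idToArray.get(id);
      }
    }
    ```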






[GitHub] [druid] clintropolis commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780210083



##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
##########
@@ -143,54 +144,6 @@ public void testSelectNonConstantArrayExpressionFromTable() throws Exception
     );
   }
 
-  @Test
-  public void testSelectNonConstantArrayExpressionFromTableForMultival() throws Exception

Review comment:
       heh, i wrote this test originally to document moderately strange current behavior. This PR changes the behavior of the `array` constructor function to no longer "map" multi-value string inputs and instead treat them as arrays, so it should probably be repurposed to test whatever the new behavior ends up being.
    
    The map behavior was not very intuitive, since it would make a nested array where the elements are single-element arrays from the multi-value string, so idk that it needs to be preserved.
    
    We might want to consider making use of a multi-value string inside of the array constructor an error, since I'm not sure there is an intuitive behavior: it is ambiguous whether the user is acknowledging the column as multi-value, or doesn't know and thinks it is single valued. Most of the other functions don't have this ambiguity. It probably doesn't need to be changed in this PR, though.






[GitHub] [druid] clintropolis commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780232347



##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -2994,8 +2994,14 @@ public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
       Object[] out = new Object[length];
 
       ExpressionType arrayType = null;
+
       for (int i = 0; i < length; i++) {
         ExprEval<?> evaluated = args.get(i).eval(bindings);
+        // short circuit the case where you have one input argument whose input type is array. So array function has

Review comment:
       i have a concern with this change, best illustrated by the implication that now an expression like this
   
   ```
   array(array('x','y','z'))
   ```
   will now produce `['x','y','z']` instead of `[['x','y','z']]` as one would expect, or would happen if the expression were this instead:
   
   ```
   array(array('x','y','z'),array('a','b','c'))
   ```
   which will still produce `[['x','y','z'],['a','b','c']]`
   
   Instead of hijacking this function to accommodate multi-value string behavior, I think we should instead add a `MV_TO_ARRAY` function in SQL that only accepts a string, and then in the native expression layer either just use `cast(x, 'ARRAY<STRING>')` or also add a `mv_to_array` function that basically does the same thing as cast. This way we are explicit on when we intend to treat a multi-value column directly as an array.
   
   I think the changes to prevent the array constructor from being mapped for multi-value strings are fine to leave as is though, since the mapping is sort of strange
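    For illustration, the coercion `MV_TO_ARRAY` would perform is roughly the sketch below, assuming a multi-value string value surfaces as either a single `String` or a `List` of strings (hypothetical standalone code, not the PR's actual implementation). For example, a row whose multi-value column holds `["a","b"]` becomes the array `["a","b"]` rather than the `[["a"],["b"]]` that the mapped array constructor produces today, and the `array` constructor itself keeps its existing nesting semantics.
    
    ```java
    import java.util.List;
    
    import javax.annotation.Nullable;
    
    // Hypothetical sketch: turn a multi-value string value into a string array,
    // with no implicit per-element mapping and no extra nesting.
    class MvToArraySketch
    {
      @Nullable
      static String[] mvToArray(@Nullable Object multiValue)
      {
        if (multiValue == null) {
          return null;
        }
        if (multiValue instanceof String) {
          // a single-valued row becomes a one-element array
          return new String[]{(String) multiValue};
        }
        if (multiValue instanceof List) {
          final List<?> values = (List<?>) multiValue;
          final String[] out = new String[values.size()];
          for (int i = 0; i < values.size(); i++) {
            final Object value = values.get(i);
            out[i] = value == null ? null : value.toString();
          }
          return out;
        }
        throw new IllegalArgumentException("mv_to_array expects a string or a list of strings");
      }
    }
    ```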








[GitHub] [druid] cryptoe commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780362210



##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java
##########
@@ -403,6 +412,20 @@ public GroupByColumnSelectorStrategy makeColumnSelectorStrategy(
           return makeNullableNumericStrategy(new FloatGroupByColumnSelectorStrategy());
         case DOUBLE:
           return makeNullableNumericStrategy(new DoubleGroupByColumnSelectorStrategy());
+        case ARRAY:
+          switch (capabilities.getElementType().getType()) {
+            case LONG:
+              return new ListLongGroupByColumnSelectorStrategy();

Review comment:
       The reason was that ListLong internally works on a List as the underlying data structure, hence the name, but it does seem weird here.
    Changed everything to ArrayX, as I think that's a better name, signifying what the selector strategy is for.

##########
File path: processing/src/main/java/org/apache/druid/segment/data/ComparableIntArray.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonValue;
+import com.google.common.hash.Hashing;
+
+import java.util.Arrays;
+
+public class ComparableIntArray implements Comparable<ComparableIntArray>

Review comment:
       acked

##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##########
@@ -376,11 +374,58 @@ public static Float convertObjectToFloat(@Nullable Object valObj, boolean report
         return convertObjectToDouble(obj, reportParseExceptions);
       case STRING:
         return convertObjectToString(obj);
+      case ARRAY:
+        switch (type.getElementType().getType()) {
+          case STRING:
+            return convertToComparableStringArray(obj);
+          default:
+            return convertToList(obj);
+        }
+
       default:
         throw new IAE("Type[%s] is not supported for dimensions!", type);
     }
   }
 
+  private static ComparableList convertToList(Object obj)
+  {
+    if (obj == null) {
+      return null;
+    }
+    if (obj instanceof List) {
+      return new ComparableList((List) obj);
+    }
+    if (obj instanceof ComparableList) {
+      return (ComparableList) obj;
+    }
+    throw new ISE("Unable to convert type %s to %s", obj.getClass().getName(), ComparableList.class.getName());
+  }
+
+
+  @Nullable
+  public static ComparableStringArray convertToComparableStringArray(Object obj)
+  {
+    if (obj == null) {
+      return null;
+    }
+    if (obj instanceof ComparableStringArray) {
+      return (ComparableStringArray) obj;
+    }
+    if (obj instanceof String[]) {
+      return ComparableStringArray.of((String[]) obj);
+    }
+    // Jackson converts the serialized array into a string. Converting it back to a string array

Review comment:
       Thanks for the catch. 

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java
##########
@@ -1426,6 +1464,186 @@ private RowBasedKeySerdeHelper makeNumericSerdeHelper(
       }
     }
 
+    private class ListRowBasedKeySerdeHelper implements RowBasedKeySerdeHelper
+    {
+      final int keyBufferPosition;
+      final BufferComparator bufferComparator;
+
+      public ListRowBasedKeySerdeHelper(
+          int keyBufferPosition,
+          @Nullable StringComparator stringComparator
+      )
+      {
+        this.keyBufferPosition = keyBufferPosition;
+        final StringComparator comparator = stringComparator == null
+                                            ? StringComparators.LEXICOGRAPHIC
+                                            : stringComparator;
+
+        this.bufferComparator = (lhsBuffer, rhsBuffer, lhsPosition, rhsPosition) -> {
+          final ComparableList lhsComparableArray = listDictionary.get(lhsBuffer.getInt(lhsPosition
+                                                                                        + keyBufferPosition));
+          final ComparableList rhsComparableArray = listDictionary.get(rhsBuffer.getInt(rhsPosition
+                                                                                        + keyBufferPosition));
+          if (lhsComparableArray == null && rhsComparableArray == null) {
+            return 0;
+          } else if (lhsComparableArray == null) {
+            return -1;
+          } else if (rhsComparableArray == null) {
+            return 1;
+          }
+
+          List lhs = lhsComparableArray.getDelegate();
+          List rhs = rhsComparableArray.getDelegate();
+
+          int minLength = Math.min(lhs.size(), rhs.size());
+
+          //noinspection ArrayEquality
+          if (lhs == rhs) {
+            return 0;
+          }
+          for (int i = 0; i < minLength; i++) {
+            final int cmp = comparator.compare(String.valueOf(lhs.get(i)), String.valueOf(rhs.get(i)));
+            if (cmp == 0) {
+              continue;
+            }
+            return cmp;
+          }
+          if (lhs.size() == rhs.size()) {
+            return 0;
+          } else if (lhs.size() < rhs.size()) {
+            return -1;
+          }
+          return 1;
+        };
+      }
+
+      @Override
+      public int getKeyBufferValueSize()
+      {
+        return Integer.BYTES;
+      }
+
+      @Override
+      public boolean putToKeyBuffer(RowBasedKey key, int idx)
+      {
+        final ComparableList comparableList = (ComparableList) key.getKey()[idx];
+        int id = reverseDictionary.getInt(comparableList);
+        if (id == UNKNOWN_DICTIONARY_ID) {
+          id = listDictionary.size();
+          reverseListDictionary.put(comparableList, id);
+          listDictionary.add(comparableList);
+        }
+        keyBuffer.putInt(id);
+        return true;
+      }
+
+      @Override
+      public void getFromByteBuffer(ByteBuffer buffer, int initialOffset, int dimValIdx, Comparable[] dimValues)
+      {
+        dimValues[dimValIdx] = listDictionary.get(buffer.getInt(initialOffset + keyBufferPosition));
+      }
+
+      @Override
+      public BufferComparator getBufferComparator()
+      {
+        return bufferComparator;
+      }
+    }
+    private class ArrayRowBasedKeySerdeHelper implements RowBasedKeySerdeHelper
+    {
+      final int keyBufferPosition;
+      final BufferComparator bufferComparator;
+
+      ArrayRowBasedKeySerdeHelper(
+          int keyBufferPosition,
+          @Nullable StringComparator stringComparator
+      )
+      {
+        //TODO: karan : add pushLimitDown ?

Review comment:
       Acked

##########
File path: processing/src/main/java/org/apache/druid/segment/data/ComparableList.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.fasterxml.jackson.annotation.JsonValue;
+
+import java.util.List;
+
+
+public class ComparableList<T extends Comparable> implements Comparable<ComparableList>

Review comment:
       Done

##########
File path: processing/src/main/java/org/apache/druid/segment/data/ComparableStringArray.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonValue;
+import com.google.common.hash.Hashing;
+
+import java.util.Arrays;
+
+public class ComparableStringArray implements Comparable<ComparableStringArray>

Review comment:
       Done

##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -2994,8 +2994,14 @@ public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
       Object[] out = new Object[length];
 
       ExpressionType arrayType = null;
+
       for (int i = 0; i < length; i++) {
         ExprEval<?> evaluated = args.get(i).eval(bindings);
+        // short circuit the case where you have one input argument whose input type is array. So array function has

Review comment:
       The original intention was to keep it close to SQL, but I realized SQL does not have multi-value columns :). So a new function may be necessary.
    Cool, I am adding a new `MV_TO_ARRAY` function to both the native and the SQL layer.
    Thanks for the tip

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
##########
@@ -143,54 +144,6 @@ public void testSelectNonConstantArrayExpressionFromTable() throws Exception
     );
   }
 
-  @Test
-  public void testSelectNonConstantArrayExpressionFromTableForMultival() throws Exception

Review comment:
       I have added this test back, as the array constructor changes in this PR are being reverted in favour of the mv_to_array function.

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
##########
@@ -923,6 +876,277 @@ public void testArrayOffset() throws Exception
     );
   }
 
+
+  @Test
+  public void testArrayGroupAsArray() throws Exception
+  {
+    // Cannot vectorize as we donot have support in native query subsytem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[dim3], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"dim3\")",
+                            ColumnType.STRING_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.STRING_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{
+                new ArrayList<String>()
+                {
+                  {
+                    add(null);
+                  }
+                }, 3L
+            },
+            new Object[]{ImmutableList.of("a", "b"), 1L},
+            new Object[]{ImmutableList.of("b", "c"), 1L},
+            new Object[]{ImmutableList.of("d"), 1L}
+        )
+    );
+  }
+
+  @Test
+  public void testArrayGroupAsLongArray() throws Exception
+  {
+    // Cannot vectorize as we donot have support in native query subsytem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[l1], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"l1\")",
+                            ColumnType.LONG_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.LONG_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{ImmutableList.of(0L), 4L},
+            new Object[]{ImmutableList.of(325323L), 1L},
+            new Object[]{ImmutableList.of(7L), 1L}
+        )
+    );
+  }
+
+
+  @Test
+  public void testArrayGroupAsDoubleArray() throws Exception
+  {
+    // Cannot vectorize as we donot have support in native query subsytem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[d1], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"d1\")",
+                            ColumnType.DOUBLE_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.DOUBLE_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{ImmutableList.of(0.0), 4L},
+            new Object[]{ImmutableList.of(1.0), 1L},
+            new Object[]{ImmutableList.of(1.7), 1L}
+        )
+    );
+  }
+
+  @Test
+  public void testArrayGroupAsFloatArray() throws Exception
+  {
+    // Cannot vectorize as we donot have support in native query subsytem for grouping on arrays as keys
+    cannotVectorize();
+    testQuery(
+        "SELECT ARRAY[f1], SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC",
+        QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE3)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(expressionVirtualColumn(
+                            "v0",
+                            "array(\"f1\")",
+                            ColumnType.DOUBLE_ARRAY
+                        ))
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "_d0", ColumnType.DOUBLE_ARRAY)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setLimitSpec(new DefaultLimitSpec(
+                            ImmutableList.of(new OrderByColumnSpec(
+                                "a0",
+                                OrderByColumnSpec.Direction.DESCENDING,
+                                StringComparators.NUMERIC
+                            )),
+                            Integer.MAX_VALUE
+                        ))
+                        .setContext(QUERY_CONTEXT_NO_STRINGIFY_ARRAY)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{ImmutableList.of(0.0), 4L},
+            new Object[]{ImmutableList.of(0.10000000149011612), 1L},
+            new Object[]{ImmutableList.of(1.0), 1L}
+        )
+    );
+  }
+
+  @Test

Review comment:
       As we are no longer changing the array constructor, I will skip adding these UTs.






[GitHub] [druid] gianm commented on a change in pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r782586594



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
##########
@@ -128,27 +128,10 @@ public static String escapeStringLiteral(final String s)
   }
 
   /**
-   * Convert {@link RelDataType} to the most appropriate {@link ValueType}, coercing all ARRAY types to STRING (until
-   * the time is right and we are more comfortable handling Druid ARRAY types in all parts of the engine).
-   *
-   * Callers who are not scared of ARRAY types should instead call {@link #getValueTypeForRelDataTypeFull(RelDataType)},
-   * which returns the most accurate conversion of {@link RelDataType} to {@link ValueType}.
+   * Convert {@link RelDataType} to the most appropriate {@link ValueType}.
    */
   @Nullable
   public static ColumnType getColumnTypeForRelDataType(final RelDataType type)

Review comment:
       > Were you looking for [this](https://github.com/apache/druid/pull/12078/files#diff-8bc53aec44924e671b3c6f6f34c6f0c499f873d9000649c47c237444707aea4bL1640) test case?
   
   Yes! Thanks!
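
   For context, here is a minimal sketch of the idea described in the updated javadoc above: mapping a Calcite `RelDataType` to a Druid `ColumnType` while keeping ARRAY types as arrays instead of coercing them to STRING. This is not the code in this PR; the method name, the set of SQL types handled, and the fallback behaviour are assumptions made purely for illustration.
   
   ```java
   // Illustrative sketch only, not the implementation in this PR. Assumes Calcite's
   // RelDataType / SqlTypeName API and Druid's ColumnType constants; error handling omitted.
   import javax.annotation.Nullable;
   
   import org.apache.calcite.rel.type.RelDataType;
   import org.apache.calcite.sql.type.SqlTypeName;
   import org.apache.druid.segment.column.ColumnType;
   
   public class RelDataTypeMappingSketch
   {
     @Nullable
     public static ColumnType columnTypeFor(final RelDataType type)
     {
       final SqlTypeName sqlTypeName = type.getSqlTypeName();
   
       if (sqlTypeName == SqlTypeName.ARRAY) {
         // Keep ARRAY types as arrays of the mapped element type instead of coercing to STRING.
         final ColumnType elementType = columnTypeFor(type.getComponentType());
         if (ColumnType.LONG.equals(elementType)) {
           return ColumnType.LONG_ARRAY;
         } else if (ColumnType.FLOAT.equals(elementType) || ColumnType.DOUBLE.equals(elementType)) {
           return ColumnType.DOUBLE_ARRAY;
         } else {
           return ColumnType.STRING_ARRAY;
         }
       } else if (SqlTypeName.CHAR_TYPES.contains(sqlTypeName)) {
         return ColumnType.STRING;
       } else if (SqlTypeName.INT_TYPES.contains(sqlTypeName)) {
         return ColumnType.LONG;
       } else if (SqlTypeName.APPROX_TYPES.contains(sqlTypeName) || sqlTypeName == SqlTypeName.DECIMAL) {
         return ColumnType.DOUBLE;
       } else {
         return null; // types not covered by this sketch
       }
     }
   }
   ```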






[GitHub] [druid] dbardbar commented on pull request #12078: Grouping on arrays as arrays

Posted by GitBox <gi...@apache.org>.
dbardbar commented on pull request #12078:
URL: https://github.com/apache/druid/pull/12078#issuecomment-998524034


   @cryptoe - two questions:
   1) Would this also work for a VirtualColumn that produces a multi-value column? Specifically, a VirtualColumn of type "mv-filtered".
   2) We needed this capability, but since it was missing we created an additional string dimension at ingestion time that holds the values of the MV column as a single string, and then grouped by that extra column to get the same result this PR provides. Do you think your method would be comparable in performance? (A sketch of the two query shapes follows below.)
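
   To make the comparison in (2) concrete, here is a hypothetical side-by-side of the two query shapes, issued through Druid's JDBC (Avatica) endpoint. The datasource and column names (`test`, `tags`, `tags_joined`), the router URL, and the use of `MV_TO_ARRAY` to group on the multi-value column as an array are illustrative assumptions rather than details from this PR.
   
   ```java
   // Hypothetical comparison of the two approaches. All names here ("test", "tags",
   // "tags_joined", the router URL) are made up for illustration.
   import java.sql.Connection;
   import java.sql.DriverManager;
   import java.sql.ResultSet;
   import java.sql.Statement;
   
   public class MvGroupingComparison
   {
     public static void main(String[] args) throws Exception
     {
       // Druid's JDBC (Avatica) endpoint on the router; adjust host and port for your cluster.
       final String url = "jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/";
   
       try (Connection connection = DriverManager.getConnection(url);
            Statement statement = connection.createStatement()) {
         // Workaround: group on a string dimension that was pre-joined at ingestion time.
         printResults(statement, "SELECT tags_joined, COUNT(*) FROM test GROUP BY 1");
   
         // Array grouping: group on the multi-value column as an array, one row per distinct array.
         printResults(statement, "SELECT MV_TO_ARRAY(tags), COUNT(*) FROM test GROUP BY 1");
       }
     }
   
     private static void printResults(final Statement statement, final String sql) throws Exception
     {
       try (ResultSet resultSet = statement.executeQuery(sql)) {
         while (resultSet.next()) {
           System.out.println(resultSet.getObject(1) + " -> " + resultSet.getLong(2));
         }
       }
     }
   }
   ```
   
   Whether the array-grouping query or the pre-joined string dimension ends up faster probably depends on column cardinality and on the extra ingestion-time cost of the joined dimension; the sketch only shows the two shapes being compared.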
   

