You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "paul-rogers (via GitHub)" <gi...@apache.org> on 2023/03/17 18:26:38 UTC

[GitHub] [druid] paul-rogers commented on a diff in pull request #13887: Tuple sketch SQL support

paul-rogers commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1140544380


##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,16 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.

Review Comment:
   Can we include a link to a description of tuple sketches, and why I'd want one, for newbies that don't already understand this concept?



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+

Review Comment:
   As above: perhaps a description of the purpose of these functions for those not already in the know?



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT
+
+`ARRAY_OF_DOUBLES_SKETCH_NOT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_UNION
+
+`ARRAY_OF_DOUBLES_SKETCH_UNION(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   This is quite a long name for a poor user to remember and type.
   
   First, might there be an `ARRAY_OF_LONGS` or `ARRAY_OF_STRINGS` sketch in the future? If so, common SQL practice is to use function overrides: `ARRAY_SKETCH()`, say, which works out the "OF_DOUBLES", etc. part based on the type of the argument. See how `LATEST()` handles this idea.
   
   Second, suppose we'll only ever do an array of doubles. Then, a common approach is to introduce an acronym to shorten names: `AODS`. Or, to make it shorter: "double array sketch" or "DAS".
   
   Then, the suffix only has to be enough to differentiate each function: `DAS_NOT`, `DAS_UNION`, `DAS_SUM`, etc.



##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregatorTest.java:
##########
@@ -0,0 +1,452 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.inject.Injector;
+import org.apache.druid.data.input.InputRow;
+import org.apache.druid.guice.DruidInjectorBuilder;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.QueryRunnerFactoryConglomerate;
+import org.apache.druid.query.aggregation.CountAggregatorFactory;
+import org.apache.druid.query.aggregation.LongSumAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchModule;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchOperations;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator;
+import org.apache.druid.query.aggregation.post.ExpressionPostAggregator;
+import org.apache.druid.query.aggregation.post.FieldAccessPostAggregator;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.groupby.GroupByQuery;
+import org.apache.druid.segment.IndexBuilder;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.incremental.IncrementalIndexSchema;
+import org.apache.druid.segment.join.JoinableFactoryWrapper;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+import org.apache.druid.segment.writeout.OffHeapMemorySegmentWriteOutMediumFactory;
+import org.apache.druid.sql.calcite.BaseCalciteQueryTest;
+import org.apache.druid.sql.calcite.filtration.Filtration;
+import org.apache.druid.sql.calcite.util.CalciteTests;
+import org.apache.druid.sql.calcite.util.SpecificSegmentsQuerySegmentWalker;
+import org.apache.druid.sql.calcite.util.TestDataBuilder;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class ArrayOfDoublesSketchSqlAggregatorTest extends BaseCalciteQueryTest
+{
+
+  private static final String DATA_SOURCE = "foo";
+
+  // built from ArrayOfDoublesUpdatableSketch.update("FEDCAB", new double[] {0.0}).compact()
+  private static final String COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION = "AQEJAwgBzJP/////////fwEAAAAAAAAAjFnadZuMrkgAAAAAAAAAAA==";
+
+  private static final List<InputRow> ROWS = ImmutableList.of(
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 5)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 12)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 3)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 8)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-03")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 2)
+                  .build()
+  ).stream().map(TestDataBuilder::createRow).collect(Collectors.toList());
+
+  @Override
+  public void configureGuice(DruidInjectorBuilder builder)
+  {
+    super.configureGuice(builder);
+    builder.addModule(new ArrayOfDoublesSketchModule());
+  }
+
+  @Override
+  public SpecificSegmentsQuerySegmentWalker createQuerySegmentWalker(
+      final QueryRunnerFactoryConglomerate conglomerate,
+      final JoinableFactoryWrapper joinableFactory,
+      final Injector injector
+  ) throws IOException
+  {
+    ArrayOfDoublesSketchModule.registerSerde();
+
+    final QueryableIndex index = IndexBuilder.create()
+                                             .tmpDir(temporaryFolder.newFolder())
+                                             .segmentWriteOutMediumFactory(
+                                                 OffHeapMemorySegmentWriteOutMediumFactory.instance()
+                                             )
+                                             .schema(
+                                                 new IncrementalIndexSchema.Builder()
+                                                     .withMetrics(
+                                                         new CountAggregatorFactory("cnt"),
+                                                         new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "tuplesketch_dim2",
+                                                             "dim2",
+                                                             null,
+                                                             ImmutableList.of("m1"),
+                                                             1
+                                                         ),
+                                                         new LongSumAggregatorFactory("m1", "m1")
+                                                     )
+                                                     .withRollup(false)
+                                                     .build()
+                                             )
+                                             .rows(ROWS)
+                                             .buildMMappedIndex();
+
+    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+        DataSegment.builder()
+                   .dataSource(DATA_SOURCE)
+                   .interval(index.getDataInterval())
+                   .version("1")
+                   .shardSpec(new LinearShardSpec(0))
+                   .size(0)
+                   .build(),
+        index
+    );
+  }
+
+  @Test
+  public void testMetricsSumEstimate()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"
+                   + "  dim1,\n"
+                   + "  SUM(cnt),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1, 256))\n"
+                   + "FROM druid.foo\n"
+                   + "GROUP BY dim1";
+
+    final List<Object[]> expectedResults;
+
+    expectedResults = ImmutableList.of(
+        new Object[]{
+            "CA",
+            2L,
+            "[8.0]",
+            "[8.0]",
+            "[8.0]"
+        },
+        new Object[]{
+            "US",
+            3L,
+            "[22.0]",
+            "[22.0]",
+            "[22.0]"
+        }
+    );
+
+    testQuery(
+        sql,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                  .setDataSource(CalciteTests.DATASOURCE1)
+                  .setInterval(querySegmentSpec(Filtration.eternity()))
+                  .setGranularity(Granularities.ALL)
+                  .setDimensions(new DefaultDimensionSpec("dim1", "d0", ColumnType.STRING))
+                  .setAggregatorSpecs(
+                      aggregators(
+                          new LongSumAggregatorFactory("a0", "cnt"),
+                          new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "a1",
+                                                             "tuplesketch_dim2",
+                                                             null,
+                                                             null,
+                                                             null
+                          ),
+                          new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "a2",
+                                                             "dim2",
+                                                             null,
+                                                             ImmutableList.of("m1"),
+                                                             null
+                          ),
+                          new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "a3",
+                                                             "dim2",
+                                                             256,
+                                                             ImmutableList.of("m1"),
+                                                             null
+                          )
+                      )
+                  )
+                  .setPostAggregatorSpecs(
+                      ImmutableList.of(
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p1",
+                           new FieldAccessPostAggregator("p0", "a1")
+                         ),
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p3",
+                           new FieldAccessPostAggregator("p2", "a2")
+                         ),
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p5",
+                           new FieldAccessPostAggregator("p4", "a3")
+                         )
+                      )
+                  )
+                  .setContext(QUERY_CONTEXT_DEFAULT)
+                  .build()
+        ),
+        expectedResults
+    );
+  }
+
+  @Test
+  public void testMetricsSumEstimateIntersect()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"
+                   + "  SUM(cnt),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2)) AS all_sum_estimates,\n"
+                   + StringUtils.replace(
+                      "ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH_INTERSECT(COMPLEX_DECODE_BASE64('arrayOfDoublesSketch', '%s'), ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2), 128)) AS intersect_sum_estimates\n",
+                      "%s",
+                      COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION
+                   )
+                   + "FROM druid.foo";
+
+    final List<Object[]> expectedResults;
+
+    expectedResults = ImmutableList.of(
+        new Object[]{
+            5L,
+            "[30.0]",
+            "[8.0]"
+        }
+    );
+
+    final String expectedBase64Constant = "'"
+                                 + StringUtils.replace(COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION, "=", "\\u003D")
+                                 + "'";
+
+    testQuery(
+        sql,
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(CalciteTests.DATASOURCE1)
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .granularity(Granularities.ALL)
+                  .aggregators(
+                      ImmutableList.of(
+                          new LongSumAggregatorFactory("a0", "cnt"),
+                          new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "a1",
+                                                             "tuplesketch_dim2",
+                                                             null,
+                                                             null,
+                                                             null
+                          )
+                      )
+                  )
+                  .postAggregators(
+                      ImmutableList.of(
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p1",
+                           new FieldAccessPostAggregator("p0", "a1")
+                         ),
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p5",
+                           new ArrayOfDoublesSketchSetOpPostAggregator(
+                             "p4",
+                             "INTERSECT",
+                             128,
+                             null,
+                             ImmutableList.of(
+                               new ExpressionPostAggregator(
+                                 "p2",
+                                 "complex_decode_base64('arrayOfDoublesSketch'," + expectedBase64Constant + ")",
+                                 null,
+                                 queryFramework().macroTable()
+                               ),
+                               new FieldAccessPostAggregator("p3", "a1")
+                             )
+                           )
+                         )
+                      )
+                  )
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .build()
+        ),
+        expectedResults
+    );
+  }
+
+  @Test
+  public void testNullInputs()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"

Review Comment:
   None of these test queries use aggregations (`GROUP BY` clause). Would these functions work in that case? Should we include tests to demonstrate how they'd work?



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT
+
+`ARRAY_OF_DOUBLES_SKETCH_NOT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_UNION
+
+`ARRAY_OF_DOUBLES_SKETCH_UNION(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   The set of functions here seem to mimic the new scheme that various Druid principals have outline: functions to create the intermediate types, and more functions to pull final values out of those intermediate types.
   
   That approach makes sense for a native query, where the query spells out the details of aggregation.
   
   For SQL, however, this pattern is not a good fit. SQL provides a single function that goes from inputs to finalized types. For example, for `LATEST()` at the native layer, we may eventually want:
   
   * `LATEST_LONG_SKETCH(string, long) -> COMPLEX<PAIR<string, long>>`
   * `LATEST_UNION(COMPLEX<PAIR<string, long>> COMPLEX<PAIR<string, long>>) -> COMPLEX<PAIR<string, long>>`
   * `LATEST_LEFT( COMPLEX<PAIR<string, long>>) -> string`
   * `LATEST_RIGHT(COMPLEX<PAIR<string, long>> -> long`
   
   However, SQL doesn't provide the semantics to use such functions: one cannot get "inside" of a `GROUP BY` to create the intermediate values, combine them, and then go "outside" of the `GROUP BY` to produce the finalized value.
   
   Instead, in SQL, there is one function, `LATEST(.)` which generates the internal forms when creating the native query.
   
   All of this is a long-winded way of asking, for SQL, do we just need an `APPROX_SUM(ARRAY<DOUBLE>)` function? The rest is worked out by the planner to handle doing approx sums for arrays of doubles?



##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregatorTest.java:
##########
@@ -0,0 +1,452 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.inject.Injector;
+import org.apache.druid.data.input.InputRow;
+import org.apache.druid.guice.DruidInjectorBuilder;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.QueryRunnerFactoryConglomerate;
+import org.apache.druid.query.aggregation.CountAggregatorFactory;
+import org.apache.druid.query.aggregation.LongSumAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchModule;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchOperations;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator;
+import org.apache.druid.query.aggregation.post.ExpressionPostAggregator;
+import org.apache.druid.query.aggregation.post.FieldAccessPostAggregator;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.groupby.GroupByQuery;
+import org.apache.druid.segment.IndexBuilder;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.incremental.IncrementalIndexSchema;
+import org.apache.druid.segment.join.JoinableFactoryWrapper;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+import org.apache.druid.segment.writeout.OffHeapMemorySegmentWriteOutMediumFactory;
+import org.apache.druid.sql.calcite.BaseCalciteQueryTest;
+import org.apache.druid.sql.calcite.filtration.Filtration;
+import org.apache.druid.sql.calcite.util.CalciteTests;
+import org.apache.druid.sql.calcite.util.SpecificSegmentsQuerySegmentWalker;
+import org.apache.druid.sql.calcite.util.TestDataBuilder;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class ArrayOfDoublesSketchSqlAggregatorTest extends BaseCalciteQueryTest
+{
+
+  private static final String DATA_SOURCE = "foo";
+
+  // built from ArrayOfDoublesUpdatableSketch.update("FEDCAB", new double[] {0.0}).compact()
+  private static final String COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION = "AQEJAwgBzJP/////////fwEAAAAAAAAAjFnadZuMrkgAAAAAAAAAAA==";
+
+  private static final List<InputRow> ROWS = ImmutableList.of(
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 5)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 12)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 3)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 8)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-03")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 2)
+                  .build()
+  ).stream().map(TestDataBuilder::createRow).collect(Collectors.toList());
+
+  @Override
+  public void configureGuice(DruidInjectorBuilder builder)
+  {
+    super.configureGuice(builder);
+    builder.addModule(new ArrayOfDoublesSketchModule());
+  }
+
+  @Override
+  public SpecificSegmentsQuerySegmentWalker createQuerySegmentWalker(
+      final QueryRunnerFactoryConglomerate conglomerate,
+      final JoinableFactoryWrapper joinableFactory,
+      final Injector injector
+  ) throws IOException
+  {
+    ArrayOfDoublesSketchModule.registerSerde();
+
+    final QueryableIndex index = IndexBuilder.create()
+                                             .tmpDir(temporaryFolder.newFolder())
+                                             .segmentWriteOutMediumFactory(
+                                                 OffHeapMemorySegmentWriteOutMediumFactory.instance()
+                                             )
+                                             .schema(
+                                                 new IncrementalIndexSchema.Builder()
+                                                     .withMetrics(
+                                                         new CountAggregatorFactory("cnt"),
+                                                         new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "tuplesketch_dim2",
+                                                             "dim2",
+                                                             null,
+                                                             ImmutableList.of("m1"),
+                                                             1
+                                                         ),
+                                                         new LongSumAggregatorFactory("m1", "m1")
+                                                     )
+                                                     .withRollup(false)
+                                                     .build()
+                                             )
+                                             .rows(ROWS)
+                                             .buildMMappedIndex();
+
+    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+        DataSegment.builder()
+                   .dataSource(DATA_SOURCE)
+                   .interval(index.getDataInterval())
+                   .version("1")
+                   .shardSpec(new LinearShardSpec(0))
+                   .size(0)
+                   .build(),
+        index
+    );
+  }
+
+  @Test
+  public void testMetricsSumEstimate()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"
+                   + "  dim1,\n"
+                   + "  SUM(cnt),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1, 256))\n"

Review Comment:
   From a usability perspective, this is somewhat un-SQL. Having to compose two functions, with very long names, to get one result is unusual. Since SQL is often hand-written, SQL goes for convenience rather than breaking things down into their component parts. The above approach would be fine in Java, but is awkward in SQL.



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT

Review Comment:
   A description of each function would be useful. I have to confess, I'm struggling to understand how one takes the logical `NOT` of a double, an array of doubles, or a sum of an array of doubles.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregator.java:
##########
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.util.Optionality;
+import org.apache.datasketches.Util;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+public class ArrayOfDoublesSketchSqlAggregator implements SqlAggregator
+{
+
+  private static final SqlAggFunction FUNCTION_INSTANCE = new ArrayOfDoublesSqlAggFunction();
+  private static final String NAME = "ARRAY_OF_DOUBLES_SKETCH";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      VirtualColumnRegistry virtualColumnRegistry,
+      RexBuilder rexBuilder,
+      String name,
+      AggregateCall aggregateCall,
+      Project project,
+      List<Aggregation> existingAggregations,
+      boolean finalizeAggregations
+  )
+  {
+
+    final List<Integer> argList = aggregateCall.getArgList();
+
+    // check last argument for nomimalEntries
+    final int nominalEntries;
+    final int metricExpressionEndIndex;
+    final int lastArgIndex = argList.size() - 1;
+    final RexNode potentialNominalEntriesArg = Expressions.fromFieldAccess(
+        rowSignature,
+        project,
+        argList.get(lastArgIndex)
+    );
+
+    if (potentialNominalEntriesArg.isA(SqlKind.LITERAL) &&
+        RexLiteral.value(potentialNominalEntriesArg) instanceof Number) {
+
+      nominalEntries = ((Number) RexLiteral.value(potentialNominalEntriesArg)).intValue();
+      metricExpressionEndIndex = lastArgIndex - 1;
+    } else {
+      nominalEntries = Util.DEFAULT_NOMINAL_ENTRIES;
+      metricExpressionEndIndex = lastArgIndex;
+    }
+
+    final List<String> fieldNames = new ArrayList<>();
+    for (int i = 0; i <= metricExpressionEndIndex; i++) {
+      final String fieldName;
+
+      final RexNode columnRexNode = Expressions.fromFieldAccess(
+          rowSignature,
+          project,
+          argList.get(i)
+      );

Review Comment:
   Druid likes to merge master rather than rebase. A rebase looses folks comment status, which folks kind of dislike.
   
   If your code conflicts with newer code, then, yes, please do merge master and resolve any conflicts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org