You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "frankgrimes97 (via GitHub)" <gi...@apache.org> on 2023/03/06 21:35:13 UTC

[GitHub] [druid] frankgrimes97 opened a new pull request, #13887: Tuple sketch SQL support

frankgrimes97 opened a new pull request, #13887:
URL: https://github.com/apache/druid/pull/13887

   This PR is a follow-up to https://github.com/apache/druid/pull/13819 so that the Tuple sketch functionality can be used in SQL for both ingestion using Multi-Stage Queries (MSQ) and also for analytic queries against Tuple sketch columns.
   
   #### Release note
   Add SQL functions for creating and operating on Tuple sketches
   
   <hr>
   
   This PR has:
   
   - [x] been self-reviewed.
   - [x] added documentation for new or modified features or behaviors.
   - [x] a release note entry in the PR description.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142458669


##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregatorTest.java:
##########
@@ -0,0 +1,452 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.inject.Injector;
+import org.apache.druid.data.input.InputRow;
+import org.apache.druid.guice.DruidInjectorBuilder;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.QueryRunnerFactoryConglomerate;
+import org.apache.druid.query.aggregation.CountAggregatorFactory;
+import org.apache.druid.query.aggregation.LongSumAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchModule;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchOperations;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator;
+import org.apache.druid.query.aggregation.post.ExpressionPostAggregator;
+import org.apache.druid.query.aggregation.post.FieldAccessPostAggregator;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.groupby.GroupByQuery;
+import org.apache.druid.segment.IndexBuilder;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.incremental.IncrementalIndexSchema;
+import org.apache.druid.segment.join.JoinableFactoryWrapper;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+import org.apache.druid.segment.writeout.OffHeapMemorySegmentWriteOutMediumFactory;
+import org.apache.druid.sql.calcite.BaseCalciteQueryTest;
+import org.apache.druid.sql.calcite.filtration.Filtration;
+import org.apache.druid.sql.calcite.util.CalciteTests;
+import org.apache.druid.sql.calcite.util.SpecificSegmentsQuerySegmentWalker;
+import org.apache.druid.sql.calcite.util.TestDataBuilder;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class ArrayOfDoublesSketchSqlAggregatorTest extends BaseCalciteQueryTest
+{
+
+  private static final String DATA_SOURCE = "foo";
+
+  // built from ArrayOfDoublesUpdatableSketch.update("FEDCAB", new double[] {0.0}).compact()
+  private static final String COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION = "AQEJAwgBzJP/////////fwEAAAAAAAAAjFnadZuMrkgAAAAAAAAAAA==";
+
+  private static final List<InputRow> ROWS = ImmutableList.of(
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 5)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 12)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 3)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 8)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-03")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 2)
+                  .build()
+  ).stream().map(TestDataBuilder::createRow).collect(Collectors.toList());
+
+  @Override
+  public void configureGuice(DruidInjectorBuilder builder)
+  {
+    super.configureGuice(builder);
+    builder.addModule(new ArrayOfDoublesSketchModule());
+  }
+
+  @Override
+  public SpecificSegmentsQuerySegmentWalker createQuerySegmentWalker(
+      final QueryRunnerFactoryConglomerate conglomerate,
+      final JoinableFactoryWrapper joinableFactory,
+      final Injector injector
+  ) throws IOException
+  {
+    ArrayOfDoublesSketchModule.registerSerde();
+
+    final QueryableIndex index = IndexBuilder.create()
+                                             .tmpDir(temporaryFolder.newFolder())
+                                             .segmentWriteOutMediumFactory(
+                                                 OffHeapMemorySegmentWriteOutMediumFactory.instance()
+                                             )
+                                             .schema(
+                                                 new IncrementalIndexSchema.Builder()
+                                                     .withMetrics(
+                                                         new CountAggregatorFactory("cnt"),
+                                                         new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "tuplesketch_dim2",
+                                                             "dim2",
+                                                             null,
+                                                             ImmutableList.of("m1"),
+                                                             1
+                                                         ),
+                                                         new LongSumAggregatorFactory("m1", "m1")
+                                                     )
+                                                     .withRollup(false)
+                                                     .build()
+                                             )
+                                             .rows(ROWS)
+                                             .buildMMappedIndex();
+
+    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+        DataSegment.builder()
+                   .dataSource(DATA_SOURCE)
+                   .interval(index.getDataInterval())
+                   .version("1")
+                   .shardSpec(new LinearShardSpec(0))
+                   .size(0)
+                   .build(),
+        index
+    );
+  }
+
+  @Test
+  public void testMetricsSumEstimate()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"
+                   + "  dim1,\n"
+                   + "  SUM(cnt),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1, 256))\n"
+                   + "FROM druid.foo\n"
+                   + "GROUP BY dim1";
+
+    final List<Object[]> expectedResults;
+
+    expectedResults = ImmutableList.of(
+        new Object[]{
+            "CA",
+            2L,
+            "[8.0]",
+            "[8.0]",
+            "[8.0]"
+        },
+        new Object[]{
+            "US",
+            3L,
+            "[22.0]",
+            "[22.0]",
+            "[22.0]"
+        }
+    );
+
+    testQuery(
+        sql,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                  .setDataSource(CalciteTests.DATASOURCE1)
+                  .setInterval(querySegmentSpec(Filtration.eternity()))
+                  .setGranularity(Granularities.ALL)
+                  .setDimensions(new DefaultDimensionSpec("dim1", "d0", ColumnType.STRING))
+                  .setAggregatorSpecs(
+                      aggregators(
+                          new LongSumAggregatorFactory("a0", "cnt"),
+                          new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "a1",
+                                                             "tuplesketch_dim2",
+                                                             null,
+                                                             null,
+                                                             null
+                          ),
+                          new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "a2",
+                                                             "dim2",
+                                                             null,
+                                                             ImmutableList.of("m1"),
+                                                             null
+                          ),
+                          new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "a3",
+                                                             "dim2",
+                                                             256,
+                                                             ImmutableList.of("m1"),
+                                                             null
+                          )
+                      )
+                  )
+                  .setPostAggregatorSpecs(
+                      ImmutableList.of(
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p1",
+                           new FieldAccessPostAggregator("p0", "a1")
+                         ),
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p3",
+                           new FieldAccessPostAggregator("p2", "a2")
+                         ),
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p5",
+                           new FieldAccessPostAggregator("p4", "a3")
+                         )
+                      )
+                  )
+                  .setContext(QUERY_CONTEXT_DEFAULT)
+                  .build()
+        ),
+        expectedResults
+    );
+  }
+
+  @Test
+  public void testMetricsSumEstimateIntersect()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"
+                   + "  SUM(cnt),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2)) AS all_sum_estimates,\n"
+                   + StringUtils.replace(
+                      "ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH_INTERSECT(COMPLEX_DECODE_BASE64('arrayOfDoublesSketch', '%s'), ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2), 128)) AS intersect_sum_estimates\n",
+                      "%s",
+                      COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION
+                   )
+                   + "FROM druid.foo";
+
+    final List<Object[]> expectedResults;
+
+    expectedResults = ImmutableList.of(
+        new Object[]{
+            5L,
+            "[30.0]",
+            "[8.0]"
+        }
+    );
+
+    final String expectedBase64Constant = "'"
+                                 + StringUtils.replace(COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION, "=", "\\u003D")
+                                 + "'";
+
+    testQuery(
+        sql,
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(CalciteTests.DATASOURCE1)
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .granularity(Granularities.ALL)
+                  .aggregators(
+                      ImmutableList.of(
+                          new LongSumAggregatorFactory("a0", "cnt"),
+                          new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "a1",
+                                                             "tuplesketch_dim2",
+                                                             null,
+                                                             null,
+                                                             null
+                          )
+                      )
+                  )
+                  .postAggregators(
+                      ImmutableList.of(
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p1",
+                           new FieldAccessPostAggregator("p0", "a1")
+                         ),
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p5",
+                           new ArrayOfDoublesSketchSetOpPostAggregator(
+                             "p4",
+                             "INTERSECT",
+                             128,
+                             null,
+                             ImmutableList.of(
+                               new ExpressionPostAggregator(
+                                 "p2",
+                                 "complex_decode_base64('arrayOfDoublesSketch'," + expectedBase64Constant + ")",
+                                 null,
+                                 queryFramework().macroTable()
+                               ),
+                               new FieldAccessPostAggregator("p3", "a1")
+                             )
+                           )
+                         )
+                      )
+                  )
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .build()
+        ),
+        expectedResults
+    );
+  }
+
+  @Test
+  public void testNullInputs()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"

Review Comment:
   The `testMetricsSumEstimate()` test case does use a `GROUP BY` clause.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142530993


##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregatorTest.java:
##########
@@ -0,0 +1,452 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.inject.Injector;
+import org.apache.druid.data.input.InputRow;
+import org.apache.druid.guice.DruidInjectorBuilder;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.QueryRunnerFactoryConglomerate;
+import org.apache.druid.query.aggregation.CountAggregatorFactory;
+import org.apache.druid.query.aggregation.LongSumAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchModule;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchOperations;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator;
+import org.apache.druid.query.aggregation.post.ExpressionPostAggregator;
+import org.apache.druid.query.aggregation.post.FieldAccessPostAggregator;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.groupby.GroupByQuery;
+import org.apache.druid.segment.IndexBuilder;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.incremental.IncrementalIndexSchema;
+import org.apache.druid.segment.join.JoinableFactoryWrapper;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+import org.apache.druid.segment.writeout.OffHeapMemorySegmentWriteOutMediumFactory;
+import org.apache.druid.sql.calcite.BaseCalciteQueryTest;
+import org.apache.druid.sql.calcite.filtration.Filtration;
+import org.apache.druid.sql.calcite.util.CalciteTests;
+import org.apache.druid.sql.calcite.util.SpecificSegmentsQuerySegmentWalker;
+import org.apache.druid.sql.calcite.util.TestDataBuilder;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class ArrayOfDoublesSketchSqlAggregatorTest extends BaseCalciteQueryTest
+{
+
+  private static final String DATA_SOURCE = "foo";
+
+  // built from ArrayOfDoublesUpdatableSketch.update("FEDCAB", new double[] {0.0}).compact()
+  private static final String COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION = "AQEJAwgBzJP/////////fwEAAAAAAAAAjFnadZuMrkgAAAAAAAAAAA==";
+
+  private static final List<InputRow> ROWS = ImmutableList.of(
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 5)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 12)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 3)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 8)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-03")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 2)
+                  .build()
+  ).stream().map(TestDataBuilder::createRow).collect(Collectors.toList());
+
+  @Override
+  public void configureGuice(DruidInjectorBuilder builder)
+  {
+    super.configureGuice(builder);
+    builder.addModule(new ArrayOfDoublesSketchModule());
+  }
+
+  @Override
+  public SpecificSegmentsQuerySegmentWalker createQuerySegmentWalker(
+      final QueryRunnerFactoryConglomerate conglomerate,
+      final JoinableFactoryWrapper joinableFactory,
+      final Injector injector
+  ) throws IOException
+  {
+    ArrayOfDoublesSketchModule.registerSerde();
+
+    final QueryableIndex index = IndexBuilder.create()
+                                             .tmpDir(temporaryFolder.newFolder())
+                                             .segmentWriteOutMediumFactory(
+                                                 OffHeapMemorySegmentWriteOutMediumFactory.instance()
+                                             )
+                                             .schema(
+                                                 new IncrementalIndexSchema.Builder()
+                                                     .withMetrics(
+                                                         new CountAggregatorFactory("cnt"),
+                                                         new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "tuplesketch_dim2",
+                                                             "dim2",
+                                                             null,
+                                                             ImmutableList.of("m1"),
+                                                             1
+                                                         ),
+                                                         new LongSumAggregatorFactory("m1", "m1")
+                                                     )
+                                                     .withRollup(false)
+                                                     .build()
+                                             )
+                                             .rows(ROWS)
+                                             .buildMMappedIndex();
+
+    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+        DataSegment.builder()
+                   .dataSource(DATA_SOURCE)
+                   .interval(index.getDataInterval())
+                   .version("1")
+                   .shardSpec(new LinearShardSpec(0))
+                   .size(0)
+                   .build(),
+        index
+    );
+  }
+
+  @Test
+  public void testMetricsSumEstimate()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"
+                   + "  dim1,\n"
+                   + "  SUM(cnt),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1, 256))\n"

Review Comment:
   I agree they may be a bit cumbersome to type, but the right tradeoff between explicit-ness and verbosity can be difficult to find.
   N.B. The same could be said of some of the existing Theta Sketch SQL functions. e.g.
   - `theta_sketch_estimate_with_error_bounds`
   - `APPROX_COUNT_DISTINCT_DS_THETA`
   https://github.com/apache/druid/blob/master/extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaSketchSqlAggregatorTest.java



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #13887: Tuple sketch SQL support

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on PR #13887:
URL: https://github.com/apache/druid/pull/13887#issuecomment-1478491790

   > It might be worth noting that the naming of the Data Sketch SQL functions doesn't seem entirely consistent across the board.
   > e.g. I see other sketch SQL functions use a `DS` prefix or suffix to designate them as Data Sketches:
   
   That's a good point about `DS` being used in most of the others. I think the suggestions at the end of your post (`DS_TUPLE_DOUBLES`, etc) are a good balance between consistency, clarity, and terseness. They work for me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1136661053


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregator.java:
##########
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.util.Optionality;
+import org.apache.datasketches.Util;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+public class ArrayOfDoublesSketchSqlAggregator implements SqlAggregator
+{
+
+  private static final SqlAggFunction FUNCTION_INSTANCE = new ArrayOfDoublesSqlAggFunction();
+  private static final String NAME = "ARRAY_OF_DOUBLES_SKETCH";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      VirtualColumnRegistry virtualColumnRegistry,
+      RexBuilder rexBuilder,
+      String name,
+      AggregateCall aggregateCall,
+      Project project,
+      List<Aggregation> existingAggregations,
+      boolean finalizeAggregations
+  )
+  {
+
+    final List<Integer> argList = aggregateCall.getArgList();
+
+    // check last argument for nomimalEntries
+    final int nominalEntries;
+    final int metricExpressionEndIndex;
+    final int lastArgIndex = argList.size() - 1;
+    final RexNode potentialNominalEntriesArg = Expressions.fromFieldAccess(
+        rowSignature,
+        project,
+        argList.get(lastArgIndex)
+    );
+
+    if (potentialNominalEntriesArg.isA(SqlKind.LITERAL) &&
+        RexLiteral.value(potentialNominalEntriesArg) instanceof Number) {
+
+      nominalEntries = ((Number) RexLiteral.value(potentialNominalEntriesArg)).intValue();
+      metricExpressionEndIndex = lastArgIndex - 1;
+    } else {
+      nominalEntries = Util.DEFAULT_NOMINAL_ENTRIES;
+      metricExpressionEndIndex = lastArgIndex;
+    }
+
+    final List<String> fieldNames = new ArrayList<>();
+    for (int i = 0; i <= metricExpressionEndIndex; i++) {
+      final String fieldName;
+
+      final RexNode columnRexNode = Expressions.fromFieldAccess(
+          rowSignature,
+          project,
+          argList.get(i)
+      );

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Expressions.fromFieldAccess](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4395)



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregator.java:
##########
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.util.Optionality;
+import org.apache.datasketches.Util;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+public class ArrayOfDoublesSketchSqlAggregator implements SqlAggregator
+{
+
+  private static final SqlAggFunction FUNCTION_INSTANCE = new ArrayOfDoublesSqlAggFunction();
+  private static final String NAME = "ARRAY_OF_DOUBLES_SKETCH";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      VirtualColumnRegistry virtualColumnRegistry,
+      RexBuilder rexBuilder,
+      String name,
+      AggregateCall aggregateCall,
+      Project project,
+      List<Aggregation> existingAggregations,
+      boolean finalizeAggregations
+  )
+  {
+
+    final List<Integer> argList = aggregateCall.getArgList();
+
+    // check last argument for nomimalEntries
+    final int nominalEntries;
+    final int metricExpressionEndIndex;
+    final int lastArgIndex = argList.size() - 1;
+    final RexNode potentialNominalEntriesArg = Expressions.fromFieldAccess(
+        rowSignature,
+        project,
+        argList.get(lastArgIndex)
+    );

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Expressions.fromFieldAccess](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4394)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1146203780


##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,16 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.
+
+|Function|Notes|Default|
+|--------|-----|-------|
+|`DS_TUPLE_DOUBLES(expr, [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the values of `expr` which is a column containing Tuple sketches which contain an array of double values as their Summary Objects. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.

Review Comment:
   So, finally time to approve/merge?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1128555881


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,22 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr..., [nominalEntries])`
+
+Creates a Tuple sketch.
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE
+
+`ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(expr)`
+
+Computes approximate sums of the values contained within a Tuple sketch.
+

Review Comment:
   ```suggestion
   
   **Function type:** [Aggregation](sql-aggregations.md)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on PR #13887:
URL: https://github.com/apache/druid/pull/13887#issuecomment-1466374629

   @vtlim Hi, are there any further changes required before this PR can be accepted and merged? Is a review by more maintainers required? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on PR #13887:
URL: https://github.com/apache/druid/pull/13887#issuecomment-1477790326

   > The changes generally look good to me. I'd consider changing the names to start with `TUPLE_DOUBLES_` rather than `ARRAY_OF_DOUBLES_SKETCH_`. (Like `TUPLE_DOUBLES_SKETCH`, `TUPLE_DOUBLES_NOT`, etc.) It's a bit less typing and IMO clearer.
   > 
   > Curious what people think.
   
   One issue with changing the naming only for the SQL functions is that it would be inconsistent with the naming in the native Druid functions and Apache Data Sketches codebase/documentation and so might lead to confusion.
   That issue aside, I do think the naming can be more terse without losing too much meaning.
   
   It might be worth noting that the naming of the Data Sketch SQL functions doesn't seem entirely consistent across the board.
   e.g.  I see other sketch SQL functions use a `DS` prefix or suffix to designate them as Data Sketches:
   - `APPROX_COUNT_DISTINCT_DS_HLL`
   - `APPROX_COUNT_DISTINCT_DS_THETA`
   - `APPROX_QUANTILE_DS`
   - `DS_CDF`
   - `DS_GET_QUANTILE`
   - `DS_GET_QUANTILES`
   - `DS_HISTOGRAM`
   - `DS_HLL`
   - `DS_QUANTILE_SUMMARY`
   - `DS_QUANTILES_SKETCH`
   - `DS_RANK`
   - `DS_THETA`
   
   We could perhaps consider the following:
   - `DS_TUPLE_DOUBLES`
   - `DS_TUPLE_DOUBLES_INTERSECT`
   - `DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE`
   - `DS_TUPLE_DOUBLES_NOT `
   - `DS_TUPLE_DOUBLES_UNION`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1127198652


##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,17 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.
+
+|Function|Notes|Default|
+|--------|-----|-------|
+|`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the values of `expr` which is a column containing Tuple sketches. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.
+|`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr..., [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the dimension value of `dimensionColumnExpr` and the metric values contained in the list of `metricColumnExpr` columns. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.

Review Comment:
   If the last operand is a Numeric Literal, it’s assumed to be nominalEntries.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142516715


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT
+
+`ARRAY_OF_DOUBLES_SKETCH_NOT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_UNION
+
+`ARRAY_OF_DOUBLES_SKETCH_UNION(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   Sketches in the Tuple family are all specialized implementations so I'm not sure making this generic can easily be done, if at all.
   Specifically, the various Tuple sketch implementations in the Apache DataSketches project don't all seem to share a type hierarchy so determining the type of the serialized sketch at runtime efficiently might be problematic.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1145572348


##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,16 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.
+
+|Function|Notes|Default|
+|--------|-----|-------|
+|`DS_TUPLE_DOUBLES(expr, [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the values of `expr` which is a column containing Tuple sketches which contain an array of double values as their Summary Objects. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.

Review Comment:
   Thanks for the explanation! The doc changes overall look good to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "paul-rogers (via GitHub)" <gi...@apache.org>.
paul-rogers commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1140544380


##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,16 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.

Review Comment:
   Can we include a link to a description of tuple sketches, and why I'd want one, for newbies that don't already understand this concept?



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+

Review Comment:
   As above: perhaps a description of the purpose of these functions for those not already in the know?



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT
+
+`ARRAY_OF_DOUBLES_SKETCH_NOT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_UNION
+
+`ARRAY_OF_DOUBLES_SKETCH_UNION(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   This is quite a long name for a poor user to remember and type.
   
   First, might there be an `ARRAY_OF_LONGS` or `ARRAY_OF_STRINGS` sketch in the future? If so, common SQL practice is to use function overrides: `ARRAY_SKETCH()`, say, which works out the "OF_DOUBLES", etc. part based on the type of the argument. See how `LATEST()` handles this idea.
   
   Second, suppose we'll only ever do an array of doubles. Then, a common approach is to introduce an acronym to shorten names: `AODS`. Or, to make it shorter: "double array sketch" or "DAS".
   
   Then, the suffix only has to be enough to differentiate each function: `DAS_NOT`, `DAS_UNION`, `DAS_SUM`, etc.



##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregatorTest.java:
##########
@@ -0,0 +1,452 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.inject.Injector;
+import org.apache.druid.data.input.InputRow;
+import org.apache.druid.guice.DruidInjectorBuilder;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.QueryRunnerFactoryConglomerate;
+import org.apache.druid.query.aggregation.CountAggregatorFactory;
+import org.apache.druid.query.aggregation.LongSumAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchModule;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchOperations;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator;
+import org.apache.druid.query.aggregation.post.ExpressionPostAggregator;
+import org.apache.druid.query.aggregation.post.FieldAccessPostAggregator;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.groupby.GroupByQuery;
+import org.apache.druid.segment.IndexBuilder;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.incremental.IncrementalIndexSchema;
+import org.apache.druid.segment.join.JoinableFactoryWrapper;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+import org.apache.druid.segment.writeout.OffHeapMemorySegmentWriteOutMediumFactory;
+import org.apache.druid.sql.calcite.BaseCalciteQueryTest;
+import org.apache.druid.sql.calcite.filtration.Filtration;
+import org.apache.druid.sql.calcite.util.CalciteTests;
+import org.apache.druid.sql.calcite.util.SpecificSegmentsQuerySegmentWalker;
+import org.apache.druid.sql.calcite.util.TestDataBuilder;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class ArrayOfDoublesSketchSqlAggregatorTest extends BaseCalciteQueryTest
+{
+
+  private static final String DATA_SOURCE = "foo";
+
+  // built from ArrayOfDoublesUpdatableSketch.update("FEDCAB", new double[] {0.0}).compact()
+  private static final String COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION = "AQEJAwgBzJP/////////fwEAAAAAAAAAjFnadZuMrkgAAAAAAAAAAA==";
+
+  private static final List<InputRow> ROWS = ImmutableList.of(
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 5)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 12)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 3)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 8)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-03")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 2)
+                  .build()
+  ).stream().map(TestDataBuilder::createRow).collect(Collectors.toList());
+
+  @Override
+  public void configureGuice(DruidInjectorBuilder builder)
+  {
+    super.configureGuice(builder);
+    builder.addModule(new ArrayOfDoublesSketchModule());
+  }
+
+  @Override
+  public SpecificSegmentsQuerySegmentWalker createQuerySegmentWalker(
+      final QueryRunnerFactoryConglomerate conglomerate,
+      final JoinableFactoryWrapper joinableFactory,
+      final Injector injector
+  ) throws IOException
+  {
+    ArrayOfDoublesSketchModule.registerSerde();
+
+    final QueryableIndex index = IndexBuilder.create()
+                                             .tmpDir(temporaryFolder.newFolder())
+                                             .segmentWriteOutMediumFactory(
+                                                 OffHeapMemorySegmentWriteOutMediumFactory.instance()
+                                             )
+                                             .schema(
+                                                 new IncrementalIndexSchema.Builder()
+                                                     .withMetrics(
+                                                         new CountAggregatorFactory("cnt"),
+                                                         new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "tuplesketch_dim2",
+                                                             "dim2",
+                                                             null,
+                                                             ImmutableList.of("m1"),
+                                                             1
+                                                         ),
+                                                         new LongSumAggregatorFactory("m1", "m1")
+                                                     )
+                                                     .withRollup(false)
+                                                     .build()
+                                             )
+                                             .rows(ROWS)
+                                             .buildMMappedIndex();
+
+    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+        DataSegment.builder()
+                   .dataSource(DATA_SOURCE)
+                   .interval(index.getDataInterval())
+                   .version("1")
+                   .shardSpec(new LinearShardSpec(0))
+                   .size(0)
+                   .build(),
+        index
+    );
+  }
+
+  @Test
+  public void testMetricsSumEstimate()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"
+                   + "  dim1,\n"
+                   + "  SUM(cnt),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1, 256))\n"
+                   + "FROM druid.foo\n"
+                   + "GROUP BY dim1";
+
+    final List<Object[]> expectedResults;
+
+    expectedResults = ImmutableList.of(
+        new Object[]{
+            "CA",
+            2L,
+            "[8.0]",
+            "[8.0]",
+            "[8.0]"
+        },
+        new Object[]{
+            "US",
+            3L,
+            "[22.0]",
+            "[22.0]",
+            "[22.0]"
+        }
+    );
+
+    testQuery(
+        sql,
+        ImmutableList.of(
+            GroupByQuery.builder()
+                  .setDataSource(CalciteTests.DATASOURCE1)
+                  .setInterval(querySegmentSpec(Filtration.eternity()))
+                  .setGranularity(Granularities.ALL)
+                  .setDimensions(new DefaultDimensionSpec("dim1", "d0", ColumnType.STRING))
+                  .setAggregatorSpecs(
+                      aggregators(
+                          new LongSumAggregatorFactory("a0", "cnt"),
+                          new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "a1",
+                                                             "tuplesketch_dim2",
+                                                             null,
+                                                             null,
+                                                             null
+                          ),
+                          new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "a2",
+                                                             "dim2",
+                                                             null,
+                                                             ImmutableList.of("m1"),
+                                                             null
+                          ),
+                          new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "a3",
+                                                             "dim2",
+                                                             256,
+                                                             ImmutableList.of("m1"),
+                                                             null
+                          )
+                      )
+                  )
+                  .setPostAggregatorSpecs(
+                      ImmutableList.of(
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p1",
+                           new FieldAccessPostAggregator("p0", "a1")
+                         ),
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p3",
+                           new FieldAccessPostAggregator("p2", "a2")
+                         ),
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p5",
+                           new FieldAccessPostAggregator("p4", "a3")
+                         )
+                      )
+                  )
+                  .setContext(QUERY_CONTEXT_DEFAULT)
+                  .build()
+        ),
+        expectedResults
+    );
+  }
+
+  @Test
+  public void testMetricsSumEstimateIntersect()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"
+                   + "  SUM(cnt),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2)) AS all_sum_estimates,\n"
+                   + StringUtils.replace(
+                      "ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH_INTERSECT(COMPLEX_DECODE_BASE64('arrayOfDoublesSketch', '%s'), ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2), 128)) AS intersect_sum_estimates\n",
+                      "%s",
+                      COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION
+                   )
+                   + "FROM druid.foo";
+
+    final List<Object[]> expectedResults;
+
+    expectedResults = ImmutableList.of(
+        new Object[]{
+            5L,
+            "[30.0]",
+            "[8.0]"
+        }
+    );
+
+    final String expectedBase64Constant = "'"
+                                 + StringUtils.replace(COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION, "=", "\\u003D")
+                                 + "'";
+
+    testQuery(
+        sql,
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(CalciteTests.DATASOURCE1)
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .granularity(Granularities.ALL)
+                  .aggregators(
+                      ImmutableList.of(
+                          new LongSumAggregatorFactory("a0", "cnt"),
+                          new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "a1",
+                                                             "tuplesketch_dim2",
+                                                             null,
+                                                             null,
+                                                             null
+                          )
+                      )
+                  )
+                  .postAggregators(
+                      ImmutableList.of(
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p1",
+                           new FieldAccessPostAggregator("p0", "a1")
+                         ),
+                         new ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator(
+                           "p5",
+                           new ArrayOfDoublesSketchSetOpPostAggregator(
+                             "p4",
+                             "INTERSECT",
+                             128,
+                             null,
+                             ImmutableList.of(
+                               new ExpressionPostAggregator(
+                                 "p2",
+                                 "complex_decode_base64('arrayOfDoublesSketch'," + expectedBase64Constant + ")",
+                                 null,
+                                 queryFramework().macroTable()
+                               ),
+                               new FieldAccessPostAggregator("p3", "a1")
+                             )
+                           )
+                         )
+                      )
+                  )
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .build()
+        ),
+        expectedResults
+    );
+  }
+
+  @Test
+  public void testNullInputs()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"

Review Comment:
   None of these test queries use aggregations (`GROUP BY` clause). Would these functions work in that case? Should we include tests to demonstrate how they'd work?



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT
+
+`ARRAY_OF_DOUBLES_SKETCH_NOT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_UNION
+
+`ARRAY_OF_DOUBLES_SKETCH_UNION(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   The set of functions here seem to mimic the new scheme that various Druid principals have outline: functions to create the intermediate types, and more functions to pull final values out of those intermediate types.
   
   That approach makes sense for a native query, where the query spells out the details of aggregation.
   
   For SQL, however, this pattern is not a good fit. SQL provides a single function that goes from inputs to finalized types. For example, for `LATEST()` at the native layer, we may eventually want:
   
   * `LATEST_LONG_SKETCH(string, long) -> COMPLEX<PAIR<string, long>>`
   * `LATEST_UNION(COMPLEX<PAIR<string, long>> COMPLEX<PAIR<string, long>>) -> COMPLEX<PAIR<string, long>>`
   * `LATEST_LEFT( COMPLEX<PAIR<string, long>>) -> string`
   * `LATEST_RIGHT(COMPLEX<PAIR<string, long>> -> long`
   
   However, SQL doesn't provide the semantics to use such functions: one cannot get "inside" of a `GROUP BY` to create the intermediate values, combine them, and then go "outside" of the `GROUP BY` to produce the finalized value.
   
   Instead, in SQL, there is one function, `LATEST(.)` which generates the internal forms when creating the native query.
   
   All of this is a long-winded way of asking, for SQL, do we just need an `APPROX_SUM(ARRAY<DOUBLE>)` function? The rest is worked out by the planner to handle doing approx sums for arrays of doubles?



##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregatorTest.java:
##########
@@ -0,0 +1,452 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.inject.Injector;
+import org.apache.druid.data.input.InputRow;
+import org.apache.druid.guice.DruidInjectorBuilder;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.QueryRunnerFactoryConglomerate;
+import org.apache.druid.query.aggregation.CountAggregatorFactory;
+import org.apache.druid.query.aggregation.LongSumAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchModule;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchOperations;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator;
+import org.apache.druid.query.aggregation.post.ExpressionPostAggregator;
+import org.apache.druid.query.aggregation.post.FieldAccessPostAggregator;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.groupby.GroupByQuery;
+import org.apache.druid.segment.IndexBuilder;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.incremental.IncrementalIndexSchema;
+import org.apache.druid.segment.join.JoinableFactoryWrapper;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+import org.apache.druid.segment.writeout.OffHeapMemorySegmentWriteOutMediumFactory;
+import org.apache.druid.sql.calcite.BaseCalciteQueryTest;
+import org.apache.druid.sql.calcite.filtration.Filtration;
+import org.apache.druid.sql.calcite.util.CalciteTests;
+import org.apache.druid.sql.calcite.util.SpecificSegmentsQuerySegmentWalker;
+import org.apache.druid.sql.calcite.util.TestDataBuilder;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class ArrayOfDoublesSketchSqlAggregatorTest extends BaseCalciteQueryTest
+{
+
+  private static final String DATA_SOURCE = "foo";
+
+  // built from ArrayOfDoublesUpdatableSketch.update("FEDCAB", new double[] {0.0}).compact()
+  private static final String COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION = "AQEJAwgBzJP/////////fwEAAAAAAAAAjFnadZuMrkgAAAAAAAAAAA==";
+
+  private static final List<InputRow> ROWS = ImmutableList.of(
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 5)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 12)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 3)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 8)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-03")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 2)
+                  .build()
+  ).stream().map(TestDataBuilder::createRow).collect(Collectors.toList());
+
+  @Override
+  public void configureGuice(DruidInjectorBuilder builder)
+  {
+    super.configureGuice(builder);
+    builder.addModule(new ArrayOfDoublesSketchModule());
+  }
+
+  @Override
+  public SpecificSegmentsQuerySegmentWalker createQuerySegmentWalker(
+      final QueryRunnerFactoryConglomerate conglomerate,
+      final JoinableFactoryWrapper joinableFactory,
+      final Injector injector
+  ) throws IOException
+  {
+    ArrayOfDoublesSketchModule.registerSerde();
+
+    final QueryableIndex index = IndexBuilder.create()
+                                             .tmpDir(temporaryFolder.newFolder())
+                                             .segmentWriteOutMediumFactory(
+                                                 OffHeapMemorySegmentWriteOutMediumFactory.instance()
+                                             )
+                                             .schema(
+                                                 new IncrementalIndexSchema.Builder()
+                                                     .withMetrics(
+                                                         new CountAggregatorFactory("cnt"),
+                                                         new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "tuplesketch_dim2",
+                                                             "dim2",
+                                                             null,
+                                                             ImmutableList.of("m1"),
+                                                             1
+                                                         ),
+                                                         new LongSumAggregatorFactory("m1", "m1")
+                                                     )
+                                                     .withRollup(false)
+                                                     .build()
+                                             )
+                                             .rows(ROWS)
+                                             .buildMMappedIndex();
+
+    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+        DataSegment.builder()
+                   .dataSource(DATA_SOURCE)
+                   .interval(index.getDataInterval())
+                   .version("1")
+                   .shardSpec(new LinearShardSpec(0))
+                   .size(0)
+                   .build(),
+        index
+    );
+  }
+
+  @Test
+  public void testMetricsSumEstimate()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"
+                   + "  dim1,\n"
+                   + "  SUM(cnt),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1, 256))\n"

Review Comment:
   From a usability perspective, this is somewhat un-SQL. Having to compose two functions, with very long names, to get one result is unusual. Since SQL is often hand-written, SQL goes for convenience rather than breaking things down into their component parts. The above approach would be fine in Java, but is awkward in SQL.



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT

Review Comment:
   A description of each function would be useful. I have to confess, I'm struggling to understand how one takes the logical `NOT` of a double, an array of doubles, or a sum of an array of doubles.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregator.java:
##########
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.util.Optionality;
+import org.apache.datasketches.Util;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+public class ArrayOfDoublesSketchSqlAggregator implements SqlAggregator
+{
+
+  private static final SqlAggFunction FUNCTION_INSTANCE = new ArrayOfDoublesSqlAggFunction();
+  private static final String NAME = "ARRAY_OF_DOUBLES_SKETCH";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      VirtualColumnRegistry virtualColumnRegistry,
+      RexBuilder rexBuilder,
+      String name,
+      AggregateCall aggregateCall,
+      Project project,
+      List<Aggregation> existingAggregations,
+      boolean finalizeAggregations
+  )
+  {
+
+    final List<Integer> argList = aggregateCall.getArgList();
+
+    // check last argument for nomimalEntries
+    final int nominalEntries;
+    final int metricExpressionEndIndex;
+    final int lastArgIndex = argList.size() - 1;
+    final RexNode potentialNominalEntriesArg = Expressions.fromFieldAccess(
+        rowSignature,
+        project,
+        argList.get(lastArgIndex)
+    );
+
+    if (potentialNominalEntriesArg.isA(SqlKind.LITERAL) &&
+        RexLiteral.value(potentialNominalEntriesArg) instanceof Number) {
+
+      nominalEntries = ((Number) RexLiteral.value(potentialNominalEntriesArg)).intValue();
+      metricExpressionEndIndex = lastArgIndex - 1;
+    } else {
+      nominalEntries = Util.DEFAULT_NOMINAL_ENTRIES;
+      metricExpressionEndIndex = lastArgIndex;
+    }
+
+    final List<String> fieldNames = new ArrayList<>();
+    for (int i = 0; i <= metricExpressionEndIndex; i++) {
+      final String fieldName;
+
+      final RexNode columnRexNode = Expressions.fromFieldAccess(
+          rowSignature,
+          project,
+          argList.get(i)
+      );

Review Comment:
   Druid likes to merge master rather than rebase. A rebase looses folks comment status, which folks kind of dislike.
   
   If your code conflicts with newer code, then, yes, please do merge master and resolve any conflicts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1134622210


##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,17 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.
+
+|Function|Notes|Default|
+|--------|-----|-------|
+|`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the values of `expr` which is a column containing Tuple sketches. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.
+|`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the dimension value of `dimensionColumnExpr` and the metric values contained in one or more `metricColumnExpr` columns. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md).
+|`ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(expr)`|Computes approximate sums of the values contained within a [Tuple sketch](../development/extensions-core/datasketches-tuple.md#estimated-metrics-values-for-each-column-of-arrayofdoublessketch) column.

Review Comment:
   @frankgrimes97 When you move this content to sql-scalar.md, please also update the link in `sql-functions.md`.
   
   ```
   **Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on PR #13887:
URL: https://github.com/apache/druid/pull/13887#issuecomment-1462180042

   Hmm... not sure if there's anything to be done about this failing CI check:
   
   ```
   > spellcheck
   [69](https://github.com/apache/druid/actions/runs/4358587476/jobs/7648241472#step:21:70)
   > mdspell --en-us --ignore-numbers --report '../docs/**/*.md' || (./script/notify-spellcheck-issues && false)
   [70](https://github.com/apache/druid/actions/runs/4358587476/jobs/7648241472#step:21:71)
   
   [71](https://github.com/apache/druid/actions/runs/4358587476/jobs/7648241472#step:21:72)
       ../docs/querying/sql-functions.md
   [72](https://github.com/apache/druid/actions/runs/4358587476/jobs/7648241472#step:21:73)
         139 | ## ARRAY_OF_DOUBLES_SKETCH 
   [73](https://github.com/apache/druid/actions/runs/4358587476/jobs/7648241472#step:21:74)
         149 | ## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE
   ```
   https://github.com/apache/druid/actions/runs/4358587476/jobs/7648241472


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] anshu-makkar commented on pull request #13887: Tuple sketch SQL support

Posted by "anshu-makkar (via GitHub)" <gi...@apache.org>.
anshu-makkar commented on PR #13887:
URL: https://github.com/apache/druid/pull/13887#issuecomment-1458113437

   @abhishekagarwal87 kindly check this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on PR #13887:
URL: https://github.com/apache/druid/pull/13887#issuecomment-1478635101

   @gianm @vtlim I've renamed the SQL functions and updated the documentation accordingly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1144108169


##########
docs/querying/sql-functions.md:
##########
@@ -491,6 +491,48 @@ Returns an approximate rank between 0 and 1 of a given value, in which the rank
 
 Creates a Theta sketch on a column containing Theta sketches or a regular column.
 
+## DS_TUPLE_DOUBLES
+
+`DS_TUPLE_DOUBLES(expr, [nominalEntries])`
+
+`DS_TUPLE_DOUBLES(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch which contains an array of double values as the Summary Object. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md).
+
+## DS_TUPLE_DOUBLES_INTERSECT
+
+`DS_TUPLE_DOUBLES_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+Returns an intersection of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Objects are summed when combined. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md).
+
+## DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE
+
+`DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE(expr)`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+Computes approximate sums of the values contained within a Tuple sketch which contains an array of double values as the Summary Object.
+
+## DS_TUPLE_DOUBLES_NOT
+
+`DS_TUPLE_DOUBLES_NOT(expr, ..., [nominalEntries])`
+
+Returns a set difference of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Object are preserved as-is. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md).
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)

Review Comment:
   The function type line should go below the function signature and above the description



##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,16 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.
+
+|Function|Notes|Default|
+|--------|-----|-------|
+|`DS_TUPLE_DOUBLES(expr, [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the values of `expr` which is a column containing Tuple sketches which contain an array of double values as their Summary Objects. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.

Review Comment:
   I've noticed you started including "Summary Objects" as part of the description. What does this mean?
   
   I thought the last wording was clearer:
   
   > Creates a Tuple sketch on the values of `expr` which is a column containing Tuple sketches.
   
   It seems that all the Tuple sketches are also ArrayofDoublesSketch objects. https://druid.apache.org/docs/latest/development/extensions-core/datasketches-tuple.html



##########
docs/querying/sql-functions.md:
##########
@@ -491,6 +491,48 @@ Returns an approximate rank between 0 and 1 of a given value, in which the rank
 
 Creates a Theta sketch on a column containing Theta sketches or a regular column.
 
+## DS_TUPLE_DOUBLES
+
+`DS_TUPLE_DOUBLES(expr, [nominalEntries])`
+
+`DS_TUPLE_DOUBLES(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch which contains an array of double values as the Summary Object. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md).
+
+## DS_TUPLE_DOUBLES_INTERSECT
+
+`DS_TUPLE_DOUBLES_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+Returns an intersection of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Objects are summed when combined. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md).
+
+## DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE
+
+`DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE(expr)`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+Computes approximate sums of the values contained within a Tuple sketch which contains an array of double values as the Summary Object.
+
+## DS_TUPLE_DOUBLES_NOT
+
+`DS_TUPLE_DOUBLES_NOT(expr, ..., [nominalEntries])`
+
+Returns a set difference of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Object are preserved as-is. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md).

Review Comment:
   ```suggestion
   Returns a set difference of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Object are preserved as is. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md).
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142883502


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT
+
+`ARRAY_OF_DOUBLES_SKETCH_NOT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_UNION
+
+`ARRAY_OF_DOUBLES_SKETCH_UNION(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   As to naming— maybe `TUPLE_DOUBLES_SKETCH`, `TUPLE_DOUBLES_INTERSECT`? IMO the `ARRAY_OF` doesn't add much, so we can skip it, and `TUPLE` _does_ add something: the docs call them "tuple sketches" extensively. I'm OK with the originals too. `AODS`, `AODS_INTERSECT` to my eye seem _too_ terse.
   
   As to the bigger potential changes— IMO, the decomposed versions proposed in this patch are a good place to start. Tuple sketches, like theta sketches, are likely to be combined in different ways by different users. Their purpose is to be intersected, not'ed, and unioned in various ways, and then converted to estimates once the set ops are done. Since there isn't a canonical thing that people will largely want to do, it seems like the composable functions are necessary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142522184


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT
+
+`ARRAY_OF_DOUBLES_SKETCH_NOT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_UNION
+
+`ARRAY_OF_DOUBLES_SKETCH_UNION(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   Would the following SQL function names be preferable?
   - `ARRAY_OF_DOUBLES_SKETCH  -> AODS`
   - `ARRAY_OF_DOUBLES_SKETCH_INTERSECT  -> AODS_INTERSECT`
   - `ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE -> AODS_METRICS_SUM_ESTIMATE`
   - `ARRAY_OF_DOUBLES_SKETCH_NOT -> AODS_NOT`
   - `ARRAY_OF_DOUBLES_SKETCH_UNION  -> AODS_UNION`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1134666017


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.OperatorConversions;
+import org.apache.druid.sql.calcite.expression.PostAggregatorVisitor;
+import org.apache.druid.sql.calcite.expression.SqlOperatorConversion;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class ArrayOfDoublesSketchSetBaseOperatorConversion implements SqlOperatorConversion
+{
+  public ArrayOfDoublesSketchSetBaseOperatorConversion()
+  {
+  }
+
+  @Override
+  public SqlOperator calciteOperator()
+  {
+    return makeSqlFunction();
+  }
+
+  @Nullable
+  @Override
+  public DruidExpression toDruidExpression(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode
+  )
+  {
+    plannerContext.setPlanningError("%s can only be used on aggregates. " +
+        "It cannot be used directly on a column or on a scalar expression.", getFunctionName());
+    return null;
+  }
+
+  @Nullable
+  @Override
+  public PostAggregator toPostAggregator(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode,
+      PostAggregatorVisitor postAggregatorVisitor
+  )
+  {
+    final List<RexNode> operands = ((RexCall) rexNode).getOperands();
+    final List<PostAggregator> inputPostAggs = new ArrayList<>();
+    Integer nominalEntries = null;
+    Integer numberOfvalues = null;
+
+    for (int i = 0; i < operands.size(); i++) {
+      RexNode operand = operands.get(i);
+      if (i == 0 && operand.isA(SqlKind.LITERAL) && SqlTypeFamily.INTEGER.contains(operand.getType())) {
+        nominalEntries = RexLiteral.intValue(operand);

Review Comment:
   Hmm, not sure... I'm pretty new to the Calcite APIs.
   Can you point me to a relevant example?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1134688200


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.OperatorConversions;
+import org.apache.druid.sql.calcite.expression.PostAggregatorVisitor;
+import org.apache.druid.sql.calcite.expression.SqlOperatorConversion;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class ArrayOfDoublesSketchSetBaseOperatorConversion implements SqlOperatorConversion

Review Comment:
   Ok, thanks! Let us know if you need any advice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1134597082


##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,17 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.
+
+|Function|Notes|Default|
+|--------|-----|-------|
+|`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the values of `expr` which is a column containing Tuple sketches. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.
+|`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the dimension value of `dimensionColumnExpr` and the metric values contained in one or more `metricColumnExpr` columns. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md).
+|`ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(expr)`|Computes approximate sums of the values contained within a [Tuple sketch](../development/extensions-core/datasketches-tuple.md#estimated-metrics-values-for-each-column-of-arrayofdoublessketch) column.

Review Comment:
   Since `ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE` is a regular function (as opposed to an aggregation function) it should go in `querying/sql-scalar.md`.



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,24 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   This one needs to go in `.spelling` too.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchMergeAggregator.java:
##########
@@ -58,12 +58,20 @@ public ArrayOfDoublesSketchMergeAggregator(
   @Override
   public void aggregate()
   {
-    final ArrayOfDoublesSketch update = selector.getObject();
+    final Object update = selector.getObject();
     if (update == null) {
       return;
     }
+    final ArrayOfDoublesSketch sketch;
+    if (update instanceof ArrayOfDoublesSketch) {

Review Comment:
   What was this change needed for? (Asking since I'm wondering if we're missing a call to `factory.deserialize` somewhere; it's doing something similar.)



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,24 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH

Review Comment:
   This needs to go in the `.spelling` file. Search for `ARRAY_CONCAT_AGG`, it would be right after that one.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.OperatorConversions;
+import org.apache.druid.sql.calcite.expression.PostAggregatorVisitor;
+import org.apache.druid.sql.calcite.expression.SqlOperatorConversion;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class ArrayOfDoublesSketchSetBaseOperatorConversion implements SqlOperatorConversion

Review Comment:
   Looks like this family of functions isn't documented in this patch; is there a reason for that? If so, it'd be good to write it down in a class-level javadoc, so other developers are aware.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.OperatorConversions;
+import org.apache.druid.sql.calcite.expression.PostAggregatorVisitor;
+import org.apache.druid.sql.calcite.expression.SqlOperatorConversion;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class ArrayOfDoublesSketchSetBaseOperatorConversion implements SqlOperatorConversion
+{
+  public ArrayOfDoublesSketchSetBaseOperatorConversion()
+  {
+  }
+
+  @Override
+  public SqlOperator calciteOperator()
+  {
+    return makeSqlFunction();
+  }
+
+  @Nullable
+  @Override
+  public DruidExpression toDruidExpression(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode
+  )
+  {
+    plannerContext.setPlanningError("%s can only be used on aggregates. " +
+        "It cannot be used directly on a column or on a scalar expression.", getFunctionName());
+    return null;
+  }
+
+  @Nullable
+  @Override
+  public PostAggregator toPostAggregator(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode,
+      PostAggregatorVisitor postAggregatorVisitor
+  )
+  {
+    final List<RexNode> operands = ((RexCall) rexNode).getOperands();
+    final List<PostAggregator> inputPostAggs = new ArrayList<>();
+    Integer nominalEntries = null;
+    Integer numberOfvalues = null;
+
+    for (int i = 0; i < operands.size(); i++) {
+      RexNode operand = operands.get(i);
+      if (i == 0 && operand.isA(SqlKind.LITERAL) && SqlTypeFamily.INTEGER.contains(operand.getType())) {
+        nominalEntries = RexLiteral.intValue(operand);

Review Comment:
   These requirements could be better-plugged into the OperandTypeChecker, and if so, users would get better messages when using the function incorrectly. I'm willing to leave this for future work, though, if you aren't up to doing it in this patch.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142887011


##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,16 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.

Review Comment:
   IMO we don't need this description added here, since documentation already exists for tuple sketches generally: https://druid.apache.org/docs/latest/development/extensions-core/datasketches-tuple.html. Admittedly it could use improvement— they are quite terse when it comes to why you might want to use the sketches in the first place— but I'd consider that out of scope for a patch that is about to add SQL bindings.



##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,16 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.

Review Comment:
   IMO we don't need this description added here, since documentation already exists for tuple sketches generally: https://druid.apache.org/docs/latest/development/extensions-core/datasketches-tuple.html. Admittedly it could use improvement— they are quite terse when it comes to why you might want to use the sketches in the first place— but I'd consider that out of scope for a patch that is about adding SQL bindings.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142522184


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT
+
+`ARRAY_OF_DOUBLES_SKETCH_NOT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_UNION
+
+`ARRAY_OF_DOUBLES_SKETCH_UNION(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   Would the following SQL functions names be preferable?
   - `ARRAY_OF_DOUBLES_SKETCH  -> AODS`
   - `ARRAY_OF_DOUBLES_SKETCH_INTERSECT  -> AODS_INTERSECT`
   - `AODS_METRIC_SUM_ESTIMATE -> AODS_METRIC_SUM_ESTIMATE`
   - `ARRAY_OF_DOUBLES_SKETCH_NOT -> AODS_NOT`
   - `ARRAY_OF_DOUBLES_SKETCH_UNION  -> AODS_UNION`



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT
+
+`ARRAY_OF_DOUBLES_SKETCH_NOT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_UNION
+
+`ARRAY_OF_DOUBLES_SKETCH_UNION(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   Would the following SQL function names be preferable?
   - `ARRAY_OF_DOUBLES_SKETCH  -> AODS`
   - `ARRAY_OF_DOUBLES_SKETCH_INTERSECT  -> AODS_INTERSECT`
   - `AODS_METRIC_SUM_ESTIMATE -> AODS_METRIC_SUM_ESTIMATE`
   - `ARRAY_OF_DOUBLES_SKETCH_NOT -> AODS_NOT`
   - `ARRAY_OF_DOUBLES_SKETCH_UNION  -> AODS_UNION`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142877531


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.OperatorConversions;
+import org.apache.druid.sql.calcite.expression.PostAggregatorVisitor;
+import org.apache.druid.sql.calcite.expression.SqlOperatorConversion;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class ArrayOfDoublesSketchSetBaseOperatorConversion implements SqlOperatorConversion
+{
+  public ArrayOfDoublesSketchSetBaseOperatorConversion()
+  {
+  }
+
+  @Override
+  public SqlOperator calciteOperator()
+  {
+    return makeSqlFunction();
+  }
+
+  @Nullable
+  @Override
+  public DruidExpression toDruidExpression(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode
+  )
+  {
+    plannerContext.setPlanningError("%s can only be used on aggregates. " +
+        "It cannot be used directly on a column or on a scalar expression.", getFunctionName());
+    return null;
+  }
+
+  @Nullable
+  @Override
+  public PostAggregator toPostAggregator(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode,
+      PostAggregatorVisitor postAggregatorVisitor
+  )
+  {
+    final List<RexNode> operands = ((RexCall) rexNode).getOperands();
+    final List<PostAggregator> inputPostAggs = new ArrayList<>();
+    Integer nominalEntries = null;
+    Integer numberOfvalues = null;
+
+    for (int i = 0; i < operands.size(); i++) {
+      RexNode operand = operands.get(i);
+      if (i == 0 && operand.isA(SqlKind.LITERAL) && SqlTypeFamily.INTEGER.contains(operand.getType())) {
+        nominalEntries = RexLiteral.intValue(operand);

Review Comment:
   > Given that the parsing/validation added here is similar to what's already being done for other data sketch SQL functions, perhaps it's best to save your suggested improvements for a future initiative.
   
   Yeah, I agree that the other sketch bindings could use better operand validation too. I'm OK with deferring this til the future.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142877793


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregator.java:
##########
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.util.Optionality;
+import org.apache.datasketches.Util;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+public class ArrayOfDoublesSketchSqlAggregator implements SqlAggregator
+{
+
+  private static final SqlAggFunction FUNCTION_INSTANCE = new ArrayOfDoublesSqlAggFunction();
+  private static final String NAME = "ARRAY_OF_DOUBLES_SKETCH";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      VirtualColumnRegistry virtualColumnRegistry,
+      RexBuilder rexBuilder,
+      String name,
+      AggregateCall aggregateCall,
+      Project project,
+      List<Aggregation> existingAggregations,
+      boolean finalizeAggregations
+  )
+  {
+
+    final List<Integer> argList = aggregateCall.getArgList();
+
+    // check last argument for nomimalEntries
+    final int nominalEntries;
+    final int metricExpressionEndIndex;
+    final int lastArgIndex = argList.size() - 1;
+    final RexNode potentialNominalEntriesArg = Expressions.fromFieldAccess(
+        rowSignature,
+        project,
+        argList.get(lastArgIndex)
+    );

Review Comment:
   Ah, yep, that's new. Switching over to the new one could be addressed in this PR or in a follow-up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1135554183


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.OperatorConversions;
+import org.apache.druid.sql.calcite.expression.PostAggregatorVisitor;
+import org.apache.druid.sql.calcite.expression.SqlOperatorConversion;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class ArrayOfDoublesSketchSetBaseOperatorConversion implements SqlOperatorConversion
+{
+  public ArrayOfDoublesSketchSetBaseOperatorConversion()
+  {
+  }
+
+  @Override
+  public SqlOperator calciteOperator()
+  {
+    return makeSqlFunction();
+  }
+
+  @Nullable
+  @Override
+  public DruidExpression toDruidExpression(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode
+  )
+  {
+    plannerContext.setPlanningError("%s can only be used on aggregates. " +
+        "It cannot be used directly on a column or on a scalar expression.", getFunctionName());
+    return null;
+  }
+
+  @Nullable
+  @Override
+  public PostAggregator toPostAggregator(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode,
+      PostAggregatorVisitor postAggregatorVisitor
+  )
+  {
+    final List<RexNode> operands = ((RexCall) rexNode).getOperands();
+    final List<PostAggregator> inputPostAggs = new ArrayList<>();
+    Integer nominalEntries = null;
+    Integer numberOfvalues = null;
+
+    for (int i = 0; i < operands.size(); i++) {
+      RexNode operand = operands.get(i);
+      if (i == 0 && operand.isA(SqlKind.LITERAL) && SqlTypeFamily.INTEGER.contains(operand.getType())) {
+        nominalEntries = RexLiteral.intValue(operand);

Review Comment:
   @gianm It isn't obvious to me exactly how to implement this part. Given that the parsing/validation added here is similar to what's already being done for other data sketch SQL functions, perhaps it's best to save your suggested improvements improvements for a future initiative.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1137113257


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregator.java:
##########
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.util.Optionality;
+import org.apache.datasketches.Util;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+public class ArrayOfDoublesSketchSqlAggregator implements SqlAggregator
+{
+
+  private static final SqlAggFunction FUNCTION_INSTANCE = new ArrayOfDoublesSqlAggFunction();
+  private static final String NAME = "ARRAY_OF_DOUBLES_SKETCH";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      VirtualColumnRegistry virtualColumnRegistry,
+      RexBuilder rexBuilder,
+      String name,
+      AggregateCall aggregateCall,
+      Project project,
+      List<Aggregation> existingAggregations,
+      boolean finalizeAggregations
+  )
+  {
+
+    final List<Integer> argList = aggregateCall.getArgList();
+
+    // check last argument for nomimalEntries
+    final int nominalEntries;
+    final int metricExpressionEndIndex;
+    final int lastArgIndex = argList.size() - 1;
+    final RexNode potentialNominalEntriesArg = Expressions.fromFieldAccess(
+        rowSignature,
+        project,
+        argList.get(lastArgIndex)
+    );

Review Comment:
   It seems these are new deprecation warnings due to recent changes merged to `master`:
     https://github.com/apache/druid/pull/13904
     https://github.com/apache/druid/pull/13914
   
   Do they need to be addressed immediately on this PR? (Might require a new rebase on `master`?)
   



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregator.java:
##########
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.util.Optionality;
+import org.apache.datasketches.Util;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+public class ArrayOfDoublesSketchSqlAggregator implements SqlAggregator
+{
+
+  private static final SqlAggFunction FUNCTION_INSTANCE = new ArrayOfDoublesSqlAggFunction();
+  private static final String NAME = "ARRAY_OF_DOUBLES_SKETCH";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      VirtualColumnRegistry virtualColumnRegistry,
+      RexBuilder rexBuilder,
+      String name,
+      AggregateCall aggregateCall,
+      Project project,
+      List<Aggregation> existingAggregations,
+      boolean finalizeAggregations
+  )
+  {
+
+    final List<Integer> argList = aggregateCall.getArgList();
+
+    // check last argument for nomimalEntries
+    final int nominalEntries;
+    final int metricExpressionEndIndex;
+    final int lastArgIndex = argList.size() - 1;
+    final RexNode potentialNominalEntriesArg = Expressions.fromFieldAccess(
+        rowSignature,
+        project,
+        argList.get(lastArgIndex)
+    );
+
+    if (potentialNominalEntriesArg.isA(SqlKind.LITERAL) &&
+        RexLiteral.value(potentialNominalEntriesArg) instanceof Number) {
+
+      nominalEntries = ((Number) RexLiteral.value(potentialNominalEntriesArg)).intValue();
+      metricExpressionEndIndex = lastArgIndex - 1;
+    } else {
+      nominalEntries = Util.DEFAULT_NOMINAL_ENTRIES;
+      metricExpressionEndIndex = lastArgIndex;
+    }
+
+    final List<String> fieldNames = new ArrayList<>();
+    for (int i = 0; i <= metricExpressionEndIndex; i++) {
+      final String fieldName;
+
+      final RexNode columnRexNode = Expressions.fromFieldAccess(
+          rowSignature,
+          project,
+          argList.get(i)
+      );

Review Comment:
   It seems these are new deprecation warnings due to recent changes merged to `master`:
     https://github.com/apache/druid/pull/13904
     https://github.com/apache/druid/pull/13914
   
   Do they need to be addressed immediately on this PR? (Might require a new rebase on `master`?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1140463977


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT

Review Comment:
   @frankgrimes97 two comments:
   
   1) can you arrange the new functions in alphabetical order? That is, 
   * ARRAY_OF_DOUBLES_SKETCH_INTERSECT
   * ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE
   * ARRAY_OF_DOUBLES_SKETCH_NOT
   * ARRAY_OF_DOUBLES_SKETCH_UNION
   
   2) Make sure a description is included for INTERSECT, NOT, UNION



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142884962


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT
+
+`ARRAY_OF_DOUBLES_SKETCH_NOT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_UNION
+
+`ARRAY_OF_DOUBLES_SKETCH_UNION(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   IMO there are some "pre-composed" functions that would be useful to add, but I don't think we need to do that stuff in this PR, given that we're going to need the decomposed functions anyway.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142530993


##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregatorTest.java:
##########
@@ -0,0 +1,452 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.inject.Injector;
+import org.apache.druid.data.input.InputRow;
+import org.apache.druid.guice.DruidInjectorBuilder;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.QueryRunnerFactoryConglomerate;
+import org.apache.druid.query.aggregation.CountAggregatorFactory;
+import org.apache.druid.query.aggregation.LongSumAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchModule;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchOperations;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator;
+import org.apache.druid.query.aggregation.post.ExpressionPostAggregator;
+import org.apache.druid.query.aggregation.post.FieldAccessPostAggregator;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.groupby.GroupByQuery;
+import org.apache.druid.segment.IndexBuilder;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.incremental.IncrementalIndexSchema;
+import org.apache.druid.segment.join.JoinableFactoryWrapper;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+import org.apache.druid.segment.writeout.OffHeapMemorySegmentWriteOutMediumFactory;
+import org.apache.druid.sql.calcite.BaseCalciteQueryTest;
+import org.apache.druid.sql.calcite.filtration.Filtration;
+import org.apache.druid.sql.calcite.util.CalciteTests;
+import org.apache.druid.sql.calcite.util.SpecificSegmentsQuerySegmentWalker;
+import org.apache.druid.sql.calcite.util.TestDataBuilder;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class ArrayOfDoublesSketchSqlAggregatorTest extends BaseCalciteQueryTest
+{
+
+  private static final String DATA_SOURCE = "foo";
+
+  // built from ArrayOfDoublesUpdatableSketch.update("FEDCAB", new double[] {0.0}).compact()
+  private static final String COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION = "AQEJAwgBzJP/////////fwEAAAAAAAAAjFnadZuMrkgAAAAAAAAAAA==";
+
+  private static final List<InputRow> ROWS = ImmutableList.of(
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 5)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 12)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 3)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 8)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-03")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 2)
+                  .build()
+  ).stream().map(TestDataBuilder::createRow).collect(Collectors.toList());
+
+  @Override
+  public void configureGuice(DruidInjectorBuilder builder)
+  {
+    super.configureGuice(builder);
+    builder.addModule(new ArrayOfDoublesSketchModule());
+  }
+
+  @Override
+  public SpecificSegmentsQuerySegmentWalker createQuerySegmentWalker(
+      final QueryRunnerFactoryConglomerate conglomerate,
+      final JoinableFactoryWrapper joinableFactory,
+      final Injector injector
+  ) throws IOException
+  {
+    ArrayOfDoublesSketchModule.registerSerde();
+
+    final QueryableIndex index = IndexBuilder.create()
+                                             .tmpDir(temporaryFolder.newFolder())
+                                             .segmentWriteOutMediumFactory(
+                                                 OffHeapMemorySegmentWriteOutMediumFactory.instance()
+                                             )
+                                             .schema(
+                                                 new IncrementalIndexSchema.Builder()
+                                                     .withMetrics(
+                                                         new CountAggregatorFactory("cnt"),
+                                                         new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "tuplesketch_dim2",
+                                                             "dim2",
+                                                             null,
+                                                             ImmutableList.of("m1"),
+                                                             1
+                                                         ),
+                                                         new LongSumAggregatorFactory("m1", "m1")
+                                                     )
+                                                     .withRollup(false)
+                                                     .build()
+                                             )
+                                             .rows(ROWS)
+                                             .buildMMappedIndex();
+
+    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+        DataSegment.builder()
+                   .dataSource(DATA_SOURCE)
+                   .interval(index.getDataInterval())
+                   .version("1")
+                   .shardSpec(new LinearShardSpec(0))
+                   .size(0)
+                   .build(),
+        index
+    );
+  }
+
+  @Test
+  public void testMetricsSumEstimate()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"
+                   + "  dim1,\n"
+                   + "  SUM(cnt),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1, 256))\n"

Review Comment:
   I agree they may be a bit cumbersome to type, but the right tradeoff between explicitness and terseness can be difficult to find.
   N.B. The same could be said of some of the existing Theta Sketch SQL functions. e.g.
   - `theta_sketch_estimate_with_error_bounds`
   - `APPROX_COUNT_DISTINCT_DS_THETA`
   https://github.com/apache/druid/blob/master/extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaSketchSqlAggregatorTest.java



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 merged pull request #13887: Tuple sketch SQL support

Posted by "abhishekagarwal87 (via GitHub)" <gi...@apache.org>.
abhishekagarwal87 merged PR #13887:
URL: https://github.com/apache/druid/pull/13887


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1128557005


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,22 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr..., [nominalEntries])`
+
+Creates a Tuple sketch.
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE
+
+`ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(expr)`
+

Review Comment:
   ```suggestion
   **Function type:** [Aggregation](sql-aggregations.md)
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1128557396


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,22 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr..., [nominalEntries])`
+
+Creates a Tuple sketch.
+
+**Function type:** [Aggregation](sql-aggregations.md)

Review Comment:
   I think the function type goes before the description. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142886040


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchMergeAggregator.java:
##########
@@ -58,12 +58,20 @@ public ArrayOfDoublesSketchMergeAggregator(
   @Override
   public void aggregate()
   {
-    final ArrayOfDoublesSketch update = selector.getObject();
+    final Object update = selector.getObject();
     if (update == null) {
       return;
     }
+    final ArrayOfDoublesSketch sketch;
+    if (update instanceof ArrayOfDoublesSketch) {

Review Comment:
   It's not documented, IIRC because we are still sorting out what the best way is to expose this to end users, so we might need to change the API. It is fine for use in a unit test though. If we change the API we'll update the test too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142476374


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT

Review Comment:
   The `NOT` is done against a hash value (the keys in this case, similar to Theta Sketch) and will just preserve both the key and its associated Summary Object unmodified (in this case, an array of double values)
   Both `UNION` and `INTERSECT` add the values in the associated Summary Objects (e.g. `a[0] + b[0]` if the array has `length == 1`.



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT

Review Comment:
   The `NOT` is done against a hash value (the keys in this case, similar to Theta Sketch) and will just preserve both the key and its associated Summary Object unmodified (in this case, an array of double values)
   Both `UNION` and `INTERSECT` add the values in the associated Summary Objects (e.g. `a[0] + b[0]` if the array has `length == 1`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142476374


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT

Review Comment:
   The `NOT` is done against a hash value (the keys in this case, similar to Theta Sketch) and will just preserve both the key and its associated Summary Object (in this case, an array of double values)
   Both `UNION` and `INTERSECT` add the values in the associated Summary Object (e.g. `a[0] + b[0]` if the array has `length == 1`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1134695043


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.OperatorConversions;
+import org.apache.druid.sql.calcite.expression.PostAggregatorVisitor;
+import org.apache.druid.sql.calcite.expression.SqlOperatorConversion;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class ArrayOfDoublesSketchSetBaseOperatorConversion implements SqlOperatorConversion

Review Comment:
   I've added `INTERSECT`, `NOT` and `UNION` to the docs and fixed the params to accept `nominalEntries` in a way that's consistent with `ARRAY_OF_DOUBLES_SKETCH`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1134669808


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.OperatorConversions;
+import org.apache.druid.sql.calcite.expression.PostAggregatorVisitor;
+import org.apache.druid.sql.calcite.expression.SqlOperatorConversion;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class ArrayOfDoublesSketchSetBaseOperatorConversion implements SqlOperatorConversion

Review Comment:
   You're right... this was completely missed!
   Also, I think its operand handling was mostly just copied from Theta sketch classes and may need some rework to be consistent with how `nominalEntries` is specified on `ARRAY_OF_DOUBLES_SKETCH` (optional last numeric literal parameter)
   I'll try to address that ASAP and add reworked argument parsing and docs for `ARRAY_OF_DOUBLES_SKETCH_INTERSECT` and `ARRAY_OF_DOUBLES_SKETCH_NOT` and `ARRAY_OF_DOUBLES_SKETCH_UNION` in this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on pull request #13887: Tuple sketch SQL support

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on PR #13887:
URL: https://github.com/apache/druid/pull/13887#issuecomment-1466947992

   @frankgrimes97 Can you add the two new sketch aggregator names to the .spelling file? The tests catch the new names as a misspelling so we just need to add them to the dictionary. For example after L735 in https://github.com/apache/druid/blob/master/website/.spelling#L735. Add a line for ARRAY_OF_DOUBLES_SKETCH and one for ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE
   
   I only reviewed the docs side, so there will need to be a reviewer on the code side. cc @abhishekagarwal87 or @rohangarg 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on PR #13887:
URL: https://github.com/apache/druid/pull/13887#issuecomment-1473913990

   Hi, can this now be merged or are more changes required? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1135554183


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.OperatorConversions;
+import org.apache.druid.sql.calcite.expression.PostAggregatorVisitor;
+import org.apache.druid.sql.calcite.expression.SqlOperatorConversion;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class ArrayOfDoublesSketchSetBaseOperatorConversion implements SqlOperatorConversion
+{
+  public ArrayOfDoublesSketchSetBaseOperatorConversion()
+  {
+  }
+
+  @Override
+  public SqlOperator calciteOperator()
+  {
+    return makeSqlFunction();
+  }
+
+  @Nullable
+  @Override
+  public DruidExpression toDruidExpression(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode
+  )
+  {
+    plannerContext.setPlanningError("%s can only be used on aggregates. " +
+        "It cannot be used directly on a column or on a scalar expression.", getFunctionName());
+    return null;
+  }
+
+  @Nullable
+  @Override
+  public PostAggregator toPostAggregator(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode,
+      PostAggregatorVisitor postAggregatorVisitor
+  )
+  {
+    final List<RexNode> operands = ((RexCall) rexNode).getOperands();
+    final List<PostAggregator> inputPostAggs = new ArrayList<>();
+    Integer nominalEntries = null;
+    Integer numberOfvalues = null;
+
+    for (int i = 0; i < operands.size(); i++) {
+      RexNode operand = operands.get(i);
+      if (i == 0 && operand.isA(SqlKind.LITERAL) && SqlTypeFamily.INTEGER.contains(operand.getType())) {
+        nominalEntries = RexLiteral.intValue(operand);

Review Comment:
   @gianm It isn't obvious to me exactly how to implement this part. Given that the parsing/validation added here is similar to what's already being done for other data sketch SQL functions, perhaps it's best to save your suggested improvements for a future initiative.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142516715


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT
+
+`ARRAY_OF_DOUBLES_SKETCH_NOT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_UNION
+
+`ARRAY_OF_DOUBLES_SKETCH_UNION(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   Sketches in the Tuple family are all specialized implementations so I'm not sure making this generic can easily be done, if at all.
   Specifically, the various Tuple sketch implementations in the Apache DataSketches project don't seem to share a type hierarchy so determining the type of the serialized sketch at runtime efficiently might be problematic.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1144757898


##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,16 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.
+
+|Function|Notes|Default|
+|--------|-----|-------|
+|`DS_TUPLE_DOUBLES(expr, [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the values of `expr` which is a column containing Tuple sketches which contain an array of double values as their Summary Objects. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.

Review Comment:
   I felt adding that information to the descriptions would help clarify things for those familiar with Tuple Data Sketches.
   
   FYI, Summary Objects are explained in the Data Sketches documentation: https://datasketches.apache.org/docs/Tuple/TupleOverview.html
   
   ArrayOfDoublesSketch is a specialized Tuple Sketch whose Summary Object is an array of double values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1127642008


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregator.java:
##########
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.datasketches.Util;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+public class ArrayOfDoublesSketchSqlAggregator implements SqlAggregator
+{
+
+  private static final SqlAggFunction FUNCTION_INSTANCE = new ArrayOfDoublesSqlAggFunction();
+  private static final String NAME = "ARRAY_OF_DOUBLES_SKETCH";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      VirtualColumnRegistry virtualColumnRegistry,
+      RexBuilder rexBuilder,
+      String name,
+      AggregateCall aggregateCall,
+      Project project,
+      List<Aggregation> existingAggregations,
+      boolean finalizeAggregations
+  )
+  {
+
+    final List<Integer> argList = aggregateCall.getArgList();
+
+    // check last argument for nomimalEntries
+    final int nominalEntries;
+    final int metricExpressionEndIndex;
+    final int lastArgIndex = argList.size() - 1;
+    final RexNode potentialNominalEntriesArg = Expressions.fromFieldAccess(
+        rowSignature,
+        project,
+        argList.get(lastArgIndex)
+    );
+
+    if (potentialNominalEntriesArg.isA(SqlKind.LITERAL) &&
+        RexLiteral.value(potentialNominalEntriesArg) instanceof Number) {
+
+      nominalEntries = ((Number) RexLiteral.value(potentialNominalEntriesArg)).intValue();
+      metricExpressionEndIndex = lastArgIndex - 1;
+    } else {
+      nominalEntries = Util.DEFAULT_NOMINAL_ENTRIES;
+      metricExpressionEndIndex = lastArgIndex;
+    }
+
+    final List<String> fieldNames = new ArrayList<>();
+    for (int i = 0; i <= metricExpressionEndIndex; i++) {
+      final String fieldName;
+
+      final RexNode columnRexNode = Expressions.fromFieldAccess(
+          rowSignature,
+          project,
+          argList.get(i)
+      );
+
+      final DruidExpression columnArg = Expressions.toDruidExpression(
+          plannerContext,
+          rowSignature,
+          columnRexNode
+      );
+      if (columnArg == null) {
+        return null;
+      }
+
+      if (columnArg.isDirectColumnAccess() &&
+          rowSignature.getColumnType(columnArg.getDirectColumn())
+                      .map(type -> type.is(ValueType.COMPLEX))
+                      .orElse(false)) {
+        fieldName = columnArg.getDirectColumn();
+      } else {
+        final RelDataType dataType = columnRexNode.getType();
+        final ColumnType inputType = Calcites.getColumnTypeForRelDataType(dataType);
+        if (inputType == null) {
+          throw new ISE(
+              "Cannot translate sqlTypeName[%s] to Druid type for field[%s]",
+              dataType.getSqlTypeName(),
+              name
+          );
+        }
+
+        final DimensionSpec dimensionSpec;
+
+        if (columnArg.isDirectColumnAccess()) {
+          dimensionSpec = columnArg.getSimpleExtraction().toDimensionSpec(null, inputType);
+        } else {
+          String virtualColumnName = virtualColumnRegistry.getOrCreateVirtualColumnForExpression(
+              columnArg,
+              dataType
+          );
+          dimensionSpec = new DefaultDimensionSpec(virtualColumnName, null, inputType);
+        }
+        fieldName = dimensionSpec.getDimension();
+      }
+
+      fieldNames.add(fieldName);
+    }
+
+    final AggregatorFactory aggregatorFactory;
+    final List<String> metricColumns = fieldNames.size() > 1 ? fieldNames.subList(1, fieldNames.size()) : null;
+    aggregatorFactory = new ArrayOfDoublesSketchAggregatorFactory(
+          name,
+          fieldNames.get(0), // first field is dimension
+          nominalEntries,
+          metricColumns,
+          null
+      );
+
+    return Aggregation.create(
+        Collections.singletonList(aggregatorFactory),
+        null);
+  }
+
+  private static class ArrayOfDoublesSqlAggFunction extends SqlAggFunction
+  {
+    private static final String SIGNATURE = "'" + NAME + "(expr, nominalEntries)'\n";
+
+    ArrayOfDoublesSqlAggFunction()
+    {
+      super(
+          NAME,
+          null,
+          SqlKind.OTHER_FUNCTION,
+          ArrayOfDoublesSketchSqlOperators.RETURN_TYPE_INFERENCE,
+          InferTypes.VARCHAR_1024,
+          OperandTypes.VARIADIC,
+          SqlFunctionCategory.USER_DEFINED_FUNCTION,
+          false,
+          false
+      );

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [SqlAggFunction.SqlAggFunction](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4380)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1128569394


##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,17 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.
+
+|Function|Notes|Default|
+|--------|-----|-------|
+|`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the values of `expr` which is a column containing Tuple sketches. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.
+|`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr..., [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the dimension value of `dimensionColumnExpr` and the metric values contained in the list of `metricColumnExpr` columns. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.

Review Comment:
   ```suggestion
   |`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the dimension value of `dimensionColumnExpr` and the metric values contained in one or more `metricColumnExpr` columns. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md).
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142530993


##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregatorTest.java:
##########
@@ -0,0 +1,452 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.inject.Injector;
+import org.apache.druid.data.input.InputRow;
+import org.apache.druid.guice.DruidInjectorBuilder;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.QueryRunnerFactoryConglomerate;
+import org.apache.druid.query.aggregation.CountAggregatorFactory;
+import org.apache.druid.query.aggregation.LongSumAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchModule;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchOperations;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchToMetricsSumEstimatePostAggregator;
+import org.apache.druid.query.aggregation.post.ExpressionPostAggregator;
+import org.apache.druid.query.aggregation.post.FieldAccessPostAggregator;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.groupby.GroupByQuery;
+import org.apache.druid.segment.IndexBuilder;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.incremental.IncrementalIndexSchema;
+import org.apache.druid.segment.join.JoinableFactoryWrapper;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+import org.apache.druid.segment.writeout.OffHeapMemorySegmentWriteOutMediumFactory;
+import org.apache.druid.sql.calcite.BaseCalciteQueryTest;
+import org.apache.druid.sql.calcite.filtration.Filtration;
+import org.apache.druid.sql.calcite.util.CalciteTests;
+import org.apache.druid.sql.calcite.util.SpecificSegmentsQuerySegmentWalker;
+import org.apache.druid.sql.calcite.util.TestDataBuilder;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class ArrayOfDoublesSketchSqlAggregatorTest extends BaseCalciteQueryTest
+{
+
+  private static final String DATA_SOURCE = "foo";
+
+  // built from ArrayOfDoublesUpdatableSketch.update("FEDCAB", new double[] {0.0}).compact()
+  private static final String COMPACT_BASE_64_ENCODED_SKETCH_FOR_INTERSECTION = "AQEJAwgBzJP/////////fwEAAAAAAAAAjFnadZuMrkgAAAAAAAAAAA==";
+
+  private static final List<InputRow> ROWS = ImmutableList.of(
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 5)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 12)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "CA")
+                  .put("dim2", "FEDCAB")
+                  .put("m1", 3)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-02")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 8)
+                  .build(),
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-03")
+                  .put("dim1", "US")
+                  .put("dim2", "ABCDEF")
+                  .put("m1", 2)
+                  .build()
+  ).stream().map(TestDataBuilder::createRow).collect(Collectors.toList());
+
+  @Override
+  public void configureGuice(DruidInjectorBuilder builder)
+  {
+    super.configureGuice(builder);
+    builder.addModule(new ArrayOfDoublesSketchModule());
+  }
+
+  @Override
+  public SpecificSegmentsQuerySegmentWalker createQuerySegmentWalker(
+      final QueryRunnerFactoryConglomerate conglomerate,
+      final JoinableFactoryWrapper joinableFactory,
+      final Injector injector
+  ) throws IOException
+  {
+    ArrayOfDoublesSketchModule.registerSerde();
+
+    final QueryableIndex index = IndexBuilder.create()
+                                             .tmpDir(temporaryFolder.newFolder())
+                                             .segmentWriteOutMediumFactory(
+                                                 OffHeapMemorySegmentWriteOutMediumFactory.instance()
+                                             )
+                                             .schema(
+                                                 new IncrementalIndexSchema.Builder()
+                                                     .withMetrics(
+                                                         new CountAggregatorFactory("cnt"),
+                                                         new ArrayOfDoublesSketchAggregatorFactory(
+                                                             "tuplesketch_dim2",
+                                                             "dim2",
+                                                             null,
+                                                             ImmutableList.of("m1"),
+                                                             1
+                                                         ),
+                                                         new LongSumAggregatorFactory("m1", "m1")
+                                                     )
+                                                     .withRollup(false)
+                                                     .build()
+                                             )
+                                             .rows(ROWS)
+                                             .buildMMappedIndex();
+
+    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+        DataSegment.builder()
+                   .dataSource(DATA_SOURCE)
+                   .interval(index.getDataInterval())
+                   .version("1")
+                   .shardSpec(new LinearShardSpec(0))
+                   .size(0)
+                   .build(),
+        index
+    );
+  }
+
+  @Test
+  public void testMetricsSumEstimate()
+  {
+    cannotVectorize();
+
+    final String sql = "SELECT\n"
+                   + "  dim1,\n"
+                   + "  SUM(cnt),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(tuplesketch_dim2)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1)),\n"
+                   + "  ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(ARRAY_OF_DOUBLES_SKETCH(dim2, m1, 256))\n"

Review Comment:
   I agree they may be a bit cumbersome to type, but the right tradeoff between explicit-ness and terseness can be difficult to find.
   N.B. The same could be said of some of the existing Theta Sketch SQL functions. e.g.
   - `theta_sketch_estimate_with_error_bounds`
   - `APPROX_COUNT_DISTINCT_DS_THETA`
   https://github.com/apache/druid/blob/master/extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaSketchSqlAggregatorTest.java



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1134720187


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchMergeAggregator.java:
##########
@@ -58,12 +58,20 @@ public ArrayOfDoublesSketchMergeAggregator(
   @Override
   public void aggregate()
   {
-    final ArrayOfDoublesSketch update = selector.getObject();
+    final Object update = selector.getObject();
     if (update == null) {
       return;
     }
+    final ArrayOfDoublesSketch sketch;
+    if (update instanceof ArrayOfDoublesSketch) {

Review Comment:
   Ah, I didn't know that `COMPLEX_DECODE_BASE64` function even existed! Is it documented? (I only seem to find references to it in some unit tests)
   
   In any case, the changes to `ArrayOfDoublesSketchMergeAggregator` are no longer required after I updated the unit test to use `COMPLEX_DECODE_BASE64`:
     https://github.com/apache/druid/pull/13887/commits/2a4b4715ed8e8bf5f8b21617412437551435d95c
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1134731239


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.OperatorConversions;
+import org.apache.druid.sql.calcite.expression.PostAggregatorVisitor;
+import org.apache.druid.sql.calcite.expression.SqlOperatorConversion;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class ArrayOfDoublesSketchSetBaseOperatorConversion implements SqlOperatorConversion
+{
+  public ArrayOfDoublesSketchSetBaseOperatorConversion()
+  {
+  }
+
+  @Override
+  public SqlOperator calciteOperator()
+  {
+    return makeSqlFunction();
+  }
+
+  @Nullable
+  @Override
+  public DruidExpression toDruidExpression(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode
+  )
+  {
+    plannerContext.setPlanningError("%s can only be used on aggregates. " +
+        "It cannot be used directly on a column or on a scalar expression.", getFunctionName());
+    return null;
+  }
+
+  @Nullable
+  @Override
+  public PostAggregator toPostAggregator(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode,
+      PostAggregatorVisitor postAggregatorVisitor
+  )
+  {
+    final List<RexNode> operands = ((RexCall) rexNode).getOperands();
+    final List<PostAggregator> inputPostAggs = new ArrayList<>();
+    Integer nominalEntries = null;
+    Integer numberOfvalues = null;
+
+    for (int i = 0; i < operands.size(); i++) {
+      RexNode operand = operands.get(i);
+      if (i == 0 && operand.isA(SqlKind.LITERAL) && SqlTypeFamily.INTEGER.contains(operand.getType())) {
+        nominalEntries = RexLiteral.intValue(operand);

Review Comment:
   I had tried composing it with `or` and `sequence` but it didn't seem possible to support the required function signatures using them. (IIRC, `sequence` and `variadic` were not composable because `sequence` only can apply to `SqlSingleOperandTypeChecker` instances)
   
   I will take a look at `DoublesSketchListArgOperandTypeChecker` as an example. However, note that even the existing Theta sketch SQL function doesn't seem to enforce it more strictly: https://github.com/apache/druid/blob/4493275d88d8a73ed1766ee07954d95c55dda862/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaSketchSetBaseOperatorConversion.java#L120



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1134660897


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchMergeAggregator.java:
##########
@@ -58,12 +58,20 @@ public ArrayOfDoublesSketchMergeAggregator(
   @Override
   public void aggregate()
   {
-    final ArrayOfDoublesSketch update = selector.getObject();
+    final Object update = selector.getObject();
     if (update == null) {
       return;
     }
+    final ArrayOfDoublesSketch sketch;
+    if (update instanceof ArrayOfDoublesSketch) {

Review Comment:
   I needed to make that code change in order to handle constructing the sketch from a base64-encoded String.
   e.g. https://github.com/frankgrimes97/druid/blob/185b6f6006ecc1fb8a178f6949eb0a3f7f15ebe0/extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregatorTest.java#L253-L256
   Is there a cleaner way to accomplish that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1134695627


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.OperatorConversions;
+import org.apache.druid.sql.calcite.expression.PostAggregatorVisitor;
+import org.apache.druid.sql.calcite.expression.SqlOperatorConversion;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class ArrayOfDoublesSketchSetBaseOperatorConversion implements SqlOperatorConversion
+{
+  public ArrayOfDoublesSketchSetBaseOperatorConversion()
+  {
+  }
+
+  @Override
+  public SqlOperator calciteOperator()
+  {
+    return makeSqlFunction();
+  }
+
+  @Nullable
+  @Override
+  public DruidExpression toDruidExpression(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode
+  )
+  {
+    plannerContext.setPlanningError("%s can only be used on aggregates. " +
+        "It cannot be used directly on a column or on a scalar expression.", getFunctionName());
+    return null;
+  }
+
+  @Nullable
+  @Override
+  public PostAggregator toPostAggregator(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode,
+      PostAggregatorVisitor postAggregatorVisitor
+  )
+  {
+    final List<RexNode> operands = ((RexCall) rexNode).getOperands();
+    final List<PostAggregator> inputPostAggs = new ArrayList<>();
+    Integer nominalEntries = null;
+    Integer numberOfvalues = null;
+
+    for (int i = 0; i < operands.size(); i++) {
+      RexNode operand = operands.get(i);
+      if (i == 0 && operand.isA(SqlKind.LITERAL) && SqlTypeFamily.INTEGER.contains(operand.getType())) {
+        nominalEntries = RexLiteral.intValue(operand);

Review Comment:
   Currently the OperandTypeChecker for this class is `OperandTypes.variadic(SqlOperandCountRanges.from(2))`. It would accept any call with at least two arguments. If the stuff in `OperandTypes` isn't enough to 
   
   There's a couple of ways to do an OperandTypeChecker that has better checking:
   
   1. You can combine various `OperandTypes` using `or`, `and`, `sequence`, etc.
   2. You can make your own implementation of `SqlOperandTypeChecker`. For example, check out the `DoublesSketchListArgOperandTypeChecker` we have for `DS_GET_QUANTILES`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1140641414


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregator.java:
##########
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.util.Optionality;
+import org.apache.datasketches.Util;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+public class ArrayOfDoublesSketchSqlAggregator implements SqlAggregator
+{
+
+  private static final SqlAggFunction FUNCTION_INSTANCE = new ArrayOfDoublesSqlAggFunction();
+  private static final String NAME = "ARRAY_OF_DOUBLES_SKETCH";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      VirtualColumnRegistry virtualColumnRegistry,
+      RexBuilder rexBuilder,
+      String name,
+      AggregateCall aggregateCall,
+      Project project,
+      List<Aggregation> existingAggregations,
+      boolean finalizeAggregations
+  )
+  {
+
+    final List<Integer> argList = aggregateCall.getArgList();
+
+    // check last argument for nomimalEntries
+    final int nominalEntries;
+    final int metricExpressionEndIndex;
+    final int lastArgIndex = argList.size() - 1;
+    final RexNode potentialNominalEntriesArg = Expressions.fromFieldAccess(
+        rowSignature,
+        project,
+        argList.get(lastArgIndex)
+    );
+
+    if (potentialNominalEntriesArg.isA(SqlKind.LITERAL) &&
+        RexLiteral.value(potentialNominalEntriesArg) instanceof Number) {
+
+      nominalEntries = ((Number) RexLiteral.value(potentialNominalEntriesArg)).intValue();
+      metricExpressionEndIndex = lastArgIndex - 1;
+    } else {
+      nominalEntries = Util.DEFAULT_NOMINAL_ENTRIES;
+      metricExpressionEndIndex = lastArgIndex;
+    }
+
+    final List<String> fieldNames = new ArrayList<>();
+    for (int i = 0; i <= metricExpressionEndIndex; i++) {
+      final String fieldName;
+
+      final RexNode columnRexNode = Expressions.fromFieldAccess(
+          rowSignature,
+          project,
+          argList.get(i)
+      );

Review Comment:
   Ok, thanks! I'll merge from upstream master so that I can address the CI-reported deprecation warnings. I assume the CI checks which are complaining about this run on the merged branch.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142522184


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT
+
+`ARRAY_OF_DOUBLES_SKETCH_NOT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_UNION
+
+`ARRAY_OF_DOUBLES_SKETCH_UNION(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   Would the following SQL function names be preferable?
   - `ARRAY_OF_DOUBLES_SKETCH  -> AODS`
   - `ARRAY_OF_DOUBLES_SKETCH_INTERSECT  -> AODS_INTERSECT`
   - `ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE -> AODS_METRIC_SUM_ESTIMATE`
   - `ARRAY_OF_DOUBLES_SKETCH_NOT -> AODS_NOT`
   - `ARRAY_OF_DOUBLES_SKETCH_UNION  -> AODS_UNION`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142476374


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT

Review Comment:
   The `NOT` is done against a hash value (the keys in this case, similar to Theta Sketch) and will just preserve both the key and its associated Summary Object (in this case, an array of double values)
   Both `UNION` and `INTERSECT` add the values in the associated Summary Objects (e.g. `a[0] + b[0]` if the array has `length == 1`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on pull request #13887: Tuple sketch SQL support

Posted by "abhishekagarwal87 (via GitHub)" <gi...@apache.org>.
abhishekagarwal87 commented on PR #13887:
URL: https://github.com/apache/druid/pull/13887#issuecomment-1486870937

   Indeed. I am merging this PR since failures are indeed unrelated. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on PR #13887:
URL: https://github.com/apache/druid/pull/13887#issuecomment-1486824613

   > LGTM after CI passes.
   @gianm The failing checks seem unrelated to my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142883502


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,42 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_INTERSECT
+
+`ARRAY_OF_DOUBLES_SKETCH_INTERSECT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_NOT
+
+`ARRAY_OF_DOUBLES_SKETCH_NOT(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_UNION
+
+`ARRAY_OF_DOUBLES_SKETCH_UNION(expr, ..., [nominalEntries])`
+
+**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   As to naming— maybe `TUPLE_DOUBLES_SKETCH`, `TUPLE_DOUBLES_INTERSECT`? IMO the `ARRAY_OF` doesn't add much, so we can skip it, and `TUPLE` _does_ add something: the docs call them "tuple sketches" extensively. I'm OK with the originals too. `AODS`, `AODS_INTERSECT` to my eye seem _too_ terse.
   
   As to the bigger potential changes— IMO, the decomposed versions proposed in this patch are a good place to start. Tuple sketches, like theta sketches, are likely to be combined in different ways by different users. They can be intersected, not'ed, and unioned in various ways, and then converted to estimates once the set ops are done. Since there isn't a canonical thing that we can assume people will want to do, it seems like the composable functions are necessary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1142455315


##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,16 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.

Review Comment:
   I believe all the sketches in the `druid-datasketches` module are currently described in more detail at the linked page: https://github.com/apache/druid/blob/master/docs/development/extensions-core/datasketches-extension.md
   
   That page also already links to the [Apache DataSketches](https://datasketches.apache.org/) project page which has its own extensive documentation. Is that not sufficient?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] frankgrimes97 commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "frankgrimes97 (via GitHub)" <gi...@apache.org>.
frankgrimes97 commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1144757898


##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,16 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.
+
+|Function|Notes|Default|
+|--------|-----|-------|
+|`DS_TUPLE_DOUBLES(expr, [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the values of `expr` which is a column containing Tuple sketches which contain an array of double values as their Summary Objects. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.

Review Comment:
   I felt adding that information to the descriptions would help clarify things for those familiar with Tuple Data Sketches.
   
   FYI, Summary Objects are explained in the Data Sketches documentation: https://datasketches.apache.org/docs/Tuple/TupleOverview.html
   
   ArrayOfDoublesSketch is a specialized Tuple Sketch whose Summary Object is an array of double values.
   Not every Tuple Sketch is a ArrayOfDoublesSketch... different specialized implementations may be added to the Apache Druid codebase in the future.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1127088445


##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,22 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr..., [nominalEntries])`
+
+Creates a Tuple sketch

Review Comment:
   ```suggestion
   Creates a Tuple sketch.
   ```



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,22 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr..., [nominalEntries])`
+
+Creates a Tuple sketch
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE
+
+`ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(expr)`
+
+Computes approximate sums of the values contained within a Tuple sketch

Review Comment:
   ```suggestion
   Computes approximate sums of the values contained within a Tuple sketch.
   
   **Function type:** [Aggregation](sql-aggregations.md)
   ```



##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,17 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.
+
+|Function|Notes|Default|
+|--------|-----|-------|
+|`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the values of `expr` which is a column containing Tuple sketches. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.
+|`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr..., [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the dimension value of `dimensionColumnExpr` and the metric values contained in the list of `metricColumnExpr` columns. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.

Review Comment:
   Something that isn't clear to me based on this notation is how to provide multiple metrics columns. For example, if you use this format,
   ```
   ARRAY_OF_DOUBLES_SKETCH(dimension1, metric1, metric2, metric3)
   ```
   
   How do you know if `metric3` is a metric column, or if it is the optional parameter for nominal entries?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13887: Tuple sketch SQL support

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1134697125


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchMergeAggregator.java:
##########
@@ -58,12 +58,20 @@ public ArrayOfDoublesSketchMergeAggregator(
   @Override
   public void aggregate()
   {
-    final ArrayOfDoublesSketch update = selector.getObject();
+    final Object update = selector.getObject();
     if (update == null) {
       return;
     }
+    final ArrayOfDoublesSketch sketch;
+    if (update instanceof ArrayOfDoublesSketch) {

Review Comment:
   Ah, in this case we have a special function: `complex_decode_base64('type name', 'base64 bytes')`. I think the type name for this case is `arrayOfDoublesSketch`.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchMergeAggregator.java:
##########
@@ -58,12 +58,20 @@ public ArrayOfDoublesSketchMergeAggregator(
   @Override
   public void aggregate()
   {
-    final ArrayOfDoublesSketch update = selector.getObject();
+    final Object update = selector.getObject();
     if (update == null) {
       return;
     }
+    final ArrayOfDoublesSketch sketch;
+    if (update instanceof ArrayOfDoublesSketch) {

Review Comment:
   Ah, in this case we have a special SQL function: `complex_decode_base64('type name', 'base64 bytes')`. I think the type name for this case is `arrayOfDoublesSketch`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org