You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "gianm (via GitHub)" <gi...@apache.org> on 2023/03/13 21:00:33 UTC

[GitHub] [druid] gianm commented on a diff in pull request #13887: Tuple sketch SQL support

gianm commented on code in PR #13887:
URL: https://github.com/apache/druid/pull/13887#discussion_r1134597082


##########
docs/querying/sql-aggregations.md:
##########
@@ -139,6 +139,17 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex
 |`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of `expr`, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation.<br/><br/>See the [known issue](sql-translation.md#approximations) with this function.|`'0'` (STRING)|
 
 
+### Tuple sketch functions
+
+Load the [DataSketches extension](../development/extensions-core/datasketches-extension.md) to use the following functions.
+
+|Function|Notes|Default|
+|--------|-----|-------|
+|`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the values of `expr` which is a column containing Tuple sketches. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation.
+|`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the dimension value of `dimensionColumnExpr` and the metric values contained in one or more `metricColumnExpr` columns. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md).
+|`ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE(expr)`|Computes approximate sums of the values contained within a [Tuple sketch](../development/extensions-core/datasketches-tuple.md#estimated-metrics-values-for-each-column-of-arrayofdoublessketch) column.

Review Comment:
   Since `ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE` is a regular function (as opposed to an aggregation function) it should go in `querying/sql-scalar.md`.



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,24 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH
+
+`ARRAY_OF_DOUBLES_SKETCH(expr, [nominalEntries])`
+
+`ARRAY_OF_DOUBLES_SKETCH(dimensionColumnExpr, metricColumnExpr..., [nominalEntries])`
+
+**Function type:** [Aggregation](sql-aggregations.md)
+
+Creates a Tuple sketch.
+
+## ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE

Review Comment:
   This one needs to go in `.spelling` too.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchMergeAggregator.java:
##########
@@ -58,12 +58,20 @@ public ArrayOfDoublesSketchMergeAggregator(
   @Override
   public void aggregate()
   {
-    final ArrayOfDoublesSketch update = selector.getObject();
+    final Object update = selector.getObject();
     if (update == null) {
       return;
     }
+    final ArrayOfDoublesSketch sketch;
+    if (update instanceof ArrayOfDoublesSketch) {

Review Comment:
   What was this change needed for? (Asking since I'm wondering if we're missing a call to `factory.deserialize` somewhere; it's doing something similar.)



##########
docs/querying/sql-functions.md:
##########
@@ -136,6 +136,24 @@ Returns an array of all values of the specified expression.
 
 Concatenates array inputs into a single array.
 
+## ARRAY_OF_DOUBLES_SKETCH

Review Comment:
   This needs to go in the `.spelling` file. Search for `ARRAY_CONCAT_AGG`, it would be right after that one.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.OperatorConversions;
+import org.apache.druid.sql.calcite.expression.PostAggregatorVisitor;
+import org.apache.druid.sql.calcite.expression.SqlOperatorConversion;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class ArrayOfDoublesSketchSetBaseOperatorConversion implements SqlOperatorConversion

Review Comment:
   Looks like this family of functions isn't documented in this patch; is there a reason for that? If so, it'd be good to write it down in a class-level javadoc, so other developers are aware.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.tuple.sql;
+
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchSetOpPostAggregator;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.OperatorConversions;
+import org.apache.druid.sql.calcite.expression.PostAggregatorVisitor;
+import org.apache.druid.sql.calcite.expression.SqlOperatorConversion;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class ArrayOfDoublesSketchSetBaseOperatorConversion implements SqlOperatorConversion
+{
+  public ArrayOfDoublesSketchSetBaseOperatorConversion()
+  {
+  }
+
+  @Override
+  public SqlOperator calciteOperator()
+  {
+    return makeSqlFunction();
+  }
+
+  @Nullable
+  @Override
+  public DruidExpression toDruidExpression(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode
+  )
+  {
+    plannerContext.setPlanningError("%s can only be used on aggregates. " +
+        "It cannot be used directly on a column or on a scalar expression.", getFunctionName());
+    return null;
+  }
+
+  @Nullable
+  @Override
+  public PostAggregator toPostAggregator(
+      PlannerContext plannerContext,
+      RowSignature rowSignature,
+      RexNode rexNode,
+      PostAggregatorVisitor postAggregatorVisitor
+  )
+  {
+    final List<RexNode> operands = ((RexCall) rexNode).getOperands();
+    final List<PostAggregator> inputPostAggs = new ArrayList<>();
+    Integer nominalEntries = null;
+    Integer numberOfvalues = null;
+
+    for (int i = 0; i < operands.size(); i++) {
+      RexNode operand = operands.get(i);
+      if (i == 0 && operand.isA(SqlKind.LITERAL) && SqlTypeFamily.INTEGER.contains(operand.getType())) {
+        nominalEntries = RexLiteral.intValue(operand);

Review Comment:
   These requirements could be better-plugged into the OperandTypeChecker, and if so, users would get better messages when using the function incorrectly. I'm willing to leave this for future work, though, if you aren't up to doing it in this patch.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org