You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "somu-imply (via GitHub)" <gi...@apache.org> on 2023/05/19 00:28:41 UTC

[GitHub] [druid] somu-imply opened a new pull request, #14312: Hll Sketch estimate can now be used as an expression

somu-imply opened a new pull request, #14312:
URL: https://github.com/apache/druid/pull/14312

   
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #14312: Hll Sketch estimate can now be used as an expression

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #14312:
URL: https://github.com/apache/druid/pull/14312#issuecomment-1557637066

   Addressed most of the review comments.
   1. Separated the classes and using references
   2. Using lower case at all places now
   3. Added getOutputType
   4. Added new test case for rounding
   5. Added COUNT(*) to make test more topN like
   6. Removed unnecessary comment
   7. Updated error message and negative test case for HLL and Theta sketches


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #14312: Hll Sketch and Theta sketch estimate can now be used as an expression

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1215123370


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaPostAggMacros.java:
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.theta.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExpressionType;
+import org.apache.druid.query.aggregation.datasketches.theta.SketchHolder;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+public class ThetaPostAggMacros
+{
+  public static final String THETA_SKETCH_ESTIMATE = "theta_sketch_estimate";
+
+  public static class ThetaSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      validationHelperCheckArgumentCount(args, 1);
+      return new ThetaSketchEstimateExpr(args.get(0));

Review Comment:
   After discussion we decided to leave PostAggs as is and make the expressions similar only



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on a diff in pull request #14312: Hll Sketch and Theta sketch estimate can now be used as an expression

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1208798793


##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllSketchSqlAggregatorTest.java:
##########
@@ -888,4 +910,193 @@ public void testGroupByAggregatorDefaultValuesFinalizeOuterSketches()
         ImmutableList.of(new Object[]{"a", 0L, "0"})
     );
   }
+
+  @Test
+  public void testHllEstimateAsVirtualColumn()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1)"
+        + " FROM druid.foo",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(CalciteTests.DATASOURCE1)
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .virtualColumns(new ExpressionVirtualColumn(
+                    "v0",
+                    "hll_sketch_estimate(\"hllsketch_dim1\")",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ))
+                .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                .columns("v0")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            new Object[]{0.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D}
+        )
+    );
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnWithRound()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim3, FALSE), HLL_SKETCH_ESTIMATE(hllsketch_dim3, TRUE)"
+        + " FROM druid.foo",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(CalciteTests.DATASOURCE1)
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .virtualColumns(new ExpressionVirtualColumn(
+                    "v0",
+                    "hll_sketch_estimate(\"hllsketch_dim3\",0)",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ), new ExpressionVirtualColumn(
+                    "v1",
+                    "hll_sketch_estimate(\"hllsketch_dim3\",1)",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ))
+                .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                .columns("v0", "v1")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            new Object[]{2.000000004967054D, 2.0D},
+            new Object[]{2.000000004967054D, 2.0D},
+            new Object[]{1.0D, 1.0D},
+            new Object[]{0.0D, 0.0D},
+            new Object[]{0.0D, 0.0D},
+            new Object[]{0.0D, 0.0D}
+        )
+    );
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnOnNonHllCol()
+  {
+    try {
+      testQuery(
+          "SELECT"
+          + " HLL_SKETCH_ESTIMATE(dim2)"
+          + " FROM druid.foo",
+          ImmutableList.of(
+              newScanQueryBuilder()
+                  .dataSource(CalciteTests.DATASOURCE1)
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .virtualColumns(new ExpressionVirtualColumn(
+                      "v0",
+                      "hll_sketch_estimate(\"dim2\")",
+                      ColumnType.DOUBLE,
+                      MACRO_TABLE
+                  ))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .columns("v0")
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .build()
+          ),
+          ImmutableList.of()
+      );
+    }
+    catch (IllegalArgumentException e) {
+      Assert.assertTrue(
+          e.getMessage().contains("Input byte[] should at least have 2 bytes for base64 bytes")
+      );
+    }
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnWithGroupByOrderBy()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1), count(*)"
+        + " FROM druid.foo"
+        + " GROUP BY 1"
+        + " ORDER BY 2 DESC",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(new ExpressionVirtualColumn(
+                            "v0",
+                            "hll_sketch_estimate(\"hllsketch_dim1\")",
+                            ColumnType.DOUBLE,
+                            MACRO_TABLE
+                        ))
+                        .setDimensions(
+                            new DefaultDimensionSpec("v0", "d0", ColumnType.DOUBLE))
+                        .setAggregatorSpecs(
+                            aggregators(
+                                new CountAggregatorFactory("a0")
+                            )
+                        )
+                        .setLimitSpec(
+                            DefaultLimitSpec
+                                .builder()
+                                .orderBy(
+                                    new OrderByColumnSpec(
+                                        "a0",
+                                        OrderByColumnSpec.Direction.DESCENDING,
+                                        StringComparators.NUMERIC
+                                    )
+                                )
+                                .build()
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{1.0D, 5L},
+            new Object[]{0.0D, 1L}
+        )
+    );
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnWithTopN()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1), COUNT(*)"
+        + " FROM druid.foo"
+        + " GROUP BY 1 ORDER BY 1"

Review Comment:
   ORDER BY 2?



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaPostAggMacros.java:
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.theta.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExpressionType;
+import org.apache.druid.query.aggregation.datasketches.theta.SketchHolder;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+public class ThetaPostAggMacros
+{
+  public static final String THETA_SKETCH_ESTIMATE = "theta_sketch_estimate";
+
+  public static class ThetaSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      validationHelperCheckArgumentCount(args, 1);
+      return new ThetaSketchEstimateExpr(args.get(0));

Review Comment:
   Probably nice to add the "round" option while we are at it, makes it consistent with the HLL version.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on a diff in pull request #14312: Hll Sketch and Theta sketch estimate can now be used as an expression

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1220442510


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllPostAggExprMacros.java:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.hll.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExpressionType;
+import org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder;
+
+import javax.annotation.Nullable;
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class HllPostAggExprMacros
+{
+  public static final String HLL_SKETCH_ESTIMATE = "hll_sketch_estimate";
+
+  public static class HLLSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      validationHelperCheckAnyOfArgumentCount(args, 1, 2);
+      return new HllSketchEstimateExpr(args);
+    }
+
+    @Override
+    public String name()
+    {
+      return HLL_SKETCH_ESTIMATE;
+    }
+  }
+
+  public static class HllSketchEstimateExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr
+  {
+    private Expr estimateExpr;
+    private Expr isRound;
+
+    public HllSketchEstimateExpr(List<Expr> args)
+    {
+      super(HLL_SKETCH_ESTIMATE, args);
+      this.estimateExpr = args.get(0);
+      if (args.size() == 2) {
+        isRound = args.get(1);
+      }
+    }
+
+    @Nullable
+    @Override
+    public ExpressionType getOutputType(InputBindingInspector inspector)
+    {
+      return ExpressionType.DOUBLE;
+    }
+
+    @Override
+    public ExprEval eval(ObjectBinding bindings)
+    {
+      boolean round = false;
+      ExprEval eval = estimateExpr.eval(bindings);
+      if (isRound != null) {

Review Comment:
   Technically, if someone wanted the determination of rounding to be passed in by a different column, the code as defined here would be correct no?  I guess the question is whether we believe that the argument is a literal and therefore can be extracted once and reused or if we believe that it can be a reference.  
   
   Do we have a good way of identifying that for expressions?  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #14312: Hll Sketch estimate can now be used as an expression

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1198750759


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllSketchEstimateOperatorConversion.java:
##########
@@ -43,7 +43,7 @@
 
 public class HllSketchEstimateOperatorConversion implements SqlOperatorConversion
 {
-  private static final String FUNCTION_NAME = "HLL_SKETCH_ESTIMATE";
+  private static final String FUNCTION_NAME = "hll_sketch_estimate";

Review Comment:
   btw, this can just extend `DirectOperatorConversion` and lowercase the function name (or not if we modify it to not be case sensitive for macros, though we should be thoughtful before making such a change)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #14312: Hll Sketch estimate can now be used as an expression

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1198766208


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllPostAggExprMacros.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.hll.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class HllPostAggExprMacros
+{
+  // make same name as SQL binding
+  public static final String HLL_SKETCH_ESTIMATE = "HLL_SKETCH_ESTIMATE";
+
+  public static class HLLSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      class HllSketchEstimateExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr
+      {
+
+        public HllSketchEstimateExpr(List<Expr> args)
+        {
+          super(HLL_SKETCH_ESTIMATE, args);
+        }
+
+        @Override
+        public ExprEval eval(ObjectBinding bindings)
+        {
+          boolean round = false;
+          Expr arg = args.get(0);
+          ExprEval eval = arg.eval(bindings);
+          if (args.size() == 2) {
+            round = args.get(1).eval(bindings).asBoolean();
+          }

Review Comment:
   i think the issue is not having separate classes to use based on the number of arguments `apply` gets, splitting these up i think should resolve most of these sorts of issues



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #14312: Hll Sketch and Theta sketch estimate can now be used as an expression

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1218989574


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllPostAggExprMacros.java:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.hll.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExpressionType;
+import org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder;
+
+import javax.annotation.Nullable;
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class HllPostAggExprMacros
+{
+  public static final String HLL_SKETCH_ESTIMATE = "hll_sketch_estimate";
+
+  public static class HLLSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      validationHelperCheckAnyOfArgumentCount(args, 1, 2);
+      return new HllSketchEstimateExpr(args);
+    }
+
+    @Override
+    public String name()
+    {
+      return HLL_SKETCH_ESTIMATE;
+    }
+  }
+
+  public static class HllSketchEstimateExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr
+  {
+    private Expr estimateExpr;
+    private Expr isRound;
+
+    public HllSketchEstimateExpr(List<Expr> args)
+    {
+      super(HLL_SKETCH_ESTIMATE, args);
+      this.estimateExpr = args.get(0);
+      if (args.size() == 2) {
+        isRound = args.get(1);
+      }
+    }
+
+    @Nullable
+    @Override
+    public ExpressionType getOutputType(InputBindingInspector inspector)
+    {
+      return ExpressionType.DOUBLE;
+    }
+
+    @Override
+    public ExprEval eval(ObjectBinding bindings)
+    {
+      boolean round = false;
+      ExprEval eval = estimateExpr.eval(bindings);
+      if (isRound != null) {

Review Comment:
   this should be split into two classes, one for estimate and one for rounded estimate. otherwise we have to execute this if statement every row for something that is constant for the entire query



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaPostAggMacros.java:
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.theta.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExpressionType;
+import org.apache.druid.query.aggregation.datasketches.theta.SketchHolder;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+public class ThetaPostAggMacros
+{
+  public static final String THETA_SKETCH_ESTIMATE = "theta_sketch_estimate";
+
+  public static class ThetaSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      validationHelperCheckArgumentCount(args, 1);
+      return new ThetaSketchEstimateExpr(args.get(0));

Review Comment:
   i suppose could fall back to the expression post aggregator and use the rounding expression



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #14312: Hll Sketch estimate can now be used as an expression

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1198744574


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllSketchEstimateOperatorConversion.java:
##########
@@ -43,7 +43,7 @@
 
 public class HllSketchEstimateOperatorConversion implements SqlOperatorConversion
 {
-  private static final String FUNCTION_NAME = "HLL_SKETCH_ESTIMATE";
+  private static final String FUNCTION_NAME = "hll_sketch_estimate";

Review Comment:
   it isn't case sensitive here, [we lowercase them all prior to adding to the map](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/math/expr/ExprMacroTable.java#L56), however we do not lower-case them while looking them up when parsing the expression, https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/math/expr/ExprListenerImpl.java#L327 for macros, though this might technically be a bug because the built-in functions we do attempt to lowercase them first when looking up https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/math/expr/Parser.java#L90



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #14312: Hll Sketch and Theta sketch estimate can now be used as an expression

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1217504030


##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaSketchSqlAggregatorTest.java:
##########
@@ -1073,6 +1081,38 @@ public void testThetaSketchEstimateAsVirtualColumn()
     );
   }
 
+  @Test
+  public void testThetaSketchEstimateAsVirtualColumn2Args()
+  {
+    testQuery(
+        "SELECT"
+        + " THETA_SKETCH_ESTIMATE(thetasketch_dim3, FALSE)"
+        + " FROM foo",

Review Comment:
   Thanks, I am using a bigger dataset for the `true` case will update shortly



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cheddar merged pull request #14312: Hll Sketch and Theta sketch estimate can now be used as an expression

Posted by "cheddar (via GitHub)" <gi...@apache.org>.
cheddar merged PR #14312:
URL: https://github.com/apache/druid/pull/14312


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #14312: Hll Sketch and Theta sketch estimate can now be used as an expression

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #14312:
URL: https://github.com/apache/druid/pull/14312#issuecomment-1577981494

   After discussing with @cheddar I have reverted the change that made the syntax of HLL_SKETCH_ESTIMATE and THETA_SKETCH_ESTIMATE the same. Making that required removing a context parameter `sqlFinalizeOuterSketches` which was introduced for backward compatibility. Also keeping the expression and post agg different and making the post agg null for Theta Sketch Operator conversion, introduced issues with expressions over post aggs with theta_sketch_intersect and theta_sketch_estimate. Those changes are a lot to block this PR and those should be handled separately.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #14312: Hll Sketch and Theta sketch estimate can now be used as an expression

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1212485041


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaPostAggMacros.java:
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.theta.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExpressionType;
+import org.apache.druid.query.aggregation.datasketches.theta.SketchHolder;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+public class ThetaPostAggMacros
+{
+  public static final String THETA_SKETCH_ESTIMATE = "theta_sketch_estimate";
+
+  public static class ThetaSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      validationHelperCheckArgumentCount(args, 1);
+      return new ThetaSketchEstimateExpr(args.get(0));

Review Comment:
   There are 2 options for this:
   1. We only add the round to the expression and do not add it to the aggregator factory. Whatever result is returned by the aggregator factory is rounded (or not) and shown in the output. This makes the expressions consistent between Theta and HLL but not the post aggs. 
   2. We push the round as a parameter to the SketchMergeAggregatorFactory, compute it in the `finalizeComputation` method, and return the value as is. This makes the expression and the post aggs both consistent between Theta and HLL version.
   
   1 is the easier way and does not affect many test cases, but 2, I believe is the better way. @clintropolis , @imply-cheddar what do you suggest?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #14312: Hll Sketch and Theta sketch estimate can now be used as an expression

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1212485041


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaPostAggMacros.java:
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.theta.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExpressionType;
+import org.apache.druid.query.aggregation.datasketches.theta.SketchHolder;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+public class ThetaPostAggMacros
+{
+  public static final String THETA_SKETCH_ESTIMATE = "theta_sketch_estimate";
+
+  public static class ThetaSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      validationHelperCheckArgumentCount(args, 1);
+      return new ThetaSketchEstimateExpr(args.get(0));

Review Comment:
   There are 2 options for this:
   1. We only add the round to the expression and do not add it to the aggregator factory. Whatever result is returned by the aggregator factory is rounded (or not) and shown in the output.
   2. We push the round as a parameter to the SketchMergeAggregatorFactory, compute it in the `finalizeComputation` method, and return the value as is.
   
   1 is the easier way and does not affect many test cases, but 2, I believe is the better way. @imply-cheddar what do you suggest?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on a diff in pull request #14312: Hll Sketch and Theta sketch estimate can now be used as an expression

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1217489296


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaPostAggMacros.java:
##########
@@ -38,8 +39,9 @@ public static class ThetaSketchEstimateExprMacro implements ExprMacroTable.ExprM
     @Override
     public Expr apply(List<Expr> args)
     {
-      validationHelperCheckArgumentCount(args, 1);
-      return new ThetaSketchEstimateExpr(args.get(0));
+      validationHelperCheckAnyOfArgumentCount(args, 1, 2);
+      //validationHelperCheckArgumentCount(args, 1);

Review Comment:
   commented out code?  Is this needed?



##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaSketchSqlAggregatorTest.java:
##########
@@ -1073,6 +1081,38 @@ public void testThetaSketchEstimateAsVirtualColumn()
     );
   }
 
+  @Test
+  public void testThetaSketchEstimateAsVirtualColumn2Args()
+  {
+    testQuery(
+        "SELECT"
+        + " THETA_SKETCH_ESTIMATE(thetasketch_dim3, FALSE)"
+        + " FROM foo",

Review Comment:
   I don't expect this table to actually have enough items to generate numbers that would need rounding.  You should be able to access the wikipedia data set in the tests, which you should be able to get to estimate things.
   
   Note that the smallest logk for Theta sketch allowed is 4, which is 16 entries.  So, you would have to do a theta sketch aggregation with a size of 16 and then see more than 16 distinct values in order to get it to estimate (and generate a true floating point number)



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaPostAggMacros.java:
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.theta.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExpressionType;
+import org.apache.druid.query.aggregation.datasketches.theta.SketchHolder;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+public class ThetaPostAggMacros
+{
+  public static final String THETA_SKETCH_ESTIMATE = "theta_sketch_estimate";
+
+  public static class ThetaSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      validationHelperCheckArgumentCount(args, 1);
+      return new ThetaSketchEstimateExpr(args.get(0));

Review Comment:
   Hrm, I'm okay with only implementing round in the expression, but in that case, because the expression and the PostAggregator are not consistent, you will need to stop planning the post aggregator from SQL.  I'm like 90% certain that the SQL planner is fine using an expression anywhere a post agg could do, it just "prefers" the PostAgg in certain circumstances.  So, it should be acceptable to change the SqlOperatorConversion to always return `null` from the call to build the PostAggregator and things should continue to just work.
   
   Note, that the PostAggregator cannot be removed for compatibility reasons as it could be used from the native query APIs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #14312: Hll Sketch estimate can now be used as an expression

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1198737541


##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllSketchSqlAggregatorTest.java:
##########
@@ -888,4 +902,187 @@ public void testGroupByAggregatorDefaultValuesFinalizeOuterSketches()
         ImmutableList.of(new Object[]{"a", 0L, "0"})
     );
   }
+
+  @Test
+  public void testHllEstimateAsVirtualColumn()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1)"
+        + " FROM druid.foo",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(CalciteTests.DATASOURCE1)
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .virtualColumns(new ExpressionVirtualColumn(
+                    "v0",
+                    "hll_sketch_estimate(\"hllsketch_dim1\")",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ))
+                .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                .columns("v0")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            new Object[]{0.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D}
+        )
+    );
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnWithRound()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1, TRUE)"
+        + " FROM druid.foo",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(CalciteTests.DATASOURCE1)
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .virtualColumns(new ExpressionVirtualColumn(
+                    "v0",
+                    "hll_sketch_estimate(\"hllsketch_dim1\",1)",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ))
+                .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                .columns("v0")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            new Object[]{0.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D}
+        )
+    );
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnOnNonHllCol()
+  {
+    try {
+      testQuery(
+          "SELECT"
+          + " HLL_SKETCH_ESTIMATE(dim2)"
+          + " FROM druid.foo",
+          ImmutableList.of(
+              newScanQueryBuilder()
+                  .dataSource(CalciteTests.DATASOURCE1)
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .virtualColumns(new ExpressionVirtualColumn(
+                      "v0",
+                      "hll_sketch_estimate(\"dim2\")",
+                      ColumnType.DOUBLE,
+                      MACRO_TABLE
+                  ))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .columns("v0")
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .build()
+          ),
+          ImmutableList.of()
+      );
+    }
+    catch (ExpressionValidationException e) {
+      Assert.assertTrue(
+          e.getMessage().contains("Function[HLL_SKETCH_ESTIMATE] requires a HllSketch as the argument")
+      );
+    }
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnWithGroupByOrderBy()
+  {
+    skipVectorize();
+    cannotVectorize();

Review Comment:
   you have to [implement](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/math/expr/Expr.java#L183) a [vector processor](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/math/expr/vector/ExprVectorProcessor.java#L29) for expression to be vectorizable
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #14312: Hll Sketch estimate can now be used as an expression

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1198763940


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllPostAggExprMacros.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.hll.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class HllPostAggExprMacros
+{
+  // make same name as SQL binding
+  public static final String HLL_SKETCH_ESTIMATE = "HLL_SKETCH_ESTIMATE";
+
+  public static class HLLSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      class HllSketchEstimateExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr
+      {
+
+        public HllSketchEstimateExpr(List<Expr> args)
+        {
+          super(HLL_SKETCH_ESTIMATE, args);
+        }
+
+        @Override
+        public ExprEval eval(ObjectBinding bindings)
+        {
+          boolean round = false;
+          Expr arg = args.get(0);
+          ExprEval eval = arg.eval(bindings);
+          if (args.size() == 2) {
+            round = args.get(1).eval(bindings).asBoolean();
+          }
+
+          final Object valObj = eval.value();
+          if (valObj == null) {
+            return ExprEval.of(null);
+          }
+          if (eval.type().is(ExprType.COMPLEX) && valObj instanceof HllSketchHolder) {
+            HllSketchHolder h = HllSketchHolder.fromObj(valObj);
+            double estimate = h.getEstimate();
+            return round ? ExprEval.of(Math.round(estimate)) : ExprEval.of(estimate);
+          } else {
+            throw HLLSketchEstimateExprMacro.this.validationFailed("requires a HllSketch as the argument");
+          }
+        }
+

Review Comment:
   ideally should implement `getOutputType` so this can behave better with all the type inference stuff



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on a diff in pull request #14312: Hll Sketch estimate can now be used as an expression

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1198678103


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllPostAggExprMacros.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.hll.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class HllPostAggExprMacros
+{
+  // make same name as SQL binding

Review Comment:
   I think this comment was something you did for yourself?  Or is this intended for future readers of the code?



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllSketchEstimateOperatorConversion.java:
##########
@@ -43,7 +43,7 @@
 
 public class HllSketchEstimateOperatorConversion implements SqlOperatorConversion
 {
-  private static final String FUNCTION_NAME = "HLL_SKETCH_ESTIMATE";
+  private static final String FUNCTION_NAME = "hll_sketch_estimate";

Review Comment:
   You made this lower case, but the ExprMacros still has it upper case, I'm actually not sure if the case matters, but why the inconsistency?  Did you explicitly do this because something was broken?



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllPostAggExprMacros.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.hll.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class HllPostAggExprMacros
+{
+  // make same name as SQL binding
+  public static final String HLL_SKETCH_ESTIMATE = "HLL_SKETCH_ESTIMATE";
+
+  public static class HLLSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      class HllSketchEstimateExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr
+      {
+
+        public HllSketchEstimateExpr(List<Expr> args)
+        {
+          super(HLL_SKETCH_ESTIMATE, args);
+        }
+
+        @Override
+        public ExprEval eval(ObjectBinding bindings)
+        {
+          boolean round = false;
+          Expr arg = args.get(0);
+          ExprEval eval = arg.eval(bindings);
+          if (args.size() == 2) {
+            round = args.get(1).eval(bindings).asBoolean();
+          }
+
+          final Object valObj = eval.value();
+          if (valObj == null) {
+            return ExprEval.of(null);
+          }
+          if (eval.type().is(ExprType.COMPLEX) && valObj instanceof HllSketchHolder) {
+            HllSketchHolder h = HllSketchHolder.fromObj(valObj);

Review Comment:
   We already did an `instanceof` check for `HllSketchHolder` can't we just cast it?  Or, you could alternatively just pass in the `eval.value()` (assuming it's non-null) and just assume it's a type that this can deal with.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllPostAggExprMacros.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.hll.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class HllPostAggExprMacros
+{
+  // make same name as SQL binding
+  public static final String HLL_SKETCH_ESTIMATE = "HLL_SKETCH_ESTIMATE";
+
+  public static class HLLSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      class HllSketchEstimateExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr
+      {
+
+        public HllSketchEstimateExpr(List<Expr> args)
+        {
+          super(HLL_SKETCH_ESTIMATE, args);
+        }
+
+        @Override
+        public ExprEval eval(ObjectBinding bindings)
+        {
+          boolean round = false;
+          Expr arg = args.get(0);
+          ExprEval eval = arg.eval(bindings);
+          if (args.size() == 2) {
+            round = args.get(1).eval(bindings).asBoolean();
+          }
+
+          final Object valObj = eval.value();
+          if (valObj == null) {
+            return ExprEval.of(null);
+          }
+          if (eval.type().is(ExprType.COMPLEX) && valObj instanceof HllSketchHolder) {
+            HllSketchHolder h = HllSketchHolder.fromObj(valObj);
+            double estimate = h.getEstimate();
+            return round ? ExprEval.of(Math.round(estimate)) : ExprEval.of(estimate);
+          } else {
+            throw HLLSketchEstimateExprMacro.this.validationFailed("requires a HllSketch as the argument");

Review Comment:
   `validationFailed` is not correct.  This is running in `eval()` which is actually operating on top of the data.  If you are trying to do a validation, that should in the `ExprMacro`'s `apply()` method.  Once we are here, we better be dealing with an object type that we can actually do something with.  I think the best implementation here is to just check if the `valObj` was null and if it is not, pass it into `HllSketchHolder.fromObj()` and just hope for the best.



##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllSketchSqlAggregatorTest.java:
##########
@@ -888,4 +902,187 @@ public void testGroupByAggregatorDefaultValuesFinalizeOuterSketches()
         ImmutableList.of(new Object[]{"a", 0L, "0"})
     );
   }
+
+  @Test
+  public void testHllEstimateAsVirtualColumn()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1)"
+        + " FROM druid.foo",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(CalciteTests.DATASOURCE1)
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .virtualColumns(new ExpressionVirtualColumn(
+                    "v0",
+                    "hll_sketch_estimate(\"hllsketch_dim1\")",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ))
+                .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                .columns("v0")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            new Object[]{0.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D}
+        )
+    );
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnWithRound()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1, TRUE)"
+        + " FROM druid.foo",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(CalciteTests.DATASOURCE1)
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .virtualColumns(new ExpressionVirtualColumn(
+                    "v0",
+                    "hll_sketch_estimate(\"hllsketch_dim1\",1)",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ))
+                .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                .columns("v0")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            new Object[]{0.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D}
+        )
+    );
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnOnNonHllCol()
+  {
+    try {
+      testQuery(
+          "SELECT"
+          + " HLL_SKETCH_ESTIMATE(dim2)"
+          + " FROM druid.foo",
+          ImmutableList.of(
+              newScanQueryBuilder()
+                  .dataSource(CalciteTests.DATASOURCE1)
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .virtualColumns(new ExpressionVirtualColumn(
+                      "v0",
+                      "hll_sketch_estimate(\"dim2\")",
+                      ColumnType.DOUBLE,
+                      MACRO_TABLE
+                  ))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .columns("v0")
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .build()
+          ),
+          ImmutableList.of()
+      );
+    }
+    catch (ExpressionValidationException e) {
+      Assert.assertTrue(
+          e.getMessage().contains("Function[HLL_SKETCH_ESTIMATE] requires a HllSketch as the argument")
+      );
+    }
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnWithGroupByOrderBy()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1), count(*)"
+        + " FROM druid.foo"
+        + " GROUP BY 1"
+        + " ORDER BY 2 DESC",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(new ExpressionVirtualColumn(
+                            "v0",
+                            "hll_sketch_estimate(\"hllsketch_dim1\")",
+                            ColumnType.DOUBLE,
+                            MACRO_TABLE
+                        ))
+                        .setDimensions(
+                            new DefaultDimensionSpec("v0", "d0", ColumnType.DOUBLE))
+                        .setAggregatorSpecs(
+                            aggregators(
+                                new CountAggregatorFactory("a0")
+                            )
+                        )
+                        .setLimitSpec(
+                            DefaultLimitSpec
+                                .builder()
+                                .orderBy(
+                                    new OrderByColumnSpec(
+                                        "a0",
+                                        OrderByColumnSpec.Direction.DESCENDING,
+                                        StringComparators.NUMERIC
+                                    )
+                                )
+                                .build()
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{1.0D, 5L},
+            new Object[]{0.0D, 1L}
+        )
+    );
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnWithTopN()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1)"
+        + " FROM druid.foo"
+        + " GROUP BY 1 ORDER BY 1"
+        + " LIMIT 2",

Review Comment:
   Maybe throw a `COUNT(*)` on this and order by that instead?  Just to make it seem more "topN" like.



##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllSketchSqlAggregatorTest.java:
##########
@@ -888,4 +902,187 @@ public void testGroupByAggregatorDefaultValuesFinalizeOuterSketches()
         ImmutableList.of(new Object[]{"a", 0L, "0"})
     );
   }
+
+  @Test
+  public void testHllEstimateAsVirtualColumn()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1)"
+        + " FROM druid.foo",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(CalciteTests.DATASOURCE1)
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .virtualColumns(new ExpressionVirtualColumn(
+                    "v0",
+                    "hll_sketch_estimate(\"hllsketch_dim1\")",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ))
+                .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                .columns("v0")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            new Object[]{0.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D}
+        )
+    );
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnWithRound()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1, TRUE)"
+        + " FROM druid.foo",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(CalciteTests.DATASOURCE1)
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .virtualColumns(new ExpressionVirtualColumn(
+                    "v0",
+                    "hll_sketch_estimate(\"hllsketch_dim1\",1)",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ))
+                .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                .columns("v0")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            new Object[]{0.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D}
+        )
+    );
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnOnNonHllCol()
+  {
+    try {
+      testQuery(
+          "SELECT"
+          + " HLL_SKETCH_ESTIMATE(dim2)"
+          + " FROM druid.foo",
+          ImmutableList.of(
+              newScanQueryBuilder()
+                  .dataSource(CalciteTests.DATASOURCE1)
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .virtualColumns(new ExpressionVirtualColumn(
+                      "v0",
+                      "hll_sketch_estimate(\"dim2\")",
+                      ColumnType.DOUBLE,
+                      MACRO_TABLE
+                  ))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .columns("v0")
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .build()
+          ),
+          ImmutableList.of()
+      );
+    }
+    catch (ExpressionValidationException e) {
+      Assert.assertTrue(
+          e.getMessage().contains("Function[HLL_SKETCH_ESTIMATE] requires a HllSketch as the argument")
+      );
+    }

Review Comment:
   Once you adjust the expression as requested, I expect this test to change some.  If you do the "pass to HllSketchHolder and hope" route, I expect that you will get an IllegalArgumentException saying that the values of dim2 are not legal.  But it might be something else too.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllPostAggExprMacros.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.hll.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class HllPostAggExprMacros
+{
+  // make same name as SQL binding
+  public static final String HLL_SKETCH_ESTIMATE = "HLL_SKETCH_ESTIMATE";
+
+  public static class HLLSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      class HllSketchEstimateExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr
+      {
+
+        public HllSketchEstimateExpr(List<Expr> args)
+        {
+          super(HLL_SKETCH_ESTIMATE, args);
+        }
+
+        @Override
+        public ExprEval eval(ObjectBinding bindings)
+        {
+          boolean round = false;
+          Expr arg = args.get(0);
+          ExprEval eval = arg.eval(bindings);
+          if (args.size() == 2) {
+            round = args.get(1).eval(bindings).asBoolean();
+          }

Review Comment:
   Also, no reason to check this on every single call.  `args` is gonna be the same for all of them.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllPostAggExprMacros.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.hll.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class HllPostAggExprMacros
+{
+  // make same name as SQL binding
+  public static final String HLL_SKETCH_ESTIMATE = "HLL_SKETCH_ESTIMATE";
+
+  public static class HLLSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      class HllSketchEstimateExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr
+      {
+
+        public HllSketchEstimateExpr(List<Expr> args)
+        {
+          super(HLL_SKETCH_ESTIMATE, args);
+        }
+
+        @Override
+        public ExprEval eval(ObjectBinding bindings)
+        {
+          boolean round = false;
+          Expr arg = args.get(0);
+          ExprEval eval = arg.eval(bindings);
+          if (args.size() == 2) {
+            round = args.get(1).eval(bindings).asBoolean();
+          }
+
+          final Object valObj = eval.value();
+          if (valObj == null) {
+            return ExprEval.of(null);
+          }
+          if (eval.type().is(ExprType.COMPLEX) && valObj instanceof HllSketchHolder) {

Review Comment:
   Do you have a test that shows that `eval.type().is()` is necessary?  As long as it got an `HllSketchHolder`, I'm nto sure the type of teh `eval` really matters.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllPostAggExprMacros.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.hll.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class HllPostAggExprMacros
+{
+  // make same name as SQL binding
+  public static final String HLL_SKETCH_ESTIMATE = "HLL_SKETCH_ESTIMATE";
+
+  public static class HLLSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      class HllSketchEstimateExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr
+      {
+
+        public HllSketchEstimateExpr(List<Expr> args)
+        {
+          super(HLL_SKETCH_ESTIMATE, args);
+        }
+
+        @Override
+        public ExprEval eval(ObjectBinding bindings)
+        {
+          boolean round = false;
+          Expr arg = args.get(0);

Review Comment:
   No need to `.get(0)` on every single call here.  Do it once and reuse the reference.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllPostAggExprMacros.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.hll.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class HllPostAggExprMacros
+{
+  // make same name as SQL binding
+  public static final String HLL_SKETCH_ESTIMATE = "HLL_SKETCH_ESTIMATE";
+
+  public static class HLLSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      class HllSketchEstimateExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr

Review Comment:
   What is the importance of doing this as an inner class like this?  I don't know that it adds anything and I think it just makes things more confusing.  I realize you were perhaps just copying a pattern that was already in place.  I looked and it looks like I copied there pattern here too 
   
   https://github.com/apache/druid/pull/14195/files#diff-5c5c183e6c105fdbb9abfb3e63cb0e75e19b932c1767d1d4c8b7f6131aa3f7d5R52-R57
   
   But, like, the lines highlighted there really show why this pattern doesn't make sense.  I'm calling `args.get(0)` even though I've already gotten the arg out and am additionally passing it to the constructor.  So there's just a ton of referencing the same thing through all sorts of different means, which is confusing.  Let's not copy this pattern around anymore.  Either make this a static class and have it store the arg that it needs.  Or just return it as an anonymous class.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaPostAggMacros.java:
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.theta.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.query.aggregation.datasketches.theta.SketchHolder;
+
+import java.util.List;
+
+public class ThetaPostAggMacros

Review Comment:
   I had various commentary in the HLL version.  Please make sure to apply it here as well.



##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllSketchSqlAggregatorTest.java:
##########
@@ -888,4 +902,187 @@ public void testGroupByAggregatorDefaultValuesFinalizeOuterSketches()
         ImmutableList.of(new Object[]{"a", 0L, "0"})
     );
   }
+
+  @Test
+  public void testHllEstimateAsVirtualColumn()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1)"
+        + " FROM druid.foo",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(CalciteTests.DATASOURCE1)
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .virtualColumns(new ExpressionVirtualColumn(
+                    "v0",
+                    "hll_sketch_estimate(\"hllsketch_dim1\")",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ))
+                .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                .columns("v0")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            new Object[]{0.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D}
+        )
+    );
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnWithRound()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1, TRUE)"
+        + " FROM druid.foo",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(CalciteTests.DATASOURCE1)
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .virtualColumns(new ExpressionVirtualColumn(
+                    "v0",
+                    "hll_sketch_estimate(\"hllsketch_dim1\",1)",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ))
+                .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                .columns("v0")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            new Object[]{0.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D}

Review Comment:
   I don't believe that any of the results actually changed once you turned on rounding...  It would be great to have this test run against some data set that actually makes a difference rounding and not rounding.
   
   For that matter, once you do that, you can use the same expression twice in the same query, once that does the default, once that rounds and once that doesn't round and then validate the results (and show that they are different)



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaSketchEstimateOperatorConversion.java:
##########
@@ -40,7 +40,7 @@
 
 public class ThetaSketchEstimateOperatorConversion implements SqlOperatorConversion
 {
-  private static final String FUNCTION_NAME = "THETA_SKETCH_ESTIMATE";
+  private static final String FUNCTION_NAME = "theta_sketch_estimate";

Review Comment:
   once again we have this casing change...  Is that important?



##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/theta/sql/ThetaSketchSqlAggregatorTest.java:
##########
@@ -1018,8 +1034,160 @@ public void testGroupByAggregatorDefaultValuesFinalizeOuterSketches()
   @Test
   public void testThetaSketchIntersectOnScalarExpression()
   {
-    assertQueryIsUnplannable("SELECT THETA_SKETCH_INTERSECT(NULL, NULL) FROM foo",
+    assertQueryIsUnplannable(
+        "SELECT THETA_SKETCH_INTERSECT(NULL, NULL) FROM foo",
         "Possible error: THETA_SKETCH_INTERSECT can only be used on aggregates. " +
-            "It cannot be used directly on a column or on a scalar expression.");
+        "It cannot be used directly on a column or on a scalar expression."
+    );
+  }
+
+  @Test
+  public void testThetaSketchEstimateAsVirtualColumn()
+  {
+    testQuery(
+        "SELECT"
+        + " THETA_SKETCH_ESTIMATE(thetasketch_dim1)"
+        + " FROM foo",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(CalciteTests.DATASOURCE1)
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .virtualColumns(new ExpressionVirtualColumn(
+                    "v0",
+                    "theta_sketch_estimate(\"thetasketch_dim1\")",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ))
+                .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                .columns("v0")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            NullHandling.replaceWithDefault() ? new Object[]{null} : new Object[]{0.0D},

Review Comment:
   What causes the difference here?  It would appear that this and HLL are not consistent with each other and I'm curious why.



##########
extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllSketchSqlAggregatorTest.java:
##########
@@ -888,4 +902,187 @@ public void testGroupByAggregatorDefaultValuesFinalizeOuterSketches()
         ImmutableList.of(new Object[]{"a", 0L, "0"})
     );
   }
+
+  @Test
+  public void testHllEstimateAsVirtualColumn()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1)"
+        + " FROM druid.foo",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(CalciteTests.DATASOURCE1)
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .virtualColumns(new ExpressionVirtualColumn(
+                    "v0",
+                    "hll_sketch_estimate(\"hllsketch_dim1\")",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ))
+                .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                .columns("v0")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            new Object[]{0.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D}
+        )
+    );
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnWithRound()
+  {
+    testQuery(
+        "SELECT"
+        + " HLL_SKETCH_ESTIMATE(hllsketch_dim1, TRUE)"
+        + " FROM druid.foo",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(CalciteTests.DATASOURCE1)
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .virtualColumns(new ExpressionVirtualColumn(
+                    "v0",
+                    "hll_sketch_estimate(\"hllsketch_dim1\",1)",
+                    ColumnType.DOUBLE,
+                    MACRO_TABLE
+                ))
+                .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                .columns("v0")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            new Object[]{0.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D},
+            new Object[]{1.0D}
+        )
+    );
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnOnNonHllCol()
+  {
+    try {
+      testQuery(
+          "SELECT"
+          + " HLL_SKETCH_ESTIMATE(dim2)"
+          + " FROM druid.foo",
+          ImmutableList.of(
+              newScanQueryBuilder()
+                  .dataSource(CalciteTests.DATASOURCE1)
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .virtualColumns(new ExpressionVirtualColumn(
+                      "v0",
+                      "hll_sketch_estimate(\"dim2\")",
+                      ColumnType.DOUBLE,
+                      MACRO_TABLE
+                  ))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .columns("v0")
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .build()
+          ),
+          ImmutableList.of()
+      );
+    }
+    catch (ExpressionValidationException e) {
+      Assert.assertTrue(
+          e.getMessage().contains("Function[HLL_SKETCH_ESTIMATE] requires a HllSketch as the argument")
+      );
+    }
+  }
+
+  @Test
+  public void testHllEstimateAsVirtualColumnWithGroupByOrderBy()
+  {
+    skipVectorize();
+    cannotVectorize();

Review Comment:
   Do you know what makes this not vectorizable?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #14312: Hll Sketch and Theta sketch estimate can now be used as an expression

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #14312:
URL: https://github.com/apache/druid/pull/14312#discussion_r1220819712


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/sql/HllPostAggExprMacros.java:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.datasketches.hll.sql;
+
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExpressionType;
+import org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder;
+
+import javax.annotation.Nullable;
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class HllPostAggExprMacros
+{
+  public static final String HLL_SKETCH_ESTIMATE = "hll_sketch_estimate";
+
+  public static class HLLSketchEstimateExprMacro implements ExprMacroTable.ExprMacro
+  {
+
+    @Override
+    public Expr apply(List<Expr> args)
+    {
+      validationHelperCheckAnyOfArgumentCount(args, 1, 2);
+      return new HllSketchEstimateExpr(args);
+    }
+
+    @Override
+    public String name()
+    {
+      return HLL_SKETCH_ESTIMATE;
+    }
+  }
+
+  public static class HllSketchEstimateExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr
+  {
+    private Expr estimateExpr;
+    private Expr isRound;
+
+    public HllSketchEstimateExpr(List<Expr> args)
+    {
+      super(HLL_SKETCH_ESTIMATE, args);
+      this.estimateExpr = args.get(0);
+      if (args.size() == 2) {
+        isRound = args.get(1);
+      }
+    }
+
+    @Nullable
+    @Override
+    public ExpressionType getOutputType(InputBindingInspector inspector)
+    {
+      return ExpressionType.DOUBLE;
+    }
+
+    @Override
+    public ExprEval eval(ObjectBinding bindings)
+    {
+      boolean round = false;
+      ExprEval eval = estimateExpr.eval(bindings);
+      if (isRound != null) {

Review Comment:
   yeah, there is `Expr.isLiteral` and `Expr.getLIteralValue` that can be used to check if a value is a constant



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org