You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/08/08 00:09:13 UTC

[GitHub] [incubator-pinot] bkuang88 opened a new pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

bkuang88 opened a new pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832


   #5377  Description
   
   Currently, the DISTINCTCOUNTTHETASKETCH aggregation function does not support SET_DIFF operations. This pull-request addresses this gap.
   
   However, this commit does introduce a backwards-incompatible change. We are suggesting to change the syntax a bit to the following:
   
   DISTINCTCOUNTTHETASKETCH(col, 'nominalEntries=1234', 'predicate1', 'predicate2', 'SET_DIFF($1, $2)')
   
   We are introducing 3 "merging functions" into the Pinot aggregation function:
   1. SET_UNION
   2. SET_INTERSECT
   3. SET_DIFF
   
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
   * [ x ] Yes (Please label as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR fix a zero-downtime upgrade introduced earlier?
   * [ ] Yes (Please label this as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR otherwise need attention when creating release notes? Things to consider:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   * [ ] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)
   
   ## Release Notes
   This patch introduces a new syntax to the recently introduced DISTINCTCOUNTTHETASKETCH aggregation function. The syntax will also introduce the new SET_DIFF functionality between 2 theta sketches. The syntax will be as follows:
   
   DISTINCTCOUNTTHETASKETCH(col, 'nominalEntries=1024', 'colA=1', 'colB=2', 'SET_DIFF($1, $2)')
   
   ## Documentation
   If you have introduced a new feature or configuration, please add it to the documentation as well.
   See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ManojRThakur commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
ManojRThakur commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467531933



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -526,28 +543,51 @@ private Predicate getPredicate(String predicateString) {
    * passed to this method to be used when evaluating the expression.
    *
    * @param postAggregationExpression Post-aggregation expression to evaluate (modeled as a filter)
+   * @param expressions list of aggregation function parameters
    * @param sketchMap Precomputed sketches for predicates that are part of the expression.
    * @return Overall evaluated sketch for the expression.
    */
-  private Sketch evalPostAggregationExpression(FilterContext postAggregationExpression,
-      Map<Predicate, Sketch> sketchMap) {
-    switch (postAggregationExpression.getType()) {
-      case AND:
-        Intersection intersection = _setOperationBuilder.buildIntersection();
-        for (FilterContext child : postAggregationExpression.getChildren()) {
-          intersection.update(evalPostAggregationExpression(child, sketchMap));
-        }
-        return intersection.getResult();
-      case OR:
-        Union union = _setOperationBuilder.buildUnion();
-        for (FilterContext child : postAggregationExpression.getChildren()) {
-          union.update(evalPostAggregationExpression(child, sketchMap));
+  private Sketch evalPostAggregationExpression(
+      final ExpressionContext postAggregationExpression,
+      final List<Predicate> expressions,
+      final Map<Predicate, Sketch> sketchMap) {
+    if (postAggregationExpression.getType() == ExpressionContext.Type.LITERAL) {
+      throw new IllegalArgumentException("Literal not supported in post-aggregation function");
+    }
+
+    if (postAggregationExpression.getType() == ExpressionContext.Type.IDENTIFIER) {
+      final Predicate exp =
+          expressions.get(extractSubstitutionPosition(postAggregationExpression.getLiteral()) - 1);
+      return sketchMap.get(exp);
+    }
+
+    // shouldn't throw exception because of the validation in the constructor
+    final MergeFunction func =
+        MergeFunction.valueOf(postAggregationExpression.getFunction().getFunctionName().toUpperCase());
+
+    // handle functions recursively
+    switch(func) {
+      case SET_UNION:
+        final Union union = _setOperationBuilder.buildUnion();
+        for (final ExpressionContext exp : postAggregationExpression.getFunction().getArguments()) {
+          union.update(evalPostAggregationExpression(exp, expressions, sketchMap));
         }
         return union.getResult();
-      case PREDICATE:
-        return sketchMap.get(postAggregationExpression.getPredicate());
+      case SET_INTERSECT:
+        final Intersection intersection = _setOperationBuilder.buildIntersection();
+        for (final ExpressionContext exp : postAggregationExpression.getFunction().getArguments()) {
+          intersection.update(evalPostAggregationExpression(exp, expressions, sketchMap));
+        }
+        return intersection.getResult();
+      case SET_DIFF:
+        final List<ExpressionContext> args = postAggregationExpression.getFunction().getArguments();
+        final AnotB diff = _setOperationBuilder.buildANotB();
+        final Sketch a = evalPostAggregationExpression(args.get(0), expressions, sketchMap);
+        final Sketch b = evalPostAggregationExpression(args.get(1), expressions, sketchMap);
+        diff.update(a, b);

Review comment:
       Lets make sure we call out in the documentation that the order matters in case of SET_DIFF




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r468141336



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -108,35 +136,24 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
     Preconditions.checkArgument(paramsExpression.getType() == ExpressionContext.Type.LITERAL,
         "Last argument of DistinctCountThetaSketch must be literal (post-aggregation expression)");
     _postAggregationExpression = QueryContextConverterUtils
-        .getFilter(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
+        .getExpression(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
 
     // Initialize the predicate map
     _predicateInfoMap = new HashMap<>();
-    if (numArguments > 3) {
-      // Predicates are explicitly specified
-      for (int i = 2; i < numArguments - 1; i++) {
-        ExpressionContext predicateExpression = arguments.get(i);
-        Preconditions.checkArgument(predicateExpression.getType() == ExpressionContext.Type.LITERAL,
-            "Third to second last argument of DistinctCountThetaSketch must be literal (predicate expression)");
-        Predicate predicate = getPredicate(predicateExpression.getLiteral());
-        _inputExpressions.add(predicate.getLhs());
-        _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-      }
-    } else {
-      // Auto-derive predicates from the post-aggregation expression
-      Stack<FilterContext> stack = new Stack<>();
-      stack.push(_postAggregationExpression);
-      while (!stack.isEmpty()) {
-        FilterContext filter = stack.pop();
-        if (filter.getType() == FilterContext.Type.PREDICATE) {
-          Predicate predicate = filter.getPredicate();
-          _inputExpressions.add(predicate.getLhs());
-          _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-        } else {
-          stack.addAll(filter.getChildren());
-        }
-      }
+
+    // Predicates are explicitly specified

Review comment:
       It seems to me that we can always support this in future (if needed, and/or asked by users) without breaking compatibility with syntax in this PR. So I am fine with not supporting auto-derive for now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#issuecomment-671064316


   @Jackie-Jiang Since we have discussed changing the syntax of this function in the past (to provide projection columns, etc), and given that this PR is changing the syntax as well (for a reason slightly different from our previous discussions), perhaps we should finalize on a syntax that won't need further changing?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] bkuang88 commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
bkuang88 commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467343288



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -108,35 +136,24 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
     Preconditions.checkArgument(paramsExpression.getType() == ExpressionContext.Type.LITERAL,
         "Last argument of DistinctCountThetaSketch must be literal (post-aggregation expression)");
     _postAggregationExpression = QueryContextConverterUtils
-        .getFilter(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
+        .getExpression(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
 
     // Initialize the predicate map
     _predicateInfoMap = new HashMap<>();
-    if (numArguments > 3) {
-      // Predicates are explicitly specified
-      for (int i = 2; i < numArguments - 1; i++) {
-        ExpressionContext predicateExpression = arguments.get(i);
-        Preconditions.checkArgument(predicateExpression.getType() == ExpressionContext.Type.LITERAL,
-            "Third to second last argument of DistinctCountThetaSketch must be literal (predicate expression)");
-        Predicate predicate = getPredicate(predicateExpression.getLiteral());
-        _inputExpressions.add(predicate.getLhs());
-        _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-      }
-    } else {
-      // Auto-derive predicates from the post-aggregation expression
-      Stack<FilterContext> stack = new Stack<>();
-      stack.push(_postAggregationExpression);
-      while (!stack.isEmpty()) {
-        FilterContext filter = stack.pop();
-        if (filter.getType() == FilterContext.Type.PREDICATE) {
-          Predicate predicate = filter.getPredicate();
-          _inputExpressions.add(predicate.getLhs());
-          _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-        } else {
-          stack.addAll(filter.getChildren());
-        }
-      }
+
+    // Predicates are explicitly specified

Review comment:
       Hmm I've thought about it, but if we auto-derive like before, we'd only have a subset of features. We won't be able to support SET_DIFF at all if go with that approach right?
   
   Usually, I would think that if there are 2 equivalent ways of writing something, the feature set would be the same. But it seems like only a subset of features would be available if we go with that route.
   
   I'm personally okay with both approach but wanted to point that out.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv merged pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
mayankshriv merged pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467499977



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -57,11 +61,31 @@
  * <p>TODO: For performance concern, use {@code List<Sketch>} as the intermediate result.
  */
 public class DistinctCountThetaSketchAggregationFunction implements AggregationFunction<Map<String, Sketch>, Long> {
+
+  public enum MergeFunction {

Review comment:
       I feel putting these functions as constant is more readable and easier to use:
   `public static final String UNION = "UNION";`
   You can directly `switch` on the `expression.getFunction().getFunctionName().toUpperCase()` and not need to worry about `MergeFunction.valueOf()` throws exception.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -57,11 +61,31 @@
  * <p>TODO: For performance concern, use {@code List<Sketch>} as the intermediate result.
  */
 public class DistinctCountThetaSketchAggregationFunction implements AggregationFunction<Map<String, Sketch>, Long> {
+
+  public enum MergeFunction {
+    SET_UNION, SET_INTERSECT, SET_DIFF;
+
+    /** SET_INTERSECT($1, $2, $3) => "SET_INTERSECT($1,$2,$3)", useful for unit tests */
+    public String apply(final String... args) {

Review comment:
       Let's not have method just for test in production class. In the test you should test different functions instead of the standardized one (e.g. `Intersect($1, $2, $3)`, `INTERSECT($1,$2,$3)` etc.)

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -576,6 +616,69 @@ private SetOperationBuilder getSetOperationBuilder() {
         : SetOperation.builder().setNominalEntries(_thetaSketchParams.getNominalEntries());
   }
 
+  /**
+   * Validates that the function context's substitution parameters ($1, $2, etc) does not exceed the number
+   * of predicates passed into the post-aggregation function.
+   *

Review comment:
       Also validate that there are 2 arguments for `DIFF`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -57,11 +61,31 @@
  * <p>TODO: For performance concern, use {@code List<Sketch>} as the intermediate result.
  */
 public class DistinctCountThetaSketchAggregationFunction implements AggregationFunction<Map<String, Sketch>, Long> {
+
+  public enum MergeFunction {
+    SET_UNION, SET_INTERSECT, SET_DIFF;
+
+    /** SET_INTERSECT($1, $2, $3) => "SET_INTERSECT($1,$2,$3)", useful for unit tests */
+    public String apply(final String... args) {
+      final String delimited = String.join(",", args);
+      return String.format("%s(%s)", name(), delimited);
+    }
+
+    public static boolean isValid(final String name) {
+      return SET_UNION.name().equalsIgnoreCase(name)
+          || SET_INTERSECT.name().equalsIgnoreCase(name)
+          || SET_DIFF.name().equalsIgnoreCase(name);
+    }
+  }
+
+  private static final Pattern ARGUMENT_SUBSTITUTION = Pattern.compile("\\$(\\d+)");
+
   private final ExpressionContext _thetaSketchColumn;
   private final ThetaSketchParams _thetaSketchParams;
   private final SetOperationBuilder _setOperationBuilder;
   private final List<ExpressionContext> _inputExpressions;
-  private final FilterContext _postAggregationExpression;
+  private final ExpressionContext _postAggregationExpression;
+  private final List<Predicate> _predicateInfoList;

Review comment:
       ```suggestion
     private final List<Predicate> _predicates;
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -526,28 +543,51 @@ private Predicate getPredicate(String predicateString) {
    * passed to this method to be used when evaluating the expression.
    *
    * @param postAggregationExpression Post-aggregation expression to evaluate (modeled as a filter)
+   * @param expressions list of aggregation function parameters
    * @param sketchMap Precomputed sketches for predicates that are part of the expression.
    * @return Overall evaluated sketch for the expression.
    */
-  private Sketch evalPostAggregationExpression(FilterContext postAggregationExpression,
-      Map<Predicate, Sketch> sketchMap) {
-    switch (postAggregationExpression.getType()) {
-      case AND:
-        Intersection intersection = _setOperationBuilder.buildIntersection();
-        for (FilterContext child : postAggregationExpression.getChildren()) {
-          intersection.update(evalPostAggregationExpression(child, sketchMap));
-        }
-        return intersection.getResult();
-      case OR:
-        Union union = _setOperationBuilder.buildUnion();
-        for (FilterContext child : postAggregationExpression.getChildren()) {
-          union.update(evalPostAggregationExpression(child, sketchMap));
+  private Sketch evalPostAggregationExpression(
+      final ExpressionContext postAggregationExpression,

Review comment:
       (Code convention) Remove `final` and reformat. Same for other methods

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -57,11 +61,31 @@
  * <p>TODO: For performance concern, use {@code List<Sketch>} as the intermediate result.
  */
 public class DistinctCountThetaSketchAggregationFunction implements AggregationFunction<Map<String, Sketch>, Long> {
+
+  public enum MergeFunction {
+    SET_UNION, SET_INTERSECT, SET_DIFF;
+
+    /** SET_INTERSECT($1, $2, $3) => "SET_INTERSECT($1,$2,$3)", useful for unit tests */
+    public String apply(final String... args) {
+      final String delimited = String.join(",", args);
+      return String.format("%s(%s)", name(), delimited);
+    }
+
+    public static boolean isValid(final String name) {

Review comment:
       (Code convention) We don't use `final` within method argument or local variables. Same for other places

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -57,11 +61,31 @@
  * <p>TODO: For performance concern, use {@code List<Sketch>} as the intermediate result.
  */
 public class DistinctCountThetaSketchAggregationFunction implements AggregationFunction<Map<String, Sketch>, Long> {
+
+  public enum MergeFunction {
+    SET_UNION, SET_INTERSECT, SET_DIFF;

Review comment:
       `UNION`, `INTERSECT`, `DIFF` for concise and simplicity?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -576,6 +616,69 @@ private SetOperationBuilder getSetOperationBuilder() {
         : SetOperation.builder().setNominalEntries(_thetaSketchParams.getNominalEntries());
   }
 
+  /**
+   * Validates that the function context's substitution parameters ($1, $2, etc) does not exceed the number
+   * of predicates passed into the post-aggregation function.
+   *
+   * For example, if the post aggregation function is:
+   * INTERSECT($1, $2, $3)
+   *
+   * But there are only 2 arguments passed into the aggregation function, throw an error
+   * @param context The parsed function context that's a tree structure
+   * @param numPredicates Max number of predicates available to be substituted
+   */
+  private static void validatePostAggregationExpression(final ExpressionContext context, final int numPredicates) {
+    if (context.getType() == ExpressionContext.Type.LITERAL) {
+      throw new IllegalArgumentException("Invalid post-aggregation function expression syntax.");
+    }
+
+    if (context.getType() == ExpressionContext.Type.IDENTIFIER) {
+      int id = extractSubstitutionPosition(context.getIdentifier());
+      if (id <= 0)
+        throw new IllegalArgumentException("Argument substitution starts at $1");
+      if (id > numPredicates)
+        throw new IllegalArgumentException("Argument substitution exceeded number of predicates");
+      // if none of the invalid conditions are met above, exit out early
+      return;
+    }
+
+    if (!MergeFunction.isValid(context.getFunction().getFunctionName())) {
+      final String allowed =
+          Arrays.stream(MergeFunction.values())
+              .map(MergeFunction::name)
+              .collect(Collectors.joining(","));
+      throw new IllegalArgumentException(
+          String.format("Invalid Theta Sketch aggregation function. Allowed: [%s]", allowed));
+    }
+
+    switch (MergeFunction.valueOf(context.getFunction().getFunctionName().toUpperCase())) {
+      case SET_DIFF:
+        // set diff can only have 2 arguments
+        if (context.getFunction().getArguments().size() != 2) {
+          throw new IllegalArgumentException("SET_DIFF function can only have 2 arguments.");
+        }
+        break;

Review comment:
       Also validate the arguments

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -108,35 +136,24 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
     Preconditions.checkArgument(paramsExpression.getType() == ExpressionContext.Type.LITERAL,
         "Last argument of DistinctCountThetaSketch must be literal (post-aggregation expression)");
     _postAggregationExpression = QueryContextConverterUtils
-        .getFilter(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
+        .getExpression(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
 
     // Initialize the predicate map
     _predicateInfoMap = new HashMap<>();
-    if (numArguments > 3) {
-      // Predicates are explicitly specified
-      for (int i = 2; i < numArguments - 1; i++) {
-        ExpressionContext predicateExpression = arguments.get(i);
-        Preconditions.checkArgument(predicateExpression.getType() == ExpressionContext.Type.LITERAL,
-            "Third to second last argument of DistinctCountThetaSketch must be literal (predicate expression)");
-        Predicate predicate = getPredicate(predicateExpression.getLiteral());
-        _inputExpressions.add(predicate.getLhs());
-        _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-      }
-    } else {
-      // Auto-derive predicates from the post-aggregation expression
-      Stack<FilterContext> stack = new Stack<>();
-      stack.push(_postAggregationExpression);
-      while (!stack.isEmpty()) {
-        FilterContext filter = stack.pop();
-        if (filter.getType() == FilterContext.Type.PREDICATE) {
-          Predicate predicate = filter.getPredicate();
-          _inputExpressions.add(predicate.getLhs());
-          _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-        } else {
-          stack.addAll(filter.getChildren());
-        }
-      }
+
+    // Predicates are explicitly specified
+    for (int i = 2; i < numArguments - 1; i++) {
+      ExpressionContext predicateExpression = arguments.get(i);
+      Preconditions.checkArgument(predicateExpression.getType() == ExpressionContext.Type.LITERAL,
+          "Third to second last argument of DistinctCountThetaSketch must be literal (predicate expression)");
+      Predicate predicate = getPredicate(predicateExpression.getLiteral());
+      _inputExpressions.add(predicate.getLhs());
+      _predicateInfoList.add(predicate);
+      _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
     }
+
+    // first expression is the nominal entries parameter

Review comment:
       (Code convention) Capitalize the first character of the comments. Same for other places

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -576,6 +616,69 @@ private SetOperationBuilder getSetOperationBuilder() {
         : SetOperation.builder().setNominalEntries(_thetaSketchParams.getNominalEntries());
   }
 
+  /**
+   * Validates that the function context's substitution parameters ($1, $2, etc) does not exceed the number
+   * of predicates passed into the post-aggregation function.
+   *
+   * For example, if the post aggregation function is:
+   * INTERSECT($1, $2, $3)
+   *
+   * But there are only 2 arguments passed into the aggregation function, throw an error
+   * @param context The parsed function context that's a tree structure
+   * @param numPredicates Max number of predicates available to be substituted
+   */
+  private static void validatePostAggregationExpression(final ExpressionContext context, final int numPredicates) {
+    if (context.getType() == ExpressionContext.Type.LITERAL) {
+      throw new IllegalArgumentException("Invalid post-aggregation function expression syntax.");
+    }
+
+    if (context.getType() == ExpressionContext.Type.IDENTIFIER) {
+      int id = extractSubstitutionPosition(context.getIdentifier());
+      if (id <= 0)
+        throw new IllegalArgumentException("Argument substitution starts at $1");
+      if (id > numPredicates)
+        throw new IllegalArgumentException("Argument substitution exceeded number of predicates");
+      // if none of the invalid conditions are met above, exit out early
+      return;
+    }
+
+    if (!MergeFunction.isValid(context.getFunction().getFunctionName())) {
+      final String allowed =
+          Arrays.stream(MergeFunction.values())
+              .map(MergeFunction::name)
+              .collect(Collectors.joining(","));
+      throw new IllegalArgumentException(
+          String.format("Invalid Theta Sketch aggregation function. Allowed: [%s]", allowed));
+    }
+
+    switch (MergeFunction.valueOf(context.getFunction().getFunctionName().toUpperCase())) {
+      case SET_DIFF:
+        // set diff can only have 2 arguments
+        if (context.getFunction().getArguments().size() != 2) {
+          throw new IllegalArgumentException("SET_DIFF function can only have 2 arguments.");
+        }
+        break;
+      case SET_UNION:
+      case SET_INTERSECT:

Review comment:
       Check it has more than one argument?
   Prevent inefficient function such as `UNION()` or `UNION($1)`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -78,9 +102,9 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
       throws SqlParseException {
     int numArguments = arguments.size();
 
-    // NOTE: This function expects at least 3 arguments: theta-sketch column, parameters, post-aggregation expression.
-    Preconditions.checkArgument(numArguments >= 3,
-        "DistinctCountThetaSketch expects at least three arguments (theta-sketch column, parameters, post-aggregation expression), got: ",
+    // NOTE: This function expects at least 4 arguments: theta-sketch column, parameters, post-aggregation expression.

Review comment:
       +1

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -526,28 +543,51 @@ private Predicate getPredicate(String predicateString) {
    * passed to this method to be used when evaluating the expression.
    *
    * @param postAggregationExpression Post-aggregation expression to evaluate (modeled as a filter)
+   * @param expressions list of aggregation function parameters
    * @param sketchMap Precomputed sketches for predicates that are part of the expression.
    * @return Overall evaluated sketch for the expression.
    */
-  private Sketch evalPostAggregationExpression(FilterContext postAggregationExpression,
-      Map<Predicate, Sketch> sketchMap) {
-    switch (postAggregationExpression.getType()) {
-      case AND:
-        Intersection intersection = _setOperationBuilder.buildIntersection();
-        for (FilterContext child : postAggregationExpression.getChildren()) {
-          intersection.update(evalPostAggregationExpression(child, sketchMap));
-        }
-        return intersection.getResult();
-      case OR:
-        Union union = _setOperationBuilder.buildUnion();
-        for (FilterContext child : postAggregationExpression.getChildren()) {
-          union.update(evalPostAggregationExpression(child, sketchMap));
+  private Sketch evalPostAggregationExpression(
+      final ExpressionContext postAggregationExpression,
+      final List<Predicate> expressions,

Review comment:
       No need to pass in this argument. Directly use member variable `_predicates`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467595198



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -57,11 +61,31 @@
  * <p>TODO: For performance concern, use {@code List<Sketch>} as the intermediate result.
  */
 public class DistinctCountThetaSketchAggregationFunction implements AggregationFunction<Map<String, Sketch>, Long> {
+
+  public enum MergeFunction {
+    SET_UNION, SET_INTERSECT, SET_DIFF;

Review comment:
       I think `UNION` is SQL reserved keyword, so Calcite won't parse. If there's a way around that, I'd also prefer concise names.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ManojRThakur commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
ManojRThakur commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467533175



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -576,6 +616,69 @@ private SetOperationBuilder getSetOperationBuilder() {
         : SetOperation.builder().setNominalEntries(_thetaSketchParams.getNominalEntries());
   }
 
+  /**
+   * Validates that the function context's substitution parameters ($1, $2, etc) does not exceed the number
+   * of predicates passed into the post-aggregation function.
+   *
+   * For example, if the post aggregation function is:
+   * INTERSECT($1, $2, $3)
+   *
+   * But there are only 2 arguments passed into the aggregation function, throw an error
+   * @param context The parsed function context that's a tree structure
+   * @param numPredicates Max number of predicates available to be substituted
+   */
+  private static void validatePostAggregationExpression(final ExpressionContext context, final int numPredicates) {
+    if (context.getType() == ExpressionContext.Type.LITERAL) {
+      throw new IllegalArgumentException("Invalid post-aggregation function expression syntax.");
+    }
+
+    if (context.getType() == ExpressionContext.Type.IDENTIFIER) {
+      int id = extractSubstitutionPosition(context.getIdentifier());
+      if (id <= 0)
+        throw new IllegalArgumentException("Argument substitution starts at $1");
+      if (id > numPredicates)
+        throw new IllegalArgumentException("Argument substitution exceeded number of predicates");
+      // if none of the invalid conditions are met above, exit out early
+      return;
+    }
+
+    if (!MergeFunction.isValid(context.getFunction().getFunctionName())) {
+      final String allowed =
+          Arrays.stream(MergeFunction.values())
+              .map(MergeFunction::name)
+              .collect(Collectors.joining(","));
+      throw new IllegalArgumentException(
+          String.format("Invalid Theta Sketch aggregation function. Allowed: [%s]", allowed));
+    }
+
+    switch (MergeFunction.valueOf(context.getFunction().getFunctionName().toUpperCase())) {
+      case SET_DIFF:
+        // set diff can only have 2 arguments
+        if (context.getFunction().getArguments().size() != 2) {
+          throw new IllegalArgumentException("SET_DIFF function can only have 2 arguments.");
+        }
+        break;
+      case SET_UNION:
+      case SET_INTERSECT:
+        for (final ExpressionContext arg : context.getFunction().getArguments()) {
+          validatePostAggregationExpression(arg, numPredicates);
+        }
+        break;
+      default:
+        throw new IllegalStateException("Invalid merge function");
+    }
+  }
+
+  private static int extractSubstitutionPosition(final String str) {
+    final Matcher matcher = ARGUMENT_SUBSTITUTION.matcher(str);
+    if (matcher.find() && matcher.groupCount() == 1) {
+      return Integer.parseInt(matcher.group(1));

Review comment:
       Validate that the index is > 0?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#issuecomment-671609047


   (Optional) Allow theta-sketch with only 2 arguments: theta-sketch column, parameters. It simply union the sketches without any extra predicate. Currently in order to achieve that, we have to add a dummy predicate which always evaluate to `true` and pay the extra cost of predicate evaluation.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ManojRThakur commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
ManojRThakur commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467533175



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -576,6 +616,69 @@ private SetOperationBuilder getSetOperationBuilder() {
         : SetOperation.builder().setNominalEntries(_thetaSketchParams.getNominalEntries());
   }
 
+  /**
+   * Validates that the function context's substitution parameters ($1, $2, etc) does not exceed the number
+   * of predicates passed into the post-aggregation function.
+   *
+   * For example, if the post aggregation function is:
+   * INTERSECT($1, $2, $3)
+   *
+   * But there are only 2 arguments passed into the aggregation function, throw an error
+   * @param context The parsed function context that's a tree structure
+   * @param numPredicates Max number of predicates available to be substituted
+   */
+  private static void validatePostAggregationExpression(final ExpressionContext context, final int numPredicates) {
+    if (context.getType() == ExpressionContext.Type.LITERAL) {
+      throw new IllegalArgumentException("Invalid post-aggregation function expression syntax.");
+    }
+
+    if (context.getType() == ExpressionContext.Type.IDENTIFIER) {
+      int id = extractSubstitutionPosition(context.getIdentifier());
+      if (id <= 0)
+        throw new IllegalArgumentException("Argument substitution starts at $1");
+      if (id > numPredicates)
+        throw new IllegalArgumentException("Argument substitution exceeded number of predicates");
+      // if none of the invalid conditions are met above, exit out early
+      return;
+    }
+
+    if (!MergeFunction.isValid(context.getFunction().getFunctionName())) {
+      final String allowed =
+          Arrays.stream(MergeFunction.values())
+              .map(MergeFunction::name)
+              .collect(Collectors.joining(","));
+      throw new IllegalArgumentException(
+          String.format("Invalid Theta Sketch aggregation function. Allowed: [%s]", allowed));
+    }
+
+    switch (MergeFunction.valueOf(context.getFunction().getFunctionName().toUpperCase())) {
+      case SET_DIFF:
+        // set diff can only have 2 arguments
+        if (context.getFunction().getArguments().size() != 2) {
+          throw new IllegalArgumentException("SET_DIFF function can only have 2 arguments.");
+        }
+        break;
+      case SET_UNION:
+      case SET_INTERSECT:
+        for (final ExpressionContext arg : context.getFunction().getArguments()) {
+          validatePostAggregationExpression(arg, numPredicates);
+        }
+        break;
+      default:
+        throw new IllegalStateException("Invalid merge function");
+    }
+  }
+
+  private static int extractSubstitutionPosition(final String str) {
+    final Matcher matcher = ARGUMENT_SUBSTITUTION.matcher(str);
+    if (matcher.find() && matcher.groupCount() == 1) {
+      return Integer.parseInt(matcher.group(1));

Review comment:
       Validate that the index is > 0?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467334875



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -97,6 +121,10 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
     // Initialize the theta-sketch set operation builder
     _setOperationBuilder = getSetOperationBuilder();
 
+    // index of the original input predicates
+    // index[0] = $1

Review comment:
       Remove unused code?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#issuecomment-671498383


   > can we chain the operations?
   > 
   > `diff(union($1, $2), union($3, $4)) `
   
   Yes, the PR should already support this. @bkuang88 could you add a test for this, if one doesn't already exist?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467594765



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -108,35 +136,24 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
     Preconditions.checkArgument(paramsExpression.getType() == ExpressionContext.Type.LITERAL,
         "Last argument of DistinctCountThetaSketch must be literal (post-aggregation expression)");
     _postAggregationExpression = QueryContextConverterUtils
-        .getFilter(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
+        .getExpression(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
 
     // Initialize the predicate map
     _predicateInfoMap = new HashMap<>();
-    if (numArguments > 3) {
-      // Predicates are explicitly specified
-      for (int i = 2; i < numArguments - 1; i++) {
-        ExpressionContext predicateExpression = arguments.get(i);
-        Preconditions.checkArgument(predicateExpression.getType() == ExpressionContext.Type.LITERAL,
-            "Third to second last argument of DistinctCountThetaSketch must be literal (predicate expression)");
-        Predicate predicate = getPredicate(predicateExpression.getLiteral());
-        _inputExpressions.add(predicate.getLhs());
-        _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-      }
-    } else {
-      // Auto-derive predicates from the post-aggregation expression
-      Stack<FilterContext> stack = new Stack<>();
-      stack.push(_postAggregationExpression);
-      while (!stack.isEmpty()) {
-        FilterContext filter = stack.pop();
-        if (filter.getType() == FilterContext.Type.PREDICATE) {
-          Predicate predicate = filter.getPredicate();
-          _inputExpressions.add(predicate.getLhs());
-          _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-        } else {
-          stack.addAll(filter.getChildren());
-        }
-      }
+
+    // Predicates are explicitly specified

Review comment:
       > Not related to this PR, but do we want to support theta-sketch without any predicates (union without post aggregation)?
   
   Sure, this can be achieved by making post-aggregation-expression optional as well (or can be empty).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ManojRThakur commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
ManojRThakur commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467532312



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -108,35 +136,24 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
     Preconditions.checkArgument(paramsExpression.getType() == ExpressionContext.Type.LITERAL,
         "Last argument of DistinctCountThetaSketch must be literal (post-aggregation expression)");
     _postAggregationExpression = QueryContextConverterUtils
-        .getFilter(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
+        .getExpression(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
 
     // Initialize the predicate map
     _predicateInfoMap = new HashMap<>();
-    if (numArguments > 3) {
-      // Predicates are explicitly specified
-      for (int i = 2; i < numArguments - 1; i++) {
-        ExpressionContext predicateExpression = arguments.get(i);
-        Preconditions.checkArgument(predicateExpression.getType() == ExpressionContext.Type.LITERAL,
-            "Third to second last argument of DistinctCountThetaSketch must be literal (predicate expression)");
-        Predicate predicate = getPredicate(predicateExpression.getLiteral());
-        _inputExpressions.add(predicate.getLhs());
-        _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-      }
-    } else {
-      // Auto-derive predicates from the post-aggregation expression
-      Stack<FilterContext> stack = new Stack<>();
-      stack.push(_postAggregationExpression);
-      while (!stack.isEmpty()) {
-        FilterContext filter = stack.pop();
-        if (filter.getType() == FilterContext.Type.PREDICATE) {
-          Predicate predicate = filter.getPredicate();
-          _inputExpressions.add(predicate.getLhs());
-          _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-        } else {
-          stack.addAll(filter.getChildren());
-        }
-      }
+
+    // Predicates are explicitly specified

Review comment:
       @bkuang88 I don't understand why we can't support SET_DIFF with auto-derive.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r468201232



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -57,11 +61,31 @@
  * <p>TODO: For performance concern, use {@code List<Sketch>} as the intermediate result.
  */
 public class DistinctCountThetaSketchAggregationFunction implements AggregationFunction<Map<String, Sketch>, Long> {
+
+  public enum MergeFunction {
+    SET_UNION, SET_INTERSECT, SET_DIFF;

Review comment:
       NVM, seems we cannot use the preserved keyword




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] bkuang88 commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
bkuang88 commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r468125503



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -57,11 +61,31 @@
  * <p>TODO: For performance concern, use {@code List<Sketch>} as the intermediate result.
  */
 public class DistinctCountThetaSketchAggregationFunction implements AggregationFunction<Map<String, Sketch>, Long> {
+
+  public enum MergeFunction {
+    SET_UNION, SET_INTERSECT, SET_DIFF;
+
+    /** SET_INTERSECT($1, $2, $3) => "SET_INTERSECT($1,$2,$3)", useful for unit tests */
+    public String apply(final String... args) {

Review comment:
       Sure, I will get rid of the method for testing purposes. But I'd like to keep the functions for type-safety and inline with some of the other sql function and sql type enums.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] bkuang88 commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
bkuang88 commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r468183845



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -57,11 +61,31 @@
  * <p>TODO: For performance concern, use {@code List<Sketch>} as the intermediate result.
  */
 public class DistinctCountThetaSketchAggregationFunction implements AggregationFunction<Map<String, Sketch>, Long> {
+
+  public enum MergeFunction {
+    SET_UNION, SET_INTERSECT, SET_DIFF;

Review comment:
       I just tried. It won't let me use `UNION`/`INTERSECT` - unless I'm missing something completely.
   
   Based on the current diff, line 136-137 in `DistinctCountThetaSketchAggregationFunction.java` throws the exception
   ```java
       _postAggregationExpression = QueryContextConverterUtils
           .getExpression(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
   ```
   Exception:
   ```
   Encountered "INTERSECT" at line 1, column 1.
   Was expecting one of:
       "+" ...
       "-" ...
       <UNSIGNED_INTEGER_LITERAL> ...
       <DECIMAL_NUMERIC_LITERAL> ...
       <APPROX_NUMERIC_LITERAL> ...
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467501437



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -108,35 +136,24 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
     Preconditions.checkArgument(paramsExpression.getType() == ExpressionContext.Type.LITERAL,
         "Last argument of DistinctCountThetaSketch must be literal (post-aggregation expression)");
     _postAggregationExpression = QueryContextConverterUtils
-        .getFilter(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
+        .getExpression(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
 
     // Initialize the predicate map
     _predicateInfoMap = new HashMap<>();
-    if (numArguments > 3) {
-      // Predicates are explicitly specified
-      for (int i = 2; i < numArguments - 1; i++) {
-        ExpressionContext predicateExpression = arguments.get(i);
-        Preconditions.checkArgument(predicateExpression.getType() == ExpressionContext.Type.LITERAL,
-            "Third to second last argument of DistinctCountThetaSketch must be literal (predicate expression)");
-        Predicate predicate = getPredicate(predicateExpression.getLiteral());
-        _inputExpressions.add(predicate.getLhs());
-        _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-      }
-    } else {
-      // Auto-derive predicates from the post-aggregation expression
-      Stack<FilterContext> stack = new Stack<>();
-      stack.push(_postAggregationExpression);
-      while (!stack.isEmpty()) {
-        FilterContext filter = stack.pop();
-        if (filter.getType() == FilterContext.Type.PREDICATE) {
-          Predicate predicate = filter.getPredicate();
-          _inputExpressions.add(predicate.getLhs());
-          _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-        } else {
-          stack.addAll(filter.getChildren());
-        }
-      }
+
+    // Predicates are explicitly specified

Review comment:
       @mayankshriv I don't quite follow this comment. In what situation do we need the auto-deriving?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter commented on pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#issuecomment-671586172


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5832?src=pr&el=h1) Report
   > :exclamation: No coverage uploaded for pull request base (`master@ae2bd2f`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `52.56%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/5832/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/5832?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #5832   +/-   ##
   =========================================
     Coverage          ?   43.70%           
   =========================================
     Files             ?     1172           
     Lines             ?    60975           
     Branches          ?     9291           
   =========================================
     Hits              ?    26651           
     Misses            ?    31920           
     Partials          ?     2404           
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | #integration | `43.70% <52.56%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/5832?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...n/DistinctCountThetaSketchAggregationFunction.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9EaXN0aW5jdENvdW50VGhldGFTa2V0Y2hBZ2dyZWdhdGlvbkZ1bmN0aW9uLmphdmE=) | `44.62% <51.31%> (ø)` | |
   | [...ot/core/segment/index/metadata/ColumnMetadata.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zZWdtZW50L2luZGV4L21ldGFkYXRhL0NvbHVtbk1ldGFkYXRhLmphdmE=) | `70.25% <100.00%> (ø)` | |
   | [...rg/apache/pinot/core/geospatial/GeometryUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9nZW9zcGF0aWFsL0dlb21ldHJ5VXRpbHMuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [.../org/apache/pinot/client/SimpleBrokerSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L1NpbXBsZUJyb2tlclNlbGVjdG9yLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...t/core/data/partition/ModuloPartitionFunction.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL3BhcnRpdGlvbi9Nb2R1bG9QYXJ0aXRpb25GdW5jdGlvbi5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...ot/core/segment/index/readers/DocIdDictionary.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zZWdtZW50L2luZGV4L3JlYWRlcnMvRG9jSWREaWN0aW9uYXJ5LmphdmE=) | `56.25% <0.00%> (ø)` | |
   | [...ransformer/timeunit/CustomTimeUnitTransformer.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci90cmFuc2Zvcm0vdHJhbnNmb3JtZXIvdGltZXVuaXQvQ3VzdG9tVGltZVVuaXRUcmFuc2Zvcm1lci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [.../helix/core/realtime/SegmentCompletionManager.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL3JlYWx0aW1lL1NlZ21lbnRDb21wbGV0aW9uTWFuYWdlci5qYXZh) | `41.66% <0.00%> (ø)` | |
   | [...a/org/apache/pinot/minion/metrics/MinionMeter.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtbWluaW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9taW5pb24vbWV0cmljcy9NaW5pb25NZXRlci5qYXZh) | `100.00% <0.00%> (ø)` | |
   | [...ain/java/org/apache/pinot/common/data/Segment.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vZGF0YS9TZWdtZW50LmphdmE=) | `46.15% <0.00%> (ø)` | |
   | ... and [1164 more](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5832?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5832?src=pr&el=footer). Last update [ae2bd2f...c78c568](https://codecov.io/gh/apache/incubator-pinot/pull/5832?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
kishoreg commented on pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#issuecomment-671441879


   can we chain the operations?
   
   > diff(union($1, $2), union($3, $4)) 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] bkuang88 commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
bkuang88 commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467344354



##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/DistinctCountThetaSketchTest.java
##########
@@ -117,60 +121,77 @@ public void testGroupBySql() {
     testThetaSketches(true, true);
   }
 
+  @Test(expectedExceptions = BadQueryRequestException.class, dataProvider = "badQueries")
+  public void testInvalidNoPredicates(final String query) {
+    getBrokerResponseForSqlQuery(query);
+  }
+
+  @DataProvider(name = "badQueries")
+  public Object[][] badQueries() {
+    return new Object[][] {
+        // need at least 4 arguments in agg func
+        {"select distinctCountThetaSketch(colTS, 'nominalEntries=123', '$0') from testTable"},
+        // substitution arguments should start at $1
+        {"select distinctCountThetaSketch(colTS, 'nominalEntries=123', 'colA = 1', '$0') from testTable"},
+        // substituting variable has numeric value higher than the number of predicates provided
+        {"select distinctCountThetaSketch(colTS, 'nominalEntries=123', 'colA = 1', '$5') from testTable"},
+        // SET_DIFF requires exactly 2 arguments
+        {"select distinctCountThetaSketch(colTS, 'nominalEntries=123', 'colA = 1', 'SET_DIFF($1)') from testTable"},
+        // invalid merging function
+        {"select distinctCountThetaSketch(colTS, 'nominalEntries=123', 'colA = 1', 'asdf') from testTable"}
+    };
+  }
+
   private void testThetaSketches(boolean groupBy, boolean sql) {
     String tsQuery, distinctQuery;
     String thetaSketchParams = "nominalEntries=1001";
 
     List<String> predicateStrings = Collections.singletonList("colA = 1");
+    String substitution = "$1";
     String whereClause = Strings.join(predicateStrings, " or ");
-    tsQuery = buildQuery(whereClause, thetaSketchParams, predicateStrings, whereClause, groupBy, false);
+    tsQuery = buildQuery(whereClause, thetaSketchParams, predicateStrings, substitution, groupBy, false);

Review comment:
       I think this test wasn't testing any aggregations. It was just selecting a sketch without any aggregations, so I didn't touch it. Would you like me to get rid of this test then?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -108,35 +136,24 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
     Preconditions.checkArgument(paramsExpression.getType() == ExpressionContext.Type.LITERAL,
         "Last argument of DistinctCountThetaSketch must be literal (post-aggregation expression)");
     _postAggregationExpression = QueryContextConverterUtils
-        .getFilter(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
+        .getExpression(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
 
     // Initialize the predicate map
     _predicateInfoMap = new HashMap<>();
-    if (numArguments > 3) {
-      // Predicates are explicitly specified
-      for (int i = 2; i < numArguments - 1; i++) {
-        ExpressionContext predicateExpression = arguments.get(i);
-        Preconditions.checkArgument(predicateExpression.getType() == ExpressionContext.Type.LITERAL,
-            "Third to second last argument of DistinctCountThetaSketch must be literal (predicate expression)");
-        Predicate predicate = getPredicate(predicateExpression.getLiteral());
-        _inputExpressions.add(predicate.getLhs());
-        _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-      }
-    } else {
-      // Auto-derive predicates from the post-aggregation expression
-      Stack<FilterContext> stack = new Stack<>();
-      stack.push(_postAggregationExpression);
-      while (!stack.isEmpty()) {
-        FilterContext filter = stack.pop();
-        if (filter.getType() == FilterContext.Type.PREDICATE) {
-          Predicate predicate = filter.getPredicate();
-          _inputExpressions.add(predicate.getLhs());
-          _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-        } else {
-          stack.addAll(filter.getChildren());
-        }
-      }
+
+    // Predicates are explicitly specified

Review comment:
       Wonder if that introduces more confusion than flexibility. Because we currently don't want to support complex predicates right? Wouldn't the users make more mistakes or be confused as to when they can use complex predicates and when they cannot?
   
   And if we do start to introduce complex queries in the future, will the syntax be "UNION($1, $2)" or "$1 and $2" or "colA='a' or colB='b'" or "UNION(colA='a', colB='b')"?
   
   Given the vast possibilities out there, I'm wondering if it's better to just stick to a single syntax so that it doesn't cause confusion in the future in case we do want to modify the predicate complexities.
   
   What do you guys think?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -57,11 +61,31 @@
  * <p>TODO: For performance concern, use {@code List<Sketch>} as the intermediate result.
  */
 public class DistinctCountThetaSketchAggregationFunction implements AggregationFunction<Map<String, Sketch>, Long> {
+
+  public enum MergeFunction {
+    SET_UNION, SET_INTERSECT, SET_DIFF;

Review comment:
       I think there are some SQL keywords in there. Do we want to mix real SQL keywords with theta sketch merging functions?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467347951



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -108,35 +136,24 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
     Preconditions.checkArgument(paramsExpression.getType() == ExpressionContext.Type.LITERAL,
         "Last argument of DistinctCountThetaSketch must be literal (post-aggregation expression)");
     _postAggregationExpression = QueryContextConverterUtils
-        .getFilter(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
+        .getExpression(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
 
     // Initialize the predicate map
     _predicateInfoMap = new HashMap<>();
-    if (numArguments > 3) {
-      // Predicates are explicitly specified
-      for (int i = 2; i < numArguments - 1; i++) {
-        ExpressionContext predicateExpression = arguments.get(i);
-        Preconditions.checkArgument(predicateExpression.getType() == ExpressionContext.Type.LITERAL,
-            "Third to second last argument of DistinctCountThetaSketch must be literal (predicate expression)");
-        Predicate predicate = getPredicate(predicateExpression.getLiteral());
-        _inputExpressions.add(predicate.getLhs());
-        _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-      }
-    } else {
-      // Auto-derive predicates from the post-aggregation expression
-      Stack<FilterContext> stack = new Stack<>();
-      stack.push(_postAggregationExpression);
-      while (!stack.isEmpty()) {
-        FilterContext filter = stack.pop();
-        if (filter.getType() == FilterContext.Type.PREDICATE) {
-          Predicate predicate = filter.getPredicate();
-          _inputExpressions.add(predicate.getLhs());
-          _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-        } else {
-          stack.addAll(filter.getChildren());
-        }
-      }
+
+    // Predicates are explicitly specified

Review comment:
       I meant like use the new function call based syntax, with expanded values. For example:
   `SET_DIFF('col1=A', 'col2=B')`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#issuecomment-671586172


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5832?src=pr&el=h1) Report
   > :exclamation: No coverage uploaded for pull request base (`master@ae2bd2f`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `66.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/5832/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/5832?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #5832   +/-   ##
   =========================================
     Coverage          ?   67.20%           
   =========================================
     Files             ?     1172           
     Lines             ?    60975           
     Branches          ?     9291           
   =========================================
     Hits              ?    40980           
     Misses            ?    16976           
     Partials          ?     3019           
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | #integration | `43.70% <52.56%> (?)` | |
   | #unittests | `58.43% <66.66%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/5832?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...n/DistinctCountThetaSketchAggregationFunction.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9EaXN0aW5jdENvdW50VGhldGFTa2V0Y2hBZ2dyZWdhdGlvbkZ1bmN0aW9uLmphdmE=) | `59.81% <65.78%> (ø)` | |
   | [...ot/core/segment/index/metadata/ColumnMetadata.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zZWdtZW50L2luZGV4L21ldGFkYXRhL0NvbHVtbk1ldGFkYXRhLmphdmE=) | `81.46% <100.00%> (ø)` | |
   | [...ain/java/org/apache/pinot/spi/utils/JsonUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvSnNvblV0aWxzLmphdmE=) | `26.31% <0.00%> (ø)` | |
   | [.../core/data/function/TimeSpecFunctionEvaluator.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL2Z1bmN0aW9uL1RpbWVTcGVjRnVuY3Rpb25FdmFsdWF0b3IuamF2YQ==) | `76.47% <0.00%> (ø)` | |
   | [.../pinot/plugin/inputformat/orc/ORCRecordReader.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtcGx1Z2lucy9waW5vdC1pbnB1dC1mb3JtYXQvcGlub3Qtb3JjL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9wbHVnaW4vaW5wdXRmb3JtYXQvb3JjL09SQ1JlY29yZFJlYWRlci5qYXZh) | `53.74% <0.00%> (ø)` | |
   | [...not/pql/parsers/pql2/ast/StringLiteralAstNode.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9wcWwvcGFyc2Vycy9wcWwyL2FzdC9TdHJpbmdMaXRlcmFsQXN0Tm9kZS5qYXZh) | `83.33% <0.00%> (ø)` | |
   | [...e/pinot/server/starter/grpc/PinotQueryHandler.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9ncnBjL1Bpbm90UXVlcnlIYW5kbGVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [.../core/query/request/context/ExpressionContext.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9yZXF1ZXN0L2NvbnRleHQvRXhwcmVzc2lvbkNvbnRleHQuamF2YQ==) | `84.37% <0.00%> (ø)` | |
   | [...pl/fwd/SingleValueUnsortedForwardIndexCreator.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvU2luZ2xlVmFsdWVVbnNvcnRlZEZvcndhcmRJbmRleENyZWF0b3IuamF2YQ==) | `72.72% <0.00%> (ø)` | |
   | [.../core/operator/filter/TextMatchFilterOperator.java](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9maWx0ZXIvVGV4dE1hdGNoRmlsdGVyT3BlcmF0b3IuamF2YQ==) | `87.50% <0.00%> (ø)` | |
   | ... and [1164 more](https://codecov.io/gh/apache/incubator-pinot/pull/5832/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5832?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5832?src=pr&el=footer). Last update [ae2bd2f...c78c568](https://codecov.io/gh/apache/incubator-pinot/pull/5832?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467595088



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -108,35 +136,24 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
     Preconditions.checkArgument(paramsExpression.getType() == ExpressionContext.Type.LITERAL,
         "Last argument of DistinctCountThetaSketch must be literal (post-aggregation expression)");
     _postAggregationExpression = QueryContextConverterUtils
-        .getFilter(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
+        .getExpression(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
 
     // Initialize the predicate map
     _predicateInfoMap = new HashMap<>();
-    if (numArguments > 3) {
-      // Predicates are explicitly specified
-      for (int i = 2; i < numArguments - 1; i++) {
-        ExpressionContext predicateExpression = arguments.get(i);
-        Preconditions.checkArgument(predicateExpression.getType() == ExpressionContext.Type.LITERAL,
-            "Third to second last argument of DistinctCountThetaSketch must be literal (predicate expression)");
-        Predicate predicate = getPredicate(predicateExpression.getLiteral());
-        _inputExpressions.add(predicate.getLhs());
-        _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-      }
-    } else {
-      // Auto-derive predicates from the post-aggregation expression
-      Stack<FilterContext> stack = new Stack<>();
-      stack.push(_postAggregationExpression);
-      while (!stack.isEmpty()) {
-        FilterContext filter = stack.pop();
-        if (filter.getType() == FilterContext.Type.PREDICATE) {
-          Predicate predicate = filter.getPredicate();
-          _inputExpressions.add(predicate.getLhs());
-          _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-        } else {
-          stack.addAll(filter.getChildren());
-        }
-      }
+
+    // Predicates are explicitly specified

Review comment:
       Yes, that's what I am stating as well @ManojRThakur. We can auto derive as long as the post-aggr-expression uses  actual predicate strings (as opposed to $ notation).
   
   @Jackie-Jiang Perhaps users will prefer $ notation (in which case you can't auto-derive). But always better to have the option available (especially given that we do support right now).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] bkuang88 commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
bkuang88 commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r468138173



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -526,28 +543,51 @@ private Predicate getPredicate(String predicateString) {
    * passed to this method to be used when evaluating the expression.
    *
    * @param postAggregationExpression Post-aggregation expression to evaluate (modeled as a filter)
+   * @param expressions list of aggregation function parameters
    * @param sketchMap Precomputed sketches for predicates that are part of the expression.
    * @return Overall evaluated sketch for the expression.
    */
-  private Sketch evalPostAggregationExpression(FilterContext postAggregationExpression,
-      Map<Predicate, Sketch> sketchMap) {
-    switch (postAggregationExpression.getType()) {
-      case AND:
-        Intersection intersection = _setOperationBuilder.buildIntersection();
-        for (FilterContext child : postAggregationExpression.getChildren()) {
-          intersection.update(evalPostAggregationExpression(child, sketchMap));
-        }
-        return intersection.getResult();
-      case OR:
-        Union union = _setOperationBuilder.buildUnion();
-        for (FilterContext child : postAggregationExpression.getChildren()) {
-          union.update(evalPostAggregationExpression(child, sketchMap));
+  private Sketch evalPostAggregationExpression(
+      final ExpressionContext postAggregationExpression,
+      final List<Predicate> expressions,
+      final Map<Predicate, Sketch> sketchMap) {
+    if (postAggregationExpression.getType() == ExpressionContext.Type.LITERAL) {
+      throw new IllegalArgumentException("Literal not supported in post-aggregation function");
+    }
+
+    if (postAggregationExpression.getType() == ExpressionContext.Type.IDENTIFIER) {
+      final Predicate exp =
+          expressions.get(extractSubstitutionPosition(postAggregationExpression.getLiteral()) - 1);
+      return sketchMap.get(exp);
+    }
+
+    // shouldn't throw exception because of the validation in the constructor
+    final MergeFunction func =
+        MergeFunction.valueOf(postAggregationExpression.getFunction().getFunctionName().toUpperCase());
+
+    // handle functions recursively
+    switch(func) {
+      case SET_UNION:
+        final Union union = _setOperationBuilder.buildUnion();
+        for (final ExpressionContext exp : postAggregationExpression.getFunction().getArguments()) {
+          union.update(evalPostAggregationExpression(exp, expressions, sketchMap));
         }
         return union.getResult();
-      case PREDICATE:
-        return sketchMap.get(postAggregationExpression.getPredicate());
+      case SET_INTERSECT:
+        final Intersection intersection = _setOperationBuilder.buildIntersection();
+        for (final ExpressionContext exp : postAggregationExpression.getFunction().getArguments()) {
+          intersection.update(evalPostAggregationExpression(exp, expressions, sketchMap));
+        }
+        return intersection.getResult();
+      case SET_DIFF:
+        final List<ExpressionContext> args = postAggregationExpression.getFunction().getArguments();
+        final AnotB diff = _setOperationBuilder.buildANotB();
+        final Sketch a = evalPostAggregationExpression(args.get(0), expressions, sketchMap);
+        final Sketch b = evalPostAggregationExpression(args.get(1), expressions, sketchMap);
+        diff.update(a, b);

Review comment:
       Good call. Done - updated in the release notes in the PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467334796



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -78,9 +102,9 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
       throws SqlParseException {
     int numArguments = arguments.size();
 
-    // NOTE: This function expects at least 3 arguments: theta-sketch column, parameters, post-aggregation expression.
-    Preconditions.checkArgument(numArguments >= 3,
-        "DistinctCountThetaSketch expects at least three arguments (theta-sketch column, parameters, post-aggregation expression), got: ",
+    // NOTE: This function expects at least 4 arguments: theta-sketch column, parameters, post-aggregation expression.

Review comment:
       The comment says 4 arguments, but it only lists three of them?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -97,6 +121,10 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
     // Initialize the theta-sketch set operation builder
     _setOperationBuilder = getSetOperationBuilder();
 
+    // index of the original input predicates
+    // index[0] = $1

Review comment:
       Remove unused code?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -526,28 +543,51 @@ private Predicate getPredicate(String predicateString) {
    * passed to this method to be used when evaluating the expression.
    *
    * @param postAggregationExpression Post-aggregation expression to evaluate (modeled as a filter)
+   * @param expressions list of aggregation function parameters
    * @param sketchMap Precomputed sketches for predicates that are part of the expression.
    * @return Overall evaluated sketch for the expression.
    */
-  private Sketch evalPostAggregationExpression(FilterContext postAggregationExpression,
-      Map<Predicate, Sketch> sketchMap) {
-    switch (postAggregationExpression.getType()) {
-      case AND:
-        Intersection intersection = _setOperationBuilder.buildIntersection();
-        for (FilterContext child : postAggregationExpression.getChildren()) {
-          intersection.update(evalPostAggregationExpression(child, sketchMap));
-        }
-        return intersection.getResult();
-      case OR:
-        Union union = _setOperationBuilder.buildUnion();
-        for (FilterContext child : postAggregationExpression.getChildren()) {
-          union.update(evalPostAggregationExpression(child, sketchMap));
+  private Sketch evalPostAggregationExpression(
+      final ExpressionContext postAggregationExpression,
+      final List<Predicate> expressions,
+      final Map<Predicate, Sketch> sketchMap) {
+    if (postAggregationExpression.getType() == ExpressionContext.Type.LITERAL) {
+      throw new IllegalArgumentException("Literal not supported in post-aggregation function");
+    }
+
+    if (postAggregationExpression.getType() == ExpressionContext.Type.IDENTIFIER) {
+      final Predicate exp =
+          expressions.get(extractSubstitutionPosition(postAggregationExpression.getLiteral()) - 1);
+      return sketchMap.get(exp);
+    }
+
+    // shouldn't throw exception because of the validation in the constructor
+    final MergeFunction func =
+        MergeFunction.valueOf(postAggregationExpression.getFunction().getFunctionName().toUpperCase());
+
+    // handle functions recursively
+    switch(func) {
+      case SET_UNION:
+        final Union union = _setOperationBuilder.buildUnion();
+        for (final ExpressionContext exp : postAggregationExpression.getFunction().getArguments()) {
+          union.update(evalPostAggregationExpression(exp, expressions, sketchMap));
         }
         return union.getResult();
-      case PREDICATE:
-        return sketchMap.get(postAggregationExpression.getPredicate());
+      case SET_INTERSECT:
+        final Intersection intersection = _setOperationBuilder.buildIntersection();
+        for (final ExpressionContext exp : postAggregationExpression.getFunction().getArguments()) {
+          intersection.update(evalPostAggregationExpression(exp, expressions, sketchMap));
+        }
+        return intersection.getResult();
+      case SET_DIFF:
+        final List<ExpressionContext> args = postAggregationExpression.getFunction().getArguments();
+        final AnotB diff = _setOperationBuilder.buildANotB();
+        final Sketch a = evalPostAggregationExpression(args.get(0), expressions, sketchMap);
+        final Sketch b = evalPostAggregationExpression(args.get(1), expressions, sketchMap);
+        diff.update(a, b);
+        return diff.getResult();
       default:
-        throw new IllegalStateException();
+        throw new IllegalStateException("Invalid post-aggregation function.");

Review comment:
       Include the string representation of function?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -526,28 +543,51 @@ private Predicate getPredicate(String predicateString) {
    * passed to this method to be used when evaluating the expression.
    *
    * @param postAggregationExpression Post-aggregation expression to evaluate (modeled as a filter)
+   * @param expressions list of aggregation function parameters
    * @param sketchMap Precomputed sketches for predicates that are part of the expression.
    * @return Overall evaluated sketch for the expression.
    */
-  private Sketch evalPostAggregationExpression(FilterContext postAggregationExpression,
-      Map<Predicate, Sketch> sketchMap) {
-    switch (postAggregationExpression.getType()) {
-      case AND:
-        Intersection intersection = _setOperationBuilder.buildIntersection();
-        for (FilterContext child : postAggregationExpression.getChildren()) {
-          intersection.update(evalPostAggregationExpression(child, sketchMap));
-        }
-        return intersection.getResult();
-      case OR:
-        Union union = _setOperationBuilder.buildUnion();
-        for (FilterContext child : postAggregationExpression.getChildren()) {
-          union.update(evalPostAggregationExpression(child, sketchMap));
+  private Sketch evalPostAggregationExpression(
+      final ExpressionContext postAggregationExpression,
+      final List<Predicate> expressions,
+      final Map<Predicate, Sketch> sketchMap) {
+    if (postAggregationExpression.getType() == ExpressionContext.Type.LITERAL) {
+      throw new IllegalArgumentException("Literal not supported in post-aggregation function");
+    }
+
+    if (postAggregationExpression.getType() == ExpressionContext.Type.IDENTIFIER) {
+      final Predicate exp =
+          expressions.get(extractSubstitutionPosition(postAggregationExpression.getLiteral()) - 1);

Review comment:
       Perhaps pre-substitition in the constructor would be better? For example, if the same $k arg is repeated multiple times, we might avoid the use of matcher using a temporary alias map?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -108,35 +136,24 @@ public DistinctCountThetaSketchAggregationFunction(List<ExpressionContext> argum
     Preconditions.checkArgument(paramsExpression.getType() == ExpressionContext.Type.LITERAL,
         "Last argument of DistinctCountThetaSketch must be literal (post-aggregation expression)");
     _postAggregationExpression = QueryContextConverterUtils
-        .getFilter(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
+        .getExpression(CalciteSqlParser.compileToExpression(postAggregationExpression.getLiteral()));
 
     // Initialize the predicate map
     _predicateInfoMap = new HashMap<>();
-    if (numArguments > 3) {
-      // Predicates are explicitly specified
-      for (int i = 2; i < numArguments - 1; i++) {
-        ExpressionContext predicateExpression = arguments.get(i);
-        Preconditions.checkArgument(predicateExpression.getType() == ExpressionContext.Type.LITERAL,
-            "Third to second last argument of DistinctCountThetaSketch must be literal (predicate expression)");
-        Predicate predicate = getPredicate(predicateExpression.getLiteral());
-        _inputExpressions.add(predicate.getLhs());
-        _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-      }
-    } else {
-      // Auto-derive predicates from the post-aggregation expression
-      Stack<FilterContext> stack = new Stack<>();
-      stack.push(_postAggregationExpression);
-      while (!stack.isEmpty()) {
-        FilterContext filter = stack.pop();
-        if (filter.getType() == FilterContext.Type.PREDICATE) {
-          Predicate predicate = filter.getPredicate();
-          _inputExpressions.add(predicate.getLhs());
-          _predicateInfoMap.put(predicate, new PredicateInfo(predicate));
-        } else {
-          stack.addAll(filter.getChildren());
-        }
-      }
+
+    // Predicates are explicitly specified

Review comment:
       We should still keep the auto-deriving of predicates. Granted, we won't be able to use $ notation in that case though. What do you think?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -576,6 +616,69 @@ private SetOperationBuilder getSetOperationBuilder() {
         : SetOperation.builder().setNominalEntries(_thetaSketchParams.getNominalEntries());
   }
 
+  /**
+   * Validates that the function context's substitution parameters ($1, $2, etc) does not exceed the number
+   * of predicates passed into the post-aggregation function.
+   *
+   * For example, if the post aggregation function is:
+   * INTERSECT($1, $2, $3)
+   *
+   * But there are only 2 arguments passed into the aggregation function, throw an error
+   * @param context The parsed function context that's a tree structure
+   * @param numPredicates Max number of predicates available to be substituted
+   */
+  private static void validatePostAggregationExpression(final ExpressionContext context, final int numPredicates) {
+    if (context.getType() == ExpressionContext.Type.LITERAL) {
+      throw new IllegalArgumentException("Invalid post-aggregation function expression syntax.");
+    }
+
+    if (context.getType() == ExpressionContext.Type.IDENTIFIER) {
+      int id = extractSubstitutionPosition(context.getIdentifier());
+      if (id <= 0)
+        throw new IllegalArgumentException("Argument substitution starts at $1");
+      if (id > numPredicates)
+        throw new IllegalArgumentException("Argument substitution exceeded number of predicates");
+      // if none of the invalid conditions are met above, exit out early
+      return;
+    }
+
+    if (!MergeFunction.isValid(context.getFunction().getFunctionName())) {
+      final String allowed =
+          Arrays.stream(MergeFunction.values())
+              .map(MergeFunction::name)
+              .collect(Collectors.joining(","));
+      throw new IllegalArgumentException(
+          String.format("Invalid Theta Sketch aggregation function. Allowed: [%s]", allowed));
+    }
+
+    switch (MergeFunction.valueOf(context.getFunction().getFunctionName().toUpperCase())) {
+      case SET_DIFF:
+        // set diff can only have 2 arguments
+        if (context.getFunction().getArguments().size() != 2) {

Review comment:
       +1

##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/DistinctCountThetaSketchTest.java
##########
@@ -117,60 +121,77 @@ public void testGroupBySql() {
     testThetaSketches(true, true);
   }
 
+  @Test(expectedExceptions = BadQueryRequestException.class, dataProvider = "badQueries")
+  public void testInvalidNoPredicates(final String query) {
+    getBrokerResponseForSqlQuery(query);
+  }
+
+  @DataProvider(name = "badQueries")
+  public Object[][] badQueries() {
+    return new Object[][] {
+        // need at least 4 arguments in agg func
+        {"select distinctCountThetaSketch(colTS, 'nominalEntries=123', '$0') from testTable"},
+        // substitution arguments should start at $1
+        {"select distinctCountThetaSketch(colTS, 'nominalEntries=123', 'colA = 1', '$0') from testTable"},
+        // substituting variable has numeric value higher than the number of predicates provided
+        {"select distinctCountThetaSketch(colTS, 'nominalEntries=123', 'colA = 1', '$5') from testTable"},
+        // SET_DIFF requires exactly 2 arguments
+        {"select distinctCountThetaSketch(colTS, 'nominalEntries=123', 'colA = 1', 'SET_DIFF($1)') from testTable"},
+        // invalid merging function
+        {"select distinctCountThetaSketch(colTS, 'nominalEntries=123', 'colA = 1', 'asdf') from testTable"}
+    };
+  }
+
   private void testThetaSketches(boolean groupBy, boolean sql) {
     String tsQuery, distinctQuery;
     String thetaSketchParams = "nominalEntries=1001";
 
     List<String> predicateStrings = Collections.singletonList("colA = 1");
+    String substitution = "$1";
     String whereClause = Strings.join(predicateStrings, " or ");
-    tsQuery = buildQuery(whereClause, thetaSketchParams, predicateStrings, whereClause, groupBy, false);
+    tsQuery = buildQuery(whereClause, thetaSketchParams, predicateStrings, substitution, groupBy, false);

Review comment:
       Isn't the fourth argument postAggregationExpression? If so, it should look more like a set operation, as opposed to "$1"?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -576,6 +616,69 @@ private SetOperationBuilder getSetOperationBuilder() {
         : SetOperation.builder().setNominalEntries(_thetaSketchParams.getNominalEntries());
   }
 
+  /**
+   * Validates that the function context's substitution parameters ($1, $2, etc) does not exceed the number
+   * of predicates passed into the post-aggregation function.
+   *
+   * For example, if the post aggregation function is:
+   * INTERSECT($1, $2, $3)
+   *
+   * But there are only 2 arguments passed into the aggregation function, throw an error
+   * @param context The parsed function context that's a tree structure
+   * @param numPredicates Max number of predicates available to be substituted
+   */
+  private static void validatePostAggregationExpression(final ExpressionContext context, final int numPredicates) {
+    if (context.getType() == ExpressionContext.Type.LITERAL) {
+      throw new IllegalArgumentException("Invalid post-aggregation function expression syntax.");
+    }
+
+    if (context.getType() == ExpressionContext.Type.IDENTIFIER) {
+      int id = extractSubstitutionPosition(context.getIdentifier());
+      if (id <= 0)
+        throw new IllegalArgumentException("Argument substitution starts at $1");
+      if (id > numPredicates)
+        throw new IllegalArgumentException("Argument substitution exceeded number of predicates");
+      // if none of the invalid conditions are met above, exit out early
+      return;
+    }
+
+    if (!MergeFunction.isValid(context.getFunction().getFunctionName())) {
+      final String allowed =
+          Arrays.stream(MergeFunction.values())

Review comment:
       We tend to avoid stream apis in query execution as they tend to have performance overhead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#issuecomment-671100130


   > @Jackie-Jiang Since we have discussed changing the syntax of this function in the past (to provide projection columns, etc), and given that this PR is changing the syntax as well (for a reason slightly different from our previous discussions), perhaps we should finalize on a syntax that won't need further changing?
   
   From the easy to use point of view, the current syntax is good. One feature that is not supported with the current syntax is to support theta-sketch without any predicates (union without post aggregation)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] bkuang88 commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
bkuang88 commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r468136912



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -576,6 +616,69 @@ private SetOperationBuilder getSetOperationBuilder() {
         : SetOperation.builder().setNominalEntries(_thetaSketchParams.getNominalEntries());
   }
 
+  /**
+   * Validates that the function context's substitution parameters ($1, $2, etc) does not exceed the number
+   * of predicates passed into the post-aggregation function.
+   *

Review comment:
       Done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -576,6 +616,69 @@ private SetOperationBuilder getSetOperationBuilder() {
         : SetOperation.builder().setNominalEntries(_thetaSketchParams.getNominalEntries());
   }
 
+  /**
+   * Validates that the function context's substitution parameters ($1, $2, etc) does not exceed the number
+   * of predicates passed into the post-aggregation function.
+   *
+   * For example, if the post aggregation function is:
+   * INTERSECT($1, $2, $3)
+   *
+   * But there are only 2 arguments passed into the aggregation function, throw an error
+   * @param context The parsed function context that's a tree structure
+   * @param numPredicates Max number of predicates available to be substituted
+   */
+  private static void validatePostAggregationExpression(final ExpressionContext context, final int numPredicates) {
+    if (context.getType() == ExpressionContext.Type.LITERAL) {
+      throw new IllegalArgumentException("Invalid post-aggregation function expression syntax.");
+    }
+
+    if (context.getType() == ExpressionContext.Type.IDENTIFIER) {
+      int id = extractSubstitutionPosition(context.getIdentifier());
+      if (id <= 0)
+        throw new IllegalArgumentException("Argument substitution starts at $1");
+      if (id > numPredicates)
+        throw new IllegalArgumentException("Argument substitution exceeded number of predicates");
+      // if none of the invalid conditions are met above, exit out early
+      return;
+    }
+
+    if (!MergeFunction.isValid(context.getFunction().getFunctionName())) {
+      final String allowed =
+          Arrays.stream(MergeFunction.values())
+              .map(MergeFunction::name)
+              .collect(Collectors.joining(","));
+      throw new IllegalArgumentException(
+          String.format("Invalid Theta Sketch aggregation function. Allowed: [%s]", allowed));
+    }
+
+    switch (MergeFunction.valueOf(context.getFunction().getFunctionName().toUpperCase())) {
+      case SET_DIFF:
+        // set diff can only have 2 arguments
+        if (context.getFunction().getArguments().size() != 2) {
+          throw new IllegalArgumentException("SET_DIFF function can only have 2 arguments.");
+        }
+        break;

Review comment:
       Done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -576,6 +616,69 @@ private SetOperationBuilder getSetOperationBuilder() {
         : SetOperation.builder().setNominalEntries(_thetaSketchParams.getNominalEntries());
   }
 
+  /**
+   * Validates that the function context's substitution parameters ($1, $2, etc) does not exceed the number
+   * of predicates passed into the post-aggregation function.
+   *
+   * For example, if the post aggregation function is:
+   * INTERSECT($1, $2, $3)
+   *
+   * But there are only 2 arguments passed into the aggregation function, throw an error
+   * @param context The parsed function context that's a tree structure
+   * @param numPredicates Max number of predicates available to be substituted
+   */
+  private static void validatePostAggregationExpression(final ExpressionContext context, final int numPredicates) {
+    if (context.getType() == ExpressionContext.Type.LITERAL) {
+      throw new IllegalArgumentException("Invalid post-aggregation function expression syntax.");
+    }
+
+    if (context.getType() == ExpressionContext.Type.IDENTIFIER) {
+      int id = extractSubstitutionPosition(context.getIdentifier());
+      if (id <= 0)
+        throw new IllegalArgumentException("Argument substitution starts at $1");
+      if (id > numPredicates)
+        throw new IllegalArgumentException("Argument substitution exceeded number of predicates");
+      // if none of the invalid conditions are met above, exit out early
+      return;
+    }
+
+    if (!MergeFunction.isValid(context.getFunction().getFunctionName())) {
+      final String allowed =
+          Arrays.stream(MergeFunction.values())
+              .map(MergeFunction::name)
+              .collect(Collectors.joining(","));
+      throw new IllegalArgumentException(
+          String.format("Invalid Theta Sketch aggregation function. Allowed: [%s]", allowed));
+    }
+
+    switch (MergeFunction.valueOf(context.getFunction().getFunctionName().toUpperCase())) {
+      case SET_DIFF:
+        // set diff can only have 2 arguments
+        if (context.getFunction().getArguments().size() != 2) {
+          throw new IllegalArgumentException("SET_DIFF function can only have 2 arguments.");
+        }
+        break;
+      case SET_UNION:
+      case SET_INTERSECT:

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg edited a comment on pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
kishoreg edited a comment on pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#issuecomment-671441879


   can we chain the operations?
   
   `diff(union($1, $2), union($3, $4)) `


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] bkuang88 commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
bkuang88 commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r468136800



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -57,11 +61,31 @@
  * <p>TODO: For performance concern, use {@code List<Sketch>} as the intermediate result.
  */
 public class DistinctCountThetaSketchAggregationFunction implements AggregationFunction<Map<String, Sketch>, Long> {
+
+  public enum MergeFunction {

Review comment:
       I got rid of the unit testing method as you suggested. But I personally like to keep the enums for type-safety. They don't really have any overhead since they are all static right? Happy to change if you feel strongly.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5832: Added set-diff operators and changed distinctCountThetaSketch syntax

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5832:
URL: https://github.com/apache/incubator-pinot/pull/5832#discussion_r467595198



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountThetaSketchAggregationFunction.java
##########
@@ -57,11 +61,31 @@
  * <p>TODO: For performance concern, use {@code List<Sketch>} as the intermediate result.
  */
 public class DistinctCountThetaSketchAggregationFunction implements AggregationFunction<Map<String, Sketch>, Long> {
+
+  public enum MergeFunction {
+    SET_UNION, SET_INTERSECT, SET_DIFF;

Review comment:
       I think `UNION` is SQL reserved keyword, so Calcite will flag it as syntax error. If there's a way around that, I'd also prefer concise names.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org