You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "somandal (via GitHub)" <gi...@apache.org> on 2023/06/05 18:38:51 UTC

[GitHub] [pinot] somandal opened a new pull request, #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

somandal opened a new pull request, #10846:
URL: https://github.com/apache/pinot/pull/10846

   This work is a collaboration between @vvivekiyer and myself
   *Important Note:* This PR must be rebased on top of @vvivekiyer 's runtime changes (PR: https://github.com/apache/pinot/pull/10845) and integrated before it can be merged
   
   Motivation:
   
   - Today only a subset of aggregation functions (sum, min, max, count, avg, count(distinct), kurtosis, and skewness) work in the multistage engine.
   - To support each new aggregation function changes are required on both the planner and runtime side. Pinot contributors should not need to understand the nitty gritty details of the planner and execution engine to add new aggregation functions.
   - The multistage `AggregateOperator` essentially reimplements the aggregation logic rather than reusing the aggregation logic present in the v1 engine functions. This results in code duplication and a high maintenance overhead (e.g. bug fixes must be made in two places which may even have different logic or different mechanisms to support things like null handling).
   - The v1 execution engine supports a large number of aggregation functions and it is not practical to manually add support for each new aggregation function to multistage. This also results in a situation in which each new aggregation function needs to be added to both the v1 engine and the multistage engine, or if support is only added for the v1 engine, the multistage engine must constantly play catch up.
   
   To get around the above limitations we are refactoring the aggregation function framework to work for both the v1 and multistage engine. This PR tackles the changes required on the planner side to make the planner more generic. This is phase 1 of the planned set of changed to support the aggregation functions and tackles only the existing set of aggregation functions already supported in the multistage engine.
   
   - OSS Issue: https://github.com/apache/pinot/issues/10745
   - Design document: https://docs.google.com/document/d/1Us6aBvTpNLMEy0ODo34OgTk73h_LVFFAH6q17689h1M/edit?usp=sharing
   - @vvivekiyer 's runtime changes to support this: https://github.com/apache/pinot/pull/10845
   
   Types of functions this PR does *not* support (and for which support will be added later on):
   
   - Add support for taking function arguments as literals (e.g. PERCENTILE)
   - Add support for aggregation functions which take more than one column as arguments (e.g . COVARIANCE)
   - Arg min/max type aggregation functions using the Parent-Child relationship


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr merged pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr merged PR #10846:
URL: https://github.com/apache/pinot/pull/10846


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on PR #10846:
URL: https://github.com/apache/pinot/pull/10846#issuecomment-1577421662

   Just a note: the test failures are currently expected as the changes need to be integrated with the runtime code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "kishoreg (via GitHub)" <gi...@apache.org>.
kishoreg commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1218955611


##########
pinot-query-planner/src/main/java/org/apache/calcite/sql/fun/PinotOperatorTable.java:
##########
@@ -103,6 +98,61 @@ public final void initNoDuplicate() {
         throw Util.throwAsRuntime(Util.causeOrSelf(e));
       }
     }
+
+    // Walk through all the Pinot aggregation types and register those that are supported in multistage and which
+    // aren't standard Calcite functions such as SUM / MIN / MAX / COUNT etc.
+    for (AggregationFunctionType aggregationFunctionType : AggregationFunctionType.values()) {
+      if (aggregationFunctionType.isNativeCalciteAggregationFunctionType()
+          || !aggregationFunctionType.isSupportedInMultiStage()) {

Review Comment:
   its probably better to just have a hard coded list in this class



##########
pinot-query-planner/src/main/java/org/apache/calcite/sql/fun/PinotOperatorTable.java:
##########
@@ -103,6 +98,61 @@ public final void initNoDuplicate() {
         throw Util.throwAsRuntime(Util.causeOrSelf(e));
       }
     }
+
+    // Walk through all the Pinot aggregation types and register those that are supported in multistage and which
+    // aren't standard Calcite functions such as SUM / MIN / MAX / COUNT etc.
+    for (AggregationFunctionType aggregationFunctionType : AggregationFunctionType.values()) {
+      if (aggregationFunctionType.isNativeCalciteAggregationFunctionType()
+          || !aggregationFunctionType.isSupportedInMultiStage()) {

Review Comment:
   is it possible to have better method name to derive this? adding this to the Type interface will result in backward incompatibility later when we make v2 the default



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1248908794


##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/operator/AggregateOperatorTest.java:
##########
@@ -79,7 +77,8 @@ public void shouldHandleUpstreamErrorBlocks() {
     DataSchema inSchema = new DataSchema(new String[]{"group", "arg"}, new ColumnDataType[]{INT, INT});
     DataSchema outSchema = new DataSchema(new String[]{"sum"}, new ColumnDataType[]{DOUBLE});
     AggregateOperator operator =
-        new AggregateOperator(OperatorTestUtil.getDefaultContext(), _input, outSchema, calls, group, inSchema);
+        new AggregateOperator(OperatorTestUtil.getDefaultContext(), _input, outSchema, inSchema, calls, group,
+            false, true, false, false);

Review Comment:
   TODO: many of these functions are setup to run in both leaf and intermediate, we need to mix and match test modes as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1220101094


##########
pinot-query-planner/src/main/java/org/apache/calcite/sql/fun/PinotOperatorTable.java:
##########
@@ -103,6 +98,61 @@ public final void initNoDuplicate() {
         throw Util.throwAsRuntime(Util.causeOrSelf(e));
       }
     }
+
+    // Walk through all the Pinot aggregation types and register those that are supported in multistage and which
+    // aren't standard Calcite functions such as SUM / MIN / MAX / COUNT etc.
+    for (AggregationFunctionType aggregationFunctionType : AggregationFunctionType.values()) {
+      if (aggregationFunctionType.isNativeCalciteAggregationFunctionType()
+          || !aggregationFunctionType.isSupportedInMultiStage()) {

Review Comment:
   @kishoreg i have removed that flag and moved out any specific APIs to do with multistage to the `PinotOperatorTable` instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1242671755


##########
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/function/InternalReduceFunctions.java:
##########
@@ -35,17 +35,61 @@ private InternalReduceFunctions() {
   }
 
   @ScalarFunction
-  public static double skewnessReduce(PinotFourthMoment fourthMoment) {
+  public static Double skewnessReduce(PinotFourthMoment fourthMoment) {
+    if (fourthMoment == null) {
+      return null;
+    }
     return fourthMoment.skew();
   }
 
   @ScalarFunction
-  public static double kurtosisReduce(PinotFourthMoment fourthMoment) {
+  public static Double kurtosisReduce(PinotFourthMoment fourthMoment) {
+    if (fourthMoment == null) {
+      return null;
+    }
     return fourthMoment.kurtosis();
   }
 
   @ScalarFunction
-  public static int countDistinctReduce(Set<?> values) {
+  public static Integer countDistinctReduce(Set<?> values) {
     return values.size();
   }
+
+  @ScalarFunction
+  public static Double maxReduce(Double intermediateResult) {
+    return intermediateResult;

Review Comment:
   You can just have a `SingleValueReduce` to avoid the code duplication?



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionFactory.java:
##########
@@ -189,8 +188,7 @@ public static AggregationFunction getAggregationFunction(FunctionContext functio
           case MAX:
             return new MaxAggregationFunction(firstArgument, nullHandlingEnabled);
           case SUM:
-          // TODO(Sonam): Uncomment SUM0 when merging planner changes
-          // case SUM0:
+          case SUM0:

Review Comment:
   Is there any other functions need to append 0 after the function name?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1248183763


##########
pinot-query-planner/src/main/java/org/apache/calcite/rel/rules/PinotAggregateReduceFunctionsRule.java:
##########
@@ -0,0 +1,427 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.calcite.rel.rules;
+
+import com.google.common.collect.ImmutableSet;
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Set;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.core.Aggregate;
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.PinotSqlAggFunction;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.fun.SqlStdOperatorTable;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.calcite.util.CompositeList;
+import org.apache.calcite.util.Util;
+import org.apache.pinot.segment.spi.AggregationFunctionType;
+
+
+/**
+ * Note: This class copies the logic for reducing SUM and AVG from {@link AggregateReduceFunctionsRule} with some
+ * changes to use our Pinot defined operand type checkers and return types. This is necessary otherwise the v1
+ * aggregations won't work in the v2 engine due to type issues. Once we fix the return types for the v1 aggregation
+ * functions the logic for AVG and SUM can be removed. We also had to resort to using an AVG_REDUCE scalar function
+ * due to null handling issues with DIVIDE (returning null on count = 0 via a CASE statement was also not possible
+ * as the types of the columns were all non-null and Calcite marks nullable and non-nullable columns as incompatible).
+ *
+ * We added additional logic to handle typecasting MIN / MAX functions for EVERY / SOME aggregation functions in Calcite
+ * which internally uses MIN / MAX with boolean return types. This was necessary because the v1 aggregations for
+ * MIN / MAX always return DOUBLE and this caused type issues for certain queries that utilize Calcite's EVERY / SOME
+ * aggregation functions.
+ *
+ * Planner rule that reduces aggregate functions in
+ * {@link org.apache.calcite.rel.core.Aggregate}s to simpler forms.
+ *
+ * <p>Rewrites:
+ * <ul>
+ *
+ * <li>AVG(x) &rarr; SUM(x) / COUNT(x)
+ *
+ * </ul>
+ *
+ * <p>Since many of these rewrites introduce multiple occurrences of simpler
+ * forms like {@code COUNT(x)}, the rule gathers common sub-expressions as it
+ * goes.
+ *
+ * @see CoreRules#AGGREGATE_REDUCE_FUNCTIONS
+ */
+public class PinotAggregateReduceFunctionsRule

Review Comment:
   discussed this already and decided not to rename it back as it was added as a replacement for an existing calcite rule and not a replacement for the other one



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1248908703


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/plan/PhysicalPlanVisitor.java:
##########
@@ -100,17 +100,14 @@ public MultiStageOperator visitAggregate(AggregateNode node, PhysicalPlanContext
     DataSchema inputSchema = node.getInputs().get(0).getDataSchema();
     DataSchema resultSchema = node.getDataSchema();
 
-    // TODO(Sonam): Rename to AggregateOperator when the planner changes are merged.
-//    boolean extractFinalResult = AggregateNode.isFinalStage(node);
-//    boolean isIntermediateStage = AggregateNode.isIntermediateStage(node);
-//    boolean isLeafStage = AggregateNode.isLeafStage(node);
-//    boolean treatIntermediateAsLeaf = node.isTreatIntermediateStageAsLeaf();
-//
-//    return new NewAggregateOperator(context.getOpChainExecutionContext(), nextOperator, resultSchema, inputSchema,
-//        node.getAggCalls(), node.getGroupSet(), isLeafStage, isIntermediateStage, extractFinalResult,
-//        treatIntermediateAsLeaf);
-    return new AggregateOperator(context.getOpChainExecutionContext(), nextOperator, node.getDataSchema(),
-        node.getAggCalls(), node.getGroupSet(), node.getInputs().get(0).getDataSchema());
+    boolean extractFinalResult = AggregateNode.isFinalStage(node);
+    boolean isIntermediateStage = AggregateNode.isIntermediateStage(node);
+    boolean isLeafStage = AggregateNode.isLeafStage(node);
+    boolean treatIntermediateAsLeaf = node.isTreatIntermediateStageAsLeaf();

Review Comment:
   directly set mode is much simplier see previous comments



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1252834103


##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/operator/AggregateOperatorTest.java:
##########
@@ -79,7 +77,8 @@ public void shouldHandleUpstreamErrorBlocks() {
     DataSchema inSchema = new DataSchema(new String[]{"group", "arg"}, new ColumnDataType[]{INT, INT});
     DataSchema outSchema = new DataSchema(new String[]{"sum"}, new ColumnDataType[]{DOUBLE});
     AggregateOperator operator =
-        new AggregateOperator(OperatorTestUtil.getDefaultContext(), _input, outSchema, calls, group, inSchema);
+        new AggregateOperator(OperatorTestUtil.getDefaultContext(), _input, outSchema, inSchema, calls, group,
+            false, true, false, false);

Review Comment:
    Is there a way to execute a pre-defined plan, which we can use that to auto-gen the mix-match the test cases?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on PR #10846:
URL: https://github.com/apache/pinot/pull/10846#issuecomment-1614018305

   Have addressed the suggestions from @walterddr and @Jackie-Jiang 
   
   This PR is ready to be reviewed @walterddr @Jackie-Jiang @kishoreg @xiangfu0 but must only be merged after rebasing and incorporating changes from https://github.com/apache/pinot/pull/10845


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1248091869


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseSingleInputAggregationFunction.java:
##########
@@ -40,7 +40,7 @@ public BaseSingleInputAggregationFunction(ExpressionContext expression) {
 
   @Override
   public String getResultColumnName() {
-    return getType().getName().toLowerCase() + "(" + _expression + ")";
+    return getType().getName().replace("_", "").toLowerCase() + "(" + _expression + ")";

Review Comment:
   none of the v1 aggregation function have an '_' in their name so this should be fine. This was added to handle backward incompatibility for BOOL_AND and BOOL_OR which are named with underscore in v2 but don't have an underscore in v1.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1248908609


##########
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/plannode/AggregateNode.java:
##########
@@ -35,30 +35,72 @@ public class AggregateNode extends AbstractPlanNode {
   private List<RexExpression> _aggCalls;
   @ProtoProperties
   private List<RexExpression> _groupSet;
+  @ProtoProperties
+  private AggregationStage _aggregationStage;
+  @ProtoProperties
+  private boolean _treatIntermediateStageAsLeaf;
+
+  /**
+   * Enum to denote the aggregation stage being performed. Hints are used to populate these values
+   * LEAF - leaf aggregation level which performs the aggregations on the raw values directly
+   * INTERMEDIATE - intermediate aggregation level which merges results from lower aggregation levels
+   * FINAL - final aggregation level which extracts the final result
+   */
+  public enum AggregationStage {
+    LEAF,
+    INTERMEDIATE,
+    FINAL
+  }

Review Comment:
   let's merge this ENUM with AggregateOperator.Mode
   IMO, there's nothing we can't determine during plan time for runtime operator to perform.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1219868444


##########
pinot-query-planner/src/main/java/org/apache/calcite/sql/fun/PinotOperatorTable.java:
##########
@@ -103,6 +98,61 @@ public final void initNoDuplicate() {
         throw Util.throwAsRuntime(Util.causeOrSelf(e));
       }
     }
+
+    // Walk through all the Pinot aggregation types and register those that are supported in multistage and which
+    // aren't standard Calcite functions such as SUM / MIN / MAX / COUNT etc.
+    for (AggregationFunctionType aggregationFunctionType : AggregationFunctionType.values()) {
+      if (aggregationFunctionType.isNativeCalciteAggregationFunctionType()
+          || !aggregationFunctionType.isSupportedInMultiStage()) {

Review Comment:
   I think I can derive this from the other fields in the enum so let me try that route before maintaining either kind of list. there are over 65 aggregation functions, so any list will eventually be impractical to maintain, even if temporary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "kishoreg (via GitHub)" <gi...@apache.org>.
kishoreg commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1219852675


##########
pinot-query-planner/src/main/java/org/apache/calcite/sql/fun/PinotOperatorTable.java:
##########
@@ -103,6 +98,61 @@ public final void initNoDuplicate() {
         throw Util.throwAsRuntime(Util.causeOrSelf(e));
       }
     }
+
+    // Walk through all the Pinot aggregation types and register those that are supported in multistage and which
+    // aren't standard Calcite functions such as SUM / MIN / MAX / COUNT etc.
+    for (AggregationFunctionType aggregationFunctionType : AggregationFunctionType.values()) {
+      if (aggregationFunctionType.isNativeCalciteAggregationFunctionType()
+          || !aggregationFunctionType.isSupportedInMultiStage()) {

Review Comment:
   then have a negation list to list the ones that are not supported in v2 or something calcite already supports.. my concern is that the aggregationfunction type itself should not care about its supported in multistage etc.. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1219868444


##########
pinot-query-planner/src/main/java/org/apache/calcite/sql/fun/PinotOperatorTable.java:
##########
@@ -103,6 +98,61 @@ public final void initNoDuplicate() {
         throw Util.throwAsRuntime(Util.causeOrSelf(e));
       }
     }
+
+    // Walk through all the Pinot aggregation types and register those that are supported in multistage and which
+    // aren't standard Calcite functions such as SUM / MIN / MAX / COUNT etc.
+    for (AggregationFunctionType aggregationFunctionType : AggregationFunctionType.values()) {
+      if (aggregationFunctionType.isNativeCalciteAggregationFunctionType()
+          || !aggregationFunctionType.isSupportedInMultiStage()) {

Review Comment:
   I think I can derive this from the other fields in the enum so let me try that route before maintaining either kind of list. there are over 65 aggregation functions, so any hard-coded list will eventually be impractical to maintain, even if temporary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1219834738


##########
pinot-query-planner/src/main/java/org/apache/calcite/sql/fun/PinotOperatorTable.java:
##########
@@ -103,6 +98,61 @@ public final void initNoDuplicate() {
         throw Util.throwAsRuntime(Util.causeOrSelf(e));
       }
     }
+
+    // Walk through all the Pinot aggregation types and register those that are supported in multistage and which
+    // aren't standard Calcite functions such as SUM / MIN / MAX / COUNT etc.
+    for (AggregationFunctionType aggregationFunctionType : AggregationFunctionType.values()) {
+      if (aggregationFunctionType.isNativeCalciteAggregationFunctionType()
+          || !aggregationFunctionType.isSupportedInMultiStage()) {

Review Comment:
   Are you talking about `isSupportedInMultiStage()`? I can perhaps rename it to something like `registerWithOperatorTable()` or something like that for now.
   
   This method is meant to be a temporary one which we will remove once support for all aggregation functions is added to multistage. I don't want to keep a hard-coded list here because eventually most aggregation functions will be supported and the full list is available as part of the `AggregationFunctionType` enum class anyways.
   
   Also note that there are many MV aggregation functions which we cannot support yet without generically adding MV support to multi-stage as part of this OSS issue: https://github.com/apache/pinot/issues/10658 
   So we may need to gatekeep which aggregation functions are supported for a while, but over time it should be possible to support most.
   
   I'm not sure what you mean by backward incompatibility when we make v2 the default? Can you elaborate a bit on that part?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1248908947


##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/operator/AggregateOperatorTest.java:
##########
@@ -188,60 +191,27 @@ public void shouldAggregateSingleInputBlockWithLiteralInput() {
     Assert.assertTrue(block2.isEndOfStreamBlock(), "Second block is EOS (done processing)");
   }
 
-  @Test
-  public void shouldCallMergerWhenAggregatingMultipleRows() {
-    // Given:
-    List<RexExpression> calls = ImmutableList.of(getSum(new RexExpression.InputRef(1)));
-    List<RexExpression> group = ImmutableList.of(new RexExpression.InputRef(0));
-
-    DataSchema inSchema = new DataSchema(new String[]{"group", "arg"}, new ColumnDataType[]{INT, INT});
-    Mockito.when(_input.nextBlock())
-        .thenReturn(OperatorTestUtil.block(inSchema, new Object[]{1, 1}, new Object[]{1, 1}))
-        .thenReturn(OperatorTestUtil.block(inSchema, new Object[]{1, 1}))
-        .thenReturn(TransferableBlockUtils.getEndOfStreamTransferableBlock());
-
-    AggregationUtils.Merger merger = Mockito.mock(AggregationUtils.Merger.class);
-    Mockito.when(merger.merge(Mockito.any(), Mockito.any())).thenReturn(12d);
-    Mockito.when(merger.init(Mockito.any(), Mockito.any())).thenReturn(1d);
-    DataSchema outSchema = new DataSchema(new String[]{"sum"}, new ColumnDataType[]{DOUBLE});
-    AggregateOperator operator =
-        new AggregateOperator(OperatorTestUtil.getDefaultContext(), _input, outSchema, calls, group, inSchema,
-            ImmutableMap.of("SUM", cdt -> merger));

Review Comment:
   This test is no longer allowed as we do not have a merger API



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1248167425


##########
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/function/InternalReduceFunctions.java:
##########
@@ -35,17 +33,10 @@ private InternalReduceFunctions() {
   }
 
   @ScalarFunction
-  public static double skewnessReduce(PinotFourthMoment fourthMoment) {
-    return fourthMoment.skew();
-  }
-
-  @ScalarFunction
-  public static double kurtosisReduce(PinotFourthMoment fourthMoment) {
-    return fourthMoment.kurtosis();
-  }
-
-  @ScalarFunction
-  public static int countDistinctReduce(Set<?> values) {
-    return values.size();
+  public static Double avgReduce(Double intermediateResultSum, Long intermediateResultCount) {

Review Comment:
   `@ScalarFunction(nullableParameters = true)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #10846:
URL: https://github.com/apache/pinot/pull/10846#issuecomment-1577372202

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#10846](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (ba818df) into [master](https://app.codecov.io/gh/apache/pinot/commit/da6944882da5c30b1f790b21d1e247a8466da67b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (da69448) will **decrease** coverage by `54.82%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #10846       +/-   ##
   =============================================
   - Coverage     68.42%   13.61%   -54.82%     
   + Complexity     6573      439     -6134     
   =============================================
     Files          2170     2116       -54     
     Lines        116688   114263     -2425     
     Branches      17661    17364      -297     
   =============================================
   - Hits          79846    15559    -64287     
   - Misses        31196    97431    +66235     
   + Partials       5646     1273     -4373     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `13.61% <0.00%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...gregation/function/AggregationFunctionFactory.java](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9BZ2dyZWdhdGlvbkZ1bmN0aW9uRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-81.36%)` | :arrow_down: |
   | [...ation/function/BaseBooleanAggregationFunction.java](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9CYXNlQm9vbGVhbkFnZ3JlZ2F0aW9uRnVuY3Rpb24uamF2YQ==) | `0.00% <ø> (-60.68%)` | :arrow_down: |
   | [...n/function/BaseSingleInputAggregationFunction.java](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9CYXNlU2luZ2xlSW5wdXRBZ2dyZWdhdGlvbkZ1bmN0aW9uLmphdmE=) | `0.00% <0.00%> (-75.00%)` | :arrow_down: |
   | [...gation/function/BooleanAndAggregationFunction.java](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Cb29sZWFuQW5kQWdncmVnYXRpb25GdW5jdGlvbi5qYXZh) | `0.00% <0.00%> (-60.00%)` | :arrow_down: |
   | [...egation/function/BooleanOrAggregationFunction.java](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Cb29sZWFuT3JBZ2dyZWdhdGlvbkZ1bmN0aW9uLmphdmE=) | `0.00% <0.00%> (-60.00%)` | :arrow_down: |
   | [...aggregation/function/CountAggregationFunction.java](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Db3VudEFnZ3JlZ2F0aW9uRnVuY3Rpb24uamF2YQ==) | `0.00% <0.00%> (-76.55%)` | :arrow_down: |
   | [...ion/function/DistinctCountAggregationFunction.java](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9EaXN0aW5jdENvdW50QWdncmVnYXRpb25GdW5jdGlvbi5qYXZh) | `0.00% <0.00%> (-80.00%)` | :arrow_down: |
   | [...tion/function/FourthMomentAggregationFunction.java](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Gb3VydGhNb21lbnRBZ2dyZWdhdGlvbkZ1bmN0aW9uLmphdmE=) | `0.00% <0.00%> (-66.08%)` | :arrow_down: |
   | [...y/aggregation/function/MaxAggregationFunction.java](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9NYXhBZ2dyZWdhdGlvbkZ1bmN0aW9uLmphdmE=) | `0.00% <0.00%> (-72.42%)` | :arrow_down: |
   | [...y/aggregation/function/MinAggregationFunction.java](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9NaW5BZ2dyZWdhdGlvbkZ1bmN0aW9uLmphdmE=) | `0.00% <0.00%> (-72.42%)` | :arrow_down: |
   | ... and [3 more](https://app.codecov.io/gh/apache/pinot/pull/10846?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | |
   
   ... and [1664 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/10846/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1248049043


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseSingleInputAggregationFunction.java:
##########
@@ -40,7 +40,7 @@ public BaseSingleInputAggregationFunction(ExpressionContext expression) {
 
   @Override
   public String getResultColumnName() {
-    return getType().getName().toLowerCase() + "(" + _expression + ")";
+    return getType().getName().replace("_", "").toLowerCase() + "(" + _expression + ")";

Review Comment:
   will this be backward incompatible when upgrading 
   (e.g. server returns a different result ColunmName while broker is still using older version of the name and thus can't find the right column?)



##########
pinot-query-planner/src/main/java/org/apache/calcite/rel/rules/PinotAggregateReduceFunctionsRule.java:
##########
@@ -0,0 +1,427 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.calcite.rel.rules;
+
+import com.google.common.collect.ImmutableSet;
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Set;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.core.Aggregate;
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.PinotSqlAggFunction;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.fun.SqlStdOperatorTable;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.calcite.util.CompositeList;
+import org.apache.calcite.util.Util;
+import org.apache.pinot.segment.spi.AggregationFunctionType;
+
+
+/**
+ * Note: This class copies the logic for reducing SUM and AVG from {@link AggregateReduceFunctionsRule} with some
+ * changes to use our Pinot defined operand type checkers and return types. This is necessary otherwise the v1
+ * aggregations won't work in the v2 engine due to type issues. Once we fix the return types for the v1 aggregation
+ * functions the logic for AVG and SUM can be removed. We also had to resort to using an AVG_REDUCE scalar function
+ * due to null handling issues with DIVIDE (returning null on count = 0 via a CASE statement was also not possible
+ * as the types of the columns were all non-null and Calcite marks nullable and non-nullable columns as incompatible).
+ *
+ * We added additional logic to handle typecasting MIN / MAX functions for EVERY / SOME aggregation functions in Calcite
+ * which internally uses MIN / MAX with boolean return types. This was necessary because the v1 aggregations for
+ * MIN / MAX always return DOUBLE and this caused type issues for certain queries that utilize Calcite's EVERY / SOME
+ * aggregation functions.
+ *
+ * Planner rule that reduces aggregate functions in
+ * {@link org.apache.calcite.rel.core.Aggregate}s to simpler forms.
+ *
+ * <p>Rewrites:
+ * <ul>
+ *
+ * <li>AVG(x) &rarr; SUM(x) / COUNT(x)
+ *
+ * </ul>
+ *
+ * <p>Since many of these rewrites introduce multiple occurrences of simpler
+ * forms like {@code COUNT(x)}, the rule gathers common sub-expressions as it
+ * goes.
+ *
+ * @see CoreRules#AGGREGATE_REDUCE_FUNCTIONS
+ */
+public class PinotAggregateReduceFunctionsRule

Review Comment:
   why the rename? is most code similar to PinotReduceAggregateFunctionRule?



##########
pinot-query-planner/src/main/java/org/apache/calcite/rel/hint/PinotHintStrategyTable.java:
##########
@@ -27,8 +27,11 @@
  * Default hint strategy set for Pinot query.
  */
 public class PinotHintStrategyTable {
-  public static final String INTERNAL_AGG_INTERMEDIATE_STAGE = "aggIntermediateStage";
   public static final String INTERNAL_AGG_FINAL_STAGE = "aggFinalStage";
+  public static final String INTERNAL_AGG_INTERMEDIATE_STAGE = "aggIntermediateStage";
+  public static final String INTERNAL_AGG_LEAF_STAGE = "aggLeafStage";

Review Comment:
   1. can we add javadocs for these? i think some definition is different from previous usage of these internal hints
   2. please also include the internal hints to agg node function name mapping in the java doc and reference to those ( i believe it was called `MODE` in #10845)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1248289636


##########
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/function/InternalReduceFunctions.java:
##########
@@ -35,17 +33,10 @@ private InternalReduceFunctions() {
   }
 
   @ScalarFunction
-  public static double skewnessReduce(PinotFourthMoment fourthMoment) {
-    return fourthMoment.skew();
-  }
-
-  @ScalarFunction
-  public static double kurtosisReduce(PinotFourthMoment fourthMoment) {
-    return fourthMoment.kurtosis();
-  }
-
-  @ScalarFunction
-  public static int countDistinctReduce(Set<?> values) {
-    return values.size();
+  public static Double avgReduce(Double intermediateResultSum, Long intermediateResultCount) {

Review Comment:
   done



##########
pinot-query-planner/src/main/java/org/apache/calcite/rel/hint/PinotHintStrategyTable.java:
##########
@@ -27,8 +27,11 @@
  * Default hint strategy set for Pinot query.
  */
 public class PinotHintStrategyTable {
-  public static final String INTERNAL_AGG_INTERMEDIATE_STAGE = "aggIntermediateStage";
   public static final String INTERNAL_AGG_FINAL_STAGE = "aggFinalStage";
+  public static final String INTERNAL_AGG_INTERMEDIATE_STAGE = "aggIntermediateStage";
+  public static final String INTERNAL_AGG_LEAF_STAGE = "aggLeafStage";

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10846: [multistage] Initial planner changes to support the v1 aggregation functions in multistage

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10846:
URL: https://github.com/apache/pinot/pull/10846#discussion_r1248289893


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseSingleInputAggregationFunction.java:
##########
@@ -40,7 +40,7 @@ public BaseSingleInputAggregationFunction(ExpressionContext expression) {
 
   @Override
   public String getResultColumnName() {
-    return getType().getName().toLowerCase() + "(" + _expression + ")";
+    return getType().getName().replace("_", "").toLowerCase() + "(" + _expression + ")";

Review Comment:
   discussed offline - now we register both names without and without the underscore. removed this change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org