You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "xiangfu0 (via GitHub)" <gi...@apache.org> on 2023/06/26 19:11:40 UTC

[GitHub] [pinot] xiangfu0 commented on a diff in pull request #10845: [Multistage] Runtime changes for leveraging V1 Aggregation Functions

xiangfu0 commented on code in PR #10845:
URL: https://github.com/apache/pinot/pull/10845#discussion_r1242589834


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/MaxAggregationFunction.java:
##########
@@ -295,6 +295,22 @@ public Double merge(Double intermediateMaxResult1, Double intermediateMaxResult2
     return intermediateMaxResult2;
   }
 
+  @Override
+  public void mergeAndUpdateResultHolder(Double intermediateResult,

Review Comment:
   Shall we make those default implementation for the base class and put unsupported and TODO for the implementations won't honor these two methods?



##########
pinot-common/src/main/java/org/apache/pinot/common/request/context/IdentifierContext.java:
##########
@@ -0,0 +1,75 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.common.request.context;
+
+import java.util.Objects;
+import org.apache.pinot.common.utils.DataSchema;
+
+/**
+ * The {@code IdentifierContext} class represents a Identifer in the query.
+ * <p> This class includes information that about an identifier. The v1 engine uses column names for identifiers. The
+ * multistage query engine uses ordinals to distinctly track each identifier. So this context is set up to support both
+ * v1 and multistage engine identifiers.
+ */
+public class IdentifierContext {
+  private DataSchema.ColumnDataType _dataType;
+  String _name;
+  // Identifier Index is needed because multistage engine tracks identifiers with the index(ordinal) position.
+  int _identifierIndex;
+
+  public IdentifierContext(String name, DataSchema.ColumnDataType dataType, int identifierIndex) {
+    _name = name;
+    _dataType = dataType;
+    _identifierIndex = identifierIndex;
+  }
+
+  public String getName() {
+    return _name;
+  }
+
+  public DataSchema.ColumnDataType getDataType() {
+    return _dataType;
+  }
+
+  public int getIndex() {
+    return _identifierIndex;
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+    if (!(o instanceof IdentifierContext)) {
+      return false;
+    }
+    IdentifierContext that = (IdentifierContext) o;
+    return Objects.equals(_name, that._name) && Objects.equals(_identifierIndex, that._identifierIndex);
+  }
+
+  @Override
+  public int hashCode() {
+    return Objects.hash(_name, _identifierIndex);
+  }
+
+  @Override
+  public String toString() {
+    return _name;

Review Comment:
   Where is this used? Shall we assemble both _name and _identifierIndex or even the _dataType here?



##########
pinot-query-planner/src/main/java/org/apache/calcite/rel/hint/PinotHintStrategyTable.java:
##########
@@ -29,6 +29,9 @@
 public class PinotHintStrategyTable {
   public static final String INTERNAL_AGG_INTERMEDIATE_STAGE = "aggIntermediateStage";
   public static final String INTERNAL_AGG_FINAL_STAGE = "aggFinalStage";
+  // Hint to denote that aggregation is to be run as a single stage rather than split into an intermediate and a final
+  // stage
+  public static final String INTERNAL_IS_SINGLE_STAGE_AGG = "isSingleStageAgg";

Review Comment:
   Is this hint required to be set explicitly from external users or we are using it internally always?
   If later, I think we can add a parameter to the Agg PlanNode instead of adding a hint?



##########
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/plannode/AggregateNode.java:
##########
@@ -44,9 +44,11 @@ public AggregateNode(int planFragmentId, DataSchema dataSchema, List<AggregateCa
       List<RexExpression> groupSet,
       List<RelHint> relHints) {
     super(planFragmentId, dataSchema);
-    _aggCalls = aggCalls.stream().map(RexExpression::toRexExpression).collect(Collectors.toList());
+
+
     _groupSet = groupSet;
     _relHints = relHints;
+    _aggCalls = aggCalls.stream().map(RexExpression::toRexExpression).collect(Collectors.toList());

Review Comment:
   why changed the ordering?



##########
pinot-core/src/main/java/org/apache/pinot/core/common/IntermediateStageBlockValSet.java:
##########
@@ -0,0 +1,250 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common;
+
+import java.math.BigDecimal;
+import java.util.List;
+import javax.annotation.Nullable;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.PinotDataType;
+import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.roaringbitmap.RoaringBitmap;
+
+/**
+ * In the multistage engine, the leaf stage servers process the data in columnar fashion. By the time the
+ * intermediate stage receives the projected column, they are converted to a row based format. This class provides
+ * the capability to convert the row based represenation into blocks so that they can be used to process
+ * aggregations.
+ * TODO: Support MV
+ */
+public class IntermediateStageBlockValSet implements BlockValSet {
+  private final FieldSpec.DataType _dataType;
+  private final PinotDataType _pinotDataType;
+  private final List<Object> _values;
+  private final RoaringBitmap _nullBitMap;
+  private boolean _nullBitMapSet;

Review Comment:
   why have this boolean instead of set `_nullBitMap` to null and construct in `getNullBitmap` method? 



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggFunctionQueryContext.java:
##########
@@ -0,0 +1,62 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.aggregation.function;
+
+import java.util.List;
+import org.apache.pinot.common.request.context.OrderByExpressionContext;
+import org.apache.pinot.core.query.request.context.QueryContext;
+
+
+/**
+ * The <code>AggFunctionQueryContext</code> class contains extracted details from QueryContext that can be used for
+ * Aggregation Functions.
+ */
+public class AggFunctionQueryContext {
+  private boolean _isNullHandlingEnabled;

Review Comment:
   Make those member variables final?



##########
pinot-core/src/main/java/org/apache/pinot/core/common/IntermediateStageBlockValSet.java:
##########
@@ -0,0 +1,250 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common;
+
+import java.math.BigDecimal;
+import java.util.List;
+import javax.annotation.Nullable;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.PinotDataType;
+import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.roaringbitmap.RoaringBitmap;
+
+/**
+ * In the multistage engine, the leaf stage servers process the data in columnar fashion. By the time the
+ * intermediate stage receives the projected column, they are converted to a row based format. This class provides
+ * the capability to convert the row based represenation into blocks so that they can be used to process
+ * aggregations.
+ * TODO: Support MV
+ */
+public class IntermediateStageBlockValSet implements BlockValSet {
+  private final FieldSpec.DataType _dataType;
+  private final PinotDataType _pinotDataType;
+  private final List<Object> _values;
+  private final RoaringBitmap _nullBitMap;
+  private boolean _nullBitMapSet;
+
+  public IntermediateStageBlockValSet(DataSchema.ColumnDataType columnDataType, List<Object> values) {

Review Comment:
   +1 to what @walterddr mentioned.
   I think we can have two constructors:
   1. `IntermediateStageBlockValSet(DataSchema.ColumnDataType columnDataType, List<Object> values)`, which is the current setup, that is a list of boxed java object, so we have the solution to solve all the scenarios both simple and complex data types. 
   2. Another constructor could be: `IntermediateStageBlockValSet(DataSchema.ColumnDataType columnDataType, Object values)`, in this case, values is instance of `int[]` when ColumnDataType is int.
   
   Agreed with you that this could be further optimization, which we can add TODO here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org