You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/07/23 04:29:18 UTC
[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #9086: Proper null handling in Aggregation functions, and comparison operators / matchers for SV data types

Jackie-Jiang commented on code in PR #9086:
URL: https://github.com/apache/pinot/pull/9086#discussion_r928074322


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java:
##########
@@ -188,6 +206,33 @@ public int matchValues(int limit, int[] docIds) {
     }
   }
 
+  private class DictIdMatcherAndNullHandler implements ValueMatcher {
+
+    private final int[] _buffer = new int[OPTIMAL_ITERATOR_BATCH_SIZE];
+    private final ImmutableRoaringBitmap _nullBitmap;
+
+    public DictIdMatcherAndNullHandler(ImmutableRoaringBitmap nullBitmap) {
+      _nullBitmap = nullBitmap;
+    }
+
+    @Override
+    public boolean doesValueMatch(int docId) {
+      if (_nullBitmap.contains(docId)) {
+        return false;
+      }
+      return _predicateEvaluator.applySV(_reader.getDictId(docId, _readerContext));
+    }
+
+    @Override
+    public int matchValues(int limit, int[] docIds) {
+      _reader.readDictIds(docIds, limit, _buffer, _readerContext);

Review Comment:
   This is not efficient. We should first use the `_nullBitmap` to filter out the null docs, then follow the same way as the regular matcher. Or, in the first version we can just use the default implementation



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java:
##########
@@ -48,12 +49,15 @@ public final class SVScanDocIdIterator implements ScanBasedDocIdIterator {
   private int _nextDocId = 0;
   private long _numEntriesScanned = 0L;
 
-  public SVScanDocIdIterator(PredicateEvaluator predicateEvaluator, ForwardIndexReader reader, int numDocs) {
+  public SVScanDocIdIterator(PredicateEvaluator predicateEvaluator, ForwardIndexReader reader, int numDocs,
+      NullValueVectorReader nullValueReader, boolean nullHandlingEnabled) {
     _predicateEvaluator = predicateEvaluator;
     _reader = reader;
     _readerContext = reader.createContext();
     _numDocs = numDocs;
-    _valueMatcher = getValueMatcher();
+    ImmutableRoaringBitmap nullBitmap = nullHandlingEnabled && nullValueReader != null

Review Comment:
   Check if `nullBitmap` is empty here, and set it to `null` when it is empty



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -95,30 +95,46 @@ public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> sele
    * <p>For aggregation only, the result is a list of values.
    * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
    */
-  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult) {
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      boolean nullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _aggregationResult = aggregationResult;
+    _nullHandlingEnabled = nullHandlingEnabled;
+  }
+
+  /**
+   * Constructor for aggregation result.
+   * <p>For aggregation only, the result is a list of values.
+   * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
+   */
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,

Review Comment:
   Why do we need to add this constructor? Seems you changed the distinct query to use this constructor. One issue with this is that it will cause backward incompatible during version upgrade if broker relies on data schema to process the distinct query



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -48,14 +48,17 @@ public class RangeIndexBasedFilterOperator extends BaseFilterOperator {
   private final PredicateEvaluator _rangePredicateEvaluator;
   private final DataSource _dataSource;
   private final int _numDocs;
+  private final boolean _nullHandlingEnabled;
 
   @SuppressWarnings("unchecked")
-  public RangeIndexBasedFilterOperator(PredicateEvaluator rangePredicateEvaluator, DataSource dataSource, int numDocs) {
+  public RangeIndexBasedFilterOperator(PredicateEvaluator rangePredicateEvaluator, DataSource dataSource, int numDocs,

Review Comment:
   I don't think we can use range index when null handling is enabled because range index won't count the null values. You may add a TODO to support null values in the range index



##########
pinot-common/src/main/java/org/apache/pinot/common/request/context/RequestContextUtils.java:
##########
@@ -95,6 +90,9 @@ public static FunctionContext getFunction(Function thriftFunction) {
       for (Expression operand : operands) {
         arguments.add(getExpression(operand));
       }
+      if (arguments.size() == 0 && functionName.equalsIgnoreCase(AggregationFunctionType.COUNT.getName())) {

Review Comment:
   Arguments should never be empty



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/CountAggregationFunction.java:
##########
@@ -30,34 +31,48 @@
 import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
 import org.apache.pinot.segment.spi.AggregationFunctionType;
 import org.apache.pinot.segment.spi.index.startree.AggregationFunctionColumnPair;
+import org.roaringbitmap.RoaringBitmap;
 
 
-public class CountAggregationFunction implements AggregationFunction<Long, Long> {
-  private static final String COLUMN_NAME = "count_star";
-  private static final String RESULT_COLUMN_NAME = "count(*)";
+public class CountAggregationFunction extends BaseSingleInputAggregationFunction<Long, Long> {
   private static final double DEFAULT_INITIAL_VALUE = 0.0;
   // Special expression used by star-tree to pass in BlockValSet
   private static final ExpressionContext STAR_TREE_COUNT_STAR_EXPRESSION =
       ExpressionContext.forIdentifier(AggregationFunctionColumnPair.STAR);
 
+  private final String _columnName;
+  private final String _resultColumnName;
+  private final boolean _nullHandlingEnabled;
+
+  public CountAggregationFunction(ExpressionContext expression, boolean nullHandlingEnabled) {
+    super(expression);
+    String expStr = expression.toString();

Review Comment:
   When null handling is enabled, we can set the argument to `*` to keep the current behavior



##########
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java:
##########
@@ -179,20 +181,29 @@ public Operator<IntermediateResultsBlock> buildNonFilteredAggOperator() {
     BaseFilterOperator filterOperator = filterPlanNode.run();
 
     if (canOptimizeFilteredCount(filterOperator, aggregationFunctions)) {
-      return new FastFilteredCountOperator(aggregationFunctions, filterOperator, _indexSegment.getSegmentMetadata());
+      return new FastFilteredCountOperator(aggregationFunctions, filterOperator, _indexSegment.getSegmentMetadata(),
+          _queryContext.isNullHandlingEnabled());
     }
 
-    if (filterOperator.isResultMatchingAll()) {
+    boolean skipNonScanBasedAggregation = false;
+    if (filterOperator.isResultMatchingAll() && !_queryContext.isNullHandlingEnabled()) {
       if (isFitForNonScanBasedPlan(aggregationFunctions, _indexSegment)) {
         DataSource[] dataSources = new DataSource[aggregationFunctions.length];
         for (int i = 0; i < aggregationFunctions.length; i++) {
           List<?> inputExpressions = aggregationFunctions[i].getInputExpressions();
           if (!inputExpressions.isEmpty()) {
             String column = ((ExpressionContext) inputExpressions.get(0)).getIdentifier();
             dataSources[i] = _indexSegment.getDataSource(column);
+            NullValueVectorReader nullValueReader = dataSources[i].getNullValueVector();

Review Comment:
   This part is not required because it is within the condition of no null handling



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/MaxAggregationFunction.java:
##########
@@ -28,13 +28,19 @@
 import org.apache.pinot.core.query.aggregation.groupby.DoubleGroupByResultHolder;
 import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
 import org.apache.pinot.segment.spi.AggregationFunctionType;
+import org.roaringbitmap.RoaringBitmap;
 
 
 public class MaxAggregationFunction extends BaseSingleInputAggregationFunction<Double, Double> {
   private static final double DEFAULT_INITIAL_VALUE = Double.NEGATIVE_INFINITY;
+  private final boolean _nullHandlingEnabled;
 
-  public MaxAggregationFunction(ExpressionContext expression) {
+  // stores id of the groupKey where the corresponding value is null.
+  private Integer _groupKeyForNullValue = null;

Review Comment:
   Do we need to handle the null group in the aggregation function? It will make aggregation function hard to implement. I think we should be able to identify and skip the null groups in the group-by executor



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/PredicateEvaluator.java:
##########
@@ -92,6 +93,65 @@ default int applySV(int limit, int[] docIds, int[] values) {
     return matches;
   }
 
+  /**
+   * Apply the predicate to a batch of single-value entries.
+   * Compact matching entries into the prefix of the docIds array.
+   *
+   * @param limit How much of the input to consume.
+   * @param docIds The docIds associated with the values - may be modified by invocation.
+   * @param values Batch of dictionary ids or raw values.
+   * @param nullBitmap The document ids null bitmap.
+   * @return the index of the first non-matching entry.
+   */
+  default int applySV(int limit, int[] docIds, int[] values, ImmutableRoaringBitmap nullBitmap) {

Review Comment:
   These changes are not needed. We can filter the docs in the DocIdIterator instead



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org