You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/19 23:48:20 UTC

[GitHub] [pinot] nizarhejazi opened a new pull request, #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

nizarhejazi opened a new pull request, #8927:
URL: https://github.com/apache/pinot/pull/8927

   Proper NULL handling in:
   
   - SELECT with and without ORDER BY
   - DISTINCT
   - GROUP BY
   - DISTINCT
   
   For Single-valued columns of all data types with: 
   - Raw encoding
   - Dictionary encoding
   
   Unit tests:
   
   Updated `BigDecimalQueriesTest` to work with null values.
   Added `BooleanNullEnabledNoDictQueriesTest`.
   Added AllNullQueriesTest where all values are null.
   Added NullEnabledQueriesTest where some values are nulls.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r917460044


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/TransformBlockValSet.java:
##########
@@ -40,12 +44,40 @@
 public class TransformBlockValSet implements BlockValSet {
   private final ProjectionBlock _projectionBlock;
   private final TransformFunction _transformFunction;
+  private final RoaringBitmap _nullBitmap;
 
   private int[] _numMVEntries;
 
-  public TransformBlockValSet(ProjectionBlock projectionBlock, TransformFunction transformFunction) {
+  public TransformBlockValSet(ProjectionBlock projectionBlock, TransformFunction transformFunction,
+      ExpressionContext expression) {
     _projectionBlock = projectionBlock;
     _transformFunction = transformFunction;
+    RoaringBitmap nullBitmap = null;

Review Comment:
   I went this way because it requires much less plumbing. Let me know if you feel strongly otherwise.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r916439619


##########
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java:
##########
@@ -84,9 +84,23 @@ public TableResizer(DataSchema dataSchema, QueryContext queryContext) {
       _orderByValueExtractors[i] = getOrderByValueExtractor(orderByExpression.getExpression());
       comparators[i] = orderByExpression.isAsc() ? Comparator.naturalOrder() : Comparator.reverseOrder();
     }
+    boolean isNullHandlingEnabled = queryContext.isNullHandlingEnabled();
     _intermediateRecordComparator = (o1, o2) -> {
       for (int i = 0; i < _numOrderByExpressions; i++) {
-        int result = comparators[i].compare(o1._values[i], o2._values[i]);
+        Object v1 = o1._values[i];
+        Object v2 = o2._values[i];
+        if (isNullHandlingEnabled) {

Review Comment:
   I did not do that at first because of the duplicate code .. makes sense for perf reasons.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r922448609


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -50,6 +55,30 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataS
     _dataBlockCache = dataBlockCache;
     _column = column;
     _dataSource = dataSource;
+    _nullBitmapSet = false;

Review Comment:
   What I meant is that assigning `false` to a member boolean in constructor is redundant, but this is a nitpicking so can be ignored



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r917457959


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -50,6 +53,26 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataS
     _dataBlockCache = dataBlockCache;
     _column = column;
     _dataSource = dataSource;
+    NullValueVectorReader nullValueReader = _dataSource == null ? null : _dataSource.getNullValueVector();

Review Comment:
   Updated to do computation lazily, and removed null check.
   
   Please note that passing whether query has null handling enabled is tedious (need to pass QueryContext through many classes). Let me know if you still prefer to pass it through.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #8927:
URL: https://github.com/apache/pinot/pull/8927#issuecomment-1159855830

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8927?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8927](https://codecov.io/gh/apache/pinot/pull/8927?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (574f6f0) into [master](https://codecov.io/gh/apache/pinot/commit/09fcdd59cc07e2faa154858075473cdec3518b60?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (09fcdd5) will **increase** coverage by `37.63%`.
   > The diff coverage is `84.58%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #8927       +/-   ##
   =============================================
   + Coverage     28.81%   66.45%   +37.63%     
   - Complexity       47     4691     +4644     
   =============================================
     Files          1801     1360      -441     
     Lines         94168    69062    -25106     
     Branches      14078    10906     -3172     
   =============================================
   + Hits          27138    45895    +18757     
   + Misses        64479    19846    -44633     
   - Partials       2551     3321      +770     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `66.45% <84.58%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8927?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../java/org/apache/pinot/common/utils/DataTable.java](https://codecov.io/gh/apache/pinot/pull/8927/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVRhYmxlLmphdmE=) | `95.12% <ø> (ø)` | |
   | [...ator/transform/function/BaseTransformFunction.java](https://codecov.io/gh/apache/pinot/pull/8927/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci90cmFuc2Zvcm0vZnVuY3Rpb24vQmFzZVRyYW5zZm9ybUZ1bmN0aW9uLmphdmE=) | `41.01% <0.00%> (+30.62%)` | :arrow_up: |
   | [...r/transform/function/LiteralTransformFunction.java](https://codecov.io/gh/apache/pinot/pull/8927/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci90cmFuc2Zvcm0vZnVuY3Rpb24vTGl0ZXJhbFRyYW5zZm9ybUZ1bmN0aW9uLmphdmE=) | `68.26% <0.00%> (+31.37%)` | :arrow_up: |
   | [.../transform/function/MapValueTransformFunction.java](https://codecov.io/gh/apache/pinot/pull/8927/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci90cmFuc2Zvcm0vZnVuY3Rpb24vTWFwVmFsdWVUcmFuc2Zvcm1GdW5jdGlvbi5qYXZh) | `0.00% <0.00%> (-85.30%)` | :arrow_down: |
   | [...e/pinot/core/query/reduce/RowBasedBlockValSet.java](https://codecov.io/gh/apache/pinot/pull/8927/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9yZWR1Y2UvUm93QmFzZWRCbG9ja1ZhbFNldC5qYXZh) | `15.94% <0.00%> (+15.94%)` | :arrow_up: |
   | [...main/java/org/apache/pinot/spi/data/FieldSpec.java](https://codecov.io/gh/apache/pinot/pull/8927/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9GaWVsZFNwZWMuamF2YQ==) | `82.52% <ø> (+82.52%)` | :arrow_up: |
   | [.../raw/RawBytesSingleColumnDistinctOnlyExecutor.java](https://codecov.io/gh/apache/pinot/pull/8927/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9kaXN0aW5jdC9yYXcvUmF3Qnl0ZXNTaW5nbGVDb2x1bW5EaXN0aW5jdE9ubHlFeGVjdXRvci5qYXZh) | `69.23% <33.33%> (+69.23%)` | :arrow_up: |
   | [...w/RawBytesSingleColumnDistinctOrderByExecutor.java](https://codecov.io/gh/apache/pinot/pull/8927/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9kaXN0aW5jdC9yYXcvUmF3Qnl0ZXNTaW5nbGVDb2x1bW5EaXN0aW5jdE9yZGVyQnlFeGVjdXRvci5qYXZh) | `79.16% <33.33%> (+79.16%)` | :arrow_up: |
   | [.../RawStringSingleColumnDistinctOrderByExecutor.java](https://codecov.io/gh/apache/pinot/pull/8927/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9kaXN0aW5jdC9yYXcvUmF3U3RyaW5nU2luZ2xlQ29sdW1uRGlzdGluY3RPcmRlckJ5RXhlY3V0b3IuamF2YQ==) | `70.96% <33.33%> (+70.96%)` | :arrow_up: |
   | [...t/core/query/distinct/DistinctExecutorFactory.java](https://codecov.io/gh/apache/pinot/pull/8927/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9kaXN0aW5jdC9EaXN0aW5jdEV4ZWN1dG9yRmFjdG9yeS5qYXZh) | `87.35% <40.00%> (+33.78%)` | :arrow_up: |
   | ... and [1621 more](https://codecov.io/gh/apache/pinot/pull/8927/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8927?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8927?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [09fcdd5...574f6f0](https://codecov.io/gh/apache/pinot/pull/8927?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921634433


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -191,6 +193,16 @@ public void setDefaultNullValue(@Nullable Object defaultNullValue) {
     }
   }
 
+  private static final Map<DataType, Object> FIELD_TYPE_DEFAULT_NULL_VALUE_MAP = new HashMap<>();
+  public static Object getDefaultDimensionNullValue(DataType dataType) {

Review Comment:
   (minor) Rename to `getDefaultNullValue(DataType dataType)`



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -80,45 +83,64 @@ public IntermediateResultsBlock() {
   /**
    * Constructor for selection result.
    */
-  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult) {
+  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult,
+      boolean nullHandlingEnabled) {
     _dataSchema = dataSchema;
     _selectionResult = selectionResult;
+    _nullHandlingEnabled = nullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation result.
    * <p>For aggregation only, the result is a list of values.
    * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
    */
-  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult) {
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      boolean nullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _aggregationResult = aggregationResult;
+    _nullHandlingEnabled = nullHandlingEnabled;
+  }
+
+  /**
+   * Constructor for aggregation result.
+   * <p>For aggregation only, the result is a list of values.
+   * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
+   */
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,

Review Comment:
   Do we need this extra constructor? For `DistinctOperator`, it can reuse the constructor without the data schema



##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java:
##########
@@ -272,6 +272,7 @@ public static class QueryOptionKey {
         public static final String NUM_REPLICA_GROUPS_TO_QUERY = "numReplicaGroupsToQuery";
         public static final String EXPLAIN_PLAN_VERBOSE = "explainPlanVerbose";
         public static final String USE_MULTISTAGE_ENGINE = "useMultistageEngine";
+        public static final String NULL_HANDLING_ENABLED = "nullHandlingEnabled";

Review Comment:
   ```suggestion
           public static final String ENABLE_NULL_HANDLING = "enableNullHandling";
   ```



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -191,6 +193,16 @@ public void setDefaultNullValue(@Nullable Object defaultNullValue) {
     }
   }
 
+  private static final Map<DataType, Object> FIELD_TYPE_DEFAULT_NULL_VALUE_MAP = new HashMap<>();
+  public static Object getDefaultDimensionNullValue(DataType dataType) {
+    if (!FIELD_TYPE_DEFAULT_NULL_VALUE_MAP.containsKey(dataType)) {

Review Comment:
   This is not thread safe. Suggest moving the logic into a separate util class, and setup the values for each data type in a `static` block to avoid the race condition



##########
pinot-core/src/main/java/org/apache/pinot/core/query/distinct/raw/BaseRawDoubleSingleColumnDistinctExecutor.java:
##########
@@ -39,13 +39,17 @@ abstract class BaseRawDoubleSingleColumnDistinctExecutor implements DistinctExec
   final ExpressionContext _expression;
   final DataType _dataType;
   final int _limit;
+  final boolean _nullHandlingEnabled;
 
   final DoubleSet _valueSet;
+  int _numNulls;

Review Comment:
   Make it `private`
   
   It is a little bit confusing to call it `_numNulls`. Suggest making it `boolean _hasNull` and keep a separate variable to track the limit



##########
pinot-core/src/main/java/org/apache/pinot/core/query/distinct/raw/RawDoubleSingleColumnDistinctOnlyExecutor.java:
##########
@@ -40,10 +42,15 @@ public boolean process(TransformBlock transformBlock) {
     int numDocs = transformBlock.getNumDocs();
     if (blockValueSet.isSingleValue()) {
       double[] values = blockValueSet.getDoubleValuesSV();
+      RoaringBitmap nullBitmap = blockValueSet.getNullBitmap();

Review Comment:
   With null handling checked, seems we don't have much code to share with/without null handling. We can consider having a separate set of classes to handle the case with null handling enabled?



##########
pinot-core/src/main/java/org/apache/pinot/core/query/distinct/raw/RawBigDecimalSingleColumnDistinctOrderByExecutor.java:
##########
@@ -36,22 +37,28 @@ public class RawBigDecimalSingleColumnDistinctOrderByExecutor extends BaseRawBig
   private final PriorityQueue<BigDecimal> _priorityQueue;
 
   public RawBigDecimalSingleColumnDistinctOrderByExecutor(ExpressionContext expression, DataType dataType,
-      OrderByExpressionContext orderByExpression, int limit) {
-    super(expression, dataType, limit);
+      OrderByExpressionContext orderByExpression, int limit, boolean nullHandlingEnabled) {
+    super(expression, dataType, limit, nullHandlingEnabled);
 
     assert orderByExpression.getExpression().equals(expression);
     int comparisonFactor = orderByExpression.isAsc() ? -1 : 1;
-    _priorityQueue = new ObjectHeapPriorityQueue<>(Math.min(limit, MAX_INITIAL_CAPACITY),
-        (b1, b2) -> b1.compareTo(b2) * comparisonFactor);
+    if (nullHandlingEnabled) {
+      _priorityQueue = new ObjectHeapPriorityQueue<>(Math.min(limit, MAX_INITIAL_CAPACITY),
+          (b1, b2) -> b1 == null ? (b2 == null ? 0 : 1) : (b2 == null ? -1 : b1.compareTo(b2)) * comparisonFactor);
+    } else {
+      _priorityQueue = new ObjectHeapPriorityQueue<>(Math.min(limit, MAX_INITIAL_CAPACITY),
+          (b1, b2) -> b1.compareTo(b2) * comparisonFactor);
+    }
   }
 
   @Override
   public boolean process(TransformBlock transformBlock) {
     BlockValSet blockValueSet = transformBlock.getBlockValueSet(_expression);
     BigDecimal[] values = blockValueSet.getBigDecimalValuesSV();
+    RoaringBitmap nullBitmap = blockValueSet.getNullBitmap();

Review Comment:
   We need to check if null handling is enabled to avoid the overhead, same for other classes



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java:
##########
@@ -81,10 +83,11 @@ public DefaultGroupByExecutor(QueryContext queryContext, ExpressionContext[] gro
     // Initialize group key generator
     int numGroupsLimit = queryContext.getNumGroupsLimit();
     int maxInitialResultHolderCapacity = queryContext.getMaxInitialResultHolderCapacity();
-    if (hasNoDictionaryGroupByExpression) {
+    if (hasNoDictionaryGroupByExpression || _nullHandlingEnabled) {
       if (groupByExpressions.length == 1) {
         _groupKeyGenerator =
-            new NoDictionarySingleColumnGroupKeyGenerator(transformOperator, groupByExpressions[0], numGroupsLimit);
+            new NoDictionarySingleColumnGroupKeyGenerator(transformOperator, groupByExpressions[0], numGroupsLimit,
+                _nullHandlingEnabled);
       } else {
         _groupKeyGenerator =

Review Comment:
   Add a TODO to support MV and dictionary based. No dictionary performance is much worse than dictionary based



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java:
##########
@@ -49,27 +50,25 @@ public RowDataBlock(ByteBuffer byteBuffer)
     computeBlockObjectConstants();
   }
 
+  @Nullable
   @Override
   public RoaringBitmap getNullRowIds(int colId) {
     // _fixedSizeData stores two ints per col's null bitmap: offset, and length.
     int position = _numRows * _rowSizeInBytes + colId * Integer.BYTES * 2;
-    if (position >= _fixedSizeData.limit()) {
+    if (_fixedSizeData == null || position >= _fixedSizeData.limit()) {

Review Comment:
   I don't see the test mentioned above (might be removed?). `_fixedSizeData` can be `null` for metadata only data table, but we should never read null bitmap from it



##########
pinot-core/src/main/java/org/apache/pinot/core/query/selection/SelectionOperatorUtils.java:
##########
@@ -92,27 +94,26 @@ public static List<ExpressionContext> extractExpressions(QueryContext queryConte
     }
 
     List<ExpressionContext> selectExpressions = queryContext.getSelectExpressions();
-    if (selectExpressions.size() == 1 && selectExpressions.get(0).equals(IDENTIFIER_STAR)) {
-      // For 'SELECT *', sort all columns (ignore columns that start with '$') so that the order is deterministic
-      Set<String> allColumns = indexSegment.getColumnNames();
-      List<String> selectColumns = new ArrayList<>(allColumns.size());
-      for (String column : allColumns) {
-        if (column.charAt(0) != '$') {
-          selectColumns.add(column);
+    for (ExpressionContext selectExpression : selectExpressions) {

Review Comment:
   This is unrelated. Let's make a separate PR for this.
   We won't hit this when the schema is present because broker should already rewrite `*` to actual columns



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -80,45 +83,64 @@ public IntermediateResultsBlock() {
   /**
    * Constructor for selection result.
    */
-  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult) {
+  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult,
+      boolean nullHandlingEnabled) {
     _dataSchema = dataSchema;
     _selectionResult = selectionResult;
+    _nullHandlingEnabled = nullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation result.
    * <p>For aggregation only, the result is a list of values.
    * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
    */
-  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult) {
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      boolean nullHandlingEnabled) {

Review Comment:
   This constructor is for aggregation/distinct, and the intermediate result should never be null. We don't need to handle null in this case



##########
pinot-core/src/main/java/org/apache/pinot/core/util/QueryOptionsUtils.java:
##########
@@ -87,4 +88,9 @@ public static Integer getMinServerGroupTrimSize(Map<String, String> queryOptions
     String minServerGroupTrimSizeString = queryOptions.get(Request.QueryOptionKey.MIN_SERVER_GROUP_TRIM_SIZE);
     return minServerGroupTrimSizeString != null ? Integer.parseInt(minServerGroupTrimSizeString) : null;
   }
+
+  public static boolean isNullHandlingEnabled(Map<String, String> queryOptions) {
+    return Boolean.parseBoolean(queryOptions.get(Request.QueryOptionKey.NULL_HANDLING_ENABLED))

Review Comment:
   When null handling is enabled, let's put a `Precondition.checkState(DataTableFactory.getDataTableVersion() >= DataTableFactory.VERSION_4)` instead of silently disable it



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -390,28 +446,55 @@ private DataTable getAggregationResultDataTable()
       columnNames[i] = aggregationFunction.getColumnName();
       columnDataTypes[i] = aggregationFunction.getIntermediateResultColumnType();
     }
+    RoaringBitmap[] nullBitmaps = null;

Review Comment:
   Aggregation intermediate result should never be null, so we shouldn't need to handling null within this method



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -311,16 +334,48 @@ private DataTable getResultDataTable()
       throws IOException {
     DataTableBuilder dataTableBuilder = DataTableFactory.getDataTableBuilder(_dataSchema);
     ColumnDataType[] storedColumnDataTypes = _dataSchema.getStoredColumnDataTypes();
+    int numColumns = _dataSchema.size();
     Iterator<Record> iterator = _table.iterator();
-    while (iterator.hasNext()) {
-      Record record = iterator.next();
-      dataTableBuilder.startRow();
-      int columnIndex = 0;
-      for (Object value : record.getValues()) {
-        setDataTableColumn(storedColumnDataTypes[columnIndex], dataTableBuilder, columnIndex, value);
-        columnIndex++;
+    RoaringBitmap[] nullBitmaps = null;

Review Comment:
   (nit) Can be declared within the if block



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/TransformBlockValSet.java:
##########
@@ -40,12 +44,49 @@
 public class TransformBlockValSet implements BlockValSet {
   private final ProjectionBlock _projectionBlock;
   private final TransformFunction _transformFunction;
+  private final ExpressionContext _expression;
+
+  private boolean _nullBitmapSet;
+  private RoaringBitmap _nullBitmap;
 
   private int[] _numMVEntries;
 
-  public TransformBlockValSet(ProjectionBlock projectionBlock, TransformFunction transformFunction) {
+  public TransformBlockValSet(ProjectionBlock projectionBlock, TransformFunction transformFunction,
+      ExpressionContext expression) {
     _projectionBlock = projectionBlock;
     _transformFunction = transformFunction;
+    _expression = expression;
+    _nullBitmapSet = false;
+  }
+
+  @Nullable
+  @Override
+  public RoaringBitmap getNullBitmap() {

Review Comment:
   Please add a todo here to revisit this part in the future because some transform function can take null input and return non-null result (e.g. `isNull()`), and we should move this logic into the transform function



##########
pinot-core/src/main/java/org/apache/pinot/core/plan/DistinctPlanNode.java:
##########
@@ -63,14 +64,22 @@ public Operator<IntermediateResultsBlock> run() {
         Dictionary dictionary = dataSource.getDictionary();
         if (dictionary != null) {
           DataSourceMetadata dataSourceMetadata = dataSource.getDataSourceMetadata();
-          return new DictionaryBasedDistinctOperator(dataSourceMetadata.getDataType(), distinctAggregationFunction,
-              dictionary, dataSourceMetadata.getNumDocs());
+          // If nullHandlingEnabled is set to true, and the column contains null values, call DistinctOperator instead
+          // of DictionaryBasedDistinctOperator since nullValueVectorReader is a form of a filter.
+          // TODO: reserve special value in dictionary (e.g. -1) for null in the future so
+          //  DictionaryBasedDistinctOperator can be reused since it is more efficient than DistinctOperator for
+          //  dictionary-encoded columns.
+          NullValueVectorReader nullValueReader = dataSource.getNullValueVector();

Review Comment:
   Check if null handling enabled before other checks. The check can be added to the if statement on line 60



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -50,6 +55,30 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataS
     _dataBlockCache = dataBlockCache;
     _column = column;
     _dataSource = dataSource;
+    _nullBitmapSet = false;
+  }
+
+  @Nullable
+  @Override
+  public RoaringBitmap getNullBitmap() {
+    if (!_nullBitmapSet) {
+      NullValueVectorReader nullValueReader = _dataSource.getNullValueVector();
+      if (nullValueReader != null && nullValueReader.getNullBitmap().getCardinality() > 0) {

Review Comment:
   Cache `nullValueReader.getNullBitmap()` in a local variable and use it to determine whether the value is `null`. `nullValueReader.isNull()` is quite expensive because it will construct a new bitmap



##########
pinot-core/src/main/java/org/apache/pinot/core/query/distinct/raw/RawDoubleSingleColumnDistinctOnlyExecutor.java:
##########
@@ -40,10 +42,15 @@ public boolean process(TransformBlock transformBlock) {
     int numDocs = transformBlock.getNumDocs();
     if (blockValueSet.isSingleValue()) {
       double[] values = blockValueSet.getDoubleValuesSV();
+      RoaringBitmap nullBitmap = blockValueSet.getNullBitmap();
       for (int i = 0; i < numDocs; i++) {
-        _valueSet.add(values[i]);
-        if (_valueSet.size() >= _limit) {
-          return true;
+        if (nullBitmap != null && nullBitmap.contains(i)) {
+          _numNulls = 1;
+        } else {
+          _valueSet.add(values[i]);
+          if (_valueSet.size() >= _limit - _numNulls) {
+            return true;
+          }
         }
       }
     } else {

Review Comment:
   For null MV value, we want to skip the entry



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -50,6 +55,30 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataS
     _dataBlockCache = dataBlockCache;
     _column = column;
     _dataSource = dataSource;
+    _nullBitmapSet = false;

Review Comment:
   (nit) redundant



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/TransformFunction.java:
##########
@@ -66,6 +67,11 @@ public interface TransformFunction {
    */
   Dictionary getDictionary();
 
+  /**
+   * Returns null value vector for the column if exists, or {@code null} if not.
+   */
+  NullValueVectorReader getNullValueVectorReader();

Review Comment:
   Let's remove this API for now since it is not used and not sure if this is the correct way to handle null in transform function



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -72,47 +78,122 @@ public int getGlobalGroupKeyUpperBound() {
   @Override
   public void generateKeysForBlock(TransformBlock transformBlock, int[] groupKeys) {
     BlockValSet blockValSet = transformBlock.getBlockValueSet(_groupByExpression);
+    RoaringBitmap nullBitmap = blockValSet.getNullBitmap();

Review Comment:
   This should be under the `nullHandlingEnabled` branch. I feel it is cleaner if we make 2 helper method, one for existing logic, one for null handling



##########
pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java:
##########
@@ -184,18 +184,8 @@ public Plan makeInstancePlan(List<IndexSegment> indexSegments, QueryContext quer
     return new GlobalPlanImplV0(new InstanceResponsePlanNode(combinePlanNode, indexSegments, fetchContexts));
   }
 
-  private void applyQueryOptions(QueryContext queryContext) {
+  private void applyExtraQueryOptions(QueryContext queryContext) {

Review Comment:
   Suggest not changing this, see the comments in `QueryContext`



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -72,47 +78,122 @@ public int getGlobalGroupKeyUpperBound() {
   @Override
   public void generateKeysForBlock(TransformBlock transformBlock, int[] groupKeys) {
     BlockValSet blockValSet = transformBlock.getBlockValueSet(_groupByExpression);
+    RoaringBitmap nullBitmap = blockValSet.getNullBitmap();
     int numDocs = transformBlock.getNumDocs();
 
     switch (_storedType) {
       case INT:
         int[] intValues = blockValSet.getIntValuesSV();
+        if (_nullHandlingEnabled) {
+          if (nullBitmap != null && nullBitmap.getCardinality() < numDocs) {
+            for (int i = 0; i < numDocs; i++) {
+              groupKeys[i] = nullBitmap.contains(i) ? getKeyForNullValue() : getKeyForValue(intValues[i]);
+            }
+          } else if (numDocs > 0) {
+            _groupIdForNullValue = _numGroups++;
+            Arrays.fill(groupKeys, _groupIdForNullValue);

Review Comment:
   There might already be an id associated with `null`
   ```suggestion
               Arrays.fill(groupKeys, getKeyForNullValue());
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java:
##########
@@ -135,6 +138,12 @@ private QueryContext(@Nullable String tableName, @Nullable QueryContext subquery
     _limit = limit;
     _offset = offset;
     _queryOptions = queryOptions;
+    if (_queryOptions != null) {

Review Comment:
   Suggest not changing it in this PR for the following reasons:
   1. These options only apply to server but not broker and we don't want to pay the overhead on the broker
   2. We may add some query options on the fly which won't be applied here
   3. Parsing all the options in the same place is easier to track
   
   Since `nullHandlingEnabled` is needed on both broker and server side, we may call the setter in 2 places



##########
pinot-core/src/main/java/org/apache/pinot/core/query/distinct/raw/RawStringSingleColumnDistinctOrderByExecutor.java:
##########
@@ -50,8 +56,9 @@ public boolean process(TransformBlock transformBlock) {
     int numDocs = transformBlock.getNumDocs();
     if (blockValueSet.isSingleValue()) {
       String[] values = blockValueSet.getStringValuesSV();
+      RoaringBitmap nullBitmap = blockValueSet.getNullBitmap();

Review Comment:
   Check `nullHandlingEnabled`



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -51,16 +53,20 @@ public class NoDictionarySingleColumnGroupKeyGenerator implements GroupKeyGenera
   private final DataType _storedType;
   private final Map _groupKeyMap;
   private final int _globalGroupIdUpperBound;
+  private final boolean _nullHandlingEnabled;

Review Comment:
   Most of the logic are not sharable, so we can consider making a base implementation, then 2 separate class, one for null enabled, one for disabled



##########
pinot-core/src/main/java/org/apache/pinot/core/query/distinct/raw/RawStringSingleColumnDistinctOnlyExecutor.java:
##########
@@ -40,7 +42,11 @@ public boolean process(TransformBlock transformBlock) {
     int numDocs = transformBlock.getNumDocs();
     if (blockValueSet.isSingleValue()) {
       String[] values = blockValueSet.getStringValuesSV();
+      RoaringBitmap nullBitmap = blockValueSet.getNullBitmap();

Review Comment:
   Check `nullHandlingEnabled`



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -72,47 +78,122 @@ public int getGlobalGroupKeyUpperBound() {
   @Override
   public void generateKeysForBlock(TransformBlock transformBlock, int[] groupKeys) {
     BlockValSet blockValSet = transformBlock.getBlockValueSet(_groupByExpression);
+    RoaringBitmap nullBitmap = blockValSet.getNullBitmap();
     int numDocs = transformBlock.getNumDocs();
 
     switch (_storedType) {
       case INT:
         int[] intValues = blockValSet.getIntValuesSV();
+        if (_nullHandlingEnabled) {
+          if (nullBitmap != null && nullBitmap.getCardinality() < numDocs) {
+            for (int i = 0; i < numDocs; i++) {
+              groupKeys[i] = nullBitmap.contains(i) ? getKeyForNullValue() : getKeyForValue(intValues[i]);
+            }
+          } else if (numDocs > 0) {

Review Comment:
   We should also specialize the case when null bitmap is empty



##########
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/GroupByDataTableReducer.java:
##########
@@ -51,13 +51,17 @@
 import org.apache.pinot.core.transport.ServerRoutingInstance;
 import org.apache.pinot.core.util.GroupByUtils;
 import org.apache.pinot.core.util.trace.TraceRunnable;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 
 /**
  * Helper class to reduce data tables and set group by results into the BrokerResponseNative
  */
 @SuppressWarnings({"rawtypes", "unchecked"})
 public class GroupByDataTableReducer implements DataTableReducer {
+  private static final Logger LOGGER = LoggerFactory.getLogger(GroupByDataTableReducer.class);

Review Comment:
   (minor) Seems not used



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
walterddr commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r957435868


##########
pinot-core/src/main/java/org/apache/pinot/core/query/selection/SelectionOperatorUtils.java:
##########
@@ -227,16 +229,42 @@ public static void mergeWithOrdering(PriorityQueue<Object[]> mergedRows, Collect
    *
    * @param rows {@link Collection} of selection rows.
    * @param dataSchema data schema.
+   * @param nullHandlingEnabled whether null handling is enabled.
    * @return data table.
    * @throws Exception
    */
-  public static DataTable getDataTableFromRows(Collection<Object[]> rows, DataSchema dataSchema)
+  public static DataTable getDataTableFromRows(Collection<Object[]> rows, DataSchema dataSchema,
+      boolean nullHandlingEnabled)
       throws Exception {
     ColumnDataType[] storedColumnDataTypes = dataSchema.getStoredColumnDataTypes();
     int numColumns = storedColumnDataTypes.length;
 
-    DataTableBuilder dataTableBuilder = DataTableFactory.getDataTableBuilder(
-        dataSchema);
+    DataTableBuilder dataTableBuilder = DataTableFactory.getDataTableBuilder(dataSchema);
+    RoaringBitmap[] nullBitmaps = null;
+    if (nullHandlingEnabled) {
+      nullBitmaps = new RoaringBitmap[numColumns];
+      Object[] colDefaultNullValues = new Object[numColumns];
+      for (int colId = 0; colId < numColumns; colId++) {
+        if (storedColumnDataTypes[colId] != ColumnDataType.OBJECT && !storedColumnDataTypes[colId].isArray()) {
+          colDefaultNullValues[colId] =
+              NullValueUtils.getDefaultNullValue(storedColumnDataTypes[colId].toDataType());
+        }
+        nullBitmaps[colId] = new RoaringBitmap();
+      }
+
+      int rowId = 0;
+      for (Object[] row : rows) {
+        for (int i = 0; i < numColumns; i++) {
+          Object columnValue = row[i];
+          if (columnValue == null) {
+            row[i] = colDefaultNullValues[i];
+            nullBitmaps[i].add(rowId);
+          }
+        }
+        rowId++;

Review Comment:
   can we merge this part with the main loop below? any specific reason we had to loop through the rows one more time?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921764722


##########
pinot-core/src/main/java/org/apache/pinot/core/query/distinct/raw/BaseRawDoubleSingleColumnDistinctExecutor.java:
##########
@@ -39,13 +39,17 @@ abstract class BaseRawDoubleSingleColumnDistinctExecutor implements DistinctExec
   final ExpressionContext _expression;
   final DataType _dataType;
   final int _limit;
+  final boolean _nullHandlingEnabled;
 
   final DoubleSet _valueSet;
+  int _numNulls;

Review Comment:
   marked as `protected` to avoid calling a setter method. Renamed to `_hasNll`.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r917458879


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/TransformOperator.java:
##########
@@ -95,6 +96,14 @@ public TransformResultMetadata getResultMetadata(ExpressionContext expression) {
     return _transformFunctionMap.get(expression).getResultMetadata();
   }
 
+  public NullValueVectorReader getNullValueVectorReader(ExpressionContext expression) {

Review Comment:
   This can be safely deleted. Left from early iteration of code review.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r906925603


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/GroupByOrderByCombineOperator.java:
##########
@@ -235,11 +235,15 @@ protected IntermediateResultsBlock mergeResults()
     }
 
     IndexedTable indexedTable = _indexedTable;
-    indexedTable.finish(false);
+    if (indexedTable != null) {

Review Comment:
   @Jackie-Jiang This is from a test where all input values are nulls. In this case _table is null. Check `AllNullsQueriesTest`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r906996087


##########
pinot-core/src/main/java/org/apache/pinot/core/common/RowBasedBlockValueFetcher.java:
##########
@@ -43,6 +44,15 @@ public Object[] getRow(int docId) {
     return row;
   }
 
+  public RoaringBitmap getColumnNullBitmap(int colId) {

Review Comment:
   left from the early iterations. good catch.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r917457959


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -50,6 +53,26 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataS
     _dataBlockCache = dataBlockCache;
     _column = column;
     _dataSource = dataSource;
+    NullValueVectorReader nullValueReader = _dataSource == null ? null : _dataSource.getNullValueVector();

Review Comment:
   Updated to do computation lazily. 
   
   Please note that passing whether query has null handling enabled is tedious (need to pass QueryContext through many classes). Let me know if you still prefer to pass it through.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921761625


##########
pinot-core/src/main/java/org/apache/pinot/core/plan/DistinctPlanNode.java:
##########
@@ -63,14 +64,22 @@ public Operator<IntermediateResultsBlock> run() {
         Dictionary dictionary = dataSource.getDictionary();
         if (dictionary != null) {
           DataSourceMetadata dataSourceMetadata = dataSource.getDataSourceMetadata();
-          return new DictionaryBasedDistinctOperator(dataSourceMetadata.getDataType(), distinctAggregationFunction,
-              dictionary, dataSourceMetadata.getNumDocs());
+          // If nullHandlingEnabled is set to true, and the column contains null values, call DistinctOperator instead
+          // of DictionaryBasedDistinctOperator since nullValueVectorReader is a form of a filter.
+          // TODO: reserve special value in dictionary (e.g. -1) for null in the future so
+          //  DictionaryBasedDistinctOperator can be reused since it is more efficient than DistinctOperator for
+          //  dictionary-encoded columns.
+          NullValueVectorReader nullValueReader = dataSource.getNullValueVector();

Review Comment:
   Added the check and will keep the check for nullValueVector cardinality here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921831490


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -51,16 +53,20 @@ public class NoDictionarySingleColumnGroupKeyGenerator implements GroupKeyGenera
   private final DataType _storedType;
   private final Map _groupKeyMap;
   private final int _globalGroupIdUpperBound;
+  private final boolean _nullHandlingEnabled;

Review Comment:
   Added TODO? Will address in next PR since this one became big.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r906925515


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -80,45 +83,69 @@ public IntermediateResultsBlock() {
   /**
    * Constructor for selection result.
    */
-  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult) {
+  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult,
+      boolean isNullHandlingEnabled) {
     _dataSchema = dataSchema;
     _selectionResult = selectionResult;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation result.
    * <p>For aggregation only, the result is a list of values.
    * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
    */
-  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult) {
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      boolean isNullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _aggregationResult = aggregationResult;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
+  }
+
+  /**
+   * Constructor for aggregation result.
+   * <p>For aggregation only, the result is a list of values.
+   * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
+   */
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      DataSchema dataSchema, boolean isNullHandlingEnabled) {
+    _aggregationFunctions = aggregationFunctions;
+    _aggregationResult = aggregationResult;
+    _dataSchema = dataSchema;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation group-by order-by result with {@link AggregationGroupByResult}.
    */
   public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
-      @Nullable AggregationGroupByResult aggregationGroupByResults, DataSchema dataSchema) {
+      @Nullable AggregationGroupByResult aggregationGroupByResults, DataSchema dataSchema,
+      boolean isNullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _aggregationGroupByResult = aggregationGroupByResults;
     _dataSchema = dataSchema;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation group-by order-by result with {@link AggregationGroupByResult} and
    * with a collection of intermediate records.
    */
   public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
-      Collection<IntermediateRecord> intermediateRecords, DataSchema dataSchema) {
+      Collection<IntermediateRecord> intermediateRecords, DataSchema dataSchema, boolean isNullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _dataSchema = dataSchema;
     _intermediateRecords = intermediateRecords;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   public IntermediateResultsBlock(Table table) {
     _table = table;
-    _dataSchema = table.getDataSchema();
+    if (_table != null) {

Review Comment:
   @Jackie-Jiang This is from a test where all input values are nulls. In this case _table is null. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r907006878


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -311,16 +343,50 @@ private DataTable getResultDataTable()
       throws IOException {
     DataTableBuilder dataTableBuilder = DataTableFactory.getDataTableBuilder(_dataSchema);
     ColumnDataType[] storedColumnDataTypes = _dataSchema.getStoredColumnDataTypes();
+    int numColumns = _dataSchema.size();
     Iterator<Record> iterator = _table.iterator();
-    while (iterator.hasNext()) {
-      Record record = iterator.next();
-      dataTableBuilder.startRow();
-      int columnIndex = 0;
-      for (Object value : record.getValues()) {
-        setDataTableColumn(storedColumnDataTypes[columnIndex], dataTableBuilder, columnIndex, value);
-        columnIndex++;
+    RoaringBitmap[] nullBitmaps = null;
+    if (_isNullHandlingEnabled) {
+      nullBitmaps = new RoaringBitmap[numColumns];
+      Object[] colDefaultNullValues = new Object[numColumns];
+      for (int colId = 0; colId < numColumns; colId++) {
+        if (storedColumnDataTypes[colId] != ColumnDataType.OBJECT) {
+          colDefaultNullValues[colId] = FieldSpec.getDefaultNullValue(FieldSpec.FieldType.METRIC,

Review Comment:
   Changed to:
   ```
     // Store a dummy value that is both a valid numeric, and a valid hex encoded value.
     String specialVal = "30";
     colDefaultNullValues[colId] = storedColumnDataTypes[colId].toDataType().convert(specialVal);
   ```
   
   Setting it in `setDataTableColumn()` may hurt performance (need to check for null and type together). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r918243041


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/TransformBlockValSet.java:
##########
@@ -40,12 +44,40 @@
 public class TransformBlockValSet implements BlockValSet {
   private final ProjectionBlock _projectionBlock;
   private final TransformFunction _transformFunction;
+  private final RoaringBitmap _nullBitmap;
 
   private int[] _numMVEntries;
 
-  public TransformBlockValSet(ProjectionBlock projectionBlock, TransformFunction transformFunction) {
+  public TransformBlockValSet(ProjectionBlock projectionBlock, TransformFunction transformFunction,
+      ExpressionContext expression) {
     _projectionBlock = projectionBlock;
     _transformFunction = transformFunction;
+    RoaringBitmap nullBitmap = null;

Review Comment:
   I moved the logic to `getNullBitmap()` method. Please note that the `getNullBitmap()` method is called only when nullHandlingEnabled is set to true.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r922568228


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -52,6 +58,30 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataS
     _dataSource = dataSource;
   }
 
+  @Nullable
+  @Override
+  public RoaringBitmap getNullBitmap() {
+    if (!_nullBitmapSet) {
+      NullValueVectorReader nullValueReader = _dataSource.getNullValueVector();
+      ImmutableRoaringBitmap nullBitmap = nullValueReader != null ? nullValueReader.getNullBitmap() : null;
+      if (nullBitmap != null && nullBitmap.getCardinality() > 0) {

Review Comment:
   For better performance, same for other places where cardinality is calculated just to identify if the bitmap is empty
   ```suggestion
         if (nullBitmap != null && !nullBitmap.isEmpty()) {
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -390,28 +443,55 @@ private DataTable getAggregationResultDataTable()
       columnNames[i] = aggregationFunction.getColumnName();
       columnDataTypes[i] = aggregationFunction.getIntermediateResultColumnType();
     }
+    RoaringBitmap[] nullBitmaps = null;

Review Comment:
   Let's revert the changes in this method for now. We may re-introduce them in the PR to support null aggregation result



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -378,22 +486,30 @@ private int getKeyForValue(ByteArray value) {
   private static class IntGroupKeyIterator implements Iterator<GroupKey> {
     final Iterator<Int2IntMap.Entry> _iterator;
     final GroupKey _groupKey;
+    Integer _groupKeyForNullValue;
 
-    IntGroupKeyIterator(Int2IntOpenHashMap intMap) {
+    IntGroupKeyIterator(Int2IntOpenHashMap intMap, Integer groupKeyForNullValue) {
       _iterator = intMap.int2IntEntrySet().fastIterator();
       _groupKey = new GroupKey();
+      _groupKeyForNullValue = groupKeyForNullValue;
     }
 
     @Override
     public boolean hasNext() {
-      return _iterator.hasNext();
+      return _iterator.hasNext() || _groupKeyForNullValue != null;
     }
 
     @Override
     public GroupKey next() {
-      Int2IntMap.Entry entry = _iterator.next();
-      _groupKey._groupId = entry.getIntValue();
-      _groupKey._keys = new Object[]{entry.getIntKey()};
+      if (_iterator.hasNext()) {
+        Int2IntMap.Entry entry = _iterator.next();
+        _groupKey._groupId = entry.getIntValue();
+        _groupKey._keys = new Object[]{entry.getIntKey()};
+      } else if (_groupKeyForNullValue != null) {
+        _groupKey._groupId = _groupKeyForNullValue;
+        _groupKey._keys = new Object[]{null};
+        _groupKeyForNullValue = null;
+      }

Review Comment:
   Consider returning `null` group first to reduce the overhead, also add a TODO to revisit this because this one adds overhead to regular case without null handling
   ```suggestion
         if (_groupKeyForNullValue != null) {
           _groupKey._groupId = _groupKeyForNullValue;
           _groupKey._keys = new Object[]{null};
           _groupKeyForNullValue = null;
           return _groupKey;
         }
         Int2IntMap.Entry entry = _iterator.next();
         _groupKey._groupId = entry.getIntValue();
         _groupKey._keys = new Object[]{entry.getIntKey()};
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/util/QueryOptionsUtils.java:
##########
@@ -87,4 +88,12 @@ public static Integer getMinServerGroupTrimSize(Map<String, String> queryOptions
     String minServerGroupTrimSizeString = queryOptions.get(Request.QueryOptionKey.MIN_SERVER_GROUP_TRIM_SIZE);
     return minServerGroupTrimSizeString != null ? Integer.parseInt(minServerGroupTrimSizeString) : null;
   }
+
+  public static boolean isNullHandlingEnabled(Map<String, String> queryOptions) {
+    boolean nullHandlingEnabled = Boolean.parseBoolean(queryOptions.get(Request.QueryOptionKey.ENABLE_NULL_HANDLING));
+    if (nullHandlingEnabled) {
+      Preconditions.checkState(DataTableFactory.getDataTableVersion() >= DataTableFactory.VERSION_4);

Review Comment:
   Add some exception message such as: `Null handling cannot be enabled for data table version smaller than 4`



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpecUtils.java:
##########
@@ -0,0 +1,45 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data;
+
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+public class FieldSpecUtils {

Review Comment:
   Let's call it `NullValueUtils` and move it under `org.apache.pinot.spi.utils`



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -72,6 +80,13 @@ public int getGlobalGroupKeyUpperBound() {
   @Override
   public void generateKeysForBlock(TransformBlock transformBlock, int[] groupKeys) {
     BlockValSet blockValSet = transformBlock.getBlockValueSet(_groupByExpression);
+    if (_nullHandlingEnabled) {
+      RoaringBitmap nullBitmap = blockValSet.getNullBitmap();
+      if (nullBitmap != null) {

Review Comment:
   ```suggestion
         if (nullBitmap != null && !nullBitmap.isEmpty()) {
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/query/selection/SelectionOperatorService.java:
##########
@@ -147,11 +177,26 @@ public PriorityQueue<Object[]> getRows() {
    * Reduces a collection of {@link DataTable}s to selection rows for selection queries with <code>ORDER BY</code>.
    * (Broker side)
    */
-  public void reduceWithOrdering(Collection<DataTable> dataTables) {
+  public void reduceWithOrdering(Collection<DataTable> dataTables, boolean nullHandlingEnabled) {
+    RoaringBitmap[] nullBitmaps = null;
     for (DataTable dataTable : dataTables) {
+      if (nullHandlingEnabled) {
+        if (nullBitmaps == null) {
+          nullBitmaps = new RoaringBitmap[dataTable.getDataSchema().size()];
+        }
+        for (int colId = 0; colId < nullBitmaps.length; colId++) {
+          nullBitmaps[colId] = dataTable.getNullRowIds(colId);
+        }
+      }
       int numRows = dataTable.getNumberOfRows();
       for (int rowId = 0; rowId < numRows; rowId++) {
         Object[] row = SelectionOperatorUtils.extractRowFromDataTable(dataTable, rowId);
+        int len = nullBitmaps == null ? 0 : nullBitmaps.length;
+        for (int colId = 0; colId < len; colId++) {
+          if (nullBitmaps[colId] != null && nullBitmaps[colId].contains(rowId)) {
+            row[colId] = null;
+          }
+        }

Review Comment:
   ```suggestion
           if (nullHandlingEnabled) {
             for (int colId = 0; colId < nullBitmaps.length; colId++) {
               if (nullBitmaps[colId] != null && nullBitmaps[colId].contains(rowId)) {
                 row[colId] = null;
               }
             }
           }
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -122,6 +137,88 @@ public void generateKeysForBlock(TransformBlock transformBlock, int[] groupKeys)
     }
   }
 
+  public void generateKeysForBlockNullHandlingEnabled(TransformBlock transformBlock, int[] groupKeys,
+      RoaringBitmap nullBitmap) {
+    assert nullBitmap != null;
+    BlockValSet blockValSet = transformBlock.getBlockValueSet(_groupByExpression);
+    int numDocs = transformBlock.getNumDocs();
+
+    switch (_storedType) {
+      case INT:
+        int[] intValues = blockValSet.getIntValuesSV();
+        if (nullBitmap.getCardinality() < numDocs) {
+          for (int i = 0; i < numDocs; i++) {
+            groupKeys[i] = nullBitmap.contains(i) ? getKeyForNullValue() : getKeyForValue(intValues[i]);
+          }
+        } else if (numDocs > 0) {
+          Arrays.fill(groupKeys, getKeyForNullValue());

Review Comment:
   Same for other places
   ```suggestion
             Arrays.fill(groupKeys, getKeyForNullValue(), 0, numDocs);
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java:
##########
@@ -81,10 +83,13 @@ public DefaultGroupByExecutor(QueryContext queryContext, ExpressionContext[] gro
     // Initialize group key generator
     int numGroupsLimit = queryContext.getNumGroupsLimit();
     int maxInitialResultHolderCapacity = queryContext.getMaxInitialResultHolderCapacity();
-    if (hasNoDictionaryGroupByExpression) {
+    if (hasNoDictionaryGroupByExpression || _nullHandlingEnabled) {
       if (groupByExpressions.length == 1) {
+        // TODO(nhejazi): support MV and dictionary based. No dictionary performance is much worse than dictionary

Review Comment:
   (minor) Update the TODO a little bit to be more clear: `support MV and dictionary based when null handling is enabled.`



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -80,45 +83,64 @@ public IntermediateResultsBlock() {
   /**
    * Constructor for selection result.
    */
-  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult) {
+  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult,
+      boolean nullHandlingEnabled) {
     _dataSchema = dataSchema;
     _selectionResult = selectionResult;
+    _nullHandlingEnabled = nullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation result.
    * <p>For aggregation only, the result is a list of values.
    * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
    */
-  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult) {
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      boolean nullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _aggregationResult = aggregationResult;
+    _nullHandlingEnabled = nullHandlingEnabled;
+  }
+
+  /**
+   * Constructor for aggregation result.
+   * <p>For aggregation only, the result is a list of values.
+   * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
+   */
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,

Review Comment:
   Do we still need this new constructor? Not sure if we want to set the distinct table schema as the results block schema



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r917460044


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/TransformBlockValSet.java:
##########
@@ -40,12 +44,40 @@
 public class TransformBlockValSet implements BlockValSet {
   private final ProjectionBlock _projectionBlock;
   private final TransformFunction _transformFunction;
+  private final RoaringBitmap _nullBitmap;
 
   private int[] _numMVEntries;
 
-  public TransformBlockValSet(ProjectionBlock projectionBlock, TransformFunction transformFunction) {
+  public TransformBlockValSet(ProjectionBlock projectionBlock, TransformFunction transformFunction,
+      ExpressionContext expression) {
     _projectionBlock = projectionBlock;
     _transformFunction = transformFunction;
+    RoaringBitmap nullBitmap = null;

Review Comment:
   I went this way because it requires much less plumbing. Let me know if you feel strongly otherwise.
   Also, passing nullHandlingEnabled requires passing QueryContext across many layers. Happy to do so if you prefer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r917457959


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -50,6 +53,26 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataS
     _dataBlockCache = dataBlockCache;
     _column = column;
     _dataSource = dataSource;
+    NullValueVectorReader nullValueReader = _dataSource == null ? null : _dataSource.getNullValueVector();

Review Comment:
   Updated to the computation lazily. 
   
   Please note that passing whether query has null handling enabled is tedious (need to pass QueryContext through many classes). Let me know if you still prefer to pass it through.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r903173704


##########
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java:
##########
@@ -84,9 +84,20 @@ public TableResizer(DataSchema dataSchema, QueryContext queryContext) {
       _orderByValueExtractors[i] = getOrderByValueExtractor(orderByExpression.getExpression());
       comparators[i] = orderByExpression.isAsc() ? Comparator.naturalOrder() : Comparator.reverseOrder();
     }
+    // TODO: return a diff. comparator that does not handle nulls when nullHandlingEnabled is false.
     _intermediateRecordComparator = (o1, o2) -> {
       for (int i = 0; i < _numOrderByExpressions; i++) {
-        int result = comparators[i].compare(o1._values[i], o2._values[i]);
+        Object v1 = o1._values[i];
+        Object v2 = o2._values[i];
+        if (v1 == null) {
+          if (v2 != null) {
+            // The default null ordering is NULLS LAST, regardless of the ordering direction.
+            return 1;
+          }
+        } else if (v2 == null) {
+          return -1;
+        }
+        int result = comparators[i].compare(v1, v2);

Review Comment:
   It can cause NPE when both v1 and v2 are `null`. Also I don't think it is wired correctly, or some test should already fail



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/GroupByOrderByCombineOperator.java:
##########
@@ -235,11 +235,15 @@ protected IntermediateResultsBlock mergeResults()
     }
 
     IndexedTable indexedTable = _indexedTable;
-    indexedTable.finish(false);
+    if (indexedTable != null) {

Review Comment:
   I saw several extra `null` checks introduced. Want to understand why because most likely this is because something is not wired correctly



##########
pinot-core/src/main/java/org/apache/pinot/core/common/RowBasedBlockValueFetcher.java:
##########
@@ -43,6 +44,15 @@ public Object[] getRow(int docId) {
     return row;
   }
 
+  public RoaringBitmap getColumnNullBitmap(int colId) {

Review Comment:
   I don't think the changes in this file is required since we don't directly read `null` into the row. The nullBitmap is not in row format, so we should directly read it from `BlockValSet` on the caller side



##########
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java:
##########
@@ -84,9 +84,20 @@ public TableResizer(DataSchema dataSchema, QueryContext queryContext) {
       _orderByValueExtractors[i] = getOrderByValueExtractor(orderByExpression.getExpression());
       comparators[i] = orderByExpression.isAsc() ? Comparator.naturalOrder() : Comparator.reverseOrder();
     }
+    // TODO: return a diff. comparator that does not handle nulls when nullHandlingEnabled is false.

Review Comment:
   Please address this TODO because this can cause performance regression



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java:
##########
@@ -42,6 +42,8 @@ public interface DataTable {
 
   Map<Integer, String> getExceptions();
 
+  int getVersion();

Review Comment:
   (minor) Remove the override in `BaseDataTable`



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java:
##########
@@ -53,23 +53,20 @@ public RowDataBlock(ByteBuffer byteBuffer)
   public RoaringBitmap getNullRowIds(int colId) {
     // _fixedSizeData stores two ints per col's null bitmap: offset, and length.
     int position = _numRows * _rowSizeInBytes + colId * Integer.BYTES * 2;
-    if (position >= _fixedSizeData.limit()) {
+    if (_fixedSizeData == null || position >= _fixedSizeData.limit()) {
       return null;
     }
 
     _fixedSizeData.position(position);
     int offset = _fixedSizeData.getInt();
     int bytesLength = _fixedSizeData.getInt();
-    RoaringBitmap nullBitmap;
     if (bytesLength > 0) {
       _variableSizeData.position(offset);
       byte[] nullBitmapBytes = new byte[bytesLength];
       _variableSizeData.get(nullBitmapBytes);
-      nullBitmap = ObjectSerDeUtils.ROARING_BITMAP_SER_DE.deserialize(nullBitmapBytes);
-    } else {
-      nullBitmap = new RoaringBitmap();
+      return ObjectSerDeUtils.ROARING_BITMAP_SER_DE.deserialize(nullBitmapBytes);
     }
-    return nullBitmap;
+    return new RoaringBitmap();

Review Comment:
   Not introduced in this PR, but maybe we should return `null` here



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -80,45 +83,69 @@ public IntermediateResultsBlock() {
   /**
    * Constructor for selection result.
    */
-  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult) {
+  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult,
+      boolean isNullHandlingEnabled) {
     _dataSchema = dataSchema;
     _selectionResult = selectionResult;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation result.
    * <p>For aggregation only, the result is a list of values.
    * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
    */
-  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult) {
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      boolean isNullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _aggregationResult = aggregationResult;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
+  }
+
+  /**
+   * Constructor for aggregation result.
+   * <p>For aggregation only, the result is a list of values.
+   * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
+   */
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      DataSchema dataSchema, boolean isNullHandlingEnabled) {
+    _aggregationFunctions = aggregationFunctions;
+    _aggregationResult = aggregationResult;
+    _dataSchema = dataSchema;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation group-by order-by result with {@link AggregationGroupByResult}.
    */
   public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
-      @Nullable AggregationGroupByResult aggregationGroupByResults, DataSchema dataSchema) {
+      @Nullable AggregationGroupByResult aggregationGroupByResults, DataSchema dataSchema,
+      boolean isNullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _aggregationGroupByResult = aggregationGroupByResults;
     _dataSchema = dataSchema;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation group-by order-by result with {@link AggregationGroupByResult} and
    * with a collection of intermediate records.
    */
   public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
-      Collection<IntermediateRecord> intermediateRecords, DataSchema dataSchema) {
+      Collection<IntermediateRecord> intermediateRecords, DataSchema dataSchema, boolean isNullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _dataSchema = dataSchema;
     _intermediateRecords = intermediateRecords;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   public IntermediateResultsBlock(Table table) {
     _table = table;
-    _dataSchema = table.getDataSchema();
+    if (_table != null) {

Review Comment:
   Why do we need to add this check?



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -311,16 +343,50 @@ private DataTable getResultDataTable()
       throws IOException {
     DataTableBuilder dataTableBuilder = DataTableFactory.getDataTableBuilder(_dataSchema);
     ColumnDataType[] storedColumnDataTypes = _dataSchema.getStoredColumnDataTypes();
+    int numColumns = _dataSchema.size();
     Iterator<Record> iterator = _table.iterator();
-    while (iterator.hasNext()) {
-      Record record = iterator.next();
-      dataTableBuilder.startRow();
-      int columnIndex = 0;
-      for (Object value : record.getValues()) {
-        setDataTableColumn(storedColumnDataTypes[columnIndex], dataTableBuilder, columnIndex, value);
-        columnIndex++;
+    RoaringBitmap[] nullBitmaps = null;
+    if (_isNullHandlingEnabled) {
+      nullBitmaps = new RoaringBitmap[numColumns];
+      Object[] colDefaultNullValues = new Object[numColumns];
+      for (int colId = 0; colId < numColumns; colId++) {
+        if (storedColumnDataTypes[colId] != ColumnDataType.OBJECT) {
+          colDefaultNullValues[colId] = FieldSpec.getDefaultNullValue(FieldSpec.FieldType.METRIC,

Review Comment:
   Several data types are not supported as METRIC. We should allow setting `null` values in `setDataTableColumn()`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921826114


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -72,47 +78,122 @@ public int getGlobalGroupKeyUpperBound() {
   @Override
   public void generateKeysForBlock(TransformBlock transformBlock, int[] groupKeys) {
     BlockValSet blockValSet = transformBlock.getBlockValueSet(_groupByExpression);
+    RoaringBitmap nullBitmap = blockValSet.getNullBitmap();

Review Comment:
   indeed. Split into 2 helper methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r922583526


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -122,6 +137,88 @@ public void generateKeysForBlock(TransformBlock transformBlock, int[] groupKeys)
     }
   }
 
+  public void generateKeysForBlockNullHandlingEnabled(TransformBlock transformBlock, int[] groupKeys,
+      RoaringBitmap nullBitmap) {
+    assert nullBitmap != null;
+    BlockValSet blockValSet = transformBlock.getBlockValueSet(_groupByExpression);
+    int numDocs = transformBlock.getNumDocs();
+
+    switch (_storedType) {
+      case INT:
+        int[] intValues = blockValSet.getIntValuesSV();
+        if (nullBitmap.getCardinality() < numDocs) {
+          for (int i = 0; i < numDocs; i++) {
+            groupKeys[i] = nullBitmap.contains(i) ? getKeyForNullValue() : getKeyForValue(intValues[i]);
+          }
+        } else if (numDocs > 0) {
+          Arrays.fill(groupKeys, getKeyForNullValue());

Review Comment:
   you mean: `Arrays.fill(groupKeys, 0, numDocs, getKeyForNullValue());`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r922457807


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -72,47 +78,122 @@ public int getGlobalGroupKeyUpperBound() {
   @Override
   public void generateKeysForBlock(TransformBlock transformBlock, int[] groupKeys) {
     BlockValSet blockValSet = transformBlock.getBlockValueSet(_groupByExpression);
+    RoaringBitmap nullBitmap = blockValSet.getNullBitmap();
     int numDocs = transformBlock.getNumDocs();
 
     switch (_storedType) {
       case INT:
         int[] intValues = blockValSet.getIntValuesSV();
+        if (_nullHandlingEnabled) {
+          if (nullBitmap != null && nullBitmap.getCardinality() < numDocs) {
+            for (int i = 0; i < numDocs; i++) {
+              groupKeys[i] = nullBitmap.contains(i) ? getKeyForNullValue() : getKeyForValue(intValues[i]);
+            }
+          } else if (numDocs > 0) {
+            _groupIdForNullValue = _numGroups++;
+            Arrays.fill(groupKeys, _groupIdForNullValue);

Review Comment:
   This method is called per block instead of only once. In the previous block, the null key might already be assigned



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r922458480


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -80,45 +83,64 @@ public IntermediateResultsBlock() {
   /**
    * Constructor for selection result.
    */
-  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult) {
+  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult,
+      boolean nullHandlingEnabled) {
     _dataSchema = dataSchema;
     _selectionResult = selectionResult;
+    _nullHandlingEnabled = nullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation result.
    * <p>For aggregation only, the result is a list of values.
    * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
    */
-  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult) {
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      boolean nullHandlingEnabled) {

Review Comment:
   removed for now. Will add back in separate PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r922452500


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -80,45 +83,64 @@ public IntermediateResultsBlock() {
   /**
    * Constructor for selection result.
    */
-  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult) {
+  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult,
+      boolean nullHandlingEnabled) {
     _dataSchema = dataSchema;
     _selectionResult = selectionResult;
+    _nullHandlingEnabled = nullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation result.
    * <p>For aggregation only, the result is a list of values.
    * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
    */
-  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult) {
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      boolean nullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _aggregationResult = aggregationResult;
+    _nullHandlingEnabled = nullHandlingEnabled;
+  }
+
+  /**
+   * Constructor for aggregation result.
+   * <p>For aggregation only, the result is a list of values.
+   * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
+   */
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,

Review Comment:
   Yes, let's move the code unrelated to this PR into a separate one so that it only contains related changes for easier review and tracking. Without the context in the other PR, I cannot tell if this change is desired



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r911061582


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/TransformBlockValSet.java:
##########
@@ -40,12 +44,40 @@
 public class TransformBlockValSet implements BlockValSet {
   private final ProjectionBlock _projectionBlock;
   private final TransformFunction _transformFunction;
+  private final RoaringBitmap _nullBitmap;
 
   private int[] _numMVEntries;
 
-  public TransformBlockValSet(ProjectionBlock projectionBlock, TransformFunction transformFunction) {
+  public TransformBlockValSet(ProjectionBlock projectionBlock, TransformFunction transformFunction,
+      ExpressionContext expression) {
     _projectionBlock = projectionBlock;
     _transformFunction = transformFunction;
+    RoaringBitmap nullBitmap = null;

Review Comment:
   I feel this logic should be performed in the operator instead of in the value set. We need to compute the null bitmap only when the null handling is enabled



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -311,16 +337,51 @@ private DataTable getResultDataTable()
       throws IOException {
     DataTableBuilder dataTableBuilder = DataTableFactory.getDataTableBuilder(_dataSchema);
     ColumnDataType[] storedColumnDataTypes = _dataSchema.getStoredColumnDataTypes();
+    int numColumns = _dataSchema.size();
     Iterator<Record> iterator = _table.iterator();
-    while (iterator.hasNext()) {
-      Record record = iterator.next();
-      dataTableBuilder.startRow();
-      int columnIndex = 0;
-      for (Object value : record.getValues()) {
-        setDataTableColumn(storedColumnDataTypes[columnIndex], dataTableBuilder, columnIndex, value);
-        columnIndex++;
+    RoaringBitmap[] nullBitmaps = null;
+    if (_isNullHandlingEnabled) {
+      nullBitmaps = new RoaringBitmap[numColumns];
+      Object[] colDefaultNullValues = new Object[numColumns];
+      for (int colId = 0; colId < numColumns; colId++) {
+        if (storedColumnDataTypes[colId] != ColumnDataType.OBJECT) {
+          // Store a dummy value that is both a valid numeric, and a valid hex encoded value.
+          String specialVal = "30";
+          colDefaultNullValues[colId] = storedColumnDataTypes[colId].toDataType().convert(specialVal);
+        }
+        nullBitmaps[colId] = new RoaringBitmap();
+      }
+      int rowId = 0;
+      while (iterator.hasNext()) {
+        Object[] values = iterator.next().getValues();
+        dataTableBuilder.startRow();
+        for (int columnIndex = 0; columnIndex < values.length; columnIndex++) {
+          Object value = values[columnIndex];
+          if (value == null) {
+            value = colDefaultNullValues[columnIndex];
+            nullBitmaps[columnIndex].add(rowId);
+          }
+          setDataTableColumn(storedColumnDataTypes[columnIndex], dataTableBuilder, columnIndex, value);
+        }
+        dataTableBuilder.finishRow();
+        rowId++;
+      }
+    } else {
+      while (iterator.hasNext()) {
+        Record record = iterator.next();
+        dataTableBuilder.startRow();
+        int columnIndex = 0;
+        for (Object value : record.getValues()) {
+          setDataTableColumn(storedColumnDataTypes[columnIndex], dataTableBuilder, columnIndex, value);
+          columnIndex++;
+        }
+        dataTableBuilder.finishRow();
+      }
+    }
+    if (_isNullHandlingEnabled && DataTableFactory.getDataTableVersion() >= DataTableFactory.VERSION_4) {

Review Comment:
   This check should be performed much earlier, probably when the server just gets the query with null handling enabled



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -390,28 +452,57 @@ private DataTable getAggregationResultDataTable()
       columnNames[i] = aggregationFunction.getColumnName();
       columnDataTypes[i] = aggregationFunction.getIntermediateResultColumnType();
     }
+    RoaringBitmap[] nullBitmaps = null;
+    Object[] colDefaultNullValues = null;
+    if (_isNullHandlingEnabled) {
+      colDefaultNullValues = new Object[numAggregationFunctions];
+      nullBitmaps = new RoaringBitmap[numAggregationFunctions];
+      for (int i = 0; i < numAggregationFunctions; i++) {
+        if (columnDataTypes[i] != ColumnDataType.OBJECT) {
+          // Store a dummy value that is both a valid numeric, and a valid hex encoded value.
+          String specialVal = "30";
+          colDefaultNullValues[i] = columnDataTypes[i].toDataType().convert(specialVal);
+        }
+        nullBitmaps[i] = new RoaringBitmap();
+      }
+    }
 
     // Build the data table.
     DataTableBuilder dataTableBuilder =
         DataTableFactory.getDataTableBuilder(new DataSchema(columnNames, columnDataTypes));
     dataTableBuilder.startRow();
     for (int i = 0; i < numAggregationFunctions; i++) {
+      Object value = _aggregationResult.get(i);
+      // OBJECT (e.g. DistinctTable) calls toBytes() (e.g. DistinctTable.toBytes()) which takes care of replacing nulls
+      // with default values, and building presence vector and serializing both.
+      if (_isNullHandlingEnabled && columnDataTypes[i] != ColumnDataType.OBJECT) {

Review Comment:
   Should we skip the null bitmap update?
   Suggest moving the null handling check outside of the for loop



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -50,6 +53,26 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataS
     _dataBlockCache = dataBlockCache;
     _column = column;
     _dataSource = dataSource;
+    NullValueVectorReader nullValueReader = _dataSource == null ? null : _dataSource.getNullValueVector();

Review Comment:
   Data source should never be `null`. The computation should be lazy, and the operator should decide whether to read it based on if null handling is enabled



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -54,6 +55,7 @@
 @SuppressWarnings("rawtypes")
 public class IntermediateResultsBlock implements Block {
   private DataSchema _dataSchema;
+  private boolean _isNullHandlingEnabled;

Review Comment:
   (minor) Rename to `_nullHandlingEnabled`, same for other places



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/query/SelectionOrderByOperator.java:
##########
@@ -141,13 +143,21 @@ private Comparator<Object[]> getComparator() {
       multipliers[i] = _orderByExpressions.get(valueIndex).isAsc() ? -1 : 1;
     }
 
-    return (o1, o2) -> {
+    return (Object[] o1, Object[] o2) -> {
       for (int i = 0; i < numValuesToCompare; i++) {
         int index = valueIndices[i];
 
         // TODO: Evaluate the performance of casting to Comparable and avoid the switch
         Object v1 = o1[index];
         Object v2 = o2[index];
+        if (_isNullHandlingEnabled) {

Review Comment:
   Move this check outside



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -311,16 +337,51 @@ private DataTable getResultDataTable()
       throws IOException {
     DataTableBuilder dataTableBuilder = DataTableFactory.getDataTableBuilder(_dataSchema);
     ColumnDataType[] storedColumnDataTypes = _dataSchema.getStoredColumnDataTypes();
+    int numColumns = _dataSchema.size();
     Iterator<Record> iterator = _table.iterator();
-    while (iterator.hasNext()) {
-      Record record = iterator.next();
-      dataTableBuilder.startRow();
-      int columnIndex = 0;
-      for (Object value : record.getValues()) {
-        setDataTableColumn(storedColumnDataTypes[columnIndex], dataTableBuilder, columnIndex, value);
-        columnIndex++;
+    RoaringBitmap[] nullBitmaps = null;
+    if (_isNullHandlingEnabled) {
+      nullBitmaps = new RoaringBitmap[numColumns];
+      Object[] colDefaultNullValues = new Object[numColumns];
+      for (int colId = 0; colId < numColumns; colId++) {
+        if (storedColumnDataTypes[colId] != ColumnDataType.OBJECT) {
+          // Store a dummy value that is both a valid numeric, and a valid hex encoded value.
+          String specialVal = "30";
+          colDefaultNullValues[colId] = storedColumnDataTypes[colId].toDataType().convert(specialVal);
+        }
+        nullBitmaps[colId] = new RoaringBitmap();
+      }
+      int rowId = 0;
+      while (iterator.hasNext()) {
+        Object[] values = iterator.next().getValues();
+        dataTableBuilder.startRow();
+        for (int columnIndex = 0; columnIndex < values.length; columnIndex++) {
+          Object value = values[columnIndex];
+          if (value == null) {
+            value = colDefaultNullValues[columnIndex];
+            nullBitmaps[columnIndex].add(rowId);
+          }
+          setDataTableColumn(storedColumnDataTypes[columnIndex], dataTableBuilder, columnIndex, value);
+        }
+        dataTableBuilder.finishRow();
+        rowId++;
+      }
+    } else {
+      while (iterator.hasNext()) {
+        Record record = iterator.next();
+        dataTableBuilder.startRow();
+        int columnIndex = 0;
+        for (Object value : record.getValues()) {
+          setDataTableColumn(storedColumnDataTypes[columnIndex], dataTableBuilder, columnIndex, value);
+          columnIndex++;
+        }
+        dataTableBuilder.finishRow();
+      }
+    }
+    if (_isNullHandlingEnabled && DataTableFactory.getDataTableVersion() >= DataTableFactory.VERSION_4) {
+      for (int colId = 0; colId < numColumns; colId++) {

Review Comment:
   This part can be moved into the previous if check



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -390,28 +452,57 @@ private DataTable getAggregationResultDataTable()
       columnNames[i] = aggregationFunction.getColumnName();
       columnDataTypes[i] = aggregationFunction.getIntermediateResultColumnType();
     }
+    RoaringBitmap[] nullBitmaps = null;
+    Object[] colDefaultNullValues = null;
+    if (_isNullHandlingEnabled) {

Review Comment:
   Same comments as the previous method



##########
pinot-common/src/thrift/query.thrift:
##########
@@ -37,6 +37,7 @@ struct PinotQuery {
   11: optional map<string, string> queryOptions;
   12: optional bool explain;
   13: optional map<Expression, Expression> expressionOverrideHints;
+  14: optional bool nullHandlingEnabled;

Review Comment:
   Let's not add this as a separate thrift field. It can be added as a query option. See `QueryOptionsUtils` for more query option examples



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/streaming/StreamingSelectionOnlyCombineOperator.java:
##########
@@ -50,9 +50,7 @@ public class StreamingSelectionOnlyCombineOperator extends BaseCombineOperator {
   private static final String EXPLAIN_NAME = "SELECT_STREAMING_COMBINE";
 
   // Special IntermediateResultsBlock to indicate that this is the last results block for an operator
-  private static final IntermediateResultsBlock LAST_RESULTS_BLOCK =

Review Comment:
   We shouldn't need to change this



##########
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java:
##########
@@ -84,9 +84,23 @@ public TableResizer(DataSchema dataSchema, QueryContext queryContext) {
       _orderByValueExtractors[i] = getOrderByValueExtractor(orderByExpression.getExpression());
       comparators[i] = orderByExpression.isAsc() ? Comparator.naturalOrder() : Comparator.reverseOrder();
     }
+    boolean isNullHandlingEnabled = queryContext.isNullHandlingEnabled();
     _intermediateRecordComparator = (o1, o2) -> {
       for (int i = 0; i < _numOrderByExpressions; i++) {
-        int result = comparators[i].compare(o1._values[i], o2._values[i]);
+        Object v1 = o1._values[i];
+        Object v2 = o2._values[i];
+        if (isNullHandlingEnabled) {

Review Comment:
   We can move this check outside to avoid overhead:
   ```
   if (queryContext.isNullHandlingEnabled()) {
     _intermediateRecordComparator = ...
   } else {
     _intermediateRecordComparator = ...
   }



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java:
##########
@@ -49,27 +50,25 @@ public RowDataBlock(ByteBuffer byteBuffer)
     computeBlockObjectConstants();
   }
 
+  @Nullable
   @Override
   public RoaringBitmap getNullRowIds(int colId) {
     // _fixedSizeData stores two ints per col's null bitmap: offset, and length.
     int position = _numRows * _rowSizeInBytes + colId * Integer.BYTES * 2;
-    if (position >= _fixedSizeData.limit()) {
+    if (_fixedSizeData == null || position >= _fixedSizeData.limit()) {

Review Comment:
   (minor) The first `null` check seems redundant, as we don't have this check in other methods



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -145,6 +167,10 @@ public void setDataSchema(DataSchema dataSchema) {
     _dataSchema = dataSchema;
   }
 
+  public boolean isNullHandlingEnabled() {

Review Comment:
   Is this for testing only? Ideally this info shouldn't be retrieved from the intermediate result, but from the query context



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/TransformOperator.java:
##########
@@ -95,6 +96,14 @@ public TransformResultMetadata getResultMetadata(ExpressionContext expression) {
     return _transformFunctionMap.get(expression).getResultMetadata();
   }
 
+  public NullValueVectorReader getNullValueVectorReader(ExpressionContext expression) {

Review Comment:
   This method should be in the projection layer instead of the transform layer. In the transform layer, we should be able to resolve the null values for the expression, instead of the column



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/MinMaxValueBasedSelectionOrderByCombineOperator.java:
##########
@@ -60,9 +60,7 @@ public class MinMaxValueBasedSelectionOrderByCombineOperator extends BaseCombine
 
   // For min/max value based combine, when a thread detects that no more segments need to be processed, it inserts this
   // special IntermediateResultsBlock into the BlockingQueue to awake the main thread
-  private static final IntermediateResultsBlock LAST_RESULTS_BLOCK =
-      new IntermediateResultsBlock(new DataSchema(new String[0], new DataSchema.ColumnDataType[0]),
-          Collections.emptyList());
+  private final IntermediateResultsBlock _lastResultsBlock;

Review Comment:
   This shouldn't need to be changed



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -311,16 +337,51 @@ private DataTable getResultDataTable()
       throws IOException {
     DataTableBuilder dataTableBuilder = DataTableFactory.getDataTableBuilder(_dataSchema);
     ColumnDataType[] storedColumnDataTypes = _dataSchema.getStoredColumnDataTypes();
+    int numColumns = _dataSchema.size();
     Iterator<Record> iterator = _table.iterator();
-    while (iterator.hasNext()) {
-      Record record = iterator.next();
-      dataTableBuilder.startRow();
-      int columnIndex = 0;
-      for (Object value : record.getValues()) {
-        setDataTableColumn(storedColumnDataTypes[columnIndex], dataTableBuilder, columnIndex, value);
-        columnIndex++;
+    RoaringBitmap[] nullBitmaps = null;
+    if (_isNullHandlingEnabled) {
+      nullBitmaps = new RoaringBitmap[numColumns];
+      Object[] colDefaultNullValues = new Object[numColumns];
+      for (int colId = 0; colId < numColumns; colId++) {
+        if (storedColumnDataTypes[colId] != ColumnDataType.OBJECT) {
+          // Store a dummy value that is both a valid numeric, and a valid hex encoded value.
+          String specialVal = "30";
+          colDefaultNullValues[colId] = storedColumnDataTypes[colId].toDataType().convert(specialVal);

Review Comment:
   We can save the default value per type as constants instead of calculate on the fly. Using the default value for dimension should be good



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on PR #8927:
URL: https://github.com/apache/pinot/pull/8927#issuecomment-1211111773

   Related to #8697 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r918243552


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -50,6 +53,26 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataS
     _dataBlockCache = dataBlockCache;
     _column = column;
     _dataSource = dataSource;
+    NullValueVectorReader nullValueReader = _dataSource == null ? null : _dataSource.getNullValueVector();

Review Comment:
   Moved the logic to `getNullBitmap()` method. Please note that this method is called only when `nullHandlingEnabled` is set to true.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921754410


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -80,45 +83,64 @@ public IntermediateResultsBlock() {
   /**
    * Constructor for selection result.
    */
-  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult) {
+  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult,
+      boolean nullHandlingEnabled) {
     _dataSchema = dataSchema;
     _selectionResult = selectionResult;
+    _nullHandlingEnabled = nullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation result.
    * <p>For aggregation only, the result is a list of values.
    * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
    */
-  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult) {
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      boolean nullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _aggregationResult = aggregationResult;
+    _nullHandlingEnabled = nullHandlingEnabled;
+  }
+
+  /**
+   * Constructor for aggregation result.
+   * <p>For aggregation only, the result is a list of values.
+   * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
+   */
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,

Review Comment:
   This is needed as we will be supporting null handling in aggregation functions including Distinct in an upcoming PR. I can remove it from this PR if preferred.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921753864


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java:
##########
@@ -49,27 +50,25 @@ public RowDataBlock(ByteBuffer byteBuffer)
     computeBlockObjectConstants();
   }
 
+  @Nullable
   @Override
   public RoaringBitmap getNullRowIds(int colId) {
     // _fixedSizeData stores two ints per col's null bitmap: offset, and length.
     int position = _numRows * _rowSizeInBytes + colId * Integer.BYTES * 2;
-    if (position >= _fixedSizeData.limit()) {
+    if (_fixedSizeData == null || position >= _fixedSizeData.limit()) {

Review Comment:
   @Jackie-Jiang getNullRowIds can be called using the static method `DistinctTable fromByteBuffer(ByteBuffer byteBuffer)`. In this method, we don't have an easy way to figure out whether nullHandlingEnabled is true so we depends on calling `getNullRowIds` to figure out if we should handle nulls.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921754868


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -390,28 +446,55 @@ private DataTable getAggregationResultDataTable()
       columnNames[i] = aggregationFunction.getColumnName();
       columnDataTypes[i] = aggregationFunction.getIntermediateResultColumnType();
     }
+    RoaringBitmap[] nullBitmaps = null;

Review Comment:
   There is a separate PR for handling nulls in aggregation functions and we will need to handle nulls in aggregation intermediate results.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang merged pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged PR #8927:
URL: https://github.com/apache/pinot/pull/8927


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921833051


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -72,47 +78,122 @@ public int getGlobalGroupKeyUpperBound() {
   @Override
   public void generateKeysForBlock(TransformBlock transformBlock, int[] groupKeys) {
     BlockValSet blockValSet = transformBlock.getBlockValueSet(_groupByExpression);
+    RoaringBitmap nullBitmap = blockValSet.getNullBitmap();
     int numDocs = transformBlock.getNumDocs();
 
     switch (_storedType) {
       case INT:
         int[] intValues = blockValSet.getIntValuesSV();
+        if (_nullHandlingEnabled) {
+          if (nullBitmap != null && nullBitmap.getCardinality() < numDocs) {
+            for (int i = 0; i < numDocs; i++) {
+              groupKeys[i] = nullBitmap.contains(i) ? getKeyForNullValue() : getKeyForValue(intValues[i]);
+            }
+          } else if (numDocs > 0) {
+            _groupIdForNullValue = _numGroups++;
+            Arrays.fill(groupKeys, _groupIdForNullValue);

Review Comment:
   Not sure I follow. I will make the change but would like to understand under what circumstances there might be already an id associated w/ `null`? This code gets executed when input is all null values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921930370


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -50,6 +55,30 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataS
     _dataBlockCache = dataBlockCache;
     _column = column;
     _dataSource = dataSource;
+    _nullBitmapSet = false;

Review Comment:
   _nullBitmap is null by default and after initialization it can be either null or non-null. _nullBitmapSet is there to differentiate between _nullBitmap is null and initialized vs. not-initialized. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r917457959


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -50,6 +53,26 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataS
     _dataBlockCache = dataBlockCache;
     _column = column;
     _dataSource = dataSource;
+    NullValueVectorReader nullValueReader = _dataSource == null ? null : _dataSource.getNullValueVector();

Review Comment:
   Updated to do computation lazily, and removed null check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r916439619


##########
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java:
##########
@@ -84,9 +84,23 @@ public TableResizer(DataSchema dataSchema, QueryContext queryContext) {
       _orderByValueExtractors[i] = getOrderByValueExtractor(orderByExpression.getExpression());
       comparators[i] = orderByExpression.isAsc() ? Comparator.naturalOrder() : Comparator.reverseOrder();
     }
+    boolean isNullHandlingEnabled = queryContext.isNullHandlingEnabled();
     _intermediateRecordComparator = (o1, o2) -> {
       for (int i = 0; i < _numOrderByExpressions; i++) {
-        int result = comparators[i].compare(o1._values[i], o2._values[i]);
+        Object v1 = o1._values[i];
+        Object v2 = o2._values[i];
+        if (isNullHandlingEnabled) {

Review Comment:
   I did not do that at first because it requires writing duplicate code .. makes sense for perf reasons.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r916446120


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java:
##########
@@ -49,27 +50,25 @@ public RowDataBlock(ByteBuffer byteBuffer)
     computeBlockObjectConstants();
   }
 
+  @Nullable
   @Override
   public RoaringBitmap getNullRowIds(int colId) {
     // _fixedSizeData stores two ints per col's null bitmap: offset, and length.
     int position = _numRows * _rowSizeInBytes + colId * Integer.BYTES * 2;
-    if (position >= _fixedSizeData.limit()) {
+    if (_fixedSizeData == null || position >= _fixedSizeData.limit()) {

Review Comment:
   `DataTableUtilsTest.testBuildEmptyDataTable()` ends up calling `getNullRowIds` w/ a null `_fixedSizeData`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r917456694


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -390,28 +452,57 @@ private DataTable getAggregationResultDataTable()
       columnNames[i] = aggregationFunction.getColumnName();
       columnDataTypes[i] = aggregationFunction.getIntermediateResultColumnType();
     }
+    RoaringBitmap[] nullBitmaps = null;
+    Object[] colDefaultNullValues = null;
+    if (_isNullHandlingEnabled) {
+      colDefaultNullValues = new Object[numAggregationFunctions];
+      nullBitmaps = new RoaringBitmap[numAggregationFunctions];
+      for (int i = 0; i < numAggregationFunctions; i++) {
+        if (columnDataTypes[i] != ColumnDataType.OBJECT) {
+          // Store a dummy value that is both a valid numeric, and a valid hex encoded value.
+          String specialVal = "30";
+          colDefaultNullValues[i] = columnDataTypes[i].toDataType().convert(specialVal);
+        }
+        nullBitmaps[i] = new RoaringBitmap();
+      }
+    }
 
     // Build the data table.
     DataTableBuilder dataTableBuilder =
         DataTableFactory.getDataTableBuilder(new DataSchema(columnNames, columnDataTypes));
     dataTableBuilder.startRow();
     for (int i = 0; i < numAggregationFunctions; i++) {
+      Object value = _aggregationResult.get(i);
+      // OBJECT (e.g. DistinctTable) calls toBytes() (e.g. DistinctTable.toBytes()) which takes care of replacing nulls
+      // with default values, and building presence vector and serializing both.
+      if (_isNullHandlingEnabled && columnDataTypes[i] != ColumnDataType.OBJECT) {

Review Comment:
   We need to override null value before calling switch statement (casting to Number and calling longValue/doubleValue throws an exception).
   Btw, this method operators on aggregation results which has a very low dimension compared to the raw data dimension.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r917454162


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -311,16 +337,51 @@ private DataTable getResultDataTable()
       throws IOException {
     DataTableBuilder dataTableBuilder = DataTableFactory.getDataTableBuilder(_dataSchema);
     ColumnDataType[] storedColumnDataTypes = _dataSchema.getStoredColumnDataTypes();
+    int numColumns = _dataSchema.size();
     Iterator<Record> iterator = _table.iterator();
-    while (iterator.hasNext()) {
-      Record record = iterator.next();
-      dataTableBuilder.startRow();
-      int columnIndex = 0;
-      for (Object value : record.getValues()) {
-        setDataTableColumn(storedColumnDataTypes[columnIndex], dataTableBuilder, columnIndex, value);
-        columnIndex++;
+    RoaringBitmap[] nullBitmaps = null;
+    if (_isNullHandlingEnabled) {
+      nullBitmaps = new RoaringBitmap[numColumns];
+      Object[] colDefaultNullValues = new Object[numColumns];
+      for (int colId = 0; colId < numColumns; colId++) {
+        if (storedColumnDataTypes[colId] != ColumnDataType.OBJECT) {
+          // Store a dummy value that is both a valid numeric, and a valid hex encoded value.
+          String specialVal = "30";
+          colDefaultNullValues[colId] = storedColumnDataTypes[colId].toDataType().convert(specialVal);

Review Comment:
   I done this way because BIG_DECIMAL is not supported as a dimension/time data type. Will handle it separately.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921755353


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -80,45 +83,64 @@ public IntermediateResultsBlock() {
   /**
    * Constructor for selection result.
    */
-  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult) {
+  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult,
+      boolean nullHandlingEnabled) {
     _dataSchema = dataSchema;
     _selectionResult = selectionResult;
+    _nullHandlingEnabled = nullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation result.
    * <p>For aggregation only, the result is a list of values.
    * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
    */
-  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult) {
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      boolean nullHandlingEnabled) {

Review Comment:
   I have a separate changes for handling nulls in aggregations functions and we will need to handle nullable intermediate results. I can remove this constructor if you prefer from this PR for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921831490


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -51,16 +53,20 @@ public class NoDictionarySingleColumnGroupKeyGenerator implements GroupKeyGenera
   private final DataType _storedType;
   private final Map _groupKeyMap;
   private final int _globalGroupIdUpperBound;
+  private final boolean _nullHandlingEnabled;

Review Comment:
   Added TODO. Will address in next PR since this one became big.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r922455838


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -50,6 +55,30 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataS
     _dataBlockCache = dataBlockCache;
     _column = column;
     _dataSource = dataSource;
+    _nullBitmapSet = false;

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921819679


##########
pinot-core/src/main/java/org/apache/pinot/core/query/distinct/raw/RawDoubleSingleColumnDistinctOnlyExecutor.java:
##########
@@ -40,10 +42,15 @@ public boolean process(TransformBlock transformBlock) {
     int numDocs = transformBlock.getNumDocs();
     if (blockValueSet.isSingleValue()) {
       double[] values = blockValueSet.getDoubleValuesSV();
+      RoaringBitmap nullBitmap = blockValueSet.getNullBitmap();
       for (int i = 0; i < numDocs; i++) {
-        _valueSet.add(values[i]);
-        if (_valueSet.size() >= _limit) {
-          return true;
+        if (nullBitmap != null && nullBitmap.contains(i)) {
+          _numNulls = 1;
+        } else {
+          _valueSet.add(values[i]);
+          if (_valueSet.size() >= _limit - _numNulls) {
+            return true;
+          }
         }
       }
     } else {

Review Comment:
   I don't handle any MV columns yet. Added TODO to address later in separate PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r921821144


##########
pinot-core/src/main/java/org/apache/pinot/core/query/distinct/raw/RawDoubleSingleColumnDistinctOnlyExecutor.java:
##########
@@ -40,10 +42,15 @@ public boolean process(TransformBlock transformBlock) {
     int numDocs = transformBlock.getNumDocs();
     if (blockValueSet.isSingleValue()) {
       double[] values = blockValueSet.getDoubleValuesSV();
+      RoaringBitmap nullBitmap = blockValueSet.getNullBitmap();

Review Comment:
   Some distinct only, and distinct w/ order by classes share more code than others. We can keep it this way for now and add TODO to address later?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r906996320


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -80,45 +83,69 @@ public IntermediateResultsBlock() {
   /**
    * Constructor for selection result.
    */
-  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult) {
+  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult,
+      boolean isNullHandlingEnabled) {
     _dataSchema = dataSchema;
     _selectionResult = selectionResult;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation result.
    * <p>For aggregation only, the result is a list of values.
    * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
    */
-  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult) {
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      boolean isNullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _aggregationResult = aggregationResult;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
+  }
+
+  /**
+   * Constructor for aggregation result.
+   * <p>For aggregation only, the result is a list of values.
+   * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
+   */
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      DataSchema dataSchema, boolean isNullHandlingEnabled) {
+    _aggregationFunctions = aggregationFunctions;
+    _aggregationResult = aggregationResult;
+    _dataSchema = dataSchema;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation group-by order-by result with {@link AggregationGroupByResult}.
    */
   public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
-      @Nullable AggregationGroupByResult aggregationGroupByResults, DataSchema dataSchema) {
+      @Nullable AggregationGroupByResult aggregationGroupByResults, DataSchema dataSchema,
+      boolean isNullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _aggregationGroupByResult = aggregationGroupByResults;
     _dataSchema = dataSchema;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation group-by order-by result with {@link AggregationGroupByResult} and
    * with a collection of intermediate records.
    */
   public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
-      Collection<IntermediateRecord> intermediateRecords, DataSchema dataSchema) {
+      Collection<IntermediateRecord> intermediateRecords, DataSchema dataSchema, boolean isNullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _dataSchema = dataSchema;
     _intermediateRecords = intermediateRecords;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   public IntermediateResultsBlock(Table table) {
     _table = table;
-    _dataSchema = table.getDataSchema();
+    if (_table != null) {

Review Comment:
   Not needed anymore.



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/GroupByOrderByCombineOperator.java:
##########
@@ -235,11 +235,15 @@ protected IntermediateResultsBlock mergeResults()
     }
 
     IndexedTable indexedTable = _indexedTable;
-    indexedTable.finish(false);
+    if (indexedTable != null) {

Review Comment:
   Not needed anymore.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org