Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/22 00:50:43 UTC

[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8927: Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY

Jackie-Jiang commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r903173704


##########
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java:
##########
@@ -84,9 +84,20 @@ public TableResizer(DataSchema dataSchema, QueryContext queryContext) {
       _orderByValueExtractors[i] = getOrderByValueExtractor(orderByExpression.getExpression());
       comparators[i] = orderByExpression.isAsc() ? Comparator.naturalOrder() : Comparator.reverseOrder();
     }
+    // TODO: return a diff. comparator that does not handle nulls when nullHandlingEnabled is false.
     _intermediateRecordComparator = (o1, o2) -> {
       for (int i = 0; i < _numOrderByExpressions; i++) {
-        int result = comparators[i].compare(o1._values[i], o2._values[i]);
+        Object v1 = o1._values[i];
+        Object v2 = o2._values[i];
+        if (v1 == null) {
+          if (v2 != null) {
+            // The default null ordering is NULLS LAST, regardless of the ordering direction.
+            return 1;
+          }
+        } else if (v2 == null) {
+          return -1;
+        }
+        int result = comparators[i].compare(v1, v2);

Review Comment:
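   This can cause an NPE when both v1 and v2 are `null`: neither branch returns, so execution falls through to `comparators[i].compare(null, null)`. Also, I don't think it is wired correctly; otherwise some test should already be failing.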
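
   A minimal sketch of one way to cover the both-null case, assuming the column should be treated as a tie so the loop moves on to the next ORDER BY expression. The class `NullsLastRowComparator` and the method `compareRows` are hypothetical stand-ins for the lambda in the diff, not Pinot code:

```java
import java.util.Comparator;

public class NullsLastRowComparator {
  // Hypothetical stand-in for the per-column loop in _intermediateRecordComparator.
  @SuppressWarnings({"rawtypes", "unchecked"})
  static int compareRows(Object[] row1, Object[] row2, Comparator[] comparators) {
    for (int i = 0; i < comparators.length; i++) {
      Object v1 = row1[i];
      Object v2 = row2[i];
      if (v1 == null) {
        if (v2 == null) {
          // Both values are null: treat this column as a tie and check the next one.
          continue;
        }
        // NULLS LAST, regardless of the ordering direction.
        return 1;
      }
      if (v2 == null) {
        return -1;
      }
      // Both values are non-null, so the column comparator is safe to call.
      int result = comparators[i].compare(v1, v2);
      if (result != 0) {
        return result;
      }
    }
    return 0;
  }
}
```

   With this shape the column comparator is never invoked with a `null` argument, and a row with a `null` sorts after a row with a value for both ascending and descending columns.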



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/GroupByOrderByCombineOperator.java:
##########
@@ -235,11 +235,15 @@ protected IntermediateResultsBlock mergeResults()
     }
 
     IndexedTable indexedTable = _indexedTable;
-    indexedTable.finish(false);
+    if (indexedTable != null) {

Review Comment:
   I see several extra `null` checks introduced. I'd like to understand why they are needed, because most likely they mean something is not wired correctly.



##########
pinot-core/src/main/java/org/apache/pinot/core/common/RowBasedBlockValueFetcher.java:
##########
@@ -43,6 +44,15 @@ public Object[] getRow(int docId) {
     return row;
   }
 
+  public RoaringBitmap getColumnNullBitmap(int colId) {

Review Comment:
   I don't think the changes in this file are required, since we don't directly read `null` into the row. The null bitmap is not in row format, so we should read it directly from `BlockValSet` on the caller side.



##########
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java:
##########
@@ -84,9 +84,20 @@ public TableResizer(DataSchema dataSchema, QueryContext queryContext) {
       _orderByValueExtractors[i] = getOrderByValueExtractor(orderByExpression.getExpression());
       comparators[i] = orderByExpression.isAsc() ? Comparator.naturalOrder() : Comparator.reverseOrder();
     }
+    // TODO: return a diff. comparator that does not handle nulls when nullHandlingEnabled is false.

Review Comment:
   Please address this TODO, because the extra `null` checks on every comparison can cause a performance regression when null handling is disabled.
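
   A rough sketch of what addressing the TODO could look like, assuming the flag is available when the comparators are built. `ComparatorSelectionSketch` and `columnComparator` are hypothetical names; only `Comparator.naturalOrder()`, `Comparator.reverseOrder()`, and `Comparator.nullsLast()` are real JDK calls:

```java
import java.util.Comparator;

public class ComparatorSelectionSketch {
  // Hypothetical factory: pick the comparator once, at construction time,
  // so the per-comparison path has no null checks when null handling is off.
  @SuppressWarnings({"rawtypes", "unchecked"})
  static Comparator<Object> columnComparator(boolean asc, boolean nullHandlingEnabled) {
    Comparator base = asc ? Comparator.naturalOrder() : Comparator.reverseOrder();
    // nullsLast keeps nulls at the end for both directions and treats two nulls as equal.
    return nullHandlingEnabled ? Comparator.nullsLast(base) : base;
  }
}
```

   The decision is made once per ORDER BY expression rather than once per comparison, so queries with null handling disabled keep the original comparator unchanged.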



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java:
##########
@@ -42,6 +42,8 @@ public interface DataTable {
 
   Map<Integer, String> getExceptions();
 
+  int getVersion();

Review Comment:
   (minor) Remove the override in `BaseDataTable`



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java:
##########
@@ -53,23 +53,20 @@ public RowDataBlock(ByteBuffer byteBuffer)
   public RoaringBitmap getNullRowIds(int colId) {
     // _fixedSizeData stores two ints per col's null bitmap: offset, and length.
     int position = _numRows * _rowSizeInBytes + colId * Integer.BYTES * 2;
-    if (position >= _fixedSizeData.limit()) {
+    if (_fixedSizeData == null || position >= _fixedSizeData.limit()) {
       return null;
     }
 
     _fixedSizeData.position(position);
     int offset = _fixedSizeData.getInt();
     int bytesLength = _fixedSizeData.getInt();
-    RoaringBitmap nullBitmap;
     if (bytesLength > 0) {
       _variableSizeData.position(offset);
       byte[] nullBitmapBytes = new byte[bytesLength];
       _variableSizeData.get(nullBitmapBytes);
-      nullBitmap = ObjectSerDeUtils.ROARING_BITMAP_SER_DE.deserialize(nullBitmapBytes);
-    } else {
-      nullBitmap = new RoaringBitmap();
+      return ObjectSerDeUtils.ROARING_BITMAP_SER_DE.deserialize(nullBitmapBytes);
     }
-    return nullBitmap;
+    return new RoaringBitmap();

Review Comment:
   Not introduced in this PR, but maybe we should return `null` here



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -80,45 +83,69 @@ public IntermediateResultsBlock() {
   /**
    * Constructor for selection result.
    */
-  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult) {
+  public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]> selectionResult,
+      boolean isNullHandlingEnabled) {
     _dataSchema = dataSchema;
     _selectionResult = selectionResult;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation result.
    * <p>For aggregation only, the result is a list of values.
    * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
    */
-  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult) {
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      boolean isNullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _aggregationResult = aggregationResult;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
+  }
+
+  /**
+   * Constructor for aggregation result.
+   * <p>For aggregation only, the result is a list of values.
+   * <p>For aggregation group-by, the result is a list of maps from group keys to aggregation values.
+   */
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions, List<Object> aggregationResult,
+      DataSchema dataSchema, boolean isNullHandlingEnabled) {
+    _aggregationFunctions = aggregationFunctions;
+    _aggregationResult = aggregationResult;
+    _dataSchema = dataSchema;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation group-by order-by result with {@link AggregationGroupByResult}.
    */
   public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
-      @Nullable AggregationGroupByResult aggregationGroupByResults, DataSchema dataSchema) {
+      @Nullable AggregationGroupByResult aggregationGroupByResults, DataSchema dataSchema,
+      boolean isNullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _aggregationGroupByResult = aggregationGroupByResults;
     _dataSchema = dataSchema;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   /**
    * Constructor for aggregation group-by order-by result with {@link AggregationGroupByResult} and
    * with a collection of intermediate records.
    */
   public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
-      Collection<IntermediateRecord> intermediateRecords, DataSchema dataSchema) {
+      Collection<IntermediateRecord> intermediateRecords, DataSchema dataSchema, boolean isNullHandlingEnabled) {
     _aggregationFunctions = aggregationFunctions;
     _dataSchema = dataSchema;
     _intermediateRecords = intermediateRecords;
+    _isNullHandlingEnabled = isNullHandlingEnabled;
   }
 
   public IntermediateResultsBlock(Table table) {
     _table = table;
-    _dataSchema = table.getDataSchema();
+    if (_table != null) {

Review Comment:
   Why do we need to add this check?



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -311,16 +343,50 @@ private DataTable getResultDataTable()
       throws IOException {
     DataTableBuilder dataTableBuilder = DataTableFactory.getDataTableBuilder(_dataSchema);
     ColumnDataType[] storedColumnDataTypes = _dataSchema.getStoredColumnDataTypes();
+    int numColumns = _dataSchema.size();
     Iterator<Record> iterator = _table.iterator();
-    while (iterator.hasNext()) {
-      Record record = iterator.next();
-      dataTableBuilder.startRow();
-      int columnIndex = 0;
-      for (Object value : record.getValues()) {
-        setDataTableColumn(storedColumnDataTypes[columnIndex], dataTableBuilder, columnIndex, value);
-        columnIndex++;
+    RoaringBitmap[] nullBitmaps = null;
+    if (_isNullHandlingEnabled) {
+      nullBitmaps = new RoaringBitmap[numColumns];
+      Object[] colDefaultNullValues = new Object[numColumns];
+      for (int colId = 0; colId < numColumns; colId++) {
+        if (storedColumnDataTypes[colId] != ColumnDataType.OBJECT) {
+          colDefaultNullValues[colId] = FieldSpec.getDefaultNullValue(FieldSpec.FieldType.METRIC,

Review Comment:
   Several data types are not supported as METRIC. We should allow setting `null` values in `setDataTableColumn()`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

