You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/09 16:31:32 UTC

[GitHub] [pinot] nizarhejazi opened a new pull request, #8872: DataTable V4 implementation with per-column null bitmaps

nizarhejazi opened a new pull request, #8872:
URL: https://github.com/apache/pinot/pull/8872

   - DataTable V4 implementation with per-column null bitmaps.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8872:
URL: https://github.com/apache/pinot/pull/8872#discussion_r899552287


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/DataBlockBuilder.java:
##########
@@ -289,7 +304,7 @@ public static ColumnarDataBlock buildFromColumns(List<Object[]> columns, DataSch
   }
 
   private static RowDataBlock buildRowBlock(DataBlockBuilder builder) {
-    return new RowDataBlock(builder._numRows, builder._dataSchema, builder._reverseDictionaryMap,
+    return new DataTableImplV4(builder._numRows, builder._dataSchema, builder._reverseDictionaryMap,

Review Comment:
   One issue here is that `getDataBlockVersionType()` is implemented differently between `RowDataBlock` (returns 1 now) and `DataTableImplV4` (returns 4). As a result the serialized version returned by `toBytes()` is different between the two. `DataTableFactory.getDataTable()` looks for version 4 explicitly to build `DataTableImplV4`.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8872:
URL: https://github.com/apache/pinot/pull/8872#discussion_r899552287


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/DataBlockBuilder.java:
##########
@@ -289,7 +304,7 @@ public static ColumnarDataBlock buildFromColumns(List<Object[]> columns, DataSch
   }
 
   private static RowDataBlock buildRowBlock(DataBlockBuilder builder) {
-    return new RowDataBlock(builder._numRows, builder._dataSchema, builder._reverseDictionaryMap,
+    return new DataTableImplV4(builder._numRows, builder._dataSchema, builder._reverseDictionaryMap,

Review Comment:
   One issue here is that `getDataBlockVersionType()` is implemented differently between `RowDataBlock` (returns 1 now) and `DataTableImplV4` (returns 4). As a result the serialized version returned by `toBytes()` is different between the two. `DataTableFactory.getDataTable()` looks for version 4 explicitly to build `DataTableImplV4`. Please check this line in testDataBlockBuilderV4() test:
   `    BaseDataBlock dataBlock = (BaseDataBlock) DataTableFactory.getDataTable(rowDataBlock.toBytes());`
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8872:
URL: https://github.com/apache/pinot/pull/8872#discussion_r899553126


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV4.java:
##########
@@ -48,7 +50,26 @@ public DataTableImplV4(int numRows, DataSchema dataSchema, Map<String, Map<Integ
   }
 
   @Override
-  protected int getDataBlockVersionType() {
+  public RoaringBitmap getNullRowIds(int colId) {
+    // _fixedSizeData stores two ints per col's null bitmap: offset, and length.

Review Comment:
   This method depends on `_rowSizeInBytes` which is only defined in `RowDataBlock`. Will move to `RowDataBlock`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8872:
URL: https://github.com/apache/pinot/pull/8872#discussion_r898871943


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/DataBlockFactory.java:
##########
@@ -0,0 +1,47 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datablock;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.pinot.core.common.datatable.DataTableBuilder;
+import org.apache.pinot.core.common.datatable.DataTableImplV4;
+
+
+public class DataBlockFactory {

Review Comment:
   this is used in `testDataBlockBuilderV4()` and it returns `BaseDataBlock` instead of `DataTable` so I can work with and test `DataBlock` directly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8872:
URL: https://github.com/apache/pinot/pull/8872#discussion_r899520561


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV4.java:
##########
@@ -48,7 +50,26 @@ public DataTableImplV4(int numRows, DataSchema dataSchema, Map<String, Map<Integ
   }
 
   @Override
-  protected int getDataBlockVersionType() {
+  public RoaringBitmap getNullRowIds(int colId) {
+    // _fixedSizeData stores two ints per col's null bitmap: offset, and length.

Review Comment:
   Move the implementation into the `BaseDataBlock`



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/DataBlockBuilder.java:
##########
@@ -289,7 +304,7 @@ public static ColumnarDataBlock buildFromColumns(List<Object[]> columns, DataSch
   }
 
   private static RowDataBlock buildRowBlock(DataBlockBuilder builder) {
-    return new RowDataBlock(builder._numRows, builder._dataSchema, builder._reverseDictionaryMap,
+    return new DataTableImplV4(builder._numRows, builder._dataSchema, builder._reverseDictionaryMap,

Review Comment:
   (minor) Should this be
   ```suggestion
       return new RowDataBlock(builder._numRows, builder._dataSchema, builder._reverseDictionaryMap,
   ```



##########
pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java:
##########
@@ -164,7 +170,7 @@ public void testAllDataTypes()
   }
 
   @Test
-  public void testV3V4Compatibility()
+  public void testV2V3Compatibility()

Review Comment:
   Can we revert the reordering of V2V3 and V3V4 compatibility test? It is really hard to track the changes here



##########
pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java:
##########
@@ -659,44 +849,57 @@ private void fillDataTableWithRandomData(DataTableBuilder dataTableBuilder,
       }
       dataTableBuilder.finishRow();
     }
+    if (nullBitmaps != null) {
+      for (int colId = 0; colId < numColumns; colId++) {
+        System.out.println("col ID: " + colId);

Review Comment:
   Remove this



##########
pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java:
##########
@@ -535,56 +574,207 @@ public void testDataTableMetadataBytesLayout()
     }
   }
 
+  private RowDataBlock getDataBlockFromRandomData(DataSchema schema, int numColumns)

Review Comment:
   I don't think this is needed. We should use `fillDataTableWithRandomData()`



##########
pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java:
##########
@@ -426,6 +429,40 @@ public void testV2V3Compatibility()
     Assert.assertEquals(newDataTable.getMetadata(), EXPECTED_METADATA);
   }
 
+  @Test
+  public void testDataBlockBuilderV4()

Review Comment:
   We should test `DataTableBuilder` here. `DataBlockBuilder` is used only for multi-stage engine, which is out of the scope of this test



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV4.java:
##########
@@ -48,7 +50,26 @@ public DataTableImplV4(int numRows, DataSchema dataSchema, Map<String, Map<Integ
   }
 
   @Override
-  protected int getDataBlockVersionType() {
+  public RoaringBitmap getNullRowIds(int colId) {
+    // _fixedSizeData stores two ints per col's null bitmap: offset, and length.
+    int position = _numRows * _rowSizeInBytes + colId * Integer.BYTES * 2;
+    _fixedSizeData.position(position);

Review Comment:
   We should check the length of the buffer. If `nullRowIds` are not passed, fixed size data length will be smaller than the position, and we should return `null` in that scenario



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8872:
URL: https://github.com/apache/pinot/pull/8872#discussion_r899555219


##########
pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java:
##########
@@ -535,56 +574,207 @@ public void testDataTableMetadataBytesLayout()
     }
   }
 
+  private RowDataBlock getDataBlockFromRandomData(DataSchema schema, int numColumns)

Review Comment:
   I added this to be able to test: `DataBlockBuilder.buildFromRows`. Will remove.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8872:
URL: https://github.com/apache/pinot/pull/8872#discussion_r899630994


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java:
##########
@@ -48,6 +50,29 @@ public RowDataBlock(ByteBuffer byteBuffer)
     computeBlockObjectConstants();
   }
 
+  @Override
+  public RoaringBitmap getNullRowIds(int colId) {

Review Comment:
   @Jackie-Jiang I mentioned in my other comment that `BaseDataBlock` has two implementations: `RowDataBlock` and `ColumnarDataBlock`. Only `RowDataBlock` has `_rowSizeInBytes` field which we use to calculate position.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #8872:
URL: https://github.com/apache/pinot/pull/8872#issuecomment-1156324780

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8872?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8872](https://codecov.io/gh/apache/pinot/pull/8872?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b43388c) into [master](https://codecov.io/gh/apache/pinot/commit/c802786ea95cff67b83ff4d24f796b965e565854?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c802786) will **decrease** coverage by `54.91%`.
   > The diff coverage is `0.49%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #8872       +/-   ##
   =============================================
   - Coverage     69.78%   14.87%   -54.92%     
   + Complexity     4679      170     -4509     
   =============================================
     Files          1808     1763       -45     
     Lines         94235    92367     -1868     
     Branches      14052    13854      -198     
   =============================================
   - Hits          65765    13739    -52026     
   - Misses        23908    77637    +53729     
   + Partials       4562      991     -3571     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.87% <0.49%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8872?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../java/org/apache/pinot/common/utils/DataTable.java](https://codecov.io/gh/apache/pinot/pull/8872/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVRhYmxlLmphdmE=) | `0.00% <ø> (-95.13%)` | :arrow_down: |
   | [.../org/apache/pinot/core/common/NullBitmapUtils.java](https://codecov.io/gh/apache/pinot/pull/8872/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vTnVsbEJpdG1hcFV0aWxzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...org/apache/pinot/core/common/ObjectSerDeUtils.java](https://codecov.io/gh/apache/pinot/pull/8872/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vT2JqZWN0U2VyRGVVdGlscy5qYXZh) | `0.00% <0.00%> (-90.39%)` | :arrow_down: |
   | [...che/pinot/core/common/datablock/BaseDataBlock.java](https://codecov.io/gh/apache/pinot/pull/8872/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YWJsb2NrL0Jhc2VEYXRhQmxvY2suamF2YQ==) | `0.00% <0.00%> (-76.10%)` | :arrow_down: |
   | [...pinot/core/common/datablock/ColumnarDataBlock.java](https://codecov.io/gh/apache/pinot/pull/8872/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YWJsb2NrL0NvbHVtbmFyRGF0YUJsb2NrLmphdmE=) | `0.00% <ø> (-58.63%)` | :arrow_down: |
   | [.../pinot/core/common/datablock/DataBlockBuilder.java](https://codecov.io/gh/apache/pinot/pull/8872/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YWJsb2NrL0RhdGFCbG9ja0J1aWxkZXIuamF2YQ==) | `0.00% <0.00%> (-74.25%)` | :arrow_down: |
   | [.../pinot/core/common/datablock/DataBlockFactory.java](https://codecov.io/gh/apache/pinot/pull/8872/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YWJsb2NrL0RhdGFCbG9ja0ZhY3RvcnkuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...che/pinot/core/common/datablock/MetadataBlock.java](https://codecov.io/gh/apache/pinot/pull/8872/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YWJsb2NrL01ldGFkYXRhQmxvY2suamF2YQ==) | `0.00% <ø> (-55.56%)` | :arrow_down: |
   | [...ache/pinot/core/common/datablock/RowDataBlock.java](https://codecov.io/gh/apache/pinot/pull/8872/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YWJsb2NrL1Jvd0RhdGFCbG9jay5qYXZh) | `0.00% <ø> (-66.67%)` | :arrow_down: |
   | [...che/pinot/core/common/datatable/BaseDataTable.java](https://codecov.io/gh/apache/pinot/pull/8872/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0Jhc2VEYXRhVGFibGUuamF2YQ==) | `0.00% <0.00%> (-81.96%)` | :arrow_down: |
   | ... and [1390 more](https://codecov.io/gh/apache/pinot/pull/8872/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8872?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8872?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [c802786...b43388c](https://codecov.io/gh/apache/pinot/pull/8872?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] nizarhejazi commented on a diff in pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
nizarhejazi commented on code in PR #8872:
URL: https://github.com/apache/pinot/pull/8872#discussion_r899556640


##########
pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java:
##########
@@ -426,6 +429,40 @@ public void testV2V3Compatibility()
     Assert.assertEquals(newDataTable.getMetadata(), EXPECTED_METADATA);
   }
 
+  @Test
+  public void testDataBlockBuilderV4()

Review Comment:
   DataTableBuilder is already tested by `testV3V4Compatibility`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on PR #8872:
URL: https://github.com/apache/pinot/pull/8872#issuecomment-1211111444

   Related to #8697 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on a diff in pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on code in PR #8872:
URL: https://github.com/apache/pinot/pull/8872#discussion_r894889021


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV4.java:
##########
@@ -0,0 +1,384 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayOutputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+/**
+ * Datatable V4 implementation.
+ * The layout of serialized V4 datatable looks like:
+ * +-----------------------------------------------+
+ * | 17 integers of header:                        |
+ * | VERSION                                       |
+ * | NUM_ROWS                                      |
+ * | NUM_COLUMNS                                   |
+ * | EXCEPTIONS SECTION START OFFSET               |
+ * | EXCEPTIONS SECTION LENGTH                     |
+ * | DICTIONARY_MAP SECTION START OFFSET           |
+ * | DICTIONARY_MAP SECTION LENGTH                 |
+ * | DATA_SCHEMA SECTION START OFFSET              |
+ * | DATA_SCHEMA SECTION LENGTH                    |
+ * | FIXED_SIZE_DATA SECTION START OFFSET          |
+ * | FIXED_SIZE_DATA SECTION LENGTH                |
+ * | VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * | VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * | FIXED_SIZE_NULL_VECTOR SECTION START OFFSET   |
+ * | FIXED_SIZE_NULL_VECTOR SECTION LENGTH         |
+ * | VARIABLE_SIZE_NULL_VECTOR SECTION START OFFSET|
+ * | VARIABLE_SIZE_NULL_VECTOR SECTION LENGTH      |
+ * +-----------------------------------------------+
+ * | EXCEPTIONS SECTION                            |
+ * +-----------------------------------------------+
+ * | DICTIONARY_MAP SECTION                        |
+ * +-----------------------------------------------+
+ * | DATA_SCHEMA SECTION                           |
+ * +-----------------------------------------------+
+ * | FIXED_SIZE_DATA SECTION                       |
+ * +-----------------------------------------------+
+ * | VARIABLE_SIZE_DATA SECTION                    |
+ * +-----------------------------------------------+
+ * | FIXED_SIZE_NULL_VECTOR SECTION                |
+ * +-----------------------------------------------+
+ * | VARIABLE_SIZE_VECTOR SECTION SECTION          |
+ * +-----------------------------------------------+
+ * | METADATA LENGTH                               |
+ * | METADATA SECTION                              |
+ * +-----------------------------------------------+
+ */
+public class DataTableImplV4 extends DataTableImplV3 {
+  private static final int HEADER_SIZE = Integer.BYTES * 17;
+  // _errCodeToExceptionMap stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _errCodeToExceptionMap;
+  protected byte[] _fixedSizeNullVectorBytes;
+  protected ByteBuffer _fixedSizeNullVectorData;
+  protected byte[] _variableSizeNullVectorBytes;
+  protected ByteBuffer _variableSizeNullVectorData;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV4(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes, byte[] fixedSizeNullVectorBytes,
+      byte[] variableSizeNullVectorBytes) {
+    super(numRows, dataSchema, dictionaryMap, fixedSizeDataBytes, variableSizeDataBytes);
+    _errCodeToExceptionMap = new HashMap<>();
+    _fixedSizeNullVectorBytes = fixedSizeNullVectorBytes;
+    _fixedSizeNullVectorData = ByteBuffer.wrap(fixedSizeNullVectorBytes);
+    _variableSizeNullVectorBytes = variableSizeNullVectorBytes;
+    _variableSizeNullVectorData = ByteBuffer.wrap(variableSizeNullVectorBytes);
+  }
+
+  @Override
+  public MutableRoaringBitmap getNullRowIds(int colId) {
+    // _fixedSizeNullVectorData stores two ints per col: offset, and length.
+    if (_fixedSizeNullVectorData != null) {
+      _fixedSizeNullVectorData.position(colId * Integer.BYTES * 2);
+      int offset = _fixedSizeNullVectorData.getInt();
+      int bitmapLength = _fixedSizeNullVectorData.getInt();
+      MutableRoaringBitmap mutableRoaringBitmap = new MutableRoaringBitmap();
+      if (bitmapLength > 0) {
+        _variableSizeNullVectorData.position(offset);
+        for (int i = 0; i < bitmapLength; i++) {
+          mutableRoaringBitmap.add(_variableSizeNullVectorData.getInt());
+        }
+      }
+      return mutableRoaringBitmap;
+    }
+    return null;
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV4() {
+    _errCodeToExceptionMap = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV4(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+    int fixedSizeNullVectorStart = byteBuffer.getInt();
+    int fixedSizeNullVectorLength = byteBuffer.getInt();
+    int variableSizeNullVectorStart = byteBuffer.getInt();
+    int variableSizeNullVectorLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byteBuffer.position(exceptionsStart);
+      _errCodeToExceptionMap = deserializeExceptions(byteBuffer);
+    } else {
+      _errCodeToExceptionMap = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byteBuffer.position(dictionaryMapStart);
+      _dictionaryMap = deserializeDictionaryMap(byteBuffer);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byteBuffer.position(dataSchemaStart);
+      _dataSchema = DataSchema.fromBytes(byteBuffer);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read fixed size null vector data.
+    if (fixedSizeNullVectorLength != 0) {
+      _fixedSizeNullVectorBytes = new byte[fixedSizeNullVectorLength];
+      byteBuffer.position(fixedSizeNullVectorStart);
+      byteBuffer.get(_fixedSizeNullVectorBytes);
+      _fixedSizeNullVectorData = ByteBuffer.wrap(_fixedSizeNullVectorBytes);
+    } else {
+      _fixedSizeNullVectorBytes = null;
+      _fixedSizeNullVectorData = null;
+    }
+
+    // Read variable size null vector data.
+    if (variableSizeNullVectorLength != 0) {
+      _variableSizeNullVectorBytes = new byte[variableSizeNullVectorLength];
+      byteBuffer.position(variableSizeNullVectorStart);
+      byteBuffer.get(_variableSizeNullVectorBytes);
+      _variableSizeNullVectorData = ByteBuffer.wrap(_variableSizeNullVectorBytes);
+    } else {
+      _variableSizeNullVectorBytes = null;
+      _variableSizeNullVectorData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    if (metadataLength != 0) {
+      _metadata = deserializeMetadata(byteBuffer);
+    }
+  }
+
+  @Override
+  public int getVersion() {
+    return DataTableBuilder.VERSION_4;
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _errCodeToExceptionMap.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public void addException(int errCode, String errMsg) {
+    _errCodeToExceptionMap.put(errCode, errMsg);
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _errCodeToExceptionMap;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    ThreadTimer threadTimer = new ThreadTimer();
+
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    writeLeadingSections(dataOutputStream);
+
+    // Add table serialization time metadata if thread timer is enabled.
+    if (ThreadTimer.isThreadCpuTimeMeasurementEnabled()) {
+      long responseSerializationCpuTimeNs = threadTimer.getThreadTimeNs();
+      getMetadata().put(MetadataKey.RESPONSE_SER_CPU_TIME_NS.getName(), String.valueOf(responseSerializationCpuTimeNs));
+    }
+
+    // Write metadata: length followed by actual metadata bytes.
+    // NOTE: We ignore metadata serialization time in "responseSerializationCpuTimeNs" as it's negligible while
+    // considering it will bring a lot code complexity.
+    byte[] metadataBytes = serializeMetadata();
+    dataOutputStream.writeInt(metadataBytes.length);
+    dataOutputStream.write(metadataBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  @Override
+  public DataTableImplV4 toMetadataOnlyDataTable() {
+    DataTableImplV4 metadataOnlyDataTable = new DataTableImplV4();
+    metadataOnlyDataTable._metadata.putAll(_metadata);
+    metadataOnlyDataTable._errCodeToExceptionMap.putAll(_errCodeToExceptionMap);
+    return metadataOnlyDataTable;
+  }
+
+  @Override
+  public DataTableImplV4 toDataOnlyDataTable() {
+    return new DataTableImplV4(_numRows, _dataSchema, _dictionaryMap, _fixedSizeDataBytes, _variableSizeDataBytes,
+        _fixedSizeNullVectorBytes, _variableSizeNullVectorBytes);
+  }
+
+  private void writeLeadingSections(DataOutputStream dataOutputStream)
+      throws IOException {
+    dataOutputStream.writeInt(DataTableBuilder.VERSION_4);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions(_errCodeToExceptionMap);
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary map section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap();
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size null vector section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeNullVectorBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeNullVectorBytes.length);
+      dataOffset += _fixedSizeNullVectorBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size null vector section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeNullVectorBytes != null) {
+      dataOutputStream.writeInt(_variableSizeNullVectorBytes.length);
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write actual data.
+    // Write exceptions bytes.
+    dataOutputStream.write(exceptionsBytes);
+    // Write dictionary map bytes.
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    // Write data schema bytes.
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    // Write fixed size data bytes.
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    // Write variable size data bytes.
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+    // Write fixed size null vector bytes.
+    if (_fixedSizeNullVectorBytes != null) {
+      dataOutputStream.write(_fixedSizeNullVectorBytes);
+    }
+    // Write variable size null vector bytes.
+    if (_variableSizeNullVectorBytes != null) {

Review Comment:
   I think this should be inside the if branch for _fixedSizeNullVectorBytes ? It's all or nothing IIUC. Either the null bitmap information (including both bitmap length and then docIds) is sent or none of them is sent. In fact, we should assert if one of them is null and other is not



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV4.java:
##########
@@ -0,0 +1,384 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayOutputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+/**
+ * Datatable V4 implementation.
+ * The layout of serialized V4 datatable looks like:
+ * +-----------------------------------------------+
+ * | 17 integers of header:                        |
+ * | VERSION                                       |
+ * | NUM_ROWS                                      |
+ * | NUM_COLUMNS                                   |
+ * | EXCEPTIONS SECTION START OFFSET               |
+ * | EXCEPTIONS SECTION LENGTH                     |
+ * | DICTIONARY_MAP SECTION START OFFSET           |
+ * | DICTIONARY_MAP SECTION LENGTH                 |
+ * | DATA_SCHEMA SECTION START OFFSET              |
+ * | DATA_SCHEMA SECTION LENGTH                    |
+ * | FIXED_SIZE_DATA SECTION START OFFSET          |
+ * | FIXED_SIZE_DATA SECTION LENGTH                |
+ * | VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * | VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * | FIXED_SIZE_NULL_VECTOR SECTION START OFFSET   |
+ * | FIXED_SIZE_NULL_VECTOR SECTION LENGTH         |
+ * | VARIABLE_SIZE_NULL_VECTOR SECTION START OFFSET|
+ * | VARIABLE_SIZE_NULL_VECTOR SECTION LENGTH      |
+ * +-----------------------------------------------+
+ * | EXCEPTIONS SECTION                            |
+ * +-----------------------------------------------+
+ * | DICTIONARY_MAP SECTION                        |
+ * +-----------------------------------------------+
+ * | DATA_SCHEMA SECTION                           |
+ * +-----------------------------------------------+
+ * | FIXED_SIZE_DATA SECTION                       |
+ * +-----------------------------------------------+
+ * | VARIABLE_SIZE_DATA SECTION                    |
+ * +-----------------------------------------------+
+ * | FIXED_SIZE_NULL_VECTOR SECTION                |
+ * +-----------------------------------------------+
+ * | VARIABLE_SIZE_VECTOR SECTION SECTION          |
+ * +-----------------------------------------------+
+ * | METADATA LENGTH                               |
+ * | METADATA SECTION                              |
+ * +-----------------------------------------------+
+ */
+public class DataTableImplV4 extends DataTableImplV3 {
+  private static final int HEADER_SIZE = Integer.BYTES * 17;
+  // _errCodeToExceptionMap stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _errCodeToExceptionMap;
+  protected byte[] _fixedSizeNullVectorBytes;
+  protected ByteBuffer _fixedSizeNullVectorData;
+  protected byte[] _variableSizeNullVectorBytes;
+  protected ByteBuffer _variableSizeNullVectorData;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV4(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes, byte[] fixedSizeNullVectorBytes,
+      byte[] variableSizeNullVectorBytes) {
+    super(numRows, dataSchema, dictionaryMap, fixedSizeDataBytes, variableSizeDataBytes);
+    _errCodeToExceptionMap = new HashMap<>();
+    _fixedSizeNullVectorBytes = fixedSizeNullVectorBytes;
+    _fixedSizeNullVectorData = ByteBuffer.wrap(fixedSizeNullVectorBytes);
+    _variableSizeNullVectorBytes = variableSizeNullVectorBytes;
+    _variableSizeNullVectorData = ByteBuffer.wrap(variableSizeNullVectorBytes);
+  }
+
+  @Override
+  public MutableRoaringBitmap getNullRowIds(int colId) {
+    // _fixedSizeNullVectorData stores two ints per col: offset, and length.
+    if (_fixedSizeNullVectorData != null) {
+      _fixedSizeNullVectorData.position(colId * Integer.BYTES * 2);
+      int offset = _fixedSizeNullVectorData.getInt();
+      int bitmapLength = _fixedSizeNullVectorData.getInt();
+      MutableRoaringBitmap mutableRoaringBitmap = new MutableRoaringBitmap();
+      if (bitmapLength > 0) {
+        _variableSizeNullVectorData.position(offset);
+        for (int i = 0; i < bitmapLength; i++) {
+          mutableRoaringBitmap.add(_variableSizeNullVectorData.getInt());
+        }
+      }
+      return mutableRoaringBitmap;
+    }
+    return null;
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV4() {
+    _errCodeToExceptionMap = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV4(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+    int fixedSizeNullVectorStart = byteBuffer.getInt();
+    int fixedSizeNullVectorLength = byteBuffer.getInt();
+    int variableSizeNullVectorStart = byteBuffer.getInt();
+    int variableSizeNullVectorLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byteBuffer.position(exceptionsStart);
+      _errCodeToExceptionMap = deserializeExceptions(byteBuffer);
+    } else {
+      _errCodeToExceptionMap = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byteBuffer.position(dictionaryMapStart);
+      _dictionaryMap = deserializeDictionaryMap(byteBuffer);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byteBuffer.position(dataSchemaStart);
+      _dataSchema = DataSchema.fromBytes(byteBuffer);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read fixed size null vector data.
+    if (fixedSizeNullVectorLength != 0) {
+      _fixedSizeNullVectorBytes = new byte[fixedSizeNullVectorLength];
+      byteBuffer.position(fixedSizeNullVectorStart);
+      byteBuffer.get(_fixedSizeNullVectorBytes);
+      _fixedSizeNullVectorData = ByteBuffer.wrap(_fixedSizeNullVectorBytes);
+    } else {
+      _fixedSizeNullVectorBytes = null;
+      _fixedSizeNullVectorData = null;
+    }
+
+    // Read variable size null vector data.
+    if (variableSizeNullVectorLength != 0) {
+      _variableSizeNullVectorBytes = new byte[variableSizeNullVectorLength];
+      byteBuffer.position(variableSizeNullVectorStart);
+      byteBuffer.get(_variableSizeNullVectorBytes);
+      _variableSizeNullVectorData = ByteBuffer.wrap(_variableSizeNullVectorBytes);
+    } else {
+      _variableSizeNullVectorBytes = null;
+      _variableSizeNullVectorData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    if (metadataLength != 0) {
+      _metadata = deserializeMetadata(byteBuffer);
+    }
+  }
+
+  @Override
+  public int getVersion() {
+    return DataTableBuilder.VERSION_4;
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _errCodeToExceptionMap.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public void addException(int errCode, String errMsg) {
+    _errCodeToExceptionMap.put(errCode, errMsg);
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _errCodeToExceptionMap;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    ThreadTimer threadTimer = new ThreadTimer();
+
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    writeLeadingSections(dataOutputStream);
+
+    // Add table serialization time metadata if thread timer is enabled.
+    if (ThreadTimer.isThreadCpuTimeMeasurementEnabled()) {
+      long responseSerializationCpuTimeNs = threadTimer.getThreadTimeNs();
+      getMetadata().put(MetadataKey.RESPONSE_SER_CPU_TIME_NS.getName(), String.valueOf(responseSerializationCpuTimeNs));
+    }
+
+    // Write metadata: length followed by actual metadata bytes.
+    // NOTE: We ignore metadata serialization time in "responseSerializationCpuTimeNs" as it's negligible while
+    // considering it will bring a lot code complexity.
+    byte[] metadataBytes = serializeMetadata();
+    dataOutputStream.writeInt(metadataBytes.length);
+    dataOutputStream.write(metadataBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  @Override
+  public DataTableImplV4 toMetadataOnlyDataTable() {
+    DataTableImplV4 metadataOnlyDataTable = new DataTableImplV4();
+    metadataOnlyDataTable._metadata.putAll(_metadata);
+    metadataOnlyDataTable._errCodeToExceptionMap.putAll(_errCodeToExceptionMap);
+    return metadataOnlyDataTable;
+  }
+
+  @Override
+  public DataTableImplV4 toDataOnlyDataTable() {
+    return new DataTableImplV4(_numRows, _dataSchema, _dictionaryMap, _fixedSizeDataBytes, _variableSizeDataBytes,
+        _fixedSizeNullVectorBytes, _variableSizeNullVectorBytes);
+  }
+
+  private void writeLeadingSections(DataOutputStream dataOutputStream)
+      throws IOException {
+    dataOutputStream.writeInt(DataTableBuilder.VERSION_4);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions(_errCodeToExceptionMap);
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary map section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap();
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size null vector section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeNullVectorBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeNullVectorBytes.length);
+      dataOffset += _fixedSizeNullVectorBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size null vector section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeNullVectorBytes != null) {
+      dataOutputStream.writeInt(_variableSizeNullVectorBytes.length);
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write actual data.
+    // Write exceptions bytes.
+    dataOutputStream.write(exceptionsBytes);
+    // Write dictionary map bytes.
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    // Write data schema bytes.
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    // Write fixed size data bytes.
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    // Write variable size data bytes.
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+    // Write fixed size null vector bytes.
+    if (_fixedSizeNullVectorBytes != null) {
+      dataOutputStream.write(_fixedSizeNullVectorBytes);
+    }
+    // Write variable size null vector bytes.
+    if (_variableSizeNullVectorBytes != null) {

Review Comment:
   I think this should be inside the if branch for `_fixedSizeNullVectorBytes` ? It's all or nothing IIUC. Either the null bitmap information (including both bitmap length and then docIds) is sent or none of them is sent. In fact, we should assert if one of them is null and other is not



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8872:
URL: https://github.com/apache/pinot/pull/8872#discussion_r898573213


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV4.java:
##########
@@ -19,36 +19,306 @@
 
 package org.apache.pinot.core.common.datatable;
 
+import java.io.DataOutputStream;
 import java.io.IOException;
 import java.nio.ByteBuffer;
+import java.util.HashMap;
 import java.util.Map;
 import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.core.common.datablock.DataBlockUtils;
 import org.apache.pinot.core.common.datablock.RowDataBlock;
 import org.apache.pinot.spi.annotations.InterfaceStability;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
 
 
 /**
- * Datatable V4 Implementation is a wrapper around the Row-based data block.
+ * Datatable V4 implementation is a wrapper around the Row-based data block.
+ * The layout of serialized V4 datatable looks like:
+ * +-----------------------------------------------+
+ * | 17 integers of header:                        |
+ * | VERSION                                       |
+ * | NUM_ROWS                                      |
+ * | NUM_COLUMNS                                   |
+ * | EXCEPTIONS SECTION START OFFSET               |
+ * | EXCEPTIONS SECTION LENGTH                     |
+ * | DICTIONARY_MAP SECTION START OFFSET           |
+ * | DICTIONARY_MAP SECTION LENGTH                 |
+ * | DATA_SCHEMA SECTION START OFFSET              |
+ * | DATA_SCHEMA SECTION LENGTH                    |
+ * | FIXED_SIZE_DATA SECTION START OFFSET          |
+ * | FIXED_SIZE_DATA SECTION LENGTH                |
+ * | VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * | VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * | FIXED_SIZE_NULL_VECTOR SECTION START OFFSET   |
+ * | FIXED_SIZE_NULL_VECTOR SECTION LENGTH         |
+ * | VARIABLE_SIZE_NULL_VECTOR SECTION START OFFSET|
+ * | VARIABLE_SIZE_NULL_VECTOR SECTION LENGTH      |
+ * +-----------------------------------------------+
+ * | EXCEPTIONS SECTION                            |
+ * +-----------------------------------------------+
+ * | DICTIONARY_MAP SECTION                        |
+ * +-----------------------------------------------+
+ * | DATA_SCHEMA SECTION                           |
+ * +-----------------------------------------------+
+ * | FIXED_SIZE_DATA SECTION                       |
+ * +-----------------------------------------------+
+ * | VARIABLE_SIZE_DATA SECTION                    |
+ * +-----------------------------------------------+
+ * | FIXED_SIZE_NULL_VECTOR SECTION                |
+ * +-----------------------------------------------+
+ * | VARIABLE_SIZE_VECTOR SECTION SECTION          |
+ * +-----------------------------------------------+
+ * | METADATA LENGTH                               |
+ * | METADATA SECTION                              |
+ * +-----------------------------------------------+
  */
 @InterfaceStability.Evolving
 public class DataTableImplV4 extends RowDataBlock {
+  private static final int HEADER_SIZE = Integer.BYTES * 17;
+  protected byte[] _fixedSizeNullVectorBytes;
+  protected ByteBuffer _fixedSizeNullVectorData;
+  protected byte[] _variableSizeNullVectorBytes;
+  protected ByteBuffer _variableSizeNullVectorData;
+
+  public DataTableImplV4(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,

Review Comment:
   Move the changes to `BaseDataBlock`



##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -119,7 +119,8 @@ public enum ObjectType {
     LongLongPair(28),
     FloatLongPair(29),
     DoubleLongPair(30),
-    StringLongPair(31);
+    StringLongPair(31),
+    Missing(100);

Review Comment:
   Let's name it `Null` for clarity



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java:
##########
@@ -76,6 +77,8 @@ byte[] toBytes()
 
   String[] getStringArray(int rowId, int colId);
 
+  MutableRoaringBitmap getNullRowIds(int colId);

Review Comment:
   Suggest returning `RoaringBitmap` which has better performance



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV4.java:
##########
@@ -19,36 +19,306 @@
 
 package org.apache.pinot.core.common.datatable;
 
+import java.io.DataOutputStream;
 import java.io.IOException;
 import java.nio.ByteBuffer;
+import java.util.HashMap;
 import java.util.Map;
 import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.core.common.datablock.DataBlockUtils;
 import org.apache.pinot.core.common.datablock.RowDataBlock;
 import org.apache.pinot.spi.annotations.InterfaceStability;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
 
 
 /**
- * Datatable V4 Implementation is a wrapper around the Row-based data block.
+ * Datatable V4 implementation is a wrapper around the Row-based data block.
+ * The layout of serialized V4 datatable looks like:
+ * +-----------------------------------------------+
+ * | 17 integers of header:                        |
+ * | VERSION                                       |
+ * | NUM_ROWS                                      |
+ * | NUM_COLUMNS                                   |
+ * | EXCEPTIONS SECTION START OFFSET               |
+ * | EXCEPTIONS SECTION LENGTH                     |
+ * | DICTIONARY_MAP SECTION START OFFSET           |
+ * | DICTIONARY_MAP SECTION LENGTH                 |
+ * | DATA_SCHEMA SECTION START OFFSET              |
+ * | DATA_SCHEMA SECTION LENGTH                    |
+ * | FIXED_SIZE_DATA SECTION START OFFSET          |
+ * | FIXED_SIZE_DATA SECTION LENGTH                |
+ * | VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * | VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * | FIXED_SIZE_NULL_VECTOR SECTION START OFFSET   |

Review Comment:
   We can put the bitmaps into existing `fixedSizeData` and `variableSizeData`. If no bitmap is provided, we don't write anything. If some bitmaps are provided, we store 2 ints per column in the fixed size data (variable size data buffer offset & length). On the reader side, we can derive whether null bitmap is available by checking the size of the fixed size data. The overhead would be minimal if null handling is not enabled, where the data table will be exactly the same as before.



##########
pinot-core/src/main/java/org/apache/pinot/core/common/NullBitmapUtils.java:
##########
@@ -0,0 +1,45 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common;
+
+import java.io.DataOutputStream;
+import java.io.IOException;
+import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
+
+
+public final class NullBitmapUtils {

Review Comment:
   If we switch to use `RoaringBitmap`, we already have a ser-de: `ObjectSerDeUtils.ROARING_BITMAP_SER_DE`



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/DataBlockFactory.java:
##########
@@ -0,0 +1,47 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datablock;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.pinot.core.common.datatable.DataTableBuilder;
+import org.apache.pinot.core.common.datatable.DataTableImplV4;
+
+
+public class DataBlockFactory {

Review Comment:
   Do we need this factory? We should use `DataTableFactory`



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/DataBlockBuilder.java:
##########
@@ -76,7 +83,13 @@ private DataBlockBuilder(DataSchema dataSchema, BaseDataBlock.Type blockType) {
     }
   }
 
-  public static RowDataBlock buildFromRows(List<Object[]> rows, DataSchema dataSchema)
+  public void setNullRowIds(ImmutableRoaringBitmap nullBitmap)
+      throws IOException {
+    NullBitmapUtils.setNullRowIds(nullBitmap, _fixedSizeNullVectorOutputStream, _variableSizeNullVectorOutputStream);
+  }
+
+  public static RowDataBlock buildFromRows(List<Object[]> rows, ImmutableRoaringBitmap[] colNullBitmaps,

Review Comment:
   ```suggestion
     public static RowDataBlock buildFromRows(List<Object[]> rows, @Nullable ImmutableRoaringBitmap[] colNullBitmaps,
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -131,6 +132,10 @@ public int getValue() {
     }
 
     public static ObjectType getObjectType(Object value) {

Review Comment:
   ```suggestion
       public static ObjectType getObjectType(@Nullable Object value) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang merged pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged PR #8872:
URL: https://github.com/apache/pinot/pull/8872


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8872:
URL: https://github.com/apache/pinot/pull/8872#discussion_r899389643


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/DataBlockFactory.java:
##########
@@ -0,0 +1,47 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datablock;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.pinot.core.common.datatable.DataTableBuilder;
+import org.apache.pinot.core.common.datatable.DataTableImplV4;
+
+
+public class DataBlockFactory {

Review Comment:
   If it is not used in production code, I'd suggest not adding it for now. You may use `DataTableFactory` and cast the `DataTable` to a `BaseDataBlock`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8872: DataTable V4 implementation with per-column null bitmaps

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8872:
URL: https://github.com/apache/pinot/pull/8872#discussion_r899573706


##########
pinot-core/src/main/java/org/apache/pinot/core/common/NullBitmapUtils.java:
##########
@@ -0,0 +1,51 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import org.roaringbitmap.RoaringBitmap;
+
+
+public final class NullBitmapUtils {
+  private NullBitmapUtils() {
+  }
+
+  public static void setNullRowIds(RoaringBitmap nullBitmap, ByteArrayOutputStream fixedSizeByteArrayOutputStream,
+      ByteArrayOutputStream variableSizeDataByteArrayOutputStream)
+      throws IOException {
+    if (nullBitmap != null) {

Review Comment:
   Since we didn't store the `colId` of the null bitmap, we have to write all bitmaps. When the bitmap is null or empty, we can write the offset and 0



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/BaseDataBlock.java:
##########
@@ -203,7 +204,7 @@ public BaseDataBlock(ByteBuffer byteBuffer)
    * Return the int serialized form of the data block version and type.
    * @return
    */
-  protected abstract int getDataBlockVersionType();
+  public abstract int getDataBlockVersionType();

Review Comment:
   We might not want to expose this method. It is for internal usage only



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java:
##########
@@ -48,6 +50,29 @@ public RowDataBlock(ByteBuffer byteBuffer)
     computeBlockObjectConstants();
   }
 
+  @Override
+  public RoaringBitmap getNullRowIds(int colId) {

Review Comment:
   This should be part of the `BaseDataBlock`



##########
pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java:
##########
@@ -175,11 +177,7 @@ public void testV3V4Compatibility()
 
     DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
 
-    // TODO: verify data table compatibility across multi-stage and normal query engine.

Review Comment:
   Should we keep the TODOs? I don't think it is resolved yet



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/DataBlockBuilder.java:
##########
@@ -289,7 +304,7 @@ public static ColumnarDataBlock buildFromColumns(List<Object[]> columns, DataSch
   }
 
   private static RowDataBlock buildRowBlock(DataBlockBuilder builder) {
-    return new RowDataBlock(builder._numRows, builder._dataSchema, builder._reverseDictionaryMap,
+    return new DataTableImplV4(builder._numRows, builder._dataSchema, builder._reverseDictionaryMap,

Review Comment:
   I think the issue is from using `DataBlockBuilder` to build data table in the test. If we switch to use `DataTableBuilder` to build data table, it should work. Changing this can break the data block code. cc @walterddr 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org