Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/11/09 22:10:52 UTC

[GitHub] [pinot] walterddr commented on a diff in pull request #9755: [multistage] leaf-node return data in non-serialized format

walterddr commented on code in PR #9755:
URL: https://github.com/apache/pinot/pull/9755#discussion_r1018462041


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/blocks/TransferableBlockUtils.java:
##########
@@ -59,37 +60,67 @@ public static boolean isNoOpBlock(TransferableBlock transferableBlock) {
   }
 
   /**
-   *  Split a block into multiple block so that each block size is within maxBlockSize.
-   *  Currently, we only support split for row type dataBlock.
-   *  For columnar data block, we return the original data block.
-   *  Metadata data block split is not supported.
+   * Split block into multiple blocks. Default without any clean up.
    *
-   *  When row size is greater than maxBlockSize, we pack each row as a separate block.
+   * @see TransferableBlockUtils#splitBlock(TransferableBlock, BaseDataBlock.Type, int, boolean)
    */
   public static List<TransferableBlock> splitBlock(TransferableBlock block, BaseDataBlock.Type type, int maxBlockSize) {
+    return splitBlock(block, type, maxBlockSize, false);
+  }
+
+  /**
+   *
+   *  Split a block into multiple block so that each block size is within maxBlockSize. Currently,
+   *  <ul>
+   *    <li>For row data block, we split for row type dataBlock.</li>
+   *    <li>For columnar data block, we return the original data block.</li>
+   *    <li>For metadata block, split is not supported.</li>
+   *  </ul>
+   *
+   * @param block the data block
+   * @param type type of block
+   * @param maxBlockSize Each chunk of data is less than maxBlockSize
+   * @param isCleanupRequired whether clean up is required, set to true if the block is constructed from leaf stage.
+   * @return a list of data block chunks
+   */
+  public static List<TransferableBlock> splitBlock(TransferableBlock block, BaseDataBlock.Type type, int maxBlockSize,
+      boolean isCleanupRequired) {
     List<TransferableBlock> blockChunks = new ArrayList<>();
     if (type != DataBlock.Type.ROW) {
       return Collections.singletonList(block);
     } else {
-      int rowSizeInBytes = ((RowDataBlock) block.getDataBlock()).getRowSizeInBytes();
-      int numRowsPerChunk = maxBlockSize / rowSizeInBytes;
+      int estimatedRowSizeInBytes = block.getDataSchema().getColumnNames().length * 8;

Review Comment:
   Yeah, we don't have to be precise here (for example, if there are variable-length fields, the estimate will never be precise)
   unless we check exactly how many bytes are written into the buffers, which seems overly complicated.
   
   But I will refactor out the constant.
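
For illustration only, here is a minimal sketch of what refactoring out the constant could look like. It is not the code from this PR: the class name `RowChunkingSketch`, the constant name `ESTIMATED_BYTES_PER_COLUMN`, and the helper methods are hypothetical, and rows are modeled as plain `Object[]` arrays instead of Pinot's `TransferableBlock`. The only parts taken from the diff are the 8-bytes-per-column estimate and the idea of dividing `maxBlockSize` by the estimated row size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; does not use any Pinot classes.
final class RowChunkingSketch {
  // Assumed name for the constant. The diff hard-codes 8 bytes per column.
  static final int ESTIMATED_BYTES_PER_COLUMN = 8;

  private RowChunkingSketch() {
  }

  // Rough row size: deliberately imprecise, since variable-length
  // fields cannot be sized without inspecting the written buffers.
  static int estimateRowSizeInBytes(int numColumns) {
    return numColumns * ESTIMATED_BYTES_PER_COLUMN;
  }

  // Number of rows per chunk, never less than one so that an oversized
  // row still ends up in its own chunk. Also guards against a zero-column
  // schema, which would otherwise divide by zero.
  static int numRowsPerChunk(int numColumns, int maxBlockSize) {
    int estimatedRowSize = Math.max(1, estimateRowSizeInBytes(numColumns));
    return Math.max(1, maxBlockSize / estimatedRowSize);
  }

  // Splits rows into consecutive chunks of at most numRowsPerChunk rows.
  static List<List<Object[]>> splitRows(List<Object[]> rows, int numColumns, int maxBlockSize) {
    int rowsPerChunk = numRowsPerChunk(numColumns, maxBlockSize);
    List<List<Object[]>> chunks = new ArrayList<>();
    for (int start = 0; start < rows.size(); start += rowsPerChunk) {
      int end = Math.min(rows.size(), start + rowsPerChunk);
      chunks.add(new ArrayList<>(rows.subList(start, end)));
    }
    return chunks;
  }
}
```

With four columns and a `maxBlockSize` of 64, the estimate is 32 bytes per row, so each chunk holds two rows. When `maxBlockSize` is smaller than the estimated row size, the sketch falls back to one row per chunk, which matches the behavior described in the removed Javadoc.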



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: commits-help@pinot.apache.org