Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/09/29 02:34:22 UTC

[GitHub] [pinot] 61yao opened a new pull request, #9485: [multistage] [enhancement] Split row data block when row size is too large

61yao opened a new pull request, #9485:
URL: https://github.com/apache/pinot/pull/9485

   Split a data block when its size is too large (exceeds the gRPC message size limit).
   
   The limit is set to 4 MB for now.
   The block size is estimated from the row size in bytes.
   
   Splitting columnar blocks is not supported for now.
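
   The row-based split described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `RowChunker` and `splitRows` are hypothetical names, rows are modeled as a plain `List` instead of Pinot's data block types, and every row is assumed to have the same fixed size in bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pinot.
final class RowChunker {
  private RowChunker() {
  }

  // Splits rows into chunks whose estimated size (rows * rowSizeInBytes)
  // stays within maxBlockSizeInBytes. An empty input yields no chunks.
  static <T> List<List<T>> splitRows(List<T> rows, int rowSizeInBytes, int maxBlockSizeInBytes) {
    if (rowSizeInBytes <= 0) {
      throw new IllegalArgumentException("rowSizeInBytes must be positive, got " + rowSizeInBytes);
    }
    // Integer division: how many whole rows fit under the limit.
    int numRowsPerChunk = maxBlockSizeInBytes / rowSizeInBytes;
    if (numRowsPerChunk <= 0) {
      // A single row is already larger than the limit.
      throw new IllegalStateException("Row size exceeds the maximum block size");
    }
    List<List<T>> chunks = new ArrayList<>();
    int start = 0;
    while (start < rows.size()) {
      // Take at most numRowsPerChunk rows, fewer for the last chunk.
      int end = start + Math.min(numRowsPerChunk, rows.size() - start);
      chunks.add(new ArrayList<>(rows.subList(start, end)));
      start = end;
    }
    return chunks;
  }
}
```

   For example, with an 8-byte row and a 16-byte limit, five rows would be split into chunks of two, two, and one row.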
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on a diff in pull request #9485: [multistage] [enhancement] Split row data block when row size is too large

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on code in PR #9485:
URL: https://github.com/apache/pinot/pull/9485#discussion_r988357249


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/blocks/TransferableBlockUtils.java:
##########
@@ -43,4 +50,41 @@ public static TransferableBlock getErrorTransferableBlock(Map<Integer, String> e
   public static boolean isEndOfStream(TransferableBlock transferableBlock) {
     return transferableBlock.isEndOfStreamBlock();
   }
+
+  /**
+   *  Split a block into multiple blocks so that each block size is within maxBlockSize.
+   *  Currently, we only support split for row type dataBlock.
+   *  For columnar data block, we return the original data block.
+   *  Metadata data block split is not supported.
+   *
+   *  When row size is greater than maxBlockSize, we pack each row as a separate block.
+   */
+  public static List<TransferableBlock> splitBlock(TransferableBlock block, BaseDataBlock.Type type, int maxBlockSize) {
+    List<TransferableBlock> blockChunks = new ArrayList<>();
+    if (type != BaseDataBlock.Type.ROW) {
+      return Collections.singletonList(block);
+    } else {
+      int rowSizeInBytes = ((RowDataBlock) block.getDataBlock()).getRowSizeInBytes();
+      int numRowsPerChunk = maxBlockSize / rowSizeInBytes;
+      Preconditions.checkState(numRowsPerChunk > 0, "row size too large for query engine to handle, abort!");

Review Comment:
   Can we include the offending rowSize in the message/exception?
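
   One way the suggestion could look, as a sketch only: the check below reports both the row size and the limit in the exception message. `RowSizeCheck` and `rowsPerChunk` are hypothetical names, and plain Java exceptions are used so the example is self-contained. The diff above uses Guava's `Preconditions.checkState`, which also has an overload taking a message template with `%s` placeholders and arguments.

```java
// Hypothetical helper for illustration only; not part of Pinot.
final class RowSizeCheck {
  private RowSizeCheck() {
  }

  // Returns how many rows fit in one block, or fails with a message that
  // names the offending row size and the configured limit.
  static int rowsPerChunk(int rowSizeInBytes, int maxBlockSizeInBytes) {
    if (rowSizeInBytes <= 0) {
      // Guard against division by zero for empty or invalid rows.
      throw new IllegalArgumentException("rowSizeInBytes must be positive, got " + rowSizeInBytes);
    }
    int numRowsPerChunk = maxBlockSizeInBytes / rowSizeInBytes;
    if (numRowsPerChunk <= 0) {
      throw new IllegalStateException(String.format(
          "Row size (%d bytes) exceeds max block size (%d bytes); cannot split block",
          rowSizeInBytes, maxBlockSizeInBytes));
    }
    return numRowsPerChunk;
  }
}
```

   With the sizes in the message, a failing query can be traced to the row width without re-running it under a debugger.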





[GitHub] [pinot] siddharthteotia commented on pull request #9485: [multistage] [enhancement] Split row data block when row size is too large

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on PR #9485:
URL: https://github.com/apache/pinot/pull/9485#issuecomment-1268984164

   Maybe add a TODO to split columnar blocks as well in the future.




[GitHub] [pinot] walterddr merged pull request #9485: [multistage] [enhancement] Split row data block when row size is too large

Posted by GitBox <gi...@apache.org>.
walterddr merged PR #9485:
URL: https://github.com/apache/pinot/pull/9485




[GitHub] [pinot] siddharthteotia commented on pull request #9485: [multistage] [enhancement] Split row data block when row size is too large

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on PR #9485:
URL: https://github.com/apache/pinot/pull/9485#issuecomment-1268986365

   > To confirm: metadata blocks, error blocks, and EOS blocks are not split, right?
   
   Confirmed offline. Only row data blocks are split at this point.

