You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/06/23 12:07:31 UTC

[GitHub] [pulsar] BewareMyPower opened a new pull request, #16196: [fix][Java Client] Fix large message sometimes cannot be split into hunks after PIP-132

BewareMyPower opened a new pull request, #16196:
URL: https://github.com/apache/pulsar/pull/16196

   Fixes https://github.com/apache/pulsar/issues/16195
   
   ### Motivation
   
   [PIP-132](https://github.com/apache/pulsar/pull/14007) considers the
   message metadata size when computing the payload chunk size and the
   number of chunks. However, it could make some messages whose size is
   less than `maxMessageSize` cannot be sent. There are two reasons:
   1. The `MessageMetadata` will be updated after computing the payload
      chunk size, i.e. the actual metadata size would be greater.
   2. `OpSendMsg#getMessageHeaderAndPayloadSize` doesn't exclude all bytes
      other than the metadata and payload, e.g. the 4 bytes checksum field.
   
   For example, if the max message size is 100, send a string whose size is
   60 with chunking enabled.
   1. The initial metadata size is 25 so the chunk size is 75, the message
      won't be spit into chunks.
   2. After `serializeAndSendMessage`, the metadata size becomes 32, so the
      serialized header's total size is 4 + 8 + 6 + 4 + 32 = 54, and the
     total size is 54 + 60 = 114, see `headerContentSize` in
     `serializeCommandSendWithSize`.
   3. In `getMessageHeaderAndPayloadSize`, the returned value is computed
      by 114 - 8 - 4 = 102 > 100. The 6 bytes magic and checksum and 4
      bytes metadata length field are not included.
   
   ### Modifications
   
   - Update the message metadata before computing the chunk size.
   - Compute the correct size in `getMessageHeaderAndPayloadSize`.
   
   ### Verifying this change
   
   Add `testChunkSize` to verify all sizes in range [1, maxMessageSize] can
   be sent successfully when chunking is enabled.
   
   ### Documentation
   
   Check the box below or label this PR directly.
   
   Need to update docs? 
   
   - [ ] `doc-required` 
   (Your PR needs to update docs and you will update later)
     
   - [x] `doc-not-needed` 
   (Please explain why)
     
   - [ ] `doc` 
   (Your PR contains doc changes)
   
   - [ ] `doc-complete`
   (Docs have been already added)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] BewareMyPower commented on a diff in pull request #16196: [fix][Java Client] Fix large message sometimes cannot be split into chunks after PIP-132

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on code in PR #16196:
URL: https://github.com/apache/pulsar/pull/16196#discussion_r906124600


##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java:
##########
@@ -488,6 +488,10 @@ public void sendAsync(Message<?> message, SendCallback callback) {
             return;
         }
 
+        // Update the message metadata before computing the payload chunk size to avoid a large message cannot be split
+        // into chunks.
+        final long sequenceId = updateMessageMetadata(msgMetadata, uncompressedSize);

Review Comment:
   Yes, we can skip `isMessageSizeExceeded` if client chunking is enabled. But I think the existing changes in this PR make the computation of the payload chunk size **more accurate**.
   
   Regarding to the metadata update after the computation,
   
   >  msgMetadata.setUuid(...) and msgMetadata.setChunkId().....
   
   These metadata fields are updated only for chunks:
   
   ```protobuf
       optional string uuid = 26;
       optional int32 num_chunks_from_msg = 27;
       optional int32 total_chunk_msg_size = 28;
       optional int32 chunk_id = 29;
   ```
   
   In this case, `isMessageSizeExceeded` is already skipped.
   
   And yes, `ProducerImpl#encryptMessage` could update the metadata as well. I think it's better to skip the message size check when encryption is enabled, because message encryption could also increase the size of the payload buffer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Jason918 commented on a diff in pull request #16196: [fix][Java Client] Fix large message sometimes cannot be split into chunks after PIP-132

Posted by GitBox <gi...@apache.org>.
Jason918 commented on code in PR #16196:
URL: https://github.com/apache/pulsar/pull/16196#discussion_r906066253


##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java:
##########
@@ -488,6 +488,10 @@ public void sendAsync(Message<?> message, SendCallback callback) {
             return;
         }
 
+        // Update the message metadata before computing the payload chunk size to avoid a large message cannot be split
+        // into chunks.
+        final long sequenceId = updateMessageMetadata(msgMetadata, uncompressedSize);

Review Comment:
   There are still some metadata updates after this method, like `msgMetadata.setUuid(...)` and `msgMetadata.setChunkId().....` and even in `ProducerImpl#encryptMessage`.
   
   Can we just skip `ProducerImpl#isMessageSizeExceeded` if client chunking is enabled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] BewareMyPower merged pull request #16196: [fix][Java Client] Fix large message sometimes cannot be split into chunks after PIP-132

Posted by GitBox <gi...@apache.org>.
BewareMyPower merged PR #16196:
URL: https://github.com/apache/pulsar/pull/16196


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] BewareMyPower commented on a diff in pull request #16196: [fix][Java Client] Fix large message sometimes cannot be split into chunks after PIP-132

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on code in PR #16196:
URL: https://github.com/apache/pulsar/pull/16196#discussion_r906126527


##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java:
##########
@@ -488,6 +488,10 @@ public void sendAsync(Message<?> message, SendCallback callback) {
             return;
         }
 
+        // Update the message metadata before computing the payload chunk size to avoid a large message cannot be split
+        // into chunks.
+        final long sequenceId = updateMessageMetadata(msgMetadata, uncompressedSize);

Review Comment:
   IMO, it's also better to make it clear which metadata fields can be set before checking if the size exceeds the limit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Jason918 commented on a diff in pull request #16196: [fix][Java Client] Fix large message sometimes cannot be split into chunks after PIP-132

Posted by GitBox <gi...@apache.org>.
Jason918 commented on code in PR #16196:
URL: https://github.com/apache/pulsar/pull/16196#discussion_r906648231


##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java:
##########
@@ -2214,8 +2233,8 @@ private void recoverProcessOpSendMsgFrom(ClientCnx cnx, MessageImpl from, long e
     /**
      *  Check if final message size for non-batch and non-chunked messages is larger than max message size.
      */
-    public boolean isMessageSizeExceeded(OpSendMsg op) {
-        if (op.msg != null && op.totalChunks <= 1) {
+    private boolean isMessageSizeExceeded(OpSendMsg op) {
+        if (op.msg != null && op.totalChunks <= 1 && !conf.isEncryptionEnabled()) {

Review Comment:
   Disable size check if isChunkingEnabled?
   ```suggestion
           if (op.msg != null && !conf.isChunkingEnabled() && !conf.isEncryptionEnabled()) {
   ```



##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java:
##########
@@ -488,6 +488,10 @@ public void sendAsync(Message<?> message, SendCallback callback) {
             return;
         }
 
+        // Update the message metadata before computing the payload chunk size to avoid a large message cannot be split
+        // into chunks.
+        final long sequenceId = updateMessageMetadata(msgMetadata, uncompressedSize);

Review Comment:
   > But I think the existing changes in this PR make the computation of the payload chunk size **more accurate**.
   
   +1



##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java:
##########
@@ -488,6 +488,10 @@ public void sendAsync(Message<?> message, SendCallback callback) {
             return;
         }
 
+        // Update the message metadata before computing the payload chunk size to avoid a large message cannot be split
+        // into chunks.
+        final long sequenceId = updateMessageMetadata(msgMetadata, uncompressedSize);

Review Comment:
   > `ProducerImpl#encryptMessage` could update the metadata as well. I think it's better to skip the message size check when encryption is enabled, because message encryption could also increase the size of the payload buffer.
   
   This is a little out the scope of this PR. Maybe need more discussion. I am not sure if the space complexity of the increased size is O(1) compare to the limitation. If not, I think it's better to check the size at client side.



##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java:
##########
@@ -488,6 +488,10 @@ public void sendAsync(Message<?> message, SendCallback callback) {
             return;
         }
 
+        // Update the message metadata before computing the payload chunk size to avoid a large message cannot be split
+        // into chunks.
+        final long sequenceId = updateMessageMetadata(msgMetadata, uncompressedSize);

Review Comment:
   > IMO, it's also better to make it clear which metadata fields can be set before checking if the size exceeds the limit.
   
   I think the bottom line is the fields which can be very large must be included here, like `property`. Other fields won't affect much as they take constant small spaces.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] BewareMyPower commented on a diff in pull request #16196: [fix][Java Client] Fix large message sometimes cannot be split into chunks after PIP-132

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on code in PR #16196:
URL: https://github.com/apache/pulsar/pull/16196#discussion_r906652829


##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java:
##########
@@ -2214,8 +2233,8 @@ private void recoverProcessOpSendMsgFrom(ClientCnx cnx, MessageImpl from, long e
     /**
      *  Check if final message size for non-batch and non-chunked messages is larger than max message size.
      */
-    public boolean isMessageSizeExceeded(OpSendMsg op) {
-        if (op.msg != null && op.totalChunks <= 1) {
+    private boolean isMessageSizeExceeded(OpSendMsg op) {
+        if (op.msg != null && op.totalChunks <= 1 && !conf.isEncryptionEnabled()) {

Review Comment:
   Yes. And I will skip the check for `isEncryptionEnabled` here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org