You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/09/23 11:16:29 UTC

[GitHub] [pulsar] AnonHxy opened a new pull request, #17822: [improve][java-client]Add init capactiy for messages in BatchMessageContainerImpl

AnonHxy opened a new pull request, #17822:
URL: https://github.com/apache/pulsar/pull/17822

   ### Motivation
   
   When adding messages to `BatchMessageContainerImpl`, it takes a lot of cpu time for the list growing of `BatchMessageContainerImpl#messages`, see below:
   <img width="1031" alt="image" src="https://user-images.githubusercontent.com/10233437/191948739-f2c5b427-eb8a-4391-a462-184da23e993d.png">
   
   
   ### Modifications
   
   * Init `BatchMessageContainerImpl#messages` with capacity.
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc-required` 
   (Your PR needs to update docs and you will update later)
   
   - [x] `doc-not-needed` 
   (Please explain why)
   
   - [ ] `doc` 
   (Your PR contains doc changes)
   
   - [ ] `doc-complete`
   (Docs have been already added)
   
   ### Matching PR in forked repository
   
   PR in forked repository: <!-- ENTER URL HERE 
   
   After opening this PR, the build in apache/pulsar will fail and instructions will
   be provided for opening a PR in the PR author's forked repository.
   
   apache/pulsar pull requests should be first tested in your own fork since the 
   apache/pulsar CI based on GitHub Actions has constrained resources and quota.
   GitHub Actions provides separate quota for pull requests that are executed in 
   a forked repository.
   
   The tests will be run in the forked repository until all PR review comments have
   been handled, the tests pass and the PR is approved by a reviewer.
   
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] AnonHxy commented on pull request #17822: [improve][java-client]Add init capacity for messages in BatchMessageContainerImpl

Posted by GitBox <gi...@apache.org>.
AnonHxy commented on PR #17822:
URL: https://github.com/apache/pulsar/pull/17822#issuecomment-1258938984

   /pulsarbot ready-to-test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] AnonHxy closed pull request #17822: [improve][java-client]Add init capacity for messages in BatchMessageContainerImpl

Posted by GitBox <gi...@apache.org>.
AnonHxy closed pull request #17822: [improve][java-client]Add init capacity for messages in BatchMessageContainerImpl
URL: https://github.com/apache/pulsar/pull/17822


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] AnonHxy commented on pull request #17822: [improve][java-client]Add init capacity for messages in BatchMessageContainerImpl

Posted by GitBox <gi...@apache.org>.
AnonHxy commented on PR #17822:
URL: https://github.com/apache/pulsar/pull/17822#issuecomment-1259007491

   > Profiling results can be flawed on MacOS.
   
   Thanks for your reminder @lhotari , There is anothoer profiling result on Linux,  the simiar conclusions can be drawn.
   <img width="1911" alt="image" src="https://user-images.githubusercontent.com/10233437/192442133-45542ac5-1b7b-46d1-8022-02d8818e7d43.png">
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] AnonHxy merged pull request #17822: [improve][java-client]Add init capacity for messages in BatchMessageContainerImpl

Posted by GitBox <gi...@apache.org>.
AnonHxy merged PR #17822:
URL: https://github.com/apache/pulsar/pull/17822


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] lhotari commented on pull request #17822: [improve][java-client]Add init capacity for messages in BatchMessageContainerImpl

Posted by GitBox <gi...@apache.org>.
lhotari commented on PR #17822:
URL: https://github.com/apache/pulsar/pull/17822#issuecomment-1259052036

   > > Profiling results can be flawed on MacOS.
   > 
   > Thanks for your reminder @lhotari , There is anothoer profiling result on Linux, the simiar conclusions can be drawn. <img alt="image" width="1911" src="https://user-images.githubusercontent.com/10233437/192442133-45542ac5-1b7b-46d1-8022-02d8818e7d43.png">
   
   Can you share the whole flamegraph? A lot of information is lost if you just provide a snippet of the whole graph.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] lhotari commented on pull request #17822: [improve][java-client]Add init capacity for messages in BatchMessageContainerImpl

Posted by GitBox <gi...@apache.org>.
lhotari commented on PR #17822:
URL: https://github.com/apache/pulsar/pull/17822#issuecomment-1258988782

   the gettimeofday stack looks like MacOS, https://opensource.apple.com/source/xnu/xnu-3789.41.3/libsyscall/wrappers/__commpage_gettimeofday.c.auto.html
   Profiling results can be flawed on MacOS.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] AnonHxy commented on pull request #17822: [improve][java-client]Add init capacity for messages in BatchMessageContainerImpl

Posted by GitBox <gi...@apache.org>.
AnonHxy commented on PR #17822:
URL: https://github.com/apache/pulsar/pull/17822#issuecomment-1259064550

   > Can you share the whole flamegraph?
   
   OK, here is the flamegraph file.  @lhotari 
   [buffer.html.zip](https://github.com/apache/pulsar/files/9652465/buffer.html.zip)
   
   test case is below
   ```
   public static void main(String[] args) throws Exception {
           String topic = "hxy-test-mem";
           byte[] smallMsg = new byte[64];
           PulsarClient client = PulsarClient.builder()
                   .serviceUrl("pulsar://127.0.0.1:6650")
                   .memoryLimit(2000, SizeUnit.MEGA_BYTES)  // memory limit 2000M
                   .build();
           Producer<byte[]> producer = client.newProducer()
                   .topic(topic)
                   .enableBatching(true)
                   .create();
           for (int i = 0; i < Integer.MAX_VALUE; ++i) {
               producer.sendAsync(smallMsg);
           }
       }
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] AnonHxy commented on pull request #17822: [improve][java-client]Add init capacity for messages in BatchMessageContainerImpl

Posted by GitBox <gi...@apache.org>.
AnonHxy commented on PR #17822:
URL: https://github.com/apache/pulsar/pull/17822#issuecomment-1262528678

   /pulsarbot run-failure-checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Jason918 commented on a diff in pull request #17822: [improve][java-client]Add init capactiy for messages in BatchMessageContainerImpl

Posted by GitBox <gi...@apache.org>.
Jason918 commented on code in PR #17822:
URL: https://github.com/apache/pulsar/pull/17822#discussion_r979262901


##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/BatchMessageContainerImpl.java:
##########
@@ -168,12 +168,13 @@ private ByteBuf getCompressedBatchMetadataAndPayload() {
         // Update the current max batch size using the uncompressed size, which is what we need in any case to
         // accumulate the batch content
         maxBatchSize = Math.max(maxBatchSize, uncompressedSize);
+        maxMessagesNum = Math.max(maxMessagesNum, numMessagesInBatch);
         return compressedPayload;
     }
 
     @Override
     public void clear() {
-        messages = new ArrayList<>();
+        messages = new ArrayList<>(maxMessagesNum);

Review Comment:
   Can we use `message.clear()` instead of creating new ArrayList here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] AnonHxy commented on a diff in pull request #17822: [improve][java-client]Add init capactiy for messages in BatchMessageContainerImpl

Posted by GitBox <gi...@apache.org>.
AnonHxy commented on code in PR #17822:
URL: https://github.com/apache/pulsar/pull/17822#discussion_r979352513


##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/BatchMessageContainerImpl.java:
##########
@@ -168,12 +168,13 @@ private ByteBuf getCompressedBatchMetadataAndPayload() {
         // Update the current max batch size using the uncompressed size, which is what we need in any case to
         // accumulate the batch content
         maxBatchSize = Math.max(maxBatchSize, uncompressedSize);
+        maxMessagesNum = Math.max(maxMessagesNum, numMessagesInBatch);
         return compressedPayload;
     }
 
     @Override
     public void clear() {
-        messages = new ArrayList<>();
+        messages = new ArrayList<>(maxMessagesNum);

Review Comment:
   If using `message.clear()`, there will be some cpu overhead I think.  We must check if `messages.size == maxMessageNum`, and the clear() method will loop all the elements in the arrayList:
   ```
    public void clear() {
           modCount++;
           final Object[] es = elementData;
           for (int to = size, i = size = 0; i < to; i++)
               es[i] = null;
       }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] lhotari commented on pull request #17822: [improve][java-client]Add init capacity for messages in BatchMessageContainerImpl

Posted by GitBox <gi...@apache.org>.
lhotari commented on PR #17822:
URL: https://github.com/apache/pulsar/pull/17822#issuecomment-1258986130

   @AnonHxy just curious which VM/platform you are running on. 
   
   I'm wondering about the large portion of "gettimeofday" as part of the stack
   ![image](https://user-images.githubusercontent.com/66864/192435659-02413bc1-cdfc-4d03-983a-015f991be6ea.png)
   
   Regarding "gettimeofday", there's a Brendan Gregg blog post "[The Speed Of Time](https://www.brendangregg.com/blog/2021-09-26/the-speed-of-time.html)". In many cases, the fastest clocksource on Linux is tsc. GCP's GKE uses tsc by default.
   
   Since Arrays.copyOf / System.arraycopy (or it's native implementations) don't seem to call gettimeofday at all, it might be GC related activity which is happening. 
   
   The default initial size for ArrayList is 10. In this PR it's changed to 64. That could make sense. However, I would assume that the average batch size is very rarely over 20, so this change won't make a difference in most cases. Since average batch size is rarely over 20, a better default could be a lower value, such as 32.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] AnonHxy commented on pull request #17822: [improve][java-client]Add init capacity for messages in BatchMessageContainerImpl

Posted by GitBox <gi...@apache.org>.
AnonHxy commented on PR #17822:
URL: https://github.com/apache/pulsar/pull/17822#issuecomment-1259011607

   > Since average batch size is rarely over 20, a better default could be a lower value, such as 32.
   
   It makes sense to me. Updated default capacity from 64 to 32


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org