Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/06/25 17:58:24 UTC

[GitHub] [pulsar] klevy-toast opened a new issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

klevy-toast opened a new issue #11100:
URL: https://github.com/apache/pulsar/issues/11100


   **Describe the bug**
   When upgrading our Pulsar client to 2.7.x, we noticed significantly higher CPU usage in applications running multiple producers with the newer version. We have been unable to isolate the exact cause of the regression.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. Run the example dropwizard application at https://github.com/klevy-toast/dropwizard-pulsar-test/
   2. Run `./while.sh` to trigger continuous message production
   3. Observe average CPU over time
   4. Change pulsar version in `pom.xml`, repeat 1-3
   
   **Expected behavior**
   CPU usage should be fairly similar between versions. But something in 2.7.x is clearly using more resources.
   
   **Screenshots**
   I tested this by running the application on an AWS t3a.large EC2 instance -- there was a ~15% increase in CPU.
   ![image](https://user-images.githubusercontent.com/42187013/123466383-464e3d80-d5bd-11eb-911f-4f33194fd5bf.png)
   
   **Desktop (please complete the following information):**
    - OS: linux
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ross710 commented on issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
ross710 commented on issue #11100:
URL: https://github.com/apache/pulsar/issues/11100#issuecomment-1007575674


   @Shoothzj yes, so this task isn't run when batching is disabled. Tests were run with batching enabled and the default `batchingMaxPublishDelayMicros` of 1000 (equivalent to 1ms).
   
   We can certainly increase the batch time, which will result in the task running less often. However, the fix we are proposing would make the task run less often even at 1ms and should match the behavior prior to the 2.7 release. The theory is that the task previously took a 1ms break before running again, but now it is effectively an infinite loop without breaks.
   
   @eolivelli we will try that out and get back to you.


[GitHub] [pulsar] complone commented on issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
complone commented on issue #11100:
URL: https://github.com/apache/pulsar/issues/11100#issuecomment-876204802


   > Hey @complone. I was using client version 2.5.2 before upgrading to 2.7.1. But I also did some tests with 2.6.x, and observed that the CPU usage was _not_ elevated in that version. It only regressed in 2.7.x
   
   @klevy-toast I want to try to reproduce this problem, but I can currently only test on a local machine and cannot match the CPU and thread configuration of the AWS instance you used. If my local machine has a lower specification than yours and the test does not run properly, that would be a problem. Do you think it is appropriate for me to verify this?


[GitHub] [pulsar] Shoothzj commented on issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
Shoothzj commented on issue #11100:
URL: https://github.com/apache/pulsar/issues/11100#issuecomment-1007483765


   @klevy-toast Did you start the producer with batching enabled, and with a fairly small batch time such as `1ms` or `10ms`? My team has found that this parameter is an important factor affecting CPU usage. We also have a spreadsheet of `cpu`, `producer number`, and `batch time`. If you are interested, I can share it tomorrow.


[GitHub] [pulsar] Shoothzj commented on issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
Shoothzj commented on issue #11100:
URL: https://github.com/apache/pulsar/issues/11100#issuecomment-1007487433


   @klevy-toast These are the results of tests run with @fu-turer on Huawei Cloud.
   case1: 1000 producers, batch 1ms, CPU 68%
   case2: 1000 producers, batch 20ms, CPU 23%
   case3: 1000 producers, batch 50ms, CPU 11%
   case4: 1000 producers, batching disabled, CPU near zero
   case5: 500 producers, batch 1ms, CPU 50%
   These results were measured without producing any messages.
   
   cc @eolivelli 
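
A rough way to read these numbers: assuming each batching-enabled producer schedules one periodic flush task that fires once per batch interval (an assumption drawn from this thread, not verified against the client source), the number of timer firings per second grows with the producer count and shrinks as the batch interval gets longer. The sketch below works through that arithmetic for the cases above. The class `BatchTimerLoad` and the helper `firingsPerSecond` are hypothetical names used only for illustration.

```java
class BatchTimerLoad {

    /**
     * Estimated timer firings per second across all producers,
     * ASSUMING one periodic flush task per producer that fires once
     * per batch interval. This is an illustration, not client code.
     */
    static long firingsPerSecond(int producers, long batchDelayMillis) {
        return producers * (1000L / batchDelayMillis);
    }

    public static void main(String[] args) {
        // Producer counts and batch intervals taken from the cases reported above.
        int[][] cases = { {1000, 1}, {1000, 20}, {1000, 50}, {500, 1} };
        for (int[] c : cases) {
            System.out.println(c[0] + " producers, batch " + c[1] + "ms: "
                    + firingsPerSecond(c[0], c[1]) + " firings/s");
        }
    }
}
```

Under this assumption, 1000 producers at 1ms give 1,000,000 firings per second, compared with 20,000 at 50ms. The measured CPU does not scale linearly with these counts, so the estimate is only an order-of-magnitude guide.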


[GitHub] [pulsar] complone edited a comment on issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
complone edited a comment on issue #11100:
URL: https://github.com/apache/pulsar/issues/11100#issuecomment-869089596


   @klevy-toast This seems to be a very strange problem. Which version were you using before you upgraded to 2.7?
   I will follow up on this issue.
   
   
   I will start the Pulsar server locally according to the following document and run integration tests with this sample:
   https://github.com/apache/pulsar/blob/master/site2/docs/io-quickstart.md


[GitHub] [pulsar] eolivelli commented on issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #11100:
URL: https://github.com/apache/pulsar/issues/11100#issuecomment-1007477623


   Can you test your suggestion on the current master branch?
   If you get good results, please send a PR.


[GitHub] [pulsar] michaeljmarshall commented on issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #11100:
URL: https://github.com/apache/pulsar/issues/11100#issuecomment-1033416133


   I think that https://github.com/apache/pulsar/pull/14185 might prove to be an even better solution, especially when running with many producers that are not in use. I haven't completed any benchmarks yet, though.


[GitHub] [pulsar] Shoothzj edited a comment on issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
Shoothzj edited a comment on issue #11100:
URL: https://github.com/apache/pulsar/issues/11100#issuecomment-1007483765


   @klevy-toast Did you start the producer with batching enabled, and with a fairly small batch time such as `1ms` or `10ms`? My team has found that this parameter is an important factor affecting CPU usage. We also have a spreadsheet of `cpu`, `producer number`, and `batch time`.


[GitHub] [pulsar] klevy-toast commented on issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
klevy-toast commented on issue #11100:
URL: https://github.com/apache/pulsar/issues/11100#issuecomment-869176045


   Hey @complone. I was using client version 2.5.2 before upgrading to 2.7.1. But I also did some tests with 2.6.x, and observed that the CPU usage was _not_ elevated in that version. It only regressed in 2.7.x


[GitHub] [pulsar] ross710 commented on issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
ross710 commented on issue #11100:
URL: https://github.com/apache/pulsar/issues/11100#issuecomment-1006218658


   We were able to reproduce this issue with one of our applications.
   
   With ~40 producers spun up but idle (not producing any messages), we tested with Pulsar client 2.6.3 and 2.7.2 and analyzed the CPU usage using Datadog's profiler.
   
   2.6.3:
   <img width="1505" alt="2 6 3" src="https://user-images.githubusercontent.com/3477670/148313853-53695457-e8e0-4f1e-970c-484b9147e901.png">
   
   2.7.2:
   <img width="752" alt="2 7 2" src="https://user-images.githubusercontent.com/3477670/148313889-dc62bcc8-07e3-4a17-ac87-0e4db404b7ac.png">
    
   ---
   2.7.2 used significantly more CPU, mostly in the ProducerImpl's batching task.
   
   We did notice this change between 2.6.x and 2.7.x: https://github.com/apache/pulsar/pull/7733/files#diff-d6fcf8aa2d0035cc386dca0942a452343d6854763c7fd397efa4e660c0069767R1233
   
   Looking at the code, there seems to be a slight behavior regression in how this task is scheduled. Previously, the behavior mimicked [scheduleWithFixedDelay](https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ScheduledExecutorService.html#scheduleWithFixedDelay(java.lang.Runnable,%20long,%20long,%20java.util.concurrent.TimeUnit)); in the new code, however, [scheduleAtFixedRate](https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ScheduledExecutorService.html#scheduleAtFixedRate(java.lang.Runnable,%20long,%20long,%20java.util.concurrent.TimeUnit)) was chosen. So we believe that the batching task for producers now runs much more frequently, causing higher CPU usage.
   
   We think replacing the usage of `scheduleAtFixedRate` with `scheduleWithFixedDelay` will likely fix this issue.
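
The general difference between the two scheduling modes can be seen with a small sketch that uses only the JDK's `ScheduledExecutorService`. It does not use or reproduce the Pulsar client code, and whether this is the actual cause of the regression is the theory under discussion, not an established fact. The class name `SchedulingDemo` and the period, stall, and window values are hypothetical, chosen only to make the effect visible: the first execution stalls, after which a fixed-rate task runs back-to-back to catch up on missed executions, while a fixed-delay task always waits one full period after each run.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SchedulingDemo {

    // Hypothetical values for illustration only.
    static final long PERIOD_MS = 10;
    static final long STALL_MS = 200;
    static final long WINDOW_MS = 400;

    /** Counts how often a periodic task runs within WINDOW_MS when its first run stalls. */
    static int countRuns(boolean fixedRate) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        Runnable task = () -> {
            // Only the first execution is slow; it stands in for any stall
            // (for example a GC pause or a busy thread).
            if (runs.getAndIncrement() == 0) {
                pause(STALL_MS);
            }
        };
        if (fixedRate) {
            // Next start time = previous scheduled start + period, so missed runs are caught up.
            executor.scheduleAtFixedRate(task, 0, PERIOD_MS, TimeUnit.MILLISECONDS);
        } else {
            // Next start time = end of previous run + delay, so there is always a break.
            executor.scheduleWithFixedDelay(task, 0, PERIOD_MS, TimeUnit.MILLISECONDS);
        }
        pause(WINDOW_MS);
        executor.shutdownNow();
        return runs.get();
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("scheduleAtFixedRate runs:    " + countRuns(true));
        System.out.println("scheduleWithFixedDelay runs: " + countRuns(false));
    }
}
```

Exact counts depend on the machine and timer granularity, but the fixed-rate variant should report noticeably more runs in the same window because it executes the backlog without pausing.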


[GitHub] [pulsar] klevy-toast commented on issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
klevy-toast commented on issue #11100:
URL: https://github.com/apache/pulsar/issues/11100#issuecomment-1030282564


   @eolivelli I finally got around to verifying the behavior of the suggested fix here and submitted PR #14125.


[GitHub] [pulsar] klevy-toast commented on issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
klevy-toast commented on issue #11100:
URL: https://github.com/apache/pulsar/issues/11100#issuecomment-883477743


   @complone If you can measure your CPU usage on your local machine effectively, I believe you will still see the issue. I used AWS because of the convenient monitoring tools.
   
   Regarding #2, I was not monitoring those pulsar metrics. I was simply monitoring the CPU usage of the whole server while it was running my program.


[GitHub] [pulsar] michaeljmarshall closed issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
michaeljmarshall closed issue #11100:
URL: https://github.com/apache/pulsar/issues/11100


   


[GitHub] [pulsar] complone edited a comment on issue #11100: Java client: high CPU regression for producer between 2.6.x and 2.7.x

Posted by GitBox <gi...@apache.org>.
complone edited a comment on issue #11100:
URL: https://github.com/apache/pulsar/issues/11100#issuecomment-876204802


   > Hey @complone. I was using client version 2.5.2 before upgrading to 2.7.1. But I also did some tests with 2.6.x, and observed that the CPU usage was _not_ elevated in that version. It only regressed in 2.7.x
   
   @klevy-toast 
   
   1. I want to try to reproduce this problem, but I can currently only test on a local machine and cannot match the CPU and thread configuration of the AWS instance you used. If my local machine has a lower specification than yours and the test does not run properly, that would be a problem. Do you think it is appropriate for me to verify this?
   
   2. In addition, were the monitoring metrics you provided configured according to the following document?
   https://pulsar.apache.org/docs/en/reference-metrics/#topic-metrics
   
   ![WechatIMG285](https://user-images.githubusercontent.com/20021404/124882706-3e51bd00-e003-11eb-8140-5102d4a5996d.jpeg)
   

