Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/02/04 19:22:26 UTC

[GitHub] [pulsar] klevy-toast opened a new pull request #14125: Use `scheduleWithFixedDelay` instead of `scheduleAtFixedRate` for java producer batch timer

klevy-toast opened a new pull request #14125:
URL: https://github.com/apache/pulsar/pull/14125


   
   
   Fixes #11100 
   
   
   ### Motivation
   
   We believe that the use of `scheduleAtFixedRate` in the Java producer's batch timer can cause unnecessarily high thread usage, which becomes especially problematic for applications that start many producers.
   
   ### Modifications
   
   Replaced the use of `scheduleAtFixedRate` with `scheduleWithFixedDelay`, which restores the behavior of 2.6.x. The name of the producer parameter `batchingMaxPublishDelay` also implies "delay" semantics rather than "rate" semantics.
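
To illustrate the difference between the two scheduling modes, here is a minimal sketch using the JDK's `ScheduledExecutorService`. The class name `BatchTimerSchedulingDemo`, the `countRuns` helper, and the timing values are illustrative assumptions, not Pulsar code, and the sketch assumes the producer's timer follows the same semantics as the JDK executor. When one run stalls, a fixed-rate task runs back to back to catch up on the periods it missed, while a fixed-delay task simply waits one delay after each completion.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BatchTimerSchedulingDemo {

    /**
     * Counts how often a periodic task runs within windowMs when its
     * first execution stalls for stallMs (hypothetical helper).
     */
    static int countRuns(boolean fixedRate, long periodMs, long stallMs, long windowMs) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();

        // Stand-in for a batch flush: only the very first run is slow.
        Runnable flush = () -> {
            if (runs.getAndIncrement() == 0) {
                sleep(stallMs);
            }
        };

        if (fixedRate) {
            // Next start time = previous scheduled start + period,
            // so missed periods are made up immediately after a stall.
            executor.scheduleAtFixedRate(flush, 0, periodMs, TimeUnit.MILLISECONDS);
        } else {
            // Next start time = previous completion + delay,
            // so a stall never produces a burst of catch-up runs.
            executor.scheduleWithFixedDelay(flush, 0, periodMs, TimeUnit.MILLISECONDS);
        }

        sleep(windowMs);
        executor.shutdownNow();
        return runs.get();
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // 20 ms period, first run stalls for 200 ms, observe for 300 ms.
        System.out.println("fixed rate runs:  " + countRuns(true, 20, 200, 300));
        System.out.println("fixed delay runs: " + countRuns(false, 20, 200, 300));
    }
}
```

On a typical machine the fixed-rate count should come out noticeably higher than the fixed-delay count, because the stalled periods are replayed in a burst. The exact numbers depend on timing and are not meant as a benchmark of the producer.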
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   This change is already covered by existing tests, such as the existing Pulsar client producer tests.
   
   The performance regression can be demonstrated with [this](https://github.com/klevy-toast/dropwizard-pulsar-test) artifact by comparing a recent release of the Pulsar client against a manually built SNAPSHOT version that includes this change:
   
   #### Version 2.7.1 CPU & thread behavior
   
   - While sending messages
   <img width="1632" alt="image" src="https://user-images.githubusercontent.com/42187013/152588959-8ee4beb9-70f3-4ad8-9132-240d4498dda5.png">
   - While running idle producers
   <img width="1613" alt="image" src="https://user-images.githubusercontent.com/42187013/152589079-b45fce49-757a-4bfd-8ddd-c438774ecf41.png">
   - 30-second profile while sending messages
   <img width="1295" alt="image" src="https://user-images.githubusercontent.com/42187013/152589222-54732bf3-44d7-40b8-8c6b-03b54ba01090.png">
   
   #### Version 2.10.0-SNAPSHOT CPU & thread behavior
   - While sending messages
   <img width="1615" alt="image" src="https://user-images.githubusercontent.com/42187013/152589391-ae243e7a-5f1f-40b7-a77c-7e3d12a84c8e.png">
   - While running idle producers
   <img width="1603" alt="image" src="https://user-images.githubusercontent.com/42187013/152589436-784d9c56-043e-41fa-95e8-6a721e0adc78.png">
   - 30-second profile while sending messages
   <img width="1289" alt="image" src="https://user-images.githubusercontent.com/42187013/152589619-f274545d-b9f9-48e8-8b02-e226c6dec59e.png">
   
   These samples show fewer threads running with this change than with 2.7.1, less time spent in `batchMessageAndSend`, and lower overall CPU usage. Note that this testing was done on a desktop machine. We have observed, along with [other users](https://github.com/apache/pulsar/issues/11100#issuecomment-1007487433), that this CPU regression can be much worse with more producers, smaller batch intervals, and on deployed cloud applications.
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API: (no)
     - The schema: (no)
     - The default values of configurations: (no)
     - The wire protocol: (no)
     - The rest endpoints: (no)
     - The admin cli options: (no)
     - Anything that affects deployment: (no)
   
   ### Documentation
   
   Check the box below or label this PR directly (if you have committer privilege).
   
   Need to update docs? 
   
   - [x] `no-need-doc` 
     
   The general behavior of the batch timer feature should not change.
     
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] klevy-toast commented on pull request #14125: Use `scheduleWithFixedDelay` instead of `scheduleAtFixedRate` for java producer batch timer

Posted by GitBox <gi...@apache.org>.
klevy-toast commented on pull request #14125:
URL: https://github.com/apache/pulsar/pull/14125#issuecomment-1031668363


   > > I think we could even go further and schedule the task only when the producer is active.
   > 
   > @merlimat - this is a great point. Essentially, we only need to schedule a "flush" if there are messages in the `batchMessageContainer`. We could even delay the flush each time that we send messages, since a client that is pushing many messages through the producer will likely hit the batch limit before the flush. Without pushing out the flush, we'll also increase the probability of under-filled batches.
   > 
   > @klevy-toast - are you interested in working on this fix? If not, I think we should merge this, and then someone can work on a follow-up fix.
   
   I probably will not be able to work on that enhancement, but I agree that it is a great idea! This issue originally surfaced for us when we had an application running many inactive producers, but we ended up working around it by just disabling batch messaging.





[GitHub] [pulsar] michaeljmarshall merged pull request #14125: Use `scheduleWithFixedDelay` instead of `scheduleAtFixedRate` for java producer batch timer

Posted by GitBox <gi...@apache.org>.
michaeljmarshall merged pull request #14125:
URL: https://github.com/apache/pulsar/pull/14125


   





[GitHub] [pulsar] michaeljmarshall commented on pull request #14125: Use `scheduleWithFixedDelay` instead of `scheduleAtFixedRate` for java producer batch timer

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on pull request #14125:
URL: https://github.com/apache/pulsar/pull/14125#issuecomment-1030426931


   After thinking about this a bit more, I think it makes sense to merge this as is and cherry-pick it to older release branches. Any further optimizations can then target master only.





[GitHub] [pulsar] klevy-toast commented on pull request #14125: Use `scheduleWithFixedDelay` instead of `scheduleAtFixedRate` for java producer batch timer

Posted by GitBox <gi...@apache.org>.
klevy-toast commented on pull request #14125:
URL: https://github.com/apache/pulsar/pull/14125#issuecomment-1030337399


   /pulsarbot rerun-failure-checks





[GitHub] [pulsar] klevy-toast removed a comment on pull request #14125: Use `scheduleWithFixedDelay` instead of `scheduleAtFixedRate` for java producer batch timer

Posted by GitBox <gi...@apache.org>.
klevy-toast removed a comment on pull request #14125:
URL: https://github.com/apache/pulsar/pull/14125#issuecomment-1030337399


   /pulsarbot rerun-failure-checks





[GitHub] [pulsar] michaeljmarshall commented on pull request #14125: Use `scheduleWithFixedDelay` instead of `scheduleAtFixedRate` for java producer batch timer

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on pull request #14125:
URL: https://github.com/apache/pulsar/pull/14125#issuecomment-1030420161


   > I think we could even go further and schedule the task only when the producer is active.
   
   @merlimat - this is a great point. Essentially, we only need to schedule a "flush" if there are messages in the `batchMessageContainer`. We could even delay the flush each time that we send messages, since a client that is pushing many messages through the producer will likely hit the batch limit before the flush. Without pushing out the flush, we'll also increase the probability of under-filled batches.
   
   @klevy-toast - are you interested in working on this fix? If not, I think we should merge this, and then someone can work on a follow-up fix.
   
   I'd argue we could cherry-pick this commit to older release branches since it is very well understood.
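
The idea of scheduling a flush only while messages are waiting can be sketched roughly as follows. This is a hypothetical illustration, not the Pulsar implementation: the class `LazyBatchFlusher`, its fields, and the `sender` callback are all assumed names, and the real producer's batch container and locking are more involved. A one-shot timer is armed when the first message enters an empty batch and is cancelled when the batch is sent, so an idle producer schedules nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class LazyBatchFlusher<T> {
    private final ScheduledExecutorService timer;
    private final long maxPublishDelayMs;
    private final int maxBatchSize;
    private final Consumer<List<T>> sender; // hypothetical "send this batch" callback
    private List<T> batch = new ArrayList<>();
    private ScheduledFuture<?> pendingFlush; // null whenever no flush is scheduled

    LazyBatchFlusher(ScheduledExecutorService timer, long maxPublishDelayMs,
                     int maxBatchSize, Consumer<List<T>> sender) {
        this.timer = timer;
        this.maxPublishDelayMs = maxPublishDelayMs;
        this.maxBatchSize = maxBatchSize;
        this.sender = sender;
    }

    synchronized void add(T message) {
        batch.add(message);
        if (batch.size() >= maxBatchSize) {
            // Batch limit reached before the timer fired: send right away.
            flush();
        } else if (pendingFlush == null) {
            // First message in an empty batch: arm a one-shot flush.
            pendingFlush = timer.schedule(this::flush, maxPublishDelayMs, TimeUnit.MILLISECONDS);
        }
    }

    synchronized void flush() {
        if (pendingFlush != null) {
            // If the timer task has already started, cancel() is a no-op and that
            // task may flush a later batch slightly early; no message is lost.
            pendingFlush.cancel(false);
            pendingFlush = null;
        }
        if (batch.isEmpty()) {
            return;
        }
        List<T> toSend = batch;
        batch = new ArrayList<>();
        sender.accept(toSend);
    }

    synchronized boolean hasPendingFlush() {
        return pendingFlush != null;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        LazyBatchFlusher<String> flusher =
                new LazyBatchFlusher<>(timer, 50, 3, b -> System.out.println("sent " + b));
        flusher.add("a");   // arms the timer; the batch is sent about 50 ms later
        Thread.sleep(200);
        timer.shutdown();
    }
}
```

This sketch does not push the flush out on every send, which the comment above also suggests. That variant would cancel and re-arm `pendingFlush` inside `add`, at the cost of extra scheduling work per message.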

