You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by "grssam (via GitHub)" <gi...@apache.org> on 2023/10/19 12:08:29 UTC

[PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

grssam opened a new pull request, #21399:
URL: https://github.com/apache/pulsar/pull/21399

   
   ### Motivation
   
   Support custom publish rate limiters to support produce use cases which do not fit well within either the poller or the precise rate limier.
   
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. -->
   - [x] `doc-required` <!-- Your PR changes impact docs and you will update later -->
   - [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21399:
URL: https://github.com/apache/pulsar/pull/21399#discussion_r1368401194


##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.
+
+The DispatchRateLimiter hard codes RateLimiter to use precise rate limiter, but in the produce path , users can configure which one to choose for their brokers.
+
+# Motivation
+
+<!--
+Describe the problem this proposal is trying to solve.
+
+* Explain what is the problem you're trying to solve - current situation.
+* This section is the "Why" of your proposal.
+-->
+
+We have used both the poller based and precise publish rate limiters and there are challenges with both the approaches:
+
+* Poller based doesn't really do any throttling, all pending requests due to limit breach in the first second pass through in the first interval of the subsequent second.
+  * Effectively, there is no rate limiting.
+  * This also leads to noisy neighbour issues where topics hosted on the same broker/bookie face latency spikes.
+* Precise rate limiter fixes the above two issues, but introduces another challenge - the rate limiting is too strict.
+  * This leads potential for data loss in case there are sudden spikes in produce and client side produce queue breaches.
+  * The produce latencies increase exponentially in case produce breaches the set throughput even for small windows.
+
+Thus, there is a need for a rate limiter which is in between the above two implementations. As such, the handling of produce spikes
+and bursts may way from use-case to use-case, so it's not ideal to implement and add another rate limiter logic inside of Pulsar codebase
+as it will never be able to meet all possible requirements.
+
+# Goals
+
+I propose a configurable rate limiter implementation support, just like how AuthorizationProvider and
+AuthenticationProvider allow for.
+
+## In Scope
+
+For this PIP, the scope is the produce path only. It covers both the topic and the resource group level rate limiter.
+
+## Out of Scope
+
+Dispatch rate limiter is out of scope of this PIP. Generally speaking, the dispatch rate can smoothen out over time even if there are high produce spikes.
+
+# High Level Design
+
+* Add a new `initialize(PublishRate, ServiceConfigurations)` method in the interface `PublishRateLimiter.java` replacing direct constructor call.
+* Add 2 new optional properties in broker config: 
+  * `topicPublishRateLimiterProvider: <classpath>`
+  * `resourceGroupPublishRateLimiterProvider: <classpath>`
+* Based on the config values, and historic property, `AbstractTopic.java` and `ResourceGroup.java` initialize the relevant class.
+  * Fallback to `PublishRateLimiterImpl.java` in case rate limiter is enabled but class doesn't load.
+
+# Detailed Design
+
+## Design & Implementation Details
+
+* In `AbstractTopic.java`, instead of if-else based constructor call to either `PublishRateLimiterImpl` or `
+  PrecisePublishLimiter`, we read the `topicPublishRateLimiterProvider` config and load that class instead.
+  * Similarly, in `ResourceGroup.java`, we load the class defined in `resourceGroupPublishRateLimiterProvider` config.
+* Replace constructor call for `ResourceGroupPublishLimiter`, `PrecisePublishLimiter` and `PublishRateLimiterImpl` to `.initialize` method call.
+* Add fallback & default logic to load `PublishRateLimiterImpl` and `ResourceGroupPublishLimiter` in case of ClassNotFoundException or exception in `initialize` call in `AbstractTopic.java` and `
+  ResourceGroup.java`
+* For cases where rate limiting is disabled, we continue to use `PublishRateLimiterDisable` like we do currently.
+* For backward compatibility (for 2 minor versions), in `AbstractTopic.java` we add following logic:
+  * In case users have the flag `preciseTopicPublishRateLimiterEnable` as true *and* the `
+    topicPublishRateLimiterProvider` is empty, we still load `PublishRateLimiterImpl`.
+  * In case `topicPublishRateLimiterProvider` is a valid class, we ignore the value of `
+    preciseTopicPublishRateLimiterEnable`
+  * We mark the config `preciseTopicPublishRateLimiterEnable` deprecated in the config file and ask users to use the provider config with the classpath.
+
+### Configuration
+
+* `topicPublishRateLimiterProvider: <classpath>`
+* `resourceGroupPublishRateLimiterProvider: <classpath>`
+
+# Backward & Forward Compatibility
+
+## Upgrade
+
+In detailed design, backward compatibility logic is explained. Upgrades would work without need of config change until
+we finally remove the logic to understand `preciseTopicPublishRateLimiterEnable`.
+
+As a best practice:
+
+* Users requiring precise rate limiter remove the config `preciseTopicPublishRateLimiterEnable` and
+  add `topicPublishRateLimiterProvider: org.apache.pulsar.broker.service.PrecisePublishLimiter`
+* Users requiring custom rate limiter ensure that their class is present in pulsar classpath and they set
+  the `topicPublishRateLimiterProvider` or `
+  resourceGroupPublishRateLimiterProvider` accordingly
+
+## Revert
+
+In case user was not using the precise rate limiter, simply deleting the two new configs should revert back to previous version.
+
+In case user was using precise rate limiter, i.e., they had `preciseTopicPublishRateLimiterEnable: true` and:
+* They removed that and switched to `topicPublishRateLimiterProvider: org.apache.pulsar.broker.service.PrecisePublishLimiter`
+  * To revert, they simply need to remove the config `topicPublishRateLimiterProvider` and put in `preciseTopicPublishRateLimiterEnable: true`
+* They did not change any of their configs, i.e. they did not specifically add and value in `topicPublishRateLimiterProvider`, then no change needed.
+
+# Links

Review Comment:
   Take OpenTelemetry java SDK for example - they clearly mark in the package which interface/class is public. I believe in Pulsar we don't have that, but the basic idea remains: You need to document in the PIP what changes you are making to code you intend the public to use, whether introducing a new interface or changing an existing one.  
   
   PublishRateLimiter is public API since Pulsar users will create instances of that interfac and you can't break in the future? Hence it's public API.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "grssam (via GitHub)" <gi...@apache.org>.
grssam commented on code in PR #21399:
URL: https://github.com/apache/pulsar/pull/21399#discussion_r1368533675


##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`

Review Comment:
   As I said, its mentioned in the motivation section. Line 38. Quoting here:
   > Precise rate limiter fixes the above two issues, but introduces another challenge - the rate limiting is too strict.
   >  * This leads potential for data loss in case there are sudden spikes in produce and client side produce queue breaches.
   >  * The produce latencies increase exponentially in case produce breaches the set throughput even for small windows.



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge

Review Comment:
   Noted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "grssam (via GitHub)" <gi...@apache.org>.
grssam commented on code in PR #21399:
URL: https://github.com/apache/pulsar/pull/21399#discussion_r1368538204


##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.
+
+The DispatchRateLimiter hard codes RateLimiter to use precise rate limiter, but in the produce path , users can configure which one to choose for their brokers.
+
+# Motivation
+
+<!--
+Describe the problem this proposal is trying to solve.
+
+* Explain what is the problem you're trying to solve - current situation.
+* This section is the "Why" of your proposal.
+-->
+
+We have used both the poller based and precise publish rate limiters and there are challenges with both the approaches:
+
+* Poller based doesn't really do any throttling, all pending requests due to limit breach in the first second pass through in the first interval of the subsequent second.
+  * Effectively, there is no rate limiting.
+  * This also leads to noisy neighbour issues where topics hosted on the same broker/bookie face latency spikes.
+* Precise rate limiter fixes the above two issues, but introduces another challenge - the rate limiting is too strict.
+  * This leads potential for data loss in case there are sudden spikes in produce and client side produce queue breaches.
+  * The produce latencies increase exponentially in case produce breaches the set throughput even for small windows.
+
+Thus, there is a need for a rate limiter which is in between the above two implementations. As such, the handling of produce spikes
+and bursts may way from use-case to use-case, so it's not ideal to implement and add another rate limiter logic inside of Pulsar codebase
+as it will never be able to meet all possible requirements.
+
+# Goals
+
+I propose a configurable rate limiter implementation support, just like how AuthorizationProvider and
+AuthenticationProvider allow for.
+
+## In Scope
+
+For this PIP, the scope is the produce path only. It covers both the topic and the resource group level rate limiter.
+
+## Out of Scope
+
+Dispatch rate limiter is out of scope of this PIP. Generally speaking, the dispatch rate can smoothen out over time even if there are high produce spikes.
+
+# High Level Design
+
+* Add a new `initialize(PublishRate, ServiceConfigurations)` method in the interface `PublishRateLimiter.java` replacing direct constructor call.
+* Add 2 new optional properties in broker config: 
+  * `topicPublishRateLimiterProvider: <classpath>`
+  * `resourceGroupPublishRateLimiterProvider: <classpath>`
+* Based on the config values, and historic property, `AbstractTopic.java` and `ResourceGroup.java` initialize the relevant class.
+  * Fallback to `PublishRateLimiterImpl.java` in case rate limiter is enabled but class doesn't load.
+
+# Detailed Design
+
+## Design & Implementation Details
+
+* In `AbstractTopic.java`, instead of if-else based constructor call to either `PublishRateLimiterImpl` or `
+  PrecisePublishLimiter`, we read the `topicPublishRateLimiterProvider` config and load that class instead.
+  * Similarly, in `ResourceGroup.java`, we load the class defined in `resourceGroupPublishRateLimiterProvider` config.
+* Replace constructor call for `ResourceGroupPublishLimiter`, `PrecisePublishLimiter` and `PublishRateLimiterImpl` to `.initialize` method call.
+* Add fallback & default logic to load `PublishRateLimiterImpl` and `ResourceGroupPublishLimiter` in case of ClassNotFoundException or exception in `initialize` call in `AbstractTopic.java` and `
+  ResourceGroup.java`
+* For cases where rate limiting is disabled, we continue to use `PublishRateLimiterDisable` like we do currently.
+* For backward compatibility (for 2 minor versions), in `AbstractTopic.java` we add following logic:
+  * In case users have the flag `preciseTopicPublishRateLimiterEnable` as true *and* the `
+    topicPublishRateLimiterProvider` is empty, we still load `PublishRateLimiterImpl`.
+  * In case `topicPublishRateLimiterProvider` is a valid class, we ignore the value of `
+    preciseTopicPublishRateLimiterEnable`
+  * We mark the config `preciseTopicPublishRateLimiterEnable` deprecated in the config file and ask users to use the provider config with the classpath.
+
+### Configuration
+
+* `topicPublishRateLimiterProvider: <classpath>`
+* `resourceGroupPublishRateLimiterProvider: <classpath>`
+
+# Backward & Forward Compatibility
+
+## Upgrade
+
+In detailed design, backward compatibility logic is explained. Upgrades would work without need of config change until
+we finally remove the logic to understand `preciseTopicPublishRateLimiterEnable`.
+
+As a best practice:
+
+* Users requiring precise rate limiter remove the config `preciseTopicPublishRateLimiterEnable` and
+  add `topicPublishRateLimiterProvider: org.apache.pulsar.broker.service.PrecisePublishLimiter`
+* Users requiring custom rate limiter ensure that their class is present in pulsar classpath and they set
+  the `topicPublishRateLimiterProvider` or `
+  resourceGroupPublishRateLimiterProvider` accordingly
+
+## Revert
+
+In case user was not using the precise rate limiter, simply deleting the two new configs should revert back to previous version.
+
+In case user was using precise rate limiter, i.e., they had `preciseTopicPublishRateLimiterEnable: true` and:
+* They removed that and switched to `topicPublishRateLimiterProvider: org.apache.pulsar.broker.service.PrecisePublishLimiter`
+  * To revert, they simply need to remove the config `topicPublishRateLimiterProvider` and put in `preciseTopicPublishRateLimiterEnable: true`
+* They did not change any of their configs, i.e. they did not specifically add and value in `topicPublishRateLimiterProvider`, then no change needed.
+
+# Links

Review Comment:
   Noted. thanks for the explanation. Will add in here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on PR #21399:
URL: https://github.com/apache/pulsar/pull/21399#issuecomment-1792149368

   I replied on the dev mailing list: https://lists.apache.org/thread/rs95ww4mfx1x08nk2zsr5d31p4wkpy81


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "grssam (via GitHub)" <gi...@apache.org>.
grssam closed pull request #21399: [improve][pip] PIP-310: Support custom publish rate limiters
URL: https://github.com/apache/pulsar/pull/21399


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21399:
URL: https://github.com/apache/pulsar/pull/21399#discussion_r1368370523


##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge

Review Comment:
   I think the size is not a consideration and I don't think an explanation would exceed a normal length. It's the cause that matters: One can not understand a suggestion without proper background most of us don't have, right? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21399:
URL: https://github.com/apache/pulsar/pull/21399#discussion_r1368383833


##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`

Review Comment:
   You wrote:
   
   > The precise one checks for breach before handing out permits each time.
   
   What's the problem with that? calling decrease on an AtomicLong is a performance issue?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21399:
URL: https://github.com/apache/pulsar/pull/21399#discussion_r1367925109


##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge

Review Comment:
   * Seems reasonable to include an explanation what are Resource Groups, where are they configured and what does it mean to apply a rate limit for them (is it local to broker or a distributed one). You won't find this info anywhere, and I bet you won't find many core Pulsar developers familiar with this
   * You haven't mentioned if rate limited are server or client based. I would start with that scope in the beginning.
   
   



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.

Review Comment:
   `RateLimiter.java`



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.
+
+The DispatchRateLimiter hard codes RateLimiter to use precise rate limiter, but in the produce path , users can configure which one to choose for their brokers.
+
+# Motivation
+
+<!--

Review Comment:
   Feel free to delete those comments



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.
+
+The DispatchRateLimiter hard codes RateLimiter to use precise rate limiter, but in the produce path , users can configure which one to choose for their brokers.
+
+# Motivation
+
+<!--
+Describe the problem this proposal is trying to solve.
+
+* Explain what is the problem you're trying to solve - current situation.
+* This section is the "Why" of your proposal.
+-->
+
+We have used both the poller based and precise publish rate limiters and there are challenges with both the approaches:
+
+* Poller based doesn't really do any throttling, all pending requests due to limit breach in the first second pass through in the first interval of the subsequent second.
+  * Effectively, there is no rate limiting.
+  * This also leads to noisy neighbour issues where topics hosted on the same broker/bookie face latency spikes.
+* Precise rate limiter fixes the above two issues, but introduces another challenge - the rate limiting is too strict.
+  * This leads potential for data loss in case there are sudden spikes in produce and client side produce queue breaches.
+  * The produce latencies increase exponentially in case produce breaches the set throughput even for small windows.
+
+Thus, there is a need for a rate limiter which is in between the above two implementations. As such, the handling of produce spikes
+and bursts may way from use-case to use-case, so it's not ideal to implement and add another rate limiter logic inside of Pulsar codebase
+as it will never be able to meet all possible requirements.
+
+# Goals
+
+I propose a configurable rate limiter implementation support, just like how AuthorizationProvider and
+AuthenticationProvider allow for.
+
+## In Scope
+
+For this PIP, the scope is the produce path only. It covers both the topic and the resource group level rate limiter.
+
+## Out of Scope
+
+Dispatch rate limiter is out of scope of this PIP. Generally speaking, the dispatch rate can smoothen out over time even if there are high produce spikes.
+
+# High Level Design
+
+* Add a new `initialize(PublishRate, ServiceConfigurations)` method in the interface `PublishRateLimiter.java` replacing direct constructor call.
+* Add 2 new optional properties in broker config: 
+  * `topicPublishRateLimiterProvider: <classpath>`

Review Comment:
   Why not unite this into a single provider like: RateLimiterProvider which will be the factory for different type of rate limiters: publish, consume, etc?
   



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`

Review Comment:
   What's the caveat of precise?



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.

Review Comment:
   Can you expand about the poller? Each producer holds a set of permits? I don't understand how it works.



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.
+
+The DispatchRateLimiter hard codes RateLimiter to use precise rate limiter, but in the produce path , users can configure which one to choose for their brokers.
+
+# Motivation
+
+<!--
+Describe the problem this proposal is trying to solve.
+
+* Explain what is the problem you're trying to solve - current situation.
+* This section is the "Why" of your proposal.
+-->
+
+We have used both the poller based and precise publish rate limiters and there are challenges with both the approaches:
+
+* Poller based doesn't really do any throttling, all pending requests due to limit breach in the first second pass through in the first interval of the subsequent second.
+  * Effectively, there is no rate limiting.
+  * This also leads to noisy neighbour issues where topics hosted on the same broker/bookie face latency spikes.
+* Precise rate limiter fixes the above two issues, but introduces another challenge - the rate limiting is too strict.
+  * This leads potential for data loss in case there are sudden spikes in produce and client side produce queue breaches.
+  * The produce latencies increase exponentially in case produce breaches the set throughput even for small windows.
+
+Thus, there is a need for a rate limiter which is in between the above two implementations. As such, the handling of produce spikes
+and bursts may way from use-case to use-case, so it's not ideal to implement and add another rate limiter logic inside of Pulsar codebase
+as it will never be able to meet all possible requirements.
+
+# Goals
+
+I propose a configurable rate limiter implementation support, just like how AuthorizationProvider and
+AuthenticationProvider allow for.
+
+## In Scope
+
+For this PIP, the scope is the produce path only. It covers both the topic and the resource group level rate limiter.
+
+## Out of Scope
+
+Dispatch rate limiter is out of scope of this PIP. Generally speaking, the dispatch rate can smoothen out over time even if there are high produce spikes.
+
+# High Level Design

Review Comment:
   I would start by describing what a Rate Limiter supposed to implement and in the Public API section I must provide the interface



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.
+
+The DispatchRateLimiter hard codes RateLimiter to use precise rate limiter, but in the produce path , users can configure which one to choose for their brokers.
+
+# Motivation
+
+<!--
+Describe the problem this proposal is trying to solve.
+
+* Explain what is the problem you're trying to solve - current situation.
+* This section is the "Why" of your proposal.
+-->
+
+We have used both the poller based and precise publish rate limiters and there are challenges with both the approaches:
+
+* Poller based doesn't really do any throttling, all pending requests due to limit breach in the first second pass through in the first interval of the subsequent second.
+  * Effectively, there is no rate limiting.
+  * This also leads to noisy neighbour issues where topics hosted on the same broker/bookie face latency spikes.
+* Precise rate limiter fixes the above two issues, but introduces another challenge - the rate limiting is too strict.
+  * This leads potential for data loss in case there are sudden spikes in produce and client side produce queue breaches.
+  * The produce latencies increase exponentially in case produce breaches the set throughput even for small windows.
+
+Thus, there is a need for a rate limiter which is in between the above two implementations. As such, the handling of produce spikes
+and bursts may way from use-case to use-case, so it's not ideal to implement and add another rate limiter logic inside of Pulsar codebase
+as it will never be able to meet all possible requirements.
+
+# Goals
+
+I propose a configurable rate limiter implementation support, just like how AuthorizationProvider and
+AuthenticationProvider allow for.
+
+## In Scope
+
+For this PIP, the scope is the produce path only. It covers both the topic and the resource group level rate limiter.
+
+## Out of Scope
+
+Dispatch rate limiter is out of scope of this PIP. Generally speaking, the dispatch rate can smoothen out over time even if there are high produce spikes.
+
+# High Level Design
+
+* Add a new `initialize(PublishRate, ServiceConfigurations)` method in the interface `PublishRateLimiter.java` replacing direct constructor call.
+* Add 2 new optional properties in broker config: 
+  * `topicPublishRateLimiterProvider: <classpath>`
+  * `resourceGroupPublishRateLimiterProvider: <classpath>`
+* Based on the config values, and historic property, `AbstractTopic.java` and `ResourceGroup.java` initialize the relevant class.
+  * Fallback to `PublishRateLimiterImpl.java` in case rate limiter is enabled but class doesn't load.
+
+# Detailed Design
+
+## Design & Implementation Details
+
+* In `AbstractTopic.java`, instead of if-else based constructor call to either `PublishRateLimiterImpl` or `
+  PrecisePublishLimiter`, we read the `topicPublishRateLimiterProvider` config and load that class instead.
+  * Similarly, in `ResourceGroup.java`, we load the class defined in `resourceGroupPublishRateLimiterProvider` config.
+* Replace constructor call for `ResourceGroupPublishLimiter`, `PrecisePublishLimiter` and `PublishRateLimiterImpl` to `.initialize` method call.
+* Add fallback & default logic to load `PublishRateLimiterImpl` and `ResourceGroupPublishLimiter` in case of ClassNotFoundException or exception in `initialize` call in `AbstractTopic.java` and `
+  ResourceGroup.java`
+* For cases where rate limiting is disabled, we continue to use `PublishRateLimiterDisable` like we do currently.
+* For backward compatibility (for 2 minor versions), in `AbstractTopic.java` we add following logic:
+  * In case users have the flag `preciseTopicPublishRateLimiterEnable` as true *and* the `
+    topicPublishRateLimiterProvider` is empty, we still load `PublishRateLimiterImpl`.
+  * In case `topicPublishRateLimiterProvider` is a valid class, we ignore the value of `
+    preciseTopicPublishRateLimiterEnable`
+  * We mark the config `preciseTopicPublishRateLimiterEnable` deprecated in the config file and ask users to use the provider config with the classpath.
+
+### Configuration
+
+* `topicPublishRateLimiterProvider: <classpath>`
+* `resourceGroupPublishRateLimiterProvider: <classpath>`
+
+# Backward & Forward Compatibility
+
+## Upgrade
+
+In detailed design, backward compatibility logic is explained. Upgrades would work without need of config change until
+we finally remove the logic to understand `preciseTopicPublishRateLimiterEnable`.
+
+As a best practice:
+
+* Users requiring precise rate limiter remove the config `preciseTopicPublishRateLimiterEnable` and
+  add `topicPublishRateLimiterProvider: org.apache.pulsar.broker.service.PrecisePublishLimiter`
+* Users requiring custom rate limiter ensure that their class is present in pulsar classpath and they set
+  the `topicPublishRateLimiterProvider` or `
+  resourceGroupPublishRateLimiterProvider` accordingly
+
+## Revert
+
+In case user was not using the precise rate limiter, simply deleting the two new configs should revert back to previous version.
+
+In case user was using precise rate limiter, i.e., they had `preciseTopicPublishRateLimiterEnable: true` and:
+* They removed that and switched to `topicPublishRateLimiterProvider: org.apache.pulsar.broker.service.PrecisePublishLimiter`
+  * To revert, they simply need to remove the config `topicPublishRateLimiterProvider` and put in `preciseTopicPublishRateLimiterEnable: true`
+* They did not change any of their configs, i.e. they did not specifically add and value in `topicPublishRateLimiterProvider`, then no change needed.
+
+# Links

Review Comment:
   You're missing the Public API section



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "grssam (via GitHub)" <gi...@apache.org>.
grssam commented on code in PR #21399:
URL: https://github.com/apache/pulsar/pull/21399#discussion_r1368537566


##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.
+
+The DispatchRateLimiter hard codes RateLimiter to use precise rate limiter, but in the produce path , users can configure which one to choose for their brokers.
+
+# Motivation
+
+<!--
+Describe the problem this proposal is trying to solve.
+
+* Explain what is the problem you're trying to solve - current situation.
+* This section is the "Why" of your proposal.
+-->
+
+We have used both the poller based and precise publish rate limiters and there are challenges with both the approaches:
+
+* Poller based doesn't really do any throttling, all pending requests due to limit breach in the first second pass through in the first interval of the subsequent second.
+  * Effectively, there is no rate limiting.
+  * This also leads to noisy neighbour issues where topics hosted on the same broker/bookie face latency spikes.
+* Precise rate limiter fixes the above two issues, but introduces another challenge - the rate limiting is too strict.
+  * This leads potential for data loss in case there are sudden spikes in produce and client side produce queue breaches.
+  * The produce latencies increase exponentially in case produce breaches the set throughput even for small windows.
+
+Thus, there is a need for a rate limiter which is in between the above two implementations. As such, the handling of produce spikes
+and bursts may way from use-case to use-case, so it's not ideal to implement and add another rate limiter logic inside of Pulsar codebase
+as it will never be able to meet all possible requirements.
+
+# Goals
+
+I propose a configurable rate limiter implementation support, just like how AuthorizationProvider and
+AuthenticationProvider allow for.
+
+## In Scope
+
+For this PIP, the scope is the produce path only. It covers both the topic and the resource group level rate limiter.
+
+## Out of Scope
+
+Dispatch rate limiter is out of scope of this PIP. Generally speaking, the dispatch rate can smoothen out over time even if there are high produce spikes.
+
+# High Level Design

Review Comment:
   I think I get the missing link here. I will explain the already existing interface `PublishRateLimiter.java` in the background knowledge section.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "grssam (via GitHub)" <gi...@apache.org>.
grssam commented on PR #21399:
URL: https://github.com/apache/pulsar/pull/21399#issuecomment-1857889100

   thank you everyone who participated in the review and the discussion thread. We have a way ahead which is different from this proposal so closing this PR as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on PR #21399:
URL: https://github.com/apache/pulsar/pull/21399#issuecomment-1779534495

   Ok @grssam  - ping me when you updated it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21399:
URL: https://github.com/apache/pulsar/pull/21399#discussion_r1368395322


##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.
+
+The DispatchRateLimiter hard codes RateLimiter to use precise rate limiter, but in the produce path , users can configure which one to choose for their brokers.
+
+# Motivation
+
+<!--
+Describe the problem this proposal is trying to solve.
+
+* Explain what is the problem you're trying to solve - current situation.
+* This section is the "Why" of your proposal.
+-->
+
+We have used both the poller based and precise publish rate limiters and there are challenges with both the approaches:
+
+* Poller based doesn't really do any throttling, all pending requests due to limit breach in the first second pass through in the first interval of the subsequent second.
+  * Effectively, there is no rate limiting.
+  * This also leads to noisy neighbour issues where topics hosted on the same broker/bookie face latency spikes.
+* Precise rate limiter fixes the above two issues, but introduces another challenge - the rate limiting is too strict.
+  * This leads potential for data loss in case there are sudden spikes in produce and client side produce queue breaches.
+  * The produce latencies increase exponentially in case produce breaches the set throughput even for small windows.
+
+Thus, there is a need for a rate limiter which is in between the above two implementations. As such, the handling of produce spikes
+and bursts may way from use-case to use-case, so it's not ideal to implement and add another rate limiter logic inside of Pulsar codebase
+as it will never be able to meet all possible requirements.
+
+# Goals
+
+I propose a configurable rate limiter implementation support, just like how AuthorizationProvider and
+AuthenticationProvider allow for.
+
+## In Scope
+
+For this PIP, the scope is the produce path only. It covers both the topic and the resource group level rate limiter.
+
+## Out of Scope
+
+Dispatch rate limiter is out of scope of this PIP. Generally speaking, the dispatch rate can smoothen out over time even if there are high produce spikes.
+
+# High Level Design

Review Comment:
   Of course, but can you be more specific - which interface already exists for rate limiting? Just understand that if I'm asking those questions it means I couldn't answer them based on reading the doc. If a prequisite is to read certain parts of the code prior reading this PIP, it should be mentioned, but I think we can avoid that easily.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #21399:
URL: https://github.com/apache/pulsar/pull/21399#discussion_r1368390115


##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.

Review Comment:
   Pulsar has a serious knowledge gap in docs, so people reading your PIP have no other place they can can catch up. Sending all your readers to read the code is unacceptable.  I'm not saying add 50 lines, just adding 1 paragraph composed of 3-4 sentences is something easily achievable, and not bloating the design doc at all. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][pip] PIP-310: Support custom publish rate limiters [pulsar]

Posted by "grssam (via GitHub)" <gi...@apache.org>.
grssam commented on code in PR #21399:
URL: https://github.com/apache/pulsar/pull/21399#discussion_r1367927955


##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.
+
+The DispatchRateLimiter hard codes RateLimiter to use precise rate limiter, but in the produce path , users can configure which one to choose for their brokers.
+
+# Motivation
+
+<!--

Review Comment:
   will do. thanks.



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.
+
+The DispatchRateLimiter hard codes RateLimiter to use precise rate limiter, but in the produce path , users can configure which one to choose for their brokers.
+
+# Motivation
+
+<!--
+Describe the problem this proposal is trying to solve.
+
+* Explain what is the problem you're trying to solve - current situation.
+* This section is the "Why" of your proposal.
+-->
+
+We have used both the poller based and precise publish rate limiters and there are challenges with both the approaches:
+
+* Poller based doesn't really do any throttling, all pending requests due to limit breach in the first second pass through in the first interval of the subsequent second.
+  * Effectively, there is no rate limiting.
+  * This also leads to noisy neighbour issues where topics hosted on the same broker/bookie face latency spikes.
+* Precise rate limiter fixes the above two issues, but introduces another challenge - the rate limiting is too strict.
+  * This leads potential for data loss in case there are sudden spikes in produce and client side produce queue breaches.
+  * The produce latencies increase exponentially in case produce breaches the set throughput even for small windows.
+
+Thus, there is a need for a rate limiter which is in between the above two implementations. As such, the handling of produce spikes
+and bursts may way from use-case to use-case, so it's not ideal to implement and add another rate limiter logic inside of Pulsar codebase
+as it will never be able to meet all possible requirements.
+
+# Goals
+
+I propose a configurable rate limiter implementation support, just like how AuthorizationProvider and
+AuthenticationProvider allow for.
+
+## In Scope
+
+For this PIP, the scope is the produce path only. It covers both the topic and the resource group level rate limiter.
+
+## Out of Scope
+
+Dispatch rate limiter is out of scope of this PIP. Generally speaking, the dispatch rate can smoothen out over time even if there are high produce spikes.
+
+# High Level Design
+
+* Add a new `initialize(PublishRate, ServiceConfigurations)` method in the interface `PublishRateLimiter.java` replacing direct constructor call.
+* Add 2 new optional properties in broker config: 
+  * `topicPublishRateLimiterProvider: <classpath>`
+  * `resourceGroupPublishRateLimiterProvider: <classpath>`
+* Based on the config values, and historic property, `AbstractTopic.java` and `ResourceGroup.java` initialize the relevant class.
+  * Fallback to `PublishRateLimiterImpl.java` in case rate limiter is enabled but class doesn't load.
+
+# Detailed Design
+
+## Design & Implementation Details
+
+* In `AbstractTopic.java`, instead of if-else based constructor call to either `PublishRateLimiterImpl` or `
+  PrecisePublishLimiter`, we read the `topicPublishRateLimiterProvider` config and load that class instead.
+  * Similarly, in `ResourceGroup.java`, we load the class defined in `resourceGroupPublishRateLimiterProvider` config.
+* Replace constructor call for `ResourceGroupPublishLimiter`, `PrecisePublishLimiter` and `PublishRateLimiterImpl` to `.initialize` method call.
+* Add fallback & default logic to load `PublishRateLimiterImpl` and `ResourceGroupPublishLimiter` in case of ClassNotFoundException or exception in `initialize` call in `AbstractTopic.java` and `
+  ResourceGroup.java`
+* For cases where rate limiting is disabled, we continue to use `PublishRateLimiterDisable` like we do currently.
+* For backward compatibility (for 2 minor versions), in `AbstractTopic.java` we add following logic:
+  * In case users have the flag `preciseTopicPublishRateLimiterEnable` as true *and* the `
+    topicPublishRateLimiterProvider` is empty, we still load `PublishRateLimiterImpl`.
+  * In case `topicPublishRateLimiterProvider` is a valid class, we ignore the value of `
+    preciseTopicPublishRateLimiterEnable`
+  * We mark the config `preciseTopicPublishRateLimiterEnable` deprecated in the config file and ask users to use the provider config with the classpath.
+
+### Configuration
+
+* `topicPublishRateLimiterProvider: <classpath>`
+* `resourceGroupPublishRateLimiterProvider: <classpath>`
+
+# Backward & Forward Compatibility
+
+## Upgrade
+
+In detailed design, backward compatibility logic is explained. Upgrades would work without need of config change until
+we finally remove the logic to understand `preciseTopicPublishRateLimiterEnable`.
+
+As a best practice:
+
+* Users requiring precise rate limiter remove the config `preciseTopicPublishRateLimiterEnable` and
+  add `topicPublishRateLimiterProvider: org.apache.pulsar.broker.service.PrecisePublishLimiter`
+* Users requiring custom rate limiter ensure that their class is present in pulsar classpath and they set
+  the `topicPublishRateLimiterProvider` or `
+  resourceGroupPublishRateLimiterProvider` accordingly
+
+## Revert
+
+In case user was not using the precise rate limiter, simply deleting the two new configs should revert back to previous version.
+
+In case user was using precise rate limiter, i.e., they had `preciseTopicPublishRateLimiterEnable: true` and:
+* They removed that and switched to `topicPublishRateLimiterProvider: org.apache.pulsar.broker.service.PrecisePublishLimiter`
+  * To revert, they simply need to remove the config `topicPublishRateLimiterProvider` and put in `preciseTopicPublishRateLimiterEnable: true`
+* They did not change any of their configs, i.e. they did not specifically add and value in `topicPublishRateLimiterProvider`, then no change needed.
+
+# Links

Review Comment:
   I must say I was a bit confused by the purpose of "Public API" section. Who is the "Public" here? I thought that only changes going in the pulsar client handshake related interfaces are Public. I tried to refer to older PIPs but there was no solid conclusion.
   
   In this PIP, will the interface `PublishRateLimiter.java` be the Public API?



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.
+
+The DispatchRateLimiter hard codes RateLimiter to use precise rate limiter, but in the produce path , users can configure which one to choose for their brokers.
+
+# Motivation
+
+<!--
+Describe the problem this proposal is trying to solve.
+
+* Explain what is the problem you're trying to solve - current situation.
+* This section is the "Why" of your proposal.
+-->
+
+We have used both the poller based and precise publish rate limiters and there are challenges with both the approaches:
+
+* Poller based doesn't really do any throttling, all pending requests due to limit breach in the first second pass through in the first interval of the subsequent second.
+  * Effectively, there is no rate limiting.
+  * This also leads to noisy neighbour issues where topics hosted on the same broker/bookie face latency spikes.
+* Precise rate limiter fixes the above two issues, but introduces another challenge - the rate limiting is too strict.
+  * This leads potential for data loss in case there are sudden spikes in produce and client side produce queue breaches.
+  * The produce latencies increase exponentially in case produce breaches the set throughput even for small windows.
+
+Thus, there is a need for a rate limiter which is in between the above two implementations. As such, the handling of produce spikes
+and bursts may way from use-case to use-case, so it's not ideal to implement and add another rate limiter logic inside of Pulsar codebase
+as it will never be able to meet all possible requirements.
+
+# Goals
+
+I propose a configurable rate limiter implementation support, just like how AuthorizationProvider and
+AuthenticationProvider allow for.
+
+## In Scope
+
+For this PIP, the scope is the produce path only. It covers both the topic and the resource group level rate limiter.
+
+## Out of Scope
+
+Dispatch rate limiter is out of scope of this PIP. Generally speaking, the dispatch rate can smoothen out over time even if there are high produce spikes.
+
+# High Level Design
+
+* Add a new `initialize(PublishRate, ServiceConfigurations)` method in the interface `PublishRateLimiter.java` replacing direct constructor call.
+* Add 2 new optional properties in broker config: 
+  * `topicPublishRateLimiterProvider: <classpath>`

Review Comment:
   I thought about that initially.. But the usage of rate limiting is very rare outside of publish context. The misc rate limiters are all metadata or admin related rate limiters where it doesn't make any sense to use a custom rate limiter logic - think unload namespace rate limiter, or stats rate limiter.
   
   In the consume path - the current precise one works without any hassle. In real world applications, consumers can take a hit as there is no scope of application degradation or data loss due to consume throughput rate limiting since the data is already persisted in bookkeepers.
   
   I also wanted to keep the scope of this PIP as very minimal. so that the turn around time is much lower. Hope this makes sense?
   



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`

Review Comment:
   The caveat should be covered in the Motivation section. Is caveat mentioned there good enough?



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.

Review Comment:
   Adding to the reply to the resource group section - including the details of how poller and precise works - would lead to a very bloated "Background Knowledge" section. I can certainly add both the details, but is there a better approach to this?
   Unfortunately, the official documentation about these things is also lacking.



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.

Review Comment:
   will do. thanks.



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge

Review Comment:
   I agree with you that resource groups aren't a well known feature.. But adding details about resource group (and other additional contextual details) - would that make the Background knowledge of this PIP too big?
   
   Rate limiters in Pulsar are all server based AFAIK, please correct me if I am mistaken.



##########
pip/pip-310.md:
##########
@@ -0,0 +1,121 @@
+# PIP-310 Support custom publish rate limiters
+
+# Background knowledge
+
+Pulsar uses rate limiters to control produce and consume rates. At core, this utility is provided by the RateLimiter.java class.
+There are many places where a rate limiter is used today. Mainly:
+* Produce path:
+  * ResourceGroupPublishLimiter - resource group's produce qps and MBps checks
+  * PrecisePublishLimiter - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker config is `true`
+  * PublishRateLimiterImpl - the topic quota rate limiter in case when `preciseTopicPublishRateLimiterEnable` broker
+  config is `false`
+  * BrokerService - manage total publish rate of the broker
+* Consume path:
+  * DispatchRateLimiter - dispatch rate limiter
+  * BrokerService - manage total dispatch rate of the broker 
+* Misc:
+  * BrokerService - concurrent unloader rate limiter
+  * (Function|Sink|Source)StatsManager
+
+RateLimiter.java has two ways of implementing the rate limit check - precise and poller based. The poller based accumulates allowed permit and check for breach at a regular, configurable, frequency within each second. The precise one checks for breach before handing out permits each time.
+
+The DispatchRateLimiter hard codes RateLimiter to use precise rate limiter, but in the produce path , users can configure which one to choose for their brokers.
+
+# Motivation
+
+<!--
+Describe the problem this proposal is trying to solve.
+
+* Explain what is the problem you're trying to solve - current situation.
+* This section is the "Why" of your proposal.
+-->
+
+We have used both the poller based and precise publish rate limiters and there are challenges with both the approaches:
+
+* Poller based doesn't really do any throttling, all pending requests due to limit breach in the first second pass through in the first interval of the subsequent second.
+  * Effectively, there is no rate limiting.
+  * This also leads to noisy neighbour issues where topics hosted on the same broker/bookie face latency spikes.
+* Precise rate limiter fixes the above two issues, but introduces another challenge - the rate limiting is too strict.
+  * This leads potential for data loss in case there are sudden spikes in produce and client side produce queue breaches.
+  * The produce latencies increase exponentially in case produce breaches the set throughput even for small windows.
+
+Thus, there is a need for a rate limiter which is in between the above two implementations. As such, the handling of produce spikes
+and bursts may way from use-case to use-case, so it's not ideal to implement and add another rate limiter logic inside of Pulsar codebase
+as it will never be able to meet all possible requirements.
+
+# Goals
+
+I propose a configurable rate limiter implementation support, just like how AuthorizationProvider and
+AuthenticationProvider allow for.
+
+## In Scope
+
+For this PIP, the scope is the produce path only. It covers both the topic and the resource group level rate limiter.
+
+## Out of Scope
+
+Dispatch rate limiter is out of scope of this PIP. Generally speaking, the dispatch rate can smoothen out over time even if there are high produce spikes.
+
+# High Level Design

Review Comment:
   I thought, since its using an existing interface, i'd rather show the minimal delta, than the whole class which exists in the codebase already?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org