Posted to dev@pulsar.apache.org by Lari Hotari <lh...@apache.org> on 2024/03/18 10:32:03 UTC

[DISCUSS] Improving Pulsar broker cache design

Hi all,

I'd like to start a discussion about improving Pulsar broker cache design.

In the Pulsar broker, there are two main scenarios for dispatching
messages to consumers: tailing (hot) reads and catch-up (cold) reads.
In both scenarios, consumers can be fast or slow, and this speed
difference affects how each scenario behaves.
The Pulsar broker contains a cache for handling tailing reads. This
cache was extended to also handle catch-up reads with PR #12258 (plus
follow-up PRs) in Pulsar 2.11. This cache is referred to as the
"broker cache" or, more specifically, the "managed ledger cache".

The issue "Slow backlog draining with high fanout topics"
https://github.com/apache/pulsar/issues/12257 describes a scenario why
the caching was extended to handle catch-up reads (PR #12258, in
Pulsar 2.11):

"When there are topics with a high publish rate and high fan-out, with
a large number of subscriptions or a large number of replicators, and
a backlog is built, it becomes really difficult to drain the backlog
for such topics while they are serving a high publish rate. This
problem can be reproduced with multiple topics (100), each with 10K
writes, 6 backlog replicators, and multiple subscriptions. It becomes
impossible to drain backlogs even if most of the cursors are draining
a similar backlog because a large number of cold-reads from the bookie
makes the overall backlog draining slower. Even the bookkeeper cache
is not able to keep up by caching entries and eventually has to do
cold-reads." (edited)

The problem described above can also occur in other cases. Under
heavy load, a high fan-out catch-up read can increase the load on the
system and trigger a cascading failure, resulting in partial outages
and backlogs that take hours to recover from. In many cases, this
could be avoided with a better broker cache implementation.

Optimizations in the broker cache are crucial for improving Pulsar
performance. Unlike Kafka, the Pulsar broker doesn't have access to
local files and cannot leverage the Linux page cache for efficient
caching. The cache implementation in the Pulsar broker should
therefore be intelligent enough to avoid putting unnecessary load on
the system. However, there have been very few improvements to the
Pulsar broker cache, and it has received little focus.

For most production configurations, understanding how to tune the
broker cache is necessary. However, there is little documentation
about this in the Apache Pulsar project. I noticed this in 2022 while
preparing my talk "Performance tuning for Apache Pulsar Kubernetes
deployments" for ApacheCon 2022 (slides [1], talk [2]), and the
project documentation hasn't improved since then. There are better
performance-related tutorials than my talk available, for example
"Apache Pulsar Troubleshooting Performance Issues" by Hang Chen [3]
and "Apache Pulsar Troubleshooting backlog Issues" by Penghui Li [4].
It would be great if this type of content were summarized in the
Apache Pulsar project documentation, since the relevant information
can be hard to find when it is needed.
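
As a starting point, these are the main broker.conf settings that
affect the broker cache in recent Pulsar versions (the values shown
are illustrative rather than recommendations; please verify the names
and defaults against the broker.conf of the version you run):

    # Overall cache size in MB
    managedLedgerCacheSizeMB=512
    # How long entries remain cached before becoming eligible for
    # eviction
    managedLedgerCacheEvictionTimeThresholdMillis=1000
    # Evict based on the slowest markDeletePosition instead of the
    # slowest read position
    cacheEvictionByMarkDeletedPosition=false
    # Catch-up read caching (added by PR #12258): cache entries when
    # enough backlogged cursors are close together
    managedLedgerMinimumBacklogCursorsForCaching=2
    managedLedgerMinimumBacklogEntriesForCaching=1000
    managedLedgerMaxBacklogBetweenCursorsForCaching=1000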

The main intention of this email, however, is not to improve the
documentation. It is to highlight that the current implementation
could be significantly improved. The broker cache should be designed
in such a way that it minimizes cache misses and maximizes cache hits
within the available broker cache memory. I believe this can be
achieved by developing an algorithm that incorporates an optimization
model for making optimal caching decisions.
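
As one possible direction (purely an illustration of the idea, with
made-up names, not a concrete proposal), a caching decision could
weigh the expected number of future reads of an entry, derived from
the read positions of the topic's cursors, against the memory the
entry occupies:

    import java.util.List;

    // Hypothetical sketch: score entries by expected cache benefit
    // per byte so that, under memory pressure, eviction can drop the
    // lowest-scoring entries first instead of evicting purely by
    // insertion time.
    class CacheAdmissionSketch {
        static double score(long entryPosition, long entrySizeBytes,
                            List<Long> cursorReadPositions) {
            // Each cursor that still has to read this entry
            // represents one avoided cold read if it stays cached.
            long expectedFutureReads = cursorReadPositions.stream()
                    .filter(pos -> pos <= entryPosition)
                    .count();
            return (double) expectedFutureReads
                    / Math.max(1, entrySizeBytes);
        }
    }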

In addition to a better caching model and algorithm, there are
opportunities to leverage rate limiting to improve caching for both
tailing reads and catch-up reads. For instance, instead of rate
limiting individual consumers, consumers could be dynamically grouped
and the speed at which each group moves forward rate limited, so that
cache hits are maximized and speed differences between consumers
don't cause them to fall out of the cached range. When consumers are
catching up to the tail, it doesn't always make sense to impose a
rate limit on an individual consumer if the entries to read are
already cached, as rate limiting would only grow the required cache
size and increase the chance that the consumer falls out of the
cached range.
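
Here is a minimal sketch of the grouping idea (hypothetical names,
just to illustrate the mechanism): cursors whose read positions lie
within a window of each other share one group, and each group could
then be driven by a single rate limiter sized to keep it inside the
cached range:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: group cursors by read-position proximity
    // so that one rate limit applies per group instead of per
    // consumer.
    class CursorGroupingSketch {
        // Positions must be sorted ascending; positions within
        // maxGapEntries of the group's slowest cursor join the same
        // group.
        static List<List<Long>> group(List<Long> sortedReadPositions,
                                      long maxGapEntries) {
            List<List<Long>> groups = new ArrayList<>();
            List<Long> current = new ArrayList<>();
            long groupHead = 0;
            for (long pos : sortedReadPositions) {
                if (current.isEmpty()
                        || pos - groupHead > maxGapEntries) {
                    if (!current.isEmpty()) {
                        groups.add(current);
                        current = new ArrayList<>();
                    }
                    groupHead = pos; // slowest cursor defines a group
                }
                current.add(pos);
            }
            if (!current.isEmpty()) {
                groups.add(current);
            }
            return groups;
        }
    }

A real design would of course need to handle cursors joining and
leaving groups dynamically as their speeds change.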

Would anyone else be interested in contributing to the design and
improvement of the Pulsar broker cache?
It would also be useful to hear about real user experiences with
current problems in high fan-out scenarios in Pulsar.

- Lari

1 - https://www.apachecon.com/acna2022/slides/03_Hotari_Lari_Performance_tuning_Pulsar.pdf
2 - https://www.youtube.com/watch?v=WkdfILAx-4c
3 - https://www.youtube.com/watch?v=8_4bVctj2_E
4 - https://www.youtube.com/watch?v=17jQIOVeu4s

Re: [DISCUSS] Improving Pulsar broker cache design

Posted by Lari Hotari <lh...@apache.org>.
Bumping this thread. Looking forward to some feedback. 
Thanks!

-Lari
