Posted to users@kafka.apache.org by Fares Oueslati <ou...@gmail.com> on 2023/04/27 14:18:32 UTC

Imbalance in Commit Messages Across __consumer_offsets Topic Partitions

Hello Kafka users,

I’m facing an issue with a Kafka cluster, specifically with the
__consumer_offsets topic.
There seems to be an imbalance in the number of commit messages across its
partitions. Most of the commit messages are concentrated in a single
partition, which is causing high CPU usage on the broker handling that
partition.
I have already verified that the topic partitions’ leaders are
well-balanced across the six brokers.
However, a specific consumer group (the largest one, with many members
consuming from multiple topics, based on Spring Kafka) generates a large
number of commit messages, and they all end up in the same partition #37.
My understanding is that, by default, all commit messages sent by a
particular consumer group for a specific topic partition are directed to a
single partition of the __consumer_offsets topic, determined by hashing the
consumer group id and the topic partition. In our case, this default
partitioning strategy seems to be causing the imbalance, even though I
don’t understand why exactly.
Could you please help me understand why there’s such an imbalance in the
number of messages across the __consumer_offsets partitions and why the
large number of commit messages from the large consumer group are not
spread well across the partitions of the __consumer_offsets topic? Are
there any recommendations or best practices to address this issue?

Any guidance would be greatly appreciated.

Best Regards,
Fares

Re: Imbalance in Commit Messages Across __consumer_offsets Topic Partitions

Posted by Alexandre Dupriez <al...@gmail.com>.
Hi Fares,

What is the rate of offset commits for the group?
How often do you need to commit offsets for consumers in this group?

Thanks,
Alexandre

On Tue, May 9, 2023 at 18:34, Fares Oueslati <ou...@gmail.com> wrote:
>
> Hello Richard,
>
> Thank you for your answer.
>
> Upon examining the `__consumer_offsets` topic, it seems that all commit
> messages for a given consumer `group.id` go to the same partition.
> So, there is not much we can do if a dominant consumer group reads
> from all topics.
>
> The only solution would be to split it into multiple consumer groups reading
> from different subsets of topics.
>
> Best Regards,
> Fares

Re: Imbalance in Commit Messages Across __consumer_offsets Topic Partitions

Posted by Fares Oueslati <ou...@gmail.com>.
Hello Richard,

Thank you for your answer.

Upon examining the `__consumer_offsets` topic, it seems that all commit
messages for a given consumer `group.id` go to the same partition.
So, there is not much we can do if a dominant consumer group reads
from all topics.

The only solution would be to split it into multiple consumer groups reading
from different subsets of topics.
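
A minimal sketch of why every commit from one group lands in one partition,
assuming the broker picks the `__consumer_offsets` partition from the
`group.id` alone, as the absolute value of its Java `String.hashCode()`
modulo `offsets.topic.num.partitions` (50 by default). The function names and
the group ids below are hypothetical and only serve the illustration.

```python
def java_string_hash_code(s: str) -> int:
    """Reproduce Java's String.hashCode() (signed 32-bit) for a Python str."""
    h = 0
    data = s.encode("utf-16-be")  # Java hashes UTF-16 code units
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF
    # Convert the unsigned 32-bit value to a signed one, as Java would see it.
    return h - 0x100000000 if h >= 0x80000000 else h


def kafka_abs(n: int) -> int:
    """Absolute value that maps Integer.MIN_VALUE to 0 instead of overflowing."""
    return 0 if n == -0x80000000 else abs(n)


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Partition of __consumer_offsets assumed to hold this group's commits."""
    return kafka_abs(java_string_hash_code(group_id)) % num_partitions


# Hypothetical group ids: one large group versus the same workload split up.
for group in ["big-group", "big-group-orders", "big-group-payments"]:
    print(group, "->", offsets_partition_for(group))
```

Because the topic partition being committed does not enter the calculation,
splitting the workload into several group ids is what gives the commits a
chance to land on different partitions. Two group ids can still hash to the
same partition, so it is worth checking the result before renaming groups.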

Best Regards,
Fares

On Mon, May 1, 2023 at 11:07 AM Richard Bosch <ri...@axual.com>
wrote:

> Hi Fares,
>
> You're right in your description of the contents of the __consumer_offsets
> topic, and how they are stored.
> The most common reasons for an uneven load on the consumer offsets topic are:
>
> 1. Configuration of offset commits in the client
> 2. Load on topic being consumed
>
> If a topic has 10 partitions, and the producer produces records with a key
> and a partitioner based on the key hash, then one or more partitions can
> receive more records than the others, simply because some keys are used
> more often than others.
> The consumer then needs to read more records from those partitions.
> If the consumer commits offsets at a time interval or after every N
> records consumed, this results in more offset commits for
> the topic partitions containing more records.
>
> You might want to check the load on the topic partitions being consumed to
> confirm this is the case.
> Unfortunately, I do not have an easy answer on how to remedy that problem.
>
> You can check what the offset commit settings are for your application, and
> whether you can update the logic to reflect the higher load.
> If the trigger is based on time or on the number of records consumed, you
> can adjust these values so that commits are not triggered as often.
>
> You can also start monitoring and updating the cluster with a preferred
> leader election setting for your partitions using Cruise Control to minimize
> the load on your brokers.
>
> I recommend using keys with the hash-based partitioner only if the
> order of messages for those keys MUST be guaranteed; otherwise you can use a
> different partitioner.
> Then there will be a more uniform distribution of records on the topic, and
> a better distribution of offset commits.
>
> Do note that there is an issue with the UniformStickyPartitioner that is
> being worked on right now, see
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
>
> I hope this helps
>
> Kind regards,
>
>
> Richard Bosch
>
> Developer Advocate
>
> Axual BV
>
> E : richard.bosch@axual.com
> M : +31 6 11 850 846
> W : www.axual.com

Re: Imbalance in Commit Messages Across __consumer_offsets Topic Partitions

Posted by Richard Bosch <ri...@axual.com>.
Hi Fares,

You're right in your description of the contents of the __consumer_offsets
topic, and how they are stored.
The most common reasons for an uneven load on the consumer offsets topic are:

1. Configuration of offset commits in the client
2. Load on topic being consumed

If a topic has 10 partitions, and the producer produces records with a key
and a partitioner based on the key hash, then one or more partitions can
receive more records than the others, simply because some keys are used
more often than others.
The consumer then needs to read more records from those partitions.
If the consumer commits offsets at a time interval or after every N
records consumed, this results in more offset commits for
the topic partitions containing more records.
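
The effect of the two commit triggers can be estimated with a deliberately
simplified model. This is a rough sketch, not a description of any client's
exact behaviour: it assumes that a count-based trigger commits once per N
records on each partition, and that a time-based trigger makes every member
commit all of its assigned partitions once per interval, writing one offset
record per committed partition. The function names and numbers are hypothetical.

```python
def count_based_commits_per_second(records_per_second, records_per_commit):
    """Commits per second for one partition when committing every N records."""
    return records_per_second / records_per_commit


def time_based_offset_records_per_second(assigned_partitions, commit_interval_s):
    """Offset records per second when all assigned partitions are committed
    once per interval (one record per committed partition)."""
    return assigned_partitions / commit_interval_s


# Hypothetical per-partition consumption rates (records/s), one hot partition.
partition_rates = [5000, 500, 500]
per_partition = [count_based_commits_per_second(r, 100) for r in partition_rates]
print("count-based commits/s per partition:", per_partition)

# Hypothetical group with 600 assigned partitions committing every 5 seconds.
print("time-based offset records/s:", time_based_offset_records_per_second(600, 5))
```

Under these assumptions, raising N or lengthening the interval lowers the
write rate on the group's `__consumer_offsets` partition proportionally.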

You might want to check the load on the topic partitions being consumed to
confirm this is the case.
Unfortunately, I do not have an easy answer on how to remedy that problem.

You can check what the offset commit settings are for your application, and
whether you can update the logic to reflect the higher load.
If the trigger is based on time or on the number of records consumed, you
can adjust these values so that commits are not triggered as often.

You can also start monitoring and updating the cluster with a preferred
leader election setting for your partitions using Cruise Control to minimize
the load on your brokers.

I recommend using keys with the hash-based partitioner only if the
order of messages for those keys MUST be guaranteed; otherwise you can use a
different partitioner.
Then there will be a more uniform distribution of records on the topic, and
a better distribution of offset commits.
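
The difference between key-hash and keyless distribution can be illustrated
with a small simulation. This is only a sketch: CRC32 is used as a stand-in
hash (Kafka's default partitioner uses murmur2), round-robin stands in for a
keyless partitioner, and the key names and counts are made up.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 10


def keyed_partition(key, num_partitions=NUM_PARTITIONS):
    """Pick a partition from the key hash (CRC32 as a stand-in for murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribution(partitions, num_partitions=NUM_PARTITIONS):
    """Count how many records landed on each partition."""
    counts = Counter(partitions)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical workload: one hot key with 700 records, 300 keys with one each.
keys = ["customer-1"] * 700 + [f"customer-{i}" for i in range(2, 302)]

keyed = distribution(keyed_partition(k) for k in keys)
round_robin = distribution(i % NUM_PARTITIONS for i in range(len(keys)))

print("keyed:      ", keyed)
print("round robin:", round_robin)
```

All records of the hot key end up on a single partition in the keyed case,
while the round-robin case spreads the same 1000 records evenly.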

Do note that there is an issue with the UniformStickyPartitioner that is
being worked on right now, see
https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner

I hope this helps

Kind regards,


Richard Bosch

Developer Advocate

Axual BV

E : richard.bosch@axual.com
M : +31 6 11 850 846
W : www.axual.com

