Posted to dev@kafka.apache.org by Christo Lolov <ch...@gmail.com> on 2023/01/18 08:51:07 UTC

Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets

Greetings,

I am bumping the DISCUSS thread below for KIP-895. The KIP presents a
situation where consumer groups are in an undefined state until a rolling
restart of the cluster is performed. While I have demonstrated the
behaviour using a ZooKeeper-based cluster, I believe the same problem can
be shown in a KRaft cluster. Please let me know your opinions on the
problem and the presented solution.

Best,
Christo

On Thursday, 29 December 2022 at 14:19:27 GMT, Christo
> <ch...@yahoo.com.invalid> wrote:
>
>
> Hello!
> I would like to start this discussion thread on KIP-895: Dynamically
> refresh partition count of __consumer_offsets.
> The KIP proposes to alter brokers so that they refresh the partition count
> of __consumer_offsets used to determine group coordinators without
> requiring a rolling restart of the cluster.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-895%3A+Dynamically+refresh+partition+count+of+__consumer_offsets
>
> Let me know your thoughts on the matter!
> Best, Christo
>

Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets

Posted by Alexandre Dupriez <al...@gmail.com>.
Hi Divij,

Thanks for the follow-up. A few comments/questions.

100. The stated motivation for increasing the number of partitions above
50 is scale. Are we sure that 50 partitions is not enough to cover all
valid use cases? An upper bound in the range of 1 to 10 MB/s of ingress
per partition gives 50 to 500 MB/s across the topic. Assuming 100 bytes
per offset and metadata record, this gives between 500,000 and 5,000,000
offsets committed per second. Assuming 10,000 consumers active on the
cluster, this would allow a rate of 50 to 500 offsets committed per
second per consumer. Are there really use cases where there is a genuine
need for more? Arguably, this does not include group metadata records,
which are generated at a low frequency.

101. The partitioning scheme applied for consumer offsets is also used
in other parts such as the already mentioned transaction metadata or
remote log metadata for the topic-based remote log metadata manager
[1]. Have we considered a holistic approach for all these internal
topics?
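
For reference, my understanding is that all of these rely on the same
simple hash-modulo resolution; a minimal sketch of that scheme (names
are illustrative, not the exact Kafka internals):

// Minimal sketch of modulo-style partition resolution as used for
// internal topics such as __consumer_offsets (group coordinator),
// transaction metadata, and, per [1], remote log metadata. Names are
// illustrative; the real implementations live in the respective
// coordinator/partitioner classes.
public final class InternalTopicPartitioning {

    // Roughly: hash the key (group id, transactional id, ...) and take
    // it modulo the internal topic's partition count.
    static int partitionFor(String key, int internalTopicPartitionCount) {
        return (key.hashCode() & 0x7fffffff) % internalTopicPartitionCount;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("my-consumer-group", 50));
    }
}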

Overall, I am not sure if changing the number of partitions for the
consumer offsets topic should even be allowed unless there is evidence
of it being required to accommodate throughput. Reassignment can be
required after cluster expansion, but that is correctly supported
IIRC.

Thanks,
Alexandre

[1] https://github.com/Hangleton/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogMetadataTopicPartitioner.java#L37

On Thu, 6 Apr 2023 at 16:01, hzh0425 <hz...@163.com> wrote:
>
> I think it's a good idea as we may want to store segments in different buckets
>
>
>
> hzhkafka@163.com
>
>
> ---- Original message ----
> From: Divij Vaidya <di...@gmail.com>
> Date: 2023-04-04 23:56
> To: dev@kafka.apache.org <de...@kafka.apache.org>
> Cc:
> Subject: Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets
> FYI, a user faced this problem and reached out to us in the mailing list
> [1]. Implementation of this KIP could have reduced the downtime for these
> customers.
>
> Christo, would you like to create a JIRA and associate with the KIP so that
> we can continue to collect cases in the JIRA where users have faced this
> problem?
>
> [1] https://lists.apache.org/thread/zoowjshvdpkh5p0p7vqjd9fq8xvkr1nd
>
> --
> Divij Vaidya
>
>
>
> On Wed, Jan 18, 2023 at 9:52 AM Christo Lolov <ch...@gmail.com>
> wrote:
>
> > Greetings,
> >
> > I am bumping the below DISCUSSion thread for KIP-895. The KIP presents a
> > situation where consumer groups are in an undefined state until a rolling
> > restart of a cluster is performed. While I have demonstrated the behaviour
> > using a cluster using Zookeeper I believe the same problem can be shown in
> > a KRaft cluster. Please let me know your opinions on the problem and the
> > presented solution.
> >
> > Best,
> > Christo
> >
> > On Thursday, 29 December 2022 at 14:19:27 GMT, Christo
> > > <ch...@yahoo.com.invalid> wrote:
> > >
> > >
> > > Hello!
> > > I would like to start this discussion thread on KIP-895: Dynamically
> > > refresh partition count of __consumer_offsets.
> > > The KIP proposes to alter brokers so that they refresh the partition
> > count
> > > of __consumer_offsets used to determine group coordinators without
> > > requiring a rolling restart of the cluster.
> > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-895%3A+Dynamically+refresh+partition+count+of+__consumer_offsets
> > >
> > > Let me know your thoughts on the matter!
> > > Best, Christo
> > >
> >

Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets

Posted by hzh0425 <hz...@163.com>.
I think it's a good idea as we may want to store segments in different buckets



hzhkafka@163.com



---- Original message ----
From: Divij Vaidya <di...@gmail.com>
Date: 2023-04-04 23:56
To: dev@kafka.apache.org <de...@kafka.apache.org>
Cc:
Subject: Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets
FYI, a user faced this problem and reached out to us in the mailing list
[1]. Implementation of this KIP could have reduced the downtime for these
customers.

Christo, would you like to create a JIRA and associate with the KIP so that
we can continue to collect cases in the JIRA where users have faced this
problem?

[1] https://lists.apache.org/thread/zoowjshvdpkh5p0p7vqjd9fq8xvkr1nd

--
Divij Vaidya



On Wed, Jan 18, 2023 at 9:52 AM Christo Lolov <ch...@gmail.com>
wrote:

> Greetings,
>
> I am bumping the below DISCUSSion thread for KIP-895. The KIP presents a
> situation where consumer groups are in an undefined state until a rolling
> restart of a cluster is performed. While I have demonstrated the behaviour
> using a cluster using Zookeeper I believe the same problem can be shown in
> a KRaft cluster. Please let me know your opinions on the problem and the
> presented solution.
>
> Best,
> Christo
>
> On Thursday, 29 December 2022 at 14:19:27 GMT, Christo
> > <ch...@yahoo.com.invalid> wrote:
> >
> >
> > Hello!
> > I would like to start this discussion thread on KIP-895: Dynamically
> > refresh partition count of __consumer_offsets.
> > The KIP proposes to alter brokers so that they refresh the partition
> count
> > of __consumer_offsets used to determine group coordinators without
> > requiring a rolling restart of the cluster.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-895%3A+Dynamically+refresh+partition+count+of+__consumer_offsets
> >
> > Let me know your thoughts on the matter!
> > Best, Christo
> >
>

Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets

Posted by David Jacot <dj...@confluent.io.INVALID>.
Hi Divij,

I think that the motivation is clear; however, the ideal solution is not,
at least not for me. I would like to ensure that we solve the real problem
instead of making it worse. In our experience, is this issue usually due to
a mistake, or to a deliberate attempt to scale the number of consumers? In
my mind, in the current state, one should never change the number of
partitions because it results in losing group metadata. Preventing it would
not be a bad idea.

I agree that the ideal solution would be to change how we assign groups to
__consumer_offsets partitions. I have had the idea of making groups a
first-class resource in Kafka in the back of my mind for a while. The idea
would be to store group ids and their current partition in the controller
and to let the controller decide where a group should go when it is
created. This could be done via a plugin as well. If we have this, then
adding new __consumer_offsets partitions is no longer an issue. The
controller would start by filling the empty partitions when new groups are
created. This would have a few other advantages. For instance, it would
allow us to put quotas on the number of groups. It also has a few
challenges. For instance, how should a group be created - implicitly as
today, or explicitly? There is also the question of deletion. At the
moment, groups are cleaned up automatically after the grace period. Would
we keep this?
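
Purely to illustrate the direction (a hypothetical sketch, not a
concrete proposal; every name below is made up), such a controller-side
placement hook could look roughly like this:

// Hypothetical sketch: the controller, rather than a hash function,
// decides which __consumer_offsets partition owns a newly created
// group, possibly via a pluggable policy.
public interface GroupPlacementPolicy {

    /**
     * Called by the controller when a group is created (implicitly or
     * explicitly). Returns the __consumer_offsets partition that should
     * own the group.
     */
    int placeGroup(String groupId, int offsetsTopicPartitionCount, ClusterView cluster);

    /** Read-only view the controller could expose to the policy. */
    interface ClusterView {
        /** Number of groups currently assigned to the given partition. */
        int groupCount(int offsetsPartition);
    }
}

A default policy that simply picks the least-loaded partition would
naturally start filling newly added partitions, as described above.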

I think that we should also consider the transaction coordinator in this
discussion because it suffers from the very same limitation. Ideally, we
should have a solution for both of them. Have you looked at how it handles
an increase of the number of partitions?

As a side note, we are in the middle of rewriting the group coordinator. I
think that big changes should only be made once we are done with that.

Best,
David

On Wed, Apr 5, 2023 at 10:08 PM Divij Vaidya <di...@gmail.com>
wrote:

> Thank you for your comments and participation in the discussion, David,
> Justine and Alex.
>
> You are right! The KIP is missing a lot of details about the motivation. I
> apologize for the confusion I created with my earlier statement about
> reducing the downtime in this thread. I will request Christo to update it.
>
> Meanwhile, as a summary, the KIP does not attempt to solve the problem of
> losing consumer offsets after partition increase. Instead the objective of
> the KIP is to reduce the time to recovery for reads to start after such an
> event has occurred. Prior to this KIP, impact of the change manifests when
> one of the brokers is restarted and the consumer groups remain in
> errors/undefined state until all brokers have been finished restarting.
> During a rolling restart, this places the time to recovery in proportion
> with the number of brokers in the clusters. After this KIP is implemented,
> we would not wait for the broker restart to pick up the new partitions,
> instead all brokers will notified about the change in number of partitions
> immediately. This would reduce the duration during which consumer groups
> are in erroring/undefined state from length of rolling to time it takes to
> process LISR across the cluster. Hence, a (small) win!
>
> I hope this explanation throws some more light into the context.
>
> Why do users change __consumer_offets?
> 1. They change it accidentally OR
> 2. They increase it to scale with the increase in the number of consumers.
> This is because (correct me if I am wrong) with an increase in the number
> of consumers, we can hit the limits on single partition throughput while
> reading/writing to the __consumer_offsets. This is a genuine use case and
> the downside of losing existing metadata/offsets is acceptable to them.
>
> How do we ideally fix it?
> An ideal solution would allow us to increase the number of partitions for
> __consumer_offsets without losing existing metadata. We either need to make
> partition assignment for a consumer "sticky" such that existing consumers
> are not re-assigned to new partitions OR we need to transfer data as per
> new partitions in __consumer_offsets. Both these approaches are long term
> fixes and require a separate discussion.
>
> What can we do in the short term?
> In the short term either we can block users from changing the number of
> partitions (which might not be possible due to use case #2 above) OR we can
> at least improve (not fix but just improve!) the current situation by
> reducing the time to recovery using this KIP.
>
> Let's circle back on this discussion as soon as KIP is updated with more
> details.
>
> --
> Divij Vaidya
>
>
>
> On Tue, Apr 4, 2023 at 8:00 PM Alexandre Dupriez <
> alexandre.dupriez@gmail.com> wrote:
>
> > Hi Christo,
> >
> > Thanks for the KIP. Apologies for the delayed review.
> >
> > At a high-level, I am not sure if the KIP really solves the problem it
> > intends to.
> >
> > More specifically, the KIP mentions that once a broker is restarted
> > and the group coordinator becomes aware of the new partition count of
> > the consumer offsets topic, the problem is mitigated. However, how do
> > we access the metadata and offsets recorded in a partition once it is
> > no longer the partition a consumer group resolves to?
> >
> > Thanks,
> > Alexandre
> >
> > On Tue, 4 Apr 2023 at 18:34, Justine Olshan
> > <jo...@confluent.io.invalid> wrote:
> > >
> > > Hi,
> > >
> > > I'm also a bit unsure of the motivation here. Is there a need to change
> > the
> > > number of partitions for this topic?
> > >
> > > Justine
> > >
> > > On Tue, Apr 4, 2023 at 10:07 AM David Jacot <da...@gmail.com>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am not very comfortable with the proposal of this KIP. The main
> > issue is
> > > > that changing the number of partitions means that all group metadata
> is
> > > > lost because the hashing changes. I wonder if we should just disallow
> > > > changing the number of partitions entirely. Did we consider something
> > like
> > > > this?
> > > >
> > > > Best,
> > > > David
> > > >
> > > > On Tue, 4 Apr 2023 at 17:57, Divij Vaidya <di...@gmail.com>
> > > > wrote:
> > > >
> > > > > FYI, a user faced this problem and reached out to us in the mailing
> > list
> > > > > [1]. Implementation of this KIP could have reduced the downtime for
> > these
> > > > > customers.
> > > > >
> > > > > Christo, would you like to create a JIRA and associate with the KIP
> > so
> > > > that
> > > > > we can continue to collect cases in the JIRA where users have faced
> > this
> > > > > problem?
> > > > >
> > > > > [1]
> https://lists.apache.org/thread/zoowjshvdpkh5p0p7vqjd9fq8xvkr1nd
> > > > >
> > > > > --
> > > > > Divij Vaidya
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jan 18, 2023 at 9:52 AM Christo Lolov <
> > christololov@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Greetings,
> > > > > >
> > > > > > I am bumping the below DISCUSSion thread for KIP-895. The KIP
> > presents
> > > > a
> > > > > > situation where consumer groups are in an undefined state until a
> > > > rolling
> > > > > > restart of a cluster is performed. While I have demonstrated the
> > > > > behaviour
> > > > > > using a cluster using Zookeeper I believe the same problem can be
> > shown
> > > > > in
> > > > > > a KRaft cluster. Please let me know your opinions on the problem
> > and
> > > > the
> > > > > > presented solution.
> > > > > >
> > > > > > Best,
> > > > > > Christo
> > > > > >
> > > > > > On Thursday, 29 December 2022 at 14:19:27 GMT, Christo
> > > > > > > <ch...@yahoo.com.invalid> wrote:
> > > > > > >
> > > > > > >
> > > > > > > Hello!
> > > > > > > I would like to start this discussion thread on KIP-895:
> > Dynamically
> > > > > > > refresh partition count of __consumer_offsets.
> > > > > > > The KIP proposes to alter brokers so that they refresh the
> > partition
> > > > > > count
> > > > > > > of __consumer_offsets used to determine group coordinators
> > without
> > > > > > > requiring a rolling restart of the cluster.
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-895%3A+Dynamically+refresh+partition+count+of+__consumer_offsets
> > > > > > >
> > > > > > > Let me know your thoughts on the matter!
> > > > > > > Best, Christo
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>

Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets

Posted by Divij Vaidya <di...@gmail.com>.
Thank you for your comments and participation in the discussion, David,
Justine and Alex.

You are right! The KIP is missing a lot of details about the motivation. I
apologize for the confusion I created with my earlier statement about
reducing the downtime in this thread. I will request Christo to update it.

Meanwhile, as a summary: the KIP does not attempt to solve the problem of
losing consumer offsets after a partition increase. Instead, the objective
of the KIP is to reduce the time to recovery, i.e. how long it takes for
reads to resume after such an event has occurred. Prior to this KIP, the
impact of the change manifests when one of the brokers is restarted, and
the consumer groups remain in an error/undefined state until all brokers
have finished restarting. During a rolling restart, this places the time to
recovery in proportion to the number of brokers in the cluster. After this
KIP is implemented, we would not wait for a broker restart to pick up the
new partitions; instead, all brokers would be notified of the change in the
number of partitions immediately. This would reduce the duration during
which consumer groups are in an error/undefined state from the length of
the rolling restart to the time it takes to process the LeaderAndIsr (LISR)
requests across the cluster. Hence, a (small) win!
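
To illustrate what "brokers pick up the new partition count without a
restart" could look like (a rough, hypothetical sketch only; this is
not code from the KIP, and the names are made up):

// Hypothetical sketch: instead of reading the __consumer_offsets
// partition count once at startup, the group coordinator updates a
// cached count whenever new metadata for the topic arrives, so
// group-to-partition resolution converges without a rolling restart.
import java.util.concurrent.atomic.AtomicInteger;

public class OffsetsTopicPartitionCountCache {
    private final AtomicInteger partitionCount;

    public OffsetsTopicPartitionCountCache(int initialCount) {
        this.partitionCount = new AtomicInteger(initialCount);
    }

    // Invoked from the metadata/LeaderAndIsr handling path when the
    // __consumer_offsets metadata changes.
    public void onOffsetsTopicMetadataUpdate(int newPartitionCount) {
        int previous = partitionCount.getAndSet(newPartitionCount);
        if (previous != newPartitionCount) {
            // A real implementation would also have to unload/reload the
            // groups whose owning partition changes as a result.
        }
    }

    public int partitionFor(String groupId) {
        return (groupId.hashCode() & 0x7fffffff) % partitionCount.get();
    }
}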

I hope this explanation sheds some more light on the context.

Why do users change __consumer_offsets?
1. They change it accidentally, OR
2. They increase it to scale with the increase in the number of consumers.
This is because (correct me if I am wrong) with an increase in the number
of consumers we can hit the limits of single-partition throughput while
reading/writing __consumer_offsets. This is a genuine use case, and the
downside of losing the existing metadata/offsets is acceptable to them.

How do we ideally fix it?
An ideal solution would allow us to increase the number of partitions for
__consumer_offsets without losing existing metadata. We either need to make
the partition assignment for a consumer group "sticky", such that existing
groups are not re-assigned to new partitions, OR we need to migrate the
existing data to the new partition layout of __consumer_offsets. Both of
these approaches are long-term fixes and require a separate discussion.
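
As a thought experiment only (not a worked-out design; durability of
the mapping is deliberately ignored here), the "sticky" direction could
look something like this:

// Illustrative sketch of "sticky" resolution: groups that existed
// before the partition increase keep resolving against the old count,
// so their offsets stay readable, while new groups use the new count.
// The group -> count mapping would of course have to be stored durably,
// which is exactly why this needs its own discussion.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StickyGroupResolver {
    private final Map<String, Integer> countAtFirstSight = new ConcurrentHashMap<>();
    private volatile int currentPartitionCount;

    public StickyGroupResolver(int currentPartitionCount) {
        this.currentPartitionCount = currentPartitionCount;
    }

    public void onPartitionCountIncrease(int newCount) {
        this.currentPartitionCount = newCount;
    }

    public int partitionFor(String groupId) {
        // Existing groups stick to the count in effect when first seen.
        int count = countAtFirstSight.computeIfAbsent(groupId, id -> currentPartitionCount);
        return (groupId.hashCode() & 0x7fffffff) % count;
    }
}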

What can we do in the short term?
In the short term, we can either block users from changing the number of
partitions (which might not be possible due to use case #2 above), OR we
can at least improve (not fix, but just improve!) the current situation by
reducing the time to recovery using this KIP.

Let's circle back on this discussion as soon as the KIP is updated with
more details.

--
Divij Vaidya



On Tue, Apr 4, 2023 at 8:00 PM Alexandre Dupriez <
alexandre.dupriez@gmail.com> wrote:

> Hi Christo,
>
> Thanks for the KIP. Apologies for the delayed review.
>
> At a high-level, I am not sure if the KIP really solves the problem it
> intends to.
>
> More specifically, the KIP mentions that once a broker is restarted
> and the group coordinator becomes aware of the new partition count of
> the consumer offsets topic, the problem is mitigated. However, how do
> we access the metadata and offsets recorded in a partition once it is
> no longer the partition a consumer group resolves to?
>
> Thanks,
> Alexandre
>
> On Tue, 4 Apr 2023 at 18:34, Justine Olshan
> <jo...@confluent.io.invalid> wrote:
> >
> > Hi,
> >
> > I'm also a bit unsure of the motivation here. Is there a need to change
> the
> > number of partitions for this topic?
> >
> > Justine
> >
> > On Tue, Apr 4, 2023 at 10:07 AM David Jacot <da...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > I am not very comfortable with the proposal of this KIP. The main
> issue is
> > > that changing the number of partitions means that all group metadata is
> > > lost because the hashing changes. I wonder if we should just disallow
> > > changing the number of partitions entirely. Did we consider something
> like
> > > this?
> > >
> > > Best,
> > > David
> > >
> > > On Tue, 4 Apr 2023 at 17:57, Divij Vaidya <di...@gmail.com>
> > > wrote:
> > >
> > > > FYI, a user faced this problem and reached out to us in the mailing
> list
> > > > [1]. Implementation of this KIP could have reduced the downtime for
> these
> > > > customers.
> > > >
> > > > Christo, would you like to create a JIRA and associate with the KIP
> so
> > > that
> > > > we can continue to collect cases in the JIRA where users have faced
> this
> > > > problem?
> > > >
> > > > [1] https://lists.apache.org/thread/zoowjshvdpkh5p0p7vqjd9fq8xvkr1nd
> > > >
> > > > --
> > > > Divij Vaidya
> > > >
> > > >
> > > >
> > > > On Wed, Jan 18, 2023 at 9:52 AM Christo Lolov <
> christololov@gmail.com>
> > > > wrote:
> > > >
> > > > > Greetings,
> > > > >
> > > > > I am bumping the below DISCUSSion thread for KIP-895. The KIP
> presents
> > > a
> > > > > situation where consumer groups are in an undefined state until a
> > > rolling
> > > > > restart of a cluster is performed. While I have demonstrated the
> > > > behaviour
> > > > > using a cluster using Zookeeper I believe the same problem can be
> shown
> > > > in
> > > > > a KRaft cluster. Please let me know your opinions on the problem
> and
> > > the
> > > > > presented solution.
> > > > >
> > > > > Best,
> > > > > Christo
> > > > >
> > > > > On Thursday, 29 December 2022 at 14:19:27 GMT, Christo
> > > > > > <ch...@yahoo.com.invalid> wrote:
> > > > > >
> > > > > >
> > > > > > Hello!
> > > > > > I would like to start this discussion thread on KIP-895:
> Dynamically
> > > > > > refresh partition count of __consumer_offsets.
> > > > > > The KIP proposes to alter brokers so that they refresh the
> partition
> > > > > count
> > > > > > of __consumer_offsets used to determine group coordinators
> without
> > > > > > requiring a rolling restart of the cluster.
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-895%3A+Dynamically+refresh+partition+count+of+__consumer_offsets
> > > > > >
> > > > > > Let me know your thoughts on the matter!
> > > > > > Best, Christo
> > > > > >
> > > > >
> > > >
> > >
>

Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets

Posted by Alexandre Dupriez <al...@gmail.com>.
Hi Christo,

Thanks for the KIP. Apologies for the delayed review.

At a high-level, I am not sure if the KIP really solves the problem it
intends to.

More specifically, the KIP mentions that once a broker is restarted
and the group coordinator becomes aware of the new partition count of
the consumer offsets topic, the problem is mitigated. However, how do
we access the metadata and offsets recorded in a partition once it is
no longer the partition a consumer group resolves to?

Thanks,
Alexandre

On Tue, 4 Apr 2023 at 18:34, Justine Olshan
<jo...@confluent.io.invalid> wrote:
>
> Hi,
>
> I'm also a bit unsure of the motivation here. Is there a need to change the
> number of partitions for this topic?
>
> Justine
>
> On Tue, Apr 4, 2023 at 10:07 AM David Jacot <da...@gmail.com> wrote:
>
> > Hi,
> >
> > I am not very comfortable with the proposal of this KIP. The main issue is
> > that changing the number of partitions means that all group metadata is
> > lost because the hashing changes. I wonder if we should just disallow
> > changing the number of partitions entirely. Did we consider something like
> > this?
> >
> > Best,
> > David
> >
> > On Tue, 4 Apr 2023 at 17:57, Divij Vaidya <di...@gmail.com>
> > wrote:
> >
> > > FYI, a user faced this problem and reached out to us in the mailing list
> > > [1]. Implementation of this KIP could have reduced the downtime for these
> > > customers.
> > >
> > > Christo, would you like to create a JIRA and associate with the KIP so
> > that
> > > we can continue to collect cases in the JIRA where users have faced this
> > > problem?
> > >
> > > [1] https://lists.apache.org/thread/zoowjshvdpkh5p0p7vqjd9fq8xvkr1nd
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Wed, Jan 18, 2023 at 9:52 AM Christo Lolov <ch...@gmail.com>
> > > wrote:
> > >
> > > > Greetings,
> > > >
> > > > I am bumping the below DISCUSSion thread for KIP-895. The KIP presents
> > a
> > > > situation where consumer groups are in an undefined state until a
> > rolling
> > > > restart of a cluster is performed. While I have demonstrated the
> > > behaviour
> > > > using a cluster using Zookeeper I believe the same problem can be shown
> > > in
> > > > a KRaft cluster. Please let me know your opinions on the problem and
> > the
> > > > presented solution.
> > > >
> > > > Best,
> > > > Christo
> > > >
> > > > On Thursday, 29 December 2022 at 14:19:27 GMT, Christo
> > > > > <ch...@yahoo.com.invalid> wrote:
> > > > >
> > > > >
> > > > > Hello!
> > > > > I would like to start this discussion thread on KIP-895: Dynamically
> > > > > refresh partition count of __consumer_offsets.
> > > > > The KIP proposes to alter brokers so that they refresh the partition
> > > > count
> > > > > of __consumer_offsets used to determine group coordinators without
> > > > > requiring a rolling restart of the cluster.
> > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-895%3A+Dynamically+refresh+partition+count+of+__consumer_offsets
> > > > >
> > > > > Let me know your thoughts on the matter!
> > > > > Best, Christo
> > > > >
> > > >
> > >
> >

Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Hi,

I'm also a bit unsure of the motivation here. Is there a need to change the
number of partitions for this topic?

Justine

On Tue, Apr 4, 2023 at 10:07 AM David Jacot <da...@gmail.com> wrote:

> Hi,
>
> I am not very comfortable with the proposal of this KIP. The main issue is
> that changing the number of partitions means that all group metadata is
> lost because the hashing changes. I wonder if we should just disallow
> changing the number of partitions entirely. Did we consider something like
> this?
>
> Best,
> David
>
> On Tue, 4 Apr 2023 at 17:57, Divij Vaidya <di...@gmail.com>
> wrote:
>
> > FYI, a user faced this problem and reached out to us in the mailing list
> > [1]. Implementation of this KIP could have reduced the downtime for these
> > customers.
> >
> > Christo, would you like to create a JIRA and associate with the KIP so
> that
> > we can continue to collect cases in the JIRA where users have faced this
> > problem?
> >
> > [1] https://lists.apache.org/thread/zoowjshvdpkh5p0p7vqjd9fq8xvkr1nd
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Wed, Jan 18, 2023 at 9:52 AM Christo Lolov <ch...@gmail.com>
> > wrote:
> >
> > > Greetings,
> > >
> > > I am bumping the below DISCUSSion thread for KIP-895. The KIP presents
> a
> > > situation where consumer groups are in an undefined state until a
> rolling
> > > restart of a cluster is performed. While I have demonstrated the
> > behaviour
> > > using a cluster using Zookeeper I believe the same problem can be shown
> > in
> > > a KRaft cluster. Please let me know your opinions on the problem and
> the
> > > presented solution.
> > >
> > > Best,
> > > Christo
> > >
> > > On Thursday, 29 December 2022 at 14:19:27 GMT, Christo
> > > > <ch...@yahoo.com.invalid> wrote:
> > > >
> > > >
> > > > Hello!
> > > > I would like to start this discussion thread on KIP-895: Dynamically
> > > > refresh partition count of __consumer_offsets.
> > > > The KIP proposes to alter brokers so that they refresh the partition
> > > count
> > > > of __consumer_offsets used to determine group coordinators without
> > > > requiring a rolling restart of the cluster.
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-895%3A+Dynamically+refresh+partition+count+of+__consumer_offsets
> > > >
> > > > Let me know your thoughts on the matter!
> > > > Best, Christo
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets

Posted by David Jacot <da...@gmail.com>.
Hi,

I am not very comfortable with the proposal of this KIP. The main issue is
that changing the number of partitions means that all group metadata is
lost because the hashing changes. I wonder if we should just disallow
changing the number of partitions entirely. Did we consider something like
this?
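
To make the failure mode concrete (illustration only, assuming the
usual hash-modulo resolution and a made-up group id):

// Why an increase "loses" group metadata under hash-modulo resolution:
// the same group id usually resolves to a different __consumer_offsets
// partition once the count changes, so the coordinator that now owns
// the group no longer finds its previously committed offsets.
public class PartitionCountChangeExample {
    static int partitionFor(String groupId, int partitionCount) {
        return (groupId.hashCode() & 0x7fffffff) % partitionCount;
    }

    public static void main(String[] args) {
        String groupId = "payments-consumer";
        System.out.println(partitionFor(groupId, 50)); // partition before the change
        System.out.println(partitionFor(groupId, 60)); // typically different afterwards
    }
}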

Best,
David

On Tue, 4 Apr 2023 at 17:57, Divij Vaidya <di...@gmail.com>
wrote:

> FYI, a user faced this problem and reached out to us in the mailing list
> [1]. Implementation of this KIP could have reduced the downtime for these
> customers.
>
> Christo, would you like to create a JIRA and associate with the KIP so that
> we can continue to collect cases in the JIRA where users have faced this
> problem?
>
> [1] https://lists.apache.org/thread/zoowjshvdpkh5p0p7vqjd9fq8xvkr1nd
>
> --
> Divij Vaidya
>
>
>
> On Wed, Jan 18, 2023 at 9:52 AM Christo Lolov <ch...@gmail.com>
> wrote:
>
> > Greetings,
> >
> > I am bumping the below DISCUSSion thread for KIP-895. The KIP presents a
> > situation where consumer groups are in an undefined state until a rolling
> > restart of a cluster is performed. While I have demonstrated the
> behaviour
> > using a cluster using Zookeeper I believe the same problem can be shown
> in
> > a KRaft cluster. Please let me know your opinions on the problem and the
> > presented solution.
> >
> > Best,
> > Christo
> >
> > On Thursday, 29 December 2022 at 14:19:27 GMT, Christo
> > > <ch...@yahoo.com.invalid> wrote:
> > >
> > >
> > > Hello!
> > > I would like to start this discussion thread on KIP-895: Dynamically
> > > refresh partition count of __consumer_offsets.
> > > The KIP proposes to alter brokers so that they refresh the partition
> > count
> > > of __consumer_offsets used to determine group coordinators without
> > > requiring a rolling restart of the cluster.
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-895%3A+Dynamically+refresh+partition+count+of+__consumer_offsets
> > >
> > > Let me know your thoughts on the matter!
> > > Best, Christo
> > >
> >
>

Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets

Posted by Divij Vaidya <di...@gmail.com>.
FYI, a user faced this problem and reached out to us in the mailing list
[1]. Implementation of this KIP could have reduced the downtime for these
customers.

Christo, would you like to create a JIRA and associate with the KIP so that
we can continue to collect cases in the JIRA where users have faced this
problem?

[1] https://lists.apache.org/thread/zoowjshvdpkh5p0p7vqjd9fq8xvkr1nd

--
Divij Vaidya



On Wed, Jan 18, 2023 at 9:52 AM Christo Lolov <ch...@gmail.com>
wrote:

> Greetings,
>
> I am bumping the below DISCUSSion thread for KIP-895. The KIP presents a
> situation where consumer groups are in an undefined state until a rolling
> restart of a cluster is performed. While I have demonstrated the behaviour
> using a cluster using Zookeeper I believe the same problem can be shown in
> a KRaft cluster. Please let me know your opinions on the problem and the
> presented solution.
>
> Best,
> Christo
>
> On Thursday, 29 December 2022 at 14:19:27 GMT, Christo
> > <ch...@yahoo.com.invalid> wrote:
> >
> >
> > Hello!
> > I would like to start this discussion thread on KIP-895: Dynamically
> > refresh partition count of __consumer_offsets.
> > The KIP proposes to alter brokers so that they refresh the partition
> count
> > of __consumer_offsets used to determine group coordinators without
> > requiring a rolling restart of the cluster.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-895%3A+Dynamically+refresh+partition+count+of+__consumer_offsets
> >
> > Let me know your thoughts on the matter!
> > Best, Christo
> >
>