Posted to dev@kafka.apache.org by Matt Farmer <ma...@frmr.me> on 2018/11/20 02:24:41 UTC

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Thanks for the KIP.

Will this cap be a global cap across the entire cluster or per broker?

Either way the default value seems a bit high to me, but that could just be
from my own usage patterns. I’d have probably started with 500 or 1k but
could be easily convinced that’s wrong.

Thanks,
Matt

On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <bc...@outlook.com> wrote:

> Hey folks,
>
>
> I would like to start a discussion on KIP-389:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
>
>
> This is a pretty simple change to cap the consumer group size for broker
> stability. Give me your valuable feedback when you get time.
>
>
> Thank you!
>

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Sounds good! I've updated the KIP with another small section under
Motivation.
If there aren't any objections or further recommendations, I plan on
starting a VOTE thread in the following days.

Best,
Stanislav

On Wed, Jan 9, 2019 at 8:54 PM Gwen Shapira <gw...@confluent.io> wrote:

> Thanks for the data driven approach, Stanislav. I love it :)
> And thank you for sharing your formula, Boyang. I totally agree that
> rebalance latency will not grow linearly with the consumer group size.
>
> My recommendation, considering what we know today:
> 1. Add the limit config, and set it to MAX_INT by default (effectively
> unlimited, without a magic number like -1)
> 2. Document our thoughts - the concern about runaway groups,
> Pinterest's 500 limit, Confluent's experience with few thousand
> consumers in a group, the conclusions from Stanislav's memory research
> (Personally, I wouldn't want what is essentially a linked list that we
> iterate to grow beyond 1M).
>
> Most likely, 99% of the users won't need it and those who do will
> have the right information to figure things out (or at least, they'll
> know everything that we know).
>
> WDYT?
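
An aside on point 1: declared through Kafka's ConfigDef machinery, an
"effectively unlimited" default would simply be Integer.MAX_VALUE. The snippet
below is only a hypothetical sketch to illustrate that idea; the config name,
description and importance level are assumptions, not the final KIP definition.

    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.common.config.ConfigDef.Importance;
    import org.apache.kafka.common.config.ConfigDef.Range;
    import org.apache.kafka.common.config.ConfigDef.Type;

    // Hypothetical sketch only: not the actual KafkaConfig definition.
    public class GroupMaxSizeConfigSketch {
        static final String GROUP_MAX_SIZE_CONFIG = "group.max.size";

        static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define(GROUP_MAX_SIZE_CONFIG,
                    Type.INT,
                    Integer.MAX_VALUE,           // effectively unlimited by default
                    Range.atLeast(1),
                    Importance.MEDIUM,
                    "Maximum number of members a single consumer group can have.");
    }

Operators who want a stricter limit would then simply override the value in
their broker configuration.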
>
> On Wed, Jan 9, 2019 at 4:25 AM Stanislav Kozlovski
> <st...@confluent.io> wrote:
> >
> > Hey everybody,
> >
> > I ran a quick benchmark and took some heap dumps to gauge how much memory
> > each consumer in a group is using, all done locally.
> > The setup was the following: 10 topics with 10 partitions each (100
> > partitions total) and one consumer group with 10 members, then expanded
> to
> > 20 members.
> > Here are some notes of my findings in a public Google doc:
> >
> https://docs.google.com/document/d/1Z4aY5qg8lU2uNXzdgp_30_oJ9_I9xNelPko6GIQYXYk/edit?usp=sharing
> >
> >
> > On Mon, Jan 7, 2019 at 10:51 PM Boyang Chen <bc...@outlook.com> wrote:
> >
> > > Hey Stanislav,
> > >
> > > I think the time taken to rebalance is not linearly correlated with
> number
> > > of consumers with our application. As for our current and future use
> cases,
> > > the main concern for Pinterest is still on the broker memory not CPU,
> > > because crashing server by one application could have cascading effect
> on
> > > all jobs.
> > > Do you want to drive a more detailed formula on how to compute the
> memory
> > > consumption against number of consumers within the group?
> > >
> > > In the meantime, I'm pretty much bought into the motivation of this KIP, so I
> > > think the follow-up work is just refinement to make the new config
> easy to
> > > use. We should be good
> > > to vote IMO.
> > >
> > > Best,
> > > Boyang
> > > ________________________________
> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > Sent: Monday, January 7, 2019 4:21 PM
> > > To: dev@kafka.apache.org
> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > metadata growth
> > >
> > > Hey there,
> > >
> > > Per Gwen's comments, I slightly reworked the motivation section. Let me
> > > know if it's any better now
> > >
> > > I completely agree that it would be best if we were to add a
> recommended
> > > number to a typical consumer group size. There is a problem that
> timing the
> > > CPU usage and rebalance times of consumer groups is tricky. We can
> update
> > > the KIP with memory guidelines (e.g 1 consumer in a group uses X
> memory,
> > > therefore 100 use Y).
> > > I fear that the most useful recommendations though would be knowing
> the CPU
> > > impact of large consumer groups and the rebalance times. That is,
> > > unfortunately, tricky to test and measure.
> > >
> > > @Boyang, you had mentioned some numbers used in Pinterest. If
> available to
> > > you, would you be comfortable sharing the number of consumers you are
> using
> > > in a group and maybe the potential time it takes to rebalance it?
> > >
> > > I'd appreciate any anecdotes regarding consumer group sizes from the
> > > community
> > >
> > > Best,
> > > Stanislav
> > >
> > > On Thu, Jan 3, 2019 at 1:59 AM Boyang Chen <bc...@outlook.com>
> wrote:
> > >
> > > > Thanks Gwen for the suggestion! +1 on the guidance of defining
> > > > group.max.size. I guess a sample formula would be:
> > > > 2 * (# of brokers * average metadata cache size * 80%) / (# of
> consumer
> > > > groups * size of a single member metadata)
> > > >
> > > > if we assumed non-skewed partition assignment and pretty fair
> consumer
> > > > group consumption. The "2" is the 95th percentile of the normal
> distribution
> > > and
> > > > 80% is just to buffer some memory capacity which are both open to
> > > > discussion. This config should be useful for Kafka platform team to
> make
> > > > sure one extreme large consumer group won't bring down the whole
> cluster.
> > > >
> > > > What do you think?
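
To make the formula above concrete, here is a small self-contained Java sketch
that plugs in purely illustrative numbers (the broker count, metadata cache
size, group count and per-member metadata size below are assumptions for the
example; only the formula itself comes from the message above):

    // Illustrative only: made-up inputs fed into the formula quoted above.
    // suggested cap = 2 * (brokers * avgMetadataCacheBytes * 0.8)
    //                   / (consumerGroups * bytesPerMemberMetadata)
    public class GroupSizeCapEstimate {
        public static void main(String[] args) {
            int brokers = 10;                      // assumed cluster size
            long avgMetadataCacheBytes = 1L << 30; // assume ~1 GiB of cache per broker
            int consumerGroups = 500;              // assumed number of groups
            long bytesPerMemberMetadata = 10_000;  // assume ~10 KB per member

            double cap = 2.0 * (brokers * avgMetadataCacheBytes * 0.8)
                    / ((double) consumerGroups * bytesPerMemberMetadata);

            // With these inputs the formula suggests a cap of roughly 3,400 members.
            System.out.printf("suggested group.max.size ~= %.0f members%n", cap);
        }
    }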
> > > >
> > > > Best,
> > > > Boyang
> > > >
> > > > ________________________________
> > > > From: Gwen Shapira <gw...@confluent.io>
> > > > Sent: Thursday, January 3, 2019 2:59 AM
> > > > To: dev
> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > > metadata growth
> > > >
> > > > Sorry for joining the fun late, but I think the problem we are
> solving
> > > > evolved a bit in the thread, and I'd like to have better
> understanding
> > > > of the problem before voting :)
> > > >
> > > > Both KIP and discussion assert that large groups are a problem, but
> > > > they are kinda inconsistent regarding why they are a problem and
> whose
> > > > problem they are...
> > > > 1. The KIP itself states that the main issue with large groups are
> > > > long rebalance times. Per my understanding, this is mostly a problem
> > > > for the application that consumes data, but not really a problem for
> > > > the brokers themselves, so broker admins probably don't and shouldn't
> > > > care about it. Also, my understanding is that this is a problem for
> > > > consumer groups, but not necessarily a problem for other group types.
> > > > 2. The discussion highlights the issue of "run away" groups that
> > > > essentially create tons of members needlessly and use up lots of
> > > > broker memory. This is something the broker admins will care about a
> > > > lot. And is also a problem for every group that uses coordinators and
> > > > not just consumers. And since the memory in question is the metadata
> > > > cache, it probably has the largest impact on Kafka Streams
> > > > applications, since they have lots of metadata.
> > > >
> > > > The solution proposed makes the most sense in the context of #2, so
> > > > perhaps we should update the motivation section of the KIP to reflect
> > > > that.
> > > >
> > > > The reason I'm probing here is that in my opinion we have to give our
> > > > users some guidelines on what a reasonable limit is (otherwise, how
> > > > will they know?). Calculating the impact of group-size on rebalance
> > > > time in order to make good recommendations will take a significant
> > > > effort. On the other hand, informing users regarding the memory
> > > > footprint of a consumer in a group and using that to make a
> reasonable
> > > > suggestion isn't hard.
> > > >
> > > > Gwen
> > > >
> > > >
> > > > On Sun, Dec 30, 2018 at 12:51 PM Stanislav Kozlovski
> > > > <st...@confluent.io> wrote:
> > > > >
> > > > > Thanks Boyang,
> > > > >
> > > > > If there aren't any more thoughts on the KIP I'll start a vote
> thread
> > > in
> > > > > the new year
> > > > >
> > > > > On Sat, Dec 29, 2018 at 12:58 AM Boyang Chen <bc...@outlook.com>
> > > > wrote:
> > > > >
> > > > > > Yep Stanislav, that's what I'm proposing, and your explanation
> makes
> > > > sense.
> > > > > >
> > > > > > Boyang
> > > > > >
> > > > > > ________________________________
> > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > Sent: Friday, December 28, 2018 7:59 PM
> > > > > > To: dev@kafka.apache.org
> > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> member
> > > > > > metadata growth
> > > > > >
> > > > > > Hey there everybody, let's work on wrapping this discussion up.
> > > > > >
> > > > > > @Boyang, could you clarify what you mean by
> > > > > > > One more question is whether you feel we should enforce group
> size
> > > > cap
> > > > > > statically or on runtime?
> > > > > > Is that related to the option of enabling this config via the
> dynamic
> > > > > > broker config feature?
> > > > > >
> > > > > > Regarding that - I feel it's useful to have and I also think it
> might
> > > > not
> > > > > > introduce additional complexity. As long as we handle the config
> > > being
> > > > > > changed midway through a rebalance (via using the old value) we
> > > should
> > > > be
> > > > > > good to go.
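
If the config did become a cluster-wide dynamic setting, changing it at runtime
would presumably go through the usual AdminClient path. The sketch below is
purely illustrative and assumes the limit were exposed that way (which is
exactly the open question here); the bootstrap address and chosen value are
made up.

    import java.util.Collection;
    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    // Assumes "group.max.size" were a dynamic, cluster-wide broker config.
    public class SetGroupMaxSizeSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (Admin admin = Admin.create(props)) {
                // An empty broker name targets the cluster-wide default.
                ConfigResource clusterDefault =
                        new ConfigResource(ConfigResource.Type.BROKER, "");
                AlterConfigOp setLimit = new AlterConfigOp(
                        new ConfigEntry("group.max.size", "1000"),
                        AlterConfigOp.OpType.SET);

                Map<ConfigResource, Collection<AlterConfigOp>> update =
                        Map.of(clusterDefault, Collections.singleton(setLimit));
                admin.incrementalAlterConfigs(update).all().get();
            }
        }
    }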
> > > > > >
> > > > > > On Wed, Dec 12, 2018 at 4:12 PM Stanislav Kozlovski <
> > > > > > stanislav@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Jason,
> > > > > > >
> > > > > > > Yes, that is what I meant by
> > > > > > > > Given those constraints, I think that we can simply mark the
> > > group
> > > > as
> > > > > > > `PreparingRebalance` with a rebalanceTimeout of the server
> setting
> > > `
> > > > > > > group.max.session.timeout.ms`. That's a bit long by default (5
> > > > minutes)
> > > > > > > but I can't seem to come up with a better alternative
> > > > > > > So either the timeout or all members calling joinGroup, yes
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Dec 11, 2018 at 8:14 PM Boyang Chen <
> bchen11@outlook.com>
> > > > wrote:
> > > > > > >
> > > > > > >> Hey Jason,
> > > > > > >>
> > > > > > >> I think this is the correct understanding. One more question
> is
> > > > whether
> > > > > > >> you feel
> > > > > > >> we should enforce group size cap statically or on runtime?
> > > > > > >>
> > > > > > >> Boyang
> > > > > > >> ________________________________
> > > > > > >> From: Jason Gustafson <ja...@confluent.io>
> > > > > > >> Sent: Tuesday, December 11, 2018 3:24 AM
> > > > > > >> To: dev
> > > > > > >> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > > member
> > > > > > >> metadata growth
> > > > > > >>
> > > > > > >> Hey Stanislav,
> > > > > > >>
> > > > > > >> Just to clarify, I think what you're suggesting is something
> like
> > > > this
> > > > > > in
> > > > > > >> order to gracefully shrink the group:
> > > > > > >>
> > > > > > >> 1. Transition the group to PREPARING_REBALANCE. No members are
> > > > kicked
> > > > > > out.
> > > > > > >> 2. Continue to allow offset commits and heartbeats for all
> current
> > > > > > >> members.
> > > > > > >> 3. Allow the first n members that send JoinGroup to stay in
> the
> > > > group,
> > > > > > but
> > > > > > >> wait for the JoinGroup (or session timeout) from all active
> > > members
> > > > > > before
> > > > > > >> finishing the rebalance.
> > > > > > >>
> > > > > > >> So basically we try to give the current members an
> opportunity to
> > > > finish
> > > > > > >> work, but we prevent some of them from rejoining after the
> > > rebalance
> > > > > > >> completes. It sounds reasonable if I've understood correctly.
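
To spell out step 3, here is a very rough, hypothetical sketch (names invented,
and nothing like the real GroupCoordinator internals): the first maxSize members
that rejoin keep their place, and anyone past the cap is turned away.

    import java.util.LinkedHashSet;
    import java.util.Set;

    // Hypothetical illustration of the admission step during a shrink rebalance.
    public class ShrinkAdmissionSketch {
        private final int maxSize;
        private final Set<String> admitted = new LinkedHashSet<>();

        public ShrinkAdmissionSketch(int maxSize) {
            this.maxSize = maxSize;
        }

        /** Returns true if this member keeps its place after the rebalance. */
        public synchronized boolean onJoinGroup(String memberId) {
            if (admitted.contains(memberId) || admitted.size() < maxSize) {
                admitted.add(memberId);
                return true;
            }
            return false; // over the cap: this member is asked to leave the group
        }
    }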
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Jason
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <
> bchen11@outlook.com>
> > > > wrote:
> > > > > > >>
> > > > > > >> > Yep, LGTM on my side. Thanks Stanislav!
> > > > > > >> > ________________________________
> > > > > > >> > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > >> > Sent: Friday, December 7, 2018 8:51 PM
> > > > > > >> > To: dev@kafka.apache.org
> > > > > > >> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to
> cap
> > > > member
> > > > > > >> > metadata growth
> > > > > > >> >
> > > > > > >> > Hi,
> > > > > > >> >
> > > > > > >> > We discussed this offline with Boyang and figured that it's
> best
> > > > to
> > > > > > not
> > > > > > >> > wait on the Cooperative Rebalancing proposal. Our thinking
> is
> > > > that we
> > > > > > >> can
> > > > > > >> > just force a rebalance from the broker, allowing consumers
> to
> > > > commit
> > > > > > >> > offsets if their rebalanceListener is configured correctly.
> > > > > > >> > When rebalancing improvements are implemented, we assume
> that
> > > they
> > > > > > would
> > > > > > >> > improve KIP-389's behavior as well as the normal rebalance
> > > > scenarios
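
For reference, "rebalanceListener is configured correctly" boils down to a
client-side listener along these lines (the class name is made up for the
example; ConsumerRebalanceListener and commitSync are the standard consumer
APIs): commit whatever has been processed before partitions are revoked by the
forced rebalance.

    import java.util.Collection;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    // Illustrative listener: commits current positions before giving up partitions.
    public class CommitOnRevokeListener implements ConsumerRebalanceListener {
        private final KafkaConsumer<?, ?> consumer;

        public CommitOnRevokeListener(KafkaConsumer<?, ?> consumer) {
            this.consumer = consumer;
        }

        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
            consumer.commitSync(); // commit the consumer's current positions
        }

        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
            // nothing extra needed on assignment for this example
        }
    }

It would be registered via consumer.subscribe(topics, new
CommitOnRevokeListener(consumer)).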
> > > > > > >> >
> > > > > > >> > On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <
> > > bchen11@outlook.com>
> > > > > > >> wrote:
> > > > > > >> >
> > > > > > >> > > Hey Stanislav,
> > > > > > >> > >
> > > > > > >> > > thanks for the question! `Trivial rebalance` means "we
> don't
> > > > start
> > > > > > >> > > reassignment right now, but you need to know it's coming
> soon
> > > > > > >> > > and you should start preparation".
> > > > > > >> > >
> > > > > > >> > > An example KStream use case is that before actually
> starting
> > > to
> > > > > > shrink
> > > > > > >> > the
> > > > > > >> > > consumer group, we need to
> > > > > > >> > > 1. partition the consumer group into two subgroups, where
> one
> > > > will
> > > > > > be
> > > > > > >> > > offline soon and the other will keep serving;
> > > > > > >> > > 2. make sure the states associated with near-future
> offline
> > > > > > consumers
> > > > > > >> are
> > > > > > >> > > successfully replicated on the serving ones.
> > > > > > >> > >
> > > > > > >> > > As I have mentioned shrinking the consumer group is pretty
> > > much
> > > > > > >> > equivalent
> > > > > > >> > > to group scaling down, so we could think of this
> > > > > > >> > > as an add-on use case for cluster scaling. So my
> understanding
> > > > is
> > > > > > that
> > > > > > >> > the
> > > > > > >> > > KIP-389 could be sequenced within our cooperative
> rebalancing<
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
> > > > > > >> > > >
> > > > > > >> > > proposal.
> > > > > > >> > >
> > > > > > >> > > Let me know if this makes sense.
> > > > > > >> > >
> > > > > > >> > > Best,
> > > > > > >> > > Boyang
> > > > > > >> > > ________________________________
> > > > > > >> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > >> > > Sent: Wednesday, December 5, 2018 5:52 PM
> > > > > > >> > > To: dev@kafka.apache.org
> > > > > > >> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to
> cap
> > > > member
> > > > > > >> > > metadata growth
> > > > > > >> > >
> > > > > > >> > > Hey Boyang,
> > > > > > >> > >
> > > > > > >> > > I think we still need to take care of group shrinkage
> because
> > > > even
> > > > > > if
> > > > > > >> > users
> > > > > > >> > > change the config value we cannot guarantee that all
> consumer
> > > > groups
> > > > > > >> > would
> > > > > > >> > > have been manually shrunk.
> > > > > > >> > >
> > > > > > >> > > Regarding 2., I agree that forcefully triggering a
> rebalance
> > > > might
> > > > > > be
> > > > > > >> the
> > > > > > >> > > most intuitive way to handle the situation.
> > > > > > >> > > What does a "trivial rebalance" mean? Sorry, I'm not
> familiar
> > > > with
> > > > > > the
> > > > > > >> > > term.
> > > > > > >> > > I was thinking that maybe we could force a rebalance,
> which
> > > > would
> > > > > > >> cause
> > > > > > >> > > consumers to commit their offsets (given their
> > > > rebalanceListener is
> > > > > > >> > > configured correctly) and subsequently reject some of the
> > > > incoming
> > > > > > >> > > `joinGroup` requests. Does that sound like it would work?
> > > > > > >> > >
> > > > > > >> > > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <
> > > bchen11@outlook.com
> > > > >
> > > > > > >> wrote:
> > > > > > >> > >
> > > > > > >> > > > Hey Stanislav,
> > > > > > >> > > >
> > > > > > >> > > > I read the latest KIP and saw that we already changed
> the
> > > > default
> > > > > > >> value
> > > > > > >> > > to
> > > > > > >> > > > -1. Do
> > > > > > >> > > > we still need to take care of the consumer group
> shrinking
> > > > when
> > > > > > >> doing
> > > > > > >> > the
> > > > > > >> > > > upgrade?
> > > > > > >> > > >
> > > > > However, this is an interesting topic that is worth
> discussing.
> > > > > > Although
> > > > > > >> > > > rolling
> > > > > > >> > > > upgrade is fine, `consumer.group.max.size` could always
> have
> > > > > > >> conflict
> > > > > > >> > > with
> > > > > > >> > > > the current
> > > > > > >> > > > consumer group size which means we need to adhere to one
> > > > source of
> > > > > > >> > truth.
> > > > > > >> > > >
> > > > > > >> > > > 1.Choose the current group size, which means we never
> > > > interrupt
> > > > > > the
> > > > > > >> > > > consumer group until
> > > > > > >> > > > it transits to PREPARE_REBALANCE. And we keep track of
> how
> > > > many
> > > > > > join
> > > > > > >> > > group
> > > > > > >> > > > requests
> > > > > > >> > > > we have seen so far during PREPARE_REBALANCE. After
> reaching
> > > > the
> > > > > > >> > consumer
> > > > > > >> > > > cap,
> > > > > > >> > > > we start to inform over provisioned consumers that you
> > > should
> > > > send
> > > > > > >> > > > LeaveGroupRequest and
> > > > > > >> > > > fail yourself. Or with what Mayuresh proposed in
> KIP-345, we
> > > > could
> > > > > > >> mark
> > > > > > >> > > > extra members
> > > > > > >> > > > as hot backup and rebalance without them.
> > > > > > >> > > >
> > > > > > >> > > > 2.Choose the `consumer.group.max.size`. I feel
> incremental
> > > > > > >> rebalancing
> > > > > > >> > > > (you proposed) could be of help here.
> > > > > > >> > > > When a new cap is enforced, leader should be notified.
> If
> > > the
> > > > > > >> current
> > > > > > >> > > > group size is already over limit, leader
> > > > > > >> > > > shall trigger a trivial rebalance to shuffle some topic
> > > > partitions
> > > > > > >> and
> > > > > > >> > > let
> > > > > > >> > > > a subset of consumers prepare the ownership
> > > > > > >> > > > transition. Until they are ready, we trigger a real
> > > rebalance
> > > > to
> > > > > > >> remove
> > > > > > >> > > > over-provisioned consumers. It is pretty much
> > > > > > >> > > > equivalent to `how do we scale down the consumer group
> > > without
> > > > > > >> > > > interrupting the current processing`.
> > > > > > >> > > >
> > > > > > >> > > > I personally feel inclined to 2 because we could kill
> two
> > > > birds
> > > > > > with
> > > > > > >> > one
> > > > > > >> > > > stone in a generic way. What do you think?
> > > > > > >> > > >
> > > > > > >> > > > Boyang
> > > > > > >> > > > ________________________________
> > > > > > >> > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > >> > > > Sent: Monday, December 3, 2018 8:35 PM
> > > > > > >> > > > To: dev@kafka.apache.org
> > > > > > >> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size
> to
> > > cap
> > > > > > member
> > > > > > >> > > > metadata growth
> > > > > > >> > > >
> > > > > > >> > > > Hi Jason,
> > > > > > >> > > >
> > > > > > >> > > > > 2. Do you think we should make this a dynamic config?
> > > > > > >> > > > I'm not sure. Looking at the config from the
> perspective of
> > > a
> > > > > > >> > > prescriptive
> > > > > > >> > > > config, we may get away with not updating it
> dynamically.
> > > > > > >> > > > But in my opinion, it always makes sense to have a
> config be
> > > > > > >> > dynamically
> > > > > > >> > > > configurable. As long as we limit it to being a
> cluster-wide
> > > > > > >> config, we
> > > > > > >> > > > should be fine.
> > > > > > >> > > >
> > > > > > >> > > > > 1. I think it would be helpful to clarify the details
> on
> > > > how the
> > > > > > >> > > > coordinator will shrink the group. It will need to
> choose
> > > > which
> > > > > > >> members
> > > > > > >> > > to
> > > > > > >> > > > remove. Are we going to give current members an
> opportunity
> > > to
> > > > > > >> commit
> > > > > > >> > > > offsets before kicking them from the group?
> > > > > > >> > > >
> > > > > > >> > > > This turns out to be somewhat tricky. I think that we
> may
> > > not
> > > > be
> > > > > > >> able
> > > > > > >> > to
> > > > > > >> > > > guarantee that consumers don't process a message twice.
> > > > > > >> > > > My initial approach was to do as much as we could to let
> > > > consumers
> > > > > > >> > commit
> > > > > > >> > > > offsets.
> > > > > > >> > > >
> > > > > > >> > > > I was thinking that we mark a group to be shrunk, we
> could
> > > > keep a
> > > > > > >> map
> > > > > > >> > of
> > > > > > >> > > > consumer_id->boolean indicating whether they have
> committed
> > > > > > >> offsets. I
> > > > > > >> > > then
> > > > > > >> > > > thought we could delay the rebalance until every
> consumer
> > > > commits
> > > > > > >> (or
> > > > > > >> > > some
> > > > > > >> > > > time passes).
> > > > > > >> > > > In the meantime, we would block all incoming fetch
> calls (by
> > > > > > either
> > > > > > >> > > > returning empty records or a retriable error) and we
> would
> > > > > > continue
> > > > > > >> to
> > > > > > >> > > > accept offset commits (even twice for a single consumer)
> > > > > > >> > > >
> > > > > > >> > > > I see two problems with this approach:
> > > > > > >> > > > * We have async offset commits, which implies that we
> can
> > > > receive
> > > > > > >> fetch
> > > > > > >> > > > requests before the offset commit req has been handled.
> i.e
> > > > > > consmer
> > > > > > >> > sends
> > > > > > >> > > > fetchReq A, offsetCommit B, fetchReq C - we may receive
> > > A,C,B
> > > > in
> > > > > > the
> > > > > > >> > > > broker. Meaning we could have saved the offsets for B
> but
> > > > > > rebalance
> > > > > > >> > > before
> > > > > > >> > > > the offsetCommit for the offsets processed in C come in.
> > > > > > >> > > > * KIP-392 Allow consumers to fetch from closest replica
> > > > > > >> > > > <
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > > > > > >> > > > >
> > > > > > >> > > > would
> > > > > > >> > > > make it significantly harder to block poll() calls on
> > > > consumers
> > > > > > >> whose
> > > > > > >> > > > groups are being shrunk. Even if we implemented a
> solution,
> > > > the
> > > > > > same
> > > > > > >> > race
> > > > > > >> > > > condition noted above seems to apply and probably others
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > Given those constraints, I think that we can simply
> mark the
> > > > group
> > > > > > >> as
> > > > > > >> > > > `PreparingRebalance` with a rebalanceTimeout of the
> server
> > > > > > setting `
> > > > > > >> > > > group.max.session.timeout.ms`. That's a bit long by
> default
> > > > (5
> > > > > > >> > minutes)
> > > > > > >> > > > but
> > > > > > >> > > > I can't seem to come up with a better alternative
> > > > > > >> > > >
> > > > > > >> > > > I'm interested in hearing your thoughts.
> > > > > > >> > > >
> > > > > > >> > > > Thanks,
> > > > > > >> > > > Stanislav
> > > > > > >> > > >
> > > > > > >> > > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <
> > > > > > jason@confluent.io
> > > > > > >> >
> > > > > > >> > > > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Hey Stanislav,
> > > > > > >> > > > >
> > > > > > >> > > > > What do you think about the use case I mentioned in my
> > > > previous
> > > > > > >> reply
> > > > > > >> > > > about
> > > > > > >> > > > > > a more resilient self-service Kafka? I believe the
> > > benefit
> > > > > > >> there is
> > > > > > >> > > > > bigger.
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > > I see this config as analogous to the open file limit.
> > > > Probably
> > > > > > >> this
> > > > > > >> > > > limit
> > > > > > >> > > > > was intended to be prescriptive at some point about
> what
> > > was
> > > > > > >> deemed a
> > > > > > >> > > > > reasonable number of open files for an application.
> But
> > > > mostly
> > > > > > >> people
> > > > > > >> > > > treat
> > > > > > >> > > > > it as an annoyance which they have to work around. If
> it
> > > > happens
> > > > > > >> to
> > > > > > >> > be
> > > > > > >> > > > hit,
> > > > > > >> > > > > usually you just increase it because it is not tied
> to an
> > > > actual
> > > > > > >> > > resource
> > > > > > >> > > > > constraint. However, occasionally hitting the limit
> does
> > > > > > indicate
> > > > > > >> an
> > > > > > >> > > > > application bug such as a leak, so I wouldn't say it
> is
> > > > useless.
> > > > > > >> > > > Similarly,
> > > > > > >> > > > > the issue in KAFKA-7610 was a consumer leak and having
> > > this
> > > > > > limit
> > > > > > >> > would
> > > > > > >> > > > > have allowed the problem to be detected before it
> impacted
> > > > the
> > > > > > >> > cluster.
> > > > > > >> > > > To
> > > > > > >> > > > > me, that's the main benefit. It's possible that it
> could
> > > be
> > > > used
> > > > > > >> > > > > prescriptively to prevent poor usage of groups, but
> like
> > > the
> > > > > > open
> > > > > > >> > file
> > > > > > >> > > > > limit, I suspect administrators will just set it large
> > > > enough
> > > > > > that
> > > > > > >> > > users
> > > > > > >> > > > > are unlikely to complain.
> > > > > > >> > > > >
> > > > > > >> > > > > Anyway, just a couple additional questions:
> > > > > > >> > > > >
> > > > > > >> > > > > 1. I think it would be helpful to clarify the details
> on
> > > > how the
> > > > > > >> > > > > coordinator will shrink the group. It will need to
> choose
> > > > which
> > > > > > >> > members
> > > > > > >> > > > to
> > > > > > >> > > > > remove. Are we going to give current members an
> > > opportunity
> > > > to
> > > > > > >> commit
> > > > > > >> > > > > offsets before kicking them from the group?
> > > > > > >> > > > >
> > > > > > >> > > > > 2. Do you think we should make this a dynamic config?
> > > > > > >> > > > >
> > > > > > >> > > > > Thanks,
> > > > > > >> > > > > Jason
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > > > > > >> > > > > stanislav@confluent.io>
> > > > > > >> > > > > wrote:
> > > > > > >> > > > >
> > > > > > >> > > > > > Hi Jason,
> > > > > > >> > > > > >
> > > > > > >> > > > > > You raise some very valid points.
> > > > > > >> > > > > >
> > > > > > >> > > > > > > The benefit of this KIP is probably limited to
> > > > preventing
> > > > > > >> > "runaway"
> > > > > > >> > > > > > consumer groups due to leaks or some other
> application
> > > bug
> > > > > > >> > > > > > What do you think about the use case I mentioned in
> my
> > > > > > previous
> > > > > > >> > reply
> > > > > > >> > > > > about
> > > > > > >> > > > > > a more resilient self-service Kafka? I believe the
> > > benefit
> > > > > > >> there is
> > > > > > >> > > > > bigger
> > > > > > >> > > > > >
> > > > > > >> > > > > > * Default value
> > > > > > >> > > > > > You're right, we probably do need to be
> conservative.
> > > Big
> > > > > > >> consumer
> > > > > > >> > > > groups
> > > > > > >> > > > > > are considered an anti-pattern and my goal was to
> also
> > > > hint at
> > > > > > >> this
> > > > > > >> > > > > through
> > > > > > >> > > > > > the config's default. Regardless, it is better to
> not
> > > > have the
> > > > > > >> > > > potential
> > > > > > >> > > > > to
> > > > > > >> > > > > > break applications with an upgrade.
> > > > > > >> > > > > > Choosing between the default of something big like
> 5000
> > > > or an
> > > > > > >> > opt-in
> > > > > > >> > > > > > option, I think we should go with the *disabled
> default
> > > > > > option*
> > > > > > >> > > (-1).
> > > > > > >> > > > > > The only benefit we would get from a big default of
> 5000
> > > > is
> > > > > > >> default
> > > > > > >> > > > > > protection against buggy/malicious applications
> that hit
> > > > the
> > > > > > >> > > KAFKA-7610
> > > > > > >> > > > > > issue.
> > > > > > >> > > > > > While this KIP was spawned from that issue, I
> believe
> > > its
> > > > > > value
> > > > > > >> is
> > > > > > >> > > > > enabling
> > > > > > >> > > > > > the possibility of protection and helping move
> towards a
> > > > more
> > > > > > >> > > > > self-service
> > > > > > >> > > > > > Kafka. I also think that a default value of 5000
> might
> > > be
> > > > > > >> > misleading
> > > > > > >> > > to
> > > > > > >> > > > > > users and lead them to think that big consumer
> groups (>
> > > > 250)
> > > > > > >> are a
> > > > > > >> > > > good
> > > > > > >> > > > > > thing.
> > > > > > >> > > > > >
> > > > > > >> > > > > > The good news is that KAFKA-7610 should be fully
> > > resolved
> > > > and
> > > > > > >> the
> > > > > > >> > > > > rebalance
> > > > > > >> > > > > > protocol should, in general, be more solid after the
> > > > planned
> > > > > > >> > > > improvements
> > > > > > >> > > > > > in KIP-345 and KIP-394.
> > > > > > >> > > > > >
> > > > > > >> > > > > > * Handling bigger groups during upgrade
> > > > > > >> > > > > > I now see that we store the state of consumer
> groups in
> > > > the
> > > > > > log
> > > > > > >> and
> > > > > > >> > > > why a
> > > > > > >> > > > > > rebalance isn't expected during a rolling upgrade.
> > > > > > >> > > > > > Since we're going with the default value of the
> max.size
> > > > being
> > > > > > >> > > > disabled,
> > > > > > >> > > > > I
> > > > > > >> > > > > > believe we can afford to be more strict here.
> > > > > > >> > > > > > During state reloading of a new Coordinator with a
> > > defined
> > > > > > >> > > > max.group.size
> > > > > > >> > > > > > config, I believe we should *force* rebalances for
> > > groups
> > > > that
> > > > > > >> > exceed
> > > > > > >> > > > the
> > > > > > >> > > > > > configured size. Then, only some consumers will be
> able
> > > to
> > > > > > join
> > > > > > >> and
> > > > > > >> > > the
> > > > > > >> > > > > max
> > > > > > >> > > > > > size invariant will be satisfied.
> > > > > > >> > > > > >
> > > > > > >> > > > > > I updated the KIP with a migration plan, rejected
> > > > alternatives
> > > > > > >> and
> > > > > > >> > > the
> > > > > > >> > > > > new
> > > > > > >> > > > > > default value.
> > > > > > >> > > > > >
> > > > > > >> > > > > > Thanks,
> > > > > > >> > > > > > Stanislav
> > > > > > >> > > > > >
> > > > > > >> > > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <
> > > > > > >> > jason@confluent.io>
> > > > > > >> > > > > > wrote:
> > > > > > >> > > > > >
> > > > > > >> > > > > > > Hey Stanislav,
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Clients will then find that coordinator
> > > > > > >> > > > > > > > and send `joinGroup` on it, effectively
> rebuilding
> > > the
> > > > > > >> group,
> > > > > > >> > > since
> > > > > > >> > > > > the
> > > > > > >> > > > > > > > cache of active consumers is not stored outside
> the
> > > > > > >> > Coordinator's
> > > > > > >> > > > > > memory.
> > > > > > >> > > > > > > > (please do say if that is incorrect)
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Groups do not typically rebalance after a
> coordinator
> > > > > > change.
> > > > > > >> You
> > > > > > >> > > > could
> > > > > > >> > > > > > > potentially force a rebalance if the group is too
> big
> > > > and
> > > > > > kick
> > > > > > >> > out
> > > > > > >> > > > the
> > > > > > >> > > > > > > slowest members or something. A more graceful
> solution
> > > > is
> > > > > > >> > probably
> > > > > > >> > > to
> > > > > > >> > > > > > just
> > > > > > >> > > > > > > accept the current size and prevent it from
> getting
> > > > bigger.
> > > > > > We
> > > > > > >> > > could
> > > > > > >> > > > > log
> > > > > > >> > > > > > a
> > > > > > >> > > > > > > warning potentially.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > My thinking is that we should abstract away from
> > > > conserving
> > > > > > >> > > resources
> > > > > > >> > > > > and
> > > > > > >> > > > > > > > focus on giving control to the broker. The issue
> > > that
> > > > > > >> spawned
> > > > > > >> > > this
> > > > > > >> > > > > KIP
> > > > > > >> > > > > > > was
> > > > > > >> > > > > > > > a memory problem but I feel this change is
> useful
> > > in a
> > > > > > more
> > > > > > >> > > general
> > > > > > >> > > > > > way.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > So you probably already know why I'm asking about
> > > this.
> > > > For
> > > > > > >> > > consumer
> > > > > > >> > > > > > groups
> > > > > > >> > > > > > > anyway, resource usage would typically be
> proportional
> > > > to
> > > > > > the
> > > > > > >> > > number
> > > > > > >> > > > of
> > > > > > >> > > > > > > partitions that a group is reading from and not
> the
> > > > number
> > > > > > of
> > > > > > >> > > > members.
> > > > > > >> > > > > > For
> > > > > > >> > > > > > > example, consider the memory use in the offsets
> cache.
> > > > The
> > > > > > >> > benefit
> > > > > > >> > > of
> > > > > > >> > > > > > this
> > > > > > >> > > > > > > KIP is probably limited to preventing "runaway"
> > > consumer
> > > > > > >> groups
> > > > > > >> > due
> > > > > > >> > > > to
> > > > > > >> > > > > > > leaks or some other application bug. That still
> seems
> > > > useful
> > > > > > >> > > though.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > I completely agree with this and I *ask everybody
> to
> > > > chime
> > > > > > in
> > > > > > >> > with
> > > > > > >> > > > > > opinions
> > > > > > >> > > > > > > > on a sensible default value*.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > I think we would have to be very conservative. The
> > > group
> > > > > > >> protocol
> > > > > > >> > > is
> > > > > > >> > > > > > > generic in some sense, so there may be use cases
> we
> > > > don't
> > > > > > >> know of
> > > > > > >> > > > where
> > > > > > >> > > > > > > larger groups are reasonable. Probably we should
> make
> > > > this
> > > > > > an
> > > > > > >> > > opt-in
> > > > > > >> > > > > > > feature so that we do not risk breaking anyone's
> > > > application
> > > > > > >> > after
> > > > > > >> > > an
> > > > > > >> > > > > > > upgrade. Either that, or use a very high default
> like
> > > > 5,000.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Thanks,
> > > > > > >> > > > > > > Jason
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav
> Kozlovski <
> > > > > > >> > > > > > > stanislav@confluent.io>
> > > > > > >> > > > > > > wrote:
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > > Hey Jason and Boyang, those were important
> comments
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > > One suggestion I have is that it would be
> helpful
> > > > to put
> > > > > > >> your
> > > > > > >> > > > > > reasoning
> > > > > > >> > > > > > > > on deciding the current default value. For
> example,
> > > in
> > > > > > >> certain
> > > > > > >> > > use
> > > > > > >> > > > > > cases
> > > > > > >> > > > > > > at
> > > > > > >> > > > > > > > Pinterest we are very likely to have more
> consumers
> > > > than
> > > > > > 250
> > > > > > >> > when
> > > > > > >> > > > we
> > > > > > >> > > > > > > > configure 8 stream instances with 32 threads.
> > > > > > >> > > > > > > > > For the effectiveness of this KIP, we should
> > > > encourage
> > > > > > >> people
> > > > > > >> > > to
> > > > > > >> > > > > > > discuss
> > > > > > >> > > > > > > > their opinions on the default setting and
> ideally
> > > > reach a
> > > > > > >> > > > consensus.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > I completely agree with this and I *ask
> everybody to
> > > > chime
> > > > > > >> in
> > > > > > >> > > with
> > > > > > >> > > > > > > opinions
> > > > > > >> > > > > > > > on a sensible default value*.
> > > > > > >> > > > > > > > My thought process was that in the current model
> > > > > > rebalances
> > > > > > >> in
> > > > > > >> > > > large
> > > > > > >> > > > > > > groups
> > > > > > >> > > > > > > > are more costly. I imagine most use cases in
> most
> > > > Kafka
> > > > > > >> users
> > > > > > >> > do
> > > > > > >> > > > not
> > > > > > >> > > > > > > > require more than 250 consumers.
> > > > > > >> > > > > > > > Boyang, you say that you are "likely to have...
> when
> > > > > > we..."
> > > > > > >> -
> > > > > > >> > do
> > > > > > >> > > > you
> > > > > > >> > > > > > have
> > > > > > >> > > > > > > > systems running with so many consumers in a
> group or
> > > > are
> > > > > > you
> > > > > > >> > > > planning
> > > > > > >> > > > > > > to? I
> > > > > > >> > > > > > > > guess what I'm asking is whether this has been
> > > tested
> > > > in
> > > > > > >> > > production
> > > > > > >> > > > > > with
> > > > > > >> > > > > > > > the current rebalance model (ignoring KIP-345)
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > >  Can you clarify the compatibility impact
> here?
> > > What
> > > > > > >> > > > > > > > > will happen to groups that are already larger
> than
> > > > the
> > > > > > max
> > > > > > >> > > size?
> > > > > > >> > > > > > > > This is a very important question.
> > > > > > >> > > > > > > > From my current understanding, when a
> coordinator
> > > > broker
> > > > > > >> gets
> > > > > > >> > > shut
> > > > > > >> > > > > > > > down during a cluster rolling upgrade, a replica
> > > will
> > > > take
> > > > > > >> > > > leadership
> > > > > > >> > > > > > of
> > > > > > >> > > > > > > > the `__offset_commits` partition. Clients will
> then
> > > > find
> > > > > > >> that
> > > > > > >> > > > > > coordinator
> > > > > > >> > > > > > > > and send `joinGroup` on it, effectively
> rebuilding
> > > the
> > > > > > >> group,
> > > > > > >> > > since
> > > > > > >> > > > > the
> > > > > > >> > > > > > > > cache of active consumers is not stored outside
> the
> > > > > > >> > Coordinator's
> > > > > > >> > > > > > memory.
> > > > > > >> > > > > > > > (please do say if that is incorrect)
> > > > > > >> > > > > > > > Then, I believe that working as if this is a new
> > > > group is
> > > > > > a
> > > > > > >> > > > > reasonable
> > > > > > >> > > > > > > > approach. Namely, fail joinGroups when the
> max.size
> > > is
> > > > > > >> > exceeded.
> > > > > > >> > > > > > > > What do you guys think about this? (I'll update
> the
> > > > KIP
> > > > > > >> after
> > > > > > >> > we
> > > > > > >> > > > > settle
> > > > > > >> > > > > > > on
> > > > > > >> > > > > > > > a solution)
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > >  Also, just to be clear, the resource we are
> > > trying
> > > > to
> > > > > > >> > conserve
> > > > > > >> > > > > here
> > > > > > >> > > > > > is
> > > > > > >> > > > > > > > what? Memory?
> > > > > > >> > > > > > > > My thinking is that we should abstract away from
> > > > > > conserving
> > > > > > >> > > > resources
> > > > > > >> > > > > > and
> > > > > > >> > > > > > > > focus on giving control to the broker. The issue
> > > that
> > > > > > >> spawned
> > > > > > >> > > this
> > > > > > >> > > > > KIP
> > > > > > >> > > > > > > was
> > > > > > >> > > > > > > > a memory problem but I feel this change is
> useful
> > > in a
> > > > > > more
> > > > > > >> > > general
> > > > > > >> > > > > > way.
> > > > > > >> > > > > > > It
> > > > > > >> > > > > > > > limits the control clients have on the cluster
> and
> > > > helps
> > > > > > >> Kafka
> > > > > > >> > > > > become a
> > > > > > >> > > > > > > > more self-serving system. Admin/Ops teams can
> better
> > > > > > control
> > > > > > >> > the
> > > > > > >> > > > > impact
> > > > > > >> > > > > > > > application developers can have on a Kafka
> cluster
> > > > with
> > > > > > this
> > > > > > >> > > change
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Best,
> > > > > > >> > > > > > > > Stanislav
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson
> <
> > > > > > >> > > > jason@confluent.io>
> > > > > > >> > > > > > > > wrote:
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > > Hi Stanislav,
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Thanks for the KIP. Can you clarify the
> > > > compatibility
> > > > > > >> impact
> > > > > > >> > > > here?
> > > > > > >> > > > > > What
> > > > > > >> > > > > > > > > will happen to groups that are already larger
> than
> > > > the
> > > > > > max
> > > > > > >> > > size?
> > > > > > >> > > > > > Also,
> > > > > > >> > > > > > > > just
> > > > > > >> > > > > > > > > to be clear, the resource we are trying to
> > > conserve
> > > > here
> > > > > > >> is
> > > > > > >> > > what?
> > > > > > >> > > > > > > Memory?
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > -Jason
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
> > > > > > >> > > bchen11@outlook.com
> > > > > > >> > > > >
> > > > > > >> > > > > > > wrote:
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > > Thanks Stanislav for the update! One
> suggestion
> > > I
> > > > have
> > > > > > >> is
> > > > > > >> > > that
> > > > > > >> > > > it
> > > > > > >> > > > > > > would
> > > > > > >> > > > > > > > > be
> > > > > > >> > > > > > > > > > helpful to put your
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > reasoning on deciding the current default
> value.
> > > > For
> > > > > > >> > example,
> > > > > > >> > > > in
> > > > > > >> > > > > > > > certain
> > > > > > >> > > > > > > > > > use cases at Pinterest we are very likely
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > to have more consumers than 250 when we
> > > configure
> > > > 8
> > > > > > >> stream
> > > > > > >> > > > > > instances
> > > > > > >> > > > > > > > with
> > > > > > >> > > > > > > > > > 32 threads.
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > For the effectiveness of this KIP, we should
> > > > encourage
> > > > > > >> > people
> > > > > > >> > > > to
> > > > > > >> > > > > > > > discuss
> > > > > > >> > > > > > > > > > their opinions on the default setting and
> > > ideally
> > > > > > reach
> > > > > > >> a
> > > > > > >> > > > > > consensus.
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > Best,
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > Boyang
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > ________________________________
> > > > > > >> > > > > > > > > > From: Stanislav Kozlovski <
> > > stanislav@confluent.io
> > > > >
> > > > > > >> > > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > > > >> > > > > > > > > > To: dev@kafka.apache.org
> > > > > > >> > > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce
> > > > group.max.size
> > > > > > >> to
> > > > > > >> > cap
> > > > > > >> > > > > > member
> > > > > > >> > > > > > > > > > metadata growth
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > Hey everybody,
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > It's been a week since this KIP and not much
> > > > > > discussion
> > > > > > >> has
> > > > > > >> > > > been
> > > > > > >> > > > > > > made.
> > > > > > >> > > > > > > > > > I assume that this is a straight forward
> change
> > > > and I
> > > > > > >> will
> > > > > > >> > > > open a
> > > > > > >> > > > > > > > voting
> > > > > > >> > > > > > > > > > thread in the next couple of days if nobody
> has
> > > > > > >> anything to
> > > > > > >> > > > > > suggest.
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > Best,
> > > > > > >> > > > > > > > > > Stanislav
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav
> > > > Kozlovski <
> > > > > > >> > > > > > > > > > stanislav@confluent.io>
> > > > > > >> > > > > > > > > > wrote:
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > > Greetings everybody,
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > I have enriched the KIP a bit with a
> bigger
> > > > > > Motivation
> > > > > > >> > > > section
> > > > > > >> > > > > > and
> > > > > > >> > > > > > > > also
> > > > > > >> > > > > > > > > > > renamed it.
> > > > > > >> > > > > > > > > > > KIP:
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > I'm looking forward to discussions around
> it.
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > Best,
> > > > > > >> > > > > > > > > > > Stanislav
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav
> > > > Kozlovski
> > > > > > <
> > > > > > >> > > > > > > > > > > stanislav@confluent.io> wrote:
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > >> Hey there everybody,
> > > > > > >> > > > > > > > > > >>
> > > > > > >> > > > > > > > > > >> Thanks for the introduction Boyang. I
> > > > appreciate
> > > > > > the
> > > > > > >> > > effort
> > > > > > >> > > > > you
> > > > > > >> > > > > > > are
> > > > > > >> > > > > > > > > > >> putting into improving consumer behavior
> in
> > > > Kafka.
> > > > > > >> > > > > > > > > > >>
> > > > > > >> > > > > > > > > > >> @Matt
> > > > > > >> > > > > > > > > > >> I also believe the default value is
> high. In
> > > my
> > > > > > >> opinion,
> > > > > > >> > > we
> > > > > > >> > > > > > should
> > > > > > >> > > > > > > > aim
> > > > > > >> > > > > > > > > > to
> > > > > > >> > > > > > > > > > >> a default cap around 250. This is
> because in
> > > > the
> > > > > > >> current
> > > > > > >> > > > model
> > > > > > >> > > > > > any
> > > > > > >> > > > > > > > > > consumer
> > > > > > >> > > > > > > > > > >> rebalance is disrupting to every
> consumer.
> > > The
> > > > > > bigger
> > > > > > >> > the
> > > > > > >> > > > > group,
> > > > > > >> > > > > > > the
> > > > > > >> > > > > > > > > > longer
> > > > > > >> > > > > > > > > > >> this period of disruption.
> > > > > > >> > > > > > > > > > >>
> > > > > > >> > > > > > > > > > >> If you have such a large consumer group,
> > > > chances
> > > > > > are
> > > > > > >> > that
> > > > > > >> > > > your
> > > > > > >> > > > > > > > > > >> client-side logic could be structured
> better
> > > > and
> > > > > > that
> > > > > > >> > you
> > > > > > >> > > > are
> > > > > > >> > > > > > not
> > > > > > >> > > > > > > > > using
> > > > > > >> > > > > > > > > > the
> > > > > > >> > > > > > > > > > >> high number of consumers to achieve high
> > > > > > throughput.
> > > > > > >> > > > > > > > > > >> 250 can still be considered of a high
> upper
> > > > bound,
> > > > > > I
> > > > > > >> > > believe
> > > > > > >> > > > > in
> > > > > > >> > > > > > > > > practice
> > > > > > >> > > > > > > > > > >> users should aim to not go over 100
> consumers
> > > > per
> > > > > > >> > consumer
> > > > > > >> > > > > > group.
> > > > > > >> > > > > > > > > > >>
> > > > > > >> > > > > > > > > > >> In regards to the cap being
> > > global/per-broker,
> > > > I
> > > > > > >> think
> > > > > > >> > > that
> > > > > > >> > > > we
> > > > > > >> > > > > > > > should
> > > > > > >> > > > > > > > > > >> consider whether we want it to be global
> or
> > > > > > >> *per-topic*.
> > > > > > >> > > For
> > > > > > >> > > > > the
> > > > > > >> > > > > > > > time
> > > > > > >> > > > > > > > > > >> being, I believe that having it per-topic
> > > with
> > > > a
> > > > > > >> global
> > > > > > >> > > > > default
> > > > > > >> > > > > > > > might
> > > > > > >> > > > > > > > > be
> > > > > > >> > > > > > > > > > >> the best situation. Having it global only
> > > > seems a
> > > > > > bit
> > > > > > >> > > > > > restricting
> > > > > > >> > > > > > > to
> > > > > > >> > > > > > > > > me
> > > > > > >> > > > > > > > > > and
> > > > > > >> > > > > > > > > > >> it never hurts to support more
> fine-grained
> > > > > > >> > > configurability
> > > > > > >> > > > > > (given
> > > > > > >> > > > > > > > > it's
> > > > > > >> > > > > > > > > > the
> > > > > > >> > > > > > > > > > >> same config, not a new one being
> introduced).
> > > > > > >> > > > > > > > > > >>
> > > > > > >> > > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang
> Chen
> > > <
> > > > > > >> > > > > > bchen11@outlook.com
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > > > wrote:
> > > > > > >> > > > > > > > > > >>
> > > > > > >> > > > > > > > > > >>> Thanks Matt for the suggestion! I'm
> still
> > > > open to
> > > > > > >> any
> > > > > > >> > > > > > suggestion
> > > > > > >> > > > > > > to
> > > > > > >> > > > > > > > > > >>> change the default value. Meanwhile I
> just
> > > > want to
> > > > > > >> > point
> > > > > > >> > > > out
> > > > > > >> > > > > > that
> > > > > > >> > > > > > > > > this
> > > > > > >> > > > > > > > > > >>> value is a just last line of defense,
> not a
> > > > real
> > > > > > >> > scenario
> > > > > > >> > > > we
> > > > > > >> > > > > > > would
> > > > > > >> > > > > > > > > > expect.
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>> In the meanwhile, I discussed with
> Stanislav
> > > > and
> > > > > > he
> > > > > > >> > would
> > > > > > >> > > > be
> > > > > > >> > > > > > > > driving
> > > > > > >> > > > > > > > > > the
> > > > > > >> > > > > > > > > > >>> 389 effort from now on. Stanislav
> proposed
> > > the
> > > > > > idea
> > > > > > >> in
> > > > > > >> > > the
> > > > > > >> > > > > > first
> > > > > > >> > > > > > > > > place
> > > > > > >> > > > > > > > > > and
> > > > > > >> > > > > > > > > > >>> had already come up a draft design,
> while I
> > > > will
> > > > > > >> keep
> > > > > > >> > > > > focusing
> > > > > > >> > > > > > on
> > > > > > >> > > > > > > > > > KIP-345
> > > > > > >> > > > > > > > > > >>> effort to ensure solving the edge case
> > > > described
> > > > > > in
> > > > > > >> the
> > > > > > >> > > > JIRA<
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > >
> > >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > > > > >> > > > > > > > > > >.
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>> Thank you Stanislav for making this
> happen!
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>> Boyang
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>> ________________________________
> > > > > > >> > > > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > > > >> > > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24
> AM
> > > > > > >> > > > > > > > > > >>> To: dev@kafka.apache.org
> > > > > > >> > > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce
> > > > > > >> group.max.size
> > > > > > >> > to
> > > > > > >> > > > cap
> > > > > > >> > > > > > > > member
> > > > > > >> > > > > > > > > > >>> metadata growth
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>> Thanks for the KIP.
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>> Will this cap be a global cap across the
> > > > entire
> > > > > > >> cluster
> > > > > > >> > > or
> > > > > > >> > > > > per
> > > > > > >> > > > > > > > > broker?
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>> Either way the default value seems a bit
> > > high
> > > > to
> > > > > > me,
> > > > > > >> > but
> > > > > > >> > > > that
> > > > > > >> > > > > > > could
> > > > > > >> > > > > > > > > > just
> > > > > > >> > > > > > > > > > >>> be
> > > > > > >> > > > > > > > > > >>> from my own usage patterns. I'd have
> > > probably
> > > > > > >> started
> > > > > > >> > > with
> > > > > > >> > > > > 500
> > > > > > >> > > > > > or
> > > > > > >> > > > > > > > 1k
> > > > > > >> > > > > > > > > > but
> > > > > > >> > > > > > > > > > >>> could be easily convinced that's wrong.
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>> Thanks,
> > > > > > >> > > > > > > > > > >>> Matt
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang
> Chen
> > > <
> > > > > > >> > > > > > bchen11@outlook.com
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > > > wrote:
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>> > Hey folks,
> > > > > > >> > > > > > > > > > >>> >
> > > > > > >> > > > > > > > > > >>> >
> > > > > > >> > > > > > > > > > >>> > I would like to start a discussion on
> > > > KIP-389:
> > > > > > >> > > > > > > > > > >>> >
> > > > > > >> > > > > > > > > > >>> >
> > > > > > >> > > > > > > > > > >>> >
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > > > >> > > > > > > > > > >>> >
> > > > > > >> > > > > > > > > > >>> >
> > > > > > >> > > > > > > > > > >>> > This is a pretty simple change to cap
> the
> > > > > > consumer
> > > > > > >> > > group
> > > > > > >> > > > > size
> > > > > > >> > > > > > > for
> > > > > > >> > > > > > > > > > >>> broker
> > > > > > >> > > > > > > > > > >>> > stability. Give me your valuable
> feedback
> > > > when
> > > > > > you
> > > > > > >> > got
> > > > > > >> > > > > time.
> > > > > > >> > > > > > > > > > >>> >
> > > > > > >> > > > > > > > > > >>> >
> > > > > > >> > > > > > > > > > >>> > Thank you!
> > > > > > >> > > > > > > > > > >>> >
> > > > > > >> > > > > > > > > > >>>
> > > > > > >> > > > > > > > > > >>
> > > > > > >> > > > > > > > > > >>
> > > > > > >> > > > > > > > > > >> --
> > > > > > >> > > > > > > > > > >> Best,
> > > > > > >> > > > > > > > > > >> Stanislav
> > > > > > >> > > > > > > > > > >>
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > --
> > > > > > >> > > > > > > > > > > Best,
> > > > > > >> > > > > > > > > > > Stanislav
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > --
> > > > > > >> > > > > > > > > > Best,
> > > > > > >> > > > > > > > > > Stanislav
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > --
> > > > > > >> > > > > > > > Best,
> > > > > > >> > > > > > > > Stanislav
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > > --
> > > > > > >> > > > > > Best,
> > > > > > >> > > > > > Stanislav
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > --
> > > > > > >> > > > Best,
> > > > > > >> > > > Stanislav
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > --
> > > > > > >> > > Best,
> > > > > > >> > > Stanislav
> > > > > > >> > >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > --
> > > > > > >> > Best,
> > > > > > >> > Stanislav
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > >
> > > >
> > > >
> > > > --
> > > > Gwen Shapira
> > > > Product Manager | Confluent
> > > > 650.450.2760 | @gwenshap
> > > > Follow us: Twitter | blog
> > > >
> > > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
> >
> > --
> > Best,
> > Stanislav
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Gwen Shapira <gw...@confluent.io>.
Thanks for the data-driven approach, Stanislav. I love it :)
And thank you for sharing your formula, Boyang. I totally agree that
rebalance latency will not grow linearly with the consumer group size.

My recommendation, considering what we know today:
1. Add the limit config, and set it to MAX_INT by default (effectively
unlimited, without a magic number like -1)
2. Document our thoughts - the concern about runaway groups,
Pinterest's 500 limit, Confluent's experience with a few thousand
consumers in a group, the conclusions from Stanislav's memory research
(Personally, I wouldn't want what is essentially a linked list that we
iterate to grow beyond 1M).

Most likely, 99% of the users won't need it and those who do will
have the right information to figure things out (or at least, they'll
know everything that we know).

WDYT?

On Wed, Jan 9, 2019 at 4:25 AM Stanislav Kozlovski
<st...@confluent.io> wrote:
>
> Hey everybody,
>
> I ran a quick benchmark and took some heap dumps to gauge how much memory
> each consumer in a group is using, all done locally.
> The setup was the following: 10 topics with 10 partitions each (100
> partitions total) and one consumer group with 10 members, then expanded to
> 20 members.
> Here are some notes of my findings in a public Google doc:
> https://docs.google.com/document/d/1Z4aY5qg8lU2uNXzdgp_30_oJ9_I9xNelPko6GIQYXYk/edit?usp=sharing
>
>
> On Mon, Jan 7, 2019 at 10:51 PM Boyang Chen <bc...@outlook.com> wrote:
>
> > Hey Stanislav,
> >
> > I think the time taken to rebalance is not linearly correlated with number
> > of consumers with our application. As for our current and future use cases,
> > the main concern for Pinterest is still on the broker memory not CPU,
> > because crashing server by one application could have cascading effect on
> > all jobs.
> > Do you want to drive a more detailed formula on how to compute the memory
> > consumption against number of consumers within the group?
> >
> > In the meantime, I'm pretty buying in the motivation of this KIP, so I
> > think the follow-up work is just refinement to make the new config easy to
> > use. We should be good
> > to vote IMO.
> >
> > Best,
> > Boyang
> > ________________________________
> > From: Stanislav Kozlovski <st...@confluent.io>
> > Sent: Monday, January 7, 2019 4:21 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > metadata growth
> >
> > Hey there,
> >
> > Per Gwen's comments, I slightly reworked the motivation section. Let me
> > know if it's any better now
> >
> > I completely agree that it would be best if we were to add a recommended
> > number to a typical consumer group size. There is a problem that timing the
> > CPU usage and rebalance times of consumer groups is tricky. We can update
> > the KIP with memory guidelines (e.g 1 consumer in a group uses X memory,
> > therefore 100 use Y).
> > I fear that the most useful recommendations though would be knowing the CPU
> > impact of large consumer groups and the rebalance times. That is,
> > unfortunately, tricky to test and measure.
> >
> > @Boyang, you had mentioned some numbers used in Pinterest. If available to
> > you, would you be comfortable sharing the number of consumers you are using
> > in a group and maybe the potential time it takes to rebalance it?
> >
> > I'd appreciate any anecdotes regarding consumer group sizes from the
> > community
> >
> > Best,
> > Stanislav
> >
> > On Thu, Jan 3, 2019 at 1:59 AM Boyang Chen <bc...@outlook.com> wrote:
> >
> > > Thanks Gwen for the suggestion! +1 on the guidance of defining
> > > group.max.size. I guess a sample formula would be:
> > > 2 * (# of brokers * average metadata cache size * 80%) / (# of consumer
> > > groups * size of a single member metadata)
> > >
> > > if we assumed non-skewed partition assignment and pretty fair consumer
> > > group consumption. The "2" is the 95 percentile of normal distribution
> > and
> > > 80% is just to buffer some memory capacity which are both open to
> > > discussion. This config should be useful for Kafka platform team to make
> > > sure one extreme large consumer group won't bring down the whole cluster.
> > >
> > > What do you think?
> > >
> > > Best,
> > > Boyang
> > >
> > > ________________________________
> > > From: Gwen Shapira <gw...@confluent.io>
> > > Sent: Thursday, January 3, 2019 2:59 AM
> > > To: dev
> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > metadata growth
> > >
> > > Sorry for joining the fun late, but I think the problem we are solving
> > > evolved a bit in the thread, and I'd like to have better understanding
> > > of the problem before voting :)
> > >
> > > Both KIP and discussion assert that large groups are a problem, but
> > > they are kinda inconsistent regarding why they are a problem and whose
> > > problem they are...
> > > 1. The KIP itself states that the main issue with large groups are
> > > long rebalance times. Per my understanding, this is mostly a problem
> > > for the application that consumes data, but not really a problem for
> > > the brokers themselves, so broker admins probably don't and shouldn't
> > > care about it. Also, my understanding is that this is a problem for
> > > consumer groups, but not necessarily a problem for other group types.
> > > 2. The discussion highlights the issue of "run away" groups that
> > > essentially create tons of members needlessly and use up lots of
> > > broker memory. This is something the broker admins will care about a
> > > lot. And is also a problem for every group that uses coordinators and
> > > not just consumers. And since the memory in question is the metadata
> > > cache, it probably has the largest impact on Kafka Streams
> > > applications, since they have lots of metadata.
> > >
> > > The solution proposed makes the most sense in the context of #2, so
> > > perhaps we should update the motivation section of the KIP to reflect
> > > that.
> > >
> > > The reason I'm probing here is that in my opinion we have to give our
> > > users some guidelines on what a reasonable limit is (otherwise, how
> > > will they know?). Calculating the impact of group-size on rebalance
> > > time in order to make good recommendations will take a significant
> > > effort. On the other hand, informing users regarding the memory
> > > footprint of a consumer in a group and using that to make a reasonable
> > > suggestion isn't hard.
> > >
> > > Gwen
> > >
> > >
> > > On Sun, Dec 30, 2018 at 12:51 PM Stanislav Kozlovski
> > > <st...@confluent.io> wrote:
> > > >
> > > > Thanks Boyang,
> > > >
> > > > If there aren't any more thoughts on the KIP I'll start a vote thread
> > in
> > > > the new year
> > > >
> > > > On Sat, Dec 29, 2018 at 12:58 AM Boyang Chen <bc...@outlook.com>
> > > wrote:
> > > >
> > > > > Yep Stanislav, that's what I'm proposing, and your explanation makes
> > > sense.
> > > > >
> > > > > Boyang
> > > > >
> > > > > ________________________________
> > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > Sent: Friday, December 28, 2018 7:59 PM
> > > > > To: dev@kafka.apache.org
> > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > > > metadata growth
> > > > >
> > > > > Hey there everybody, let's work on wrapping this discussion up.
> > > > >
> > > > > @Boyang, could you clarify what you mean by
> > > > > > One more question is whether you feel we should enforce group size
> > > cap
> > > > > statically or on runtime?
> > > > > Is that related to the option of enabling this config via the dynamic
> > > > > broker config feature?
> > > > >
> > > > > Regarding that - I feel it's useful to have and I also think it might
> > > not
> > > > > introduce additional complexity. As long as we handle the config
> > being
> > > > > changed midway through a rebalance (via using the old value) we
> > should
> > > be
> > > > > good to go.
> > > > >
> > > > > On Wed, Dec 12, 2018 at 4:12 PM Stanislav Kozlovski <
> > > > > stanislav@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hey Jason,
> > > > > >
> > > > > > Yes, that is what I meant by
> > > > > > > Given those constraints, I think that we can simply mark the
> > group
> > > as
> > > > > > `PreparingRebalance` with a rebalanceTimeout of the server setting
> > `
> > > > > > group.max.session.timeout.ms`. That's a bit long by default (5
> > > minutes)
> > > > > > but I can't seem to come up with a better alternative
> > > > > > So either the timeout or all members calling joinGroup, yes
> > > > > >
> > > > > >
> > > > > > On Tue, Dec 11, 2018 at 8:14 PM Boyang Chen <bc...@outlook.com>
> > > wrote:
> > > > > >
> > > > > >> Hey Jason,
> > > > > >>
> > > > > >> I think this is the correct understanding. One more question is
> > > whether
> > > > > >> you feel
> > > > > >> we should enforce group size cap statically or on runtime?
> > > > > >>
> > > > > >> Boyang
> > > > > >> ________________________________
> > > > > >> From: Jason Gustafson <ja...@confluent.io>
> > > > > >> Sent: Tuesday, December 11, 2018 3:24 AM
> > > > > >> To: dev
> > > > > >> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > member
> > > > > >> metadata growth
> > > > > >>
> > > > > >> Hey Stanislav,
> > > > > >>
> > > > > >> Just to clarify, I think what you're suggesting is something like
> > > this
> > > > > in
> > > > > >> order to gracefully shrink the group:
> > > > > >>
> > > > > >> 1. Transition the group to PREPARING_REBALANCE. No members are
> > > kicked
> > > > > out.
> > > > > >> 2. Continue to allow offset commits and heartbeats for all current
> > > > > >> members.
> > > > > >> 3. Allow the first n members that send JoinGroup to stay in the
> > > group,
> > > > > but
> > > > > >> wait for the JoinGroup (or session timeout) from all active
> > members
> > > > > before
> > > > > >> finishing the rebalance.
> > > > > >>
> > > > > >> So basically we try to give the current members an opportunity to
> > > finish
> > > > > >> work, but we prevent some of them from rejoining after the
> > rebalance
> > > > > >> completes. It sounds reasonable if I've understood correctly.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Jason
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <bc...@outlook.com>
> > > wrote:
> > > > > >>
> > > > > >> > Yep, LGTM on my side. Thanks Stanislav!
> > > > > >> > ________________________________
> > > > > >> > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > >> > Sent: Friday, December 7, 2018 8:51 PM
> > > > > >> > To: dev@kafka.apache.org
> > > > > >> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > > member
> > > > > >> > metadata growth
> > > > > >> >
> > > > > >> > Hi,
> > > > > >> >
> > > > > >> > We discussed this offline with Boyang and figured that it's best
> > > to
> > > > > not
> > > > > >> > wait on the Cooperative Rebalancing proposal. Our thinking is
> > > that we
> > > > > >> can
> > > > > >> > just force a rebalance from the broker, allowing consumers to
> > > commit
> > > > > >> > offsets if their rebalanceListener is configured correctly.
> > > > > >> > When rebalancing improvements are implemented, we assume that
> > they
> > > > > would
> > > > > >> > improve KIP-389's behavior as well as the normal rebalance
> > > scenarios
> > > > > >> >
> > > > > >> > On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <
> > bchen11@outlook.com>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Hey Stanislav,
> > > > > >> > >
> > > > > >> > > thanks for the question! `Trivial rebalance` means "we don't
> > > start
> > > > > >> > > reassignment right now, but you need to know it's coming soon
> > > > > >> > > and you should start preparation".
> > > > > >> > >
> > > > > >> > > An example KStream use case is that before actually starting
> > to
> > > > > shrink
> > > > > >> > the
> > > > > >> > > consumer group, we need to
> > > > > >> > > 1. partition the consumer group into two subgroups, where one
> > > will
> > > > > be
> > > > > >> > > offline soon and the other will keep serving;
> > > > > >> > > 2. make sure the states associated with near-future offline
> > > > > consumers
> > > > > >> are
> > > > > >> > > successfully replicated on the serving ones.
> > > > > >> > >
> > > > > >> > > As I have mentioned shrinking the consumer group is pretty
> > much
> > > > > >> > equivalent
> > > > > >> > > to group scaling down, so we could think of this
> > > > > >> > > as an add-on use case for cluster scaling. So my understanding
> > > is
> > > > > that
> > > > > >> > the
> > > > > >> > > KIP-389 could be sequenced within our cooperative rebalancing<
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
> > > > > >> > > >
> > > > > >> > > proposal.
> > > > > >> > >
> > > > > >> > > Let me know if this makes sense.
> > > > > >> > >
> > > > > >> > > Best,
> > > > > >> > > Boyang
> > > > > >> > > ________________________________
> > > > > >> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > >> > > Sent: Wednesday, December 5, 2018 5:52 PM
> > > > > >> > > To: dev@kafka.apache.org
> > > > > >> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > > member
> > > > > >> > > metadata growth
> > > > > >> > >
> > > > > >> > > Hey Boyang,
> > > > > >> > >
> > > > > >> > > I think we still need to take care of group shrinkage because
> > > even
> > > > > if
> > > > > >> > users
> > > > > >> > > change the config value we cannot guarantee that all consumer
> > > groups
> > > > > >> > would
> > > > > >> > > have been manually shrunk.
> > > > > >> > >
> > > > > >> > > Regarding 2., I agree that forcefully triggering a rebalance
> > > might
> > > > > be
> > > > > >> the
> > > > > >> > > most intuitive way to handle the situation.
> > > > > >> > > What does a "trivial rebalance" mean? Sorry, I'm not familiar
> > > with
> > > > > the
> > > > > >> > > term.
> > > > > >> > > I was thinking that maybe we could force a rebalance, which
> > > would
> > > > > >> cause
> > > > > >> > > consumers to commit their offsets (given their
> > > rebalanceListener is
> > > > > >> > > configured correctly) and subsequently reject some of the
> > > incoming
> > > > > >> > > `joinGroup` requests. Does that sound like it would work?
> > > > > >> > >
> > > > > >> > > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <
> > bchen11@outlook.com
> > > >
> > > > > >> wrote:
> > > > > >> > >
> > > > > >> > > > Hey Stanislav,
> > > > > >> > > >
> > > > > >> > > > I read the latest KIP and saw that we already changed the
> > > default
> > > > > >> value
> > > > > >> > > to
> > > > > >> > > > -1. Do
> > > > > >> > > > we still need to take care of the consumer group shrinking
> > > when
> > > > > >> doing
> > > > > >> > the
> > > > > >> > > > upgrade?
> > > > > >> > > >
> > > > > >> > > > However this is an interesting topic that worth discussing.
> > > > > Although
> > > > > >> > > > rolling
> > > > > >> > > > upgrade is fine, `consumer.group.max.size` could always have
> > > > > >> conflict
> > > > > >> > > with
> > > > > >> > > > the current
> > > > > >> > > > consumer group size which means we need to adhere to one
> > > source of
> > > > > >> > truth.
> > > > > >> > > >
> > > > > >> > > > 1.Choose the current group size, which means we never
> > > interrupt
> > > > > the
> > > > > >> > > > consumer group until
> > > > > >> > > > it transits to PREPARE_REBALANCE. And we keep track of how
> > > many
> > > > > join
> > > > > >> > > group
> > > > > >> > > > requests
> > > > > >> > > > we have seen so far during PREPARE_REBALANCE. After reaching
> > > the
> > > > > >> > consumer
> > > > > >> > > > cap,
> > > > > >> > > > we start to inform over provisioned consumers that you
> > should
> > > send
> > > > > >> > > > LeaveGroupRequest and
> > > > > >> > > > fail yourself. Or with what Mayuresh proposed in KIP-345, we
> > > could
> > > > > >> mark
> > > > > >> > > > extra members
> > > > > >> > > > as hot backup and rebalance without them.
> > > > > >> > > >
> > > > > >> > > > 2.Choose the `consumer.group.max.size`. I feel incremental
> > > > > >> rebalancing
> > > > > >> > > > (you proposed) could be of help here.
> > > > > >> > > > When a new cap is enforced, leader should be notified. If
> > the
> > > > > >> current
> > > > > >> > > > group size is already over limit, leader
> > > > > >> > > > shall trigger a trivial rebalance to shuffle some topic
> > > partitions
> > > > > >> and
> > > > > >> > > let
> > > > > >> > > > a subset of consumers prepare the ownership
> > > > > >> > > > transition. Until they are ready, we trigger a real
> > rebalance
> > > to
> > > > > >> remove
> > > > > >> > > > over-provisioned consumers. It is pretty much
> > > > > >> > > > equivalent to `how do we scale down the consumer group
> > without
> > > > > >> > > > interrupting the current processing`.
> > > > > >> > > >
> > > > > >> > > > I personally feel inclined to 2 because we could kill two
> > > birds
> > > > > with
> > > > > >> > one
> > > > > >> > > > stone in a generic way. What do you think?
> > > > > >> > > >
> > > > > >> > > > Boyang
> > > > > >> > > > ________________________________
> > > > > >> > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > >> > > > Sent: Monday, December 3, 2018 8:35 PM
> > > > > >> > > > To: dev@kafka.apache.org
> > > > > >> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to
> > cap
> > > > > member
> > > > > >> > > > metadata growth
> > > > > >> > > >
> > > > > >> > > > Hi Jason,
> > > > > >> > > >
> > > > > >> > > > > 2. Do you think we should make this a dynamic config?
> > > > > >> > > > I'm not sure. Looking at the config from the perspective of
> > a
> > > > > >> > > prescriptive
> > > > > >> > > > config, we may get away with not updating it dynamically.
> > > > > >> > > > But in my opinion, it always makes sense to have a config be
> > > > > >> > dynamically
> > > > > >> > > > configurable. As long as we limit it to being a cluster-wide
> > > > > >> config, we
> > > > > >> > > > should be fine.
> > > > > >> > > >
> > > > > >> > > > > 1. I think it would be helpful to clarify the details on
> > > how the
> > > > > >> > > > coordinator will shrink the group. It will need to choose
> > > which
> > > > > >> members
> > > > > >> > > to
> > > > > >> > > > remove. Are we going to give current members an opportunity
> > to
> > > > > >> commit
> > > > > >> > > > offsets before kicking them from the group?
> > > > > >> > > >
> > > > > >> > > > This turns out to be somewhat tricky. I think that we may
> > not
> > > be
> > > > > >> able
> > > > > >> > to
> > > > > >> > > > guarantee that consumers don't process a message twice.
> > > > > >> > > > My initial approach was to do as much as we could to let
> > > consumers
> > > > > >> > commit
> > > > > >> > > > offsets.
> > > > > >> > > >
> > > > > >> > > > I was thinking that we mark a group to be shrunk, we could
> > > keep a
> > > > > >> map
> > > > > >> > of
> > > > > >> > > > consumer_id->boolean indicating whether they have committed
> > > > > >> offsets. I
> > > > > >> > > then
> > > > > >> > > > thought we could delay the rebalance until every consumer
> > > commits
> > > > > >> (or
> > > > > >> > > some
> > > > > >> > > > time passes).
> > > > > >> > > > In the meantime, we would block all incoming fetch calls (by
> > > > > either
> > > > > >> > > > returning empty records or a retriable error) and we would
> > > > > continue
> > > > > >> to
> > > > > >> > > > accept offset commits (even twice for a single consumer)
> > > > > >> > > >
> > > > > >> > > > I see two problems with this approach:
> > > > > >> > > > * We have async offset commits, which implies that we can
> > > receive
> > > > > >> fetch
> > > > > >> > > > requests before the offset commit req has been handled. i.e
> > > > > consmer
> > > > > >> > sends
> > > > > >> > > > fetchReq A, offsetCommit B, fetchReq C - we may receive
> > A,C,B
> > > in
> > > > > the
> > > > > >> > > > broker. Meaning we could have saved the offsets for B but
> > > > > rebalance
> > > > > >> > > before
> > > > > >> > > > the offsetCommit for the offsets processed in C come in.
> > > > > >> > > > * KIP-392 Allow consumers to fetch from closest replica
> > > > > >> > > > <
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > > > > >> > > > >
> > > > > >> > > > would
> > > > > >> > > > make it significantly harder to block poll() calls on
> > > consumers
> > > > > >> whose
> > > > > >> > > > groups are being shrunk. Even if we implemented a solution,
> > > the
> > > > > same
> > > > > >> > race
> > > > > >> > > > condition noted above seems to apply and probably others
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > Given those constraints, I think that we can simply mark the
> > > group
> > > > > >> as
> > > > > >> > > > `PreparingRebalance` with a rebalanceTimeout of the server
> > > > > setting `
> > > > > >> > > > group.max.session.timeout.ms`. That's a bit long by default
> > > (5
> > > > > >> > minutes)
> > > > > >> > > > but
> > > > > >> > > > I can't seem to come up with a better alternative
> > > > > >> > > >
> > > > > >> > > > I'm interested in hearing your thoughts.
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > > Stanislav
> > > > > >> > > >
> > > > > >> > > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <
> > > > > jason@confluent.io
> > > > > >> >
> > > > > >> > > > wrote:
> > > > > >> > > >
> > > > > >> > > > > Hey Stanislav,
> > > > > >> > > > >
> > > > > >> > > > > What do you think about the use case I mentioned in my
> > > previous
> > > > > >> reply
> > > > > >> > > > about
> > > > > >> > > > > > a more resilient self-service Kafka? I believe the
> > benefit
> > > > > >> there is
> > > > > >> > > > > bigger.
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > I see this config as analogous to the open file limit.
> > > Probably
> > > > > >> this
> > > > > >> > > > limit
> > > > > >> > > > > was intended to be prescriptive at some point about what
> > was
> > > > > >> deemed a
> > > > > >> > > > > reasonable number of open files for an application. But
> > > mostly
> > > > > >> people
> > > > > >> > > > treat
> > > > > >> > > > > it as an annoyance which they have to work around. If it
> > > happens
> > > > > >> to
> > > > > >> > be
> > > > > >> > > > hit,
> > > > > >> > > > > usually you just increase it because it is not tied to an
> > > actual
> > > > > >> > > resource
> > > > > >> > > > > constraint. However, occasionally hitting the limit does
> > > > > indicate
> > > > > >> an
> > > > > >> > > > > application bug such as a leak, so I wouldn't say it is
> > > useless.
> > > > > >> > > > Similarly,
> > > > > >> > > > > the issue in KAFKA-7610 was a consumer leak and having
> > this
> > > > > limit
> > > > > >> > would
> > > > > >> > > > > have allowed the problem to be detected before it impacted
> > > the
> > > > > >> > cluster.
> > > > > >> > > > To
> > > > > >> > > > > me, that's the main benefit. It's possible that it could
> > be
> > > used
> > > > > >> > > > > prescriptively to prevent poor usage of groups, but like
> > the
> > > > > open
> > > > > >> > file
> > > > > >> > > > > limit, I suspect administrators will just set it large
> > > enough
> > > > > that
> > > > > >> > > users
> > > > > >> > > > > are unlikely to complain.
> > > > > >> > > > >
> > > > > >> > > > > Anyway, just a couple additional questions:
> > > > > >> > > > >
> > > > > >> > > > > 1. I think it would be helpful to clarify the details on
> > > how the
> > > > > >> > > > > coordinator will shrink the group. It will need to choose
> > > which
> > > > > >> > members
> > > > > >> > > > to
> > > > > >> > > > > remove. Are we going to give current members an
> > opportunity
> > > to
> > > > > >> commit
> > > > > >> > > > > offsets before kicking them from the group?
> > > > > >> > > > >
> > > > > >> > > > > 2. Do you think we should make this a dynamic config?
> > > > > >> > > > >
> > > > > >> > > > > Thanks,
> > > > > >> > > > > Jason
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > > > > >> > > > > stanislav@confluent.io>
> > > > > >> > > > > wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Hi Jason,
> > > > > >> > > > > >
> > > > > >> > > > > > You raise some very valid points.
> > > > > >> > > > > >
> > > > > >> > > > > > > The benefit of this KIP is probably limited to
> > > preventing
> > > > > >> > "runaway"
> > > > > >> > > > > > consumer groups due to leaks or some other application
> > bug
> > > > > >> > > > > > What do you think about the use case I mentioned in my
> > > > > previous
> > > > > >> > reply
> > > > > >> > > > > about
> > > > > >> > > > > > a more resilient self-service Kafka? I believe the
> > benefit
> > > > > >> there is
> > > > > >> > > > > bigger
> > > > > >> > > > > >
> > > > > >> > > > > > * Default value
> > > > > >> > > > > > You're right, we probably do need to be conservative.
> > Big
> > > > > >> consumer
> > > > > >> > > > groups
> > > > > >> > > > > > are considered an anti-pattern and my goal was to also
> > > hint at
> > > > > >> this
> > > > > >> > > > > through
> > > > > >> > > > > > the config's default. Regardless, it is better to not
> > > have the
> > > > > >> > > > potential
> > > > > >> > > > > to
> > > > > >> > > > > > break applications with an upgrade.
> > > > > >> > > > > > Choosing between the default of something big like 5000
> > > or an
> > > > > >> > opt-in
> > > > > >> > > > > > option, I think we should go with the *disabled default
> > > > > option*
> > > > > >> > > (-1).
> > > > > >> > > > > > The only benefit we would get from a big default of 5000
> > > is
> > > > > >> default
> > > > > >> > > > > > protection against buggy/malicious applications that hit
> > > the
> > > > > >> > > KAFKA-7610
> > > > > >> > > > > > issue.
> > > > > >> > > > > > While this KIP was spawned from that issue, I believe
> > its
> > > > > value
> > > > > >> is
> > > > > >> > > > > enabling
> > > > > >> > > > > > the possibility of protection and helping move towards a
> > > more
> > > > > >> > > > > self-service
> > > > > >> > > > > > Kafka. I also think that a default value of 5000 might
> > be
> > > > > >> > misleading
> > > > > >> > > to
> > > > > >> > > > > > users and lead them to think that big consumer groups (>
> > > 250)
> > > > > >> are a
> > > > > >> > > > good
> > > > > >> > > > > > thing.
> > > > > >> > > > > >
> > > > > >> > > > > > The good news is that KAFKA-7610 should be fully
> > resolved
> > > and
> > > > > >> the
> > > > > >> > > > > rebalance
> > > > > >> > > > > > protocol should, in general, be more solid after the
> > > planned
> > > > > >> > > > improvements
> > > > > >> > > > > > in KIP-345 and KIP-394.
> > > > > >> > > > > >
> > > > > >> > > > > > * Handling bigger groups during upgrade
> > > > > >> > > > > > I now see that we store the state of consumer groups in
> > > the
> > > > > log
> > > > > >> and
> > > > > >> > > > why a
> > > > > >> > > > > > rebalance isn't expected during a rolling upgrade.
> > > > > >> > > > > > Since we're going with the default value of the max.size
> > > being
> > > > > >> > > > disabled,
> > > > > >> > > > > I
> > > > > >> > > > > > believe we can afford to be more strict here.
> > > > > >> > > > > > During state reloading of a new Coordinator with a
> > defined
> > > > > >> > > > max.group.size
> > > > > >> > > > > > config, I believe we should *force* rebalances for
> > groups
> > > that
> > > > > >> > exceed
> > > > > >> > > > the
> > > > > >> > > > > > configured size. Then, only some consumers will be able
> > to
> > > > > join
> > > > > >> and
> > > > > >> > > the
> > > > > >> > > > > max
> > > > > >> > > > > > size invariant will be satisfied.
> > > > > >> > > > > >
> > > > > >> > > > > > I updated the KIP with a migration plan, rejected
> > > alternatives
> > > > > >> and
> > > > > >> > > the
> > > > > >> > > > > new
> > > > > >> > > > > > default value.
> > > > > >> > > > > >
> > > > > >> > > > > > Thanks,
> > > > > >> > > > > > Stanislav
> > > > > >> > > > > >
> > > > > >> > > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <
> > > > > >> > jason@confluent.io>
> > > > > >> > > > > > wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > Hey Stanislav,
> > > > > >> > > > > > >
> > > > > >> > > > > > > Clients will then find that coordinator
> > > > > >> > > > > > > > and send `joinGroup` on it, effectively rebuilding
> > the
> > > > > >> group,
> > > > > >> > > since
> > > > > >> > > > > the
> > > > > >> > > > > > > > cache of active consumers is not stored outside the
> > > > > >> > Coordinator's
> > > > > >> > > > > > memory.
> > > > > >> > > > > > > > (please do say if that is incorrect)
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > > Groups do not typically rebalance after a coordinator
> > > > > change.
> > > > > >> You
> > > > > >> > > > could
> > > > > >> > > > > > > potentially force a rebalance if the group is too big
> > > and
> > > > > kick
> > > > > >> > out
> > > > > >> > > > the
> > > > > >> > > > > > > slowest members or something. A more graceful solution
> > > is
> > > > > >> > probably
> > > > > >> > > to
> > > > > >> > > > > > just
> > > > > >> > > > > > > accept the current size and prevent it from getting
> > > bigger.
> > > > > We
> > > > > >> > > could
> > > > > >> > > > > log
> > > > > >> > > > > > a
> > > > > >> > > > > > > warning potentially.
> > > > > >> > > > > > >
> > > > > >> > > > > > > My thinking is that we should abstract away from
> > > conserving
> > > > > >> > > resources
> > > > > >> > > > > and
> > > > > >> > > > > > > > focus on giving control to the broker. The issue
> > that
> > > > > >> spawned
> > > > > >> > > this
> > > > > >> > > > > KIP
> > > > > >> > > > > > > was
> > > > > >> > > > > > > > a memory problem but I feel this change is useful
> > in a
> > > > > more
> > > > > >> > > general
> > > > > >> > > > > > way.
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > > So you probably already know why I'm asking about
> > this.
> > > For
> > > > > >> > > consumer
> > > > > >> > > > > > groups
> > > > > >> > > > > > > anyway, resource usage would typically be proportional
> > > to
> > > > > the
> > > > > >> > > number
> > > > > >> > > > of
> > > > > >> > > > > > > partitions that a group is reading from and not the
> > > number
> > > > > of
> > > > > >> > > > members.
> > > > > >> > > > > > For
> > > > > >> > > > > > > example, consider the memory use in the offsets cache.
> > > The
> > > > > >> > benefit
> > > > > >> > > of
> > > > > >> > > > > > this
> > > > > >> > > > > > > KIP is probably limited to preventing "runaway"
> > consumer
> > > > > >> groups
> > > > > >> > due
> > > > > >> > > > to
> > > > > >> > > > > > > leaks or some other application bug. That still seems
> > > useful
> > > > > >> > > though.
> > > > > >> > > > > > >
> > > > > >> > > > > > > I completely agree with this and I *ask everybody to
> > > chime
> > > > > in
> > > > > >> > with
> > > > > >> > > > > > opinions
> > > > > >> > > > > > > > on a sensible default value*.
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > > I think we would have to be very conservative. The
> > group
> > > > > >> protocol
> > > > > >> > > is
> > > > > >> > > > > > > generic in some sense, so there may be use cases we
> > > don't
> > > > > >> know of
> > > > > >> > > > where
> > > > > >> > > > > > > larger groups are reasonable. Probably we should make
> > > this
> > > > > an
> > > > > >> > > opt-in
> > > > > >> > > > > > > feature so that we do not risk breaking anyone's
> > > application
> > > > > >> > after
> > > > > >> > > an
> > > > > >> > > > > > > upgrade. Either that, or use a very high default like
> > > 5,000.
> > > > > >> > > > > > >
> > > > > >> > > > > > > Thanks,
> > > > > >> > > > > > > Jason
> > > > > >> > > > > > >
> > > > > >> > > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > > > > >> > > > > > > stanislav@confluent.io>
> > > > > >> > > > > > > wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > > Hey Jason and Boyang, those were important comments
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > One suggestion I have is that it would be helpful
> > > to put
> > > > > >> your
> > > > > >> > > > > > reasoning
> > > > > >> > > > > > > > on deciding the current default value. For example,
> > in
> > > > > >> certain
> > > > > >> > > use
> > > > > >> > > > > > cases
> > > > > >> > > > > > > at
> > > > > >> > > > > > > > Pinterest we are very likely to have more consumers
> > > than
> > > > > 250
> > > > > >> > when
> > > > > >> > > > we
> > > > > >> > > > > > > > configure 8 stream instances with 32 threads.
> > > > > >> > > > > > > > > For the effectiveness of this KIP, we should
> > > encourage
> > > > > >> people
> > > > > >> > > to
> > > > > >> > > > > > > discuss
> > > > > >> > > > > > > > their opinions on the default setting and ideally
> > > reach a
> > > > > >> > > > consensus.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > I completely agree with this and I *ask everybody to
> > > chime
> > > > > >> in
> > > > > >> > > with
> > > > > >> > > > > > > opinions
> > > > > >> > > > > > > > on a sensible default value*.
> > > > > >> > > > > > > > My thought process was that in the current model
> > > > > rebalances
> > > > > >> in
> > > > > >> > > > large
> > > > > >> > > > > > > groups
> > > > > >> > > > > > > > are more costly. I imagine most use cases in most
> > > Kafka
> > > > > >> users
> > > > > >> > do
> > > > > >> > > > not
> > > > > >> > > > > > > > require more than 250 consumers.
> > > > > >> > > > > > > > Boyang, you say that you are "likely to have... when
> > > > > we..."
> > > > > >> -
> > > > > >> > do
> > > > > >> > > > you
> > > > > >> > > > > > have
> > > > > >> > > > > > > > systems running with so many consumers in a group or
> > > are
> > > > > you
> > > > > >> > > > planning
> > > > > >> > > > > > > to? I
> > > > > >> > > > > > > > guess what I'm asking is whether this has been
> > tested
> > > in
> > > > > >> > > production
> > > > > >> > > > > > with
> > > > > >> > > > > > > > the current rebalance model (ignoring KIP-345)
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > >  Can you clarify the compatibility impact here?
> > What
> > > > > >> > > > > > > > > will happen to groups that are already larger than
> > > the
> > > > > max
> > > > > >> > > size?
> > > > > >> > > > > > > > This is a very important question.
> > > > > >> > > > > > > > From my current understanding, when a coordinator
> > > broker
> > > > > >> gets
> > > > > >> > > shut
> > > > > >> > > > > > > > down during a cluster rolling upgrade, a replica
> > will
> > > take
> > > > > >> > > > leadership
> > > > > >> > > > > > of
> > > > > >> > > > > > > > the `__offset_commits` partition. Clients will then
> > > find
> > > > > >> that
> > > > > >> > > > > > coordinator
> > > > > >> > > > > > > > and send `joinGroup` on it, effectively rebuilding
> > the
> > > > > >> group,
> > > > > >> > > since
> > > > > >> > > > > the
> > > > > >> > > > > > > > cache of active consumers is not stored outside the
> > > > > >> > Coordinator's
> > > > > >> > > > > > memory.
> > > > > >> > > > > > > > (please do say if that is incorrect)
> > > > > >> > > > > > > > Then, I believe that working as if this is a new
> > > group is
> > > > > a
> > > > > >> > > > > reasonable
> > > > > >> > > > > > > > approach. Namely, fail joinGroups when the max.size
> > is
> > > > > >> > exceeded.
> > > > > >> > > > > > > > What do you guys think about this? (I'll update the
> > > KIP
> > > > > >> after
> > > > > >> > we
> > > > > >> > > > > settle
> > > > > >> > > > > > > on
> > > > > >> > > > > > > > a solution)
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > >  Also, just to be clear, the resource we are
> > trying
> > > to
> > > > > >> > conserve
> > > > > >> > > > > here
> > > > > >> > > > > > is
> > > > > >> > > > > > > > what? Memory?
> > > > > >> > > > > > > > My thinking is that we should abstract away from
> > > > > conserving
> > > > > >> > > > resources
> > > > > >> > > > > > and
> > > > > >> > > > > > > > focus on giving control to the broker. The issue
> > that
> > > > > >> spawned
> > > > > >> > > this
> > > > > >> > > > > KIP
> > > > > >> > > > > > > was
> > > > > >> > > > > > > > a memory problem but I feel this change is useful
> > in a
> > > > > more
> > > > > >> > > general
> > > > > >> > > > > > way.
> > > > > >> > > > > > > It
> > > > > >> > > > > > > > limits the control clients have on the cluster and
> > > helps
> > > > > >> Kafka
> > > > > >> > > > > become a
> > > > > >> > > > > > > > more self-serving system. Admin/Ops teams can better
> > > > > control
> > > > > >> > the
> > > > > >> > > > > impact
> > > > > >> > > > > > > > application developers can have on a Kafka cluster
> > > with
> > > > > this
> > > > > >> > > change
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Best,
> > > > > >> > > > > > > > Stanislav
> > > > > >> > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> > > > > >> > > > jason@confluent.io>
> > > > > >> > > > > > > > wrote:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > Hi Stanislav,
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Thanks for the KIP. Can you clarify the
> > > compatibility
> > > > > >> impact
> > > > > >> > > > here?
> > > > > >> > > > > > What
> > > > > >> > > > > > > > > will happen to groups that are already larger than
> > > the
> > > > > max
> > > > > >> > > size?
> > > > > >> > > > > > Also,
> > > > > >> > > > > > > > just
> > > > > >> > > > > > > > > to be clear, the resource we are trying to
> > conserve
> > > here
> > > > > >> is
> > > > > >> > > what?
> > > > > >> > > > > > > Memory?
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > -Jason
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
> > > > > >> > > bchen11@outlook.com
> > > > > >> > > > >
> > > > > >> > > > > > > wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > Thanks Stanislav for the update! One suggestion
> > I
> > > have
> > > > > >> is
> > > > > >> > > that
> > > > > >> > > > it
> > > > > >> > > > > > > would
> > > > > >> > > > > > > > > be
> > > > > >> > > > > > > > > > helpful to put your
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > reasoning on deciding the current default value.
> > > For
> > > > > >> > example,
> > > > > >> > > > in
> > > > > >> > > > > > > > certain
> > > > > >> > > > > > > > > > use cases at Pinterest we are very likely
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > to have more consumers than 250 when we
> > configure
> > > 8
> > > > > >> stream
> > > > > >> > > > > > instances
> > > > > >> > > > > > > > with
> > > > > >> > > > > > > > > > 32 threads.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > For the effectiveness of this KIP, we should
> > > encourage
> > > > > >> > people
> > > > > >> > > > to
> > > > > >> > > > > > > > discuss
> > > > > >> > > > > > > > > > their opinions on the default setting and
> > ideally
> > > > > reach
> > > > > >> a
> > > > > >> > > > > > consensus.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Best,
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Boyang
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > ________________________________
> > > > > >> > > > > > > > > > From: Stanislav Kozlovski <
> > stanislav@confluent.io
> > > >
> > > > > >> > > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > > >> > > > > > > > > > To: dev@kafka.apache.org
> > > > > >> > > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce
> > > group.max.size
> > > > > >> to
> > > > > >> > cap
> > > > > >> > > > > > member
> > > > > >> > > > > > > > > > metadata growth
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Hey everybody,
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > It's been a week since this KIP and not much
> > > > > discussion
> > > > > >> has
> > > > > >> > > > been
> > > > > >> > > > > > > made.
> > > > > >> > > > > > > > > > I assume that this is a straight forward change
> > > and I
> > > > > >> will
> > > > > >> > > > open a
> > > > > >> > > > > > > > voting
> > > > > >> > > > > > > > > > thread in the next couple of days if nobody has
> > > > > >> anything to
> > > > > >> > > > > > suggest.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Best,
> > > > > >> > > > > > > > > > Stanislav
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav
> > > Kozlovski <
> > > > > >> > > > > > > > > > stanislav@confluent.io>
> > > > > >> > > > > > > > > > wrote:
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > > Greetings everybody,
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > I have enriched the KIP a bit with a bigger
> > > > > Motivation
> > > > > >> > > > section
> > > > > >> > > > > > and
> > > > > >> > > > > > > > also
> > > > > >> > > > > > > > > > > renamed it.
> > > > > >> > > > > > > > > > > KIP:
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > I'm looking forward to discussions around it.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Best,
> > > > > >> > > > > > > > > > > Stanislav
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav
> > > Kozlovski
> > > > > <
> > > > > >> > > > > > > > > > > stanislav@confluent.io> wrote:
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >> Hey there everybody,
> > > > > >> > > > > > > > > > >>
> > > > > >> > > > > > > > > > >> Thanks for the introduction Boyang. I
> > > appreciate
> > > > > the
> > > > > >> > > effort
> > > > > >> > > > > you
> > > > > >> > > > > > > are
> > > > > >> > > > > > > > > > >> putting into improving consumer behavior in
> > > Kafka.
> > > > > >> > > > > > > > > > >>
> > > > > >> > > > > > > > > > >> @Matt
> > > > > >> > > > > > > > > > >> I also believe the default value is high. In
> > my
> > > > > >> opinion,
> > > > > >> > > we
> > > > > >> > > > > > should
> > > > > >> > > > > > > > aim
> > > > > >> > > > > > > > > > to
> > > > > >> > > > > > > > > > >> a default cap around 250. This is because in
> > > the
> > > > > >> current
> > > > > >> > > > model
> > > > > >> > > > > > any
> > > > > >> > > > > > > > > > consumer
> > > > > >> > > > > > > > > > >> rebalance is disrupting to every consumer.
> > The
> > > > > bigger
> > > > > >> > the
> > > > > >> > > > > group,
> > > > > >> > > > > > > the
> > > > > >> > > > > > > > > > longer
> > > > > >> > > > > > > > > > >> this period of disruption.
> > > > > >> > > > > > > > > > >>
> > > > > >> > > > > > > > > > >> If you have such a large consumer group,
> > > chances
> > > > > are
> > > > > >> > that
> > > > > >> > > > your
> > > > > >> > > > > > > > > > >> client-side logic could be structured better
> > > and
> > > > > that
> > > > > >> > you
> > > > > >> > > > are
> > > > > >> > > > > > not
> > > > > >> > > > > > > > > using
> > > > > >> > > > > > > > > > the
> > > > > >> > > > > > > > > > >> high number of consumers to achieve high
> > > > > throughput.
> > > > > >> > > > > > > > > > >> 250 can still be considered of a high upper
> > > bound,
> > > > > I
> > > > > >> > > believe
> > > > > >> > > > > in
> > > > > >> > > > > > > > > practice
> > > > > >> > > > > > > > > > >> users should aim to not go over 100 consumers
> > > per
> > > > > >> > consumer
> > > > > >> > > > > > group.
> > > > > >> > > > > > > > > > >>
> > > > > >> > > > > > > > > > >> In regards to the cap being
> > global/per-broker,
> > > I
> > > > > >> think
> > > > > >> > > that
> > > > > >> > > > we
> > > > > >> > > > > > > > should
> > > > > >> > > > > > > > > > >> consider whether we want it to be global or
> > > > > >> *per-topic*.
> > > > > >> > > For
> > > > > >> > > > > the
> > > > > >> > > > > > > > time
> > > > > >> > > > > > > > > > >> being, I believe that having it per-topic
> > with
> > > a
> > > > > >> global
> > > > > >> > > > > default
> > > > > >> > > > > > > > might
> > > > > >> > > > > > > > > be
> > > > > >> > > > > > > > > > >> the best situation. Having it global only
> > > seems a
> > > > > bit
> > > > > >> > > > > > restricting
> > > > > >> > > > > > > to
> > > > > >> > > > > > > > > me
> > > > > >> > > > > > > > > > and
> > > > > >> > > > > > > > > > >> it never hurts to support more fine-grained
> > > > > >> > > configurability
> > > > > >> > > > > > (given
> > > > > >> > > > > > > > > it's
> > > > > >> > > > > > > > > > the
> > > > > >> > > > > > > > > > >> same config, not a new one being introduced).
> > > > > >> > > > > > > > > > >>
> > > > > >> > > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen
> > <
> > > > > >> > > > > > bchen11@outlook.com
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > >>
> > > > > >> > > > > > > > > > >>> Thanks Matt for the suggestion! I'm still
> > > open to
> > > > > >> any
> > > > > >> > > > > > suggestion
> > > > > >> > > > > > > to
> > > > > >> > > > > > > > > > >>> change the default value. Meanwhile I just
> > > want to
> > > > > >> > point
> > > > > >> > > > out
> > > > > >> > > > > > that
> > > > > >> > > > > > > > > this
> > > > > >> > > > > > > > > > >>> value is a just last line of defense, not a
> > > real
> > > > > >> > scenario
> > > > > >> > > > we
> > > > > >> > > > > > > would
> > > > > >> > > > > > > > > > expect.
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > > >>> In the meanwhile, I discussed with Stanislav
> > > and
> > > > > he
> > > > > >> > would
> > > > > >> > > > be
> > > > > >> > > > > > > > driving
> > > > > >> > > > > > > > > > the
> > > > > >> > > > > > > > > > >>> 389 effort from now on. Stanislav proposed
> > the
> > > > > idea
> > > > > >> in
> > > > > >> > > the
> > > > > >> > > > > > first
> > > > > >> > > > > > > > > place
> > > > > >> > > > > > > > > > and
> > > > > >> > > > > > > > > > >>> had already come up a draft design, while I
> > > will
> > > > > >> keep
> > > > > >> > > > > focusing
> > > > > >> > > > > > on
> > > > > >> > > > > > > > > > KIP-345
> > > > > >> > > > > > > > > > >>> effort to ensure solving the edge case
> > > described
> > > > > in
> > > > > >> the
> > > > > >> > > > JIRA<
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > >
> > https://issues.apache.org/jira/browse/KAFKA-7610
> > > > > >> > > > > > > > > > >.
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > > >>> Thank you Stanislav for making this happen!
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > > >>> Boyang
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > > >>> ________________________________
> > > > > >> > > > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > > >> > > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > > >> > > > > > > > > > >>> To: dev@kafka.apache.org
> > > > > >> > > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce
> > > > > >> group.max.size
> > > > > >> > to
> > > > > >> > > > cap
> > > > > >> > > > > > > > member
> > > > > >> > > > > > > > > > >>> metadata growth
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > > >>> Thanks for the KIP.
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > > >>> Will this cap be a global cap across the
> > > entire
> > > > > >> cluster
> > > > > >> > > or
> > > > > >> > > > > per
> > > > > >> > > > > > > > > broker?
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > > >>> Either way the default value seems a bit
> > high
> > > to
> > > > > me,
> > > > > >> > but
> > > > > >> > > > that
> > > > > >> > > > > > > could
> > > > > >> > > > > > > > > > just
> > > > > >> > > > > > > > > > >>> be
> > > > > >> > > > > > > > > > >>> from my own usage patterns. I'd have
> > probably
> > > > > >> started
> > > > > >> > > with
> > > > > >> > > > > 500
> > > > > >> > > > > > or
> > > > > >> > > > > > > > 1k
> > > > > >> > > > > > > > > > but
> > > > > >> > > > > > > > > > >>> could be easily convinced that's wrong.
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > > >>> Thanks,
> > > > > >> > > > > > > > > > >>> Matt
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen
> > <
> > > > > >> > > > > > bchen11@outlook.com
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > > >>> > Hey folks,
> > > > > >> > > > > > > > > > >>> >
> > > > > >> > > > > > > > > > >>> >
> > > > > >> > > > > > > > > > >>> > I would like to start a discussion on
> > > KIP-389:
> > > > > >> > > > > > > > > > >>> >
> > > > > >> > > > > > > > > > >>> >
> > > > > >> > > > > > > > > > >>> >
> > > > > >> > > > > > > > > > >>>
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > > >> > > > > > > > > > >>> >
> > > > > >> > > > > > > > > > >>> >
> > > > > >> > > > > > > > > > >>> > This is a pretty simple change to cap the
> > > > > consumer
> > > > > >> > > group
> > > > > >> > > > > size
> > > > > >> > > > > > > for
> > > > > >> > > > > > > > > > >>> broker
> > > > > >> > > > > > > > > > >>> > stability. Give me your valuable feedback
> > > when
> > > > > you
> > > > > >> > got
> > > > > >> > > > > time.
> > > > > >> > > > > > > > > > >>> >
> > > > > >> > > > > > > > > > >>> >
> > > > > >> > > > > > > > > > >>> > Thank you!
> > > > > >> > > > > > > > > > >>> >



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hey everybody,

I ran a quick benchmark and took some heap dumps to gauge how much memory
each consumer in a group is using, all done locally.
The setup was the following: 10 topics with 10 partitions each (100
partitions total) and one consumer group with 10 members, then expanded to
20 members.
Here are some notes of my findings in a public Google doc:
https://docs.google.com/document/d/1Z4aY5qg8lU2uNXzdgp_30_oJ9_I9xNelPko6GIQYXYk/edit?usp=sharing
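For anyone who wants to reproduce a similar measurement, a minimal sketch of one group member is below. This is not the exact harness behind the doc above; the broker address, group id and topic names are placeholders, and the only point is that each running instance adds one member's metadata to the coordinator, which you can then inspect with a broker heap dump.

    import java.time.Duration;
    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class GroupMemberStub {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder local test broker and group id.
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "size-benchmark-group");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            // Subscribing to the ten test topics registers this process as one group member.
            consumer.subscribe(Arrays.asList("topic-0", "topic-1", "topic-2", "topic-3", "topic-4",
                    "topic-5", "topic-6", "topic-7", "topic-8", "topic-9"));
            while (true) {
                // Polling keeps the member alive, so its metadata stays resident on the coordinator.
                consumer.poll(Duration.ofMillis(500));
            }
        }
    }

Starting 10 (and later 20) copies of something like this against 10 topics of 10 partitions each matches the setup described above.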


On Mon, Jan 7, 2019 at 10:51 PM Boyang Chen <bc...@outlook.com> wrote:

> Hey Stanislav,
>
> I think the time taken to rebalance is not linearly correlated with number
> of consumers with our application. As for our current and future use cases,
> the main concern for Pinterest is still on the broker memory not CPU,
> because crashing server by one application could have cascading effect on
> all jobs.
> Do you want to drive a more detailed formula on how to compute the memory
> consumption against number of consumers within the group?
>
> In the meantime, I'm pretty buying in the motivation of this KIP, so I
> think the follow-up work is just refinement to make the new config easy to
> use. We should be good
> to vote IMO.
>
> Best,
> Boyang
> ________________________________
> From: Stanislav Kozlovski <st...@confluent.io>
> Sent: Monday, January 7, 2019 4:21 PM
> To: dev@kafka.apache.org
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Hey there,
>
> Per Gwen's comments, I slightly reworked the motivation section. Let me
> know if it's any better now
>
> I completely agree that it would be best if we were to add a recommended
> number to a typical consumer group size. There is a problem that timing the
> CPU usage and rebalance times of consumer groups is tricky. We can update
> the KIP with memory guidelines (e.g 1 consumer in a group uses X memory,
> therefore 100 use Y).
> I fear that the most useful recommendations though would be knowing the CPU
> impact of large consumer groups and the rebalance times. That is,
> unfortunately, tricky to test and measure.
>
> @Boyang, you had mentioned some numbers used in Pinterest. If available to
> you, would you be comfortable sharing the number of consumers you are using
> in a group and maybe the potential time it takes to rebalance it?
>
> I'd appreciate any anecdotes regarding consumer group sizes from the
> community
>
> Best,
> Stanislav
>
> On Thu, Jan 3, 2019 at 1:59 AM Boyang Chen <bc...@outlook.com> wrote:
>
> > Thanks Gwen for the suggestion! +1 on the guidance of defining
> > group.max.size. I guess a sample formula would be:
> > 2 * (# of brokers * average metadata cache size * 80%) / (# of consumer
> > groups * size of a single member metadata)
> >
> > if we assumed non-skewed partition assignment and pretty fair consumer
> > group consumption. The "2" is the 95 percentile of normal distribution
> and
> > 80% is just to buffer some memory capacity which are both open to
> > discussion. This config should be useful for Kafka platform team to make
> > sure one extreme large consumer group won't bring down the whole cluster.
> >
> > What do you think?
> >
> > Best,
> > Boyang
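As a purely illustrative instance of the formula quoted above (every number below is made up; plug in your own measurements rather than these), a back-of-the-envelope calculation could look like:

    public class GroupSizeCapEstimate {
        // Rough sketch of the quoted formula; all inputs are placeholders, not measured values.
        public static void main(String[] args) {
            int brokers = 10;
            long metadataCacheBytesPerBroker = 1L << 30; // assume ~1 GiB usable per broker
            double headroom = 0.8;                       // the 80% buffer from the formula
            int consumerGroups = 100;
            long bytesPerMemberMetadata = 5 * 1024;      // assume ~5 KiB per member

            long cap = (long) (2 * (brokers * metadataCacheBytesPerBroker * headroom)
                    / (consumerGroups * bytesPerMemberMetadata));
            System.out.println("suggested group.max.size ~= " + cap); // ~33,500 with these inputs
        }
    }

With these placeholder inputs the memory-derived bound lands far above the 250-1000 range discussed earlier in the thread, which fits the view that the cap is mainly a guard against runaway groups rather than a tight resource limit.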
> >
> > ________________________________
> > From: Gwen Shapira <gw...@confluent.io>
> > Sent: Thursday, January 3, 2019 2:59 AM
> > To: dev
> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > metadata growth
> >
> > Sorry for joining the fun late, but I think the problem we are solving
> > evolved a bit in the thread, and I'd like to have better understanding
> > of the problem before voting :)
> >
> > Both KIP and discussion assert that large groups are a problem, but
> > they are kinda inconsistent regarding why they are a problem and whose
> > problem they are...
> > 1. The KIP itself states that the main issue with large groups are
> > long rebalance times. Per my understanding, this is mostly a problem
> > for the application that consumes data, but not really a problem for
> > the brokers themselves, so broker admins probably don't and shouldn't
> > care about it. Also, my understanding is that this is a problem for
> > consumer groups, but not necessarily a problem for other group types.
> > 2. The discussion highlights the issue of "run away" groups that
> > essentially create tons of members needlessly and use up lots of
> > broker memory. This is something the broker admins will care about a
> > lot. And is also a problem for every group that uses coordinators and
> > not just consumers. And since the memory in question is the metadata
> > cache, it probably has the largest impact on Kafka Streams
> > applications, since they have lots of metadata.
> >
> > The solution proposed makes the most sense in the context of #2, so
> > perhaps we should update the motivation section of the KIP to reflect
> > that.
> >
> > The reason I'm probing here is that in my opinion we have to give our
> > users some guidelines on what a reasonable limit is (otherwise, how
> > will they know?). Calculating the impact of group-size on rebalance
> > time in order to make good recommendations will take a significant
> > effort. On the other hand, informing users regarding the memory
> > footprint of a consumer in a group and using that to make a reasonable
> > suggestion isn't hard.
> >
> > Gwen
> >
> >
> > On Sun, Dec 30, 2018 at 12:51 PM Stanislav Kozlovski
> > <st...@confluent.io> wrote:
> > >
> > > Thanks Boyang,
> > >
> > > If there aren't any more thoughts on the KIP I'll start a vote thread
> in
> > > the new year
> > >
> > > On Sat, Dec 29, 2018 at 12:58 AM Boyang Chen <bc...@outlook.com>
> > wrote:
> > >
> > > > Yep Stanislav, that's what I'm proposing, and your explanation makes
> > sense.
> > > >
> > > > Boyang
> > > >
> > > > ________________________________
> > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > Sent: Friday, December 28, 2018 7:59 PM
> > > > To: dev@kafka.apache.org
> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > > metadata growth
> > > >
> > > > Hey there everybody, let's work on wrapping this discussion up.
> > > >
> > > > @Boyang, could you clarify what you mean by
> > > > > One more question is whether you feel we should enforce group size
> > cap
> > > > statically or on runtime?
> > > > Is that related to the option of enabling this config via the dynamic
> > > > broker config feature?
> > > >
> > > > Regarding that - I feel it's useful to have and I also think it might
> > not
> > > > introduce additional complexity. As long as we handle the config
> being
> > > > changed midway through a rebalance (via using the old value) we
> should
> > be
> > > > good to go.
> > > >
> > > > On Wed, Dec 12, 2018 at 4:12 PM Stanislav Kozlovski <
> > > > stanislav@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Jason,
> > > > >
> > > > > Yes, that is what I meant by
> > > > > > Given those constraints, I think that we can simply mark the
> group
> > as
> > > > > `PreparingRebalance` with a rebalanceTimeout of the server setting
> `
> > > > > group.max.session.timeout.ms`. That's a bit long by default (5
> > minutes)
> > > > > but I can't seem to come up with a better alternative
> > > > > So either the timeout or all members calling joinGroup, yes
> > > > >
> > > > >
> > > > > On Tue, Dec 11, 2018 at 8:14 PM Boyang Chen <bc...@outlook.com>
> > wrote:
> > > > >
> > > > >> Hey Jason,
> > > > >>
> > > > >> I think this is the correct understanding. One more question is
> > whether
> > > > >> you feel
> > > > >> we should enforce group size cap statically or on runtime?
> > > > >>
> > > > >> Boyang
> > > > >> ________________________________
> > > > >> From: Jason Gustafson <ja...@confluent.io>
> > > > >> Sent: Tuesday, December 11, 2018 3:24 AM
> > > > >> To: dev
> > > > >> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> member
> > > > >> metadata growth
> > > > >>
> > > > >> Hey Stanislav,
> > > > >>
> > > > >> Just to clarify, I think what you're suggesting is something like
> > this
> > > > in
> > > > >> order to gracefully shrink the group:
> > > > >>
> > > > >> 1. Transition the group to PREPARING_REBALANCE. No members are
> > kicked
> > > > out.
> > > > >> 2. Continue to allow offset commits and heartbeats for all current
> > > > >> members.
> > > > >> 3. Allow the first n members that send JoinGroup to stay in the
> > group,
> > > > but
> > > > >> wait for the JoinGroup (or session timeout) from all active
> members
> > > > before
> > > > >> finishing the rebalance.
> > > > >>
> > > > >> So basically we try to give the current members an opportunity to
> > finish
> > > > >> work, but we prevent some of them from rejoining after the
> rebalance
> > > > >> completes. It sounds reasonable if I've understood correctly.
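To make those three steps easier to follow, here is a toy model of the flow. None of these class or method names exist in the Kafka codebase; this is illustration only, not the coordinator's real code:

    import java.util.ArrayList;
    import java.util.List;

    // Toy model of the graceful-shrink flow sketched in the three steps above.
    class ToyGroup {
        enum State { STABLE, PREPARING_REBALANCE }

        State state = State.STABLE;
        final List<String> members = new ArrayList<>();        // current generation
        final List<String> nextGeneration = new ArrayList<>(); // members admitted so far

        // Step 1: enter PREPARING_REBALANCE without kicking anyone out.
        void beginShrink() {
            state = State.PREPARING_REBALANCE;
            nextGeneration.clear();
        }

        // Steps 2-3: offset commits and heartbeats from `members` keep being accepted
        // elsewhere; JoinGroup requests are admitted here only up to the configured cap.
        boolean onJoinGroup(String memberId, int groupMaxSize) {
            if (state != State.PREPARING_REBALANCE) {
                return true;
            }
            if (nextGeneration.size() < groupMaxSize) {
                nextGeneration.add(memberId);
                return true;   // this member stays in the group
            }
            return false;      // this member is rejected once the rebalance completes
        }

        // Once every active member has rejoined or its session timeout expired,
        // the rebalance finishes with at most groupMaxSize members.
        void completeRebalance() {
            members.clear();
            members.addAll(nextGeneration);
            state = State.STABLE;
        }
    }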
> > > > >>
> > > > >> Thanks,
> > > > >> Jason
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <bc...@outlook.com>
> > wrote:
> > > > >>
> > > > >> > Yep, LGTM on my side. Thanks Stanislav!
> > > > >> > ________________________________
> > > > >> > From: Stanislav Kozlovski <st...@confluent.io>
> > > > >> > Sent: Friday, December 7, 2018 8:51 PM
> > > > >> > To: dev@kafka.apache.org
> > > > >> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > member
> > > > >> > metadata growth
> > > > >> >
> > > > >> > Hi,
> > > > >> >
> > > > >> > We discussed this offline with Boyang and figured that it's best
> > to
> > > > not
> > > > >> > wait on the Cooperative Rebalancing proposal. Our thinking is
> > that we
> > > > >> can
> > > > >> > just force a rebalance from the broker, allowing consumers to
> > commit
> > > > >> > offsets if their rebalanceListener is configured correctly.
> > > > >> > When rebalancing improvements are implemented, we assume that
> they
> > > > would
> > > > >> > improve KIP-389's behavior as well as the normal rebalance
> > scenarios
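As a concrete illustration of the "rebalanceListener is configured correctly" part, a consumer that commits its progress when its partitions are about to be revoked could look like the sketch below (broker address, group id and topic are placeholders):

    import java.time.Duration;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class CommitOnRevoke {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("group.id", "my-group");                // placeholder
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Runs before this member gives up its partitions, including when the
                    // broker forces a rebalance; committing here limits reprocessing.
                    consumer.commitSync();
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Nothing extra needed for this example.
                }
            });

            while (true) {
                consumer.poll(Duration.ofMillis(500)).forEach(record -> {
                    // process the record ...
                });
            }
        }
    }

Note this narrows, but does not fully close, the duplicate-processing window discussed later in the thread.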
> > > > >> >
> > > > >> > On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <
> bchen11@outlook.com>
> > > > >> wrote:
> > > > >> >
> > > > >> > > Hey Stanislav,
> > > > >> > >
> > > > >> > > thanks for the question! `Trivial rebalance` means "we don't
> > start
> > > > >> > > reassignment right now, but you need to know it's coming soon
> > > > >> > > and you should start preparation".
> > > > >> > >
> > > > >> > > An example KStream use case is that before actually starting
> to
> > > > shrink
> > > > >> > the
> > > > >> > > consumer group, we need to
> > > > >> > > 1. partition the consumer group into two subgroups, where one
> > will
> > > > be
> > > > >> > > offline soon and the other will keep serving;
> > > > >> > > 2. make sure the states associated with near-future offline
> > > > consumers
> > > > >> are
> > > > >> > > successfully replicated on the serving ones.
> > > > >> > >
> > > > >> > > As I have mentioned shrinking the consumer group is pretty
> much
> > > > >> > equivalent
> > > > >> > > to group scaling down, so we could think of this
> > > > >> > > as an add-on use case for cluster scaling. So my understanding
> > is
> > > > that
> > > > >> > the
> > > > >> > > KIP-389 could be sequenced within our cooperative rebalancing<
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
> > > > >> > > >
> > > > >> > > proposal.
> > > > >> > >
> > > > >> > > Let me know if this makes sense.
> > > > >> > >
> > > > >> > > Best,
> > > > >> > > Boyang
> > > > >> > > ________________________________
> > > > >> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > >> > > Sent: Wednesday, December 5, 2018 5:52 PM
> > > > >> > > To: dev@kafka.apache.org
> > > > >> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > member
> > > > >> > > metadata growth
> > > > >> > >
> > > > >> > > Hey Boyang,
> > > > >> > >
> > > > >> > > I think we still need to take care of group shrinkage because
> > even
> > > > if
> > > > >> > users
> > > > >> > > change the config value we cannot guarantee that all consumer
> > groups
> > > > >> > would
> > > > >> > > have been manually shrunk.
> > > > >> > >
> > > > >> > > Regarding 2., I agree that forcefully triggering a rebalance
> > might
> > > > be
> > > > >> the
> > > > >> > > most intuitive way to handle the situation.
> > > > >> > > What does a "trivial rebalance" mean? Sorry, I'm not familiar
> > with
> > > > the
> > > > >> > > term.
> > > > >> > > I was thinking that maybe we could force a rebalance, which
> > would
> > > > >> cause
> > > > >> > > consumers to commit their offsets (given their
> > rebalanceListener is
> > > > >> > > configured correctly) and subsequently reject some of the
> > incoming
> > > > >> > > `joinGroup` requests. Does that sound like it would work?
> > > > >> > >
> > > > >> > > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <
> bchen11@outlook.com
> > >
> > > > >> wrote:
> > > > >> > >
> > > > >> > > > Hey Stanislav,
> > > > >> > > >
> > > > >> > > > I read the latest KIP and saw that we already changed the
> > default
> > > > >> value
> > > > >> > > to
> > > > >> > > > -1. Do
> > > > >> > > > we still need to take care of the consumer group shrinking
> > when
> > > > >> doing
> > > > >> > the
> > > > >> > > > upgrade?
> > > > >> > > >
> > > > >> > > > However this is an interesting topic that worth discussing.
> > > > Although
> > > > >> > > > rolling
> > > > >> > > > upgrade is fine, `consumer.group.max.size` could always have
> > > > >> conflict
> > > > >> > > with
> > > > >> > > > the current
> > > > >> > > > consumer group size which means we need to adhere to one
> > source of
> > > > >> > truth.
> > > > >> > > >
> > > > >> > > > 1.Choose the current group size, which means we never
> > interrupt
> > > > the
> > > > >> > > > consumer group until
> > > > >> > > > it transits to PREPARE_REBALANCE. And we keep track of how
> > many
> > > > join
> > > > >> > > group
> > > > >> > > > requests
> > > > >> > > > we have seen so far during PREPARE_REBALANCE. After reaching
> > the
> > > > >> > consumer
> > > > >> > > > cap,
> > > > >> > > > we start to inform over provisioned consumers that you
> should
> > send
> > > > >> > > > LeaveGroupRequest and
> > > > >> > > > fail yourself. Or with what Mayuresh proposed in KIP-345, we
> > could
> > > > >> mark
> > > > >> > > > extra members
> > > > >> > > > as hot backup and rebalance without them.
> > > > >> > > >
> > > > >> > > > 2.Choose the `consumer.group.max.size`. I feel incremental
> > > > >> rebalancing
> > > > >> > > > (you proposed) could be of help here.
> > > > >> > > > When a new cap is enforced, leader should be notified. If
> the
> > > > >> current
> > > > >> > > > group size is already over limit, leader
> > > > >> > > > shall trigger a trivial rebalance to shuffle some topic
> > partitions
> > > > >> and
> > > > >> > > let
> > > > >> > > > a subset of consumers prepare the ownership
> > > > >> > > > transition. Until they are ready, we trigger a real
> rebalance
> > to
> > > > >> remove
> > > > >> > > > over-provisioned consumers. It is pretty much
> > > > >> > > > equivalent to `how do we scale down the consumer group
> without
> > > > >> > > > interrupting the current processing`.
> > > > >> > > >
> > > > >> > > > I personally feel inclined to 2 because we could kill two
> > birds
> > > > with
> > > > >> > one
> > > > >> > > > stone in a generic way. What do you think?
> > > > >> > > >
> > > > >> > > > Boyang
> > > > >> > > > ________________________________
> > > > >> > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > >> > > > Sent: Monday, December 3, 2018 8:35 PM
> > > > >> > > > To: dev@kafka.apache.org
> > > > >> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to
> cap
> > > > member
> > > > >> > > > metadata growth
> > > > >> > > >
> > > > >> > > > Hi Jason,
> > > > >> > > >
> > > > >> > > > > 2. Do you think we should make this a dynamic config?
> > > > >> > > > I'm not sure. Looking at the config from the perspective of
> a
> > > > >> > > prescriptive
> > > > >> > > > config, we may get away with not updating it dynamically.
> > > > >> > > > But in my opinion, it always makes sense to have a config be
> > > > >> > dynamically
> > > > >> > > > configurable. As long as we limit it to being a cluster-wide
> > > > >> config, we
> > > > >> > > > should be fine.
> > > > >> > > >
> > > > >> > > > > 1. I think it would be helpful to clarify the details on
> > how the
> > > > >> > > > coordinator will shrink the group. It will need to choose
> > which
> > > > >> members
> > > > >> > > to
> > > > >> > > > remove. Are we going to give current members an opportunity
> to
> > > > >> commit
> > > > >> > > > offsets before kicking them from the group?
> > > > >> > > >
> > > > >> > > > This turns out to be somewhat tricky. I think that we may
> not
> > be
> > > > >> able
> > > > >> > to
> > > > >> > > > guarantee that consumers don't process a message twice.
> > > > >> > > > My initial approach was to do as much as we could to let
> > consumers
> > > > >> > commit
> > > > >> > > > offsets.
> > > > >> > > >
> > > > >> > > > I was thinking that we mark a group to be shrunk, we could
> > keep a
> > > > >> map
> > > > >> > of
> > > > >> > > > consumer_id->boolean indicating whether they have committed
> > > > >> offsets. I
> > > > >> > > then
> > > > >> > > > thought we could delay the rebalance until every consumer
> > commits
> > > > >> (or
> > > > >> > > some
> > > > >> > > > time passes).
> > > > >> > > > In the meantime, we would block all incoming fetch calls (by
> > > > either
> > > > >> > > > returning empty records or a retriable error) and we would
> > > > continue
> > > > >> to
> > > > >> > > > accept offset commits (even twice for a single consumer)
> > > > >> > > >
> > > > >> > > > I see two problems with this approach:
> > > > >> > > > * We have async offset commits, which implies that we can
> > receive
> > > > >> fetch
> > > > >> > > > requests before the offset commit req has been handled. i.e
> > > > consumer
> > > > >> > sends
> > > > >> > > > fetchReq A, offsetCommit B, fetchReq C - we may receive
> A,C,B
> > in
> > > > the
> > > > >> > > > broker. Meaning we could have saved the offsets for B but
> > > > rebalance
> > > > >> > > before
> > > > >> > > > the offsetCommit for the offsets processed in C come in.
> > > > >> > > > * KIP-392 Allow consumers to fetch from closest replica
> > > > >> > > > <
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > > > >> > > > >
> > > > >> > > > would
> > > > >> > > > make it significantly harder to block poll() calls on
> > consumers
> > > > >> whose
> > > > >> > > > groups are being shrunk. Even if we implemented a solution,
> > the
> > > > same
> > > > >> > race
> > > > >> > > > condition noted above seems to apply and probably others
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Given those constraints, I think that we can simply mark the
> > group
> > > > >> as
> > > > >> > > > `PreparingRebalance` with a rebalanceTimeout of the server
> > > > setting `
> > > > >> > > > group.max.session.timeout.ms`. That's a bit long by default
> > (5
> > > > >> > minutes)
> > > > >> > > > but
> > > > >> > > > I can't seem to come up with a better alternative
> > > > >> > > >
> > > > >> > > > I'm interested in hearing your thoughts.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Stanislav
> > > > >> > > >
> > > > >> > > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <
> > > > jason@confluent.io
> > > > >> >
> > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > > Hey Stanislav,
> > > > >> > > > >
> > > > >> > > > > What do you think about the use case I mentioned in my
> > previous
> > > > >> reply
> > > > >> > > > about
> > > > >> > > > > > a more resilient self-service Kafka? I believe the
> benefit
> > > > >> there is
> > > > >> > > > > bigger.
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > I see this config as analogous to the open file limit.
> > Probably
> > > > >> this
> > > > >> > > > limit
> > > > >> > > > > was intended to be prescriptive at some point about what
> was
> > > > >> deemed a
> > > > >> > > > > reasonable number of open files for an application. But
> > mostly
> > > > >> people
> > > > >> > > > treat
> > > > >> > > > > it as an annoyance which they have to work around. If it
> > happens
> > > > >> to
> > > > >> > be
> > > > >> > > > hit,
> > > > >> > > > > usually you just increase it because it is not tied to an
> > actual
> > > > >> > > resource
> > > > >> > > > > constraint. However, occasionally hitting the limit does
> > > > indicate
> > > > >> an
> > > > >> > > > > application bug such as a leak, so I wouldn't say it is
> > useless.
> > > > >> > > > Similarly,
> > > > >> > > > > the issue in KAFKA-7610 was a consumer leak and having
> this
> > > > limit
> > > > >> > would
> > > > >> > > > > have allowed the problem to be detected before it impacted
> > the
> > > > >> > cluster.
> > > > >> > > > To
> > > > >> > > > > me, that's the main benefit. It's possible that it could
> be
> > used
> > > > >> > > > > prescriptively to prevent poor usage of groups, but like
> the
> > > > open
> > > > >> > file
> > > > >> > > > > limit, I suspect administrators will just set it large
> > enough
> > > > that
> > > > >> > > users
> > > > >> > > > > are unlikely to complain.
> > > > >> > > > >
> > > > >> > > > > Anyway, just a couple additional questions:
> > > > >> > > > >
> > > > >> > > > > 1. I think it would be helpful to clarify the details on
> > how the
> > > > >> > > > > coordinator will shrink the group. It will need to choose
> > which
> > > > >> > members
> > > > >> > > > to
> > > > >> > > > > remove. Are we going to give current members an
> opportunity
> > to
> > > > >> commit
> > > > >> > > > > offsets before kicking them from the group?
> > > > >> > > > >
> > > > >> > > > > 2. Do you think we should make this a dynamic config?
> > > > >> > > > >
> > > > >> > > > > Thanks,
> > > > >> > > > > Jason
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > > > >> > > > > stanislav@confluent.io>
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi Jason,
> > > > >> > > > > >
> > > > >> > > > > > You raise some very valid points.
> > > > >> > > > > >
> > > > >> > > > > > > The benefit of this KIP is probably limited to
> > preventing
> > > > >> > "runaway"
> > > > >> > > > > > consumer groups due to leaks or some other application
> bug
> > > > >> > > > > > What do you think about the use case I mentioned in my
> > > > previous
> > > > >> > reply
> > > > >> > > > > about
> > > > >> > > > > > a more resilient self-service Kafka? I believe the
> benefit
> > > > >> there is
> > > > >> > > > > bigger
> > > > >> > > > > >
> > > > >> > > > > > * Default value
> > > > >> > > > > > You're right, we probably do need to be conservative.
> Big
> > > > >> consumer
> > > > >> > > > groups
> > > > >> > > > > > are considered an anti-pattern and my goal was to also
> > hint at
> > > > >> this
> > > > >> > > > > through
> > > > >> > > > > > the config's default. Regardless, it is better to not
> > have the
> > > > >> > > > potential
> > > > >> > > > > to
> > > > >> > > > > > break applications with an upgrade.
> > > > >> > > > > > Choosing between the default of something big like 5000
> > or an
> > > > >> > opt-in
> > > > >> > > > > > option, I think we should go with the *disabled default
> > > > option*
> > > > >> > > (-1).
> > > > >> > > > > > The only benefit we would get from a big default of 5000
> > is
> > > > >> default
> > > > >> > > > > > protection against buggy/malicious applications that hit
> > the
> > > > >> > > KAFKA-7610
> > > > >> > > > > > issue.
> > > > >> > > > > > While this KIP was spawned from that issue, I believe
> its
> > > > value
> > > > >> is
> > > > >> > > > > enabling
> > > > >> > > > > > the possibility of protection and helping move towards a
> > more
> > > > >> > > > > self-service
> > > > >> > > > > > Kafka. I also think that a default value of 5000 might
> be
> > > > >> > misleading
> > > > >> > > to
> > > > >> > > > > > users and lead them to think that big consumer groups (>
> > 250)
> > > > >> are a
> > > > >> > > > good
> > > > >> > > > > > thing.
> > > > >> > > > > >
> > > > >> > > > > > The good news is that KAFKA-7610 should be fully
> resolved
> > and
> > > > >> the
> > > > >> > > > > rebalance
> > > > >> > > > > > protocol should, in general, be more solid after the
> > planned
> > > > >> > > > improvements
> > > > >> > > > > > in KIP-345 and KIP-394.
> > > > >> > > > > >
> > > > >> > > > > > * Handling bigger groups during upgrade
> > > > >> > > > > > I now see that we store the state of consumer groups in
> > the
> > > > log
> > > > >> and
> > > > >> > > > why a
> > > > >> > > > > > rebalance isn't expected during a rolling upgrade.
> > > > >> > > > > > Since we're going with the default value of the max.size
> > being
> > > > >> > > > disabled,
> > > > >> > > > > I
> > > > >> > > > > > believe we can afford to be more strict here.
> > > > >> > > > > > During state reloading of a new Coordinator with a
> defined
> > > > >> > > > max.group.size
> > > > >> > > > > > config, I believe we should *force* rebalances for
> groups
> > that
> > > > >> > exceed
> > > > >> > > > the
> > > > >> > > > > > configured size. Then, only some consumers will be able
> to
> > > > join
> > > > >> and
> > > > >> > > the
> > > > >> > > > > max
> > > > >> > > > > > size invariant will be satisfied.
> > > > >> > > > > >
> > > > >> > > > > > I updated the KIP with a migration plan, rejected
> > alternatives
> > > > >> and
> > > > >> > > the
> > > > >> > > > > new
> > > > >> > > > > > default value.
> > > > >> > > > > >
> > > > >> > > > > > Thanks,
> > > > >> > > > > > Stanislav
> > > > >> > > > > >
> > > > >> > > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <
> > > > >> > jason@confluent.io>
> > > > >> > > > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Hey Stanislav,
> > > > >> > > > > > >
> > > > >> > > > > > > Clients will then find that coordinator
> > > > >> > > > > > > > and send `joinGroup` on it, effectively rebuilding
> the
> > > > >> group,
> > > > >> > > since
> > > > >> > > > > the
> > > > >> > > > > > > > cache of active consumers is not stored outside the
> > > > >> > Coordinator's
> > > > >> > > > > > memory.
> > > > >> > > > > > > > (please do say if that is incorrect)
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > Groups do not typically rebalance after a coordinator
> > > > change.
> > > > >> You
> > > > >> > > > could
> > > > >> > > > > > > potentially force a rebalance if the group is too big
> > and
> > > > kick
> > > > >> > out
> > > > >> > > > the
> > > > >> > > > > > > slowest members or something. A more graceful solution
> > is
> > > > >> > probably
> > > > >> > > to
> > > > >> > > > > > just
> > > > >> > > > > > > accept the current size and prevent it from getting
> > bigger.
> > > > We
> > > > >> > > could
> > > > >> > > > > log
> > > > >> > > > > > a
> > > > >> > > > > > > warning potentially.
> > > > >> > > > > > >
> > > > >> > > > > > > My thinking is that we should abstract away from
> > conserving
> > > > >> > > resources
> > > > >> > > > > and
> > > > >> > > > > > > > focus on giving control to the broker. The issue
> that
> > > > >> spawned
> > > > >> > > this
> > > > >> > > > > KIP
> > > > >> > > > > > > was
> > > > >> > > > > > > > a memory problem but I feel this change is useful
> in a
> > > > more
> > > > >> > > general
> > > > >> > > > > > way.
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > So you probably already know why I'm asking about
> this.
> > For
> > > > >> > > consumer
> > > > >> > > > > > groups
> > > > >> > > > > > > anyway, resource usage would typically be proportional
> > to
> > > > the
> > > > >> > > number
> > > > >> > > > of
> > > > >> > > > > > > partitions that a group is reading from and not the
> > number
> > > > of
> > > > >> > > > members.
> > > > >> > > > > > For
> > > > >> > > > > > > example, consider the memory use in the offsets cache.
> > The
> > > > >> > benefit
> > > > >> > > of
> > > > >> > > > > > this
> > > > >> > > > > > > KIP is probably limited to preventing "runaway"
> consumer
> > > > >> groups
> > > > >> > due
> > > > >> > > > to
> > > > >> > > > > > > leaks or some other application bug. That still seems
> > useful
> > > > >> > > though.
> > > > >> > > > > > >
> > > > >> > > > > > > I completely agree with this and I *ask everybody to
> > chime
> > > > in
> > > > >> > with
> > > > >> > > > > > opinions
> > > > >> > > > > > > > on a sensible default value*.
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > I think we would have to be very conservative. The
> group
> > > > >> protocol
> > > > >> > > is
> > > > >> > > > > > > generic in some sense, so there may be use cases we
> > don't
> > > > >> know of
> > > > >> > > > where
> > > > >> > > > > > > larger groups are reasonable. Probably we should make
> > this
> > > > an
> > > > >> > > opt-in
> > > > >> > > > > > > feature so that we do not risk breaking anyone's
> > application
> > > > >> > after
> > > > >> > > an
> > > > >> > > > > > > upgrade. Either that, or use a very high default like
> > 5,000.
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks,
> > > > >> > > > > > > Jason
> > > > >> > > > > > >
> > > > >> > > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > > > >> > > > > > > stanislav@confluent.io>
> > > > >> > > > > > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Hey Jason and Boyang, those were important comments
> > > > >> > > > > > > >
> > > > >> > > > > > > > > One suggestion I have is that it would be helpful
> > to put
> > > > >> your
> > > > >> > > > > > reasoning
> > > > >> > > > > > > > on deciding the current default value. For example,
> in
> > > > >> certain
> > > > >> > > use
> > > > >> > > > > > cases
> > > > >> > > > > > > at
> > > > >> > > > > > > > Pinterest we are very likely to have more consumers
> > than
> > > > 250
> > > > >> > when
> > > > >> > > > we
> > > > >> > > > > > > > configure 8 stream instances with 32 threads.
> > > > >> > > > > > > > > For the effectiveness of this KIP, we should
> > encourage
> > > > >> people
> > > > >> > > to
> > > > >> > > > > > > discuss
> > > > >> > > > > > > > their opinions on the default setting and ideally
> > reach a
> > > > >> > > > consensus.
> > > > >> > > > > > > >
> > > > >> > > > > > > > I completely agree with this and I *ask everybody to
> > chime
> > > > >> in
> > > > >> > > with
> > > > >> > > > > > > opinions
> > > > >> > > > > > > > on a sensible default value*.
> > > > >> > > > > > > > My thought process was that in the current model
> > > > rebalances
> > > > >> in
> > > > >> > > > large
> > > > >> > > > > > > groups
> > > > >> > > > > > > > are more costly. I imagine most use cases in most
> > Kafka
> > > > >> users
> > > > >> > do
> > > > >> > > > not
> > > > >> > > > > > > > require more than 250 consumers.
> > > > >> > > > > > > > Boyang, you say that you are "likely to have... when
> > > > we..."
> > > > >> -
> > > > >> > do
> > > > >> > > > you
> > > > >> > > > > > have
> > > > >> > > > > > > > systems running with so many consumers in a group or
> > are
> > > > you
> > > > >> > > > planning
> > > > >> > > > > > > to? I
> > > > >> > > > > > > > guess what I'm asking is whether this has been
> tested
> > in
> > > > >> > > production
> > > > >> > > > > > with
> > > > >> > > > > > > > the current rebalance model (ignoring KIP-345)
> > > > >> > > > > > > >
> > > > >> > > > > > > > >  Can you clarify the compatibility impact here?
> What
> > > > >> > > > > > > > > will happen to groups that are already larger than
> > the
> > > > max
> > > > >> > > size?
> > > > >> > > > > > > > This is a very important question.
> > > > >> > > > > > > > From my current understanding, when a coordinator
> > broker
> > > > >> gets
> > > > >> > > shut
> > > > >> > > > > > > > down during a cluster rolling upgrade, a replica
> will
> > take
> > > > >> > > > leadership
> > > > >> > > > > > of
> > > > >> > > > > > > > the `__consumer_offsets` partition. Clients will then
> > find
> > > > >> that
> > > > >> > > > > > coordinator
> > > > >> > > > > > > > and send `joinGroup` on it, effectively rebuilding
> the
> > > > >> group,
> > > > >> > > since
> > > > >> > > > > the
> > > > >> > > > > > > > cache of active consumers is not stored outside the
> > > > >> > Coordinator's
> > > > >> > > > > > memory.
> > > > >> > > > > > > > (please do say if that is incorrect)
> > > > >> > > > > > > > Then, I believe that working as if this is a new
> > group is
> > > > a
> > > > >> > > > > reasonable
> > > > >> > > > > > > > approach. Namely, fail joinGroups when the max.size
> is
> > > > >> > exceeded.
> > > > >> > > > > > > > What do you guys think about this? (I'll update the
> > KIP
> > > > >> after
> > > > >> > we
> > > > >> > > > > settle
> > > > >> > > > > > > on
> > > > >> > > > > > > > a solution)
> > > > >> > > > > > > >
> > > > >> > > > > > > > >  Also, just to be clear, the resource we are
> trying
> > to
> > > > >> > conserve
> > > > >> > > > > here
> > > > >> > > > > > is
> > > > >> > > > > > > > what? Memory?
> > > > >> > > > > > > > My thinking is that we should abstract away from
> > > > conserving
> > > > >> > > > resources
> > > > >> > > > > > and
> > > > >> > > > > > > > focus on giving control to the broker. The issue
> that
> > > > >> spawned
> > > > >> > > this
> > > > >> > > > > KIP
> > > > >> > > > > > > was
> > > > >> > > > > > > > a memory problem but I feel this change is useful
> in a
> > > > more
> > > > >> > > general
> > > > >> > > > > > way.
> > > > >> > > > > > > It
> > > > >> > > > > > > > limits the control clients have on the cluster and
> > helps
> > > > >> Kafka
> > > > >> > > > > become a
> > > > >> > > > > > > > more self-serving system. Admin/Ops teams can better
> > > > control
> > > > >> > the
> > > > >> > > > > impact
> > > > >> > > > > > > > application developers can have on a Kafka cluster
> > with
> > > > this
> > > > >> > > change
> > > > >> > > > > > > >
> > > > >> > > > > > > > Best,
> > > > >> > > > > > > > Stanislav
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> > > > >> > > > jason@confluent.io>
> > > > >> > > > > > > > wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > Hi Stanislav,
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Thanks for the KIP. Can you clarify the
> > compatibility
> > > > >> impact
> > > > >> > > > here?
> > > > >> > > > > > What
> > > > >> > > > > > > > > will happen to groups that are already larger than
> > the
> > > > max
> > > > >> > > size?
> > > > >> > > > > > Also,
> > > > >> > > > > > > > just
> > > > >> > > > > > > > > to be clear, the resource we are trying to
> conserve
> > here
> > > > >> is
> > > > >> > > what?
> > > > >> > > > > > > Memory?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > -Jason
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
> > > > >> > > bchen11@outlook.com
> > > > >> > > > >
> > > > >> > > > > > > wrote:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > Thanks Stanislav for the update! One suggestion
> I
> > have
> > > > >> is
> > > > >> > > that
> > > > >> > > > it
> > > > >> > > > > > > would
> > > > >> > > > > > > > > be
> > > > >> > > > > > > > > > helpful to put your
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > reasoning on deciding the current default value.
> > For
> > > > >> > example,
> > > > >> > > > in
> > > > >> > > > > > > > certain
> > > > >> > > > > > > > > > use cases at Pinterest we are very likely
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > to have more consumers than 250 when we
> configure
> > 8
> > > > >> stream
> > > > >> > > > > > instances
> > > > >> > > > > > > > with
> > > > >> > > > > > > > > > 32 threads.
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > For the effectiveness of this KIP, we should
> > encourage
> > > > >> > people
> > > > >> > > > to
> > > > >> > > > > > > > discuss
> > > > >> > > > > > > > > > their opinions on the default setting and
> ideally
> > > > reach
> > > > >> a
> > > > >> > > > > > consensus.
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Best,
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Boyang
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > ________________________________
> > > > >> > > > > > > > > > From: Stanislav Kozlovski <
> stanislav@confluent.io
> > >
> > > > >> > > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > >> > > > > > > > > > To: dev@kafka.apache.org
> > > > >> > > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce
> > group.max.size
> > > > >> to
> > > > >> > cap
> > > > >> > > > > > member
> > > > >> > > > > > > > > > metadata growth
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Hey everybody,
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > It's been a week since this KIP and not much
> > > > discussion
> > > > >> has
> > > > >> > > > been
> > > > >> > > > > > > made.
> > > > >> > > > > > > > > > I assume that this is a straight forward change
> > and I
> > > > >> will
> > > > >> > > > open a
> > > > >> > > > > > > > voting
> > > > >> > > > > > > > > > thread in the next couple of days if nobody has
> > > > >> anything to
> > > > >> > > > > > suggest.
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Best,
> > > > >> > > > > > > > > > Stanislav
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav
> > Kozlovski <
> > > > >> > > > > > > > > > stanislav@confluent.io>
> > > > >> > > > > > > > > > wrote:
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > > Greetings everybody,
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > I have enriched the KIP a bit with a bigger
> > > > Motivation
> > > > >> > > > section
> > > > >> > > > > > and
> > > > >> > > > > > > > also
> > > > >> > > > > > > > > > > renamed it.
> > > > >> > > > > > > > > > > KIP:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > I'm looking forward to discussions around it.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Best,
> > > > >> > > > > > > > > > > Stanislav
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav
> > Kozlovski
> > > > <
> > > > >> > > > > > > > > > > stanislav@confluent.io> wrote:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >> Hey there everybody,
> > > > >> > > > > > > > > > >>
> > > > >> > > > > > > > > > >> Thanks for the introduction Boyang. I
> > appreciate
> > > > the
> > > > >> > > effort
> > > > >> > > > > you
> > > > >> > > > > > > are
> > > > >> > > > > > > > > > >> putting into improving consumer behavior in
> > Kafka.
> > > > >> > > > > > > > > > >>
> > > > >> > > > > > > > > > >> @Matt
> > > > >> > > > > > > > > > >> I also believe the default value is high. In
> my
> > > > >> opinion,
> > > > >> > > we
> > > > >> > > > > > should
> > > > >> > > > > > > > aim
> > > > >> > > > > > > > > > to
> > > > >> > > > > > > > > > >> a default cap around 250. This is because in
> > the
> > > > >> current
> > > > >> > > > model
> > > > >> > > > > > any
> > > > >> > > > > > > > > > consumer
> > > > >> > > > > > > > > > >> rebalance is disrupting to every consumer.
> The
> > > > bigger
> > > > >> > the
> > > > >> > > > > group,
> > > > >> > > > > > > the
> > > > >> > > > > > > > > > longer
> > > > >> > > > > > > > > > >> this period of disruption.
> > > > >> > > > > > > > > > >>
> > > > >> > > > > > > > > > >> If you have such a large consumer group,
> > chances
> > > > are
> > > > >> > that
> > > > >> > > > your
> > > > >> > > > > > > > > > >> client-side logic could be structured better
> > and
> > > > that
> > > > >> > you
> > > > >> > > > are
> > > > >> > > > > > not
> > > > >> > > > > > > > > using
> > > > >> > > > > > > > > > the
> > > > >> > > > > > > > > > >> high number of consumers to achieve high
> > > > throughput.
> > > > >> > > > > > > > > > >> 250 can still be considered of a high upper
> > bound,
> > > > I
> > > > >> > > believe
> > > > >> > > > > in
> > > > >> > > > > > > > > practice
> > > > >> > > > > > > > > > >> users should aim to not go over 100 consumers
> > per
> > > > >> > consumer
> > > > >> > > > > > group.
> > > > >> > > > > > > > > > >>
> > > > >> > > > > > > > > > >> In regards to the cap being
> global/per-broker,
> > I
> > > > >> think
> > > > >> > > that
> > > > >> > > > we
> > > > >> > > > > > > > should
> > > > >> > > > > > > > > > >> consider whether we want it to be global or
> > > > >> *per-topic*.
> > > > >> > > For
> > > > >> > > > > the
> > > > >> > > > > > > > time
> > > > >> > > > > > > > > > >> being, I believe that having it per-topic
> with
> > a
> > > > >> global
> > > > >> > > > > default
> > > > >> > > > > > > > might
> > > > >> > > > > > > > > be
> > > > >> > > > > > > > > > >> the best situation. Having it global only
> > seems a
> > > > bit
> > > > >> > > > > > restricting
> > > > >> > > > > > > to
> > > > >> > > > > > > > > me
> > > > >> > > > > > > > > > and
> > > > >> > > > > > > > > > >> it never hurts to support more fine-grained
> > > > >> > > configurability
> > > > >> > > > > > (given
> > > > >> > > > > > > > > it's
> > > > >> > > > > > > > > > the
> > > > >> > > > > > > > > > >> same config, not a new one being introduced).
> > > > >> > > > > > > > > > >>
> > > > >> > > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen
> <
> > > > >> > > > > > bchen11@outlook.com
> > > > >> > > > > > > >
> > > > >> > > > > > > > > > wrote:
> > > > >> > > > > > > > > > >>
> > > > >> > > > > > > > > > >>> Thanks Matt for the suggestion! I'm still
> > open to
> > > > >> any
> > > > >> > > > > > suggestion
> > > > >> > > > > > > to
> > > > >> > > > > > > > > > >>> change the default value. Meanwhile I just
> > want to
> > > > >> > point
> > > > >> > > > out
> > > > >> > > > > > that
> > > > >> > > > > > > > > this
> > > > >> > > > > > > > > > >>> value is a just last line of defense, not a
> > real
> > > > >> > scenario
> > > > >> > > > we
> > > > >> > > > > > > would
> > > > >> > > > > > > > > > expect.
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > > >>> In the meanwhile, I discussed with Stanislav
> > and
> > > > he
> > > > >> > would
> > > > >> > > > be
> > > > >> > > > > > > > driving
> > > > >> > > > > > > > > > the
> > > > >> > > > > > > > > > >>> 389 effort from now on. Stanislav proposed
> the
> > > > idea
> > > > >> in
> > > > >> > > the
> > > > >> > > > > > first
> > > > >> > > > > > > > > place
> > > > >> > > > > > > > > > and
> > > > >> > > > > > > > > > >>> had already come up a draft design, while I
> > will
> > > > >> keep
> > > > >> > > > > focusing
> > > > >> > > > > > on
> > > > >> > > > > > > > > > KIP-345
> > > > >> > > > > > > > > > >>> effort to ensure solving the edge case
> > described
> > > > in
> > > > >> the
> > > > >> > > > JIRA<
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > > >> > > > > > > > > > >.
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > > >>> Thank you Stanislav for making this happen!
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > > >>> Boyang
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > > >>> ________________________________
> > > > >> > > > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > >> > > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > >> > > > > > > > > > >>> To: dev@kafka.apache.org
> > > > >> > > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce
> > > > >> group.max.size
> > > > >> > to
> > > > >> > > > cap
> > > > >> > > > > > > > member
> > > > >> > > > > > > > > > >>> metadata growth
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > > >>> Thanks for the KIP.
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > > >>> Will this cap be a global cap across the
> > entire
> > > > >> cluster
> > > > >> > > or
> > > > >> > > > > per
> > > > >> > > > > > > > > broker?
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > > >>> Either way the default value seems a bit
> high
> > to
> > > > me,
> > > > >> > but
> > > > >> > > > that
> > > > >> > > > > > > could
> > > > >> > > > > > > > > > just
> > > > >> > > > > > > > > > >>> be
> > > > >> > > > > > > > > > >>> from my own usage patterns. I'd have
> probably
> > > > >> started
> > > > >> > > with
> > > > >> > > > > 500
> > > > >> > > > > > or
> > > > >> > > > > > > > 1k
> > > > >> > > > > > > > > > but
> > > > >> > > > > > > > > > >>> could be easily convinced that's wrong.
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > > >>> Thanks,
> > > > >> > > > > > > > > > >>> Matt
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen
> <
> > > > >> > > > > > bchen11@outlook.com
> > > > >> > > > > > > >
> > > > >> > > > > > > > > > wrote:
> > > > >> > > > > > > > > > >>>
> > > > >> > > > > > > > > > >>> > Hey folks,
> > > > >> > > > > > > > > > >>> >
> > > > >> > > > > > > > > > >>> > I would like to start a discussion on KIP-389:
> > > > >> > > > > > > > > > >>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > >> > > > > > > > > > >>> >
> > > > >> > > > > > > > > > >>> > This is a pretty simple change to cap the consumer group size for broker stability. Give me your valuable feedback when you got time.
> > > > >> > > > > > > > > > >>> >
> > > > >> > > > > > > > > > >>> > Thank you!
>


-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Boyang Chen <bc...@outlook.com>.
Hey Stanislav,

I think the time taken to rebalance is not linearly correlated with number of consumers with our application. As for our current and future use cases,
the main concern for Pinterest is still on the broker memory not CPU, because crashing server by one application could have cascading effect on all jobs.
Do you want to drive a more detailed formula on how to compute the memory consumption against number of consumers within the group?
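(A rough way to frame that, purely back-of-the-envelope and with made-up numbers: per-group coordinator memory is roughly members multiplied by per-member metadata. So if one member's metadata (id, client host, subscription, assignment) were on the order of 5 KiB, a runaway group of 10,000 members would pin roughly 10,000 * 5 KiB, i.e. about 50 MiB, on the coordinator. The heap dumps linked earlier in the thread are the right way to replace the 5 KiB guess with a measured value.)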

In the meantime, I'm pretty buying in the motivation of this KIP, so I think the follow-up work is just refinement to make the new config easy to use. We should be good
to vote IMO.

Best,
Boyang
________________________________
From: Stanislav Kozlovski <st...@confluent.io>
Sent: Monday, January 7, 2019 4:21 PM
To: dev@kafka.apache.org
Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Hey there,

Per Gwen's comments, I slightly reworked the motivation section. Let me
know if it's any better now

I completely agree that it would be best if we were to add a recommended
number to a typical consumer group size. There is a problem that timing the
CPU usage and rebalance times of consumer groups is tricky. We can update
the KIP with memory guidelines (e.g 1 consumer in a group uses X memory,
therefore 100 use Y).
I fear that the most useful recommendations though would be knowing the CPU
impact of large consumer groups and the rebalance times. That is,
unfortunately, tricky to test and measure.

@Boyang, you had mentioned some numbers used in Pinterest. If available to
you, would you be comfortable sharing the number of consumers you are using
in a group and maybe the potential time it takes to rebalance it?

I'd appreciate any anecdotes regarding consumer group sizes from the
community

Best,
Stanislav

On Thu, Jan 3, 2019 at 1:59 AM Boyang Chen <bc...@outlook.com> wrote:

> Thanks Gwen for the suggestion! +1 on the guidance of defining
> group.max.size. I guess a sample formula would be:
> 2 * (# of brokers * average metadata cache size * 80%) / (# of consumer
> groups * size of a single member metadata)
>
> if we assumed non-skewed partition assignment and pretty fair consumer
> group consumption. The "2" is the 95 percentile of normal distribution and
> 80% is just to buffer some memory capacity which are both open to
> discussion. This config should be useful for Kafka platform team to make
> sure one extreme large consumer group won't bring down the whole cluster.
>
> What do you think?
>
> Best,
> Boyang
>
> ________________________________
> From: Gwen Shapira <gw...@confluent.io>
> Sent: Thursday, January 3, 2019 2:59 AM
> To: dev
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Sorry for joining the fun late, but I think the problem we are solving
> evolved a bit in the thread, and I'd like to have better understanding
> of the problem before voting :)
>
> Both KIP and discussion assert that large groups are a problem, but
> they are kinda inconsistent regarding why they are a problem and whose
> problem they are...
> 1. The KIP itself states that the main issue with large groups are
> long rebalance times. Per my understanding, this is mostly a problem
> for the application that consumes data, but not really a problem for
> the brokers themselves, so broker admins probably don't and shouldn't
> care about it. Also, my understanding is that this is a problem for
> consumer groups, but not necessarily a problem for other group types.
> 2. The discussion highlights the issue of "run away" groups that
> essentially create tons of members needlessly and use up lots of
> broker memory. This is something the broker admins will care about a
> lot. And is also a problem for every group that uses coordinators and
> not just consumers. And since the memory in question is the metadata
> cache, it probably has the largest impact on Kafka Streams
> applications, since they have lots of metadata.
>
> The solution proposed makes the most sense in the context of #2, so
> perhaps we should update the motivation section of the KIP to reflect
> that.
>
> The reason I'm probing here is that in my opinion we have to give our
> users some guidelines on what a reasonable limit is (otherwise, how
> will they know?). Calculating the impact of group-size on rebalance
> time in order to make good recommendations will take a significant
> effort. On the other hand, informing users regarding the memory
> footprint of a consumer in a group and using that to make a reasonable
> suggestion isn't hard.
>
> Gwen
>
>
> On Sun, Dec 30, 2018 at 12:51 PM Stanislav Kozlovski
> <st...@confluent.io> wrote:
> >
> > Thanks Boyang,
> >
> > If there aren't any more thoughts on the KIP I'll start a vote thread in
> > the new year
> >
> > On Sat, Dec 29, 2018 at 12:58 AM Boyang Chen <bc...@outlook.com>
> wrote:
> >
> > > Yep Stanislav, that's what I'm proposing, and your explanation makes
> sense.
> > >
> > > Boyang
> > >
> > > ________________________________
> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > Sent: Friday, December 28, 2018 7:59 PM
> > > To: dev@kafka.apache.org
> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > metadata growth
> > >
> > > Hey there everybody, let's work on wrapping this discussion up.
> > >
> > > @Boyang, could you clarify what you mean by
> > > > One more question is whether you feel we should enforce group size
> cap
> > > statically or on runtime?
> > > Is that related to the option of enabling this config via the dynamic
> > > broker config feature?
> > >
> > > Regarding that - I feel it's useful to have and I also think it might
> not
> > > introduce additional complexity. As long as we handle the config being
> > > changed midway through a rebalance (via using the old value) we should
> be
> > > good to go.
> > >
> > > On Wed, Dec 12, 2018 at 4:12 PM Stanislav Kozlovski <
> > > stanislav@confluent.io>
> > > wrote:
> > >
> > > > Hey Jason,
> > > >
> > > > Yes, that is what I meant by
> > > > > Given those constraints, I think that we can simply mark the group
> as
> > > > `PreparingRebalance` with a rebalanceTimeout of the server setting `
> > > > group.max.session.timeout.ms`. That's a bit long by default (5
> minutes)
> > > > but I can't seem to come up with a better alternative
> > > > So either the timeout or all members calling joinGroup, yes
> > > >
> > > >
> > > > On Tue, Dec 11, 2018 at 8:14 PM Boyang Chen <bc...@outlook.com>
> wrote:
> > > >
> > > >> Hey Jason,
> > > >>
> > > >> I think this is the correct understanding. One more question is
> whether
> > > >> you feel
> > > >> we should enforce group size cap statically or on runtime?
> > > >>
> > > >> Boyang
> > > >> ________________________________
> > > >> From: Jason Gustafson <ja...@confluent.io>
> > > >> Sent: Tuesday, December 11, 2018 3:24 AM
> > > >> To: dev
> > > >> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > >> metadata growth
> > > >>
> > > >> Hey Stanislav,
> > > >>
> > > >> Just to clarify, I think what you're suggesting is something like
> this
> > > in
> > > >> order to gracefully shrink the group:
> > > >>
> > > >> 1. Transition the group to PREPARING_REBALANCE. No members are
> kicked
> > > out.
> > > >> 2. Continue to allow offset commits and heartbeats for all current
> > > >> members.
> > > >> 3. Allow the first n members that send JoinGroup to stay in the
> group,
> > > but
> > > >> wait for the JoinGroup (or session timeout) from all active members
> > > before
> > > >> finishing the rebalance.
> > > >>
> > > >> So basically we try to give the current members an opportunity to
> finish
> > > >> work, but we prevent some of them from rejoining after the rebalance
> > > >> completes. It sounds reasonable if I've understood correctly.
> > > >>
> > > >> Thanks,
> > > >> Jason
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <bc...@outlook.com>
> wrote:
> > > >>
> > > >> > Yep, LGTM on my side. Thanks Stanislav!
> > > >> > ________________________________
> > > >> > From: Stanislav Kozlovski <st...@confluent.io>
> > > >> > Sent: Friday, December 7, 2018 8:51 PM
> > > >> > To: dev@kafka.apache.org
> > > >> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> member
> > > >> > metadata growth
> > > >> >
> > > >> > Hi,
> > > >> >
> > > >> > We discussed this offline with Boyang and figured that it's best
> to
> > > not
> > > >> > wait on the Cooperative Rebalancing proposal. Our thinking is
> that we
> > > >> can
> > > >> > just force a rebalance from the broker, allowing consumers to
> commit
> > > >> > offsets if their rebalanceListener is configured correctly.
> > > >> > When rebalancing improvements are implemented, we assume that they
> > > would
> > > >> > improve KIP-389's behavior as well as the normal rebalance
> scenarios
> > > >> >
> > > >> > On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <bc...@outlook.com>
> > > >> wrote:
> > > >> >
> > > >> > > Hey Stanislav,
> > > >> > >
> > > >> > > thanks for the question! `Trivial rebalance` means "we don't
> start
> > > >> > > reassignment right now, but you need to know it's coming soon
> > > >> > > and you should start preparation".
> > > >> > >
> > > >> > > An example KStream use case is that before actually starting to
> > > shrink
> > > >> > the
> > > >> > > consumer group, we need to
> > > >> > > 1. partition the consumer group into two subgroups, where one
> will
> > > be
> > > >> > > offline soon and the other will keep serving;
> > > >> > > 2. make sure the states associated with near-future offline
> > > consumers
> > > >> are
> > > >> > > successfully replicated on the serving ones.
> > > >> > >
> > > >> > > As I have mentioned shrinking the consumer group is pretty much
> > > >> > equivalent
> > > >> > > to group scaling down, so we could think of this
> > > >> > > as an add-on use case for cluster scaling. So my understanding
> is
> > > that
> > > >> > the
> > > >> > > KIP-389 could be sequenced within our cooperative rebalancing<
> > > >> > >
> > > >> >
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
> > > >> > > >
> > > >> > > proposal.
> > > >> > >
> > > >> > > Let me know if this makes sense.
> > > >> > >
> > > >> > > Best,
> > > >> > > Boyang
> > > >> > > ________________________________
> > > >> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > >> > > Sent: Wednesday, December 5, 2018 5:52 PM
> > > >> > > To: dev@kafka.apache.org
> > > >> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> member
> > > >> > > metadata growth
> > > >> > >
> > > >> > > Hey Boyang,
> > > >> > >
> > > >> > > I think we still need to take care of group shrinkage because
> even
> > > if
> > > >> > users
> > > >> > > change the config value we cannot guarantee that all consumer
> groups
> > > >> > would
> > > >> > > have been manually shrunk.
> > > >> > >
> > > >> > > Regarding 2., I agree that forcefully triggering a rebalance
> might
> > > be
> > > >> the
> > > >> > > most intuitive way to handle the situation.
> > > >> > > What does a "trivial rebalance" mean? Sorry, I'm not familiar
> with
> > > the
> > > >> > > term.
> > > >> > > I was thinking that maybe we could force a rebalance, which
> would
> > > >> cause
> > > >> > > consumers to commit their offsets (given their
> rebalanceListener is
> > > >> > > configured correctly) and subsequently reject some of the
> incoming
> > > >> > > `joinGroup` requests. Does that sound like it would work?
> > > >> > >
> > > >> > > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bchen11@outlook.com
> >
> > > >> wrote:
> > > >> > >
> > > >> > > > Hey Stanislav,
> > > >> > > >
> > > >> > > > I read the latest KIP and saw that we already changed the
> default
> > > >> value
> > > >> > > to
> > > >> > > > -1. Do
> > > >> > > > we still need to take care of the consumer group shrinking
> when
> > > >> doing
> > > >> > the
> > > >> > > > upgrade?
> > > >> > > >
> > > >> > > > However this is an interesting topic that is worth discussing.
> > > Although
> > > >> > > > rolling
> > > >> > > > upgrade is fine, `consumer.group.max.size` could always have
> > > >> conflict
> > > >> > > with
> > > >> > > > the current
> > > >> > > > consumer group size which means we need to adhere to one
> source of
> > > >> > truth.
> > > >> > > >
> > > >> > > > 1.Choose the current group size, which means we never
> interrupt
> > > the
> > > >> > > > consumer group until
> > > >> > > > it transits to PREPARE_REBALANCE. And we keep track of how
> many
> > > join
> > > >> > > group
> > > >> > > > requests
> > > >> > > > we have seen so far during PREPARE_REBALANCE. After reaching
> the
> > > >> > consumer
> > > >> > > > cap,
> > > >> > > > we start to inform over provisioned consumers that you should
> send
> > > >> > > > LeaveGroupRequest and
> > > >> > > > fail yourself. Or with what Mayuresh proposed in KIP-345, we
> could
> > > >> mark
> > > >> > > > extra members
> > > >> > > > as hot backup and rebalance without them.
> > > >> > > >
> > > >> > > > 2.Choose the `consumer.group.max.size`. I feel incremental
> > > >> rebalancing
> > > >> > > > (you proposed) could be of help here.
> > > >> > > > When a new cap is enforced, leader should be notified. If the
> > > >> current
> > > >> > > > group size is already over limit, leader
> > > >> > > > shall trigger a trivial rebalance to shuffle some topic
> partitions
> > > >> and
> > > >> > > let
> > > >> > > > a subset of consumers prepare the ownership
> > > >> > > > transition. Until they are ready, we trigger a real rebalance
> to
> > > >> remove
> > > >> > > > over-provisioned consumers. It is pretty much
> > > >> > > > equivalent to `how do we scale down the consumer group without
> > > >> > > > interrupting the current processing`.
> > > >> > > >
> > > >> > > > I personally feel inclined to 2 because we could kill two
> birds
> > > with
> > > >> > one
> > > >> > > > stone in a generic way. What do you think?
> > > >> > > >
> > > >> > > > Boyang
> > > >> > > > ________________________________
> > > >> > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > >> > > > Sent: Monday, December 3, 2018 8:35 PM
> > > >> > > > To: dev@kafka.apache.org
> > > >> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > > member
> > > >> > > > metadata growth
> > > >> > > >
> > > >> > > > Hi Jason,
> > > >> > > >
> > > >> > > > > 2. Do you think we should make this a dynamic config?
> > > >> > > > I'm not sure. Looking at the config from the perspective of a
> > > >> > > prescriptive
> > > >> > > > config, we may get away with not updating it dynamically.
> > > >> > > > But in my opinion, it always makes sense to have a config be
> > > >> > dynamically
> > > >> > > > configurable. As long as we limit it to being a cluster-wide
> > > >> config, we
> > > >> > > > should be fine.
> > > >> > > >
> > > >> > > > > 1. I think it would be helpful to clarify the details on
> how the
> > > >> > > > coordinator will shrink the group. It will need to choose
> which
> > > >> members
> > > >> > > to
> > > >> > > > remove. Are we going to give current members an opportunity to
> > > >> commit
> > > >> > > > offsets before kicking them from the group?
> > > >> > > >
> > > >> > > > This turns out to be somewhat tricky. I think that we may not
> be
> > > >> able
> > > >> > to
> > > >> > > > guarantee that consumers don't process a message twice.
> > > >> > > > My initial approach was to do as much as we could to let
> consumers
> > > >> > commit
> > > >> > > > offsets.
> > > >> > > >
> > > >> > > > I was thinking that we mark a group to be shrunk, we could
> keep a
> > > >> map
> > > >> > of
> > > >> > > > consumer_id->boolean indicating whether they have committed
> > > >> offsets. I
> > > >> > > then
> > > >> > > > thought we could delay the rebalance until every consumer
> commits
> > > >> (or
> > > >> > > some
> > > >> > > > time passes).
> > > >> > > > In the meantime, we would block all incoming fetch calls (by
> > > either
> > > >> > > > returning empty records or a retriable error) and we would
> > > continue
> > > >> to
> > > >> > > > accept offset commits (even twice for a single consumer)
> > > >> > > >
> > > >> > > > I see two problems with this approach:
> > > >> > > > * We have async offset commits, which implies that we can
> receive
> > > >> fetch
> > > >> > > > requests before the offset commit req has been handled. i.e
> > > consumer
> > > >> > sends
> > > >> > > > fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B
> in
> > > the
> > > >> > > > broker. Meaning we could have saved the offsets for B but
> > > rebalance
> > > >> > > before
> > > >> > > > the offsetCommit for the offsets processed in C come in.
> > > >> > > > * KIP-392 Allow consumers to fetch from closest replica
> > > >> > > > <
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > > >> > > > >
> > > >> > > > would
> > > >> > > > make it significantly harder to block poll() calls on
> consumers
> > > >> whose
> > > >> > > > groups are being shrunk. Even if we implemented a solution,
> the
> > > same
> > > >> > race
> > > >> > > > condition noted above seems to apply and probably others
> > > >> > > >
> > > >> > > >
> > > >> > > > Given those constraints, I think that we can simply mark the
> group
> > > >> as
> > > >> > > > `PreparingRebalance` with a rebalanceTimeout of the server
> > > setting `
> > > >> > > > group.max.session.timeout.ms`. That's a bit long by default
> (5
> > > >> > minutes)
> > > >> > > > but
> > > >> > > > I can't seem to come up with a better alternative
> > > >> > > >
> > > >> > > > I'm interested in hearing your thoughts.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Stanislav
> > > >> > > >
> > > >> > > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <
> > > jason@confluent.io
> > > >> >
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > Hey Stanislav,
> > > >> > > > >
> > > >> > > > > What do you think about the use case I mentioned in my
> previous
> > > >> reply
> > > >> > > > about
> > > >> > > > > > a more resilient self-service Kafka? I believe the benefit
> > > >> there is
> > > >> > > > > bigger.
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > I see this config as analogous to the open file limit.
> Probably
> > > >> this
> > > >> > > > limit
> > > >> > > > > was intended to be prescriptive at some point about what was
> > > >> deemed a
> > > >> > > > > reasonable number of open files for an application. But
> mostly
> > > >> people
> > > >> > > > treat
> > > >> > > > > it as an annoyance which they have to work around. If it
> happens
> > > >> to
> > > >> > be
> > > >> > > > hit,
> > > >> > > > > usually you just increase it because it is not tied to an
> actual
> > > >> > > resource
> > > >> > > > > constraint. However, occasionally hitting the limit does
> > > indicate
> > > >> an
> > > >> > > > > application bug such as a leak, so I wouldn't say it is
> useless.
> > > >> > > > Similarly,
> > > >> > > > > the issue in KAFKA-7610 was a consumer leak and having this
> > > limit
> > > >> > would
> > > >> > > > > have allowed the problem to be detected before it impacted
> the
> > > >> > cluster.
> > > >> > > > To
> > > >> > > > > me, that's the main benefit. It's possible that it could be
> used
> > > >> > > > > prescriptively to prevent poor usage of groups, but like the
> > > open
> > > >> > file
> > > >> > > > > limit, I suspect administrators will just set it large
> enough
> > > that
> > > >> > > users
> > > >> > > > > are unlikely to complain.
> > > >> > > > >
> > > >> > > > > Anyway, just a couple additional questions:
> > > >> > > > >
> > > >> > > > > 1. I think it would be helpful to clarify the details on
> how the
> > > >> > > > > coordinator will shrink the group. It will need to choose
> which
> > > >> > members
> > > >> > > > to
> > > >> > > > > remove. Are we going to give current members an opportunity
> to
> > > >> commit
> > > >> > > > > offsets before kicking them from the group?
> > > >> > > > >
> > > >> > > > > 2. Do you think we should make this a dynamic config?
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > > Jason
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > > >> > > > > stanislav@confluent.io>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Hi Jason,
> > > >> > > > > >
> > > >> > > > > > You raise some very valid points.
> > > >> > > > > >
> > > >> > > > > > > The benefit of this KIP is probably limited to
> preventing
> > > >> > "runaway"
> > > >> > > > > > consumer groups due to leaks or some other application bug
> > > >> > > > > > What do you think about the use case I mentioned in my
> > > previous
> > > >> > reply
> > > >> > > > > about
> > > >> > > > > > a more resilient self-service Kafka? I believe the benefit
> > > >> there is
> > > >> > > > > bigger
> > > >> > > > > >
> > > >> > > > > > * Default value
> > > >> > > > > > You're right, we probably do need to be conservative. Big
> > > >> consumer
> > > >> > > > groups
> > > >> > > > > > are considered an anti-pattern and my goal was to also
> hint at
> > > >> this
> > > >> > > > > through
> > > >> > > > > > the config's default. Regardless, it is better to not
> have the
> > > >> > > > potential
> > > >> > > > > to
> > > >> > > > > > break applications with an upgrade.
> > > >> > > > > > Choosing between the default of something big like 5000
> or an
> > > >> > opt-in
> > > >> > > > > > option, I think we should go with the *disabled default
> > > option*
> > > >> > > (-1).
> > > >> > > > > > The only benefit we would get from a big default of 5000
> is
> > > >> default
> > > >> > > > > > protection against buggy/malicious applications that hit
> the
> > > >> > > KAFKA-7610
> > > >> > > > > > issue.
> > > >> > > > > > While this KIP was spawned from that issue, I believe its
> > > value
> > > >> is
> > > >> > > > > enabling
> > > >> > > > > > the possibility of protection and helping move towards a
> more
> > > >> > > > > self-service
> > > >> > > > > > Kafka. I also think that a default value of 5000 might be
> > > >> > misleading
> > > >> > > to
> > > >> > > > > > users and lead them to think that big consumer groups (>
> 250)
> > > >> are a
> > > >> > > > good
> > > >> > > > > > thing.
> > > >> > > > > >
> > > >> > > > > > The good news is that KAFKA-7610 should be fully resolved
> and
> > > >> the
> > > >> > > > > rebalance
> > > >> > > > > > protocol should, in general, be more solid after the
> planned
> > > >> > > > improvements
> > > >> > > > > > in KIP-345 and KIP-394.
> > > >> > > > > >
> > > >> > > > > > * Handling bigger groups during upgrade
> > > >> > > > > > I now see that we store the state of consumer groups in
> the
> > > log
> > > >> and
> > > >> > > > why a
> > > >> > > > > > rebalance isn't expected during a rolling upgrade.
> > > >> > > > > > Since we're going with the default value of the max.size
> being
> > > >> > > > disabled,
> > > >> > > > > I
> > > >> > > > > > believe we can afford to be more strict here.
> > > >> > > > > > During state reloading of a new Coordinator with a defined
> > > >> > > > max.group.size
> > > >> > > > > > config, I believe we should *force* rebalances for groups
> that
> > > >> > exceed
> > > >> > > > the
> > > >> > > > > > configured size. Then, only some consumers will be able to
> > > join
> > > >> and
> > > >> > > the
> > > >> > > > > max
> > > >> > > > > > size invariant will be satisfied.
> > > >> > > > > >
> > > >> > > > > > I updated the KIP with a migration plan, rejected
> alternatives
> > > >> and
> > > >> > > the
> > > >> > > > > new
> > > >> > > > > > default value.
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Stanislav
> > > >> > > > > >
> > > >> > > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <
> > > >> > jason@confluent.io>
> > > >> > > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > Hey Stanislav,
> > > >> > > > > > >
> > > >> > > > > > > Clients will then find that coordinator
> > > >> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
> > > >> group,
> > > >> > > since
> > > >> > > > > the
> > > >> > > > > > > > cache of active consumers is not stored outside the
> > > >> > Coordinator's
> > > >> > > > > > memory.
> > > >> > > > > > > > (please do say if that is incorrect)
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > Groups do not typically rebalance after a coordinator
> > > change.
> > > >> You
> > > >> > > > could
> > > >> > > > > > > potentially force a rebalance if the group is too big
> and
> > > kick
> > > >> > out
> > > >> > > > the
> > > >> > > > > > > slowest members or something. A more graceful solution
> is
> > > >> > probably
> > > >> > > to
> > > >> > > > > > just
> > > >> > > > > > > accept the current size and prevent it from getting
> bigger.
> > > We
> > > >> > > could
> > > >> > > > > log
> > > >> > > > > > a
> > > >> > > > > > > warning potentially.
> > > >> > > > > > >
> > > >> > > > > > > My thinking is that we should abstract away from
> conserving
> > > >> > > resources
> > > >> > > > > and
> > > >> > > > > > > > focus on giving control to the broker. The issue that
> > > >> spawned
> > > >> > > this
> > > >> > > > > KIP
> > > >> > > > > > > was
> > > >> > > > > > > > a memory problem but I feel this change is useful in a
> > > more
> > > >> > > general
> > > >> > > > > > way.
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > So you probably already know why I'm asking about this.
> For
> > > >> > > consumer
> > > >> > > > > > groups
> > > >> > > > > > > anyway, resource usage would typically be proportional
> to
> > > the
> > > >> > > number
> > > >> > > > of
> > > >> > > > > > > partitions that a group is reading from and not the
> number
> > > of
> > > >> > > > members.
> > > >> > > > > > For
> > > >> > > > > > > example, consider the memory use in the offsets cache.
> The
> > > >> > benefit
> > > >> > > of
> > > >> > > > > > this
> > > >> > > > > > > KIP is probably limited to preventing "runaway" consumer
> > > >> groups
> > > >> > due
> > > >> > > > to
> > > >> > > > > > > leaks or some other application bug. That still seems
> useful
> > > >> > > though.
> > > >> > > > > > >
> > > >> > > > > > > I completely agree with this and I *ask everybody to
> chime
> > > in
> > > >> > with
> > > >> > > > > > opinions
> > > >> > > > > > > > on a sensible default value*.
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > I think we would have to be very conservative. The group
> > > >> protocol
> > > >> > > is
> > > >> > > > > > > generic in some sense, so there may be use cases we
> don't
> > > >> know of
> > > >> > > > where
> > > >> > > > > > > larger groups are reasonable. Probably we should make
> this
> > > an
> > > >> > > opt-in
> > > >> > > > > > > feature so that we do not risk breaking anyone's
> application
> > > >> > after
> > > >> > > an
> > > >> > > > > > > upgrade. Either that, or use a very high default like
> 5,000.
> > > >> > > > > > >
> > > >> > > > > > > Thanks,
> > > >> > > > > > > Jason
> > > >> > > > > > >
> > > >> > > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > > >> > > > > > > stanislav@confluent.io>
> > > >> > > > > > > wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Hey Jason and Boyang, those were important comments
> > > >> > > > > > > >
> > > >> > > > > > > > > One suggestion I have is that it would be helpful
> to put
> > > >> your
> > > >> > > > > > reasoning
> > > >> > > > > > > > on deciding the current default value. For example, in
> > > >> certain
> > > >> > > use
> > > >> > > > > > cases
> > > >> > > > > > > at
> > > >> > > > > > > > Pinterest we are very likely to have more consumers
> than
> > > 250
> > > >> > when
> > > >> > > > we
> > > >> > > > > > > > configure 8 stream instances with 32 threads.
> > > >> > > > > > > > > For the effectiveness of this KIP, we should
> encourage
> > > >> people
> > > >> > > to
> > > >> > > > > > > discuss
> > > >> > > > > > > > their opinions on the default setting and ideally
> reach a
> > > >> > > > consensus.
> > > >> > > > > > > >
> > > >> > > > > > > > I completely agree with this and I *ask everybody to
> chime
> > > >> in
> > > >> > > with
> > > >> > > > > > > opinions
> > > >> > > > > > > > on a sensible default value*.
> > > >> > > > > > > > My thought process was that in the current model
> > > rebalances
> > > >> in
> > > >> > > > large
> > > >> > > > > > > groups
> > > >> > > > > > > > are more costly. I imagine most use cases in most
> Kafka
> > > >> users
> > > >> > do
> > > >> > > > not
> > > >> > > > > > > > require more than 250 consumers.
> > > >> > > > > > > > Boyang, you say that you are "likely to have... when
> > > we..."
> > > >> -
> > > >> > do
> > > >> > > > you
> > > >> > > > > > have
> > > >> > > > > > > > systems running with so many consumers in a group or
> are
> > > you
> > > >> > > > planning
> > > >> > > > > > > to? I
> > > >> > > > > > > > guess what I'm asking is whether this has been tested
> in
> > > >> > > production
> > > >> > > > > > with
> > > >> > > > > > > > the current rebalance model (ignoring KIP-345)
> > > >> > > > > > > >
> > > >> > > > > > > > >  Can you clarify the compatibility impact here? What
> > > >> > > > > > > > > will happen to groups that are already larger than
> the
> > > max
> > > >> > > size?
> > > >> > > > > > > > This is a very important question.
> > > >> > > > > > > > From my current understanding, when a coordinator
> broker
> > > >> gets
> > > >> > > shut
> > > >> > > > > > > > down during a cluster rolling upgrade, a replica will
> take
> > > >> > > > leadership
> > > >> > > > > > of
> > > >> > > > > > > > the `__offset_commits` partition. Clients will then
> find
> > > >> that
> > > >> > > > > > coordinator
> > > >> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
> > > >> group,
> > > >> > > since
> > > >> > > > > the
> > > >> > > > > > > > cache of active consumers is not stored outside the
> > > >> > Coordinator's
> > > >> > > > > > memory.
> > > >> > > > > > > > (please do say if that is incorrect)
> > > >> > > > > > > > Then, I believe that working as if this is a new
> group is
> > > a
> > > >> > > > > reasonable
> > > >> > > > > > > > approach. Namely, fail joinGroups when the max.size is
> > > >> > exceeded.
> > > >> > > > > > > > What do you guys think about this? (I'll update the
> KIP
> > > >> after
> > > >> > we
> > > >> > > > > settle
> > > >> > > > > > > on
> > > >> > > > > > > > a solution)
> > > >> > > > > > > >
> > > >> > > > > > > > >  Also, just to be clear, the resource we are trying
> to
> > > >> > conserve
> > > >> > > > > here
> > > >> > > > > > is
> > > >> > > > > > > > what? Memory?
> > > >> > > > > > > > My thinking is that we should abstract away from
> > > conserving
> > > >> > > > resources
> > > >> > > > > > and
> > > >> > > > > > > > focus on giving control to the broker. The issue that
> > > >> spawned
> > > >> > > this
> > > >> > > > > KIP
> > > >> > > > > > > was
> > > >> > > > > > > > a memory problem but I feel this change is useful in a
> > > more
> > > >> > > general
> > > >> > > > > > way.
> > > >> > > > > > > It
> > > >> > > > > > > > limits the control clients have on the cluster and
> helps
> > > >> Kafka
> > > >> > > > > become a
> > > >> > > > > > > > more self-serving system. Admin/Ops teams can better
> > > control
> > > >> > the
> > > >> > > > > impact
> > > >> > > > > > > > application developers can have on a Kafka cluster
> with
> > > this
> > > >> > > change
> > > >> > > > > > > >
> > > >> > > > > > > > Best,
> > > >> > > > > > > > Stanislav
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> > > >> > > > jason@confluent.io>
> > > >> > > > > > > > wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > > Hi Stanislav,
> > > >> > > > > > > > >
> > > >> > > > > > > > > Thanks for the KIP. Can you clarify the
> compatibility
> > > >> impact
> > > >> > > > here?
> > > >> > > > > > What
> > > >> > > > > > > > > will happen to groups that are already larger than
> the
> > > max
> > > >> > > size?
> > > >> > > > > > Also,
> > > >> > > > > > > > just
> > > >> > > > > > > > > to be clear, the resource we are trying to conserve
> here
> > > >> is
> > > >> > > what?
> > > >> > > > > > > Memory?
> > > >> > > > > > > > >
> > > >> > > > > > > > > -Jason
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
> > > >> > > bchen11@outlook.com
> > > >> > > > >
> > > >> > > > > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > Thanks Stanislav for the update! One suggestion I
> have
> > > >> is
> > > >> > > that
> > > >> > > > it
> > > >> > > > > > > would
> > > >> > > > > > > > > be
> > > >> > > > > > > > > > helpful to put your
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > reasoning on deciding the current default value.
> For
> > > >> > example,
> > > >> > > > in
> > > >> > > > > > > > certain
> > > >> > > > > > > > > > use cases at Pinterest we are very likely
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > to have more consumers than 250 when we configure
> 8
> > > >> stream
> > > >> > > > > > instances
> > > >> > > > > > > > with
> > > >> > > > > > > > > > 32 threads.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > For the effectiveness of this KIP, we should
> encourage
> > > >> > people
> > > >> > > > to
> > > >> > > > > > > > discuss
> > > >> > > > > > > > > > their opinions on the default setting and ideally
> > > reach
> > > >> a
> > > >> > > > > > consensus.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Best,
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Boyang
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > ________________________________
> > > >> > > > > > > > > > From: Stanislav Kozlovski <stanislav@confluent.io
> >
> > > >> > > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > >> > > > > > > > > > To: dev@kafka.apache.org
> > > >> > > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce
> group.max.size
> > > >> to
> > > >> > cap
> > > >> > > > > > member
> > > >> > > > > > > > > > metadata growth
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Hey everybody,
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > It's been a week since this KIP and not much
> > > discussion
> > > >> has
> > > >> > > > been
> > > >> > > > > > > made.
> > > >> > > > > > > > > > I assume that this is a straight forward change
> and I
> > > >> will
> > > >> > > > open a
> > > >> > > > > > > > voting
> > > >> > > > > > > > > > thread in the next couple of days if nobody has
> > > >> anything to
> > > >> > > > > > suggest.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Best,
> > > >> > > > > > > > > > Stanislav
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav
> Kozlovski <
> > > >> > > > > > > > > > stanislav@confluent.io>
> > > >> > > > > > > > > > wrote:
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > > Greetings everybody,
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > I have enriched the KIP a bit with a bigger
> > > Motivation
> > > >> > > > section
> > > >> > > > > > and
> > > >> > > > > > > > also
> > > >> > > > > > > > > > > renamed it.
> > > >> > > > > > > > > > > KIP:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > I'm looking forward to discussions around it.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Best,
> > > >> > > > > > > > > > > Stanislav
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav
> Kozlovski
> > > <
> > > >> > > > > > > > > > > stanislav@confluent.io> wrote:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >> Hey there everybody,
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> Thanks for the introduction Boyang. I
> appreciate
> > > the
> > > >> > > effort
> > > >> > > > > you
> > > >> > > > > > > are
> > > >> > > > > > > > > > >> putting into improving consumer behavior in
> Kafka.
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> @Matt
> > > >> > > > > > > > > > >> I also believe the default value is high. In my
> > > >> opinion,
> > > >> > > we
> > > >> > > > > > should
> > > >> > > > > > > > aim
> > > >> > > > > > > > > > to
> > > >> > > > > > > > > > >> a default cap around 250. This is because in
> the
> > > >> current
> > > >> > > > model
> > > >> > > > > > any
> > > >> > > > > > > > > > consumer
> > > >> > > > > > > > > > >> rebalance is disrupting to every consumer. The
> > > bigger
> > > >> > the
> > > >> > > > > group,
> > > >> > > > > > > the
> > > >> > > > > > > > > > longer
> > > >> > > > > > > > > > >> this period of disruption.
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> If you have such a large consumer group,
> chances
> > > are
> > > >> > that
> > > >> > > > your
> > > >> > > > > > > > > > >> client-side logic could be structured better
> and
> > > that
> > > >> > you
> > > >> > > > are
> > > >> > > > > > not
> > > >> > > > > > > > > using
> > > >> > > > > > > > > > the
> > > >> > > > > > > > > > >> high number of consumers to achieve high
> > > throughput.
> > > >> > > > > > > > > > >> 250 can still be considered of a high upper
> bound,
> > > I
> > > >> > > believe
> > > >> > > > > in
> > > >> > > > > > > > > practice
> > > >> > > > > > > > > > >> users should aim to not go over 100 consumers
> per
> > > >> > consumer
> > > >> > > > > > group.
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> In regards to the cap being global/per-broker,
> I
> > > >> think
> > > >> > > that
> > > >> > > > we
> > > >> > > > > > > > should
> > > >> > > > > > > > > > >> consider whether we want it to be global or
> > > >> *per-topic*.
> > > >> > > For
> > > >> > > > > the
> > > >> > > > > > > > time
> > > >> > > > > > > > > > >> being, I believe that having it per-topic with
> a
> > > >> global
> > > >> > > > > default
> > > >> > > > > > > > might
> > > >> > > > > > > > > be
> > > >> > > > > > > > > > >> the best situation. Having it global only
> seems a
> > > bit
> > > >> > > > > > restricting
> > > >> > > > > > > to
> > > >> > > > > > > > > me
> > > >> > > > > > > > > > and
> > > >> > > > > > > > > > >> it never hurts to support more fine-grained
> > > >> > > configurability
> > > >> > > > > > (given
> > > >> > > > > > > > > it's
> > > >> > > > > > > > > > the
> > > >> > > > > > > > > > >> same config, not a new one being introduced).
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> > > >> > > > > > bchen11@outlook.com
> > > >> > > > > > > >
> > > >> > > > > > > > > > wrote:
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >>> Thanks Matt for the suggestion! I'm still
> open to
> > > >> any
> > > >> > > > > > suggestion
> > > >> > > > > > > to
> > > >> > > > > > > > > > >>> change the default value. Meanwhile I just
> want to
> > > >> > point
> > > >> > > > out
> > > >> > > > > > that
> > > >> > > > > > > > > this
> > > >> > > > > > > > > > >>> value is a just last line of defense, not a
> real
> > > >> > scenario
> > > >> > > > we
> > > >> > > > > > > would
> > > >> > > > > > > > > > expect.
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> In the meanwhile, I discussed with Stanislav
> and
> > > he
> > > >> > would
> > > >> > > > be
> > > >> > > > > > > > driving
> > > >> > > > > > > > > > the
> > > >> > > > > > > > > > >>> 389 effort from now on. Stanislav proposed the
> > > idea
> > > >> in
> > > >> > > the
> > > >> > > > > > first
> > > >> > > > > > > > > place
> > > >> > > > > > > > > > and
> > > >> > > > > > > > > > >>> had already come up a draft design, while I
> will
> > > >> keep
> > > >> > > > > focusing
> > > >> > > > > > on
> > > >> > > > > > > > > > KIP-345
> > > >> > > > > > > > > > >>> effort to ensure solving the edge case
> described
> > > in
> > > >> the
> > > >> > > > JIRA<
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > >> > > > > > > > > > >.
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> Thank you Stanislav for making this happen!
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> Boyang
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> ________________________________
> > > >> > > > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > >> > > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > >> > > > > > > > > > >>> To: dev@kafka.apache.org
> > > >> > > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce
> > > >> group.max.size
> > > >> > to
> > > >> > > > cap
> > > >> > > > > > > > member
> > > >> > > > > > > > > > >>> metadata growth
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> Thanks for the KIP.
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> Will this cap be a global cap across the
> entire
> > > >> cluster
> > > >> > > or
> > > >> > > > > per
> > > >> > > > > > > > > broker?
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> Either way the default value seems a bit high
> to
> > > me,
> > > >> > but
> > > >> > > > that
> > > >> > > > > > > could
> > > >> > > > > > > > > > just
> > > >> > > > > > > > > > >>> be
> > > >> > > > > > > > > > >>> from my own usage patterns. I'd have probably
> > > >> started
> > > >> > > with
> > > >> > > > > 500
> > > >> > > > > > or
> > > >> > > > > > > > 1k
> > > >> > > > > > > > > > but
> > > >> > > > > > > > > > >>> could be easily convinced that's wrong.
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> Thanks,
> > > >> > > > > > > > > > >>> Matt
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> > > >> > > > > > bchen11@outlook.com
> > > >> > > > > > > >
> > > >> > > > > > > > > > wrote:
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> > Hey folks,
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> > I would like to start a discussion on
> KIP-389:
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> > This is a pretty simple change to cap the
> > > consumer
> > > >> > > group
> > > >> > > > > size
> > > >> > > > > > > for
> > > >> > > > > > > > > > >>> broker
> > > >> > > > > > > > > > >>> > stability. Give me your valuable feedback
> when
> > > you
> > > >> > got
> > > >> > > > > time.
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> > Thank you!
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> --
> > > >> > > > > > > > > > >> Best,
> > > >> > > > > > > > > > >> Stanislav
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > --
> > > >> > > > > > > > > > > Best,
> > > >> > > > > > > > > > > Stanislav
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > --
> > > >> > > > > > > > > > Best,
> > > >> > > > > > > > > > Stanislav
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > > --
> > > >> > > > > > > > Best,
> > > >> > > > > > > > Stanislav
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > --
> > > >> > > > > > Best,
> > > >> > > > > > Stanislav
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > > Best,
> > > >> > > > Stanislav
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Best,
> > > >> > > Stanislav
> > > >> > >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Best,
> > > >> > Stanislav
> > > >> >
> > > >>
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
> >
> > --
> > Best,
> > Stanislav
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>
>

--
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hey there,

Per Gwen's comments, I slightly reworked the motivation section. Let me
know if it's any better now.

I completely agree that it would be best if we added a recommended number
for a typical consumer group size. The problem is that measuring the CPU
usage and rebalance times of consumer groups is tricky. We can update the
KIP with memory guidelines (e.g. 1 consumer in a group uses X memory,
therefore 100 use Y).
I fear, though, that the most useful recommendations would require knowing
the CPU impact of large consumer groups and their rebalance times. That is,
unfortunately, tricky to test and measure.
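
To make the guideline concrete, here is the kind of back-of-the-envelope
sketch I have in mind. Every number in it is a placeholder I made up purely
for illustration (not a measurement), and the helper class itself is
hypothetical:

    // Rough estimate of the coordinator memory used by one consumer group,
    // and the largest group that fits in a given memory budget. The per-member
    // figure below is an assumed placeholder, not a measured value.
    public class GroupSizeEstimate {
        // Assumed average size of a single member's metadata (subscription,
        // assignment, client id, host, ...).
        static final long ASSUMED_BYTES_PER_MEMBER = 10 * 1024; // 10 KiB

        // Memory a group of the given size would use under the assumption above.
        static long estimatedGroupBytes(int members) {
            return members * ASSUMED_BYTES_PER_MEMBER;
        }

        // Largest group size that fits in the given per-group memory budget.
        static int maxMembersForBudget(long budgetBytes) {
            return (int) (budgetBytes / ASSUMED_BYTES_PER_MEMBER);
        }

        public static void main(String[] args) {
            System.out.println("1 member    ~ " + estimatedGroupBytes(1) + " bytes");
            System.out.println("100 members ~ " + estimatedGroupBytes(100) + " bytes");
            // e.g. if we were willing to spend 64 MiB of coordinator memory per group:
            System.out.println("64 MiB budget fits ~ "
                    + maxMembersForBudget(64L * 1024 * 1024) + " members");
        }
    }

Once we have real numbers for the per-member metadata size, we could plug
them into something like this and publish the resulting table in the KIP.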

@Boyang, you had mentioned some numbers used at Pinterest. If they are
available to you, would you be comfortable sharing the number of consumers
you are using in a group and, ideally, how long it takes to rebalance?

I'd appreciate any anecdotes regarding consumer group sizes from the
community.
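
In case it helps with gathering such numbers, below is a rough sketch of how
one could time a rebalance from the client side with a plain
ConsumerRebalanceListener. The broker address, group id and topic name are
made up, and it only measures the pause observed by a single member:

    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    import java.time.Duration;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.Properties;

    // Prints roughly how long each rebalance pauses this consumer: the gap
    // between its partitions being revoked and the new assignment arriving.
    public class RebalanceTimer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // made-up address
            props.put("group.id", "rebalance-timing-test");     // made-up group id
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            final long[] revokedAt = {0L};

            consumer.subscribe(Collections.singletonList("test-topic"), // made-up topic
                    new ConsumerRebalanceListener() {
                        @Override
                        public void onPartitionsRevoked(Collection<TopicPartition> parts) {
                            revokedAt[0] = System.nanoTime();
                        }

                        @Override
                        public void onPartitionsAssigned(Collection<TopicPartition> parts) {
                            if (revokedAt[0] != 0L) {
                                long ms = (System.nanoTime() - revokedAt[0]) / 1_000_000;
                                System.out.println("Rebalance took ~" + ms + " ms");
                            }
                        }
                    });

            // Heartbeats and rebalances are driven from poll().
            while (true) {
                consumer.poll(Duration.ofMillis(500));
            }
        }
    }

It is obviously not a precise benchmark, but it should be enough for
ballpark comparisons between different group sizes.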

Best,
Stanislav

On Thu, Jan 3, 2019 at 1:59 AM Boyang Chen <bc...@outlook.com> wrote:

> Thanks Gwen for the suggestion! +1 on the guidance of defining
> group.max.size. I guess a sample formula would be:
> 2 * (# of brokers * average metadata cache size * 80%) / (# of consumer groups * size of a single member metadata)
>
> assuming non-skewed partition assignment and fairly even consumer group
> consumption. The "2" is the 95th percentile of the normal distribution and
> the 80% is just to leave some memory headroom; both are open to
> discussion. This config should be useful for a Kafka platform team to make
> sure one extremely large consumer group won't bring down the whole cluster.
>
> What do you think?
>
> Best,
> Boyang
>
> ________________________________
> From: Gwen Shapira <gw...@confluent.io>
> Sent: Thursday, January 3, 2019 2:59 AM
> To: dev
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Sorry for joining the fun late, but I think the problem we are solving
> evolved a bit in the thread, and I'd like to have better understanding
> of the problem before voting :)
>
> Both KIP and discussion assert that large groups are a problem, but
> they are kinda inconsistent regarding why they are a problem and whose
> problem they are...
> 1. The KIP itself states that the main issue with large groups are
> long rebalance times. Per my understanding, this is mostly a problem
> for the application that consumes data, but not really a problem for
> the brokers themselves, so broker admins probably don't and shouldn't
> care about it. Also, my understanding is that this is a problem for
> consumer groups, but not necessarily a problem for other group types.
> 2. The discussion highlights the issue of "run away" groups that
> essentially create tons of members needlessly and use up lots of
> broker memory. This is something the broker admins will care about a
> lot. And is also a problem for every group that uses coordinators and
> not just consumers. And since the memory in question is the metadata
> cache, it probably has the largest impact on Kafka Streams
> applications, since they have lots of metadata.
>
> The solution proposed makes the most sense in the context of #2, so
> perhaps we should update the motivation section of the KIP to reflect
> that.
>
> The reason I'm probing here is that in my opinion we have to give our
> users some guidelines on what a reasonable limit is (otherwise, how
> will they know?). Calculating the impact of group-size on rebalance
> time in order to make good recommendations will take a significant
> effort. On the other hand, informing users regarding the memory
> footprint of a consumer in a group and using that to make a reasonable
> suggestion isn't hard.
>
> Gwen
>
>
> On Sun, Dec 30, 2018 at 12:51 PM Stanislav Kozlovski
> <st...@confluent.io> wrote:
> >
> > Thanks Boyang,
> >
> > If there aren't any more thoughts on the KIP I'll start a vote thread in
> > the new year
> >
> > On Sat, Dec 29, 2018 at 12:58 AM Boyang Chen <bc...@outlook.com>
> wrote:
> >
> > > Yep Stanislav, that's what I'm proposing, and your explanation makes
> sense.
> > >
> > > Boyang
> > >
> > > ________________________________
> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > Sent: Friday, December 28, 2018 7:59 PM
> > > To: dev@kafka.apache.org
> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > metadata growth
> > >
> > > Hey there everybody, let's work on wrapping this discussion up.
> > >
> > > @Boyang, could you clarify what you mean by
> > > > One more question is whether you feel we should enforce group size
> cap
> > > statically or on runtime?
> > > Is that related to the option of enabling this config via the dynamic
> > > broker config feature?
> > >
> > > Regarding that - I feel it's useful to have and I also think it might
> not
> > > introduce additional complexity. As long as we handle the config being
> > > changed midway through a rebalance (via using the old value) we should
> be
> > > good to go.
> > >
> > > On Wed, Dec 12, 2018 at 4:12 PM Stanislav Kozlovski <
> > > stanislav@confluent.io>
> > > wrote:
> > >
> > > > Hey Jason,
> > > >
> > > > Yes, that is what I meant by
> > > > > Given those constraints, I think that we can simply mark the group
> as
> > > > `PreparingRebalance` with a rebalanceTimeout of the server setting `
> > > > group.max.session.timeout.ms`. That's a bit long by default (5
> minutes)
> > > > but I can't seem to come up with a better alternative
> > > > So either the timeout or all members calling joinGroup, yes
> > > >
> > > >
> > > > On Tue, Dec 11, 2018 at 8:14 PM Boyang Chen <bc...@outlook.com>
> wrote:
> > > >
> > > >> Hey Jason,
> > > >>
> > > >> I think this is the correct understanding. One more question is
> whether
> > > >> you feel
> > > >> we should enforce group size cap statically or on runtime?
> > > >>
> > > >> Boyang
> > > >> ________________________________
> > > >> From: Jason Gustafson <ja...@confluent.io>
> > > >> Sent: Tuesday, December 11, 2018 3:24 AM
> > > >> To: dev
> > > >> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > >> metadata growth
> > > >>
> > > >> Hey Stanislav,
> > > >>
> > > >> Just to clarify, I think what you're suggesting is something like
> this
> > > in
> > > >> order to gracefully shrink the group:
> > > >>
> > > >> 1. Transition the group to PREPARING_REBALANCE. No members are
> kicked
> > > out.
> > > >> 2. Continue to allow offset commits and heartbeats for all current
> > > >> members.
> > > >> 3. Allow the first n members that send JoinGroup to stay in the
> group,
> > > but
> > > >> wait for the JoinGroup (or session timeout) from all active members
> > > before
> > > >> finishing the rebalance.
> > > >>
> > > >> So basically we try to give the current members an opportunity to
> finish
> > > >> work, but we prevent some of them from rejoining after the rebalance
> > > >> completes. It sounds reasonable if I've understood correctly.
> > > >>
> > > >> Thanks,
> > > >> Jason
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <bc...@outlook.com>
> wrote:
> > > >>
> > > >> > Yep, LGTM on my side. Thanks Stanislav!
> > > >> > ________________________________
> > > >> > From: Stanislav Kozlovski <st...@confluent.io>
> > > >> > Sent: Friday, December 7, 2018 8:51 PM
> > > >> > To: dev@kafka.apache.org
> > > >> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> member
> > > >> > metadata growth
> > > >> >
> > > >> > Hi,
> > > >> >
> > > >> > We discussed this offline with Boyang and figured that it's best
> to
> > > not
> > > >> > wait on the Cooperative Rebalancing proposal. Our thinking is
> that we
> > > >> can
> > > >> > just force a rebalance from the broker, allowing consumers to
> commit
> > > >> > offsets if their rebalanceListener is configured correctly.
> > > >> > When rebalancing improvements are implemented, we assume that they
> > > would
> > > >> > improve KIP-389's behavior as well as the normal rebalance
> scenarios
> > > >> >
> > > >> > On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <bc...@outlook.com>
> > > >> wrote:
> > > >> >
> > > >> > > Hey Stanislav,
> > > >> > >
> > > >> > > thanks for the question! `Trivial rebalance` means "we don't
> start
> > > >> > > reassignment right now, but you need to know it's coming soon
> > > >> > > and you should start preparation".
> > > >> > >
> > > >> > > An example KStream use case is that before actually starting to
> > > shrink
> > > >> > the
> > > >> > > consumer group, we need to
> > > >> > > 1. partition the consumer group into two subgroups, where one
> will
> > > be
> > > >> > > offline soon and the other will keep serving;
> > > >> > > 2. make sure the states associated with near-future offline
> > > consumers
> > > >> are
> > > >> > > successfully replicated on the serving ones.
> > > >> > >
> > > >> > > As I have mentioned shrinking the consumer group is pretty much
> > > >> > equivalent
> > > >> > > to group scaling down, so we could think of this
> > > >> > > as an add-on use case for cluster scaling. So my understanding
> is
> > > that
> > > >> > the
> > > >> > > KIP-389 could be sequenced within our cooperative rebalancing<
> > > >> > >
> > > >> >
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
> > > >> > > >
> > > >> > > proposal.
> > > >> > >
> > > >> > > Let me know if this makes sense.
> > > >> > >
> > > >> > > Best,
> > > >> > > Boyang
> > > >> > > ________________________________
> > > >> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > >> > > Sent: Wednesday, December 5, 2018 5:52 PM
> > > >> > > To: dev@kafka.apache.org
> > > >> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> member
> > > >> > > metadata growth
> > > >> > >
> > > >> > > Hey Boyang,
> > > >> > >
> > > >> > > I think we still need to take care of group shrinkage because
> even
> > > if
> > > >> > users
> > > >> > > change the config value we cannot guarantee that all consumer
> groups
> > > >> > would
> > > >> > > have been manually shrunk.
> > > >> > >
> > > >> > > Regarding 2., I agree that forcefully triggering a rebalance
> might
> > > be
> > > >> the
> > > >> > > most intuitive way to handle the situation.
> > > >> > > What does a "trivial rebalance" mean? Sorry, I'm not familiar
> with
> > > the
> > > >> > > term.
> > > >> > > I was thinking that maybe we could force a rebalance, which
> would
> > > >> cause
> > > >> > > consumers to commit their offsets (given their
> rebalanceListener is
> > > >> > > configured correctly) and subsequently reject some of the
> incoming
> > > >> > > `joinGroup` requests. Does that sound like it would work?
> > > >> > >
> > > >> > > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bchen11@outlook.com
> >
> > > >> wrote:
> > > >> > >
> > > >> > > > Hey Stanislav,
> > > >> > > >
> > > >> > > > I read the latest KIP and saw that we already changed the
> default
> > > >> value
> > > >> > > to
> > > >> > > > -1. Do
> > > >> > > > we still need to take care of the consumer group shrinking
> when
> > > >> doing
> > > >> > the
> > > >> > > > upgrade?
> > > >> > > >
> > > >> > > > However this is an interesting topic that is worth discussing.
> > > Although
> > > >> > > > rolling
> > > >> > > > upgrade is fine, `consumer.group.max.size` could always have
> > > >> conflict
> > > >> > > with
> > > >> > > > the current
> > > >> > > > consumer group size which means we need to adhere to one
> source of
> > > >> > truth.
> > > >> > > >
> > > >> > > > 1.Choose the current group size, which means we never
> interrupt
> > > the
> > > >> > > > consumer group until
> > > >> > > > it transits to PREPARE_REBALANCE. And we keep track of how
> many
> > > join
> > > >> > > group
> > > >> > > > requests
> > > >> > > > we have seen so far during PREPARE_REBALANCE. After reaching
> the
> > > >> > consumer
> > > >> > > > cap,
> > > >> > > > we start to inform over provisioned consumers that you should
> send
> > > >> > > > LeaveGroupRequest and
> > > >> > > > fail yourself. Or with what Mayuresh proposed in KIP-345, we
> could
> > > >> mark
> > > >> > > > extra members
> > > >> > > > as hot backup and rebalance without them.
> > > >> > > >
> > > >> > > > 2.Choose the `consumer.group.max.size`. I feel incremental
> > > >> rebalancing
> > > >> > > > (you proposed) could be of help here.
> > > >> > > > When a new cap is enforced, leader should be notified. If the
> > > >> current
> > > >> > > > group size is already over limit, leader
> > > >> > > > shall trigger a trivial rebalance to shuffle some topic
> partitions
> > > >> and
> > > >> > > let
> > > >> > > > a subset of consumers prepare the ownership
> > > >> > > > transition. Until they are ready, we trigger a real rebalance
> to
> > > >> remove
> > > >> > > > over-provisioned consumers. It is pretty much
> > > >> > > > equivalent to `how do we scale down the consumer group without
> > > >> > > > interrupting the current processing`.
> > > >> > > >
> > > >> > > > I personally feel inclined to 2 because we could kill two
> birds
> > > with
> > > >> > one
> > > >> > > > stone in a generic way. What do you think?
> > > >> > > >
> > > >> > > > Boyang
> > > >> > > > ________________________________
> > > >> > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > >> > > > Sent: Monday, December 3, 2018 8:35 PM
> > > >> > > > To: dev@kafka.apache.org
> > > >> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > > member
> > > >> > > > metadata growth
> > > >> > > >
> > > >> > > > Hi Jason,
> > > >> > > >
> > > >> > > > > 2. Do you think we should make this a dynamic config?
> > > >> > > > I'm not sure. Looking at the config from the perspective of a
> > > >> > > prescriptive
> > > >> > > > config, we may get away with not updating it dynamically.
> > > >> > > > But in my opinion, it always makes sense to have a config be
> > > >> > dynamically
> > > >> > > > configurable. As long as we limit it to being a cluster-wide
> > > >> config, we
> > > >> > > > should be fine.
> > > >> > > >
> > > >> > > > > 1. I think it would be helpful to clarify the details on
> how the
> > > >> > > > coordinator will shrink the group. It will need to choose
> which
> > > >> members
> > > >> > > to
> > > >> > > > remove. Are we going to give current members an opportunity to
> > > >> commit
> > > >> > > > offsets before kicking them from the group?
> > > >> > > >
> > > >> > > > This turns out to be somewhat tricky. I think that we may not
> be
> > > >> able
> > > >> > to
> > > >> > > > guarantee that consumers don't process a message twice.
> > > >> > > > My initial approach was to do as much as we could to let
> consumers
> > > >> > commit
> > > >> > > > offsets.
> > > >> > > >
> > > >> > > > I was thinking that we mark a group to be shrunk, we could
> keep a
> > > >> map
> > > >> > of
> > > >> > > > consumer_id->boolean indicating whether they have committed
> > > >> offsets. I
> > > >> > > then
> > > >> > > > thought we could delay the rebalance until every consumer
> commits
> > > >> (or
> > > >> > > some
> > > >> > > > time passes).
> > > >> > > > In the meantime, we would block all incoming fetch calls (by
> > > either
> > > >> > > > returning empty records or a retriable error) and we would
> > > continue
> > > >> to
> > > >> > > > accept offset commits (even twice for a single consumer)
> > > >> > > >
> > > >> > > > I see two problems with this approach:
> > > >> > > > * We have async offset commits, which implies that we can
> receive
> > > >> fetch
> > > >> > > > requests before the offset commit req has been handled. i.e
> > > consumer
> > > >> > sends
> > > >> > > > fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B
> in
> > > the
> > > >> > > > broker. Meaning we could have saved the offsets for B but
> > > rebalance
> > > >> > > before
> > > >> > > > the offsetCommit for the offsets processed in C come in.
> > > >> > > > * KIP-392 Allow consumers to fetch from closest replica
> > > >> > > > <
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > > >> > > > >
> > > >> > > > would
> > > >> > > > make it significantly harder to block poll() calls on
> consumers
> > > >> whose
> > > >> > > > groups are being shrunk. Even if we implemented a solution,
> the
> > > same
> > > >> > race
> > > >> > > > condition noted above seems to apply and probably others
> > > >> > > >
> > > >> > > >
> > > >> > > > Given those constraints, I think that we can simply mark the
> group
> > > >> as
> > > >> > > > `PreparingRebalance` with a rebalanceTimeout of the server
> > > setting `
> > > >> > > > group.max.session.timeout.ms`. That's a bit long by default
> (5
> > > >> > minutes)
> > > >> > > > but
> > > >> > > > I can't seem to come up with a better alternative
> > > >> > > >
> > > >> > > > I'm interested in hearing your thoughts.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Stanislav
> > > >> > > >
> > > >> > > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <
> > > jason@confluent.io
> > > >> >
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > Hey Stanislav,
> > > >> > > > >
> > > >> > > > > What do you think about the use case I mentioned in my
> previous
> > > >> reply
> > > >> > > > about
> > > >> > > > > > a more resilient self-service Kafka? I believe the benefit
> > > >> there is
> > > >> > > > > bigger.
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > I see this config as analogous to the open file limit.
> Probably
> > > >> this
> > > >> > > > limit
> > > >> > > > > was intended to be prescriptive at some point about what was
> > > >> deemed a
> > > >> > > > > reasonable number of open files for an application. But
> mostly
> > > >> people
> > > >> > > > treat
> > > >> > > > > it as an annoyance which they have to work around. If it
> happens
> > > >> to
> > > >> > be
> > > >> > > > hit,
> > > >> > > > > usually you just increase it because it is not tied to an
> actual
> > > >> > > resource
> > > >> > > > > constraint. However, occasionally hitting the limit does
> > > indicate
> > > >> an
> > > >> > > > > application bug such as a leak, so I wouldn't say it is
> useless.
> > > >> > > > Similarly,
> > > >> > > > > the issue in KAFKA-7610 was a consumer leak and having this
> > > limit
> > > >> > would
> > > >> > > > > have allowed the problem to be detected before it impacted
> the
> > > >> > cluster.
> > > >> > > > To
> > > >> > > > > me, that's the main benefit. It's possible that it could be
> used
> > > >> > > > > prescriptively to prevent poor usage of groups, but like the
> > > open
> > > >> > file
> > > >> > > > > limit, I suspect administrators will just set it large
> enough
> > > that
> > > >> > > users
> > > >> > > > > are unlikely to complain.
> > > >> > > > >
> > > >> > > > > Anyway, just a couple additional questions:
> > > >> > > > >
> > > >> > > > > 1. I think it would be helpful to clarify the details on
> how the
> > > >> > > > > coordinator will shrink the group. It will need to choose
> which
> > > >> > members
> > > >> > > > to
> > > >> > > > > remove. Are we going to give current members an opportunity
> to
> > > >> commit
> > > >> > > > > offsets before kicking them from the group?
> > > >> > > > >
> > > >> > > > > 2. Do you think we should make this a dynamic config?
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > > Jason
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > > >> > > > > stanislav@confluent.io>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Hi Jason,
> > > >> > > > > >
> > > >> > > > > > You raise some very valid points.
> > > >> > > > > >
> > > >> > > > > > > The benefit of this KIP is probably limited to
> preventing
> > > >> > "runaway"
> > > >> > > > > > consumer groups due to leaks or some other application bug
> > > >> > > > > > What do you think about the use case I mentioned in my
> > > previous
> > > >> > reply
> > > >> > > > > about
> > > >> > > > > > a more resilient self-service Kafka? I believe the benefit
> > > >> there is
> > > >> > > > > bigger
> > > >> > > > > >
> > > >> > > > > > * Default value
> > > >> > > > > > You're right, we probably do need to be conservative. Big
> > > >> consumer
> > > >> > > > groups
> > > >> > > > > > are considered an anti-pattern and my goal was to also
> hint at
> > > >> this
> > > >> > > > > through
> > > >> > > > > > the config's default. Regardless, it is better to not
> have the
> > > >> > > > potential
> > > >> > > > > to
> > > >> > > > > > break applications with an upgrade.
> > > >> > > > > > Choosing between the default of something big like 5000
> or an
> > > >> > opt-in
> > > >> > > > > > option, I think we should go with the *disabled default
> > > option*
> > > >> > > (-1).
> > > >> > > > > > The only benefit we would get from a big default of 5000
> is
> > > >> default
> > > >> > > > > > protection against buggy/malicious applications that hit
> the
> > > >> > > KAFKA-7610
> > > >> > > > > > issue.
> > > >> > > > > > While this KIP was spawned from that issue, I believe its
> > > value
> > > >> is
> > > >> > > > > enabling
> > > >> > > > > > the possibility of protection and helping move towards a
> more
> > > >> > > > > self-service
> > > >> > > > > > Kafka. I also think that a default value of 5000 might be
> > > >> > misleading
> > > >> > > to
> > > >> > > > > > users and lead them to think that big consumer groups (>
> 250)
> > > >> are a
> > > >> > > > good
> > > >> > > > > > thing.
> > > >> > > > > >
> > > >> > > > > > The good news is that KAFKA-7610 should be fully resolved
> and
> > > >> the
> > > >> > > > > rebalance
> > > >> > > > > > protocol should, in general, be more solid after the
> planned
> > > >> > > > improvements
> > > >> > > > > > in KIP-345 and KIP-394.
> > > >> > > > > >
> > > >> > > > > > * Handling bigger groups during upgrade
> > > >> > > > > > I now see that we store the state of consumer groups in
> the
> > > log
> > > >> and
> > > >> > > > why a
> > > >> > > > > > rebalance isn't expected during a rolling upgrade.
> > > >> > > > > > Since we're going with the default value of the max.size
> being
> > > >> > > > disabled,
> > > >> > > > > I
> > > >> > > > > > believe we can afford to be more strict here.
> > > >> > > > > > During state reloading of a new Coordinator with a defined
> > > >> > > > max.group.size
> > > >> > > > > > config, I believe we should *force* rebalances for groups
> that
> > > >> > exceed
> > > >> > > > the
> > > >> > > > > > configured size. Then, only some consumers will be able to
> > > join
> > > >> and
> > > >> > > the
> > > >> > > > > max
> > > >> > > > > > size invariant will be satisfied.
> > > >> > > > > >
> > > >> > > > > > I updated the KIP with a migration plan, rejected
> alternatives
> > > >> and
> > > >> > > the
> > > >> > > > > new
> > > >> > > > > > default value.
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Stanislav
> > > >> > > > > >
> > > >> > > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <
> > > >> > jason@confluent.io>
> > > >> > > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > Hey Stanislav,
> > > >> > > > > > >
> > > >> > > > > > > Clients will then find that coordinator
> > > >> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
> > > >> group,
> > > >> > > since
> > > >> > > > > the
> > > >> > > > > > > > cache of active consumers is not stored outside the
> > > >> > Coordinator's
> > > >> > > > > > memory.
> > > >> > > > > > > > (please do say if that is incorrect)
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > Groups do not typically rebalance after a coordinator
> > > change.
> > > >> You
> > > >> > > > could
> > > >> > > > > > > potentially force a rebalance if the group is too big
> and
> > > kick
> > > >> > out
> > > >> > > > the
> > > >> > > > > > > slowest members or something. A more graceful solution
> is
> > > >> > probably
> > > >> > > to
> > > >> > > > > > just
> > > >> > > > > > > accept the current size and prevent it from getting
> bigger.
> > > We
> > > >> > > could
> > > >> > > > > log
> > > >> > > > > > a
> > > >> > > > > > > warning potentially.
> > > >> > > > > > >
> > > >> > > > > > > My thinking is that we should abstract away from
> conserving
> > > >> > > resources
> > > >> > > > > and
> > > >> > > > > > > > focus on giving control to the broker. The issue that
> > > >> spawned
> > > >> > > this
> > > >> > > > > KIP
> > > >> > > > > > > was
> > > >> > > > > > > > a memory problem but I feel this change is useful in a
> > > more
> > > >> > > general
> > > >> > > > > > way.
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > So you probably already know why I'm asking about this.
> For
> > > >> > > consumer
> > > >> > > > > > groups
> > > >> > > > > > > anyway, resource usage would typically be proportional
> to
> > > the
> > > >> > > number
> > > >> > > > of
> > > >> > > > > > > partitions that a group is reading from and not the
> number
> > > of
> > > >> > > > members.
> > > >> > > > > > For
> > > >> > > > > > > example, consider the memory use in the offsets cache.
> The
> > > >> > benefit
> > > >> > > of
> > > >> > > > > > this
> > > >> > > > > > > KIP is probably limited to preventing "runaway" consumer
> > > >> groups
> > > >> > due
> > > >> > > > to
> > > >> > > > > > > leaks or some other application bug. That still seems
> useful
> > > >> > > though.
> > > >> > > > > > >
> > > >> > > > > > > I completely agree with this and I *ask everybody to
> chime
> > > in
> > > >> > with
> > > >> > > > > > opinions
> > > >> > > > > > > > on a sensible default value*.
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > I think we would have to be very conservative. The group
> > > >> protocol
> > > >> > > is
> > > >> > > > > > > generic in some sense, so there may be use cases we
> don't
> > > >> know of
> > > >> > > > where
> > > >> > > > > > > larger groups are reasonable. Probably we should make
> this
> > > an
> > > >> > > opt-in
> > > >> > > > > > > feature so that we do not risk breaking anyone's
> application
> > > >> > after
> > > >> > > an
> > > >> > > > > > > upgrade. Either that, or use a very high default like
> 5,000.
> > > >> > > > > > >
> > > >> > > > > > > Thanks,
> > > >> > > > > > > Jason
> > > >> > > > > > >
> > > >> > > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > > >> > > > > > > stanislav@confluent.io>
> > > >> > > > > > > wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Hey Jason and Boyang, those were important comments
> > > >> > > > > > > >
> > > >> > > > > > > > > One suggestion I have is that it would be helpful
> to put
> > > >> your
> > > >> > > > > > reasoning
> > > >> > > > > > > > on deciding the current default value. For example, in
> > > >> certain
> > > >> > > use
> > > >> > > > > > cases
> > > >> > > > > > > at
> > > >> > > > > > > > Pinterest we are very likely to have more consumers
> than
> > > 250
> > > >> > when
> > > >> > > > we
> > > >> > > > > > > > configure 8 stream instances with 32 threads.
> > > >> > > > > > > > > For the effectiveness of this KIP, we should
> encourage
> > > >> people
> > > >> > > to
> > > >> > > > > > > discuss
> > > >> > > > > > > > their opinions on the default setting and ideally
> reach a
> > > >> > > > consensus.
> > > >> > > > > > > >
> > > >> > > > > > > > I completely agree with this and I *ask everybody to
> chime
> > > >> in
> > > >> > > with
> > > >> > > > > > > opinions
> > > >> > > > > > > > on a sensible default value*.
> > > >> > > > > > > > My thought process was that in the current model
> > > rebalances
> > > >> in
> > > >> > > > large
> > > >> > > > > > > groups
> > > >> > > > > > > > are more costly. I imagine most use cases in most
> Kafka
> > > >> users
> > > >> > do
> > > >> > > > not
> > > >> > > > > > > > require more than 250 consumers.
> > > >> > > > > > > > Boyang, you say that you are "likely to have... when
> > > we..."
> > > >> -
> > > >> > do
> > > >> > > > you
> > > >> > > > > > have
> > > >> > > > > > > > systems running with so many consumers in a group or
> are
> > > you
> > > >> > > > planning
> > > >> > > > > > > to? I
> > > >> > > > > > > > guess what I'm asking is whether this has been tested
> in
> > > >> > > production
> > > >> > > > > > with
> > > >> > > > > > > > the current rebalance model (ignoring KIP-345)
> > > >> > > > > > > >
> > > >> > > > > > > > >  Can you clarify the compatibility impact here? What
> > > >> > > > > > > > > will happen to groups that are already larger than
> the
> > > max
> > > >> > > size?
> > > >> > > > > > > > This is a very important question.
> > > >> > > > > > > > From my current understanding, when a coordinator
> broker
> > > >> gets
> > > >> > > shut
> > > >> > > > > > > > down during a cluster rolling upgrade, a replica will
> take
> > > >> > > > leadership
> > > >> > > > > > of
> > > >> > > > > > > > the `__offset_commits` partition. Clients will then
> find
> > > >> that
> > > >> > > > > > coordinator
> > > >> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
> > > >> group,
> > > >> > > since
> > > >> > > > > the
> > > >> > > > > > > > cache of active consumers is not stored outside the
> > > >> > Coordinator's
> > > >> > > > > > memory.
> > > >> > > > > > > > (please do say if that is incorrect)
> > > >> > > > > > > > Then, I believe that working as if this is a new
> group is
> > > a
> > > >> > > > > reasonable
> > > >> > > > > > > > approach. Namely, fail joinGroups when the max.size is
> > > >> > exceeded.
> > > >> > > > > > > > What do you guys think about this? (I'll update the
> KIP
> > > >> after
> > > >> > we
> > > >> > > > > settle
> > > >> > > > > > > on
> > > >> > > > > > > > a solution)
> > > >> > > > > > > >
> > > >> > > > > > > > >  Also, just to be clear, the resource we are trying
> to
> > > >> > conserve
> > > >> > > > > here
> > > >> > > > > > is
> > > >> > > > > > > > what? Memory?
> > > >> > > > > > > > My thinking is that we should abstract away from
> > > conserving
> > > >> > > > resources
> > > >> > > > > > and
> > > >> > > > > > > > focus on giving control to the broker. The issue that
> > > >> spawned
> > > >> > > this
> > > >> > > > > KIP
> > > >> > > > > > > was
> > > >> > > > > > > > a memory problem but I feel this change is useful in a
> > > more
> > > >> > > general
> > > >> > > > > > way.
> > > >> > > > > > > It
> > > >> > > > > > > > limits the control clients have on the cluster and
> helps
> > > >> Kafka
> > > >> > > > > become a
> > > >> > > > > > > > more self-serving system. Admin/Ops teams can better
> > > control
> > > >> > the
> > > >> > > > > impact
> > > >> > > > > > > > application developers can have on a Kafka cluster
> with
> > > this
> > > >> > > change
> > > >> > > > > > > >
> > > >> > > > > > > > Best,
> > > >> > > > > > > > Stanislav
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> > > >> > > > jason@confluent.io>
> > > >> > > > > > > > wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > > Hi Stanislav,
> > > >> > > > > > > > >
> > > >> > > > > > > > > Thanks for the KIP. Can you clarify the
> compatibility
> > > >> impact
> > > >> > > > here?
> > > >> > > > > > What
> > > >> > > > > > > > > will happen to groups that are already larger than
> the
> > > max
> > > >> > > size?
> > > >> > > > > > Also,
> > > >> > > > > > > > just
> > > >> > > > > > > > > to be clear, the resource we are trying to conserve
> here
> > > >> is
> > > >> > > what?
> > > >> > > > > > > Memory?
> > > >> > > > > > > > >
> > > >> > > > > > > > > -Jason
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
> > > >> > > bchen11@outlook.com
> > > >> > > > >
> > > >> > > > > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > Thanks Stanislav for the update! One suggestion I
> have
> > > >> is
> > > >> > > that
> > > >> > > > it
> > > >> > > > > > > would
> > > >> > > > > > > > > be
> > > >> > > > > > > > > > helpful to put your
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > reasoning on deciding the current default value.
> For
> > > >> > example,
> > > >> > > > in
> > > >> > > > > > > > certain
> > > >> > > > > > > > > > use cases at Pinterest we are very likely
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > to have more consumers than 250 when we configure
> 8
> > > >> stream
> > > >> > > > > > instances
> > > >> > > > > > > > with
> > > >> > > > > > > > > > 32 threads.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > For the effectiveness of this KIP, we should
> encourage
> > > >> > people
> > > >> > > > to
> > > >> > > > > > > > discuss
> > > >> > > > > > > > > > their opinions on the default setting and ideally
> > > reach
> > > >> a
> > > >> > > > > > consensus.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Best,
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Boyang
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > ________________________________
> > > >> > > > > > > > > > From: Stanislav Kozlovski <stanislav@confluent.io
> >
> > > >> > > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > >> > > > > > > > > > To: dev@kafka.apache.org
> > > >> > > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce
> group.max.size
> > > >> to
> > > >> > cap
> > > >> > > > > > member
> > > >> > > > > > > > > > metadata growth
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Hey everybody,
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > It's been a week since this KIP and not much
> > > discussion
> > > >> has
> > > >> > > > been
> > > >> > > > > > > made.
> > > >> > > > > > > > > > I assume that this is a straight forward change
> and I
> > > >> will
> > > >> > > > open a
> > > >> > > > > > > > voting
> > > >> > > > > > > > > > thread in the next couple of days if nobody has
> > > >> anything to
> > > >> > > > > > suggest.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Best,
> > > >> > > > > > > > > > Stanislav
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav
> Kozlovski <
> > > >> > > > > > > > > > stanislav@confluent.io>
> > > >> > > > > > > > > > wrote:
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > > Greetings everybody,
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > I have enriched the KIP a bit with a bigger
> > > Motivation
> > > >> > > > section
> > > >> > > > > > and
> > > >> > > > > > > > also
> > > >> > > > > > > > > > > renamed it.
> > > >> > > > > > > > > > > KIP:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > I'm looking forward to discussions around it.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Best,
> > > >> > > > > > > > > > > Stanislav
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav
> Kozlovski
> > > <
> > > >> > > > > > > > > > > stanislav@confluent.io> wrote:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >> Hey there everybody,
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> Thanks for the introduction Boyang. I
> appreciate
> > > the
> > > >> > > effort
> > > >> > > > > you
> > > >> > > > > > > are
> > > >> > > > > > > > > > >> putting into improving consumer behavior in
> Kafka.
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> @Matt
> > > >> > > > > > > > > > >> I also believe the default value is high. In my
> > > >> opinion,
> > > >> > > we
> > > >> > > > > > should
> > > >> > > > > > > > aim
> > > >> > > > > > > > > > to
> > > >> > > > > > > > > > >> a default cap around 250. This is because in
> the
> > > >> current
> > > >> > > > model
> > > >> > > > > > any
> > > >> > > > > > > > > > consumer
> > > >> > > > > > > > > > >> rebalance is disrupting to every consumer. The
> > > bigger
> > > >> > the
> > > >> > > > > group,
> > > >> > > > > > > the
> > > >> > > > > > > > > > longer
> > > >> > > > > > > > > > >> this period of disruption.
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> If you have such a large consumer group,
> chances
> > > are
> > > >> > that
> > > >> > > > your
> > > >> > > > > > > > > > >> client-side logic could be structured better
> and
> > > that
> > > >> > you
> > > >> > > > are
> > > >> > > > > > not
> > > >> > > > > > > > > using
> > > >> > > > > > > > > > the
> > > >> > > > > > > > > > >> high number of consumers to achieve high
> > > throughput.
> > > >> > > > > > > > > > >> 250 can still be considered of a high upper
> bound,
> > > I
> > > >> > > believe
> > > >> > > > > in
> > > >> > > > > > > > > practice
> > > >> > > > > > > > > > >> users should aim to not go over 100 consumers
> per
> > > >> > consumer
> > > >> > > > > > group.
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> In regards to the cap being global/per-broker,
> I
> > > >> think
> > > >> > > that
> > > >> > > > we
> > > >> > > > > > > > should
> > > >> > > > > > > > > > >> consider whether we want it to be global or
> > > >> *per-topic*.
> > > >> > > For
> > > >> > > > > the
> > > >> > > > > > > > time
> > > >> > > > > > > > > > >> being, I believe that having it per-topic with
> a
> > > >> global
> > > >> > > > > default
> > > >> > > > > > > > might
> > > >> > > > > > > > > be
> > > >> > > > > > > > > > >> the best situation. Having it global only
> seems a
> > > bit
> > > >> > > > > > restricting
> > > >> > > > > > > to
> > > >> > > > > > > > > me
> > > >> > > > > > > > > > and
> > > >> > > > > > > > > > >> it never hurts to support more fine-grained
> > > >> > > configurability
> > > >> > > > > > (given
> > > >> > > > > > > > > it's
> > > >> > > > > > > > > > the
> > > >> > > > > > > > > > >> same config, not a new one being introduced).
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> > > >> > > > > > bchen11@outlook.com
> > > >> > > > > > > >
> > > >> > > > > > > > > > wrote:
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >>> Thanks Matt for the suggestion! I'm still
> open to
> > > >> any
> > > >> > > > > > suggestion
> > > >> > > > > > > to
> > > >> > > > > > > > > > >>> change the default value. Meanwhile I just
> want to
> > > >> > point
> > > >> > > > out
> > > >> > > > > > that
> > > >> > > > > > > > > this
> > > >> > > > > > > > > > >>> value is a just last line of defense, not a
> real
> > > >> > scenario
> > > >> > > > we
> > > >> > > > > > > would
> > > >> > > > > > > > > > expect.
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> In the meanwhile, I discussed with Stanislav
> and
> > > he
> > > >> > would
> > > >> > > > be
> > > >> > > > > > > > driving
> > > >> > > > > > > > > > the
> > > >> > > > > > > > > > >>> 389 effort from now on. Stanislav proposed the
> > > idea
> > > >> in
> > > >> > > the
> > > >> > > > > > first
> > > >> > > > > > > > > place
> > > >> > > > > > > > > > and
> > > >> > > > > > > > > > >>> had already come up a draft design, while I
> will
> > > >> keep
> > > >> > > > > focusing
> > > >> > > > > > on
> > > >> > > > > > > > > > KIP-345
> > > >> > > > > > > > > > >>> effort to ensure solving the edge case
> described
> > > in
> > > >> the
> > > >> > > > JIRA<
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > >> > > > > > > > > > >.
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> Thank you Stanislav for making this happen!
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> Boyang
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> ________________________________
> > > >> > > > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > >> > > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > >> > > > > > > > > > >>> To: dev@kafka.apache.org
> > > >> > > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce
> > > >> group.max.size
> > > >> > to
> > > >> > > > cap
> > > >> > > > > > > > member
> > > >> > > > > > > > > > >>> metadata growth
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> Thanks for the KIP.
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> Will this cap be a global cap across the
> entire
> > > >> cluster
> > > >> > > or
> > > >> > > > > per
> > > >> > > > > > > > > broker?
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> Either way the default value seems a bit high
> to
> > > me,
> > > >> > but
> > > >> > > > that
> > > >> > > > > > > could
> > > >> > > > > > > > > > just
> > > >> > > > > > > > > > >>> be
> > > >> > > > > > > > > > >>> from my own usage patterns. I'd have probably
> > > >> started
> > > >> > > with
> > > >> > > > > 500
> > > >> > > > > > or
> > > >> > > > > > > > 1k
> > > >> > > > > > > > > > but
> > > >> > > > > > > > > > >>> could be easily convinced that's wrong.
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> Thanks,
> > > >> > > > > > > > > > >>> Matt
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> > > >> > > > > > bchen11@outlook.com
> > > >> > > > > > > >
> > > >> > > > > > > > > > wrote:
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>> > Hey folks,
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> > I would like to start a discussion on
> KIP-389:
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> > This is a pretty simple change to cap the
> > > consumer
> > > >> > > group
> > > >> > > > > size
> > > >> > > > > > > for
> > > >> > > > > > > > > > >>> broker
> > > >> > > > > > > > > > >>> > stability. Give me your valuable feedback
> when
> > > you
> > > >> > got
> > > >> > > > > time.
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>> > Thank you!
> > > >> > > > > > > > > > >>> >
> > > >> > > > > > > > > > >>>
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> --
> > > >> > > > > > > > > > >> Best,
> > > >> > > > > > > > > > >> Stanislav
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > --
> > > >> > > > > > > > > > > Best,
> > > >> > > > > > > > > > > Stanislav
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > --
> > > >> > > > > > > > > > Best,
> > > >> > > > > > > > > > Stanislav
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > > --
> > > >> > > > > > > > Best,
> > > >> > > > > > > > Stanislav
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > --
> > > >> > > > > > Best,
> > > >> > > > > > Stanislav
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > > Best,
> > > >> > > > Stanislav
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Best,
> > > >> > > Stanislav
> > > >> > >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Best,
> > > >> > Stanislav
> > > >> >
> > > >>
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
> >
> > --
> > Best,
> > Stanislav
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>
>

-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Boyang Chen <bc...@outlook.com>.
Thanks Gwen for the suggestion! +1 on the guidance of defining group.max.size. I guess a sample formula would be:
2 * (# of brokers * average metadata cache size * 80%) / (# of consumer groups * size of a single member metadata)

assuming non-skewed partition assignment and reasonably even consumption across consumer groups. The "2" is roughly the 95th percentile of a normal distribution, and the 80% just buffers some memory capacity; both are open to discussion. This config should be useful for a Kafka platform team to make sure one extremely large consumer group won't bring down the whole cluster.
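
To make that arithmetic concrete, here is a minimal Java sketch of the formula above. Every input value (broker count, cache budget, group count, per-member metadata size) is a made-up placeholder for illustration, not a recommendation:

    public class GroupMaxSizeEstimate {
        public static void main(String[] args) {
            // All inputs below are illustrative assumptions, not measured values.
            int brokers = 10;                             // brokers in the cluster
            long metadataCacheBytesPerBroker = 64L << 20; // memory budgeted for group metadata per broker (64 MiB)
            double headroom = 0.8;                        // the "80%" buffer from the formula
            int consumerGroups = 200;                     // consumer groups hosted on the cluster
            long bytesPerMember = 1024L;                  // assumed size of a single member's metadata

            double clusterBudget = brokers * metadataCacheBytesPerBroker * headroom;
            long suggestedCap = (long) (2 * clusterBudget / (consumerGroups * bytesPerMember));

            System.out.println("suggested group.max.size ~= " + suggestedCap);
        }
    }

With these placeholder numbers the suggestion comes out around 5,000 members per group, which is only meant to show the shape of the calculation.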

What do you think?

Best,
Boyang

________________________________
From: Gwen Shapira <gw...@confluent.io>
Sent: Thursday, January 3, 2019 2:59 AM
To: dev
Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Sorry for joining the fun late, but I think the problem we are solving
evolved a bit in the thread, and I'd like to have a better understanding
of the problem before voting :)

Both the KIP and the discussion assert that large groups are a problem, but
they are kinda inconsistent regarding why they are a problem and whose
problem they are...
1. The KIP itself states that the main issue with large groups is
long rebalance times. Per my understanding, this is mostly a problem
for the application that consumes data, but not really a problem for
the brokers themselves, so broker admins probably don't and shouldn't
care about it. Also, my understanding is that this is a problem for
consumer groups, but not necessarily a problem for other group types.
2. The discussion highlights the issue of "runaway" groups that
essentially create tons of members needlessly and use up lots of
broker memory. This is something the broker admins will care about a
lot. And is also a problem for every group that uses coordinators and
not just consumers. And since the memory in question is the metadata
cache, it probably has the largest impact on Kafka Streams
applications, since they have lots of metadata.

The solution proposed makes the most sense in the context of #2, so
perhaps we should update the motivation section of the KIP to reflect
that.

The reason I'm probing here is that in my opinion we have to give our
users some guidelines on what a reasonable limit is (otherwise, how
will they know?). Calculating the impact of group-size on rebalance
time in order to make good recommendations will take a significant
effort. On the other hand, informing users regarding the memory
footprint of a consumer in a group and using that to make a reasonable
suggestion isn't hard.
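
As a rough illustration of how simple such a memory-based guideline could be (the per-member size below is an assumption picked for the example, not a measured number):

    public class GroupFootprintEstimate {
        public static void main(String[] args) {
            // Assumption for illustration only: each member costs roughly 1 KiB of
            // group metadata (member id, client id/host, subscription, assignment).
            long assumedBytesPerMember = 1024L;
            int[] groupSizes = {100, 1_000, 10_000, 100_000};
            for (int members : groupSizes) {
                long kib = (members * assumedBytesPerMember) / 1024;
                System.out.printf("%,7d members -> ~%,d KiB of group metadata%n", members, kib);
            }
        }
    }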

Gwen


On Sun, Dec 30, 2018 at 12:51 PM Stanislav Kozlovski
<st...@confluent.io> wrote:
>
> Thanks Boyang,
>
> If there aren't any more thoughts on the KIP I'll start a vote thread in
> the new year
>
> On Sat, Dec 29, 2018 at 12:58 AM Boyang Chen <bc...@outlook.com> wrote:
>
> > Yep Stanislav, that's what I'm proposing, and your explanation makes sense.
> >
> > Boyang
> >
> > ________________________________
> > From: Stanislav Kozlovski <st...@confluent.io>
> > Sent: Friday, December 28, 2018 7:59 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > metadata growth
> >
> > Hey there everybody, let's work on wrapping this discussion up.
> >
> > @Boyang, could you clarify what you mean by
> > > One more question is whether you feel we should enforce group size cap
> > statically or on runtime?
> > Is that related to the option of enabling this config via the dynamic
> > broker config feature?
> >
> > Regarding that - I feel it's useful to have and I also think it might not
> > introduce additional complexity. As long as we handle the config being
> > changed midway through a rebalance (via using the old value) we should be
> > good to go.
> >
> > On Wed, Dec 12, 2018 at 4:12 PM Stanislav Kozlovski <
> > stanislav@confluent.io>
> > wrote:
> >
> > > Hey Jason,
> > >
> > > Yes, that is what I meant by
> > > > Given those constraints, I think that we can simply mark the group as
> > > `PreparingRebalance` with a rebalanceTimeout of the server setting `
> > > group.max.session.timeout.ms`. That's a bit long by default (5 minutes)
> > > but I can't seem to come up with a better alternative
> > > So either the timeout or all members calling joinGroup, yes
> > >
> > >
> > > On Tue, Dec 11, 2018 at 8:14 PM Boyang Chen <bc...@outlook.com> wrote:
> > >
> > >> Hey Jason,
> > >>
> > >> I think this is the correct understanding. One more question is whether
> > >> you feel
> > >> we should enforce group size cap statically or on runtime?
> > >>
> > >> Boyang
> > >> ________________________________
> > >> From: Jason Gustafson <ja...@confluent.io>
> > >> Sent: Tuesday, December 11, 2018 3:24 AM
> > >> To: dev
> > >> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > >> metadata growth
> > >>
> > >> Hey Stanislav,
> > >>
> > >> Just to clarify, I think what you're suggesting is something like this
> > in
> > >> order to gracefully shrink the group:
> > >>
> > >> 1. Transition the group to PREPARING_REBALANCE. No members are kicked
> > out.
> > >> 2. Continue to allow offset commits and heartbeats for all current
> > >> members.
> > >> 3. Allow the first n members that send JoinGroup to stay in the group,
> > but
> > >> wait for the JoinGroup (or session timeout) from all active members
> > before
> > >> finishing the rebalance.
> > >>
> > >> So basically we try to give the current members an opportunity to finish
> > >> work, but we prevent some of them from rejoining after the rebalance
> > >> completes. It sounds reasonable if I've understood correctly.
> > >>
> > >> Thanks,
> > >> Jason
> > >>
> > >>
> > >>
> > >> On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <bc...@outlook.com> wrote:
> > >>
> > >> > Yep, LGTM on my side. Thanks Stanislav!
> > >> > ________________________________
> > >> > From: Stanislav Kozlovski <st...@confluent.io>
> > >> > Sent: Friday, December 7, 2018 8:51 PM
> > >> > To: dev@kafka.apache.org
> > >> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > >> > metadata growth
> > >> >
> > >> > Hi,
> > >> >
> > >> > We discussed this offline with Boyang and figured that it's best to
> > not
> > >> > wait on the Cooperative Rebalancing proposal. Our thinking is that we
> > >> can
> > >> > just force a rebalance from the broker, allowing consumers to commit
> > >> > offsets if their rebalanceListener is configured correctly.
> > >> > When rebalancing improvements are implemented, we assume that they
> > would
> > >> > improve KIP-389's behavior as well as the normal rebalance scenarios
> > >> >
> > >> > On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <bc...@outlook.com>
> > >> wrote:
> > >> >
> > >> > > Hey Stanislav,
> > >> > >
> > >> > > thanks for the question! `Trivial rebalance` means "we don't start
> > >> > > reassignment right now, but you need to know it's coming soon
> > >> > > and you should start preparation".
> > >> > >
> > >> > > An example KStream use case is that before actually starting to
> > shrink
> > >> > the
> > >> > > consumer group, we need to
> > >> > > 1. partition the consumer group into two subgroups, where one will
> > be
> > >> > > offline soon and the other will keep serving;
> > >> > > 2. make sure the states associated with near-future offline
> > consumers
> > >> are
> > >> > > successfully replicated on the serving ones.
> > >> > >
> > >> > > As I have mentioned shrinking the consumer group is pretty much
> > >> > equivalent
> > >> > > to group scaling down, so we could think of this
> > >> > > as an add-on use case for cluster scaling. So my understanding is
> > that
> > >> > the
> > >> > > KIP-389 could be sequenced within our cooperative rebalancing<
> > >> > >
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
> > >> > > >
> > >> > > proposal.
> > >> > >
> > >> > > Let me know if this makes sense.
> > >> > >
> > >> > > Best,
> > >> > > Boyang
> > >> > > ________________________________
> > >> > > From: Stanislav Kozlovski <st...@confluent.io>
> > >> > > Sent: Wednesday, December 5, 2018 5:52 PM
> > >> > > To: dev@kafka.apache.org
> > >> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > >> > > metadata growth
> > >> > >
> > >> > > Hey Boyang,
> > >> > >
> > >> > > I think we still need to take care of group shrinkage because even
> > if
> > >> > users
> > >> > > change the config value we cannot guarantee that all consumer groups
> > >> > would
> > >> > > have been manually shrunk.
> > >> > >
> > >> > > Regarding 2., I agree that forcefully triggering a rebalance might
> > be
> > >> the
> > >> > > most intuitive way to handle the situation.
> > >> > > What does a "trivial rebalance" mean? Sorry, I'm not familiar with
> > the
> > >> > > term.
> > >> > > I was thinking that maybe we could force a rebalance, which would
> > >> cause
> > >> > > consumers to commit their offsets (given their rebalanceListener is
> > >> > > configured correctly) and subsequently reject some of the incoming
> > >> > > `joinGroup` requests. Does that sound like it would work?
> > >> > >
> > >> > > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bc...@outlook.com>
> > >> wrote:
> > >> > >
> > >> > > > Hey Stanislav,
> > >> > > >
> > >> > > > I read the latest KIP and saw that we already changed the default
> > >> value
> > >> > > to
> > >> > > > -1. Do
> > >> > > > we still need to take care of the consumer group shrinking when
> > >> doing
> > >> > the
> > >> > > > upgrade?
> > >> > > >
> > >> > > > However this is an interesting topic that worth discussing.
> > Although
> > >> > > > rolling
> > >> > > > upgrade is fine, `consumer.group.max.size` could always have
> > >> conflict
> > >> > > with
> > >> > > > the current
> > >> > > > consumer group size which means we need to adhere to one source of
> > >> > truth.
> > >> > > >
> > >> > > > 1.Choose the current group size, which means we never interrupt
> > the
> > >> > > > consumer group until
> > >> > > > it transits to PREPARE_REBALANCE. And we keep track of how many
> > join
> > >> > > group
> > >> > > > requests
> > >> > > > we have seen so far during PREPARE_REBALANCE. After reaching the
> > >> > consumer
> > >> > > > cap,
> > >> > > > we start to inform over provisioned consumers that you should send
> > >> > > > LeaveGroupRequest and
> > >> > > > fail yourself. Or with what Mayuresh proposed in KIP-345, we could
> > >> mark
> > >> > > > extra members
> > >> > > > as hot backup and rebalance without them.
> > >> > > >
> > >> > > > 2.Choose the `consumer.group.max.size`. I feel incremental
> > >> rebalancing
> > >> > > > (you proposed) could be of help here.
> > >> > > > When a new cap is enforced, leader should be notified. If the
> > >> current
> > >> > > > group size is already over limit, leader
> > >> > > > shall trigger a trivial rebalance to shuffle some topic partitions
> > >> and
> > >> > > let
> > >> > > > a subset of consumers prepare the ownership
> > >> > > > transition. Until they are ready, we trigger a real rebalance to
> > >> remove
> > >> > > > over-provisioned consumers. It is pretty much
> > >> > > > equivalent to `how do we scale down the consumer group without
> > >> > > > interrupting the current processing`.
> > >> > > >
> > >> > > > I personally feel inclined to 2 because we could kill two birds
> > with
> > >> > one
> > >> > > > stone in a generic way. What do you think?
> > >> > > >
> > >> > > > Boyang
> > >> > > > ________________________________
> > >> > > > From: Stanislav Kozlovski <st...@confluent.io>
> > >> > > > Sent: Monday, December 3, 2018 8:35 PM
> > >> > > > To: dev@kafka.apache.org
> > >> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > member
> > >> > > > metadata growth
> > >> > > >
> > >> > > > Hi Jason,
> > >> > > >
> > >> > > > > 2. Do you think we should make this a dynamic config?
> > >> > > > I'm not sure. Looking at the config from the perspective of a
> > >> > > prescriptive
> > >> > > > config, we may get away with not updating it dynamically.
> > >> > > > But in my opinion, it always makes sense to have a config be
> > >> > dynamically
> > >> > > > configurable. As long as we limit it to being a cluster-wide
> > >> config, we
> > >> > > > should be fine.
> > >> > > >
> > >> > > > > 1. I think it would be helpful to clarify the details on how the
> > >> > > > coordinator will shrink the group. It will need to choose which
> > >> members
> > >> > > to
> > >> > > > remove. Are we going to give current members an opportunity to
> > >> commit
> > >> > > > offsets before kicking them from the group?
> > >> > > >
> > >> > > > This turns out to be somewhat tricky. I think that we may not be
> > >> able
> > >> > to
> > >> > > > guarantee that consumers don't process a message twice.
> > >> > > > My initial approach was to do as much as we could to let consumers
> > >> > commit
> > >> > > > offsets.
> > >> > > >
> > >> > > > I was thinking that we mark a group to be shrunk, we could keep a
> > >> map
> > >> > of
> > >> > > > consumer_id->boolean indicating whether they have committed
> > >> offsets. I
> > >> > > then
> > >> > > > thought we could delay the rebalance until every consumer commits
> > >> (or
> > >> > > some
> > >> > > > time passes).
> > >> > > > In the meantime, we would block all incoming fetch calls (by
> > either
> > >> > > > returning empty records or a retriable error) and we would
> > continue
> > >> to
> > >> > > > accept offset commits (even twice for a single consumer)
> > >> > > >
> > >> > > > I see two problems with this approach:
> > >> > > > * We have async offset commits, which implies that we can receive
> > >> fetch
> > >> > > > requests before the offset commit req has been handled. i.e
> > consumer
> > >> > sends
> > >> > > > fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in
> > the
> > >> > > > broker. Meaning we could have saved the offsets for B but
> > rebalance
> > >> > > before
> > >> > > > the offsetCommit for the offsets processed in C come in.
> > >> > > > * KIP-392 Allow consumers to fetch from closest replica
> > >> > > > <
> > >> > > >
> > >> > >
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > >> > > > >
> > >> > > > would
> > >> > > > make it significantly harder to block poll() calls on consumers
> > >> whose
> > >> > > > groups are being shrunk. Even if we implemented a solution, the
> > same
> > >> > race
> > >> > > > condition noted above seems to apply and probably others
> > >> > > >
> > >> > > >
> > >> > > > Given those constraints, I think that we can simply mark the group
> > >> as
> > >> > > > `PreparingRebalance` with a rebalanceTimeout of the server
> > setting `
> > >> > > > group.max.session.timeout.ms`. That's a bit long by default (5
> > >> > minutes)
> > >> > > > but
> > >> > > > I can't seem to come up with a better alternative
> > >> > > >
> > >> > > > I'm interested in hearing your thoughts.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Stanislav
> > >> > > >
> > >> > > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <
> > jason@confluent.io
> > >> >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hey Stanislav,
> > >> > > > >
> > >> > > > > What do you think about the use case I mentioned in my previous
> > >> reply
> > >> > > > about
> > >> > > > > > a more resilient self-service Kafka? I believe the benefit
> > >> there is
> > >> > > > > bigger.
> > >> > > > >
> > >> > > > >
> > >> > > > > I see this config as analogous to the open file limit. Probably
> > >> this
> > >> > > > limit
> > >> > > > > was intended to be prescriptive at some point about what was
> > >> deemed a
> > >> > > > > reasonable number of open files for an application. But mostly
> > >> people
> > >> > > > treat
> > >> > > > > it as an annoyance which they have to work around. If it happens
> > >> to
> > >> > be
> > >> > > > hit,
> > >> > > > > usually you just increase it because it is not tied to an actual
> > >> > > resource
> > >> > > > > constraint. However, occasionally hitting the limit does
> > indicate
> > >> an
> > >> > > > > application bug such as a leak, so I wouldn't say it is useless.
> > >> > > > Similarly,
> > >> > > > > the issue in KAFKA-7610 was a consumer leak and having this
> > limit
> > >> > would
> > >> > > > > have allowed the problem to be detected before it impacted the
> > >> > cluster.
> > >> > > > To
> > >> > > > > me, that's the main benefit. It's possible that it could be used
> > >> > > > > prescriptively to prevent poor usage of groups, but like the
> > open
> > >> > file
> > >> > > > > limit, I suspect administrators will just set it large enough
> > that
> > >> > > users
> > >> > > > > are unlikely to complain.
> > >> > > > >
> > >> > > > > Anyway, just a couple additional questions:
> > >> > > > >
> > >> > > > > 1. I think it would be helpful to clarify the details on how the
> > >> > > > > coordinator will shrink the group. It will need to choose which
> > >> > members
> > >> > > > to
> > >> > > > > remove. Are we going to give current members an opportunity to
> > >> commit
> > >> > > > > offsets before kicking them from the group?
> > >> > > > >
> > >> > > > > 2. Do you think we should make this a dynamic config?
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Jason
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > >> > > > > stanislav@confluent.io>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hi Jason,
> > >> > > > > >
> > >> > > > > > You raise some very valid points.
> > >> > > > > >
> > >> > > > > > > The benefit of this KIP is probably limited to preventing
> > >> > "runaway"
> > >> > > > > > consumer groups due to leaks or some other application bug
> > >> > > > > > What do you think about the use case I mentioned in my
> > previous
> > >> > reply
> > >> > > > > about
> > >> > > > > > a more resilient self-service Kafka? I believe the benefit
> > >> there is
> > >> > > > > bigger
> > >> > > > > >
> > >> > > > > > * Default value
> > >> > > > > > You're right, we probably do need to be conservative. Big
> > >> consumer
> > >> > > > groups
> > >> > > > > > are considered an anti-pattern and my goal was to also hint at
> > >> this
> > >> > > > > through
> > >> > > > > > the config's default. Regardless, it is better to not have the
> > >> > > > potential
> > >> > > > > to
> > >> > > > > > break applications with an upgrade.
> > >> > > > > > Choosing between the default of something big like 5000 or an
> > >> > opt-in
> > >> > > > > > option, I think we should go with the *disabled default
> > option*
> > >> > > (-1).
> > >> > > > > > The only benefit we would get from a big default of 5000 is
> > >> default
> > >> > > > > > protection against buggy/malicious applications that hit the
> > >> > > KAFKA-7610
> > >> > > > > > issue.
> > >> > > > > > While this KIP was spawned from that issue, I believe its
> > value
> > >> is
> > >> > > > > enabling
> > >> > > > > > the possibility of protection and helping move towards a more
> > >> > > > > self-service
> > >> > > > > > Kafka. I also think that a default value of 5000 might be
> > >> > misleading
> > >> > > to
> > >> > > > > > users and lead them to think that big consumer groups (> 250)
> > >> are a
> > >> > > > good
> > >> > > > > > thing.
> > >> > > > > >
> > >> > > > > > The good news is that KAFKA-7610 should be fully resolved and
> > >> the
> > >> > > > > rebalance
> > >> > > > > > protocol should, in general, be more solid after the planned
> > >> > > > improvements
> > >> > > > > > in KIP-345 and KIP-394.
> > >> > > > > >
> > >> > > > > > * Handling bigger groups during upgrade
> > >> > > > > > I now see that we store the state of consumer groups in the
> > log
> > >> and
> > >> > > > why a
> > >> > > > > > rebalance isn't expected during a rolling upgrade.
> > >> > > > > > Since we're going with the default value of the max.size being
> > >> > > > disabled,
> > >> > > > > I
> > >> > > > > > believe we can afford to be more strict here.
> > >> > > > > > During state reloading of a new Coordinator with a defined
> > >> > > > max.group.size
> > >> > > > > > config, I believe we should *force* rebalances for groups that
> > >> > exceed
> > >> > > > the
> > >> > > > > > configured size. Then, only some consumers will be able to
> > join
> > >> and
> > >> > > the
> > >> > > > > max
> > >> > > > > > size invariant will be satisfied.
> > >> > > > > >
> > >> > > > > > I updated the KIP with a migration plan, rejected alternatives
> > >> and
> > >> > > the
> > >> > > > > new
> > >> > > > > > default value.
> > >> > > > > >
> > >> > > > > > Thanks,
> > >> > > > > > Stanislav
> > >> > > > > >
> > >> > > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <
> > >> > jason@confluent.io>
> > >> > > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Hey Stanislav,
> > >> > > > > > >
> > >> > > > > > > Clients will then find that coordinator
> > >> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
> > >> group,
> > >> > > since
> > >> > > > > the
> > >> > > > > > > > cache of active consumers is not stored outside the
> > >> > Coordinator's
> > >> > > > > > memory.
> > >> > > > > > > > (please do say if that is incorrect)
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > Groups do not typically rebalance after a coordinator
> > change.
> > >> You
> > >> > > > could
> > >> > > > > > > potentially force a rebalance if the group is too big and
> > kick
> > >> > out
> > >> > > > the
> > >> > > > > > > slowest members or something. A more graceful solution is
> > >> > probably
> > >> > > to
> > >> > > > > > just
> > >> > > > > > > accept the current size and prevent it from getting bigger.
> > We
> > >> > > could
> > >> > > > > log
> > >> > > > > > a
> > >> > > > > > > warning potentially.
> > >> > > > > > >
> > >> > > > > > > My thinking is that we should abstract away from conserving
> > >> > > resources
> > >> > > > > and
> > >> > > > > > > > focus on giving control to the broker. The issue that
> > >> spawned
> > >> > > this
> > >> > > > > KIP
> > >> > > > > > > was
> > >> > > > > > > > a memory problem but I feel this change is useful in a
> > more
> > >> > > general
> > >> > > > > > way.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > So you probably already know why I'm asking about this. For
> > >> > > consumer
> > >> > > > > > groups
> > >> > > > > > > anyway, resource usage would typically be proportional to
> > the
> > >> > > number
> > >> > > > of
> > >> > > > > > > partitions that a group is reading from and not the number
> > of
> > >> > > > members.
> > >> > > > > > For
> > >> > > > > > > example, consider the memory use in the offsets cache. The
> > >> > benefit
> > >> > > of
> > >> > > > > > this
> > >> > > > > > > KIP is probably limited to preventing "runaway" consumer
> > >> groups
> > >> > due
> > >> > > > to
> > >> > > > > > > leaks or some other application bug. That still seems useful
> > >> > > though.
> > >> > > > > > >
> > >> > > > > > > I completely agree with this and I *ask everybody to chime
> > in
> > >> > with
> > >> > > > > > opinions
> > >> > > > > > > > on a sensible default value*.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > I think we would have to be very conservative. The group
> > >> protocol
> > >> > > is
> > >> > > > > > > generic in some sense, so there may be use cases we don't
> > >> know of
> > >> > > > where
> > >> > > > > > > larger groups are reasonable. Probably we should make this
> > an
> > >> > > opt-in
> > >> > > > > > > feature so that we do not risk breaking anyone's application
> > >> > after
> > >> > > an
> > >> > > > > > > upgrade. Either that, or use a very high default like 5,000.
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > > Jason
> > >> > > > > > >
> > >> > > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > >> > > > > > > stanislav@confluent.io>
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hey Jason and Boyang, those were important comments
> > >> > > > > > > >
> > >> > > > > > > > > One suggestion I have is that it would be helpful to put
> > >> your
> > >> > > > > > reasoning
> > >> > > > > > > > on deciding the current default value. For example, in
> > >> certain
> > >> > > use
> > >> > > > > > cases
> > >> > > > > > > at
> > >> > > > > > > > Pinterest we are very likely to have more consumers than
> > 250
> > >> > when
> > >> > > > we
> > >> > > > > > > > configure 8 stream instances with 32 threads.
> > >> > > > > > > > > For the effectiveness of this KIP, we should encourage
> > >> people
> > >> > > to
> > >> > > > > > > discuss
> > >> > > > > > > > their opinions on the default setting and ideally reach a
> > >> > > > consensus.
> > >> > > > > > > >
> > >> > > > > > > > I completely agree with this and I *ask everybody to chime
> > >> in
> > >> > > with
> > >> > > > > > > opinions
> > >> > > > > > > > on a sensible default value*.
> > >> > > > > > > > My thought process was that in the current model
> > rebalances
> > >> in
> > >> > > > large
> > >> > > > > > > groups
> > >> > > > > > > > are more costly. I imagine most use cases in most Kafka
> > >> users
> > >> > do
> > >> > > > not
> > >> > > > > > > > require more than 250 consumers.
> > >> > > > > > > > Boyang, you say that you are "likely to have... when
> > we..."
> > >> -
> > >> > do
> > >> > > > you
> > >> > > > > > have
> > >> > > > > > > > systems running with so many consumers in a group or are
> > you
> > >> > > > planning
> > >> > > > > > > to? I
> > >> > > > > > > > guess what I'm asking is whether this has been tested in
> > >> > > production
> > >> > > > > > with
> > >> > > > > > > > the current rebalance model (ignoring KIP-345)
> > >> > > > > > > >
> > >> > > > > > > > >  Can you clarify the compatibility impact here? What
> > >> > > > > > > > > will happen to groups that are already larger than the
> > max
> > >> > > size?
> > >> > > > > > > > This is a very important question.
> > >> > > > > > > > From my current understanding, when a coordinator broker
> > >> gets
> > >> > > shut
> > >> > > > > > > > down during a cluster rolling upgrade, a replica will take
> > >> > > > leadership
> > >> > > > > > of
> > >> > > > > > > > the `__offset_commits` partition. Clients will then find
> > >> that
> > >> > > > > > coordinator
> > >> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
> > >> group,
> > >> > > since
> > >> > > > > the
> > >> > > > > > > > cache of active consumers is not stored outside the
> > >> > Coordinator's
> > >> > > > > > memory.
> > >> > > > > > > > (please do say if that is incorrect)
> > >> > > > > > > > Then, I believe that working as if this is a new group is
> > a
> > >> > > > > reasonable
> > >> > > > > > > > approach. Namely, fail joinGroups when the max.size is
> > >> > exceeded.
> > >> > > > > > > > What do you guys think about this? (I'll update the KIP
> > >> after
> > >> > we
> > >> > > > > settle
> > >> > > > > > > on
> > >> > > > > > > > a solution)
> > >> > > > > > > >
> > >> > > > > > > > >  Also, just to be clear, the resource we are trying to
> > >> > conserve
> > >> > > > > here
> > >> > > > > > is
> > >> > > > > > > > what? Memory?
> > >> > > > > > > > My thinking is that we should abstract away from
> > conserving
> > >> > > > resources
> > >> > > > > > and
> > >> > > > > > > > focus on giving control to the broker. The issue that
> > >> spawned
> > >> > > this
> > >> > > > > KIP
> > >> > > > > > > was
> > >> > > > > > > > a memory problem but I feel this change is useful in a
> > more
> > >> > > general
> > >> > > > > > way.
> > >> > > > > > > It
> > >> > > > > > > > limits the control clients have on the cluster and helps
> > >> Kafka
> > >> > > > > become a
> > >> > > > > > > > more self-serving system. Admin/Ops teams can better
> > control
> > >> > the
> > >> > > > > impact
> > >> > > > > > > > application developers can have on a Kafka cluster with
> > this
> > >> > > change
> > >> > > > > > > >
> > >> > > > > > > > Best,
> > >> > > > > > > > Stanislav
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> > >> > > > jason@confluent.io>
> > >> > > > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Hi Stanislav,
> > >> > > > > > > > >
> > >> > > > > > > > > Thanks for the KIP. Can you clarify the compatibility
> > >> impact
> > >> > > > here?
> > >> > > > > > What
> > >> > > > > > > > > will happen to groups that are already larger than the
> > max
> > >> > > size?
> > >> > > > > > Also,
> > >> > > > > > > > just
> > >> > > > > > > > > to be clear, the resource we are trying to conserve here
> > >> is
> > >> > > what?
> > >> > > > > > > Memory?
> > >> > > > > > > > >
> > >> > > > > > > > > -Jason
> > >> > > > > > > > >
> > >> > > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
> > >> > > bchen11@outlook.com
> > >> > > > >
> > >> > > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Thanks Stanislav for the update! One suggestion I have
> > >> is
> > >> > > that
> > >> > > > it
> > >> > > > > > > would
> > >> > > > > > > > > be
> > >> > > > > > > > > > helpful to put your
> > >> > > > > > > > > >
> > >> > > > > > > > > > reasoning on deciding the current default value. For
> > >> > example,
> > >> > > > in
> > >> > > > > > > > certain
> > >> > > > > > > > > > use cases at Pinterest we are very likely
> > >> > > > > > > > > >
> > >> > > > > > > > > > to have more consumers than 250 when we configure 8
> > >> stream
> > >> > > > > > instances
> > >> > > > > > > > with
> > >> > > > > > > > > > 32 threads.
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > For the effectiveness of this KIP, we should encourage
> > >> > people
> > >> > > > to
> > >> > > > > > > > discuss
> > >> > > > > > > > > > their opinions on the default setting and ideally
> > reach
> > >> a
> > >> > > > > > consensus.
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > Best,
> > >> > > > > > > > > >
> > >> > > > > > > > > > Boyang
> > >> > > > > > > > > >
> > >> > > > > > > > > > ________________________________
> > >> > > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > >> > > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > >> > > > > > > > > > To: dev@kafka.apache.org
> > >> > > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size
> > >> to
> > >> > cap
> > >> > > > > > member
> > >> > > > > > > > > > metadata growth
> > >> > > > > > > > > >
> > >> > > > > > > > > > Hey everybody,
> > >> > > > > > > > > >
> > >> > > > > > > > > > It's been a week since this KIP and not much
> > discussion
> > >> has
> > >> > > > been
> > >> > > > > > > made.
> > >> > > > > > > > > > I assume that this is a straight forward change and I
> > >> will
> > >> > > > open a
> > >> > > > > > > > voting
> > >> > > > > > > > > > thread in the next couple of days if nobody has
> > >> anything to
> > >> > > > > > suggest.
> > >> > > > > > > > > >
> > >> > > > > > > > > > Best,
> > >> > > > > > > > > > Stanislav
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > >> > > > > > > > > > stanislav@confluent.io>
> > >> > > > > > > > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Greetings everybody,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > I have enriched the KIP a bit with a bigger
> > Motivation
> > >> > > > section
> > >> > > > > > and
> > >> > > > > > > > also
> > >> > > > > > > > > > > renamed it.
> > >> > > > > > > > > > > KIP:
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > I'm looking forward to discussions around it.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Best,
> > >> > > > > > > > > > > Stanislav
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski
> > <
> > >> > > > > > > > > > > stanislav@confluent.io> wrote:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >> Hey there everybody,
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> Thanks for the introduction Boyang. I appreciate
> > the
> > >> > > effort
> > >> > > > > you
> > >> > > > > > > are
> > >> > > > > > > > > > >> putting into improving consumer behavior in Kafka.
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> @Matt
> > >> > > > > > > > > > >> I also believe the default value is high. In my
> > >> opinion,
> > >> > > we
> > >> > > > > > should
> > >> > > > > > > > aim
> > >> > > > > > > > > > to
> > >> > > > > > > > > > >> a default cap around 250. This is because in the
> > >> current
> > >> > > > model
> > >> > > > > > any
> > >> > > > > > > > > > consumer
> > >> > > > > > > > > > >> rebalance is disrupting to every consumer. The
> > bigger
> > >> > the
> > >> > > > > group,
> > >> > > > > > > the
> > >> > > > > > > > > > longer
> > >> > > > > > > > > > >> this period of disruption.
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> If you have such a large consumer group, chances
> > are
> > >> > that
> > >> > > > your
> > >> > > > > > > > > > >> client-side logic could be structured better and
> > that
> > >> > you
> > >> > > > are
> > >> > > > > > not
> > >> > > > > > > > > using
> > >> > > > > > > > > > the
> > >> > > > > > > > > > >> high number of consumers to achieve high
> > throughput.
> > >> > > > > > > > > > >> 250 can still be considered of a high upper bound,
> > I
> > >> > > believe
> > >> > > > > in
> > >> > > > > > > > > practice
> > >> > > > > > > > > > >> users should aim to not go over 100 consumers per
> > >> > consumer
> > >> > > > > > group.
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> In regards to the cap being global/per-broker, I
> > >> think
> > >> > > that
> > >> > > > we
> > >> > > > > > > > should
> > >> > > > > > > > > > >> consider whether we want it to be global or
> > >> *per-topic*.
> > >> > > For
> > >> > > > > the
> > >> > > > > > > > time
> > >> > > > > > > > > > >> being, I believe that having it per-topic with a
> > >> global
> > >> > > > > default
> > >> > > > > > > > might
> > >> > > > > > > > > be
> > >> > > > > > > > > > >> the best situation. Having it global only seems a
> > bit
> > >> > > > > > restricting
> > >> > > > > > > to
> > >> > > > > > > > > me
> > >> > > > > > > > > > and
> > >> > > > > > > > > > >> it never hurts to support more fine-grained
> > >> > > configurability
> > >> > > > > > (given
> > >> > > > > > > > > it's
> > >> > > > > > > > > > the
> > >> > > > > > > > > > >> same config, not a new one being introduced).
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> > >> > > > > > bchen11@outlook.com
> > >> > > > > > > >
> > >> > > > > > > > > > wrote:
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >>> Thanks Matt for the suggestion! I'm still open to
> > >> any
> > >> > > > > > suggestion
> > >> > > > > > > to
> > >> > > > > > > > > > >>> change the default value. Meanwhile I just want to
> > >> > point
> > >> > > > out
> > >> > > > > > that
> > >> > > > > > > > > this
> > >> > > > > > > > > > >>> value is a just last line of defense, not a real
> > >> > scenario
> > >> > > > we
> > >> > > > > > > would
> > >> > > > > > > > > > expect.
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> In the meanwhile, I discussed with Stanislav and
> > he
> > >> > would
> > >> > > > be
> > >> > > > > > > > driving
> > >> > > > > > > > > > the
> > >> > > > > > > > > > >>> 389 effort from now on. Stanislav proposed the
> > idea
> > >> in
> > >> > > the
> > >> > > > > > first
> > >> > > > > > > > > place
> > >> > > > > > > > > > and
> > >> > > > > > > > > > >>> had already come up a draft design, while I will
> > >> keep
> > >> > > > > focusing
> > >> > > > > > on
> > >> > > > > > > > > > KIP-345
> > >> > > > > > > > > > >>> effort to ensure solving the edge case described
> > in
> > >> the
> > >> > > > JIRA<
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > https://issues.apache.org/jira/browse/KAFKA-7610
> > >> > > > > > > > > > >.
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> Thank you Stanislav for making this happen!
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> Boyang
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> ________________________________
> > >> > > > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > >> > > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > >> > > > > > > > > > >>> To: dev@kafka.apache.org
> > >> > > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce
> > >> group.max.size
> > >> > to
> > >> > > > cap
> > >> > > > > > > > member
> > >> > > > > > > > > > >>> metadata growth
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> Thanks for the KIP.
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> Will this cap be a global cap across the entire
> > >> cluster
> > >> > > or
> > >> > > > > per
> > >> > > > > > > > > broker?
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> Either way the default value seems a bit high to
> > me,
> > >> > but
> > >> > > > that
> > >> > > > > > > could
> > >> > > > > > > > > > just
> > >> > > > > > > > > > >>> be
> > >> > > > > > > > > > >>> from my own usage patterns. I'd have probably
> > >> started
> > >> > > with
> > >> > > > > 500
> > >> > > > > > or
> > >> > > > > > > > 1k
> > >> > > > > > > > > > but
> > >> > > > > > > > > > >>> could be easily convinced that's wrong.
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> Thanks,
> > >> > > > > > > > > > >>> Matt
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> > >> > > > > > bchen11@outlook.com
> > >> > > > > > > >
> > >> > > > > > > > > > wrote:
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> > Hey folks,
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> > I would like to start a discussion on KIP-389:
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> > This is a pretty simple change to cap the
> > consumer
> > >> > > group
> > >> > > > > size
> > >> > > > > > > for
> > >> > > > > > > > > > >>> broker
> > >> > > > > > > > > > >>> > stability. Give me your valuable feedback when
> > you
> > >> > got
> > >> > > > > time.
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> > Thank you!
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> --
> > >> > > > > > > > > > >> Best,
> > >> > > > > > > > > > >> Stanislav
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > --
> > >> > > > > > > > > > > Best,
> > >> > > > > > > > > > > Stanislav
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > --
> > >> > > > > > > > > > Best,
> > >> > > > > > > > > > Stanislav
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > --
> > >> > > > > > > > Best,
> > >> > > > > > > > Stanislav
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Best,
> > >> > > > > > Stanislav
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > Best,
> > >> > > > Stanislav
> > >> > > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Best,
> > >> > > Stanislav
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > Best,
> > >> > Stanislav
> > >> >
> > >>
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>
>
> --
> Best,
> Stanislav



--
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Gwen Shapira <gw...@confluent.io>.
Sorry for joining the fun late, but I think the problem we are solving
evolved a bit in the thread, and I'd like to have a better understanding
of the problem before voting :)

Both the KIP and the discussion assert that large groups are a problem, but
they are kinda inconsistent regarding why they are a problem and whose
problem they are...
1. The KIP itself states that the main issue with large groups are
long rebalance times. Per my understanding, this is mostly a problem
for the application that consumes data, but not really a problem for
the brokers themselves, so broker admins probably don't and shouldn't
care about it. Also, my understanding is that this is a problem for
consumer groups, but not necessarily a problem for other group types.
2. The discussion highlights the issue of "runaway" groups that
essentially create tons of members needlessly and use up lots of
broker memory. This is something the broker admins will care about a
lot. It is also a problem for every group that uses coordinators and
not just consumers. And since the memory in question is the metadata
cache, it probably has the largest impact on Kafka Streams
applications, since they have lots of metadata.
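
To illustrate the runaway case in #2, here is a minimal sketch of the kind
of client-side leak that produces it (assuming a local broker, a made-up
group and topic name, and the plain Java consumer from kafka-clients 2.x;
this is an illustration, not anything from the KIP itself):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LeakyConsumers {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", "leaky-group");             // hypothetical group name
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        while (true) {
            // Every iteration registers a brand-new member with the group
            // coordinator; the leaked instances only drop out after the
            // session/poll timeouts, so member metadata piles up on the broker
            // much faster than it is reclaimed.
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("some-topic")); // assumed topic
            consumer.poll(Duration.ofMillis(100)); // joins the group, then leaks
        }
    }
}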

The solution proposed makes the most sense in the context of #2, so
perhaps we should update the motivation section of the KIP to reflect
that.

The reason I'm probing here is that in my opinion we have to give our
users some guidelines on what a reasonable limit is (otherwise, how
will they know?). Calculating the impact of group-size on rebalance
time in order to make good recommendations will take a significant
effort. On the other hand, informing users regarding the memory
footprint of a consumer in a group and using that to make a reasonable
suggestion isn't hard.
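
For example, a rough back-of-envelope sketch along those lines (the ~1 KB
per member is an assumption for illustration, not a measured number):

public class GroupMemoryEstimate {
    public static void main(String[] args) {
        long bytesPerMember = 1024L; // assumed: member id, client id/host, subscription, assignment
        long members = 100_000L;     // a hypothetical runaway group
        double mb = members * bytesPerMember / (1024.0 * 1024.0);
        System.out.printf("~%.1f MB of group metadata on the coordinator%n", mb);
    }
}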

Gwen


On Sun, Dec 30, 2018 at 12:51 PM Stanislav Kozlovski
<st...@confluent.io> wrote:
>
> Thanks Boyang,
>
> If there aren't any more thoughts on the KIP I'll start a vote thread in
> the new year
>
> On Sat, Dec 29, 2018 at 12:58 AM Boyang Chen <bc...@outlook.com> wrote:
>
> > Yep Stanislav, that's what I'm proposing, and your explanation makes sense.
> >
> > Boyang
> >
> > ________________________________
> > From: Stanislav Kozlovski <st...@confluent.io>
> > Sent: Friday, December 28, 2018 7:59 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > metadata growth
> >
> > Hey there everybody, let's work on wrapping this discussion up.
> >
> > @Boyang, could you clarify what you mean by
> > > One more question is whether you feel we should enforce group size cap
> > statically or on runtime?
> > Is that related to the option of enabling this config via the dynamic
> > broker config feature?
> >
> > Regarding that - I feel it's useful to have and I also think it might not
> > introduce additional complexity. As long as we handle the config being
> > changed midway through a rebalance (via using the old value) we should be
> > good to go.
> >
> > On Wed, Dec 12, 2018 at 4:12 PM Stanislav Kozlovski <
> > stanislav@confluent.io>
> > wrote:
> >
> > > Hey Jason,
> > >
> > > Yes, that is what I meant by
> > > > Given those constraints, I think that we can simply mark the group as
> > > `PreparingRebalance` with a rebalanceTimeout of the server setting `
> > > group.max.session.timeout.ms`. That's a bit long by default (5 minutes)
> > > but I can't seem to come up with a better alternative
> > > So either the timeout or all members calling joinGroup, yes
> > >
> > >
> > > On Tue, Dec 11, 2018 at 8:14 PM Boyang Chen <bc...@outlook.com> wrote:
> > >
> > >> Hey Jason,
> > >>
> > >> I think this is the correct understanding. One more question is whether
> > >> you feel
> > >> we should enforce group size cap statically or on runtime?
> > >>
> > >> Boyang
> > >> ________________________________
> > >> From: Jason Gustafson <ja...@confluent.io>
> > >> Sent: Tuesday, December 11, 2018 3:24 AM
> > >> To: dev
> > >> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > >> metadata growth
> > >>
> > >> Hey Stanislav,
> > >>
> > >> Just to clarify, I think what you're suggesting is something like this
> > in
> > >> order to gracefully shrink the group:
> > >>
> > >> 1. Transition the group to PREPARING_REBALANCE. No members are kicked
> > out.
> > >> 2. Continue to allow offset commits and heartbeats for all current
> > >> members.
> > >> 3. Allow the first n members that send JoinGroup to stay in the group,
> > but
> > >> wait for the JoinGroup (or session timeout) from all active members
> > before
> > >> finishing the rebalance.
> > >>
> > >> So basically we try to give the current members an opportunity to finish
> > >> work, but we prevent some of them from rejoining after the rebalance
> > >> completes. It sounds reasonable if I've understood correctly.
> > >>
> > >> Thanks,
> > >> Jason
> > >>
> > >>
> > >>
> > >> On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <bc...@outlook.com> wrote:
> > >>
> > >> > Yep, LGTM on my side. Thanks Stanislav!
> > >> > ________________________________
> > >> > From: Stanislav Kozlovski <st...@confluent.io>
> > >> > Sent: Friday, December 7, 2018 8:51 PM
> > >> > To: dev@kafka.apache.org
> > >> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > >> > metadata growth
> > >> >
> > >> > Hi,
> > >> >
> > >> > We discussed this offline with Boyang and figured that it's best to
> > not
> > >> > wait on the Cooperative Rebalancing proposal. Our thinking is that we
> > >> can
> > >> > just force a rebalance from the broker, allowing consumers to commit
> > >> > offsets if their rebalanceListener is configured correctly.
> > >> > When rebalancing improvements are implemented, we assume that they
> > would
> > >> > improve KIP-389's behavior as well as the normal rebalance scenarios
> > >> >
> > >> > On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <bc...@outlook.com>
> > >> wrote:
> > >> >
> > >> > > Hey Stanislav,
> > >> > >
> > >> > > thanks for the question! `Trivial rebalance` means "we don't start
> > >> > > reassignment right now, but you need to know it's coming soon
> > >> > > and you should start preparation".
> > >> > >
> > >> > > An example KStream use case is that before actually starting to
> > shrink
> > >> > the
> > >> > > consumer group, we need to
> > >> > > 1. partition the consumer group into two subgroups, where one will
> > be
> > >> > > offline soon and the other will keep serving;
> > >> > > 2. make sure the states associated with near-future offline
> > consumers
> > >> are
> > >> > > successfully replicated on the serving ones.
> > >> > >
> > >> > > As I have mentioned shrinking the consumer group is pretty much
> > >> > equivalent
> > >> > > to group scaling down, so we could think of this
> > >> > > as an add-on use case for cluster scaling. So my understanding is
> > that
> > >> > the
> > >> > > KIP-389 could be sequenced within our cooperative rebalancing<
> > >> > >
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
> > >> > > >
> > >> > > proposal.
> > >> > >
> > >> > > Let me know if this makes sense.
> > >> > >
> > >> > > Best,
> > >> > > Boyang
> > >> > > ________________________________
> > >> > > From: Stanislav Kozlovski <st...@confluent.io>
> > >> > > Sent: Wednesday, December 5, 2018 5:52 PM
> > >> > > To: dev@kafka.apache.org
> > >> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > >> > > metadata growth
> > >> > >
> > >> > > Hey Boyang,
> > >> > >
> > >> > > I think we still need to take care of group shrinkage because even
> > if
> > >> > users
> > >> > > change the config value we cannot guarantee that all consumer groups
> > >> > would
> > >> > > have been manually shrunk.
> > >> > >
> > >> > > Regarding 2., I agree that forcefully triggering a rebalance might
> > be
> > >> the
> > >> > > most intuitive way to handle the situation.
> > >> > > What does a "trivial rebalance" mean? Sorry, I'm not familiar with
> > the
> > >> > > term.
> > >> > > I was thinking that maybe we could force a rebalance, which would
> > >> cause
> > >> > > consumers to commit their offsets (given their rebalanceListener is
> > >> > > configured correctly) and subsequently reject some of the incoming
> > >> > > `joinGroup` requests. Does that sound like it would work?
> > >> > >
> > >> > > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bc...@outlook.com>
> > >> wrote:
> > >> > >
> > >> > > > Hey Stanislav,
> > >> > > >
> > >> > > > I read the latest KIP and saw that we already changed the default
> > >> value
> > >> > > to
> > >> > > > -1. Do
> > >> > > > we still need to take care of the consumer group shrinking when
> > >> doing
> > >> > the
> > >> > > > upgrade?
> > >> > > >
> > >> > > > However this is an interesting topic that worth discussing.
> > Although
> > >> > > > rolling
> > >> > > > upgrade is fine, `consumer.group.max.size` could always have
> > >> conflict
> > >> > > with
> > >> > > > the current
> > >> > > > consumer group size which means we need to adhere to one source of
> > >> > truth.
> > >> > > >
> > >> > > > 1.Choose the current group size, which means we never interrupt
> > the
> > >> > > > consumer group until
> > >> > > > it transits to PREPARE_REBALANCE. And we keep track of how many
> > join
> > >> > > group
> > >> > > > requests
> > >> > > > we have seen so far during PREPARE_REBALANCE. After reaching the
> > >> > consumer
> > >> > > > cap,
> > >> > > > we start to inform over provisioned consumers that you should send
> > >> > > > LeaveGroupRequest and
> > >> > > > fail yourself. Or with what Mayuresh proposed in KIP-345, we could
> > >> mark
> > >> > > > extra members
> > >> > > > as hot backup and rebalance without them.
> > >> > > >
> > >> > > > 2.Choose the `consumer.group.max.size`. I feel incremental
> > >> rebalancing
> > >> > > > (you proposed) could be of help here.
> > >> > > > When a new cap is enforced, leader should be notified. If the
> > >> current
> > >> > > > group size is already over limit, leader
> > >> > > > shall trigger a trivial rebalance to shuffle some topic partitions
> > >> and
> > >> > > let
> > >> > > > a subset of consumers prepare the ownership
> > >> > > > transition. Until they are ready, we trigger a real rebalance to
> > >> remove
> > >> > > > over-provisioned consumers. It is pretty much
> > >> > > > equivalent to `how do we scale down the consumer group without
> > >> > > > interrupting the current processing`.
> > >> > > >
> > >> > > > I personally feel inclined to 2 because we could kill two birds
> > with
> > >> > one
> > >> > > > stone in a generic way. What do you think?
> > >> > > >
> > >> > > > Boyang
> > >> > > > ________________________________
> > >> > > > From: Stanislav Kozlovski <st...@confluent.io>
> > >> > > > Sent: Monday, December 3, 2018 8:35 PM
> > >> > > > To: dev@kafka.apache.org
> > >> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > member
> > >> > > > metadata growth
> > >> > > >
> > >> > > > Hi Jason,
> > >> > > >
> > >> > > > > 2. Do you think we should make this a dynamic config?
> > >> > > > I'm not sure. Looking at the config from the perspective of a
> > >> > > prescriptive
> > >> > > > config, we may get away with not updating it dynamically.
> > >> > > > But in my opinion, it always makes sense to have a config be
> > >> > dynamically
> > >> > > > configurable. As long as we limit it to being a cluster-wide
> > >> config, we
> > >> > > > should be fine.
> > >> > > >
> > >> > > > > 1. I think it would be helpful to clarify the details on how the
> > >> > > > coordinator will shrink the group. It will need to choose which
> > >> members
> > >> > > to
> > >> > > > remove. Are we going to give current members an opportunity to
> > >> commit
> > >> > > > offsets before kicking them from the group?
> > >> > > >
> > >> > > > This turns out to be somewhat tricky. I think that we may not be
> > >> able
> > >> > to
> > >> > > > guarantee that consumers don't process a message twice.
> > >> > > > My initial approach was to do as much as we could to let consumers
> > >> > commit
> > >> > > > offsets.
> > >> > > >
> > >> > > > I was thinking that we mark a group to be shrunk, we could keep a
> > >> map
> > >> > of
> > >> > > > consumer_id->boolean indicating whether they have committed
> > >> offsets. I
> > >> > > then
> > >> > > > thought we could delay the rebalance until every consumer commits
> > >> (or
> > >> > > some
> > >> > > > time passes).
> > >> > > > In the meantime, we would block all incoming fetch calls (by
> > either
> > >> > > > returning empty records or a retriable error) and we would
> > continue
> > >> to
> > >> > > > accept offset commits (even twice for a single consumer)
> > >> > > >
> > >> > > > I see two problems with this approach:
> > >> > > > * We have async offset commits, which implies that we can receive
> > >> fetch
> > >> > > > requests before the offset commit req has been handled. i.e
> > consmer
> > >> > sends
> > >> > > > fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in
> > the
> > >> > > > broker. Meaning we could have saved the offsets for B but
> > rebalance
> > >> > > before
> > >> > > > the offsetCommit for the offsets processed in C come in.
> > >> > > > * KIP-392 Allow consumers to fetch from closest replica
> > >> > > > <
> > >> > > >
> > >> > >
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > >> > > > >
> > >> > > > would
> > >> > > > make it significantly harder to block poll() calls on consumers
> > >> whose
> > >> > > > groups are being shrunk. Even if we implemented a solution, the
> > same
> > >> > race
> > >> > > > condition noted above seems to apply and probably others
> > >> > > >
> > >> > > >
> > >> > > > Given those constraints, I think that we can simply mark the group
> > >> as
> > >> > > > `PreparingRebalance` with a rebalanceTimeout of the server
> > setting `
> > >> > > > group.max.session.timeout.ms`. That's a bit long by default (5
> > >> > minutes)
> > >> > > > but
> > >> > > > I can't seem to come up with a better alternative
> > >> > > >
> > >> > > > I'm interested in hearing your thoughts.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Stanislav
> > >> > > >
> > >> > > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <
> > jason@confluent.io
> > >> >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hey Stanislav,
> > >> > > > >
> > >> > > > > What do you think about the use case I mentioned in my previous
> > >> reply
> > >> > > > about
> > >> > > > > > a more resilient self-service Kafka? I believe the benefit
> > >> there is
> > >> > > > > bigger.
> > >> > > > >
> > >> > > > >
> > >> > > > > I see this config as analogous to the open file limit. Probably
> > >> this
> > >> > > > limit
> > >> > > > > was intended to be prescriptive at some point about what was
> > >> deemed a
> > >> > > > > reasonable number of open files for an application. But mostly
> > >> people
> > >> > > > treat
> > >> > > > > it as an annoyance which they have to work around. If it happens
> > >> to
> > >> > be
> > >> > > > hit,
> > >> > > > > usually you just increase it because it is not tied to an actual
> > >> > > resource
> > >> > > > > constraint. However, occasionally hitting the limit does
> > indicate
> > >> an
> > >> > > > > application bug such as a leak, so I wouldn't say it is useless.
> > >> > > > Similarly,
> > >> > > > > the issue in KAFKA-7610 was a consumer leak and having this
> > limit
> > >> > would
> > >> > > > > have allowed the problem to be detected before it impacted the
> > >> > cluster.
> > >> > > > To
> > >> > > > > me, that's the main benefit. It's possible that it could be used
> > >> > > > > prescriptively to prevent poor usage of groups, but like the
> > open
> > >> > file
> > >> > > > > limit, I suspect administrators will just set it large enough
> > that
> > >> > > users
> > >> > > > > are unlikely to complain.
> > >> > > > >
> > >> > > > > Anyway, just a couple additional questions:
> > >> > > > >
> > >> > > > > 1. I think it would be helpful to clarify the details on how the
> > >> > > > > coordinator will shrink the group. It will need to choose which
> > >> > members
> > >> > > > to
> > >> > > > > remove. Are we going to give current members an opportunity to
> > >> commit
> > >> > > > > offsets before kicking them from the group?
> > >> > > > >
> > >> > > > > 2. Do you think we should make this a dynamic config?
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Jason
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > >> > > > > stanislav@confluent.io>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hi Jason,
> > >> > > > > >
> > >> > > > > > You raise some very valid points.
> > >> > > > > >
> > >> > > > > > > The benefit of this KIP is probably limited to preventing
> > >> > "runaway"
> > >> > > > > > consumer groups due to leaks or some other application bug
> > >> > > > > > What do you think about the use case I mentioned in my
> > previous
> > >> > reply
> > >> > > > > about
> > >> > > > > > a more resilient self-service Kafka? I believe the benefit
> > >> there is
> > >> > > > > bigger
> > >> > > > > >
> > >> > > > > > * Default value
> > >> > > > > > You're right, we probably do need to be conservative. Big
> > >> consumer
> > >> > > > groups
> > >> > > > > > are considered an anti-pattern and my goal was to also hint at
> > >> this
> > >> > > > > through
> > >> > > > > > the config's default. Regardless, it is better to not have the
> > >> > > > potential
> > >> > > > > to
> > >> > > > > > break applications with an upgrade.
> > >> > > > > > Choosing between the default of something big like 5000 or an
> > >> > opt-in
> > >> > > > > > option, I think we should go with the *disabled default
> > option*
> > >> > > (-1).
> > >> > > > > > The only benefit we would get from a big default of 5000 is
> > >> default
> > >> > > > > > protection against buggy/malicious applications that hit the
> > >> > > KAFKA-7610
> > >> > > > > > issue.
> > >> > > > > > While this KIP was spawned from that issue, I believe its
> > value
> > >> is
> > >> > > > > enabling
> > >> > > > > > the possibility of protection and helping move towards a more
> > >> > > > > self-service
> > >> > > > > > Kafka. I also think that a default value of 5000 might be
> > >> > misleading
> > >> > > to
> > >> > > > > > users and lead them to think that big consumer groups (> 250)
> > >> are a
> > >> > > > good
> > >> > > > > > thing.
> > >> > > > > >
> > >> > > > > > The good news is that KAFKA-7610 should be fully resolved and
> > >> the
> > >> > > > > rebalance
> > >> > > > > > protocol should, in general, be more solid after the planned
> > >> > > > improvements
> > >> > > > > > in KIP-345 and KIP-394.
> > >> > > > > >
> > >> > > > > > * Handling bigger groups during upgrade
> > >> > > > > > I now see that we store the state of consumer groups in the
> > log
> > >> and
> > >> > > > why a
> > >> > > > > > rebalance isn't expected during a rolling upgrade.
> > >> > > > > > Since we're going with the default value of the max.size being
> > >> > > > disabled,
> > >> > > > > I
> > >> > > > > > believe we can afford to be more strict here.
> > >> > > > > > During state reloading of a new Coordinator with a defined
> > >> > > > max.group.size
> > >> > > > > > config, I believe we should *force* rebalances for groups that
> > >> > exceed
> > >> > > > the
> > >> > > > > > configured size. Then, only some consumers will be able to
> > join
> > >> and
> > >> > > the
> > >> > > > > max
> > >> > > > > > size invariant will be satisfied.
> > >> > > > > >
> > >> > > > > > I updated the KIP with a migration plan, rejected alternatives
> > >> and
> > >> > > the
> > >> > > > > new
> > >> > > > > > default value.
> > >> > > > > >
> > >> > > > > > Thanks,
> > >> > > > > > Stanislav
> > >> > > > > >
> > >> > > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <
> > >> > jason@confluent.io>
> > >> > > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Hey Stanislav,
> > >> > > > > > >
> > >> > > > > > > Clients will then find that coordinator
> > >> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
> > >> group,
> > >> > > since
> > >> > > > > the
> > >> > > > > > > > cache of active consumers is not stored outside the
> > >> > Coordinator's
> > >> > > > > > memory.
> > >> > > > > > > > (please do say if that is incorrect)
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > Groups do not typically rebalance after a coordinator
> > change.
> > >> You
> > >> > > > could
> > >> > > > > > > potentially force a rebalance if the group is too big and
> > kick
> > >> > out
> > >> > > > the
> > >> > > > > > > slowest members or something. A more graceful solution is
> > >> > probably
> > >> > > to
> > >> > > > > > just
> > >> > > > > > > accept the current size and prevent it from getting bigger.
> > We
> > >> > > could
> > >> > > > > log
> > >> > > > > > a
> > >> > > > > > > warning potentially.
> > >> > > > > > >
> > >> > > > > > > My thinking is that we should abstract away from conserving
> > >> > > resources
> > >> > > > > and
> > >> > > > > > > > focus on giving control to the broker. The issue that
> > >> spawned
> > >> > > this
> > >> > > > > KIP
> > >> > > > > > > was
> > >> > > > > > > > a memory problem but I feel this change is useful in a
> > more
> > >> > > general
> > >> > > > > > way.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > So you probably already know why I'm asking about this. For
> > >> > > consumer
> > >> > > > > > groups
> > >> > > > > > > anyway, resource usage would typically be proportional to
> > the
> > >> > > number
> > >> > > > of
> > >> > > > > > > partitions that a group is reading from and not the number
> > of
> > >> > > > members.
> > >> > > > > > For
> > >> > > > > > > example, consider the memory use in the offsets cache. The
> > >> > benefit
> > >> > > of
> > >> > > > > > this
> > >> > > > > > > KIP is probably limited to preventing "runaway" consumer
> > >> groups
> > >> > due
> > >> > > > to
> > >> > > > > > > leaks or some other application bug. That still seems useful
> > >> > > though.
> > >> > > > > > >
> > >> > > > > > > I completely agree with this and I *ask everybody to chime
> > in
> > >> > with
> > >> > > > > > opinions
> > >> > > > > > > > on a sensible default value*.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > I think we would have to be very conservative. The group
> > >> protocol
> > >> > > is
> > >> > > > > > > generic in some sense, so there may be use cases we don't
> > >> know of
> > >> > > > where
> > >> > > > > > > larger groups are reasonable. Probably we should make this
> > an
> > >> > > opt-in
> > >> > > > > > > feature so that we do not risk breaking anyone's application
> > >> > after
> > >> > > an
> > >> > > > > > > upgrade. Either that, or use a very high default like 5,000.
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > > Jason
> > >> > > > > > >
> > >> > > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > >> > > > > > > stanislav@confluent.io>
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hey Jason and Boyang, those were important comments
> > >> > > > > > > >
> > >> > > > > > > > > One suggestion I have is that it would be helpful to put
> > >> your
> > >> > > > > > reasoning
> > >> > > > > > > > on deciding the current default value. For example, in
> > >> certain
> > >> > > use
> > >> > > > > > cases
> > >> > > > > > > at
> > >> > > > > > > > Pinterest we are very likely to have more consumers than
> > 250
> > >> > when
> > >> > > > we
> > >> > > > > > > > configure 8 stream instances with 32 threads.
> > >> > > > > > > > > For the effectiveness of this KIP, we should encourage
> > >> people
> > >> > > to
> > >> > > > > > > discuss
> > >> > > > > > > > their opinions on the default setting and ideally reach a
> > >> > > > consensus.
> > >> > > > > > > >
> > >> > > > > > > > I completely agree with this and I *ask everybody to chime
> > >> in
> > >> > > with
> > >> > > > > > > opinions
> > >> > > > > > > > on a sensible default value*.
> > >> > > > > > > > My thought process was that in the current model
> > rebalances
> > >> in
> > >> > > > large
> > >> > > > > > > groups
> > >> > > > > > > > are more costly. I imagine most use cases in most Kafka
> > >> users
> > >> > do
> > >> > > > not
> > >> > > > > > > > require more than 250 consumers.
> > >> > > > > > > > Boyang, you say that you are "likely to have... when
> > we..."
> > >> -
> > >> > do
> > >> > > > you
> > >> > > > > > have
> > >> > > > > > > > systems running with so many consumers in a group or are
> > you
> > >> > > > planning
> > >> > > > > > > to? I
> > >> > > > > > > > guess what I'm asking is whether this has been tested in
> > >> > > production
> > >> > > > > > with
> > >> > > > > > > > the current rebalance model (ignoring KIP-345)
> > >> > > > > > > >
> > >> > > > > > > > >  Can you clarify the compatibility impact here? What
> > >> > > > > > > > > will happen to groups that are already larger than the
> > max
> > >> > > size?
> > >> > > > > > > > This is a very important question.
> > >> > > > > > > > From my current understanding, when a coordinator broker
> > >> gets
> > >> > > shut
> > >> > > > > > > > down during a cluster rolling upgrade, a replica will take
> > >> > > > leadership
> > >> > > > > > of
> > >> > > > > > > > the `__offset_commits` partition. Clients will then find
> > >> that
> > >> > > > > > coordinator
> > >> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
> > >> group,
> > >> > > since
> > >> > > > > the
> > >> > > > > > > > cache of active consumers is not stored outside the
> > >> > Coordinator's
> > >> > > > > > memory.
> > >> > > > > > > > (please do say if that is incorrect)
> > >> > > > > > > > Then, I believe that working as if this is a new group is
> > a
> > >> > > > > reasonable
> > >> > > > > > > > approach. Namely, fail joinGroups when the max.size is
> > >> > exceeded.
> > >> > > > > > > > What do you guys think about this? (I'll update the KIP
> > >> after
> > >> > we
> > >> > > > > settle
> > >> > > > > > > on
> > >> > > > > > > > a solution)
> > >> > > > > > > >
> > >> > > > > > > > >  Also, just to be clear, the resource we are trying to
> > >> > conserve
> > >> > > > > here
> > >> > > > > > is
> > >> > > > > > > > what? Memory?
> > >> > > > > > > > My thinking is that we should abstract away from
> > conserving
> > >> > > > resources
> > >> > > > > > and
> > >> > > > > > > > focus on giving control to the broker. The issue that
> > >> spawned
> > >> > > this
> > >> > > > > KIP
> > >> > > > > > > was
> > >> > > > > > > > a memory problem but I feel this change is useful in a
> > more
> > >> > > general
> > >> > > > > > way.
> > >> > > > > > > It
> > >> > > > > > > > limits the control clients have on the cluster and helps
> > >> Kafka
> > >> > > > > become a
> > >> > > > > > > > more self-serving system. Admin/Ops teams can better
> > control
> > >> > the
> > >> > > > > impact
> > >> > > > > > > > application developers can have on a Kafka cluster with
> > this
> > >> > > change
> > >> > > > > > > >
> > >> > > > > > > > Best,
> > >> > > > > > > > Stanislav
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> > >> > > > jason@confluent.io>
> > >> > > > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Hi Stanislav,
> > >> > > > > > > > >
> > >> > > > > > > > > Thanks for the KIP. Can you clarify the compatibility
> > >> impact
> > >> > > > here?
> > >> > > > > > What
> > >> > > > > > > > > will happen to groups that are already larger than the
> > max
> > >> > > size?
> > >> > > > > > Also,
> > >> > > > > > > > just
> > >> > > > > > > > > to be clear, the resource we are trying to conserve here
> > >> is
> > >> > > what?
> > >> > > > > > > Memory?
> > >> > > > > > > > >
> > >> > > > > > > > > -Jason
> > >> > > > > > > > >
> > >> > > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
> > >> > > bchen11@outlook.com
> > >> > > > >
> > >> > > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Thanks Stanislav for the update! One suggestion I have
> > >> is
> > >> > > that
> > >> > > > it
> > >> > > > > > > would
> > >> > > > > > > > > be
> > >> > > > > > > > > > helpful to put your
> > >> > > > > > > > > >
> > >> > > > > > > > > > reasoning on deciding the current default value. For
> > >> > example,
> > >> > > > in
> > >> > > > > > > > certain
> > >> > > > > > > > > > use cases at Pinterest we are very likely
> > >> > > > > > > > > >
> > >> > > > > > > > > > to have more consumers than 250 when we configure 8
> > >> stream
> > >> > > > > > instances
> > >> > > > > > > > with
> > >> > > > > > > > > > 32 threads.
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > For the effectiveness of this KIP, we should encourage
> > >> > people
> > >> > > > to
> > >> > > > > > > > discuss
> > >> > > > > > > > > > their opinions on the default setting and ideally
> > reach
> > >> a
> > >> > > > > > consensus.
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > Best,
> > >> > > > > > > > > >
> > >> > > > > > > > > > Boyang
> > >> > > > > > > > > >
> > >> > > > > > > > > > ________________________________
> > >> > > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > >> > > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > >> > > > > > > > > > To: dev@kafka.apache.org
> > >> > > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size
> > >> to
> > >> > cap
> > >> > > > > > member
> > >> > > > > > > > > > metadata growth
> > >> > > > > > > > > >
> > >> > > > > > > > > > Hey everybody,
> > >> > > > > > > > > >
> > >> > > > > > > > > > It's been a week since this KIP and not much
> > discussion
> > >> has
> > >> > > > been
> > >> > > > > > > made.
> > >> > > > > > > > > > I assume that this is a straight forward change and I
> > >> will
> > >> > > > open a
> > >> > > > > > > > voting
> > >> > > > > > > > > > thread in the next couple of days if nobody has
> > >> anything to
> > >> > > > > > suggest.
> > >> > > > > > > > > >
> > >> > > > > > > > > > Best,
> > >> > > > > > > > > > Stanislav
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > >> > > > > > > > > > stanislav@confluent.io>
> > >> > > > > > > > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Greetings everybody,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > I have enriched the KIP a bit with a bigger
> > Motivation
> > >> > > > section
> > >> > > > > > and
> > >> > > > > > > > also
> > >> > > > > > > > > > > renamed it.
> > >> > > > > > > > > > > KIP:
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > I'm looking forward to discussions around it.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Best,
> > >> > > > > > > > > > > Stanislav
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski
> > <
> > >> > > > > > > > > > > stanislav@confluent.io> wrote:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >> Hey there everybody,
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> Thanks for the introduction Boyang. I appreciate
> > the
> > >> > > effort
> > >> > > > > you
> > >> > > > > > > are
> > >> > > > > > > > > > >> putting into improving consumer behavior in Kafka.
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> @Matt
> > >> > > > > > > > > > >> I also believe the default value is high. In my
> > >> opinion,
> > >> > > we
> > >> > > > > > should
> > >> > > > > > > > aim
> > >> > > > > > > > > > to
> > >> > > > > > > > > > >> a default cap around 250. This is because in the
> > >> current
> > >> > > > model
> > >> > > > > > any
> > >> > > > > > > > > > consumer
> > >> > > > > > > > > > >> rebalance is disrupting to every consumer. The
> > bigger
> > >> > the
> > >> > > > > group,
> > >> > > > > > > the
> > >> > > > > > > > > > longer
> > >> > > > > > > > > > >> this period of disruption.
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> If you have such a large consumer group, chances
> > are
> > >> > that
> > >> > > > your
> > >> > > > > > > > > > >> client-side logic could be structured better and
> > that
> > >> > you
> > >> > > > are
> > >> > > > > > not
> > >> > > > > > > > > using
> > >> > > > > > > > > > the
> > >> > > > > > > > > > >> high number of consumers to achieve high
> > throughput.
> > >> > > > > > > > > > >> 250 can still be considered of a high upper bound,
> > I
> > >> > > believe
> > >> > > > > in
> > >> > > > > > > > > practice
> > >> > > > > > > > > > >> users should aim to not go over 100 consumers per
> > >> > consumer
> > >> > > > > > group.
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> In regards to the cap being global/per-broker, I
> > >> think
> > >> > > that
> > >> > > > we
> > >> > > > > > > > should
> > >> > > > > > > > > > >> consider whether we want it to be global or
> > >> *per-topic*.
> > >> > > For
> > >> > > > > the
> > >> > > > > > > > time
> > >> > > > > > > > > > >> being, I believe that having it per-topic with a
> > >> global
> > >> > > > > default
> > >> > > > > > > > might
> > >> > > > > > > > > be
> > >> > > > > > > > > > >> the best situation. Having it global only seems a
> > bit
> > >> > > > > > restricting
> > >> > > > > > > to
> > >> > > > > > > > > me
> > >> > > > > > > > > > and
> > >> > > > > > > > > > >> it never hurts to support more fine-grained
> > >> > > configurability
> > >> > > > > > (given
> > >> > > > > > > > > it's
> > >> > > > > > > > > > the
> > >> > > > > > > > > > >> same config, not a new one being introduced).
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> > >> > > > > > bchen11@outlook.com
> > >> > > > > > > >
> > >> > > > > > > > > > wrote:
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >>> Thanks Matt for the suggestion! I'm still open to
> > >> any
> > >> > > > > > suggestion
> > >> > > > > > > to
> > >> > > > > > > > > > >>> change the default value. Meanwhile I just want to
> > >> > point
> > >> > > > out
> > >> > > > > > that
> > >> > > > > > > > > this
> > >> > > > > > > > > > >>> value is a just last line of defense, not a real
> > >> > scenario
> > >> > > > we
> > >> > > > > > > would
> > >> > > > > > > > > > expect.
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> In the meanwhile, I discussed with Stanislav and
> > he
> > >> > would
> > >> > > > be
> > >> > > > > > > > driving
> > >> > > > > > > > > > the
> > >> > > > > > > > > > >>> 389 effort from now on. Stanislav proposed the
> > idea
> > >> in
> > >> > > the
> > >> > > > > > first
> > >> > > > > > > > > place
> > >> > > > > > > > > > and
> > >> > > > > > > > > > >>> had already come up a draft design, while I will
> > >> keep
> > >> > > > > focusing
> > >> > > > > > on
> > >> > > > > > > > > > KIP-345
> > >> > > > > > > > > > >>> effort to ensure solving the edge case described
> > in
> > >> the
> > >> > > > JIRA<
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > https://issues.apache.org/jira/browse/KAFKA-7610
> > >> > > > > > > > > > >.
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> Thank you Stanislav for making this happen!
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> Boyang
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> ________________________________
> > >> > > > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > >> > > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > >> > > > > > > > > > >>> To: dev@kafka.apache.org
> > >> > > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce
> > >> group.max.size
> > >> > to
> > >> > > > cap
> > >> > > > > > > > member
> > >> > > > > > > > > > >>> metadata growth
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> Thanks for the KIP.
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> Will this cap be a global cap across the entire
> > >> cluster
> > >> > > or
> > >> > > > > per
> > >> > > > > > > > > broker?
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> Either way the default value seems a bit high to
> > me,
> > >> > but
> > >> > > > that
> > >> > > > > > > could
> > >> > > > > > > > > > just
> > >> > > > > > > > > > >>> be
> > >> > > > > > > > > > >>> from my own usage patterns. I'd have probably
> > >> started
> > >> > > with
> > >> > > > > 500
> > >> > > > > > or
> > >> > > > > > > > 1k
> > >> > > > > > > > > > but
> > >> > > > > > > > > > >>> could be easily convinced that's wrong.
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> Thanks,
> > >> > > > > > > > > > >>> Matt
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> > >> > > > > > bchen11@outlook.com
> > >> > > > > > > >
> > >> > > > > > > > > > wrote:
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>> > Hey folks,
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> > I would like to start a discussion on KIP-389:
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> > This is a pretty simple change to cap the
> > consumer
> > >> > > group
> > >> > > > > size
> > >> > > > > > > for
> > >> > > > > > > > > > >>> broker
> > >> > > > > > > > > > >>> > stability. Give me your valuable feedback when
> > you
> > >> > got
> > >> > > > > time.
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>> > Thank you!
> > >> > > > > > > > > > >>> >
> > >> > > > > > > > > > >>>
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> --
> > >> > > > > > > > > > >> Best,
> > >> > > > > > > > > > >> Stanislav
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > --
> > >> > > > > > > > > > > Best,
> > >> > > > > > > > > > > Stanislav
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > --
> > >> > > > > > > > > > Best,
> > >> > > > > > > > > > Stanislav
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > --
> > >> > > > > > > > Best,
> > >> > > > > > > > Stanislav
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Best,
> > >> > > > > > Stanislav
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > Best,
> > >> > > > Stanislav
> > >> > > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Best,
> > >> > > Stanislav
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > Best,
> > >> > Stanislav
> > >> >
> > >>
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>
>
> --
> Best,
> Stanislav



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Thanks Boyang,

If there aren't any more thoughts on the KIP I'll start a vote thread in
the new year

On Sat, Dec 29, 2018 at 12:58 AM Boyang Chen <bc...@outlook.com> wrote:

> Yep Stanislav, that's what I'm proposing, and your explanation makes sense.
>
> Boyang
>
> ________________________________
> From: Stanislav Kozlovski <st...@confluent.io>
> Sent: Friday, December 28, 2018 7:59 PM
> To: dev@kafka.apache.org
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Hey there everybody, let's work on wrapping this discussion up.
>
> @Boyang, could you clarify what you mean by
> > One more question is whether you feel we should enforce group size cap
> statically or at runtime?
> Is that related to the option of enabling this config via the dynamic
> broker config feature?
>
> Regarding that - I feel it's useful to have and I also think it might not
> introduce additional complexity. As long as we handle the config being
> changed midway through a rebalance (via using the old value) we should be
> good to go.
>
> On Wed, Dec 12, 2018 at 4:12 PM Stanislav Kozlovski <
> stanislav@confluent.io>
> wrote:
>
> > Hey Jason,
> >
> > Yes, that is what I meant by
> > > Given those constraints, I think that we can simply mark the group as
> > `PreparingRebalance` with a rebalanceTimeout of the server setting `
> > group.max.session.timeout.ms`. That's a bit long by default (5 minutes)
> > but I can't seem to come up with a better alternative
> > So either the timeout or all members calling joinGroup, yes
> >
> >
> > On Tue, Dec 11, 2018 at 8:14 PM Boyang Chen <bc...@outlook.com> wrote:
> >
> >> Hey Jason,
> >>
> >> I think this is the correct understanding. One more question is whether
> >> you feel
> >> we should enforce group size cap statically or at runtime?
> >>
> >> Boyang
> >> ________________________________
> >> From: Jason Gustafson <ja...@confluent.io>
> >> Sent: Tuesday, December 11, 2018 3:24 AM
> >> To: dev
> >> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> >> metadata growth
> >>
> >> Hey Stanislav,
> >>
> >> Just to clarify, I think what you're suggesting is something like this
> in
> >> order to gracefully shrink the group:
> >>
> >> 1. Transition the group to PREPARING_REBALANCE. No members are kicked
> out.
> >> 2. Continue to allow offset commits and heartbeats for all current
> >> members.
> >> 3. Allow the first n members that send JoinGroup to stay in the group,
> but
> >> wait for the JoinGroup (or session timeout) from all active members
> before
> >> finishing the rebalance.
> >>
> >> So basically we try to give the current members an opportunity to finish
> >> work, but we prevent some of them from rejoining after the rebalance
> >> completes. It sounds reasonable if I've understood correctly.
> >>
> >> Thanks,
> >> Jason
> >>
> >>
> >>
> >> On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <bc...@outlook.com> wrote:
> >>
> >> > Yep, LGTM on my side. Thanks Stanislav!
> >> > ________________________________
> >> > From: Stanislav Kozlovski <st...@confluent.io>
> >> > Sent: Friday, December 7, 2018 8:51 PM
> >> > To: dev@kafka.apache.org
> >> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> >> > metadata growth
> >> >
> >> > Hi,
> >> >
> >> > We discussed this offline with Boyang and figured that it's best to
> not
> >> > wait on the Cooperative Rebalancing proposal. Our thinking is that we
> >> can
> >> > just force a rebalance from the broker, allowing consumers to commit
> >> > offsets if their rebalanceListener is configured correctly.
> >> > When rebalancing improvements are implemented, we assume that they
> would
> >> > improve KIP-389's behavior as well as the normal rebalance scenarios
> >> >
> >> > On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <bc...@outlook.com>
> >> wrote:
> >> >
> >> > > Hey Stanislav,
> >> > >
> >> > > thanks for the question! `Trivial rebalance` means "we don't start
> >> > > reassignment right now, but you need to know it's coming soon
> >> > > and you should start preparation".
> >> > >
> >> > > An example KStream use case is that before actually starting to
> shrink
> >> > the
> >> > > consumer group, we need to
> >> > > 1. partition the consumer group into two subgroups, where one will
> be
> >> > > offline soon and the other will keep serving;
> >> > > 2. make sure the states associated with near-future offline
> consumers
> >> are
> >> > > successfully replicated on the serving ones.
> >> > >
> >> > > As I have mentioned shrinking the consumer group is pretty much
> >> > equivalent
> >> > > to group scaling down, so we could think of this
> >> > > as an add-on use case for cluster scaling. So my understanding is
> that
> >> > the
> >> > > KIP-389 could be sequenced within our cooperative rebalancing<
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
> >> > > >
> >> > > proposal.
> >> > >
> >> > > Let me know if this makes sense.
> >> > >
> >> > > Best,
> >> > > Boyang
> >> > > ________________________________
> >> > > From: Stanislav Kozlovski <st...@confluent.io>
> >> > > Sent: Wednesday, December 5, 2018 5:52 PM
> >> > > To: dev@kafka.apache.org
> >> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> >> > > metadata growth
> >> > >
> >> > > Hey Boyang,
> >> > >
> >> > > I think we still need to take care of group shrinkage because even
> if
> >> > users
> >> > > change the config value we cannot guarantee that all consumer groups
> >> > would
> >> > > have been manually shrunk.
> >> > >
> >> > > Regarding 2., I agree that forcefully triggering a rebalance might
> be
> >> the
> >> > > most intuitive way to handle the situation.
> >> > > What does a "trivial rebalance" mean? Sorry, I'm not familiar with
> the
> >> > > term.
> >> > > I was thinking that maybe we could force a rebalance, which would
> >> cause
> >> > > consumers to commit their offsets (given their rebalanceListener is
> >> > > configured correctly) and subsequently reject some of the incoming
> >> > > `joinGroup` requests. Does that sound like it would work?
> >> > >
> >> > > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bc...@outlook.com>
> >> wrote:
> >> > >
> >> > > > Hey Stanislav,
> >> > > >
> >> > > > I read the latest KIP and saw that we already changed the default
> >> value
> >> > > to
> >> > > > -1. Do
> >> > > > we still need to take care of the consumer group shrinking when
> >> doing
> >> > the
> >> > > > upgrade?
> >> > > >
> >> > > > However this is an interesting topic that worth discussing.
> Although
> >> > > > rolling
> >> > > > upgrade is fine, `consumer.group.max.size` could always have
> >> conflict
> >> > > with
> >> > > > the current
> >> > > > consumer group size which means we need to adhere to one source of
> >> > truth.
> >> > > >
> >> > > > 1.Choose the current group size, which means we never interrupt
> the
> >> > > > consumer group until
> >> > > > it transits to PREPARE_REBALANCE. And we keep track of how many
> join
> >> > > group
> >> > > > requests
> >> > > > we have seen so far during PREPARE_REBALANCE. After reaching the
> >> > consumer
> >> > > > cap,
> >> > > > we start to inform over provisioned consumers that you should send
> >> > > > LeaveGroupRequest and
> >> > > > fail yourself. Or with what Mayuresh proposed in KIP-345, we could
> >> mark
> >> > > > extra members
> >> > > > as hot backup and rebalance without them.
> >> > > >
> >> > > > 2.Choose the `consumer.group.max.size`. I feel incremental
> >> rebalancing
> >> > > > (you proposed) could be of help here.
> >> > > > When a new cap is enforced, leader should be notified. If the
> >> current
> >> > > > group size is already over limit, leader
> >> > > > shall trigger a trivial rebalance to shuffle some topic partitions
> >> and
> >> > > let
> >> > > > a subset of consumers prepare the ownership
> >> > > > transition. Until they are ready, we trigger a real rebalance to
> >> remove
> >> > > > over-provisioned consumers. It is pretty much
> >> > > > equivalent to `how do we scale down the consumer group without
> >> > > > interrupting the current processing`.
> >> > > >
> >> > > > I personally feel inclined to 2 because we could kill two birds
> with
> >> > one
> >> > > > stone in a generic way. What do you think?
> >> > > >
> >> > > > Boyang
> >> > > > ________________________________
> >> > > > From: Stanislav Kozlovski <st...@confluent.io>
> >> > > > Sent: Monday, December 3, 2018 8:35 PM
> >> > > > To: dev@kafka.apache.org
> >> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> member
> >> > > > metadata growth
> >> > > >
> >> > > > Hi Jason,
> >> > > >
> >> > > > > 2. Do you think we should make this a dynamic config?
> >> > > > I'm not sure. Looking at the config from the perspective of a
> >> > > prescriptive
> >> > > > config, we may get away with not updating it dynamically.
> >> > > > But in my opinion, it always makes sense to have a config be
> >> > dynamically
> >> > > > configurable. As long as we limit it to being a cluster-wide
> >> config, we
> >> > > > should be fine.
> >> > > >
> >> > > > > 1. I think it would be helpful to clarify the details on how the
> >> > > > coordinator will shrink the group. It will need to choose which
> >> members
> >> > > to
> >> > > > remove. Are we going to give current members an opportunity to
> >> commit
> >> > > > offsets before kicking them from the group?
> >> > > >
> >> > > > This turns out to be somewhat tricky. I think that we may not be
> >> able
> >> > to
> >> > > > guarantee that consumers don't process a message twice.
> >> > > > My initial approach was to do as much as we could to let consumers
> >> > commit
> >> > > > offsets.
> >> > > >
> >> > > > I was thinking that we mark a group to be shrunk, we could keep a
> >> map
> >> > of
> >> > > > consumer_id->boolean indicating whether they have committed
> >> offsets. I
> >> > > then
> >> > > > thought we could delay the rebalance until every consumer commits
> >> (or
> >> > > some
> >> > > > time passes).
> >> > > > In the meantime, we would block all incoming fetch calls (by
> either
> >> > > > returning empty records or a retriable error) and we would
> continue
> >> to
> >> > > > accept offset commits (even twice for a single consumer)
> >> > > >
> >> > > > I see two problems with this approach:
> >> > > > * We have async offset commits, which implies that we can receive
> >> fetch
> >> > > > requests before the offset commit req has been handled. i.e
> consmer
> >> > sends
> >> > > > fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in
> the
> >> > > > broker. Meaning we could have saved the offsets for B but
> rebalance
> >> > > before
> >> > > > the offsetCommit for the offsets processed in C come in.
> >> > > > * KIP-392 Allow consumers to fetch from closest replica
> >> > > > <
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> >> > > > >
> >> > > > would
> >> > > > make it significantly harder to block poll() calls on consumers
> >> whose
> >> > > > groups are being shrunk. Even if we implemented a solution, the
> same
> >> > race
> >> > > > condition noted above seems to apply and probably others
> >> > > >
> >> > > >
> >> > > > Given those constraints, I think that we can simply mark the group
> >> as
> >> > > > `PreparingRebalance` with a rebalanceTimeout of the server
> setting `
> >> > > > group.max.session.timeout.ms`. That's a bit long by default (5
> >> > minutes)
> >> > > > but
> >> > > > I can't seem to come up with a better alternative
> >> > > >
> >> > > > I'm interested in hearing your thoughts.
> >> > > >
> >> > > > Thanks,
> >> > > > Stanislav
> >> > > >
> >> > > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <
> jason@confluent.io
> >> >
> >> > > > wrote:
> >> > > >
> >> > > > > Hey Stanislav,
> >> > > > >
> >> > > > > What do you think about the use case I mentioned in my previous
> >> reply
> >> > > > about
> >> > > > > > a more resilient self-service Kafka? I believe the benefit
> >> there is
> >> > > > > bigger.
> >> > > > >
> >> > > > >
> >> > > > > I see this config as analogous to the open file limit. Probably
> >> this
> >> > > > limit
> >> > > > > was intended to be prescriptive at some point about what was
> >> deemed a
> >> > > > > reasonable number of open files for an application. But mostly
> >> people
> >> > > > treat
> >> > > > > it as an annoyance which they have to work around. If it happens
> >> to
> >> > be
> >> > > > hit,
> >> > > > > usually you just increase it because it is not tied to an actual
> >> > > resource
> >> > > > > constraint. However, occasionally hitting the limit does
> indicate
> >> an
> >> > > > > application bug such as a leak, so I wouldn't say it is useless.
> >> > > > Similarly,
> >> > > > > the issue in KAFKA-7610 was a consumer leak and having this
> limit
> >> > would
> >> > > > > have allowed the problem to be detected before it impacted the
> >> > cluster.
> >> > > > To
> >> > > > > me, that's the main benefit. It's possible that it could be used
> >> > > > > prescriptively to prevent poor usage of groups, but like the
> open
> >> > file
> >> > > > > limit, I suspect administrators will just set it large enough
> that
> >> > > users
> >> > > > > are unlikely to complain.
> >> > > > >
> >> > > > > Anyway, just a couple additional questions:
> >> > > > >
> >> > > > > 1. I think it would be helpful to clarify the details on how the
> >> > > > > coordinator will shrink the group. It will need to choose which
> >> > members
> >> > > > to
> >> > > > > remove. Are we going to give current members an opportunity to
> >> commit
> >> > > > > offsets before kicking them from the group?
> >> > > > >
> >> > > > > 2. Do you think we should make this a dynamic config?
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Jason
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> >> > > > > stanislav@confluent.io>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hi Jason,
> >> > > > > >
> >> > > > > > You raise some very valid points.
> >> > > > > >
> >> > > > > > > The benefit of this KIP is probably limited to preventing
> >> > "runaway"
> >> > > > > > consumer groups due to leaks or some other application bug
> >> > > > > > What do you think about the use case I mentioned in my
> previous
> >> > reply
> >> > > > > about
> >> > > > > > a more resilient self-service Kafka? I believe the benefit
> >> there is
> >> > > > > bigger
> >> > > > > >
> >> > > > > > * Default value
> >> > > > > > You're right, we probably do need to be conservative. Big
> >> consumer
> >> > > > groups
> >> > > > > > are considered an anti-pattern and my goal was to also hint at
> >> this
> >> > > > > through
> >> > > > > > the config's default. Regardless, it is better to not have the
> >> > > > potential
> >> > > > > to
> >> > > > > > break applications with an upgrade.
> >> > > > > > Choosing between the default of something big like 5000 or an
> >> > opt-in
> >> > > > > > option, I think we should go with the *disabled default
> option*
> >> > > (-1).
> >> > > > > > The only benefit we would get from a big default of 5000 is
> >> default
> >> > > > > > protection against buggy/malicious applications that hit the
> >> > > KAFKA-7610
> >> > > > > > issue.
> >> > > > > > While this KIP was spawned from that issue, I believe its
> value
> >> is
> >> > > > > enabling
> >> > > > > > the possibility of protection and helping move towards a more
> >> > > > > self-service
> >> > > > > > Kafka. I also think that a default value of 5000 might be
> >> > misleading
> >> > > to
> >> > > > > > users and lead them to think that big consumer groups (> 250)
> >> are a
> >> > > > good
> >> > > > > > thing.
> >> > > > > >
> >> > > > > > The good news is that KAFKA-7610 should be fully resolved and
> >> the
> >> > > > > rebalance
> >> > > > > > protocol should, in general, be more solid after the planned
> >> > > > improvements
> >> > > > > > in KIP-345 and KIP-394.
> >> > > > > >
> >> > > > > > * Handling bigger groups during upgrade
> >> > > > > > I now see that we store the state of consumer groups in the
> log
> >> and
> >> > > > why a
> >> > > > > > rebalance isn't expected during a rolling upgrade.
> >> > > > > > Since we're going with the default value of the max.size being
> >> > > > disabled,
> >> > > > > I
> >> > > > > > believe we can afford to be more strict here.
> >> > > > > > During state reloading of a new Coordinator with a defined
> >> > > > max.group.size
> >> > > > > > config, I believe we should *force* rebalances for groups that
> >> > exceed
> >> > > > the
> >> > > > > > configured size. Then, only some consumers will be able to
> join
> >> and
> >> > > the
> >> > > > > max
> >> > > > > > size invariant will be satisfied.
> >> > > > > >
> >> > > > > > I updated the KIP with a migration plan, rejected alternatives
> >> and
> >> > > the
> >> > > > > new
> >> > > > > > default value.
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Stanislav
> >> > > > > >
> >> > > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <
> >> > jason@confluent.io>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > Hey Stanislav,
> >> > > > > > >
> >> > > > > > > Clients will then find that coordinator
> >> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
> >> group,
> >> > > since
> >> > > > > the
> >> > > > > > > > cache of active consumers is not stored outside the
> >> > Coordinator's
> >> > > > > > memory.
> >> > > > > > > > (please do say if that is incorrect)
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > Groups do not typically rebalance after a coordinator
> change.
> >> You
> >> > > > could
> >> > > > > > > potentially force a rebalance if the group is too big and
> kick
> >> > out
> >> > > > the
> >> > > > > > > slowest members or something. A more graceful solution is
> >> > probably
> >> > > to
> >> > > > > > just
> >> > > > > > > accept the current size and prevent it from getting bigger.
> We
> >> > > could
> >> > > > > log
> >> > > > > > a
> >> > > > > > > warning potentially.
> >> > > > > > >
> >> > > > > > > My thinking is that we should abstract away from conserving
> >> > > resources
> >> > > > > and
> >> > > > > > > > focus on giving control to the broker. The issue that
> >> spawned
> >> > > this
> >> > > > > KIP
> >> > > > > > > was
> >> > > > > > > > a memory problem but I feel this change is useful in a
> more
> >> > > general
> >> > > > > > way.
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > So you probably already know why I'm asking about this. For
> >> > > consumer
> >> > > > > > groups
> >> > > > > > > anyway, resource usage would typically be proportional to
> the
> >> > > number
> >> > > > of
> >> > > > > > > partitions that a group is reading from and not the number
> of
> >> > > > members.
> >> > > > > > For
> >> > > > > > > example, consider the memory use in the offsets cache. The
> >> > benefit
> >> > > of
> >> > > > > > this
> >> > > > > > > KIP is probably limited to preventing "runaway" consumer
> >> groups
> >> > due
> >> > > > to
> >> > > > > > > leaks or some other application bug. That still seems useful
> >> > > though.
> >> > > > > > >
> >> > > > > > > I completely agree with this and I *ask everybody to chime
> in
> >> > with
> >> > > > > > opinions
> >> > > > > > > > on a sensible default value*.
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > I think we would have to be very conservative. The group
> >> protocol
> >> > > is
> >> > > > > > > generic in some sense, so there may be use cases we don't
> >> know of
> >> > > > where
> >> > > > > > > larger groups are reasonable. Probably we should make this
> an
> >> > > opt-in
> >> > > > > > > feature so that we do not risk breaking anyone's application
> >> > after
> >> > > an
> >> > > > > > > upgrade. Either that, or use a very high default like 5,000.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > > Jason
> >> > > > > > >
> >> > > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> >> > > > > > > stanislav@confluent.io>
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Hey Jason and Boyang, those were important comments
> >> > > > > > > >
> >> > > > > > > > > One suggestion I have is that it would be helpful to put
> >> your
> >> > > > > > reasoning
> >> > > > > > > > on deciding the current default value. For example, in
> >> certain
> >> > > use
> >> > > > > > cases
> >> > > > > > > at
> >> > > > > > > > Pinterest we are very likely to have more consumers than
> 250
> >> > when
> >> > > > we
> >> > > > > > > > configure 8 stream instances with 32 threads.
> >> > > > > > > > > For the effectiveness of this KIP, we should encourage
> >> people
> >> > > to
> >> > > > > > > discuss
> >> > > > > > > > their opinions on the default setting and ideally reach a
> >> > > > consensus.
> >> > > > > > > >
> >> > > > > > > > I completely agree with this and I *ask everybody to chime
> >> in
> >> > > with
> >> > > > > > > opinions
> >> > > > > > > > on a sensible default value*.
> >> > > > > > > > My thought process was that in the current model
> rebalances
> >> in
> >> > > > large
> >> > > > > > > groups
> >> > > > > > > > are more costly. I imagine most use cases in most Kafka
> >> users
> >> > do
> >> > > > not
> >> > > > > > > > require more than 250 consumers.
> >> > > > > > > > Boyang, you say that you are "likely to have... when
> we..."
> >> -
> >> > do
> >> > > > you
> >> > > > > > have
> >> > > > > > > > systems running with so many consumers in a group or are
> you
> >> > > > planning
> >> > > > > > > to? I
> >> > > > > > > > guess what I'm asking is whether this has been tested in
> >> > > production
> >> > > > > > with
> >> > > > > > > > the current rebalance model (ignoring KIP-345)
> >> > > > > > > >
> >> > > > > > > > >  Can you clarify the compatibility impact here? What
> >> > > > > > > > > will happen to groups that are already larger than the
> max
> >> > > size?
> >> > > > > > > > This is a very important question.
> >> > > > > > > > From my current understanding, when a coordinator broker
> >> gets
> >> > > shut
> >> > > > > > > > down during a cluster rolling upgrade, a replica will take
> >> > > > leadership
> >> > > > > > of
> >> > > > > > > > the `__offset_commits` partition. Clients will then find
> >> that
> >> > > > > > coordinator
> >> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
> >> group,
> >> > > since
> >> > > > > the
> >> > > > > > > > cache of active consumers is not stored outside the
> >> > Coordinator's
> >> > > > > > memory.
> >> > > > > > > > (please do say if that is incorrect)
> >> > > > > > > > Then, I believe that working as if this is a new group is
> a
> >> > > > > reasonable
> >> > > > > > > > approach. Namely, fail joinGroups when the max.size is
> >> > exceeded.
> >> > > > > > > > What do you guys think about this? (I'll update the KIP
> >> after
> >> > we
> >> > > > > settle
> >> > > > > > > on
> >> > > > > > > > a solution)
> >> > > > > > > >
> >> > > > > > > > >  Also, just to be clear, the resource we are trying to
> >> > conserve
> >> > > > > here
> >> > > > > > is
> >> > > > > > > > what? Memory?
> >> > > > > > > > My thinking is that we should abstract away from
> conserving
> >> > > > resources
> >> > > > > > and
> >> > > > > > > > focus on giving control to the broker. The issue that
> >> spawned
> >> > > this
> >> > > > > KIP
> >> > > > > > > was
> >> > > > > > > > a memory problem but I feel this change is useful in a
> more
> >> > > general
> >> > > > > > way.
> >> > > > > > > It
> >> > > > > > > > limits the control clients have on the cluster and helps
> >> Kafka
> >> > > > > become a
> >> > > > > > > > more self-serving system. Admin/Ops teams can better
> control
> >> > the
> >> > > > > impact
> >> > > > > > > > application developers can have on a Kafka cluster with
> this
> >> > > change
> >> > > > > > > >
> >> > > > > > > > Best,
> >> > > > > > > > Stanislav
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> >> > > > jason@confluent.io>
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > Hi Stanislav,
> >> > > > > > > > >
> >> > > > > > > > > Thanks for the KIP. Can you clarify the compatibility
> >> impact
> >> > > > here?
> >> > > > > > What
> >> > > > > > > > > will happen to groups that are already larger than the
> max
> >> > > size?
> >> > > > > > Also,
> >> > > > > > > > just
> >> > > > > > > > > to be clear, the resource we are trying to conserve here
> >> is
> >> > > what?
> >> > > > > > > Memory?
> >> > > > > > > > >
> >> > > > > > > > > -Jason
> >> > > > > > > > >
> >> > > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
> >> > > bchen11@outlook.com
> >> > > > >
> >> > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Thanks Stanislav for the update! One suggestion I have
> >> is
> >> > > that
> >> > > > it
> >> > > > > > > would
> >> > > > > > > > > be
> >> > > > > > > > > > helpful to put your
> >> > > > > > > > > >
> >> > > > > > > > > > reasoning on deciding the current default value. For
> >> > example,
> >> > > > in
> >> > > > > > > > certain
> >> > > > > > > > > > use cases at Pinterest we are very likely
> >> > > > > > > > > >
> >> > > > > > > > > > to have more consumers than 250 when we configure 8
> >> stream
> >> > > > > > instances
> >> > > > > > > > with
> >> > > > > > > > > > 32 threads.
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > For the effectiveness of this KIP, we should encourage
> >> > people
> >> > > > to
> >> > > > > > > > discuss
> >> > > > > > > > > > their opinions on the default setting and ideally
> reach
> >> a
> >> > > > > > consensus.
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > Best,
> >> > > > > > > > > >
> >> > > > > > > > > > Boyang
> >> > > > > > > > > >
> >> > > > > > > > > > ________________________________
> >> > > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> >> > > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> >> > > > > > > > > > To: dev@kafka.apache.org
> >> > > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size
> >> to
> >> > cap
> >> > > > > > member
> >> > > > > > > > > > metadata growth
> >> > > > > > > > > >
> >> > > > > > > > > > Hey everybody,
> >> > > > > > > > > >
> >> > > > > > > > > > It's been a week since this KIP and not much
> discussion
> >> has
> >> > > > been
> >> > > > > > > made.
> >> > > > > > > > > > I assume that this is a straight forward change and I
> >> will
> >> > > > open a
> >> > > > > > > > voting
> >> > > > > > > > > > thread in the next couple of days if nobody has
> >> anything to
> >> > > > > > suggest.
> >> > > > > > > > > >
> >> > > > > > > > > > Best,
> >> > > > > > > > > > Stanislav
> >> > > > > > > > > >
> >> > > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> >> > > > > > > > > > stanislav@confluent.io>
> >> > > > > > > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Greetings everybody,
> >> > > > > > > > > > >
> >> > > > > > > > > > > I have enriched the KIP a bit with a bigger
> Motivation
> >> > > > section
> >> > > > > > and
> >> > > > > > > > also
> >> > > > > > > > > > > renamed it.
> >> > > > > > > > > > > KIP:
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> >> > > > > > > > > > >
> >> > > > > > > > > > > I'm looking forward to discussions around it.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Best,
> >> > > > > > > > > > > Stanislav
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski
> <
> >> > > > > > > > > > > stanislav@confluent.io> wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > >> Hey there everybody,
> >> > > > > > > > > > >>
> >> > > > > > > > > > >> Thanks for the introduction Boyang. I appreciate
> the
> >> > > effort
> >> > > > > you
> >> > > > > > > are
> >> > > > > > > > > > >> putting into improving consumer behavior in Kafka.
> >> > > > > > > > > > >>
> >> > > > > > > > > > >> @Matt
> >> > > > > > > > > > >> I also believe the default value is high. In my
> >> opinion,
> >> > > we
> >> > > > > > should
> >> > > > > > > > aim
> >> > > > > > > > > > to
> >> > > > > > > > > > >> a default cap around 250. This is because in the
> >> current
> >> > > > model
> >> > > > > > any
> >> > > > > > > > > > consumer
> >> > > > > > > > > > >> rebalance is disrupting to every consumer. The
> bigger
> >> > the
> >> > > > > group,
> >> > > > > > > the
> >> > > > > > > > > > longer
> >> > > > > > > > > > >> this period of disruption.
> >> > > > > > > > > > >>
> >> > > > > > > > > > >> If you have such a large consumer group, chances
> are
> >> > that
> >> > > > your
> >> > > > > > > > > > >> client-side logic could be structured better and
> that
> >> > you
> >> > > > are
> >> > > > > > not
> >> > > > > > > > > using
> >> > > > > > > > > > the
> >> > > > > > > > > > >> high number of consumers to achieve high
> throughput.
> >> > > > > > > > > > >> 250 can still be considered of a high upper bound,
> I
> >> > > believe
> >> > > > > in
> >> > > > > > > > > practice
> >> > > > > > > > > > >> users should aim to not go over 100 consumers per
> >> > consumer
> >> > > > > > group.
> >> > > > > > > > > > >>
> >> > > > > > > > > > >> In regards to the cap being global/per-broker, I
> >> think
> >> > > that
> >> > > > we
> >> > > > > > > > should
> >> > > > > > > > > > >> consider whether we want it to be global or
> >> *per-topic*.
> >> > > For
> >> > > > > the
> >> > > > > > > > time
> >> > > > > > > > > > >> being, I believe that having it per-topic with a
> >> global
> >> > > > > default
> >> > > > > > > > might
> >> > > > > > > > > be
> >> > > > > > > > > > >> the best situation. Having it global only seems a
> bit
> >> > > > > > restricting
> >> > > > > > > to
> >> > > > > > > > > me
> >> > > > > > > > > > and
> >> > > > > > > > > > >> it never hurts to support more fine-grained
> >> > > configurability
> >> > > > > > (given
> >> > > > > > > > > it's
> >> > > > > > > > > > the
> >> > > > > > > > > > >> same config, not a new one being introduced).
> >> > > > > > > > > > >>
> >> > > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> >> > > > > > bchen11@outlook.com
> >> > > > > > > >
> >> > > > > > > > > > wrote:
> >> > > > > > > > > > >>
> >> > > > > > > > > > >>> Thanks Matt for the suggestion! I'm still open to
> >> any
> >> > > > > > suggestion
> >> > > > > > > to
> >> > > > > > > > > > >>> change the default value. Meanwhile I just want to
> >> > point
> >> > > > out
> >> > > > > > that
> >> > > > > > > > > this
> >> > > > > > > > > > >>> value is a just last line of defense, not a real
> >> > scenario
> >> > > > we
> >> > > > > > > would
> >> > > > > > > > > > expect.
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>> In the meanwhile, I discussed with Stanislav and
> he
> >> > would
> >> > > > be
> >> > > > > > > > driving
> >> > > > > > > > > > the
> >> > > > > > > > > > >>> 389 effort from now on. Stanislav proposed the
> idea
> >> in
> >> > > the
> >> > > > > > first
> >> > > > > > > > > place
> >> > > > > > > > > > and
> >> > > > > > > > > > >>> had already come up a draft design, while I will
> >> keep
> >> > > > > focusing
> >> > > > > > on
> >> > > > > > > > > > KIP-345
> >> > > > > > > > > > >>> effort to ensure solving the edge case described
> in
> >> the
> >> > > > JIRA<
> >> > > > > > > > > > >>>
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://issues.apache.org/jira/browse/KAFKA-7610
> >> > > > > > > > > > >.
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>> Thank you Stanislav for making this happen!
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>> Boyang
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>> ________________________________
> >> > > > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> >> > > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> >> > > > > > > > > > >>> To: dev@kafka.apache.org
> >> > > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce
> >> group.max.size
> >> > to
> >> > > > cap
> >> > > > > > > > member
> >> > > > > > > > > > >>> metadata growth
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>> Thanks for the KIP.
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>> Will this cap be a global cap across the entire
> >> cluster
> >> > > or
> >> > > > > per
> >> > > > > > > > > broker?
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>> Either way the default value seems a bit high to
> me,
> >> > but
> >> > > > that
> >> > > > > > > could
> >> > > > > > > > > > just
> >> > > > > > > > > > >>> be
> >> > > > > > > > > > >>> from my own usage patterns. I'd have probably
> >> started
> >> > > with
> >> > > > > 500
> >> > > > > > or
> >> > > > > > > > 1k
> >> > > > > > > > > > but
> >> > > > > > > > > > >>> could be easily convinced that's wrong.
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>> Thanks,
> >> > > > > > > > > > >>> Matt
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> >> > > > > > bchen11@outlook.com
> >> > > > > > > >
> >> > > > > > > > > > wrote:
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>> > Hey folks,
> >> > > > > > > > > > >>> >
> >> > > > > > > > > > >>> >
> >> > > > > > > > > > >>> > I would like to start a discussion on KIP-389:
> >> > > > > > > > > > >>> >
> >> > > > > > > > > > >>> >
> >> > > > > > > > > > >>> >
> >> > > > > > > > > > >>>
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> >> > > > > > > > > > >>> >
> >> > > > > > > > > > >>> >
> >> > > > > > > > > > >>> > This is a pretty simple change to cap the
> consumer
> >> > > group
> >> > > > > size
> >> > > > > > > for
> >> > > > > > > > > > >>> broker
> >> > > > > > > > > > >>> > stability. Give me your valuable feedback when
> you
> >> > got
> >> > > > > time.
> >> > > > > > > > > > >>> >
> >> > > > > > > > > > >>> >
> >> > > > > > > > > > >>> > Thank you!
> >> > > > > > > > > > >>> >
> >> > > > > > > > > > >>>
> >> > > > > > > > > > >>
> >> > > > > > > > > > >>
> >> > > > > > > > > > >> --
> >> > > > > > > > > > >> Best,
> >> > > > > > > > > > >> Stanislav
> >> > > > > > > > > > >>
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > --
> >> > > > > > > > > > > Best,
> >> > > > > > > > > > > Stanislav
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > --
> >> > > > > > > > > > Best,
> >> > > > > > > > > > Stanislav
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > --
> >> > > > > > > > Best,
> >> > > > > > > > Stanislav
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > --
> >> > > > > > Best,
> >> > > > > > Stanislav
> >> > > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Best,
> >> > > > Stanislav
> >> > > >
> >> > >
> >> > >
> >> > > --
> >> > > Best,
> >> > > Stanislav
> >> > >
> >> >
> >> >
> >> > --
> >> > Best,
> >> > Stanislav
> >> >
> >>
> >
> >
> > --
> > Best,
> > Stanislav
> >
>
>
> --
> Best,
> Stanislav
>


-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Boyang Chen <bc...@outlook.com>.
Yep Stanislav, that's what I'm proposing, and your explanation makes sense.

Boyang

________________________________
From: Stanislav Kozlovski <st...@confluent.io>
Sent: Friday, December 28, 2018 7:59 PM
To: dev@kafka.apache.org
Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Hey there everybody, let's work on wrapping this discussion up.

@Boyang, could you clarify what you mean by
> One more question is whether you feel we should enforce group size cap
statically or at runtime?
Is that related to the option of enabling this config via the dynamic
broker config feature?

Regarding that - I feel it's useful to have and I also think it might not
introduce additional complexity. As long as we handle the config being
changed midway through a rebalance (via using the old value) we should be
good to go.
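
For illustration, a minimal sketch of what flipping such a cluster-wide default could look like from the Java AdminClient, assuming the cap really were exposed as a dynamic broker config and assuming it keeps the name group.max.size (neither the name nor the dynamic support is settled in this thread; if it stays static it would simply go into server.properties instead):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class SetGroupSizeCap {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // An empty broker id addresses the cluster-wide default entry
                // of the dynamic broker config feature.
                ConfigResource clusterDefault =
                        new ConfigResource(ConfigResource.Type.BROKER, "");
                // Hypothetical config name and value.
                Config cap = new Config(Collections.singleton(
                        new ConfigEntry("group.max.size", "250")));
                admin.alterConfigs(Collections.singletonMap(clusterDefault, cap)).all().get();
            }
        }
    }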

On Wed, Dec 12, 2018 at 4:12 PM Stanislav Kozlovski <st...@confluent.io>
wrote:

> Hey Jason,
>
> Yes, that is what I meant by
> > Given those constraints, I think that we can simply mark the group as
> `PreparingRebalance` with a rebalanceTimeout of the server setting `
> group.max.session.timeout.ms`. That's a bit long by default (5 minutes)
> but I can't seem to come up with a better alternative
> So either the timeout or all members calling joinGroup, yes
>
>
> On Tue, Dec 11, 2018 at 8:14 PM Boyang Chen <bc...@outlook.com> wrote:
>
>> Hey Jason,
>>
>> I think this is the correct understanding. One more question is whether
>> you feel
>> we should enforce group size cap statically or at runtime?
>>
>> Boyang
>> ________________________________
>> From: Jason Gustafson <ja...@confluent.io>
>> Sent: Tuesday, December 11, 2018 3:24 AM
>> To: dev
>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
>> metadata growth
>>
>> Hey Stanislav,
>>
>> Just to clarify, I think what you're suggesting is something like this in
>> order to gracefully shrink the group:
>>
>> 1. Transition the group to PREPARING_REBALANCE. No members are kicked out.
>> 2. Continue to allow offset commits and heartbeats for all current
>> members.
>> 3. Allow the first n members that send JoinGroup to stay in the group, but
>> wait for the JoinGroup (or session timeout) from all active members before
>> finishing the rebalance.
>>
>> So basically we try to give the current members an opportunity to finish
>> work, but we prevent some of them from rejoining after the rebalance
>> completes. It sounds reasonable if I've understood correctly.
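
Purely to make step 3 concrete, here is a rough Java-flavoured sketch of that selection logic; it is not the actual coordinator code (which is Scala on the broker and tracks much more state), just the shape of the idea:

    import java.util.LinkedHashSet;
    import java.util.Set;

    // Illustrative only: keep the first maxSize members that rejoin during the
    // shrinking rebalance and turn the rest away once the cap has been reached.
    final class ShrinkingGroup {
        private final int maxSize;
        private final Set<String> retained = new LinkedHashSet<>();

        ShrinkingGroup(int maxSize) {
            this.maxSize = maxSize;
        }

        // Called for every JoinGroup received while the group is PREPARING_REBALANCE.
        synchronized boolean acceptJoin(String memberId) {
            if (retained.contains(memberId) || retained.size() < maxSize) {
                retained.add(memberId);   // this member keeps its place in the shrunk group
                return true;
            }
            return false;                 // over the cap: not readmitted after the rebalance
        }
    }

The rebalance itself would still only complete once every active member has rejoined or its session timeout has expired, exactly as step 3 describes.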
>>
>> Thanks,
>> Jason
>>
>>
>>
>> On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <bc...@outlook.com> wrote:
>>
>> > Yep, LGTM on my side. Thanks Stanislav!
>> > ________________________________
>> > From: Stanislav Kozlovski <st...@confluent.io>
>> > Sent: Friday, December 7, 2018 8:51 PM
>> > To: dev@kafka.apache.org
>> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
>> > metadata growth
>> >
>> > Hi,
>> >
>> > We discussed this offline with Boyang and figured that it's best to not
>> > wait on the Cooperative Rebalancing proposal. Our thinking is that we
>> can
>> > just force a rebalance from the broker, allowing consumers to commit
>> > offsets if their rebalanceListener is configured correctly.
>> > When rebalancing improvements are implemented, we assume that they would
>> > improve KIP-389's behavior as well as the normal rebalance scenarios
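
To make "configured correctly" concrete, a minimal sketch of such a rebalanceListener is shown below; auto-commit is disabled and my-group / my-topic are placeholder names:

    import java.time.Duration;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class CommitOnRevoke {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "my-group");
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Commit the current consumed position before ownership moves,
                    // so whoever takes over (or this member, if it rejoins) resumes from here.
                    consumer.commitSync();
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // nothing extra needed for this scenario
                }
            });

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                records.forEach(record -> {
                    // process the record; the listener above handles commits on rebalance
                });
            }
        }
    }

Committing in onPartitionsRevoked is what lets a broker-forced rebalance shrink the group without discarding the progress of the members that end up being removed.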
>> >
>> > On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <bc...@outlook.com>
>> wrote:
>> >
>> > > Hey Stanislav,
>> > >
>> > > thanks for the question! `Trivial rebalance` means "we don't start
>> > > reassignment right now, but you need to know it's coming soon
>> > > and you should start preparation".
>> > >
>> > > An example KStream use case is that before actually starting to shrink
>> > the
>> > > consumer group, we need to
>> > > 1. partition the consumer group into two subgroups, where one will be
>> > > offline soon and the other will keep serving;
>> > > 2. make sure the states associated with near-future offline consumers
>> are
>> > > successfully replicated on the serving ones.
>> > >
>> > > As I have mentioned shrinking the consumer group is pretty much
>> > equivalent
>> > > to group scaling down, so we could think of this
>> > > as an add-on use case for cluster scaling. So my understanding is that
>> > the
>> > > KIP-389 could be sequenced within our cooperative rebalancing<
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
>> > > >
>> > > proposal.
>> > >
>> > > Let me know if this makes sense.
>> > >
>> > > Best,
>> > > Boyang
>> > > ________________________________
>> > > From: Stanislav Kozlovski <st...@confluent.io>
>> > > Sent: Wednesday, December 5, 2018 5:52 PM
>> > > To: dev@kafka.apache.org
>> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
>> > > metadata growth
>> > >
>> > > Hey Boyang,
>> > >
>> > > I think we still need to take care of group shrinkage because even if
>> > users
>> > > change the config value we cannot guarantee that all consumer groups
>> > would
>> > > have been manually shrunk.
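
One way for an operator to get ahead of that, sketched here with the Java AdminClient, is to audit existing group sizes before enabling the cap at all; the 250 below is only a placeholder for whatever limit is eventually chosen:

    import java.util.Collection;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.ConsumerGroupDescription;
    import org.apache.kafka.clients.admin.ConsumerGroupListing;

    public class GroupSizeAudit {
        // Placeholder for whatever limit the operator plans to enforce.
        private static final int PLANNED_CAP = 250;

        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                Collection<ConsumerGroupListing> groups = admin.listConsumerGroups().all().get();
                for (ConsumerGroupListing listing : groups) {
                    ConsumerGroupDescription description = admin
                            .describeConsumerGroups(Collections.singleton(listing.groupId()))
                            .all().get().get(listing.groupId());
                    int size = description.members().size();
                    if (size > PLANNED_CAP) {
                        System.out.printf("group %s has %d members, above the planned cap of %d%n",
                                listing.groupId(), size, PLANNED_CAP);
                    }
                }
            }
        }
    }

Any group flagged by such a check would need to be shrunk (or the cap raised) before the new config takes effect.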
>> > >
>> > > Regarding 2., I agree that forcefully triggering a rebalance might be
>> the
>> > > most intuitive way to handle the situation.
>> > > What does a "trivial rebalance" mean? Sorry, I'm not familiar with the
>> > > term.
>> > > I was thinking that maybe we could force a rebalance, which would
>> cause
>> > > consumers to commit their offsets (given their rebalanceListener is
>> > > configured correctly) and subsequently reject some of the incoming
>> > > `joinGroup` requests. Does that sound like it would work?
>> > >
>> > > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bc...@outlook.com>
>> wrote:
>> > >
>> > > > Hey Stanislav,
>> > > >
>> > > > I read the latest KIP and saw that we already changed the default
>> value
>> > > to
>> > > > -1. Do
>> > > > we still need to take care of the consumer group shrinking when
>> doing
>> > the
>> > > > upgrade?
>> > > >
>> > > > However, this is an interesting topic that is worth discussing. Although
>> > > > rolling
>> > > > upgrade is fine, `consumer.group.max.size` could always have
>> conflict
>> > > with
>> > > > the current
>> > > > consumer group size which means we need to adhere to one source of
>> > truth.
>> > > >
>> > > > 1.Choose the current group size, which means we never interrupt the
>> > > > consumer group until
>> > > > it transits to PREPARE_REBALANCE. And we keep track of how many join
>> > > group
>> > > > requests
>> > > > we have seen so far during PREPARE_REBALANCE. After reaching the
>> > consumer
>> > > > cap,
>> > > > we start to inform over provisioned consumers that you should send
>> > > > LeaveGroupRequest and
>> > > > fail yourself. Or with what Mayuresh proposed in KIP-345, we could
>> mark
>> > > > extra members
>> > > > as hot backup and rebalance without them.
>> > > >
>> > > > 2.Choose the `consumer.group.max.size`. I feel incremental
>> rebalancing
>> > > > (you proposed) could be of help here.
>> > > > When a new cap is enforced, leader should be notified. If the
>> current
>> > > > group size is already over limit, leader
>> > > > shall trigger a trivial rebalance to shuffle some topic partitions
>> and
>> > > let
>> > > > a subset of consumers prepare the ownership
>> > > > transition. Until they are ready, we trigger a real rebalance to
>> remove
>> > > > over-provisioned consumers. It is pretty much
>> > > > equivalent to `how do we scale down the consumer group without
>> > > > interrupting the current processing`.
>> > > >
>> > > > I personally feel inclined to 2 because we could kill two birds with
>> > one
>> > > > stone in a generic way. What do you think?
>> > > >
>> > > > Boyang
>> > > > ________________________________
>> > > > From: Stanislav Kozlovski <st...@confluent.io>
>> > > > Sent: Monday, December 3, 2018 8:35 PM
>> > > > To: dev@kafka.apache.org
>> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
>> > > > metadata growth
>> > > >
>> > > > Hi Jason,
>> > > >
>> > > > > 2. Do you think we should make this a dynamic config?
>> > > > I'm not sure. Looking at the config from the perspective of a
>> > > prescriptive
>> > > > config, we may get away with not updating it dynamically.
>> > > > But in my opinion, it always makes sense to have a config be
>> > dynamically
>> > > > configurable. As long as we limit it to being a cluster-wide
>> config, we
>> > > > should be fine.
>> > > >
>> > > > > 1. I think it would be helpful to clarify the details on how the
>> > > > coordinator will shrink the group. It will need to choose which
>> members
>> > > to
>> > > > remove. Are we going to give current members an opportunity to
>> commit
>> > > > offsets before kicking them from the group?
>> > > >
>> > > > This turns out to be somewhat tricky. I think that we may not be
>> able
>> > to
>> > > > guarantee that consumers don't process a message twice.
>> > > > My initial approach was to do as much as we could to let consumers
>> > commit
>> > > > offsets.
>> > > >
>> > > > I was thinking that when we mark a group to be shrunk, we could keep a
>> map
>> > of
>> > > > consumer_id->boolean indicating whether they have committed
>> offsets. I
>> > > then
>> > > > thought we could delay the rebalance until every consumer commits
>> (or
>> > > some
>> > > > time passes).
>> > > > In the meantime, we would block all incoming fetch calls (by either
>> > > > returning empty records or a retriable error) and we would continue
>> to
>> > > > accept offset commits (even twice for a single consumer)
>> > > >
>> > > > I see two problems with this approach:
>> > > > * We have async offset commits, which implies that we can receive
>> fetch
>> > > > requests before the offset commit req has been handled, i.e. consumer
>> > sends
>> > > > fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in the
>> > > > broker. Meaning we could have saved the offsets for B but rebalance
>> > > before
>> > > > the offsetCommit for the offsets processed in C come in.
>> > > > * KIP-392 Allow consumers to fetch from closest replica
>> > > > <
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
>> > > > >
>> > > > would
>> > > > make it significantly harder to block poll() calls on consumers
>> whose
>> > > > groups are being shrunk. Even if we implemented a solution, the same
>> > race
>> > > > condition noted above seems to apply and probably others
>> > > >
>> > > >
>> > > > Given those constraints, I think that we can simply mark the group
>> as
>> > > > `PreparingRebalance` with a rebalanceTimeout of the server setting `
>> > > > group.max.session.timeout.ms`. That's a bit long by default (5
>> > minutes)
>> > > > but
>> > > > I can't seem to come up with a better alternative
>> > > >
>> > > > I'm interested in hearing your thoughts.
>> > > >
>> > > > Thanks,
>> > > > Stanislav
>> > > >
>> > > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <jason@confluent.io
>> >
>> > > > wrote:
>> > > >
>> > > > > Hey Stanislav,
>> > > > >
>> > > > > What do you think about the use case I mentioned in my previous
>> reply
>> > > > about
>> > > > > > a more resilient self-service Kafka? I believe the benefit
>> there is
>> > > > > bigger.
>> > > > >
>> > > > >
>> > > > > I see this config as analogous to the open file limit. Probably
>> this
>> > > > limit
>> > > > > was intended to be prescriptive at some point about what was
>> deemed a
>> > > > > reasonable number of open files for an application. But mostly
>> people
>> > > > treat
>> > > > > it as an annoyance which they have to work around. If it happens
>> to
>> > be
>> > > > hit,
>> > > > > usually you just increase it because it is not tied to an actual
>> > > resource
>> > > > > constraint. However, occasionally hitting the limit does indicate
>> an
>> > > > > application bug such as a leak, so I wouldn't say it is useless.
>> > > > Similarly,
>> > > > > the issue in KAFKA-7610 was a consumer leak and having this limit
>> > would
>> > > > > have allowed the problem to be detected before it impacted the
>> > cluster.
>> > > > To
>> > > > > me, that's the main benefit. It's possible that it could be used
>> > > > > prescriptively to prevent poor usage of groups, but like the open
>> > file
>> > > > > limit, I suspect administrators will just set it large enough that
>> > > users
>> > > > > are unlikely to complain.
>> > > > >
>> > > > > Anyway, just a couple additional questions:
>> > > > >
>> > > > > 1. I think it would be helpful to clarify the details on how the
>> > > > > coordinator will shrink the group. It will need to choose which
>> > members
>> > > > to
>> > > > > remove. Are we going to give current members an opportunity to
>> commit
>> > > > > offsets before kicking them from the group?
>> > > > >
>> > > > > 2. Do you think we should make this a dynamic config?
>> > > > >
>> > > > > Thanks,
>> > > > > Jason
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
>> > > > > stanislav@confluent.io>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Jason,
>> > > > > >
>> > > > > > You raise some very valid points.
>> > > > > >
>> > > > > > > The benefit of this KIP is probably limited to preventing
>> > "runaway"
>> > > > > > consumer groups due to leaks or some other application bug
>> > > > > > What do you think about the use case I mentioned in my previous
>> > reply
>> > > > > about
>> > > > > > a more resilient self-service Kafka? I believe the benefit
>> there is
>> > > > > bigger
>> > > > > >
>> > > > > > * Default value
>> > > > > > You're right, we probably do need to be conservative. Big
>> consumer
>> > > > groups
>> > > > > > are considered an anti-pattern and my goal was to also hint at
>> this
>> > > > > through
>> > > > > > the config's default. Regardless, it is better to not have the
>> > > > potential
>> > > > > to
>> > > > > > break applications with an upgrade.
>> > > > > > Choosing between the default of something big like 5000 or an
>> > opt-in
>> > > > > > option, I think we should go with the *disabled default option*
>> > > (-1).
>> > > > > > The only benefit we would get from a big default of 5000 is
>> default
>> > > > > > protection against buggy/malicious applications that hit the
>> > > KAFKA-7610
>> > > > > > issue.
>> > > > > > While this KIP was spawned from that issue, I believe its value
>> is
>> > > > > enabling
>> > > > > > the possibility of protection and helping move towards a more
>> > > > > self-service
>> > > > > > Kafka. I also think that a default value of 5000 might be
>> > misleading
>> > > to
>> > > > > > users and lead them to think that big consumer groups (> 250)
>> are a
>> > > > good
>> > > > > > thing.
>> > > > > >
>> > > > > > The good news is that KAFKA-7610 should be fully resolved and
>> the
>> > > > > rebalance
>> > > > > > protocol should, in general, be more solid after the planned
>> > > > improvements
>> > > > > > in KIP-345 and KIP-394.
>> > > > > >
>> > > > > > * Handling bigger groups during upgrade
>> > > > > > I now see that we store the state of consumer groups in the log
>> and
>> > > > why a
>> > > > > > rebalance isn't expected during a rolling upgrade.
>> > > > > > Since we're going with the default value of the max.size being
>> > > > disabled,
>> > > > > I
>> > > > > > believe we can afford to be more strict here.
>> > > > > > During state reloading of a new Coordinator with a defined
>> > > > max.group.size
>> > > > > > config, I believe we should *force* rebalances for groups that
>> > exceed
>> > > > the
>> > > > > > configured size. Then, only some consumers will be able to join
>> and
>> > > the
>> > > > > max
>> > > > > > size invariant will be satisfied.
>> > > > > >
>> > > > > > I updated the KIP with a migration plan, rejected alternatives
>> and
>> > > the
>> > > > > new
>> > > > > > default value.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Stanislav
>> > > > > >
>> > > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <
>> > jason@confluent.io>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hey Stanislav,
>> > > > > > >
>> > > > > > > Clients will then find that coordinator
>> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
>> group,
>> > > since
>> > > > > the
>> > > > > > > > cache of active consumers is not stored outside the
>> > Coordinator's
>> > > > > > memory.
>> > > > > > > > (please do say if that is incorrect)
>> > > > > > >
>> > > > > > >
>> > > > > > > Groups do not typically rebalance after a coordinator change.
>> You
>> > > > could
>> > > > > > > potentially force a rebalance if the group is too big and kick
>> > out
>> > > > the
>> > > > > > > slowest members or something. A more graceful solution is
>> > probably
>> > > to
>> > > > > > just
>> > > > > > > accept the current size and prevent it from getting bigger. We
>> > > could
>> > > > > log
>> > > > > > a
>> > > > > > > warning potentially.
>> > > > > > >
>> > > > > > > My thinking is that we should abstract away from conserving
>> > > resources
>> > > > > and
>> > > > > > > > focus on giving control to the broker. The issue that
>> spawned
>> > > this
>> > > > > KIP
>> > > > > > > was
>> > > > > > > > a memory problem but I feel this change is useful in a more
>> > > general
>> > > > > > way.
>> > > > > > >
>> > > > > > >
>> > > > > > > So you probably already know why I'm asking about this. For
>> > > consumer
>> > > > > > groups
>> > > > > > > anyway, resource usage would typically be proportional to the
>> > > number
>> > > > of
>> > > > > > > partitions that a group is reading from and not the number of
>> > > > members.
>> > > > > > For
>> > > > > > > example, consider the memory use in the offsets cache. The
>> > benefit
>> > > of
>> > > > > > this
>> > > > > > > KIP is probably limited to preventing "runaway" consumer
>> groups
>> > due
>> > > > to
>> > > > > > > leaks or some other application bug. That still seems useful
>> > > though.
>> > > > > > >
>> > > > > > > I completely agree with this and I *ask everybody to chime in
>> > with
>> > > > > > opinions
>> > > > > > > > on a sensible default value*.
>> > > > > > >
>> > > > > > >
>> > > > > > > I think we would have to be very conservative. The group
>> protocol
>> > > is
>> > > > > > > generic in some sense, so there may be use cases we don't
>> know of
>> > > > where
>> > > > > > > larger groups are reasonable. Probably we should make this an
>> > > opt-in
>> > > > > > > feature so that we do not risk breaking anyone's application
>> > after
>> > > an
>> > > > > > > upgrade. Either that, or use a very high default like 5,000.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Jason
>> > > > > > >
>> > > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
>> > > > > > > stanislav@confluent.io>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hey Jason and Boyang, those were important comments
>> > > > > > > >
>> > > > > > > > > One suggestion I have is that it would be helpful to put
>> your
>> > > > > > reasoning
>> > > > > > > > on deciding the current default value. For example, in
>> certain
>> > > use
>> > > > > > cases
>> > > > > > > at
>> > > > > > > > Pinterest we are very likely to have more consumers than 250
>> > when
>> > > > we
>> > > > > > > > configure 8 stream instances with 32 threads.
>> > > > > > > > > For the effectiveness of this KIP, we should encourage
>> people
>> > > to
>> > > > > > > discuss
>> > > > > > > > their opinions on the default setting and ideally reach a
>> > > > consensus.
>> > > > > > > >
>> > > > > > > > I completely agree with this and I *ask everybody to chime
>> in
>> > > with
>> > > > > > > opinions
>> > > > > > > > on a sensible default value*.
>> > > > > > > > My thought process was that in the current model rebalances
>> in
>> > > > large
>> > > > > > > groups
>> > > > > > > > are more costly. I imagine most use cases in most Kafka
>> users
>> > do
>> > > > not
>> > > > > > > > require more than 250 consumers.
>> > > > > > > > Boyang, you say that you are "likely to have... when we..."
>> -
>> > do
>> > > > you
>> > > > > > have
>> > > > > > > > systems running with so many consumers in a group or are you
>> > > > planning
>> > > > > > > to? I
>> > > > > > > > guess what I'm asking is whether this has been tested in
>> > > production
>> > > > > > with
>> > > > > > > > the current rebalance model (ignoring KIP-345)
>> > > > > > > >
>> > > > > > > > >  Can you clarify the compatibility impact here? What
>> > > > > > > > > will happen to groups that are already larger than the max
>> > > size?
>> > > > > > > > This is a very important question.
>> > > > > > > > From my current understanding, when a coordinator broker
>> gets
>> > > shut
>> > > > > > > > down during a cluster rolling upgrade, a replica will take
>> > > > leadership
>> > > > > > of
>> > > > > > > > the `__offset_commits` partition. Clients will then find
>> that
>> > > > > > coordinator
>> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
>> group,
>> > > since
>> > > > > the
>> > > > > > > > cache of active consumers is not stored outside the
>> > Coordinator's
>> > > > > > memory.
>> > > > > > > > (please do say if that is incorrect)
>> > > > > > > > Then, I believe that working as if this is a new group is a
>> > > > > reasonable
>> > > > > > > > approach. Namely, fail joinGroups when the max.size is
>> > exceeded.
>> > > > > > > > What do you guys think about this? (I'll update the KIP
>> after
>> > we
>> > > > > settle
>> > > > > > > on
>> > > > > > > > a solution)
>> > > > > > > >
>> > > > > > > > >  Also, just to be clear, the resource we are trying to
>> > conserve
>> > > > > here
>> > > > > > is
>> > > > > > > > what? Memory?
>> > > > > > > > My thinking is that we should abstract away from conserving
>> > > > resources
>> > > > > > and
>> > > > > > > > focus on giving control to the broker. The issue that
>> spawned
>> > > this
>> > > > > KIP
>> > > > > > > was
>> > > > > > > > a memory problem but I feel this change is useful in a more
>> > > general
>> > > > > > way.
>> > > > > > > It
>> > > > > > > > limits the control clients have on the cluster and helps
>> Kafka
>> > > > > become a
>> > > > > > > > more self-serving system. Admin/Ops teams can better control
>> > the
>> > > > > impact
>> > > > > > > > application developers can have on a Kafka cluster with this
>> > > change
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > Stanislav
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
>> > > > jason@confluent.io>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi Stanislav,
>> > > > > > > > >
>> > > > > > > > > Thanks for the KIP. Can you clarify the compatibility
>> impact
>> > > > here?
>> > > > > > What
>> > > > > > > > > will happen to groups that are already larger than the max
>> > > size?
>> > > > > > Also,
>> > > > > > > > just
>> > > > > > > > > to be clear, the resource we are trying to conserve here
>> is
>> > > what?
>> > > > > > > Memory?
>> > > > > > > > >
>> > > > > > > > > -Jason
>> > > > > > > > >
>> > > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
>> > > bchen11@outlook.com
>> > > > >
>> > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Thanks Stanislav for the update! One suggestion I have
>> is
>> > > that
>> > > > it
>> > > > > > > would
>> > > > > > > > > be
>> > > > > > > > > > helpful to put your
>> > > > > > > > > >
>> > > > > > > > > > reasoning on deciding the current default value. For
>> > example,
>> > > > in
>> > > > > > > > certain
>> > > > > > > > > > use cases at Pinterest we are very likely
>> > > > > > > > > >
>> > > > > > > > > > to have more consumers than 250 when we configure 8
>> stream
>> > > > > > instances
>> > > > > > > > with
>> > > > > > > > > > 32 threads.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > For the effectiveness of this KIP, we should encourage
>> > people
>> > > > to
>> > > > > > > > discuss
>> > > > > > > > > > their opinions on the default setting and ideally reach
>> a
>> > > > > > consensus.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Best,
>> > > > > > > > > >
>> > > > > > > > > > Boyang
>> > > > > > > > > >
>> > > > > > > > > > ________________________________
>> > > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
>> > > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
>> > > > > > > > > > To: dev@kafka.apache.org
>> > > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size
>> to
>> > cap
>> > > > > > member
>> > > > > > > > > > metadata growth
>> > > > > > > > > >
>> > > > > > > > > > Hey everybody,
>> > > > > > > > > >
>> > > > > > > > > > It's been a week since this KIP and not much discussion
>> has
>> > > > been
>> > > > > > > made.
>> > > > > > > > > > I assume that this is a straight forward change and I
>> will
>> > > > open a
>> > > > > > > > voting
>> > > > > > > > > > thread in the next couple of days if nobody has
>> anything to
>> > > > > > suggest.
>> > > > > > > > > >
>> > > > > > > > > > Best,
>> > > > > > > > > > Stanislav
>> > > > > > > > > >
>> > > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
>> > > > > > > > > > stanislav@confluent.io>
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Greetings everybody,
>> > > > > > > > > > >
>> > > > > > > > > > > I have enriched the KIP a bit with a bigger Motivation
>> > > > section
>> > > > > > and
>> > > > > > > > also
>> > > > > > > > > > > renamed it.
>> > > > > > > > > > > KIP:
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
>> > > > > > > > > > >
>> > > > > > > > > > > I'm looking forward to discussions around it.
>> > > > > > > > > > >
>> > > > > > > > > > > Best,
>> > > > > > > > > > > Stanislav
>> > > > > > > > > > >
>> > > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
>> > > > > > > > > > > stanislav@confluent.io> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > >> Hey there everybody,
>> > > > > > > > > > >>
>> > > > > > > > > > >> Thanks for the introduction Boyang. I appreciate the
>> > > effort
>> > > > > you
>> > > > > > > are
>> > > > > > > > > > >> putting into improving consumer behavior in Kafka.
>> > > > > > > > > > >>
>> > > > > > > > > > >> @Matt
>> > > > > > > > > > >> I also believe the default value is high. In my
>> opinion,
>> > > we
>> > > > > > should
>> > > > > > > > aim
>> > > > > > > > > > to
>> > > > > > > > > > >> a default cap around 250. This is because in the
>> current
>> > > > model
>> > > > > > any
>> > > > > > > > > > consumer
>> > > > > > > > > > >> rebalance is disrupting to every consumer. The bigger
>> > the
>> > > > > group,
>> > > > > > > the
>> > > > > > > > > > longer
>> > > > > > > > > > >> this period of disruption.
>> > > > > > > > > > >>
>> > > > > > > > > > >> If you have such a large consumer group, chances are
>> > that
>> > > > your
>> > > > > > > > > > >> client-side logic could be structured better and that
>> > you
>> > > > are
>> > > > > > not
>> > > > > > > > > using
>> > > > > > > > > > the
>> > > > > > > > > > >> high number of consumers to achieve high throughput.
>> > > > > > > > > > >> 250 can still be considered of a high upper bound, I
>> > > believe
>> > > > > in
>> > > > > > > > > practice
>> > > > > > > > > > >> users should aim to not go over 100 consumers per
>> > consumer
>> > > > > > group.
>> > > > > > > > > > >>
>> > > > > > > > > > >> In regards to the cap being global/per-broker, I
>> think
>> > > that
>> > > > we
>> > > > > > > > should
>> > > > > > > > > > >> consider whether we want it to be global or
>> *per-topic*.
>> > > For
>> > > > > the
>> > > > > > > > time
>> > > > > > > > > > >> being, I believe that having it per-topic with a
>> global
>> > > > > default
>> > > > > > > > might
>> > > > > > > > > be
>> > > > > > > > > > >> the best situation. Having it global only seems a bit
>> > > > > > restricting
>> > > > > > > to
>> > > > > > > > > me
>> > > > > > > > > > and
>> > > > > > > > > > >> it never hurts to support more fine-grained
>> > > configurability
>> > > > > > (given
>> > > > > > > > > it's
>> > > > > > > > > > the
>> > > > > > > > > > >> same config, not a new one being introduced).
>> > > > > > > > > > >>
>> > > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
>> > > > > > bchen11@outlook.com
>> > > > > > > >
>> > > > > > > > > > wrote:
>> > > > > > > > > > >>
>> > > > > > > > > > >>> Thanks Matt for the suggestion! I'm still open to
>> any
>> > > > > > suggestion
>> > > > > > > to
>> > > > > > > > > > >>> change the default value. Meanwhile I just want to
>> > point
>> > > > out
>> > > > > > that
>> > > > > > > > > this
>> > > > > > > > > > >>> value is a just last line of defense, not a real
>> > scenario
>> > > > we
>> > > > > > > would
>> > > > > > > > > > expect.
>> > > > > > > > > > >>>
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> In the meanwhile, I discussed with Stanislav and he
>> > would
>> > > > be
>> > > > > > > > driving
>> > > > > > > > > > the
>> > > > > > > > > > >>> 389 effort from now on. Stanislav proposed the idea
>> in
>> > > the
>> > > > > > first
>> > > > > > > > > place
>> > > > > > > > > > and
>> > > > > > > > > > >>> had already come up a draft design, while I will
>> keep
>> > > > > focusing
>> > > > > > on
>> > > > > > > > > > KIP-345
>> > > > > > > > > > >>> effort to ensure solving the edge case described in
>> the
>> > > > JIRA<
>> > > > > > > > > > >>>
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://issues.apache.org/jira/browse/KAFKA-7610
>> > > > > > > > > > >.
>> > > > > > > > > > >>>
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> Thank you Stanislav for making this happen!
>> > > > > > > > > > >>>
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> Boyang
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> ________________________________
>> > > > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
>> > > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
>> > > > > > > > > > >>> To: dev@kafka.apache.org
>> > > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce
>> group.max.size
>> > to
>> > > > cap
>> > > > > > > > member
>> > > > > > > > > > >>> metadata growth
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> Thanks for the KIP.
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> Will this cap be a global cap across the entire
>> cluster
>> > > or
>> > > > > per
>> > > > > > > > > broker?
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> Either way the default value seems a bit high to me,
>> > but
>> > > > that
>> > > > > > > could
>> > > > > > > > > > just
>> > > > > > > > > > >>> be
>> > > > > > > > > > >>> from my own usage patterns. I'd have probably
>> started
>> > > with
>> > > > > 500
>> > > > > > or
>> > > > > > > > 1k
>> > > > > > > > > > but
>> > > > > > > > > > >>> could be easily convinced that's wrong.
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> Thanks,
>> > > > > > > > > > >>> Matt
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
>> > > > > > bchen11@outlook.com
>> > > > > > > >
>> > > > > > > > > > wrote:
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> > Hey folks,
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> > I would like to start a discussion on KIP-389:
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>>
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> > This is a pretty simple change to cap the consumer
>> > > group
>> > > > > size
>> > > > > > > for
>> > > > > > > > > > >>> broker
>> > > > > > > > > > >>> > stability. Give me your valuable feedback when you
>> > got
>> > > > > time.
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> > Thank you!
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>>
>> > > > > > > > > > >>
>> > > > > > > > > > >>
>> > > > > > > > > > >> --
>> > > > > > > > > > >> Best,
>> > > > > > > > > > >> Stanislav
>> > > > > > > > > > >>
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > --
>> > > > > > > > > > > Best,
>> > > > > > > > > > > Stanislav
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > --
>> > > > > > > > > > Best,
>> > > > > > > > > > Stanislav
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > Best,
>> > > > > > > > Stanislav
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Best,
>> > > > > > Stanislav
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Best,
>> > > > Stanislav
>> > > >
>> > >
>> > >
>> > > --
>> > > Best,
>> > > Stanislav
>> > >
>> >
>> >
>> > --
>> > Best,
>> > Stanislav
>> >
>>
>
>
> --
> Best,
> Stanislav
>


--
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hey there everybody, let's work on wrapping this discussion up.

@Boyang, could you clarify what you mean by
> One more question is whether you feel we should enforce group size cap
statically or on runtime?
Is that related to the option of enabling this config via the dynamic
broker config feature?

Regarding that - I feel it's useful to have and I also don't think it would
introduce additional complexity. As long as we handle the config being
changed midway through a rebalance (by continuing to use the old value), we
should be good to go.
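
For illustration only - assuming the cap does land as a dynamically
updatable, cluster-wide broker config (the name `group.max.size` and its
dynamic support are still open points of this proposal, not decided
behaviour), an operator could adjust it at runtime through the Java
AdminClient along these lines:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class GroupMaxSizeConfigExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // An empty broker name targets the cluster-wide default, so
                // every group coordinator picks up the same cap.
                ConfigResource cluster =
                    new ConfigResource(ConfigResource.Type.BROKER, "");

                // "group.max.size" and the value 250 are placeholders taken
                // from this discussion, not a finalised config.
                Config update = new Config(Collections.singletonList(
                    new ConfigEntry("group.max.size", "250")));

                admin.alterConfigs(Collections.singletonMap(cluster, update))
                    .all().get();
            }
        }
    }

The same change could be made with kafka-configs.sh; the point is just that a
single cluster-wide value keeps all coordinators agreeing on the cap, which is
what makes the "use the old value mid-rebalance" rule easy to reason about.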

On Wed, Dec 12, 2018 at 4:12 PM Stanislav Kozlovski <st...@confluent.io>
wrote:

> Hey Jason,
>
> Yes, that is what I meant by
> > Given those constraints, I think that we can simply mark the group as
> `PreparingRebalance` with a rebalanceTimeout of the server setting `
> group.max.session.timeout.ms`. That's a bit long by default (5 minutes)
> but I can't seem to come up with a better alternative
> So either the timeout or all members calling joinGroup, yes
>
>
> On Tue, Dec 11, 2018 at 8:14 PM Boyang Chen <bc...@outlook.com> wrote:
>
>> Hey Jason,
>>
>> I think this is the correct understanding. One more question is whether
>> you feel
>> we should enforce group size cap statically or on runtime?
>>
>> Boyang
>> ________________________________
>> From: Jason Gustafson <ja...@confluent.io>
>> Sent: Tuesday, December 11, 2018 3:24 AM
>> To: dev
>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
>> metadata growth
>>
>> Hey Stanislav,
>>
>> Just to clarify, I think what you're suggesting is something like this in
>> order to gracefully shrink the group:
>>
>> 1. Transition the group to PREPARING_REBALANCE. No members are kicked out.
>> 2. Continue to allow offset commits and heartbeats for all current
>> members.
>> 3. Allow the first n members that send JoinGroup to stay in the group, but
>> wait for the JoinGroup (or session timeout) from all active members before
>> finishing the rebalance.
>>
>> So basically we try to give the current members an opportunity to finish
>> work, but we prevent some of them from rejoining after the rebalance
>> completes. It sounds reasonable if I've understood correctly.
>>
>> Thanks,
>> Jason
>>
>>
>>
>> On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <bc...@outlook.com> wrote:
>>
>> > Yep, LGTM on my side. Thanks Stanislav!
>> > ________________________________
>> > From: Stanislav Kozlovski <st...@confluent.io>
>> > Sent: Friday, December 7, 2018 8:51 PM
>> > To: dev@kafka.apache.org
>> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
>> > metadata growth
>> >
>> > Hi,
>> >
>> > We discussed this offline with Boyang and figured that it's best to not
>> > wait on the Cooperative Rebalancing proposal. Our thinking is that we
>> can
>> > just force a rebalance from the broker, allowing consumers to commit
>> > offsets if their rebalanceListener is configured correctly.
>> > When rebalancing improvements are implemented, we assume that they would
>> > improve KIP-389's behavior as well as the normal rebalance scenarios
>> >
>> > On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <bc...@outlook.com>
>> wrote:
>> >
>> > > Hey Stanislav,
>> > >
>> > > thanks for the question! `Trivial rebalance` means "we don't start
>> > > reassignment right now, but you need to know it's coming soon
>> > > and you should start preparation".
>> > >
>> > > An example KStream use case is that before actually starting to shrink
>> > the
>> > > consumer group, we need to
>> > > 1. partition the consumer group into two subgroups, where one will be
>> > > offline soon and the other will keep serving;
>> > > 2. make sure the states associated with near-future offline consumers
>> are
>> > > successfully replicated on the serving ones.
>> > >
>> > > As I have mentioned shrinking the consumer group is pretty much
>> > equivalent
>> > > to group scaling down, so we could think of this
>> > > as an add-on use case for cluster scaling. So my understanding is that
>> > the
>> > > KIP-389 could be sequenced within our cooperative rebalancing<
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
>> > > >
>> > > proposal.
>> > >
>> > > Let me know if this makes sense.
>> > >
>> > > Best,
>> > > Boyang
>> > > ________________________________
>> > > From: Stanislav Kozlovski <st...@confluent.io>
>> > > Sent: Wednesday, December 5, 2018 5:52 PM
>> > > To: dev@kafka.apache.org
>> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
>> > > metadata growth
>> > >
>> > > Hey Boyang,
>> > >
>> > > I think we still need to take care of group shrinkage because even if
>> > users
>> > > change the config value we cannot guarantee that all consumer groups
>> > would
>> > > have been manually shrunk.
>> > >
>> > > Regarding 2., I agree that forcefully triggering a rebalance might be
>> the
>> > > most intuitive way to handle the situation.
>> > > What does a "trivial rebalance" mean? Sorry, I'm not familiar with the
>> > > term.
>> > > I was thinking that maybe we could force a rebalance, which would
>> cause
>> > > consumers to commit their offsets (given their rebalanceListener is
>> > > configured correctly) and subsequently reject some of the incoming
>> > > `joinGroup` requests. Does that sound like it would work?
>> > >
>> > > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bc...@outlook.com>
>> wrote:
>> > >
>> > > > Hey Stanislav,
>> > > >
>> > > > I read the latest KIP and saw that we already changed the default
>> value
>> > > to
>> > > > -1. Do
>> > > > we still need to take care of the consumer group shrinking when
>> doing
>> > the
>> > > > upgrade?
>> > > >
>> > > > However this is an interesting topic that worth discussing. Although
>> > > > rolling
>> > > > upgrade is fine, `consumer.group.max.size` could always have
>> conflict
>> > > with
>> > > > the current
>> > > > consumer group size which means we need to adhere to one source of
>> > truth.
>> > > >
>> > > > 1.Choose the current group size, which means we never interrupt the
>> > > > consumer group until
>> > > > it transits to PREPARE_REBALANCE. And we keep track of how many join
>> > > group
>> > > > requests
>> > > > we have seen so far during PREPARE_REBALANCE. After reaching the
>> > consumer
>> > > > cap,
>> > > > we start to inform over provisioned consumers that you should send
>> > > > LeaveGroupRequest and
>> > > > fail yourself. Or with what Mayuresh proposed in KIP-345, we could
>> mark
>> > > > extra members
>> > > > as hot backup and rebalance without them.
>> > > >
>> > > > 2.Choose the `consumer.group.max.size`. I feel incremental
>> rebalancing
>> > > > (you proposed) could be of help here.
>> > > > When a new cap is enforced, leader should be notified. If the
>> current
>> > > > group size is already over limit, leader
>> > > > shall trigger a trivial rebalance to shuffle some topic partitions
>> and
>> > > let
>> > > > a subset of consumers prepare the ownership
>> > > > transition. Until they are ready, we trigger a real rebalance to
>> remove
>> > > > over-provisioned consumers. It is pretty much
>> > > > equivalent to `how do we scale down the consumer group without
>> > > > interrupting the current processing`.
>> > > >
>> > > > I personally feel inclined to 2 because we could kill two birds with
>> > one
>> > > > stone in a generic way. What do you think?
>> > > >
>> > > > Boyang
>> > > > ________________________________
>> > > > From: Stanislav Kozlovski <st...@confluent.io>
>> > > > Sent: Monday, December 3, 2018 8:35 PM
>> > > > To: dev@kafka.apache.org
>> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
>> > > > metadata growth
>> > > >
>> > > > Hi Jason,
>> > > >
>> > > > > 2. Do you think we should make this a dynamic config?
>> > > > I'm not sure. Looking at the config from the perspective of a
>> > > prescriptive
>> > > > config, we may get away with not updating it dynamically.
>> > > > But in my opinion, it always makes sense to have a config be
>> > dynamically
>> > > > configurable. As long as we limit it to being a cluster-wide
>> config, we
>> > > > should be fine.
>> > > >
>> > > > > 1. I think it would be helpful to clarify the details on how the
>> > > > coordinator will shrink the group. It will need to choose which
>> members
>> > > to
>> > > > remove. Are we going to give current members an opportunity to
>> commit
>> > > > offsets before kicking them from the group?
>> > > >
>> > > > This turns out to be somewhat tricky. I think that we may not be
>> able
>> > to
>> > > > guarantee that consumers don't process a message twice.
>> > > > My initial approach was to do as much as we could to let consumers
>> > commit
>> > > > offsets.
>> > > >
>> > > > I was thinking that we mark a group to be shrunk, we could keep a
>> map
>> > of
>> > > > consumer_id->boolean indicating whether they have committed
>> offsets. I
>> > > then
>> > > > thought we could delay the rebalance until every consumer commits
>> (or
>> > > some
>> > > > time passes).
>> > > > In the meantime, we would block all incoming fetch calls (by either
>> > > > returning empty records or a retriable error) and we would continue
>> to
>> > > > accept offset commits (even twice for a single consumer)
>> > > >
>> > > > I see two problems with this approach:
>> > > > * We have async offset commits, which implies that we can receive
>> fetch
>> > > > requests before the offset commit req has been handled. i.e consmer
>> > sends
>> > > > fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in the
>> > > > broker. Meaning we could have saved the offsets for B but rebalance
>> > > before
>> > > > the offsetCommit for the offsets processed in C come in.
>> > > > * KIP-392 Allow consumers to fetch from closest replica
>> > > > <
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
>> > > > >
>> > > > would
>> > > > make it significantly harder to block poll() calls on consumers
>> whose
>> > > > groups are being shrunk. Even if we implemented a solution, the same
>> > race
>> > > > condition noted above seems to apply and probably others
>> > > >
>> > > >
>> > > > Given those constraints, I think that we can simply mark the group
>> as
>> > > > `PreparingRebalance` with a rebalanceTimeout of the server setting `
>> > > > group.max.session.timeout.ms`. That's a bit long by default (5
>> > minutes)
>> > > > but
>> > > > I can't seem to come up with a better alternative
>> > > >
>> > > > I'm interested in hearing your thoughts.
>> > > >
>> > > > Thanks,
>> > > > Stanislav
>> > > >
>> > > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <jason@confluent.io
>> >
>> > > > wrote:
>> > > >
>> > > > > Hey Stanislav,
>> > > > >
>> > > > > What do you think about the use case I mentioned in my previous
>> reply
>> > > > about
>> > > > > > a more resilient self-service Kafka? I believe the benefit
>> there is
>> > > > > bigger.
>> > > > >
>> > > > >
>> > > > > I see this config as analogous to the open file limit. Probably
>> this
>> > > > limit
>> > > > > was intended to be prescriptive at some point about what was
>> deemed a
>> > > > > reasonable number of open files for an application. But mostly
>> people
>> > > > treat
>> > > > > it as an annoyance which they have to work around. If it happens
>> to
>> > be
>> > > > hit,
>> > > > > usually you just increase it because it is not tied to an actual
>> > > resource
>> > > > > constraint. However, occasionally hitting the limit does indicate
>> an
>> > > > > application bug such as a leak, so I wouldn't say it is useless.
>> > > > Similarly,
>> > > > > the issue in KAFKA-7610 was a consumer leak and having this limit
>> > would
>> > > > > have allowed the problem to be detected before it impacted the
>> > cluster.
>> > > > To
>> > > > > me, that's the main benefit. It's possible that it could be used
>> > > > > prescriptively to prevent poor usage of groups, but like the open
>> > file
>> > > > > limit, I suspect administrators will just set it large enough that
>> > > users
>> > > > > are unlikely to complain.
>> > > > >
>> > > > > Anyway, just a couple additional questions:
>> > > > >
>> > > > > 1. I think it would be helpful to clarify the details on how the
>> > > > > coordinator will shrink the group. It will need to choose which
>> > members
>> > > > to
>> > > > > remove. Are we going to give current members an opportunity to
>> commit
>> > > > > offsets before kicking them from the group?
>> > > > >
>> > > > > 2. Do you think we should make this a dynamic config?
>> > > > >
>> > > > > Thanks,
>> > > > > Jason
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
>> > > > > stanislav@confluent.io>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Jason,
>> > > > > >
>> > > > > > You raise some very valid points.
>> > > > > >
>> > > > > > > The benefit of this KIP is probably limited to preventing
>> > "runaway"
>> > > > > > consumer groups due to leaks or some other application bug
>> > > > > > What do you think about the use case I mentioned in my previous
>> > reply
>> > > > > about
>> > > > > > a more resilient self-service Kafka? I believe the benefit
>> there is
>> > > > > bigger
>> > > > > >
>> > > > > > * Default value
>> > > > > > You're right, we probably do need to be conservative. Big
>> consumer
>> > > > groups
>> > > > > > are considered an anti-pattern and my goal was to also hint at
>> this
>> > > > > through
>> > > > > > the config's default. Regardless, it is better to not have the
>> > > > potential
>> > > > > to
>> > > > > > break applications with an upgrade.
>> > > > > > Choosing between the default of something big like 5000 or an
>> > opt-in
>> > > > > > option, I think we should go with the *disabled default option*
>> > > (-1).
>> > > > > > The only benefit we would get from a big default of 5000 is
>> default
>> > > > > > protection against buggy/malicious applications that hit the
>> > > KAFKA-7610
>> > > > > > issue.
>> > > > > > While this KIP was spawned from that issue, I believe its value
>> is
>> > > > > enabling
>> > > > > > the possibility of protection and helping move towards a more
>> > > > > self-service
>> > > > > > Kafka. I also think that a default value of 5000 might be
>> > misleading
>> > > to
>> > > > > > users and lead them to think that big consumer groups (> 250)
>> are a
>> > > > good
>> > > > > > thing.
>> > > > > >
>> > > > > > The good news is that KAFKA-7610 should be fully resolved and
>> the
>> > > > > rebalance
>> > > > > > protocol should, in general, be more solid after the planned
>> > > > improvements
>> > > > > > in KIP-345 and KIP-394.
>> > > > > >
>> > > > > > * Handling bigger groups during upgrade
>> > > > > > I now see that we store the state of consumer groups in the log
>> and
>> > > > why a
>> > > > > > rebalance isn't expected during a rolling upgrade.
>> > > > > > Since we're going with the default value of the max.size being
>> > > > disabled,
>> > > > > I
>> > > > > > believe we can afford to be more strict here.
>> > > > > > During state reloading of a new Coordinator with a defined
>> > > > max.group.size
>> > > > > > config, I believe we should *force* rebalances for groups that
>> > exceed
>> > > > the
>> > > > > > configured size. Then, only some consumers will be able to join
>> and
>> > > the
>> > > > > max
>> > > > > > size invariant will be satisfied.
>> > > > > >
>> > > > > > I updated the KIP with a migration plan, rejected alternatives
>> and
>> > > the
>> > > > > new
>> > > > > > default value.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Stanislav
>> > > > > >
>> > > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <
>> > jason@confluent.io>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hey Stanislav,
>> > > > > > >
>> > > > > > > Clients will then find that coordinator
>> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
>> group,
>> > > since
>> > > > > the
>> > > > > > > > cache of active consumers is not stored outside the
>> > Coordinator's
>> > > > > > memory.
>> > > > > > > > (please do say if that is incorrect)
>> > > > > > >
>> > > > > > >
>> > > > > > > Groups do not typically rebalance after a coordinator change.
>> You
>> > > > could
>> > > > > > > potentially force a rebalance if the group is too big and kick
>> > out
>> > > > the
>> > > > > > > slowest members or something. A more graceful solution is
>> > probably
>> > > to
>> > > > > > just
>> > > > > > > accept the current size and prevent it from getting bigger. We
>> > > could
>> > > > > log
>> > > > > > a
>> > > > > > > warning potentially.
>> > > > > > >
>> > > > > > > My thinking is that we should abstract away from conserving
>> > > resources
>> > > > > and
>> > > > > > > > focus on giving control to the broker. The issue that
>> spawned
>> > > this
>> > > > > KIP
>> > > > > > > was
>> > > > > > > > a memory problem but I feel this change is useful in a more
>> > > general
>> > > > > > way.
>> > > > > > >
>> > > > > > >
>> > > > > > > So you probably already know why I'm asking about this. For
>> > > consumer
>> > > > > > groups
>> > > > > > > anyway, resource usage would typically be proportional to the
>> > > number
>> > > > of
>> > > > > > > partitions that a group is reading from and not the number of
>> > > > members.
>> > > > > > For
>> > > > > > > example, consider the memory use in the offsets cache. The
>> > benefit
>> > > of
>> > > > > > this
>> > > > > > > KIP is probably limited to preventing "runaway" consumer
>> groups
>> > due
>> > > > to
>> > > > > > > leaks or some other application bug. That still seems useful
>> > > though.
>> > > > > > >
>> > > > > > > I completely agree with this and I *ask everybody to chime in
>> > with
>> > > > > > opinions
>> > > > > > > > on a sensible default value*.
>> > > > > > >
>> > > > > > >
>> > > > > > > I think we would have to be very conservative. The group
>> protocol
>> > > is
>> > > > > > > generic in some sense, so there may be use cases we don't
>> know of
>> > > > where
>> > > > > > > larger groups are reasonable. Probably we should make this an
>> > > opt-in
>> > > > > > > feature so that we do not risk breaking anyone's application
>> > after
>> > > an
>> > > > > > > upgrade. Either that, or use a very high default like 5,000.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Jason
>> > > > > > >
>> > > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
>> > > > > > > stanislav@confluent.io>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hey Jason and Boyang, those were important comments
>> > > > > > > >
>> > > > > > > > > One suggestion I have is that it would be helpful to put
>> your
>> > > > > > reasoning
>> > > > > > > > on deciding the current default value. For example, in
>> certain
>> > > use
>> > > > > > cases
>> > > > > > > at
>> > > > > > > > Pinterest we are very likely to have more consumers than 250
>> > when
>> > > > we
>> > > > > > > > configure 8 stream instances with 32 threads.
>> > > > > > > > > For the effectiveness of this KIP, we should encourage
>> people
>> > > to
>> > > > > > > discuss
>> > > > > > > > their opinions on the default setting and ideally reach a
>> > > > consensus.
>> > > > > > > >
>> > > > > > > > I completely agree with this and I *ask everybody to chime
>> in
>> > > with
>> > > > > > > opinions
>> > > > > > > > on a sensible default value*.
>> > > > > > > > My thought process was that in the current model rebalances
>> in
>> > > > large
>> > > > > > > groups
>> > > > > > > > are more costly. I imagine most use cases in most Kafka
>> users
>> > do
>> > > > not
>> > > > > > > > require more than 250 consumers.
>> > > > > > > > Boyang, you say that you are "likely to have... when we..."
>> -
>> > do
>> > > > you
>> > > > > > have
>> > > > > > > > systems running with so many consumers in a group or are you
>> > > > planning
>> > > > > > > to? I
>> > > > > > > > guess what I'm asking is whether this has been tested in
>> > > production
>> > > > > > with
>> > > > > > > > the current rebalance model (ignoring KIP-345)
>> > > > > > > >
>> > > > > > > > >  Can you clarify the compatibility impact here? What
>> > > > > > > > > will happen to groups that are already larger than the max
>> > > size?
>> > > > > > > > This is a very important question.
>> > > > > > > > From my current understanding, when a coordinator broker
>> gets
>> > > shut
>> > > > > > > > down during a cluster rolling upgrade, a replica will take
>> > > > leadership
>> > > > > > of
>> > > > > > > > the `__offset_commits` partition. Clients will then find
>> that
>> > > > > > coordinator
>> > > > > > > > and send `joinGroup` on it, effectively rebuilding the
>> group,
>> > > since
>> > > > > the
>> > > > > > > > cache of active consumers is not stored outside the
>> > Coordinator's
>> > > > > > memory.
>> > > > > > > > (please do say if that is incorrect)
>> > > > > > > > Then, I believe that working as if this is a new group is a
>> > > > > reasonable
>> > > > > > > > approach. Namely, fail joinGroups when the max.size is
>> > exceeded.
>> > > > > > > > What do you guys think about this? (I'll update the KIP
>> after
>> > we
>> > > > > settle
>> > > > > > > on
>> > > > > > > > a solution)
>> > > > > > > >
>> > > > > > > > >  Also, just to be clear, the resource we are trying to
>> > conserve
>> > > > > here
>> > > > > > is
>> > > > > > > > what? Memory?
>> > > > > > > > My thinking is that we should abstract away from conserving
>> > > > resources
>> > > > > > and
>> > > > > > > > focus on giving control to the broker. The issue that
>> spawned
>> > > this
>> > > > > KIP
>> > > > > > > was
>> > > > > > > > a memory problem but I feel this change is useful in a more
>> > > general
>> > > > > > way.
>> > > > > > > It
>> > > > > > > > limits the control clients have on the cluster and helps
>> Kafka
>> > > > > become a
>> > > > > > > > more self-serving system. Admin/Ops teams can better control
>> > the
>> > > > > impact
>> > > > > > > > application developers can have on a Kafka cluster with this
>> > > change
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > Stanislav
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
>> > > > jason@confluent.io>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi Stanislav,
>> > > > > > > > >
>> > > > > > > > > Thanks for the KIP. Can you clarify the compatibility
>> impact
>> > > > here?
>> > > > > > What
>> > > > > > > > > will happen to groups that are already larger than the max
>> > > size?
>> > > > > > Also,
>> > > > > > > > just
>> > > > > > > > > to be clear, the resource we are trying to conserve here
>> is
>> > > what?
>> > > > > > > Memory?
>> > > > > > > > >
>> > > > > > > > > -Jason
>> > > > > > > > >
>> > > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
>> > > bchen11@outlook.com
>> > > > >
>> > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Thanks Stanislav for the update! One suggestion I have
>> is
>> > > that
>> > > > it
>> > > > > > > would
>> > > > > > > > > be
>> > > > > > > > > > helpful to put your
>> > > > > > > > > >
>> > > > > > > > > > reasoning on deciding the current default value. For
>> > example,
>> > > > in
>> > > > > > > > certain
>> > > > > > > > > > use cases at Pinterest we are very likely
>> > > > > > > > > >
>> > > > > > > > > > to have more consumers than 250 when we configure 8
>> stream
>> > > > > > instances
>> > > > > > > > with
>> > > > > > > > > > 32 threads.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > For the effectiveness of this KIP, we should encourage
>> > people
>> > > > to
>> > > > > > > > discuss
>> > > > > > > > > > their opinions on the default setting and ideally reach
>> a
>> > > > > > consensus.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Best,
>> > > > > > > > > >
>> > > > > > > > > > Boyang
>> > > > > > > > > >
>> > > > > > > > > > ________________________________
>> > > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
>> > > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
>> > > > > > > > > > To: dev@kafka.apache.org
>> > > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size
>> to
>> > cap
>> > > > > > member
>> > > > > > > > > > metadata growth
>> > > > > > > > > >
>> > > > > > > > > > Hey everybody,
>> > > > > > > > > >
>> > > > > > > > > > It's been a week since this KIP and not much discussion
>> has
>> > > > been
>> > > > > > > made.
>> > > > > > > > > > I assume that this is a straight forward change and I
>> will
>> > > > open a
>> > > > > > > > voting
>> > > > > > > > > > thread in the next couple of days if nobody has
>> anything to
>> > > > > > suggest.
>> > > > > > > > > >
>> > > > > > > > > > Best,
>> > > > > > > > > > Stanislav
>> > > > > > > > > >
>> > > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
>> > > > > > > > > > stanislav@confluent.io>
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Greetings everybody,
>> > > > > > > > > > >
>> > > > > > > > > > > I have enriched the KIP a bit with a bigger Motivation
>> > > > section
>> > > > > > and
>> > > > > > > > also
>> > > > > > > > > > > renamed it.
>> > > > > > > > > > > KIP:
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
>> > > > > > > > > > >
>> > > > > > > > > > > I'm looking forward to discussions around it.
>> > > > > > > > > > >
>> > > > > > > > > > > Best,
>> > > > > > > > > > > Stanislav
>> > > > > > > > > > >
>> > > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
>> > > > > > > > > > > stanislav@confluent.io> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > >> Hey there everybody,
>> > > > > > > > > > >>
>> > > > > > > > > > >> Thanks for the introduction Boyang. I appreciate the
>> > > effort
>> > > > > you
>> > > > > > > are
>> > > > > > > > > > >> putting into improving consumer behavior in Kafka.
>> > > > > > > > > > >>
>> > > > > > > > > > >> @Matt
>> > > > > > > > > > >> I also believe the default value is high. In my
>> opinion,
>> > > we
>> > > > > > should
>> > > > > > > > aim
>> > > > > > > > > > to
>> > > > > > > > > > >> a default cap around 250. This is because in the
>> current
>> > > > model
>> > > > > > any
>> > > > > > > > > > consumer
>> > > > > > > > > > >> rebalance is disrupting to every consumer. The bigger
>> > the
>> > > > > group,
>> > > > > > > the
>> > > > > > > > > > longer
>> > > > > > > > > > >> this period of disruption.
>> > > > > > > > > > >>
>> > > > > > > > > > >> If you have such a large consumer group, chances are
>> > that
>> > > > your
>> > > > > > > > > > >> client-side logic could be structured better and that
>> > you
>> > > > are
>> > > > > > not
>> > > > > > > > > using
>> > > > > > > > > > the
>> > > > > > > > > > >> high number of consumers to achieve high throughput.
>> > > > > > > > > > >> 250 can still be considered of a high upper bound, I
>> > > believe
>> > > > > in
>> > > > > > > > > practice
>> > > > > > > > > > >> users should aim to not go over 100 consumers per
>> > consumer
>> > > > > > group.
>> > > > > > > > > > >>
>> > > > > > > > > > >> In regards to the cap being global/per-broker, I
>> think
>> > > that
>> > > > we
>> > > > > > > > should
>> > > > > > > > > > >> consider whether we want it to be global or
>> *per-topic*.
>> > > For
>> > > > > the
>> > > > > > > > time
>> > > > > > > > > > >> being, I believe that having it per-topic with a
>> global
>> > > > > default
>> > > > > > > > might
>> > > > > > > > > be
>> > > > > > > > > > >> the best situation. Having it global only seems a bit
>> > > > > > restricting
>> > > > > > > to
>> > > > > > > > > me
>> > > > > > > > > > and
>> > > > > > > > > > >> it never hurts to support more fine-grained
>> > > configurability
>> > > > > > (given
>> > > > > > > > > it's
>> > > > > > > > > > the
>> > > > > > > > > > >> same config, not a new one being introduced).
>> > > > > > > > > > >>
>> > > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
>> > > > > > bchen11@outlook.com
>> > > > > > > >
>> > > > > > > > > > wrote:
>> > > > > > > > > > >>
>> > > > > > > > > > >>> Thanks Matt for the suggestion! I'm still open to
>> any
>> > > > > > suggestion
>> > > > > > > to
>> > > > > > > > > > >>> change the default value. Meanwhile I just want to
>> > point
>> > > > out
>> > > > > > that
>> > > > > > > > > this
>> > > > > > > > > > >>> value is a just last line of defense, not a real
>> > scenario
>> > > > we
>> > > > > > > would
>> > > > > > > > > > expect.
>> > > > > > > > > > >>>
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> In the meanwhile, I discussed with Stanislav and he
>> > would
>> > > > be
>> > > > > > > > driving
>> > > > > > > > > > the
>> > > > > > > > > > >>> 389 effort from now on. Stanislav proposed the idea
>> in
>> > > the
>> > > > > > first
>> > > > > > > > > place
>> > > > > > > > > > and
>> > > > > > > > > > >>> had already come up a draft design, while I will
>> keep
>> > > > > focusing
>> > > > > > on
>> > > > > > > > > > KIP-345
>> > > > > > > > > > >>> effort to ensure solving the edge case described in
>> the
>> > > > JIRA<
>> > > > > > > > > > >>>
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://issues.apache.org/jira/browse/KAFKA-7610
>> > > > > > > > > > >.
>> > > > > > > > > > >>>
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> Thank you Stanislav for making this happen!
>> > > > > > > > > > >>>
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> Boyang
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> ________________________________
>> > > > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
>> > > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
>> > > > > > > > > > >>> To: dev@kafka.apache.org
>> > > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce
>> group.max.size
>> > to
>> > > > cap
>> > > > > > > > member
>> > > > > > > > > > >>> metadata growth
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> Thanks for the KIP.
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> Will this cap be a global cap across the entire
>> cluster
>> > > or
>> > > > > per
>> > > > > > > > > broker?
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> Either way the default value seems a bit high to me,
>> > but
>> > > > that
>> > > > > > > could
>> > > > > > > > > > just
>> > > > > > > > > > >>> be
>> > > > > > > > > > >>> from my own usage patterns. I'd have probably
>> started
>> > > with
>> > > > > 500
>> > > > > > or
>> > > > > > > > 1k
>> > > > > > > > > > but
>> > > > > > > > > > >>> could be easily convinced that's wrong.
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> Thanks,
>> > > > > > > > > > >>> Matt
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
>> > > > > > bchen11@outlook.com
>> > > > > > > >
>> > > > > > > > > > wrote:
>> > > > > > > > > > >>>
>> > > > > > > > > > >>> > Hey folks,
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> > I would like to start a discussion on KIP-389:
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>>
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> > This is a pretty simple change to cap the consumer
>> > > group
>> > > > > size
>> > > > > > > for
>> > > > > > > > > > >>> broker
>> > > > > > > > > > >>> > stability. Give me your valuable feedback when you
>> > got
>> > > > > time.
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>> > Thank you!
>> > > > > > > > > > >>> >
>> > > > > > > > > > >>>
>> > > > > > > > > > >>
>> > > > > > > > > > >>
>> > > > > > > > > > >> --
>> > > > > > > > > > >> Best,
>> > > > > > > > > > >> Stanislav
>> > > > > > > > > > >>
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > --
>> > > > > > > > > > > Best,
>> > > > > > > > > > > Stanislav
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > --
>> > > > > > > > > > Best,
>> > > > > > > > > > Stanislav
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > Best,
>> > > > > > > > Stanislav
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Best,
>> > > > > > Stanislav
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Best,
>> > > > Stanislav
>> > > >
>> > >
>> > >
>> > > --
>> > > Best,
>> > > Stanislav
>> > >
>> >
>> >
>> > --
>> > Best,
>> > Stanislav
>> >
>>
>
>
> --
> Best,
> Stanislav
>


-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hey Jason,

Yes, that is what I meant by
> Given those constraints, I think that we can simply mark the group as
`PreparingRebalance` with a rebalanceTimeout of the server setting `
group.max.session.timeout.ms`. That's a bit long by default (5 minutes) but
I can't seem to come up with a better alternative
So either the timeout or all members calling joinGroup, yes
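
To make that concrete, here is a deliberately simplified sketch of the shrink
path in plain Java. The class and method names are made up for illustration
and this is not the actual GroupCoordinator code; it only mirrors the three
steps above: move to PREPARING_REBALANCE without evicting anyone, let the
first n members that rejoin keep their place, and finish once every active
member has answered or the rebalance timeout expires.

    import java.util.LinkedHashSet;
    import java.util.Set;

    // Illustrative only: not the real coordinator, just the shrink logic
    // described in this thread.
    class GroupShrinkSketch {
        enum State { STABLE, PREPARING_REBALANCE }

        private final int maxGroupSize;           // proposed group.max.size cap
        private final Set<String> activeMembers;  // current (over-sized) membership
        private final Set<String> responded = new LinkedHashSet<>();
        private final Set<String> accepted = new LinkedHashSet<>();
        private State state = State.STABLE;

        GroupShrinkSketch(int maxGroupSize, Set<String> activeMembers) {
            this.maxGroupSize = maxGroupSize;
            this.activeMembers = new LinkedHashSet<>(activeMembers);
        }

        // Step 1: prepare the rebalance without kicking anyone out; heartbeats
        // and offset commits from current members keep being accepted.
        void beginShrink() {
            state = State.PREPARING_REBALANCE;
        }

        // Steps 2-3: the first maxGroupSize members to send JoinGroup keep
        // their place; later joiners are turned away once the cap is reached.
        synchronized boolean onJoinGroup(String memberId) {
            if (state != State.PREPARING_REBALANCE || !activeMembers.contains(memberId)) {
                return false;
            }
            responded.add(memberId);
            if (accepted.contains(memberId) || accepted.size() < maxGroupSize) {
                accepted.add(memberId);
                return true;   // stays in the shrunk group
            }
            return false;      // over the cap: dropped when the rebalance completes
        }

        // The rebalance finishes when everyone has answered or the rebalance
        // timeout (bounded by group.max.session.timeout.ms) has expired.
        synchronized boolean canCompleteRebalance(boolean timeoutExpired) {
            return timeoutExpired || responded.size() == activeMembers.size();
        }
    }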


On Tue, Dec 11, 2018 at 8:14 PM Boyang Chen <bc...@outlook.com> wrote:

> Hey Jason,
>
> I think this is the correct understanding. One more question is whether
> you feel
> we should enforce group size cap statically or on runtime?
>
> Boyang
> ________________________________
> From: Jason Gustafson <ja...@confluent.io>
> Sent: Tuesday, December 11, 2018 3:24 AM
> To: dev
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Hey Stanislav,
>
> Just to clarify, I think what you're suggesting is something like this in
> order to gracefully shrink the group:
>
> 1. Transition the group to PREPARING_REBALANCE. No members are kicked out.
> 2. Continue to allow offset commits and heartbeats for all current members.
> 3. Allow the first n members that send JoinGroup to stay in the group, but
> wait for the JoinGroup (or session timeout) from all active members before
> finishing the rebalance.
>
> So basically we try to give the current members an opportunity to finish
> work, but we prevent some of them from rejoining after the rebalance
> completes. It sounds reasonable if I've understood correctly.
>
> Thanks,
> Jason
>
>
>
> On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <bc...@outlook.com> wrote:
>
> > Yep, LGTM on my side. Thanks Stanislav!
> > ________________________________
> > From: Stanislav Kozlovski <st...@confluent.io>
> > Sent: Friday, December 7, 2018 8:51 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > metadata growth
> >
> > Hi,
> >
> > We discussed this offline with Boyang and figured that it's best to not
> > wait on the Cooperative Rebalancing proposal. Our thinking is that we can
> > just force a rebalance from the broker, allowing consumers to commit
> > offsets if their rebalanceListener is configured correctly.
> > When rebalancing improvements are implemented, we assume that they would
> > improve KIP-389's behavior as well as the normal rebalance scenarios
> >
> > On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <bc...@outlook.com> wrote:
> >
> > > Hey Stanislav,
> > >
> > > thanks for the question! `Trivial rebalance` means "we don't start
> > > reassignment right now, but you need to know it's coming soon
> > > and you should start preparation".
> > >
> > > An example KStream use case is that before actually starting to shrink
> > the
> > > consumer group, we need to
> > > 1. partition the consumer group into two subgroups, where one will be
> > > offline soon and the other will keep serving;
> > > 2. make sure the states associated with near-future offline consumers
> are
> > > successfully replicated on the serving ones.
> > >
> > > As I have mentioned shrinking the consumer group is pretty much
> > equivalent
> > > to group scaling down, so we could think of this
> > > as an add-on use case for cluster scaling. So my understanding is that
> > the
> > > KIP-389 could be sequenced within our cooperative rebalancing<
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
> > > >
> > > proposal.
> > >
> > > Let me know if this makes sense.
> > >
> > > Best,
> > > Boyang
> > > ________________________________
> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > Sent: Wednesday, December 5, 2018 5:52 PM
> > > To: dev@kafka.apache.org
> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > metadata growth
> > >
> > > Hey Boyang,
> > >
> > > I think we still need to take care of group shrinkage because even if
> > users
> > > change the config value we cannot guarantee that all consumer groups
> > would
> > > have been manually shrunk.
> > >
> > > Regarding 2., I agree that forcefully triggering a rebalance might be
> the
> > > most intuitive way to handle the situation.
> > > What does a "trivial rebalance" mean? Sorry, I'm not familiar with the
> > > term.
> > > I was thinking that maybe we could force a rebalance, which would cause
> > > consumers to commit their offsets (given their rebalanceListener is
> > > configured correctly) and subsequently reject some of the incoming
> > > `joinGroup` requests. Does that sound like it would work?
> > >
> > > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bc...@outlook.com>
> wrote:
> > >
> > > > Hey Stanislav,
> > > >
> > > > I read the latest KIP and saw that we already changed the default
> value
> > > to
> > > > -1. Do
> > > > we still need to take care of the consumer group shrinking when doing
> > the
> > > > upgrade?
> > > >
> > > > However this is an interesting topic that worth discussing. Although
> > > > rolling
> > > > upgrade is fine, `consumer.group.max.size` could always have conflict
> > > with
> > > > the current
> > > > consumer group size which means we need to adhere to one source of
> > truth.
> > > >
> > > > 1.Choose the current group size, which means we never interrupt the
> > > > consumer group until
> > > > it transits to PREPARE_REBALANCE. And we keep track of how many join
> > > group
> > > > requests
> > > > we have seen so far during PREPARE_REBALANCE. After reaching the
> > consumer
> > > > cap,
> > > > we start to inform over provisioned consumers that you should send
> > > > LeaveGroupRequest and
> > > > fail yourself. Or with what Mayuresh proposed in KIP-345, we could
> mark
> > > > extra members
> > > > as hot backup and rebalance without them.
> > > >
> > > > 2.Choose the `consumer.group.max.size`. I feel incremental
> rebalancing
> > > > (you proposed) could be of help here.
> > > > When a new cap is enforced, leader should be notified. If the current
> > > > group size is already over limit, leader
> > > > shall trigger a trivial rebalance to shuffle some topic partitions
> and
> > > let
> > > > a subset of consumers prepare the ownership
> > > > transition. Until they are ready, we trigger a real rebalance to
> remove
> > > > over-provisioned consumers. It is pretty much
> > > > equivalent to `how do we scale down the consumer group without
> > > > interrupting the current processing`.
> > > >
> > > > I personally feel inclined to 2 because we could kill two birds with
> > one
> > > > stone in a generic way. What do you think?
> > > >
> > > > Boyang
> > > > ________________________________
> > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > Sent: Monday, December 3, 2018 8:35 PM
> > > > To: dev@kafka.apache.org
> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > > metadata growth
> > > >
> > > > Hi Jason,
> > > >
> > > > > 2. Do you think we should make this a dynamic config?
> > > > I'm not sure. Looking at the config from the perspective of a
> > > prescriptive
> > > > config, we may get away with not updating it dynamically.
> > > > But in my opinion, it always makes sense to have a config be
> > dynamically
> > > > configurable. As long as we limit it to being a cluster-wide config,
> we
> > > > should be fine.
> > > >
> > > > > 1. I think it would be helpful to clarify the details on how the
> > > > coordinator will shrink the group. It will need to choose which
> members
> > > to
> > > > remove. Are we going to give current members an opportunity to commit
> > > > offsets before kicking them from the group?
> > > >
> > > > This turns out to be somewhat tricky. I think that we may not be able
> > to
> > > > guarantee that consumers don't process a message twice.
> > > > My initial approach was to do as much as we could to let consumers
> > commit
> > > > offsets.
> > > >
> > > > I was thinking that we mark a group to be shrunk, we could keep a map
> > of
> > > > consumer_id->boolean indicating whether they have committed offsets.
> I
> > > then
> > > > thought we could delay the rebalance until every consumer commits (or
> > > some
> > > > time passes).
> > > > In the meantime, we would block all incoming fetch calls (by either
> > > > returning empty records or a retriable error) and we would continue
> to
> > > > accept offset commits (even twice for a single consumer)
> > > >
> > > > I see two problems with this approach:
> > > > * We have async offset commits, which implies that we can receive
> fetch
> > > > requests before the offset commit req has been handled, i.e. consumer
> > sends
> > > > fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in the
> > > > broker. Meaning we could have saved the offsets for B but rebalance
> > > before
> > > > the offsetCommit for the offsets processed in C come in.
> > > > * KIP-392 Allow consumers to fetch from closest replica
> > > > <
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > > > >
> > > > would
> > > > make it significantly harder to block poll() calls on consumers whose
> > > > groups are being shrunk. Even if we implemented a solution, the same
> > race
> > > > condition noted above seems to apply and probably others
> > > >
> > > >
> > > > Given those constraints, I think that we can simply mark the group as
> > > > `PreparingRebalance` with a rebalanceTimeout of the server setting `
> > > > group.max.session.timeout.ms`. That's a bit long by default (5
> > minutes)
> > > > but
> > > > I can't seem to come up with a better alternative
> > > >
> > > > I'm interested in hearing your thoughts.
> > > >
> > > > Thanks,
> > > > Stanislav
> > > >
> > > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Stanislav,
> > > > >
> > > > > What do you think about the use case I mentioned in my previous
> reply
> > > > about
> > > > > > a more resilient self-service Kafka? I believe the benefit there
> is
> > > > > bigger.
> > > > >
> > > > >
> > > > > I see this config as analogous to the open file limit. Probably
> this
> > > > limit
> > > > > was intended to be prescriptive at some point about what was
> deemed a
> > > > > reasonable number of open files for an application. But mostly
> people
> > > > treat
> > > > > it as an annoyance which they have to work around. If it happens to
> > be
> > > > hit,
> > > > > usually you just increase it because it is not tied to an actual
> > > resource
> > > > > constraint. However, occasionally hitting the limit does indicate
> an
> > > > > application bug such as a leak, so I wouldn't say it is useless.
> > > > Similarly,
> > > > > the issue in KAFKA-7610 was a consumer leak and having this limit
> > would
> > > > > have allowed the problem to be detected before it impacted the
> > cluster.
> > > > To
> > > > > me, that's the main benefit. It's possible that it could be used
> > > > > prescriptively to prevent poor usage of groups, but like the open
> > file
> > > > > limit, I suspect administrators will just set it large enough that
> > > users
> > > > > are unlikely to complain.
> > > > >
> > > > > Anyway, just a couple additional questions:
> > > > >
> > > > > 1. I think it would be helpful to clarify the details on how the
> > > > > coordinator will shrink the group. It will need to choose which
> > members
> > > > to
> > > > > remove. Are we going to give current members an opportunity to
> commit
> > > > > offsets before kicking them from the group?
> > > > >
> > > > > 2. Do you think we should make this a dynamic config?
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > > > > stanislav@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi Jason,
> > > > > >
> > > > > > You raise some very valid points.
> > > > > >
> > > > > > > The benefit of this KIP is probably limited to preventing
> > "runaway"
> > > > > > consumer groups due to leaks or some other application bug
> > > > > > What do you think about the use case I mentioned in my previous
> > reply
> > > > > about
> > > > > > a more resilient self-service Kafka? I believe the benefit there
> is
> > > > > bigger
> > > > > >
> > > > > > * Default value
> > > > > > You're right, we probably do need to be conservative. Big
> consumer
> > > > groups
> > > > > > are considered an anti-pattern and my goal was to also hint at
> this
> > > > > through
> > > > > > the config's default. Regardless, it is better to not have the
> > > > potential
> > > > > to
> > > > > > break applications with an upgrade.
> > > > > > Choosing between the default of something big like 5000 or an
> > opt-in
> > > > > > option, I think we should go with the *disabled default option*
> > > (-1).
> > > > > > The only benefit we would get from a big default of 5000 is
> default
> > > > > > protection against buggy/malicious applications that hit the
> > > KAFKA-7610
> > > > > > issue.
> > > > > > While this KIP was spawned from that issue, I believe its value
> is
> > > > > enabling
> > > > > > the possibility of protection and helping move towards a more
> > > > > self-service
> > > > > > Kafka. I also think that a default value of 5000 might be
> > misleading
> > > to
> > > > > > users and lead them to think that big consumer groups (> 250)
> are a
> > > > good
> > > > > > thing.
> > > > > >
> > > > > > The good news is that KAFKA-7610 should be fully resolved and the
> > > > > rebalance
> > > > > > protocol should, in general, be more solid after the planned
> > > > improvements
> > > > > > in KIP-345 and KIP-394.
> > > > > >
> > > > > > * Handling bigger groups during upgrade
> > > > > > I now see that we store the state of consumer groups in the log
> and
> > > > why a
> > > > > > rebalance isn't expected during a rolling upgrade.
> > > > > > Since we're going with the default value of the max.size being
> > > > disabled,
> > > > > I
> > > > > > believe we can afford to be more strict here.
> > > > > > During state reloading of a new Coordinator with a defined
> > > > max.group.size
> > > > > > config, I believe we should *force* rebalances for groups that
> > exceed
> > > > the
> > > > > > configured size. Then, only some consumers will be able to join
> and
> > > the
> > > > > max
> > > > > > size invariant will be satisfied.
> > > > > >
> > > > > > I updated the KIP with a migration plan, rejected alternatives
> and
> > > the
> > > > > new
> > > > > > default value.
> > > > > >
> > > > > > Thanks,
> > > > > > Stanislav
> > > > > >
> > > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <
> > jason@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Stanislav,
> > > > > > >
> > > > > > > Clients will then find that coordinator
> > > > > > > > and send `joinGroup` on it, effectively rebuilding the group,
> > > since
> > > > > the
> > > > > > > > cache of active consumers is not stored outside the
> > Coordinator's
> > > > > > memory.
> > > > > > > > (please do say if that is incorrect)
> > > > > > >
> > > > > > >
> > > > > > > Groups do not typically rebalance after a coordinator change.
> You
> > > > could
> > > > > > > potentially force a rebalance if the group is too big and kick
> > out
> > > > the
> > > > > > > slowest members or something. A more graceful solution is
> > probably
> > > to
> > > > > > just
> > > > > > > accept the current size and prevent it from getting bigger. We
> > > could
> > > > > log
> > > > > > a
> > > > > > > warning potentially.
> > > > > > >
> > > > > > > My thinking is that we should abstract away from conserving
> > > resources
> > > > > and
> > > > > > > > focus on giving control to the broker. The issue that spawned
> > > this
> > > > > KIP
> > > > > > > was
> > > > > > > > a memory problem but I feel this change is useful in a more
> > > general
> > > > > > way.
> > > > > > >
> > > > > > >
> > > > > > > So you probably already know why I'm asking about this. For
> > > consumer
> > > > > > groups
> > > > > > > anyway, resource usage would typically be proportional to the
> > > number
> > > > of
> > > > > > > partitions that a group is reading from and not the number of
> > > > members.
> > > > > > For
> > > > > > > example, consider the memory use in the offsets cache. The
> > benefit
> > > of
> > > > > > this
> > > > > > > KIP is probably limited to preventing "runaway" consumer groups
> > due
> > > > to
> > > > > > > leaks or some other application bug. That still seems useful
> > > though.
> > > > > > >
> > > > > > > I completely agree with this and I *ask everybody to chime in
> > with
> > > > > > opinions
> > > > > > > > on a sensible default value*.
> > > > > > >
> > > > > > >
> > > > > > > I think we would have to be very conservative. The group
> protocol
> > > is
> > > > > > > generic in some sense, so there may be use cases we don't know
> of
> > > > where
> > > > > > > larger groups are reasonable. Probably we should make this an
> > > opt-in
> > > > > > > feature so that we do not risk breaking anyone's application
> > after
> > > an
> > > > > > > upgrade. Either that, or use a very high default like 5,000.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jason
> > > > > > >
> > > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > > > > > > stanislav@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Jason and Boyang, those were important comments
> > > > > > > >
> > > > > > > > > One suggestion I have is that it would be helpful to put
> your
> > > > > > reasoning
> > > > > > > > on deciding the current default value. For example, in
> certain
> > > use
> > > > > > cases
> > > > > > > at
> > > > > > > > Pinterest we are very likely to have more consumers than 250
> > when
> > > > we
> > > > > > > > configure 8 stream instances with 32 threads.
> > > > > > > > > For the effectiveness of this KIP, we should encourage
> people
> > > to
> > > > > > > discuss
> > > > > > > > their opinions on the default setting and ideally reach a
> > > > consensus.
> > > > > > > >
> > > > > > > > I completely agree with this and I *ask everybody to chime in
> > > with
> > > > > > > opinions
> > > > > > > > on a sensible default value*.
> > > > > > > > My thought process was that in the current model rebalances
> in
> > > > large
> > > > > > > groups
> > > > > > > > are more costly. I imagine most use cases in most Kafka users
> > do
> > > > not
> > > > > > > > require more than 250 consumers.
> > > > > > > > Boyang, you say that you are "likely to have... when we..." -
> > do
> > > > you
> > > > > > have
> > > > > > > > systems running with so many consumers in a group or are you
> > > > planning
> > > > > > > to? I
> > > > > > > > guess what I'm asking is whether this has been tested in
> > > production
> > > > > > with
> > > > > > > > the current rebalance model (ignoring KIP-345)
> > > > > > > >
> > > > > > > > >  Can you clarify the compatibility impact here? What
> > > > > > > > > will happen to groups that are already larger than the max
> > > size?
> > > > > > > > This is a very important question.
> > > > > > > > From my current understanding, when a coordinator broker gets
> > > shut
> > > > > > > > down during a cluster rolling upgrade, a replica will take
> > > > leadership
> > > > > > of
> > > > > > > > the `__offset_commits` partition. Clients will then find that
> > > > > > coordinator
> > > > > > > > and send `joinGroup` on it, effectively rebuilding the group,
> > > since
> > > > > the
> > > > > > > > cache of active consumers is not stored outside the
> > Coordinator's
> > > > > > memory.
> > > > > > > > (please do say if that is incorrect)
> > > > > > > > Then, I believe that working as if this is a new group is a
> > > > > reasonable
> > > > > > > > approach. Namely, fail joinGroups when the max.size is
> > exceeded.
> > > > > > > > What do you guys think about this? (I'll update the KIP after
> > we
> > > > > settle
> > > > > > > on
> > > > > > > > a solution)
> > > > > > > >
> > > > > > > > >  Also, just to be clear, the resource we are trying to
> > conserve
> > > > > here
> > > > > > is
> > > > > > > > what? Memory?
> > > > > > > > My thinking is that we should abstract away from conserving
> > > > resources
> > > > > > and
> > > > > > > > focus on giving control to the broker. The issue that spawned
> > > this
> > > > > KIP
> > > > > > > was
> > > > > > > > a memory problem but I feel this change is useful in a more
> > > general
> > > > > > way.
> > > > > > > It
> > > > > > > > limits the control clients have on the cluster and helps
> Kafka
> > > > > become a
> > > > > > > > more self-serving system. Admin/Ops teams can better control
> > the
> > > > > impact
> > > > > > > > application developers can have on a Kafka cluster with this
> > > change
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> > > > jason@confluent.io>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Stanislav,
> > > > > > > > >
> > > > > > > > > Thanks for the KIP. Can you clarify the compatibility
> impact
> > > > here?
> > > > > > What
> > > > > > > > > will happen to groups that are already larger than the max
> > > size?
> > > > > > Also,
> > > > > > > > just
> > > > > > > > > to be clear, the resource we are trying to conserve here is
> > > what?
> > > > > > > Memory?
> > > > > > > > >
> > > > > > > > > -Jason
> > > > > > > > >
> > > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
> > > bchen11@outlook.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks Stanislav for the update! One suggestion I have is
> > > that
> > > > it
> > > > > > > would
> > > > > > > > > be
> > > > > > > > > > helpful to put your
> > > > > > > > > >
> > > > > > > > > > reasoning on deciding the current default value. For
> > example,
> > > > in
> > > > > > > > certain
> > > > > > > > > > use cases at Pinterest we are very likely
> > > > > > > > > >
> > > > > > > > > > to have more consumers than 250 when we configure 8
> stream
> > > > > > instances
> > > > > > > > with
> > > > > > > > > > 32 threads.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > For the effectiveness of this KIP, we should encourage
> > people
> > > > to
> > > > > > > > discuss
> > > > > > > > > > their opinions on the default setting and ideally reach a
> > > > > > consensus.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > >
> > > > > > > > > > Boyang
> > > > > > > > > >
> > > > > > > > > > ________________________________
> > > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to
> > cap
> > > > > > member
> > > > > > > > > > metadata growth
> > > > > > > > > >
> > > > > > > > > > Hey everybody,
> > > > > > > > > >
> > > > > > > > > > It's been a week since this KIP and not much discussion
> has
> > > > been
> > > > > > > made.
> > > > > > > > > > I assume that this is a straight forward change and I
> will
> > > > open a
> > > > > > > > voting
> > > > > > > > > > thread in the next couple of days if nobody has anything
> to
> > > > > > suggest.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Stanislav
> > > > > > > > > >
> > > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > > > > > > > > > stanislav@confluent.io>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Greetings everybody,
> > > > > > > > > > >
> > > > > > > > > > > I have enriched the KIP a bit with a bigger Motivation
> > > > section
> > > > > > and
> > > > > > > > also
> > > > > > > > > > > renamed it.
> > > > > > > > > > > KIP:
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > > > > > > > >
> > > > > > > > > > > I'm looking forward to discussions around it.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Stanislav
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > > > > > > > > > stanislav@confluent.io> wrote:
> > > > > > > > > > >
> > > > > > > > > > >> Hey there everybody,
> > > > > > > > > > >>
> > > > > > > > > > >> Thanks for the introduction Boyang. I appreciate the
> > > effort
> > > > > you
> > > > > > > are
> > > > > > > > > > >> putting into improving consumer behavior in Kafka.
> > > > > > > > > > >>
> > > > > > > > > > >> @Matt
> > > > > > > > > > >> I also believe the default value is high. In my
> opinion,
> > > we
> > > > > > should
> > > > > > > > aim
> > > > > > > > > > to
> > > > > > > > > > >> a default cap around 250. This is because in the
> current
> > > > model
> > > > > > any
> > > > > > > > > > consumer
> > > > > > > > > > >> rebalance is disrupting to every consumer. The bigger
> > the
> > > > > group,
> > > > > > > the
> > > > > > > > > > longer
> > > > > > > > > > >> this period of disruption.
> > > > > > > > > > >>
> > > > > > > > > > >> If you have such a large consumer group, chances are
> > that
> > > > your
> > > > > > > > > > >> client-side logic could be structured better and that
> > you
> > > > are
> > > > > > not
> > > > > > > > > using
> > > > > > > > > > the
> > > > > > > > > > >> high number of consumers to achieve high throughput.
> > > > > > > > > > >> 250 can still be considered of a high upper bound, I
> > > believe
> > > > > in
> > > > > > > > > practice
> > > > > > > > > > >> users should aim to not go over 100 consumers per
> > consumer
> > > > > > group.
> > > > > > > > > > >>
> > > > > > > > > > >> In regards to the cap being global/per-broker, I think
> > > that
> > > > we
> > > > > > > > should
> > > > > > > > > > >> consider whether we want it to be global or
> *per-topic*.
> > > For
> > > > > the
> > > > > > > > time
> > > > > > > > > > >> being, I believe that having it per-topic with a
> global
> > > > > default
> > > > > > > > might
> > > > > > > > > be
> > > > > > > > > > >> the best situation. Having it global only seems a bit
> > > > > > restricting
> > > > > > > to
> > > > > > > > > me
> > > > > > > > > > and
> > > > > > > > > > >> it never hurts to support more fine-grained
> > > configurability
> > > > > > (given
> > > > > > > > > it's
> > > > > > > > > > the
> > > > > > > > > > >> same config, not a new one being introduced).
> > > > > > > > > > >>
> > > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> > > > > > bchen11@outlook.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > >>
> > > > > > > > > > >>> Thanks Matt for the suggestion! I'm still open to any
> > > > > > suggestion
> > > > > > > to
> > > > > > > > > > >>> change the default value. Meanwhile I just want to
> > point
> > > > out
> > > > > > that
> > > > > > > > > this
> > > > > > > > > > >>> value is a just last line of defense, not a real
> > scenario
> > > > we
> > > > > > > would
> > > > > > > > > > expect.
> > > > > > > > > > >>>
> > > > > > > > > > >>>
> > > > > > > > > > >>> In the meanwhile, I discussed with Stanislav and he
> > would
> > > > be
> > > > > > > > driving
> > > > > > > > > > the
> > > > > > > > > > >>> 389 effort from now on. Stanislav proposed the idea
> in
> > > the
> > > > > > first
> > > > > > > > > place
> > > > > > > > > > and
> > > > > > > > > > >>> had already come up a draft design, while I will keep
> > > > > focusing
> > > > > > on
> > > > > > > > > > KIP-345
> > > > > > > > > > >>> effort to ensure solving the edge case described in
> the
> > > > JIRA<
> > > > > > > > > > >>>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > > > > > > > > >.
> > > > > > > > > > >>>
> > > > > > > > > > >>>
> > > > > > > > > > >>> Thank you Stanislav for making this happen!
> > > > > > > > > > >>>
> > > > > > > > > > >>>
> > > > > > > > > > >>> Boyang
> > > > > > > > > > >>>
> > > > > > > > > > >>> ________________________________
> > > > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > > > > > > > >>> To: dev@kafka.apache.org
> > > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce
> group.max.size
> > to
> > > > cap
> > > > > > > > member
> > > > > > > > > > >>> metadata growth
> > > > > > > > > > >>>
> > > > > > > > > > >>> Thanks for the KIP.
> > > > > > > > > > >>>
> > > > > > > > > > >>> Will this cap be a global cap across the entire
> cluster
> > > or
> > > > > per
> > > > > > > > > broker?
> > > > > > > > > > >>>
> > > > > > > > > > >>> Either way the default value seems a bit high to me,
> > but
> > > > that
> > > > > > > could
> > > > > > > > > > just
> > > > > > > > > > >>> be
> > > > > > > > > > >>> from my own usage patterns. I'd have probably started
> > > with
> > > > > 500
> > > > > > or
> > > > > > > > 1k
> > > > > > > > > > but
> > > > > > > > > > >>> could be easily convinced that's wrong.
> > > > > > > > > > >>>
> > > > > > > > > > >>> Thanks,
> > > > > > > > > > >>> Matt
> > > > > > > > > > >>>
> > > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> > > > > > bchen11@outlook.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > >>>
> > > > > > > > > > >>> > Hey folks,
> > > > > > > > > > >>> >
> > > > > > > > > > >>> >
> > > > > > > > > > >>> > I would like to start a discussion on KIP-389:
> > > > > > > > > > >>> >
> > > > > > > > > > >>> >
> > > > > > > > > > >>> >
> > > > > > > > > > >>>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > > > > > > > >>> >
> > > > > > > > > > >>> >
> > > > > > > > > > >>> > This is a pretty simple change to cap the consumer
> > > group
> > > > > size
> > > > > > > for
> > > > > > > > > > >>> broker
> > > > > > > > > > >>> > stability. Give me your valuable feedback when you
> > got
> > > > > time.
> > > > > > > > > > >>> >
> > > > > > > > > > >>> >
> > > > > > > > > > >>> > Thank you!
> > > > > > > > > > >>> >
> > > > > > > > > > >>>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> --
> > > > > > > > > > >> Best,
> > > > > > > > > > >> Stanislav
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best,
> > > > > > > > > > > Stanislav
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best,
> > > > > > > > > > Stanislav
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>


-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Boyang Chen <bc...@outlook.com>.
Hey Jason,

I think this is the correct understanding. One more question is whether you feel
we should enforce the group size cap statically or at runtime?
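
To make the two options concrete (purely illustrative, not KIP text): static
enforcement would just be a broker property read at startup, e.g.
group.max.size=500 in server.properties, while runtime enforcement would mean
exposing the cap as a dynamic, cluster-wide broker config that an AdminClient
can alter without a restart. Both the config name and its dynamic
updatability are assumptions on my side, so treat the sketch below as a rough
illustration of the runtime option only:

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class GroupMaxSizeUpdateSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // "" selects the cluster-wide default broker entity.
            ConfigResource clusterDefault =
                new ConfigResource(ConfigResource.Type.BROKER, "");
            // Assumed config name; only meaningful if the cap is made dynamically updatable.
            AlterConfigOp setCap = new AlterConfigOp(
                new ConfigEntry("group.max.size", "500"), AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> update =
                Map.of(clusterDefault, Collections.singletonList(setCap));
            admin.incrementalAlterConfigs(update).all().get();
        }
    }
}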

Boyang
________________________________
From: Jason Gustafson <ja...@confluent.io>
Sent: Tuesday, December 11, 2018 3:24 AM
To: dev
Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Hey Stanislav,

Just to clarify, I think what you're suggesting is something like this in
order to gracefully shrink the group:

1. Transition the group to PREPARING_REBALANCE. No members are kicked out.
2. Continue to allow offset commits and heartbeats for all current members.
3. Allow the first n members that send JoinGroup to stay in the group, but
wait for the JoinGroup (or session timeout) from all active members before
finishing the rebalance.

So basically we try to give the current members an opportunity to finish
work, but we prevent some of them from rejoining after the rebalance
completes. It sounds reasonable if I've understood correctly.

Thanks,
Jason



On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <bc...@outlook.com> wrote:

> Yep, LGTM on my side. Thanks Stanislav!
> ________________________________
> From: Stanislav Kozlovski <st...@confluent.io>
> Sent: Friday, December 7, 2018 8:51 PM
> To: dev@kafka.apache.org
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Hi,
>
> We discussed this offline with Boyang and figured that it's best to not
> wait on the Cooperative Rebalancing proposal. Our thinking is that we can
> just force a rebalance from the broker, allowing consumers to commit
> offsets if their rebalanceListener is configured correctly.
> When rebalancing improvements are implemented, we assume that they would
> improve KIP-389's behavior as well as the normal rebalance scenarios
>
> On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <bc...@outlook.com> wrote:
>
> > Hey Stanislav,
> >
> > thanks for the question! `Trivial rebalance` means "we don't start
> > reassignment right now, but you need to know it's coming soon
> > and you should start preparation".
> >
> > An example KStream use case is that before actually starting to shrink
> the
> > consumer group, we need to
> > 1. partition the consumer group into two subgroups, where one will be
> > offline soon and the other will keep serving;
> > 2. make sure the states associated with near-future offline consumers are
> > successfully replicated on the serving ones.
> >
> > As I have mentioned shrinking the consumer group is pretty much
> equivalent
> > to group scaling down, so we could think of this
> > as an add-on use case for cluster scaling. So my understanding is that
> the
> > KIP-389 could be sequenced within our cooperative rebalancing<
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
> > >
> > proposal.
> >
> > Let me know if this makes sense.
> >
> > Best,
> > Boyang
> > ________________________________
> > From: Stanislav Kozlovski <st...@confluent.io>
> > Sent: Wednesday, December 5, 2018 5:52 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > metadata growth
> >
> > Hey Boyang,
> >
> > I think we still need to take care of group shrinkage because even if
> users
> > change the config value we cannot guarantee that all consumer groups
> would
> > have been manually shrunk.
> >
> > Regarding 2., I agree that forcefully triggering a rebalance might be the
> > most intuitive way to handle the situation.
> > What does a "trivial rebalance" mean? Sorry, I'm not familiar with the
> > term.
> > I was thinking that maybe we could force a rebalance, which would cause
> > consumers to commit their offsets (given their rebalanceListener is
> > configured correctly) and subsequently reject some of the incoming
> > `joinGroup` requests. Does that sound like it would work?
> >
> > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bc...@outlook.com> wrote:
> >
> > > Hey Stanislav,
> > >
> > > I read the latest KIP and saw that we already changed the default value
> > to
> > > -1. Do
> > > we still need to take care of the consumer group shrinking when doing
> the
> > > upgrade?
> > >
> > > However, this is an interesting topic that is worth discussing. Although
> > > rolling
> > > upgrade is fine, `consumer.group.max.size` could always have conflict
> > with
> > > the current
> > > consumer group size which means we need to adhere to one source of
> truth.
> > >
> > > 1.Choose the current group size, which means we never interrupt the
> > > consumer group until
> > > it transits to PREPARE_REBALANCE. And we keep track of how many join
> > group
> > > requests
> > > we have seen so far during PREPARE_REBALANCE. After reaching the
> consumer
> > > cap,
> > > we start to inform over provisioned consumers that you should send
> > > LeaveGroupRequest and
> > > fail yourself. Or with what Mayuresh proposed in KIP-345, we could mark
> > > extra members
> > > as hot backup and rebalance without them.
> > >
> > > 2.Choose the `consumer.group.max.size`. I feel incremental rebalancing
> > > (you proposed) could be of help here.
> > > When a new cap is enforced, leader should be notified. If the current
> > > group size is already over limit, leader
> > > shall trigger a trivial rebalance to shuffle some topic partitions and
> > let
> > > a subset of consumers prepare the ownership
> > > transition. Until they are ready, we trigger a real rebalance to remove
> > > over-provisioned consumers. It is pretty much
> > > equivalent to `how do we scale down the consumer group without
> > > interrupting the current processing`.
> > >
> > > I personally feel inclined to 2 because we could kill two birds with
> one
> > > stone in a generic way. What do you think?
> > >
> > > Boyang
> > > ________________________________
> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > Sent: Monday, December 3, 2018 8:35 PM
> > > To: dev@kafka.apache.org
> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > metadata growth
> > >
> > > Hi Jason,
> > >
> > > > 2. Do you think we should make this a dynamic config?
> > > I'm not sure. Looking at the config from the perspective of a
> > prescriptive
> > > config, we may get away with not updating it dynamically.
> > > But in my opinion, it always makes sense to have a config be
> dynamically
> > > configurable. As long as we limit it to being a cluster-wide config, we
> > > should be fine.
> > >
> > > > 1. I think it would be helpful to clarify the details on how the
> > > coordinator will shrink the group. It will need to choose which members
> > to
> > > remove. Are we going to give current members an opportunity to commit
> > > offsets before kicking them from the group?
> > >
> > > This turns out to be somewhat tricky. I think that we may not be able
> to
> > > guarantee that consumers don't process a message twice.
> > > My initial approach was to do as much as we could to let consumers
> commit
> > > offsets.
> > >
> > > I was thinking that we mark a group to be shrunk, we could keep a map
> of
> > > consumer_id->boolean indicating whether they have committed offsets. I
> > then
> > > thought we could delay the rebalance until every consumer commits (or
> > some
> > > time passes).
> > > In the meantime, we would block all incoming fetch calls (by either
> > > returning empty records or a retriable error) and we would continue to
> > > accept offset commits (even twice for a single consumer)
> > >
> > > I see two problems with this approach:
> > > * We have async offset commits, which implies that we can receive fetch
> > > requests before the offset commit req has been handled, i.e. consumer
> sends
> > > fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in the
> > > broker. Meaning we could have saved the offsets for B but rebalance
> > before
> > > the offsetCommit for the offsets processed in C come in.
> > > * KIP-392 Allow consumers to fetch from closest replica
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > > >
> > > would
> > > make it significantly harder to block poll() calls on consumers whose
> > > groups are being shrunk. Even if we implemented a solution, the same
> race
> > > condition noted above seems to apply and probably others
> > >
> > >
> > > Given those constraints, I think that we can simply mark the group as
> > > `PreparingRebalance` with a rebalanceTimeout of the server setting `
> > > group.max.session.timeout.ms`. That's a bit long by default (5
> minutes)
> > > but
> > > I can't seem to come up with a better alternative
> > >
> > > I'm interested in hearing your thoughts.
> > >
> > > Thanks,
> > > Stanislav
> > >
> > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hey Stanislav,
> > > >
> > > > What do you think about the use case I mentioned in my previous reply
> > > about
> > > > > a more resilient self-service Kafka? I believe the benefit there is
> > > > bigger.
> > > >
> > > >
> > > > I see this config as analogous to the open file limit. Probably this
> > > limit
> > > > was intended to be prescriptive at some point about what was deemed a
> > > > reasonable number of open files for an application. But mostly people
> > > treat
> > > > it as an annoyance which they have to work around. If it happens to
> be
> > > hit,
> > > > usually you just increase it because it is not tied to an actual
> > resource
> > > > constraint. However, occasionally hitting the limit does indicate an
> > > > application bug such as a leak, so I wouldn't say it is useless.
> > > Similarly,
> > > > the issue in KAFKA-7610 was a consumer leak and having this limit
> would
> > > > have allowed the problem to be detected before it impacted the
> cluster.
> > > To
> > > > me, that's the main benefit. It's possible that it could be used
> > > > prescriptively to prevent poor usage of groups, but like the open
> file
> > > > limit, I suspect administrators will just set it large enough that
> > users
> > > > are unlikely to complain.
> > > >
> > > > Anyway, just a couple additional questions:
> > > >
> > > > 1. I think it would be helpful to clarify the details on how the
> > > > coordinator will shrink the group. It will need to choose which
> members
> > > to
> > > > remove. Are we going to give current members an opportunity to commit
> > > > offsets before kicking them from the group?
> > > >
> > > > 2. Do you think we should make this a dynamic config?
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > > > stanislav@confluent.io>
> > > > wrote:
> > > >
> > > > > Hi Jason,
> > > > >
> > > > > You raise some very valid points.
> > > > >
> > > > > > The benefit of this KIP is probably limited to preventing
> "runaway"
> > > > > consumer groups due to leaks or some other application bug
> > > > > What do you think about the use case I mentioned in my previous
> reply
> > > > about
> > > > > a more resilient self-service Kafka? I believe the benefit there is
> > > > bigger
> > > > >
> > > > > * Default value
> > > > > You're right, we probably do need to be conservative. Big consumer
> > > groups
> > > > > are considered an anti-pattern and my goal was to also hint at this
> > > > through
> > > > > the config's default. Regardless, it is better to not have the
> > > potential
> > > > to
> > > > > break applications with an upgrade.
> > > > > Choosing between the default of something big like 5000 or an
> opt-in
> > > > > option, I think we should go with the *disabled default option*
> > (-1).
> > > > > The only benefit we would get from a big default of 5000 is default
> > > > > protection against buggy/malicious applications that hit the
> > KAFKA-7610
> > > > > issue.
> > > > > While this KIP was spawned from that issue, I believe its value is
> > > > enabling
> > > > > the possibility of protection and helping move towards a more
> > > > self-service
> > > > > Kafka. I also think that a default value of 5000 might be
> misleading
> > to
> > > > > users and lead them to think that big consumer groups (> 250) are a
> > > good
> > > > > thing.
> > > > >
> > > > > The good news is that KAFKA-7610 should be fully resolved and the
> > > > rebalance
> > > > > protocol should, in general, be more solid after the planned
> > > improvements
> > > > > in KIP-345 and KIP-394.
> > > > >
> > > > > * Handling bigger groups during upgrade
> > > > > I now see that we store the state of consumer groups in the log and
> > > why a
> > > > > rebalance isn't expected during a rolling upgrade.
> > > > > Since we're going with the default value of the max.size being
> > > disabled,
> > > > I
> > > > > believe we can afford to be more strict here.
> > > > > During state reloading of a new Coordinator with a defined
> > > max.group.size
> > > > > config, I believe we should *force* rebalances for groups that
> exceed
> > > the
> > > > > configured size. Then, only some consumers will be able to join and
> > the
> > > > max
> > > > > size invariant will be satisfied.
> > > > >
> > > > > I updated the KIP with a migration plan, rejected alternatives and
> > the
> > > > new
> > > > > default value.
> > > > >
> > > > > Thanks,
> > > > > Stanislav
> > > > >
> > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <
> jason@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hey Stanislav,
> > > > > >
> > > > > > Clients will then find that coordinator
> > > > > > > and send `joinGroup` on it, effectively rebuilding the group,
> > since
> > > > the
> > > > > > > cache of active consumers is not stored outside the
> Coordinator's
> > > > > memory.
> > > > > > > (please do say if that is incorrect)
> > > > > >
> > > > > >
> > > > > > Groups do not typically rebalance after a coordinator change. You
> > > could
> > > > > > potentially force a rebalance if the group is too big and kick
> out
> > > the
> > > > > > slowest members or something. A more graceful solution is
> probably
> > to
> > > > > just
> > > > > > accept the current size and prevent it from getting bigger. We
> > could
> > > > log
> > > > > a
> > > > > > warning potentially.
> > > > > >
> > > > > > My thinking is that we should abstract away from conserving
> > resources
> > > > and
> > > > > > > focus on giving control to the broker. The issue that spawned
> > this
> > > > KIP
> > > > > > was
> > > > > > > a memory problem but I feel this change is useful in a more
> > general
> > > > > way.
> > > > > >
> > > > > >
> > > > > > So you probably already know why I'm asking about this. For
> > consumer
> > > > > groups
> > > > > > anyway, resource usage would typically be proportional to the
> > number
> > > of
> > > > > > partitions that a group is reading from and not the number of
> > > members.
> > > > > For
> > > > > > example, consider the memory use in the offsets cache. The
> benefit
> > of
> > > > > this
> > > > > > KIP is probably limited to preventing "runaway" consumer groups
> due
> > > to
> > > > > > leaks or some other application bug. That still seems useful
> > though.
> > > > > >
> > > > > > I completely agree with this and I *ask everybody to chime in
> with
> > > > > opinions
> > > > > > > on a sensible default value*.
> > > > > >
> > > > > >
> > > > > > I think we would have to be very conservative. The group protocol
> > is
> > > > > > generic in some sense, so there may be use cases we don't know of
> > > where
> > > > > > larger groups are reasonable. Probably we should make this an
> > opt-in
> > > > > > feature so that we do not risk breaking anyone's application
> after
> > an
> > > > > > upgrade. Either that, or use a very high default like 5,000.
> > > > > >
> > > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > > > > > stanislav@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Jason and Boyang, those were important comments
> > > > > > >
> > > > > > > > One suggestion I have is that it would be helpful to put your
> > > > > reasoning
> > > > > > > on deciding the current default value. For example, in certain
> > use
> > > > > cases
> > > > > > at
> > > > > > > Pinterest we are very likely to have more consumers than 250
> when
> > > we
> > > > > > > configure 8 stream instances with 32 threads.
> > > > > > > > For the effectiveness of this KIP, we should encourage people
> > to
> > > > > > discuss
> > > > > > > their opinions on the default setting and ideally reach a
> > > consensus.
> > > > > > >
> > > > > > > I completely agree with this and I *ask everybody to chime in
> > with
> > > > > > opinions
> > > > > > > on a sensible default value*.
> > > > > > > My thought process was that in the current model rebalances in
> > > large
> > > > > > groups
> > > > > > > are more costly. I imagine most use cases in most Kafka users
> do
> > > not
> > > > > > > require more than 250 consumers.
> > > > > > > Boyang, you say that you are "likely to have... when we..." -
> do
> > > you
> > > > > have
> > > > > > > systems running with so many consumers in a group or are you
> > > planning
> > > > > > to? I
> > > > > > > guess what I'm asking is whether this has been tested in
> > production
> > > > > with
> > > > > > > the current rebalance model (ignoring KIP-345)
> > > > > > >
> > > > > > > >  Can you clarify the compatibility impact here? What
> > > > > > > > will happen to groups that are already larger than the max
> > size?
> > > > > > > This is a very important question.
> > > > > > > From my current understanding, when a coordinator broker gets
> > shut
> > > > > > > down during a cluster rolling upgrade, a replica will take
> > > leadership
> > > > > of
> > > > > > > the `__offset_commits` partition. Clients will then find that
> > > > > coordinator
> > > > > > > and send `joinGroup` on it, effectively rebuilding the group,
> > since
> > > > the
> > > > > > > cache of active consumers is not stored outside the
> Coordinator's
> > > > > memory.
> > > > > > > (please do say if that is incorrect)
> > > > > > > Then, I believe that working as if this is a new group is a
> > > > reasonable
> > > > > > > approach. Namely, fail joinGroups when the max.size is
> exceeded.
> > > > > > > What do you guys think about this? (I'll update the KIP after
> we
> > > > settle
> > > > > > on
> > > > > > > a solution)
> > > > > > >
> > > > > > > >  Also, just to be clear, the resource we are trying to
> conserve
> > > > here
> > > > > is
> > > > > > > what? Memory?
> > > > > > > My thinking is that we should abstract away from conserving
> > > resources
> > > > > and
> > > > > > > focus on giving control to the broker. The issue that spawned
> > this
> > > > KIP
> > > > > > was
> > > > > > > a memory problem but I feel this change is useful in a more
> > general
> > > > > way.
> > > > > > It
> > > > > > > limits the control clients have on the cluster and helps Kafka
> > > > become a
> > > > > > > more self-serving system. Admin/Ops teams can better control
> the
> > > > impact
> > > > > > > application developers can have on a Kafka cluster with this
> > change
> > > > > > >
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> > > jason@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Stanislav,
> > > > > > > >
> > > > > > > > Thanks for the KIP. Can you clarify the compatibility impact
> > > here?
> > > > > What
> > > > > > > > will happen to groups that are already larger than the max
> > size?
> > > > > Also,
> > > > > > > just
> > > > > > > > to be clear, the resource we are trying to conserve here is
> > what?
> > > > > > Memory?
> > > > > > > >
> > > > > > > > -Jason
> > > > > > > >
> > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
> > bchen11@outlook.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks Stanislav for the update! One suggestion I have is
> > that
> > > it
> > > > > > would
> > > > > > > > be
> > > > > > > > > helpful to put your
> > > > > > > > >
> > > > > > > > > reasoning on deciding the current default value. For
> example,
> > > in
> > > > > > > certain
> > > > > > > > > use cases at Pinterest we are very likely
> > > > > > > > >
> > > > > > > > > to have more consumers than 250 when we configure 8 stream
> > > > > instances
> > > > > > > with
> > > > > > > > > 32 threads.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > For the effectiveness of this KIP, we should encourage
> people
> > > to
> > > > > > > discuss
> > > > > > > > > their opinions on the default setting and ideally reach a
> > > > > consensus.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > Boyang
> > > > > > > > >
> > > > > > > > > ________________________________
> > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to
> cap
> > > > > member
> > > > > > > > > metadata growth
> > > > > > > > >
> > > > > > > > > Hey everybody,
> > > > > > > > >
> > > > > > > > > It's been a week since this KIP and not much discussion has
> > > been
> > > > > > made.
> > > > > > > > > I assume that this is a straight forward change and I will
> > > open a
> > > > > > > voting
> > > > > > > > > thread in the next couple of days if nobody has anything to
> > > > > suggest.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Stanislav
> > > > > > > > >
> > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > > > > > > > > stanislav@confluent.io>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Greetings everybody,
> > > > > > > > > >
> > > > > > > > > > I have enriched the KIP a bit with a bigger Motivation
> > > section
> > > > > and
> > > > > > > also
> > > > > > > > > > renamed it.
> > > > > > > > > > KIP:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > > > > > > >
> > > > > > > > > > I'm looking forward to discussions around it.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Stanislav
> > > > > > > > > >
> > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > > > > > > > > stanislav@confluent.io> wrote:
> > > > > > > > > >
> > > > > > > > > >> Hey there everybody,
> > > > > > > > > >>
> > > > > > > > > >> Thanks for the introduction Boyang. I appreciate the
> > effort
> > > > you
> > > > > > are
> > > > > > > > > >> putting into improving consumer behavior in Kafka.
> > > > > > > > > >>
> > > > > > > > > >> @Matt
> > > > > > > > > >> I also believe the default value is high. In my opinion,
> > we
> > > > > should
> > > > > > > aim
> > > > > > > > > to
> > > > > > > > > >> a default cap around 250. This is because in the current
> > > model
> > > > > any
> > > > > > > > > consumer
> > > > > > > > > >> rebalance is disrupting to every consumer. The bigger
> the
> > > > group,
> > > > > > the
> > > > > > > > > longer
> > > > > > > > > >> this period of disruption.
> > > > > > > > > >>
> > > > > > > > > >> If you have such a large consumer group, chances are
> that
> > > your
> > > > > > > > > >> client-side logic could be structured better and that
> you
> > > are
> > > > > not
> > > > > > > > using
> > > > > > > > > the
> > > > > > > > > >> high number of consumers to achieve high throughput.
> > > > > > > > > >> 250 can still be considered of a high upper bound, I
> > believe
> > > > in
> > > > > > > > practice
> > > > > > > > > >> users should aim to not go over 100 consumers per
> consumer
> > > > > group.
> > > > > > > > > >>
> > > > > > > > > >> In regards to the cap being global/per-broker, I think
> > that
> > > we
> > > > > > > should
> > > > > > > > > >> consider whether we want it to be global or *per-topic*.
> > For
> > > > the
> > > > > > > time
> > > > > > > > > >> being, I believe that having it per-topic with a global
> > > > default
> > > > > > > might
> > > > > > > > be
> > > > > > > > > >> the best situation. Having it global only seems a bit
> > > > > restricting
> > > > > > to
> > > > > > > > me
> > > > > > > > > and
> > > > > > > > > >> it never hurts to support more fine-grained
> > configurability
> > > > > (given
> > > > > > > > it's
> > > > > > > > > the
> > > > > > > > > >> same config, not a new one being introduced).
> > > > > > > > > >>
> > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> > > > > bchen11@outlook.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >>> Thanks Matt for the suggestion! I'm still open to any
> > > > > suggestion
> > > > > > to
> > > > > > > > > >>> change the default value. Meanwhile I just want to
> point
> > > out
> > > > > that
> > > > > > > > this
> > > > > > > > > >>> value is a just last line of defense, not a real
> scenario
> > > we
> > > > > > would
> > > > > > > > > expect.
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> In the meanwhile, I discussed with Stanislav and he
> would
> > > be
> > > > > > > driving
> > > > > > > > > the
> > > > > > > > > >>> 389 effort from now on. Stanislav proposed the idea in
> > the
> > > > > first
> > > > > > > > place
> > > > > > > > > and
> > > > > > > > > >>> had already come up a draft design, while I will keep
> > > > focusing
> > > > > on
> > > > > > > > > KIP-345
> > > > > > > > > >>> effort to ensure solving the edge case described in the
> > > JIRA<
> > > > > > > > > >>>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > > > > > > > >.
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> Thank you Stanislav for making this happen!
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> Boyang
> > > > > > > > > >>>
> > > > > > > > > >>> ________________________________
> > > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > > > > > > >>> To: dev@kafka.apache.org
> > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size
> to
> > > cap
> > > > > > > member
> > > > > > > > > >>> metadata growth
> > > > > > > > > >>>
> > > > > > > > > >>> Thanks for the KIP.
> > > > > > > > > >>>
> > > > > > > > > >>> Will this cap be a global cap across the entire cluster
> > or
> > > > per
> > > > > > > > broker?
> > > > > > > > > >>>
> > > > > > > > > >>> Either way the default value seems a bit high to me,
> but
> > > that
> > > > > > could
> > > > > > > > > just
> > > > > > > > > >>> be
> > > > > > > > > >>> from my own usage patterns. I'd have probably started
> > with
> > > > 500
> > > > > or
> > > > > > > 1k
> > > > > > > > > but
> > > > > > > > > >>> could be easily convinced that's wrong.
> > > > > > > > > >>>
> > > > > > > > > >>> Thanks,
> > > > > > > > > >>> Matt
> > > > > > > > > >>>
> > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> > > > > bchen11@outlook.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >>>
> > > > > > > > > >>> > Hey folks,
> > > > > > > > > >>> >
> > > > > > > > > >>> >
> > > > > > > > > >>> > I would like to start a discussion on KIP-389:
> > > > > > > > > >>> >
> > > > > > > > > >>> >
> > > > > > > > > >>> >
> > > > > > > > > >>>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > > > > > > >>> >
> > > > > > > > > >>> >
> > > > > > > > > >>> > This is a pretty simple change to cap the consumer
> > group
> > > > size
> > > > > > for
> > > > > > > > > >>> broker
> > > > > > > > > >>> > stability. Give me your valuable feedback when you
> got
> > > > time.
> > > > > > > > > >>> >
> > > > > > > > > >>> >
> > > > > > > > > >>> > Thank you!
> > > > > > > > > >>> >
> > > > > > > > > >>>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> --
> > > > > > > > > >> Best,
> > > > > > > > > >> Stanislav
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best,
> > > > > > > > > > Stanislav
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best,
> > > > > > > > > Stanislav
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>
>
> --
> Best,
> Stanislav
>

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Jason Gustafson <ja...@confluent.io>.
Hey Stanislav,

Just to clarify, I think what you're suggesting is something like this in
order to gracefully shrink the group:

1. Transition the group to PREPARING_REBALANCE. No members are kicked out.
2. Continue to allow offset commits and heartbeats for all current members.
3. Allow the first n members that send JoinGroup to stay in the group, but
wait for the JoinGroup (or session timeout) from all active members before
finishing the rebalance.

So basically we try to give the current members an opportunity to finish
work, but we prevent some of them from rejoining after the rebalance
completes. It sounds reasonable if I've understood correctly.
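
Just to make those steps concrete, here is a rough coordinator-side sketch.
All type and method names below are hypothetical (this is not the actual
GroupCoordinator code), and it assumes I've read the proposal right:

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only -- none of these names exist in the broker code.
class GroupShrinkSketch {
    enum GroupState { STABLE, PREPARING_REBALANCE, COMPLETING_REBALANCE }

    GroupState state = GroupState.STABLE;
    final Set<String> currentMembers = new LinkedHashSet<>();
    final List<String> rejoinedMembers = new ArrayList<>();

    // Step 1: enter PREPARING_REBALANCE without kicking anyone out.
    void beginShrink() {
        state = GroupState.PREPARING_REBALANCE;
    }

    // Step 2: offset commits and heartbeats from current members stay valid.
    boolean allowOffsetCommit(String memberId) {
        return currentMembers.contains(memberId);
    }

    // Step 3: the first maxSize members to rejoin keep their spot; the rest
    // are rejected once the rebalance completes.
    boolean onJoinGroup(String memberId, int maxSize) {
        if (rejoinedMembers.size() < maxSize) {
            rejoinedMembers.add(memberId);
            return true;   // stays in the new generation
        }
        return false;      // fenced out when the rebalance finishes
    }

    // Finish only once every active member has rejoined or its session
    // timeout has expired.
    void maybeCompleteRebalance(Set<String> expiredMembers) {
        Set<String> pending = new LinkedHashSet<>(currentMembers);
        pending.removeAll(rejoinedMembers);
        pending.removeAll(expiredMembers);
        if (pending.isEmpty()) {
            state = GroupState.COMPLETING_REBALANCE;
            currentMembers.retainAll(rejoinedMembers);
        }
    }
}

The key point is that nothing is fenced until the new generation forms, so
in-flight work and offset commits from the extra members are still honored.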

Thanks,
Jason



On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <bc...@outlook.com> wrote:

> Yep, LGTM on my side. Thanks Stanislav!
> ________________________________
> From: Stanislav Kozlovski <st...@confluent.io>
> Sent: Friday, December 7, 2018 8:51 PM
> To: dev@kafka.apache.org
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Hi,
>
> We discussed this offline with Boyang and figured that it's best to not
> wait on the Cooperative Rebalancing proposal. Our thinking is that we can
> just force a rebalance from the broker, allowing consumers to commit
> offsets if their rebalanceListener is configured correctly.
> When rebalancing improvements are implemented, we assume that they would
> improve KIP-389's behavior as well as the normal rebalance scenarios
>
> On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <bc...@outlook.com> wrote:
>
> > Hey Stanislav,
> >
> > thanks for the question! `Trivial rebalance` means "we don't start
> > reassignment right now, but you need to know it's coming soon
> > and you should start preparation".
> >
> > An example KStream use case is that before actually starting to shrink
> the
> > consumer group, we need to
> > 1. partition the consumer group into two subgroups, where one will be
> > offline soon and the other will keep serving;
> > 2. make sure the states associated with near-future offline consumers are
> > successfully replicated on the serving ones.
> >
> > As I have mentioned shrinking the consumer group is pretty much
> equivalent
> > to group scaling down, so we could think of this
> > as an add-on use case for cluster scaling. So my understanding is that
> the
> > KIP-389 could be sequenced within our cooperative rebalancing<
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
> > >
> > proposal.
> >
> > Let me know if this makes sense.
> >
> > Best,
> > Boyang
> > ________________________________
> > From: Stanislav Kozlovski <st...@confluent.io>
> > Sent: Wednesday, December 5, 2018 5:52 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > metadata growth
> >
> > Hey Boyang,
> >
> > I think we still need to take care of group shrinkage because even if
> users
> > change the config value we cannot guarantee that all consumer groups
> would
> > have been manually shrunk.
> >
> > Regarding 2., I agree that forcefully triggering a rebalance might be the
> > most intuitive way to handle the situation.
> > What does a "trivial rebalance" mean? Sorry, I'm not familiar with the
> > term.
> > I was thinking that maybe we could force a rebalance, which would cause
> > consumers to commit their offsets (given their rebalanceListener is
> > configured correctly) and subsequently reject some of the incoming
> > `joinGroup` requests. Does that sound like it would work?
> >
> > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bc...@outlook.com> wrote:
> >
> > > Hey Stanislav,
> > >
> > > I read the latest KIP and saw that we already changed the default value
> > to
> > > -1. Do
> > > we still need to take care of the consumer group shrinking when doing
> the
> > > upgrade?
> > >
> > > However this is an interesting topic that worth discussing. Although
> > > rolling
> > > upgrade is fine, `consumer.group.max.size` could always have conflict
> > with
> > > the current
> > > consumer group size which means we need to adhere to one source of
> truth.
> > >
> > > 1.Choose the current group size, which means we never interrupt the
> > > consumer group until
> > > it transits to PREPARE_REBALANCE. And we keep track of how many join
> > group
> > > requests
> > > we have seen so far during PREPARE_REBALANCE. After reaching the
> consumer
> > > cap,
> > > we start to inform over provisioned consumers that you should send
> > > LeaveGroupRequest and
> > > fail yourself. Or with what Mayuresh proposed in KIP-345, we could mark
> > > extra members
> > > as hot backup and rebalance without them.
> > >
> > > 2.Choose the `consumer.group.max.size`. I feel incremental rebalancing
> > > (you proposed) could be of help here.
> > > When a new cap is enforced, leader should be notified. If the current
> > > group size is already over limit, leader
> > > shall trigger a trivial rebalance to shuffle some topic partitions and
> > let
> > > a subset of consumers prepare the ownership
> > > transition. Until they are ready, we trigger a real rebalance to remove
> > > over-provisioned consumers. It is pretty much
> > > equivalent to `how do we scale down the consumer group without
> > > interrupting the current processing`.
> > >
> > > I personally feel inclined to 2 because we could kill two birds with
> one
> > > stone in a generic way. What do you think?
> > >
> > > Boyang
> > > ________________________________
> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > Sent: Monday, December 3, 2018 8:35 PM
> > > To: dev@kafka.apache.org
> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > metadata growth
> > >
> > > Hi Jason,
> > >
> > > > 2. Do you think we should make this a dynamic config?
> > > I'm not sure. Looking at the config from the perspective of a
> > prescriptive
> > > config, we may get away with not updating it dynamically.
> > > But in my opinion, it always makes sense to have a config be
> dynamically
> > > configurable. As long as we limit it to being a cluster-wide config, we
> > > should be fine.
> > >
> > > > 1. I think it would be helpful to clarify the details on how the
> > > coordinator will shrink the group. It will need to choose which members
> > to
> > > remove. Are we going to give current members an opportunity to commit
> > > offsets before kicking them from the group?
> > >
> > > This turns out to be somewhat tricky. I think that we may not be able
> to
> > > guarantee that consumers don't process a message twice.
> > > My initial approach was to do as much as we could to let consumers
> commit
> > > offsets.
> > >
> > > I was thinking that we mark a group to be shrunk, we could keep a map
> of
> > > consumer_id->boolean indicating whether they have committed offsets. I
> > then
> > > thought we could delay the rebalance until every consumer commits (or
> > some
> > > time passes).
> > > In the meantime, we would block all incoming fetch calls (by either
> > > returning empty records or a retriable error) and we would continue to
> > > accept offset commits (even twice for a single consumer)
> > >
> > > I see two problems with this approach:
> > > * We have async offset commits, which implies that we can receive fetch
> > > requests before the offset commit req has been handled. i.e consmer
> sends
> > > fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in the
> > > broker. Meaning we could have saved the offsets for B but rebalance
> > before
> > > the offsetCommit for the offsets processed in C come in.
> > > * KIP-392 Allow consumers to fetch from closest replica
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > > >
> > > would
> > > make it significantly harder to block poll() calls on consumers whose
> > > groups are being shrunk. Even if we implemented a solution, the same
> race
> > > condition noted above seems to apply and probably others
> > >
> > >
> > > Given those constraints, I think that we can simply mark the group as
> > > `PreparingRebalance` with a rebalanceTimeout of the server setting `
> > > group.max.session.timeout.ms`. That's a bit long by default (5
> minutes)
> > > but
> > > I can't seem to come up with a better alternative
> > >
> > > I'm interested in hearing your thoughts.
> > >
> > > Thanks,
> > > Stanislav
> > >
> > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hey Stanislav,
> > > >
> > > > What do you think about the use case I mentioned in my previous reply
> > > about
> > > > > a more resilient self-service Kafka? I believe the benefit there is
> > > > bigger.
> > > >
> > > >
> > > > I see this config as analogous to the open file limit. Probably this
> > > limit
> > > > was intended to be prescriptive at some point about what was deemed a
> > > > reasonable number of open files for an application. But mostly people
> > > treat
> > > > it as an annoyance which they have to work around. If it happens to
> be
> > > hit,
> > > > usually you just increase it because it is not tied to an actual
> > resource
> > > > constraint. However, occasionally hitting the limit does indicate an
> > > > application bug such as a leak, so I wouldn't say it is useless.
> > > Similarly,
> > > > the issue in KAFKA-7610 was a consumer leak and having this limit
> would
> > > > have allowed the problem to be detected before it impacted the
> cluster.
> > > To
> > > > me, that's the main benefit. It's possible that it could be used
> > > > prescriptively to prevent poor usage of groups, but like the open
> file
> > > > limit, I suspect administrators will just set it large enough that
> > users
> > > > are unlikely to complain.
> > > >
> > > > Anyway, just a couple additional questions:
> > > >
> > > > 1. I think it would be helpful to clarify the details on how the
> > > > coordinator will shrink the group. It will need to choose which
> members
> > > to
> > > > remove. Are we going to give current members an opportunity to commit
> > > > offsets before kicking them from the group?
> > > >
> > > > 2. Do you think we should make this a dynamic config?
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > > > stanislav@confluent.io>
> > > > wrote:
> > > >
> > > > > Hi Jason,
> > > > >
> > > > > You raise some very valid points.
> > > > >
> > > > > > The benefit of this KIP is probably limited to preventing
> "runaway"
> > > > > consumer groups due to leaks or some other application bug
> > > > > What do you think about the use case I mentioned in my previous
> reply
> > > > about
> > > > > a more resilient self-service Kafka? I believe the benefit there is
> > > > bigger
> > > > >
> > > > > * Default value
> > > > > You're right, we probably do need to be conservative. Big consumer
> > > groups
> > > > > are considered an anti-pattern and my goal was to also hint at this
> > > > through
> > > > > the config's default. Regardless, it is better to not have the
> > > potential
> > > > to
> > > > > break applications with an upgrade.
> > > > > Choosing between the default of something big like 5000 or an
> opt-in
> > > > > option, I think we should go with the *disabled default option*
> > (-1).
> > > > > The only benefit we would get from a big default of 5000 is default
> > > > > protection against buggy/malicious applications that hit the
> > KAFKA-7610
> > > > > issue.
> > > > > While this KIP was spawned from that issue, I believe its value is
> > > > enabling
> > > > > the possibility of protection and helping move towards a more
> > > > self-service
> > > > > Kafka. I also think that a default value of 5000 might be
> misleading
> > to
> > > > > users and lead them to think that big consumer groups (> 250) are a
> > > good
> > > > > thing.
> > > > >
> > > > > The good news is that KAFKA-7610 should be fully resolved and the
> > > > rebalance
> > > > > protocol should, in general, be more solid after the planned
> > > improvements
> > > > > in KIP-345 and KIP-394.
> > > > >
> > > > > * Handling bigger groups during upgrade
> > > > > I now see that we store the state of consumer groups in the log and
> > > why a
> > > > > rebalance isn't expected during a rolling upgrade.
> > > > > Since we're going with the default value of the max.size being
> > > disabled,
> > > > I
> > > > > believe we can afford to be more strict here.
> > > > > During state reloading of a new Coordinator with a defined
> > > max.group.size
> > > > > config, I believe we should *force* rebalances for groups that
> exceed
> > > the
> > > > > configured size. Then, only some consumers will be able to join and
> > the
> > > > max
> > > > > size invariant will be satisfied.
> > > > >
> > > > > I updated the KIP with a migration plan, rejected alternatives and
> > the
> > > > new
> > > > > default value.
> > > > >
> > > > > Thanks,
> > > > > Stanislav
> > > > >
> > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <
> jason@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hey Stanislav,
> > > > > >
> > > > > > Clients will then find that coordinator
> > > > > > > and send `joinGroup` on it, effectively rebuilding the group,
> > since
> > > > the
> > > > > > > cache of active consumers is not stored outside the
> Coordinator's
> > > > > memory.
> > > > > > > (please do say if that is incorrect)
> > > > > >
> > > > > >
> > > > > > Groups do not typically rebalance after a coordinator change. You
> > > could
> > > > > > potentially force a rebalance if the group is too big and kick
> out
> > > the
> > > > > > slowest members or something. A more graceful solution is
> probably
> > to
> > > > > just
> > > > > > accept the current size and prevent it from getting bigger. We
> > could
> > > > log
> > > > > a
> > > > > > warning potentially.
> > > > > >
> > > > > > My thinking is that we should abstract away from conserving
> > resources
> > > > and
> > > > > > > focus on giving control to the broker. The issue that spawned
> > this
> > > > KIP
> > > > > > was
> > > > > > > a memory problem but I feel this change is useful in a more
> > general
> > > > > way.
> > > > > >
> > > > > >
> > > > > > So you probably already know why I'm asking about this. For
> > consumer
> > > > > groups
> > > > > > anyway, resource usage would typically be proportional to the
> > number
> > > of
> > > > > > partitions that a group is reading from and not the number of
> > > members.
> > > > > For
> > > > > > example, consider the memory use in the offsets cache. The
> benefit
> > of
> > > > > this
> > > > > > KIP is probably limited to preventing "runaway" consumer groups
> due
> > > to
> > > > > > leaks or some other application bug. That still seems useful
> > though.
> > > > > >
> > > > > > I completely agree with this and I *ask everybody to chime in
> with
> > > > > opinions
> > > > > > > on a sensible default value*.
> > > > > >
> > > > > >
> > > > > > I think we would have to be very conservative. The group protocol
> > is
> > > > > > generic in some sense, so there may be use cases we don't know of
> > > where
> > > > > > larger groups are reasonable. Probably we should make this an
> > opt-in
> > > > > > feature so that we do not risk breaking anyone's application
> after
> > an
> > > > > > upgrade. Either that, or use a very high default like 5,000.
> > > > > >
> > > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > > > > > stanislav@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Jason and Boyang, those were important comments
> > > > > > >
> > > > > > > > One suggestion I have is that it would be helpful to put your
> > > > > reasoning
> > > > > > > on deciding the current default value. For example, in certain
> > use
> > > > > cases
> > > > > > at
> > > > > > > Pinterest we are very likely to have more consumers than 250
> when
> > > we
> > > > > > > configure 8 stream instances with 32 threads.
> > > > > > > > For the effectiveness of this KIP, we should encourage people
> > to
> > > > > > discuss
> > > > > > > their opinions on the default setting and ideally reach a
> > > consensus.
> > > > > > >
> > > > > > > I completely agree with this and I *ask everybody to chime in
> > with
> > > > > > opinions
> > > > > > > on a sensible default value*.
> > > > > > > My thought process was that in the current model rebalances in
> > > large
> > > > > > groups
> > > > > > > are more costly. I imagine most use cases in most Kafka users
> do
> > > not
> > > > > > > require more than 250 consumers.
> > > > > > > Boyang, you say that you are "likely to have... when we..." -
> do
> > > you
> > > > > have
> > > > > > > systems running with so many consumers in a group or are you
> > > planning
> > > > > > to? I
> > > > > > > guess what I'm asking is whether this has been tested in
> > production
> > > > > with
> > > > > > > the current rebalance model (ignoring KIP-345)
> > > > > > >
> > > > > > > >  Can you clarify the compatibility impact here? What
> > > > > > > > will happen to groups that are already larger than the max
> > size?
> > > > > > > This is a very important question.
> > > > > > > From my current understanding, when a coordinator broker gets
> > shut
> > > > > > > down during a cluster rolling upgrade, a replica will take
> > > leadership
> > > > > of
> > > > > > > the `__offset_commits` partition. Clients will then find that
> > > > > coordinator
> > > > > > > and send `joinGroup` on it, effectively rebuilding the group,
> > since
> > > > the
> > > > > > > cache of active consumers is not stored outside the
> Coordinator's
> > > > > memory.
> > > > > > > (please do say if that is incorrect)
> > > > > > > Then, I believe that working as if this is a new group is a
> > > > reasonable
> > > > > > > approach. Namely, fail joinGroups when the max.size is
> exceeded.
> > > > > > > What do you guys think about this? (I'll update the KIP after
> we
> > > > settle
> > > > > > on
> > > > > > > a solution)
> > > > > > >
> > > > > > > >  Also, just to be clear, the resource we are trying to
> conserve
> > > > here
> > > > > is
> > > > > > > what? Memory?
> > > > > > > My thinking is that we should abstract away from conserving
> > > resources
> > > > > and
> > > > > > > focus on giving control to the broker. The issue that spawned
> > this
> > > > KIP
> > > > > > was
> > > > > > > a memory problem but I feel this change is useful in a more
> > general
> > > > > way.
> > > > > > It
> > > > > > > limits the control clients have on the cluster and helps Kafka
> > > > become a
> > > > > > > more self-serving system. Admin/Ops teams can better control
> the
> > > > impact
> > > > > > > application developers can have on a Kafka cluster with this
> > change
> > > > > > >
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> > > jason@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Stanislav,
> > > > > > > >
> > > > > > > > Thanks for the KIP. Can you clarify the compatibility impact
> > > here?
> > > > > What
> > > > > > > > will happen to groups that are already larger than the max
> > size?
> > > > > Also,
> > > > > > > just
> > > > > > > > to be clear, the resource we are trying to conserve here is
> > what?
> > > > > > Memory?
> > > > > > > >
> > > > > > > > -Jason
> > > > > > > >
> > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
> > bchen11@outlook.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks Stanislav for the update! One suggestion I have is
> > that
> > > it
> > > > > > would
> > > > > > > > be
> > > > > > > > > helpful to put your
> > > > > > > > >
> > > > > > > > > reasoning on deciding the current default value. For
> example,
> > > in
> > > > > > > certain
> > > > > > > > > use cases at Pinterest we are very likely
> > > > > > > > >
> > > > > > > > > to have more consumers than 250 when we configure 8 stream
> > > > > instances
> > > > > > > with
> > > > > > > > > 32 threads.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > For the effectiveness of this KIP, we should encourage
> people
> > > to
> > > > > > > discuss
> > > > > > > > > their opinions on the default setting and ideally reach a
> > > > > consensus.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > Boyang
> > > > > > > > >
> > > > > > > > > ________________________________
> > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to
> cap
> > > > > member
> > > > > > > > > metadata growth
> > > > > > > > >
> > > > > > > > > Hey everybody,
> > > > > > > > >
> > > > > > > > > It's been a week since this KIP and not much discussion has
> > > been
> > > > > > made.
> > > > > > > > > I assume that this is a straight forward change and I will
> > > open a
> > > > > > > voting
> > > > > > > > > thread in the next couple of days if nobody has anything to
> > > > > suggest.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Stanislav
> > > > > > > > >
> > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > > > > > > > > stanislav@confluent.io>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Greetings everybody,
> > > > > > > > > >
> > > > > > > > > > I have enriched the KIP a bit with a bigger Motivation
> > > section
> > > > > and
> > > > > > > also
> > > > > > > > > > renamed it.
> > > > > > > > > > KIP:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > > > > > > >
> > > > > > > > > > I'm looking forward to discussions around it.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Stanislav
> > > > > > > > > >
> > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > > > > > > > > stanislav@confluent.io> wrote:
> > > > > > > > > >
> > > > > > > > > >> Hey there everybody,
> > > > > > > > > >>
> > > > > > > > > >> Thanks for the introduction Boyang. I appreciate the
> > effort
> > > > you
> > > > > > are
> > > > > > > > > >> putting into improving consumer behavior in Kafka.
> > > > > > > > > >>
> > > > > > > > > >> @Matt
> > > > > > > > > >> I also believe the default value is high. In my opinion,
> > we
> > > > > should
> > > > > > > aim
> > > > > > > > > to
> > > > > > > > > >> a default cap around 250. This is because in the current
> > > model
> > > > > any
> > > > > > > > > consumer
> > > > > > > > > >> rebalance is disrupting to every consumer. The bigger
> the
> > > > group,
> > > > > > the
> > > > > > > > > longer
> > > > > > > > > >> this period of disruption.
> > > > > > > > > >>
> > > > > > > > > >> If you have such a large consumer group, chances are
> that
> > > your
> > > > > > > > > >> client-side logic could be structured better and that
> you
> > > are
> > > > > not
> > > > > > > > using
> > > > > > > > > the
> > > > > > > > > >> high number of consumers to achieve high throughput.
> > > > > > > > > >> 250 can still be considered of a high upper bound, I
> > believe
> > > > in
> > > > > > > > practice
> > > > > > > > > >> users should aim to not go over 100 consumers per
> consumer
> > > > > group.
> > > > > > > > > >>
> > > > > > > > > >> In regards to the cap being global/per-broker, I think
> > that
> > > we
> > > > > > > should
> > > > > > > > > >> consider whether we want it to be global or *per-topic*.
> > For
> > > > the
> > > > > > > time
> > > > > > > > > >> being, I believe that having it per-topic with a global
> > > > default
> > > > > > > might
> > > > > > > > be
> > > > > > > > > >> the best situation. Having it global only seems a bit
> > > > > restricting
> > > > > > to
> > > > > > > > me
> > > > > > > > > and
> > > > > > > > > >> it never hurts to support more fine-grained
> > configurability
> > > > > (given
> > > > > > > > it's
> > > > > > > > > the
> > > > > > > > > >> same config, not a new one being introduced).
> > > > > > > > > >>
> > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> > > > > bchen11@outlook.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >>> Thanks Matt for the suggestion! I'm still open to any
> > > > > suggestion
> > > > > > to
> > > > > > > > > >>> change the default value. Meanwhile I just want to
> point
> > > out
> > > > > that
> > > > > > > > this
> > > > > > > > > >>> value is a just last line of defense, not a real
> scenario
> > > we
> > > > > > would
> > > > > > > > > expect.
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> In the meanwhile, I discussed with Stanislav and he
> would
> > > be
> > > > > > > driving
> > > > > > > > > the
> > > > > > > > > >>> 389 effort from now on. Stanislav proposed the idea in
> > the
> > > > > first
> > > > > > > > place
> > > > > > > > > and
> > > > > > > > > >>> had already come up a draft design, while I will keep
> > > > focusing
> > > > > on
> > > > > > > > > KIP-345
> > > > > > > > > >>> effort to ensure solving the edge case described in the
> > > JIRA<
> > > > > > > > > >>>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > > > > > > > >.
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> Thank you Stanislav for making this happen!
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> Boyang
> > > > > > > > > >>>
> > > > > > > > > >>> ________________________________
> > > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > > > > > > >>> To: dev@kafka.apache.org
> > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size
> to
> > > cap
> > > > > > > member
> > > > > > > > > >>> metadata growth
> > > > > > > > > >>>
> > > > > > > > > >>> Thanks for the KIP.
> > > > > > > > > >>>
> > > > > > > > > >>> Will this cap be a global cap across the entire cluster
> > or
> > > > per
> > > > > > > > broker?
> > > > > > > > > >>>
> > > > > > > > > >>> Either way the default value seems a bit high to me,
> but
> > > that
> > > > > > could
> > > > > > > > > just
> > > > > > > > > >>> be
> > > > > > > > > >>> from my own usage patterns. I'd have probably started
> > with
> > > > 500
> > > > > or
> > > > > > > 1k
> > > > > > > > > but
> > > > > > > > > >>> could be easily convinced that's wrong.
> > > > > > > > > >>>
> > > > > > > > > >>> Thanks,
> > > > > > > > > >>> Matt
> > > > > > > > > >>>
> > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> > > > > bchen11@outlook.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >>>
> > > > > > > > > >>> > Hey folks,
> > > > > > > > > >>> >
> > > > > > > > > >>> >
> > > > > > > > > >>> > I would like to start a discussion on KIP-389:
> > > > > > > > > >>> >
> > > > > > > > > >>> >
> > > > > > > > > >>> >
> > > > > > > > > >>>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > > > > > > >>> >
> > > > > > > > > >>> >
> > > > > > > > > >>> > This is a pretty simple change to cap the consumer
> > group
> > > > size
> > > > > > for
> > > > > > > > > >>> broker
> > > > > > > > > >>> > stability. Give me your valuable feedback when you
> got
> > > > time.
> > > > > > > > > >>> >
> > > > > > > > > >>> >
> > > > > > > > > >>> > Thank you!
> > > > > > > > > >>> >
> > > > > > > > > >>>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> --
> > > > > > > > > >> Best,
> > > > > > > > > >> Stanislav
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best,
> > > > > > > > > > Stanislav
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best,
> > > > > > > > > Stanislav
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>
>
> --
> Best,
> Stanislav
>

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Boyang Chen <bc...@outlook.com>.
Yep, LGTM on my side. Thanks Stanislav!
________________________________
From: Stanislav Kozlovski <st...@confluent.io>
Sent: Friday, December 7, 2018 8:51 PM
To: dev@kafka.apache.org
Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Hi,

We discussed this offline with Boyang and figured that it's best to not
wait on the Cooperative Rebalancing proposal. Our thinking is that we can
just force a rebalance from the broker, allowing consumers to commit
offsets if their rebalanceListener is configured correctly.
When rebalancing improvements are implemented, we assume that they would
improve KIP-389's behavior as well as the normal rebalance scenarios

On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <bc...@outlook.com> wrote:

> Hey Stanislav,
>
> thanks for the question! `Trivial rebalance` means "we don't start
> reassignment right now, but you need to know it's coming soon
> and you should start preparation".
>
> An example KStream use case is that before actually starting to shrink the
> consumer group, we need to
> 1. partition the consumer group into two subgroups, where one will be
> offline soon and the other will keep serving;
> 2. make sure the states associated with near-future offline consumers are
> successfully replicated on the serving ones.
>
> As I have mentioned shrinking the consumer group is pretty much equivalent
> to group scaling down, so we could think of this
> as an add-on use case for cluster scaling. So my understanding is that the
> KIP-389 could be sequenced within our cooperative rebalancing<
> https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
> >
> proposal.
>
> Let me know if this makes sense.
>
> Best,
> Boyang
> ________________________________
> From: Stanislav Kozlovski <st...@confluent.io>
> Sent: Wednesday, December 5, 2018 5:52 PM
> To: dev@kafka.apache.org
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Hey Boyang,
>
> I think we still need to take care of group shrinkage because even if users
> change the config value we cannot guarantee that all consumer groups would
> have been manually shrunk.
>
> Regarding 2., I agree that forcefully triggering a rebalance might be the
> most intuitive way to handle the situation.
> What does a "trivial rebalance" mean? Sorry, I'm not familiar with the
> term.
> I was thinking that maybe we could force a rebalance, which would cause
> consumers to commit their offsets (given their rebalanceListener is
> configured correctly) and subsequently reject some of the incoming
> `joinGroup` requests. Does that sound like it would work?
>
> On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bc...@outlook.com> wrote:
>
> > Hey Stanislav,
> >
> > I read the latest KIP and saw that we already changed the default value
> to
> > -1. Do
> > we still need to take care of the consumer group shrinking when doing the
> > upgrade?
> >
> > However this is an interesting topic that worth discussing. Although
> > rolling
> > upgrade is fine, `consumer.group.max.size` could always have conflict
> with
> > the current
> > consumer group size which means we need to adhere to one source of truth.
> >
> > 1.Choose the current group size, which means we never interrupt the
> > consumer group until
> > it transits to PREPARE_REBALANCE. And we keep track of how many join
> group
> > requests
> > we have seen so far during PREPARE_REBALANCE. After reaching the consumer
> > cap,
> > we start to inform over provisioned consumers that you should send
> > LeaveGroupRequest and
> > fail yourself. Or with what Mayuresh proposed in KIP-345, we could mark
> > extra members
> > as hot backup and rebalance without them.
> >
> > 2.Choose the `consumer.group.max.size`. I feel incremental rebalancing
> > (you proposed) could be of help here.
> > When a new cap is enforced, leader should be notified. If the current
> > group size is already over limit, leader
> > shall trigger a trivial rebalance to shuffle some topic partitions and
> let
> > a subset of consumers prepare the ownership
> > transition. Until they are ready, we trigger a real rebalance to remove
> > over-provisioned consumers. It is pretty much
> > equivalent to `how do we scale down the consumer group without
> > interrupting the current processing`.
> >
> > I personally feel inclined to 2 because we could kill two birds with one
> > stone in a generic way. What do you think?
> >
> > Boyang
> > ________________________________
> > From: Stanislav Kozlovski <st...@confluent.io>
> > Sent: Monday, December 3, 2018 8:35 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > metadata growth
> >
> > Hi Jason,
> >
> > > 2. Do you think we should make this a dynamic config?
> > I'm not sure. Looking at the config from the perspective of a
> prescriptive
> > config, we may get away with not updating it dynamically.
> > But in my opinion, it always makes sense to have a config be dynamically
> > configurable. As long as we limit it to being a cluster-wide config, we
> > should be fine.
> >
> > > 1. I think it would be helpful to clarify the details on how the
> > coordinator will shrink the group. It will need to choose which members
> to
> > remove. Are we going to give current members an opportunity to commit
> > offsets before kicking them from the group?
> >
> > This turns out to be somewhat tricky. I think that we may not be able to
> > guarantee that consumers don't process a message twice.
> > My initial approach was to do as much as we could to let consumers commit
> > offsets.
> >
> > I was thinking that we mark a group to be shrunk, we could keep a map of
> > consumer_id->boolean indicating whether they have committed offsets. I
> then
> > thought we could delay the rebalance until every consumer commits (or
> some
> > time passes).
> > In the meantime, we would block all incoming fetch calls (by either
> > returning empty records or a retriable error) and we would continue to
> > accept offset commits (even twice for a single consumer)
> >
> > I see two problems with this approach:
> > * We have async offset commits, which implies that we can receive fetch
> > requests before the offset commit req has been handled. i.e consmer sends
> > fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in the
> > broker. Meaning we could have saved the offsets for B but rebalance
> before
> > the offsetCommit for the offsets processed in C come in.
> > * KIP-392 Allow consumers to fetch from closest replica
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > >
> > would
> > make it significantly harder to block poll() calls on consumers whose
> > groups are being shrunk. Even if we implemented a solution, the same race
> > condition noted above seems to apply and probably others
> >
> >
> > Given those constraints, I think that we can simply mark the group as
> > `PreparingRebalance` with a rebalanceTimeout of the server setting `
> > group.max.session.timeout.ms`. That's a bit long by default (5 minutes)
> > but
> > I can't seem to come up with a better alternative
> >
> > I'm interested in hearing your thoughts.
> >
> > Thanks,
> > Stanislav
> >
> > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hey Stanislav,
> > >
> > > What do you think about the use case I mentioned in my previous reply
> > about
> > > > a more resilient self-service Kafka? I believe the benefit there is
> > > bigger.
> > >
> > >
> > > I see this config as analogous to the open file limit. Probably this
> > limit
> > > was intended to be prescriptive at some point about what was deemed a
> > > reasonable number of open files for an application. But mostly people
> > treat
> > > it as an annoyance which they have to work around. If it happens to be
> > hit,
> > > usually you just increase it because it is not tied to an actual
> resource
> > > constraint. However, occasionally hitting the limit does indicate an
> > > application bug such as a leak, so I wouldn't say it is useless.
> > Similarly,
> > > the issue in KAFKA-7610 was a consumer leak and having this limit would
> > > have allowed the problem to be detected before it impacted the cluster.
> > To
> > > me, that's the main benefit. It's possible that it could be used
> > > prescriptively to prevent poor usage of groups, but like the open file
> > > limit, I suspect administrators will just set it large enough that
> users
> > > are unlikely to complain.
> > >
> > > Anyway, just a couple additional questions:
> > >
> > > 1. I think it would be helpful to clarify the details on how the
> > > coordinator will shrink the group. It will need to choose which members
> > to
> > > remove. Are we going to give current members an opportunity to commit
> > > offsets before kicking them from the group?
> > >
> > > 2. Do you think we should make this a dynamic config?
> > >
> > > Thanks,
> > > Jason
> > >
> > >
> > >
> > >
> > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > > stanislav@confluent.io>
> > > wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > You raise some very valid points.
> > > >
> > > > > The benefit of this KIP is probably limited to preventing "runaway"
> > > > consumer groups due to leaks or some other application bug
> > > > What do you think about the use case I mentioned in my previous reply
> > > about
> > > > a more resilient self-service Kafka? I believe the benefit there is
> > > bigger
> > > >
> > > > * Default value
> > > > You're right, we probably do need to be conservative. Big consumer
> > groups
> > > > are considered an anti-pattern and my goal was to also hint at this
> > > through
> > > > the config's default. Regardless, it is better to not have the
> > potential
> > > to
> > > > break applications with an upgrade.
> > > > Choosing between the default of something big like 5000 or an opt-in
> > > > option, I think we should go with the *disabled default option*
> (-1).
> > > > The only benefit we would get from a big default of 5000 is default
> > > > protection against buggy/malicious applications that hit the
> KAFKA-7610
> > > > issue.
> > > > While this KIP was spawned from that issue, I believe its value is
> > > enabling
> > > > the possibility of protection and helping move towards a more
> > > self-service
> > > > Kafka. I also think that a default value of 5000 might be misleading
> to
> > > > users and lead them to think that big consumer groups (> 250) are a
> > good
> > > > thing.
> > > >
> > > > The good news is that KAFKA-7610 should be fully resolved and the
> > > rebalance
> > > > protocol should, in general, be more solid after the planned
> > improvements
> > > > in KIP-345 and KIP-394.
> > > >
> > > > * Handling bigger groups during upgrade
> > > > I now see that we store the state of consumer groups in the log and
> > why a
> > > > rebalance isn't expected during a rolling upgrade.
> > > > Since we're going with the default value of the max.size being
> > disabled,
> > > I
> > > > believe we can afford to be more strict here.
> > > > During state reloading of a new Coordinator with a defined
> > max.group.size
> > > > config, I believe we should *force* rebalances for groups that exceed
> > the
> > > > configured size. Then, only some consumers will be able to join and
> the
> > > max
> > > > size invariant will be satisfied.
> > > >
> > > > I updated the KIP with a migration plan, rejected alternatives and
> the
> > > new
> > > > default value.
> > > >
> > > > Thanks,
> > > > Stanislav
> > > >
> > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Stanislav,
> > > > >
> > > > > Clients will then find that coordinator
> > > > > > and send `joinGroup` on it, effectively rebuilding the group,
> since
> > > the
> > > > > > cache of active consumers is not stored outside the Coordinator's
> > > > memory.
> > > > > > (please do say if that is incorrect)
> > > > >
> > > > >
> > > > > Groups do not typically rebalance after a coordinator change. You
> > could
> > > > > potentially force a rebalance if the group is too big and kick out
> > the
> > > > > slowest members or something. A more graceful solution is probably
> to
> > > > just
> > > > > accept the current size and prevent it from getting bigger. We
> could
> > > log
> > > > a
> > > > > warning potentially.
> > > > >
> > > > > My thinking is that we should abstract away from conserving
> resources
> > > and
> > > > > > focus on giving control to the broker. The issue that spawned
> this
> > > KIP
> > > > > was
> > > > > > a memory problem but I feel this change is useful in a more
> general
> > > > way.
> > > > >
> > > > >
> > > > > So you probably already know why I'm asking about this. For
> consumer
> > > > groups
> > > > > anyway, resource usage would typically be proportional to the
> number
> > of
> > > > > partitions that a group is reading from and not the number of
> > members.
> > > > For
> > > > > example, consider the memory use in the offsets cache. The benefit
> of
> > > > this
> > > > > KIP is probably limited to preventing "runaway" consumer groups due
> > to
> > > > > leaks or some other application bug. That still seems useful
> though.
> > > > >
> > > > > I completely agree with this and I *ask everybody to chime in with
> > > > opinions
> > > > > > on a sensible default value*.
> > > > >
> > > > >
> > > > > I think we would have to be very conservative. The group protocol
> is
> > > > > generic in some sense, so there may be use cases we don't know of
> > where
> > > > > larger groups are reasonable. Probably we should make this an
> opt-in
> > > > > feature so that we do not risk breaking anyone's application after
> an
> > > > > upgrade. Either that, or use a very high default like 5,000.
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > > > > stanislav@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hey Jason and Boyang, those were important comments
> > > > > >
> > > > > > > One suggestion I have is that it would be helpful to put your
> > > > reasoning
> > > > > > on deciding the current default value. For example, in certain
> use
> > > > cases
> > > > > at
> > > > > > Pinterest we are very likely to have more consumers than 250 when
> > we
> > > > > > configure 8 stream instances with 32 threads.
> > > > > > > For the effectiveness of this KIP, we should encourage people
> to
> > > > > discuss
> > > > > > their opinions on the default setting and ideally reach a
> > consensus.
> > > > > >
> > > > > > I completely agree with this and I *ask everybody to chime in
> with
> > > > > opinions
> > > > > > on a sensible default value*.
> > > > > > My thought process was that in the current model rebalances in
> > large
> > > > > groups
> > > > > > are more costly. I imagine most use cases in most Kafka users do
> > not
> > > > > > require more than 250 consumers.
> > > > > > Boyang, you say that you are "likely to have... when we..." - do
> > you
> > > > have
> > > > > > systems running with so many consumers in a group or are you
> > planning
> > > > > to? I
> > > > > > guess what I'm asking is whether this has been tested in
> production
> > > > with
> > > > > > the current rebalance model (ignoring KIP-345)
> > > > > >
> > > > > > >  Can you clarify the compatibility impact here? What
> > > > > > > will happen to groups that are already larger than the max
> size?
> > > > > > This is a very important question.
> > > > > > From my current understanding, when a coordinator broker gets
> shut
> > > > > > down during a cluster rolling upgrade, a replica will take
> > leadership
> > > > of
> > > > > > the `__offset_commits` partition. Clients will then find that
> > > > coordinator
> > > > > > and send `joinGroup` on it, effectively rebuilding the group,
> since
> > > the
> > > > > > cache of active consumers is not stored outside the Coordinator's
> > > > memory.
> > > > > > (please do say if that is incorrect)
> > > > > > Then, I believe that working as if this is a new group is a
> > > reasonable
> > > > > > approach. Namely, fail joinGroups when the max.size is exceeded.
> > > > > > What do you guys think about this? (I'll update the KIP after we
> > > settle
> > > > > on
> > > > > > a solution)
> > > > > >
> > > > > > >  Also, just to be clear, the resource we are trying to conserve
> > > here
> > > > is
> > > > > > what? Memory?
> > > > > > My thinking is that we should abstract away from conserving
> > resources
> > > > and
> > > > > > focus on giving control to the broker. The issue that spawned
> this
> > > KIP
> > > > > was
> > > > > > a memory problem but I feel this change is useful in a more
> general
> > > > way.
> > > > > It
> > > > > > limits the control clients have on the cluster and helps Kafka
> > > become a
> > > > > > more self-serving system. Admin/Ops teams can better control the
> > > impact
> > > > > > application developers can have on a Kafka cluster with this
> change
> > > > > >
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > > >
> > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> > jason@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Stanislav,
> > > > > > >
> > > > > > > Thanks for the KIP. Can you clarify the compatibility impact
> > here?
> > > > What
> > > > > > > will happen to groups that are already larger than the max
> size?
> > > > Also,
> > > > > > just
> > > > > > > to be clear, the resource we are trying to conserve here is
> what?
> > > > > Memory?
> > > > > > >
> > > > > > > -Jason
> > > > > > >
> > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
> bchen11@outlook.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Stanislav for the update! One suggestion I have is
> that
> > it
> > > > > would
> > > > > > > be
> > > > > > > > helpful to put your
> > > > > > > >
> > > > > > > > reasoning on deciding the current default value. For example,
> > in
> > > > > > certain
> > > > > > > > use cases at Pinterest we are very likely
> > > > > > > >
> > > > > > > > to have more consumers than 250 when we configure 8 stream
> > > > instances
> > > > > > with
> > > > > > > > 32 threads.
> > > > > > > >
> > > > > > > >
> > > > > > > > For the effectiveness of this KIP, we should encourage people
> > to
> > > > > > discuss
> > > > > > > > their opinions on the default setting and ideally reach a
> > > > consensus.
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Boyang
> > > > > > > >
> > > > > > > > ________________________________
> > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > > > > > To: dev@kafka.apache.org
> > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > > > member
> > > > > > > > metadata growth
> > > > > > > >
> > > > > > > > Hey everybody,
> > > > > > > >
> > > > > > > > It's been a week since this KIP and not much discussion has
> > been
> > > > > made.
> > > > > > > > I assume that this is a straight forward change and I will
> > open a
> > > > > > voting
> > > > > > > > thread in the next couple of days if nobody has anything to
> > > > suggest.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > > > > > > > stanislav@confluent.io>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Greetings everybody,
> > > > > > > > >
> > > > > > > > > I have enriched the KIP a bit with a bigger Motivation
> > section
> > > > and
> > > > > > also
> > > > > > > > > renamed it.
> > > > > > > > > KIP:
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > > > > > >
> > > > > > > > > I'm looking forward to discussions around it.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Stanislav
> > > > > > > > >
> > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > > > > > > > stanislav@confluent.io> wrote:
> > > > > > > > >
> > > > > > > > >> Hey there everybody,
> > > > > > > > >>
> > > > > > > > >> Thanks for the introduction Boyang. I appreciate the
> effort
> > > you
> > > > > are
> > > > > > > > >> putting into improving consumer behavior in Kafka.
> > > > > > > > >>
> > > > > > > > >> @Matt
> > > > > > > > >> I also believe the default value is high. In my opinion,
> we
> > > > should
> > > > > > aim
> > > > > > > > to
> > > > > > > > >> a default cap around 250. This is because in the current
> > model
> > > > any
> > > > > > > > consumer
> > > > > > > > >> rebalance is disrupting to every consumer. The bigger the
> > > group,
> > > > > the
> > > > > > > > longer
> > > > > > > > >> this period of disruption.
> > > > > > > > >>
> > > > > > > > >> If you have such a large consumer group, chances are that
> > your
> > > > > > > > >> client-side logic could be structured better and that you
> > are
> > > > not
> > > > > > > using
> > > > > > > > the
> > > > > > > > >> high number of consumers to achieve high throughput.
> > > > > > > > >> 250 can still be considered of a high upper bound, I
> believe
> > > in
> > > > > > > practice
> > > > > > > > >> users should aim to not go over 100 consumers per consumer
> > > > group.
> > > > > > > > >>
> > > > > > > > >> In regards to the cap being global/per-broker, I think
> that
> > we
> > > > > > should
> > > > > > > > >> consider whether we want it to be global or *per-topic*.
> For
> > > the
> > > > > > time
> > > > > > > > >> being, I believe that having it per-topic with a global
> > > default
> > > > > > might
> > > > > > > be
> > > > > > > > >> the best situation. Having it global only seems a bit
> > > > restricting
> > > > > to
> > > > > > > me
> > > > > > > > and
> > > > > > > > >> it never hurts to support more fine-grained
> configurability
> > > > (given
> > > > > > > it's
> > > > > > > > the
> > > > > > > > >> same config, not a new one being introduced).
> > > > > > > > >>
> > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> > > > bchen11@outlook.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >>> Thanks Matt for the suggestion! I'm still open to any
> > > > suggestion
> > > > > to
> > > > > > > > >>> change the default value. Meanwhile I just want to point
> > out
> > > > that
> > > > > > > this
> > > > > > > > >>> value is a just last line of defense, not a real scenario
> > we
> > > > > would
> > > > > > > > expect.
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> In the meanwhile, I discussed with Stanislav and he would
> > be
> > > > > > driving
> > > > > > > > the
> > > > > > > > >>> 389 effort from now on. Stanislav proposed the idea in
> the
> > > > first
> > > > > > > place
> > > > > > > > and
> > > > > > > > >>> had already come up a draft design, while I will keep
> > > focusing
> > > > on
> > > > > > > > KIP-345
> > > > > > > > >>> effort to ensure solving the edge case described in the
> > JIRA<
> > > > > > > > >>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > > > > > > >.
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> Thank you Stanislav for making this happen!
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> Boyang
> > > > > > > > >>>
> > > > > > > > >>> ________________________________
> > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > > > > > >>> To: dev@kafka.apache.org
> > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to
> > cap
> > > > > > member
> > > > > > > > >>> metadata growth
> > > > > > > > >>>
> > > > > > > > >>> Thanks for the KIP.
> > > > > > > > >>>
> > > > > > > > >>> Will this cap be a global cap across the entire cluster
> or
> > > per
> > > > > > > broker?
> > > > > > > > >>>
> > > > > > > > >>> Either way the default value seems a bit high to me, but
> > that
> > > > > could
> > > > > > > > just
> > > > > > > > >>> be
> > > > > > > > >>> from my own usage patterns. I'd have probably started
> with
> > > 500
> > > > or
> > > > > > 1k
> > > > > > > > but
> > > > > > > > >>> could be easily convinced that's wrong.
> > > > > > > > >>>
> > > > > > > > >>> Thanks,
> > > > > > > > >>> Matt
> > > > > > > > >>>
> > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> > > > bchen11@outlook.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >>>
> > > > > > > > >>> > Hey folks,
> > > > > > > > >>> >
> > > > > > > > >>> >
> > > > > > > > >>> > I would like to start a discussion on KIP-389:
> > > > > > > > >>> >
> > > > > > > > >>> >
> > > > > > > > >>> >
> > > > > > > > >>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > > > > > >>> >
> > > > > > > > >>> >
> > > > > > > > >>> > This is a pretty simple change to cap the consumer
> group
> > > size
> > > > > for
> > > > > > > > >>> broker
> > > > > > > > >>> > stability. Give me your valuable feedback when you got
> > > time.
> > > > > > > > >>> >
> > > > > > > > >>> >
> > > > > > > > >>> > Thank you!
> > > > > > > > >>> >
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> Best,
> > > > > > > > >> Stanislav
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best,
> > > > > > > > > Stanislav
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>
>
> --
> Best,
> Stanislav
>


--
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hi,

We discussed this offline with Boyang and figured that it's best not to
wait on the Cooperative Rebalancing proposal. Our thinking is that we can
just force a rebalance from the broker, allowing consumers to commit
offsets if their rebalanceListener is configured correctly.
When the rebalancing improvements are implemented, we assume they would
improve KIP-389's behavior as well as the normal rebalance scenarios.
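
To be concrete about the rebalanceListener assumption, below is a minimal
sketch (illustrative only -- the topic name, group id and offset bookkeeping
are made up) of a consumer that commits whatever it has processed from
onPartitionsRevoked, so the offsets are persisted before a broker-forced
rebalance completes:

import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitOnRevokeExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "example-group");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Offsets of records that were processed but not yet committed.
        Map<TopicPartition, OffsetAndMetadata> pending = new HashMap<>();

        consumer.subscribe(Collections.singletonList("example-topic"),
            new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Runs before the rebalance completes (including one forced by
                    // the broker), so commit everything processed so far.
                    consumer.commitSync(pending);
                    pending.clear();
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Nothing to do; committed offsets are picked up on assignment.
                }
            });

        while (true) {
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                // ... process the record here ...
                pending.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
            }
        }
    }
}

As discussed earlier in the thread, consumers that do not manage to commit in
time may still reprocess a few records after being removed, which is the
trade-off we are accepting here.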

On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <bc...@outlook.com> wrote:

> Hey Stanislav,
>
> thanks for the question! `Trivial rebalance` means "we don't start
> reassignment right now, but you need to know it's coming soon
> and you should start preparation".
>
> An example KStream use case is that before actually starting to shrink the
> consumer group, we need to
> 1. partition the consumer group into two subgroups, where one will be
> offline soon and the other will keep serving;
> 2. make sure the states associated with near-future offline consumers are
> successfully replicated on the serving ones.
>
> As I have mentioned shrinking the consumer group is pretty much equivalent
> to group scaling down, so we could think of this
> as an add-on use case for cluster scaling. So my understanding is that the
> KIP-389 could be sequenced within our cooperative rebalancing<
> https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies
> >
> proposal.
>
> Let me know if this makes sense.
>
> Best,
> Boyang
> ________________________________
> From: Stanislav Kozlovski <st...@confluent.io>
> Sent: Wednesday, December 5, 2018 5:52 PM
> To: dev@kafka.apache.org
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Hey Boyang,
>
> I think we still need to take care of group shrinkage because even if users
> change the config value we cannot guarantee that all consumer groups would
> have been manually shrunk.
>
> Regarding 2., I agree that forcefully triggering a rebalance might be the
> most intuitive way to handle the situation.
> What does a "trivial rebalance" mean? Sorry, I'm not familiar with the
> term.
> I was thinking that maybe we could force a rebalance, which would cause
> consumers to commit their offsets (given their rebalanceListener is
> configured correctly) and subsequently reject some of the incoming
> `joinGroup` requests. Does that sound like it would work?
>
> On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bc...@outlook.com> wrote:
>
> > Hey Stanislav,
> >
> > I read the latest KIP and saw that we already changed the default value
> to
> > -1. Do
> > we still need to take care of the consumer group shrinking when doing the
> > upgrade?
> >
> > However this is an interesting topic that worth discussing. Although
> > rolling
> > upgrade is fine, `consumer.group.max.size` could always have conflict
> with
> > the current
> > consumer group size which means we need to adhere to one source of truth.
> >
> > 1.Choose the current group size, which means we never interrupt the
> > consumer group until
> > it transits to PREPARE_REBALANCE. And we keep track of how many join
> group
> > requests
> > we have seen so far during PREPARE_REBALANCE. After reaching the consumer
> > cap,
> > we start to inform over provisioned consumers that you should send
> > LeaveGroupRequest and
> > fail yourself. Or with what Mayuresh proposed in KIP-345, we could mark
> > extra members
> > as hot backup and rebalance without them.
> >
> > 2.Choose the `consumer.group.max.size`. I feel incremental rebalancing
> > (you proposed) could be of help here.
> > When a new cap is enforced, leader should be notified. If the current
> > group size is already over limit, leader
> > shall trigger a trivial rebalance to shuffle some topic partitions and
> let
> > a subset of consumers prepare the ownership
> > transition. Until they are ready, we trigger a real rebalance to remove
> > over-provisioned consumers. It is pretty much
> > equivalent to `how do we scale down the consumer group without
> > interrupting the current processing`.
> >
> > I personally feel inclined to 2 because we could kill two birds with one
> > stone in a generic way. What do you think?
> >
> > Boyang
> > ________________________________
> > From: Stanislav Kozlovski <st...@confluent.io>
> > Sent: Monday, December 3, 2018 8:35 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > metadata growth
> >
> > Hi Jason,
> >
> > > 2. Do you think we should make this a dynamic config?
> > I'm not sure. Looking at the config from the perspective of a
> prescriptive
> > config, we may get away with not updating it dynamically.
> > But in my opinion, it always makes sense to have a config be dynamically
> > configurable. As long as we limit it to being a cluster-wide config, we
> > should be fine.
> >
> > > 1. I think it would be helpful to clarify the details on how the
> > coordinator will shrink the group. It will need to choose which members
> to
> > remove. Are we going to give current members an opportunity to commit
> > offsets before kicking them from the group?
> >
> > This turns out to be somewhat tricky. I think that we may not be able to
> > guarantee that consumers don't process a message twice.
> > My initial approach was to do as much as we could to let consumers commit
> > offsets.
> >
> > I was thinking that we mark a group to be shrunk, we could keep a map of
> > consumer_id->boolean indicating whether they have committed offsets. I
> then
> > thought we could delay the rebalance until every consumer commits (or
> some
> > time passes).
> > In the meantime, we would block all incoming fetch calls (by either
> > returning empty records or a retriable error) and we would continue to
> > accept offset commits (even twice for a single consumer)
> >
> > I see two problems with this approach:
> > * We have async offset commits, which implies that we can receive fetch
> > requests before the offset commit req has been handled. i.e consmer sends
> > fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in the
> > broker. Meaning we could have saved the offsets for B but rebalance
> before
> > the offsetCommit for the offsets processed in C come in.
> > * KIP-392 Allow consumers to fetch from closest replica
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > >
> > would
> > make it significantly harder to block poll() calls on consumers whose
> > groups are being shrunk. Even if we implemented a solution, the same race
> > condition noted above seems to apply and probably others
> >
> >
> > Given those constraints, I think that we can simply mark the group as
> > `PreparingRebalance` with a rebalanceTimeout of the server setting `
> > group.max.session.timeout.ms`. That's a bit long by default (5 minutes)
> > but
> > I can't seem to come up with a better alternative
> >
> > I'm interested in hearing your thoughts.
> >
> > Thanks,
> > Stanislav
> >
> > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hey Stanislav,
> > >
> > > What do you think about the use case I mentioned in my previous reply
> > about
> > > > a more resilient self-service Kafka? I believe the benefit there is
> > > bigger.
> > >
> > >
> > > I see this config as analogous to the open file limit. Probably this
> > limit
> > > was intended to be prescriptive at some point about what was deemed a
> > > reasonable number of open files for an application. But mostly people
> > treat
> > > it as an annoyance which they have to work around. If it happens to be
> > hit,
> > > usually you just increase it because it is not tied to an actual
> resource
> > > constraint. However, occasionally hitting the limit does indicate an
> > > application bug such as a leak, so I wouldn't say it is useless.
> > Similarly,
> > > the issue in KAFKA-7610 was a consumer leak and having this limit would
> > > have allowed the problem to be detected before it impacted the cluster.
> > To
> > > me, that's the main benefit. It's possible that it could be used
> > > prescriptively to prevent poor usage of groups, but like the open file
> > > limit, I suspect administrators will just set it large enough that
> users
> > > are unlikely to complain.
> > >
> > > Anyway, just a couple additional questions:
> > >
> > > 1. I think it would be helpful to clarify the details on how the
> > > coordinator will shrink the group. It will need to choose which members
> > to
> > > remove. Are we going to give current members an opportunity to commit
> > > offsets before kicking them from the group?
> > >
> > > 2. Do you think we should make this a dynamic config?
> > >
> > > Thanks,
> > > Jason
> > >
> > >
> > >
> > >
> > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > > stanislav@confluent.io>
> > > wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > You raise some very valid points.
> > > >
> > > > > The benefit of this KIP is probably limited to preventing "runaway"
> > > > consumer groups due to leaks or some other application bug
> > > > What do you think about the use case I mentioned in my previous reply
> > > about
> > > > a more resilient self-service Kafka? I believe the benefit there is
> > > bigger
> > > >
> > > > * Default value
> > > > You're right, we probably do need to be conservative. Big consumer
> > groups
> > > > are considered an anti-pattern and my goal was to also hint at this
> > > through
> > > > the config's default. Regardless, it is better to not have the
> > potential
> > > to
> > > > break applications with an upgrade.
> > > > Choosing between the default of something big like 5000 or an opt-in
> > > > option, I think we should go with the *disabled default option*
> (-1).
> > > > The only benefit we would get from a big default of 5000 is default
> > > > protection against buggy/malicious applications that hit the
> KAFKA-7610
> > > > issue.
> > > > While this KIP was spawned from that issue, I believe its value is
> > > enabling
> > > > the possibility of protection and helping move towards a more
> > > self-service
> > > > Kafka. I also think that a default value of 5000 might be misleading
> to
> > > > users and lead them to think that big consumer groups (> 250) are a
> > good
> > > > thing.
> > > >
> > > > The good news is that KAFKA-7610 should be fully resolved and the
> > > rebalance
> > > > protocol should, in general, be more solid after the planned
> > improvements
> > > > in KIP-345 and KIP-394.
> > > >
> > > > * Handling bigger groups during upgrade
> > > > I now see that we store the state of consumer groups in the log and
> > why a
> > > > rebalance isn't expected during a rolling upgrade.
> > > > Since we're going with the default value of the max.size being
> > disabled,
> > > I
> > > > believe we can afford to be more strict here.
> > > > During state reloading of a new Coordinator with a defined
> > max.group.size
> > > > config, I believe we should *force* rebalances for groups that exceed
> > the
> > > > configured size. Then, only some consumers will be able to join and
> the
> > > max
> > > > size invariant will be satisfied.
> > > >
> > > > I updated the KIP with a migration plan, rejected alternatives and
> the
> > > new
> > > > default value.
> > > >
> > > > Thanks,
> > > > Stanislav
> > > >
> > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Stanislav,
> > > > >
> > > > > Clients will then find that coordinator
> > > > > > and send `joinGroup` on it, effectively rebuilding the group,
> since
> > > the
> > > > > > cache of active consumers is not stored outside the Coordinator's
> > > > memory.
> > > > > > (please do say if that is incorrect)
> > > > >
> > > > >
> > > > > Groups do not typically rebalance after a coordinator change. You
> > could
> > > > > potentially force a rebalance if the group is too big and kick out
> > the
> > > > > slowest members or something. A more graceful solution is probably
> to
> > > > just
> > > > > accept the current size and prevent it from getting bigger. We
> could
> > > log
> > > > a
> > > > > warning potentially.
> > > > >
> > > > > My thinking is that we should abstract away from conserving
> resources
> > > and
> > > > > > focus on giving control to the broker. The issue that spawned
> this
> > > KIP
> > > > > was
> > > > > > a memory problem but I feel this change is useful in a more
> general
> > > > way.
> > > > >
> > > > >
> > > > > So you probably already know why I'm asking about this. For
> consumer
> > > > groups
> > > > > anyway, resource usage would typically be proportional to the
> number
> > of
> > > > > partitions that a group is reading from and not the number of
> > members.
> > > > For
> > > > > example, consider the memory use in the offsets cache. The benefit
> of
> > > > this
> > > > > KIP is probably limited to preventing "runaway" consumer groups due
> > to
> > > > > leaks or some other application bug. That still seems useful
> though.
> > > > >
> > > > > I completely agree with this and I *ask everybody to chime in with
> > > > opinions
> > > > > > on a sensible default value*.
> > > > >
> > > > >
> > > > > I think we would have to be very conservative. The group protocol
> is
> > > > > generic in some sense, so there may be use cases we don't know of
> > where
> > > > > larger groups are reasonable. Probably we should make this an
> opt-in
> > > > > feature so that we do not risk breaking anyone's application after
> an
> > > > > upgrade. Either that, or use a very high default like 5,000.
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > > > > stanislav@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hey Jason and Boyang, those were important comments
> > > > > >
> > > > > > > One suggestion I have is that it would be helpful to put your
> > > > reasoning
> > > > > > on deciding the current default value. For example, in certain
> use
> > > > cases
> > > > > at
> > > > > > Pinterest we are very likely to have more consumers than 250 when
> > we
> > > > > > configure 8 stream instances with 32 threads.
> > > > > > > For the effectiveness of this KIP, we should encourage people
> to
> > > > > discuss
> > > > > > their opinions on the default setting and ideally reach a
> > consensus.
> > > > > >
> > > > > > I completely agree with this and I *ask everybody to chime in
> with
> > > > > opinions
> > > > > > on a sensible default value*.
> > > > > > My thought process was that in the current model rebalances in
> > large
> > > > > groups
> > > > > > are more costly. I imagine most use cases in most Kafka users do
> > not
> > > > > > require more than 250 consumers.
> > > > > > Boyang, you say that you are "likely to have... when we..." - do
> > you
> > > > have
> > > > > > systems running with so many consumers in a group or are you
> > planning
> > > > > to? I
> > > > > > guess what I'm asking is whether this has been tested in
> production
> > > > with
> > > > > > the current rebalance model (ignoring KIP-345)
> > > > > >
> > > > > > >  Can you clarify the compatibility impact here? What
> > > > > > > will happen to groups that are already larger than the max
> size?
> > > > > > This is a very important question.
> > > > > > From my current understanding, when a coordinator broker gets
> shut
> > > > > > down during a cluster rolling upgrade, a replica will take
> > leadership
> > > > of
> > > > > > the `__offset_commits` partition. Clients will then find that
> > > > coordinator
> > > > > > and send `joinGroup` on it, effectively rebuilding the group,
> since
> > > the
> > > > > > cache of active consumers is not stored outside the Coordinator's
> > > > memory.
> > > > > > (please do say if that is incorrect)
> > > > > > Then, I believe that working as if this is a new group is a
> > > reasonable
> > > > > > approach. Namely, fail joinGroups when the max.size is exceeded.
> > > > > > What do you guys think about this? (I'll update the KIP after we
> > > settle
> > > > > on
> > > > > > a solution)
> > > > > >
> > > > > > >  Also, just to be clear, the resource we are trying to conserve
> > > here
> > > > is
> > > > > > what? Memory?
> > > > > > My thinking is that we should abstract away from conserving
> > resources
> > > > and
> > > > > > focus on giving control to the broker. The issue that spawned
> this
> > > KIP
> > > > > was
> > > > > > a memory problem but I feel this change is useful in a more
> general
> > > > way.
> > > > > It
> > > > > > limits the control clients have on the cluster and helps Kafka
> > > become a
> > > > > > more self-serving system. Admin/Ops teams can better control the
> > > impact
> > > > > > application developers can have on a Kafka cluster with this
> change
> > > > > >
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > > >
> > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> > jason@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Stanislav,
> > > > > > >
> > > > > > > Thanks for the KIP. Can you clarify the compatibility impact
> > here?
> > > > What
> > > > > > > will happen to groups that are already larger than the max
> size?
> > > > Also,
> > > > > > just
> > > > > > > to be clear, the resource we are trying to conserve here is
> what?
> > > > > Memory?
> > > > > > >
> > > > > > > -Jason
> > > > > > >
> > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <
> bchen11@outlook.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Stanislav for the update! One suggestion I have is
> that
> > it
> > > > > would
> > > > > > > be
> > > > > > > > helpful to put your
> > > > > > > >
> > > > > > > > reasoning on deciding the current default value. For example,
> > in
> > > > > > certain
> > > > > > > > use cases at Pinterest we are very likely
> > > > > > > >
> > > > > > > > to have more consumers than 250 when we configure 8 stream
> > > > instances
> > > > > > with
> > > > > > > > 32 threads.
> > > > > > > >
> > > > > > > >
> > > > > > > > For the effectiveness of this KIP, we should encourage people
> > to
> > > > > > discuss
> > > > > > > > their opinions on the default setting and ideally reach a
> > > > consensus.
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Boyang
> > > > > > > >
> > > > > > > > ________________________________
> > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > > > > > To: dev@kafka.apache.org
> > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > > > member
> > > > > > > > metadata growth
> > > > > > > >
> > > > > > > > Hey everybody,
> > > > > > > >
> > > > > > > > It's been a week since this KIP and not much discussion has
> > been
> > > > > made.
> > > > > > > > I assume that this is a straight forward change and I will
> > open a
> > > > > > voting
> > > > > > > > thread in the next couple of days if nobody has anything to
> > > > suggest.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > > > > > > > stanislav@confluent.io>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Greetings everybody,
> > > > > > > > >
> > > > > > > > > I have enriched the KIP a bit with a bigger Motivation
> > section
> > > > and
> > > > > > also
> > > > > > > > > renamed it.
> > > > > > > > > KIP:
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > > > > > >
> > > > > > > > > I'm looking forward to discussions around it.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Stanislav
> > > > > > > > >
> > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > > > > > > > stanislav@confluent.io> wrote:
> > > > > > > > >
> > > > > > > > >> Hey there everybody,
> > > > > > > > >>
> > > > > > > > >> Thanks for the introduction Boyang. I appreciate the
> effort
> > > you
> > > > > are
> > > > > > > > >> putting into improving consumer behavior in Kafka.
> > > > > > > > >>
> > > > > > > > >> @Matt
> > > > > > > > >> I also believe the default value is high. In my opinion,
> we
> > > > should
> > > > > > aim
> > > > > > > > to
> > > > > > > > >> a default cap around 250. This is because in the current
> > model
> > > > any
> > > > > > > > consumer
> > > > > > > > >> rebalance is disrupting to every consumer. The bigger the
> > > group,
> > > > > the
> > > > > > > > longer
> > > > > > > > >> this period of disruption.
> > > > > > > > >>
> > > > > > > > >> If you have such a large consumer group, chances are that
> > your
> > > > > > > > >> client-side logic could be structured better and that you
> > are
> > > > not
> > > > > > > using
> > > > > > > > the
> > > > > > > > >> high number of consumers to achieve high throughput.
> > > > > > > > >> 250 can still be considered of a high upper bound, I
> believe
> > > in
> > > > > > > practice
> > > > > > > > >> users should aim to not go over 100 consumers per consumer
> > > > group.
> > > > > > > > >>
> > > > > > > > >> In regards to the cap being global/per-broker, I think
> that
> > we
> > > > > > should
> > > > > > > > >> consider whether we want it to be global or *per-topic*.
> For
> > > the
> > > > > > time
> > > > > > > > >> being, I believe that having it per-topic with a global
> > > default
> > > > > > might
> > > > > > > be
> > > > > > > > >> the best situation. Having it global only seems a bit
> > > > restricting
> > > > > to
> > > > > > > me
> > > > > > > > and
> > > > > > > > >> it never hurts to support more fine-grained
> configurability
> > > > (given
> > > > > > > it's
> > > > > > > > the
> > > > > > > > >> same config, not a new one being introduced).
> > > > > > > > >>
> > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> > > > bchen11@outlook.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >>> Thanks Matt for the suggestion! I'm still open to any
> > > > suggestion
> > > > > to
> > > > > > > > >>> change the default value. Meanwhile I just want to point
> > out
> > > > that
> > > > > > > this
> > > > > > > > >>> value is a just last line of defense, not a real scenario
> > we
> > > > > would
> > > > > > > > expect.
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> In the meanwhile, I discussed with Stanislav and he would
> > be
> > > > > > driving
> > > > > > > > the
> > > > > > > > >>> 389 effort from now on. Stanislav proposed the idea in
> the
> > > > first
> > > > > > > place
> > > > > > > > and
> > > > > > > > >>> had already come up a draft design, while I will keep
> > > focusing
> > > > on
> > > > > > > > KIP-345
> > > > > > > > >>> effort to ensure solving the edge case described in the
> > JIRA<
> > > > > > > > >>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > > > > > > >.
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> Thank you Stanislav for making this happen!
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> Boyang
> > > > > > > > >>>
> > > > > > > > >>> ________________________________
> > > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > > > > > >>> To: dev@kafka.apache.org
> > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to
> > cap
> > > > > > member
> > > > > > > > >>> metadata growth
> > > > > > > > >>>
> > > > > > > > >>> Thanks for the KIP.
> > > > > > > > >>>
> > > > > > > > >>> Will this cap be a global cap across the entire cluster
> or
> > > per
> > > > > > > broker?
> > > > > > > > >>>
> > > > > > > > >>> Either way the default value seems a bit high to me, but
> > that
> > > > > could
> > > > > > > > just
> > > > > > > > >>> be
> > > > > > > > >>> from my own usage patterns. I'd have probably started
> with
> > > 500
> > > > or
> > > > > > 1k
> > > > > > > > but
> > > > > > > > >>> could be easily convinced that's wrong.
> > > > > > > > >>>
> > > > > > > > >>> Thanks,
> > > > > > > > >>> Matt
> > > > > > > > >>>
> > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> > > > bchen11@outlook.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >>>
> > > > > > > > >>> > Hey folks,
> > > > > > > > >>> >
> > > > > > > > >>> >
> > > > > > > > >>> > I would like to start a discussion on KIP-389:
> > > > > > > > >>> >
> > > > > > > > >>> >
> > > > > > > > >>> >
> > > > > > > > >>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > > > > > >>> >
> > > > > > > > >>> >
> > > > > > > > >>> > This is a pretty simple change to cap the consumer
> group
> > > size
> > > > > for
> > > > > > > > >>> broker
> > > > > > > > >>> > stability. Give me your valuable feedback when you got
> > > time.
> > > > > > > > >>> >
> > > > > > > > >>> >
> > > > > > > > >>> > Thank you!
> > > > > > > > >>> >
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> Best,
> > > > > > > > >> Stanislav
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best,
> > > > > > > > > Stanislav
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>
>
> --
> Best,
> Stanislav
>


-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Boyang Chen <bc...@outlook.com>.
Hey Stanislav,

Thanks for the question! `Trivial rebalance` means "we don't start reassignment right now, but you need to know it's coming soon
and you should start preparation".

An example KStream use case: before actually starting to shrink the consumer group, we need to
1. partition the consumer group into two subgroups, where one will be offline soon and the other will keep serving (see the sketch below);
2. make sure the state associated with the soon-to-be-offline consumers is successfully replicated onto the serving ones.

As I have mentioned, shrinking the consumer group is pretty much equivalent to scaling the group down, so we could think of this
as an add-on use case for cluster scaling. So my understanding is that KIP-389 could be sequenced within our cooperative rebalancing<https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies>
proposal.
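
To make step 1 above a bit more tangible, here is a rough sketch of the
bookkeeping it implies. The names are hypothetical and the code is not taken
from any KIP or from the coordinator; it only shows how a member list and a
new cap could be split into a serving subgroup and a retiring one:

import java.util.ArrayList;
import java.util.List;

public class GroupShrinkPlan {
    public final List<String> serving;   // members that keep serving
    public final List<String> retiring;  // members to take offline once their state is replicated

    public GroupShrinkPlan(List<String> currentMembers, int targetSize) {
        // Simplest possible policy: keep the first targetSize members (e.g. by join
        // order); a smarter policy could prefer members holding the most local state.
        int keep = Math.max(0, Math.min(targetSize, currentMembers.size()));
        this.serving = new ArrayList<>(currentMembers.subList(0, keep));
        this.retiring = new ArrayList<>(currentMembers.subList(keep, currentMembers.size()));
    }
}

Step 2 would then wait until the state owned by the retiring members is
replicated onto the serving ones before the real rebalance removes them.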

Let me know if this makes sense.

Best,
Boyang
________________________________
From: Stanislav Kozlovski <st...@confluent.io>
Sent: Wednesday, December 5, 2018 5:52 PM
To: dev@kafka.apache.org
Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Hey Boyang,

I think we still need to take care of group shrinkage because even if users
change the config value we cannot guarantee that all consumer groups would
have been manually shrunk.

Regarding 2., I agree that forcefully triggering a rebalance might be the
most intuitive way to handle the situation.
What does a "trivial rebalance" mean? Sorry, I'm not familiar with the term.
I was thinking that maybe we could force a rebalance, which would cause
consumers to commit their offsets (given their rebalanceListener is
configured correctly) and subsequently reject some of the incoming
`joinGroup` requests. Does that sound like it would work?

On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bc...@outlook.com> wrote:

> Hey Stanislav,
>
> I read the latest KIP and saw that we already changed the default value to
> -1. Do
> we still need to take care of the consumer group shrinking when doing the
> upgrade?
>
> However this is an interesting topic that worth discussing. Although
> rolling
> upgrade is fine, `consumer.group.max.size` could always have conflict with
> the current
> consumer group size which means we need to adhere to one source of truth.
>
> 1.Choose the current group size, which means we never interrupt the
> consumer group until
> it transits to PREPARE_REBALANCE. And we keep track of how many join group
> requests
> we have seen so far during PREPARE_REBALANCE. After reaching the consumer
> cap,
> we start to inform over provisioned consumers that you should send
> LeaveGroupRequest and
> fail yourself. Or with what Mayuresh proposed in KIP-345, we could mark
> extra members
> as hot backup and rebalance without them.
>
> 2.Choose the `consumer.group.max.size`. I feel incremental rebalancing
> (you proposed) could be of help here.
> When a new cap is enforced, leader should be notified. If the current
> group size is already over limit, leader
> shall trigger a trivial rebalance to shuffle some topic partitions and let
> a subset of consumers prepare the ownership
> transition. Until they are ready, we trigger a real rebalance to remove
> over-provisioned consumers. It is pretty much
> equivalent to `how do we scale down the consumer group without
> interrupting the current processing`.
>
> I personally feel inclined to 2 because we could kill two birds with one
> stone in a generic way. What do you think?
>
> Boyang
> ________________________________
> From: Stanislav Kozlovski <st...@confluent.io>
> Sent: Monday, December 3, 2018 8:35 PM
> To: dev@kafka.apache.org
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Hi Jason,
>
> > 2. Do you think we should make this a dynamic config?
> I'm not sure. Looking at the config from the perspective of a prescriptive
> config, we may get away with not updating it dynamically.
> But in my opinion, it always makes sense to have a config be dynamically
> configurable. As long as we limit it to being a cluster-wide config, we
> should be fine.
>
> > 1. I think it would be helpful to clarify the details on how the
> coordinator will shrink the group. It will need to choose which members to
> remove. Are we going to give current members an opportunity to commit
> offsets before kicking them from the group?
>
> This turns out to be somewhat tricky. I think that we may not be able to
> guarantee that consumers don't process a message twice.
> My initial approach was to do as much as we could to let consumers commit
> offsets.
>
> I was thinking that we mark a group to be shrunk, we could keep a map of
> consumer_id->boolean indicating whether they have committed offsets. I then
> thought we could delay the rebalance until every consumer commits (or some
> time passes).
> In the meantime, we would block all incoming fetch calls (by either
> returning empty records or a retriable error) and we would continue to
> accept offset commits (even twice for a single consumer)
>
> I see two problems with this approach:
> * We have async offset commits, which implies that we can receive fetch
> requests before the offset commit req has been handled. i.e consmer sends
> fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in the
> broker. Meaning we could have saved the offsets for B but rebalance before
> the offsetCommit for the offsets processed in C come in.
> * KIP-392 Allow consumers to fetch from closest replica
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> >
> would
> make it significantly harder to block poll() calls on consumers whose
> groups are being shrunk. Even if we implemented a solution, the same race
> condition noted above seems to apply and probably others
>
>
> Given those constraints, I think that we can simply mark the group as
> `PreparingRebalance` with a rebalanceTimeout of the server setting `
> group.max.session.timeout.ms`. That's a bit long by default (5 minutes)
> but
> I can't seem to come up with a better alternative
>
> I'm interested in hearing your thoughts.
>
> Thanks,
> Stanislav
>
> On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hey Stanislav,
> >
> > What do you think about the use case I mentioned in my previous reply
> about
> > > a more resilient self-service Kafka? I believe the benefit there is
> > bigger.
> >
> >
> > I see this config as analogous to the open file limit. Probably this
> limit
> > was intended to be prescriptive at some point about what was deemed a
> > reasonable number of open files for an application. But mostly people
> treat
> > it as an annoyance which they have to work around. If it happens to be
> hit,
> > usually you just increase it because it is not tied to an actual resource
> > constraint. However, occasionally hitting the limit does indicate an
> > application bug such as a leak, so I wouldn't say it is useless.
> Similarly,
> > the issue in KAFKA-7610 was a consumer leak and having this limit would
> > have allowed the problem to be detected before it impacted the cluster.
> To
> > me, that's the main benefit. It's possible that it could be used
> > prescriptively to prevent poor usage of groups, but like the open file
> > limit, I suspect administrators will just set it large enough that users
> > are unlikely to complain.
> >
> > Anyway, just a couple additional questions:
> >
> > 1. I think it would be helpful to clarify the details on how the
> > coordinator will shrink the group. It will need to choose which members
> to
> > remove. Are we going to give current members an opportunity to commit
> > offsets before kicking them from the group?
> >
> > 2. Do you think we should make this a dynamic config?
> >
> > Thanks,
> > Jason
> >
> >
> >
> >
> > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > stanislav@confluent.io>
> > wrote:
> >
> > > Hi Jason,
> > >
> > > You raise some very valid points.
> > >
> > > > The benefit of this KIP is probably limited to preventing "runaway"
> > > consumer groups due to leaks or some other application bug
> > > What do you think about the use case I mentioned in my previous reply
> > about
> > > a more resilient self-service Kafka? I believe the benefit there is
> > bigger
> > >
> > > * Default value
> > > You're right, we probably do need to be conservative. Big consumer
> groups
> > > are considered an anti-pattern and my goal was to also hint at this
> > through
> > > the config's default. Regardless, it is better to not have the
> potential
> > to
> > > break applications with an upgrade.
> > > Choosing between the default of something big like 5000 or an opt-in
> > > option, I think we should go with the *disabled default option*  (-1).
> > > The only benefit we would get from a big default of 5000 is default
> > > protection against buggy/malicious applications that hit the KAFKA-7610
> > > issue.
> > > While this KIP was spawned from that issue, I believe its value is
> > enabling
> > > the possibility of protection and helping move towards a more
> > self-service
> > > Kafka. I also think that a default value of 5000 might be misleading to
> > > users and lead them to think that big consumer groups (> 250) are a
> good
> > > thing.
> > >
> > > The good news is that KAFKA-7610 should be fully resolved and the
> > rebalance
> > > protocol should, in general, be more solid after the planned
> improvements
> > > in KIP-345 and KIP-394.
> > >
> > > * Handling bigger groups during upgrade
> > > I now see that we store the state of consumer groups in the log and
> why a
> > > rebalance isn't expected during a rolling upgrade.
> > > Since we're going with the default value of the max.size being
> disabled,
> > I
> > > believe we can afford to be more strict here.
> > > During state reloading of a new Coordinator with a defined
> max.group.size
> > > config, I believe we should *force* rebalances for groups that exceed
> the
> > > configured size. Then, only some consumers will be able to join and the
> > max
> > > size invariant will be satisfied.
> > >
> > > I updated the KIP with a migration plan, rejected alternatives and the
> > new
> > > default value.
> > >
> > > Thanks,
> > > Stanislav
> > >
> > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hey Stanislav,
> > > >
> > > > Clients will then find that coordinator
> > > > > and send `joinGroup` on it, effectively rebuilding the group, since
> > the
> > > > > cache of active consumers is not stored outside the Coordinator's
> > > memory.
> > > > > (please do say if that is incorrect)
> > > >
> > > >
> > > > Groups do not typically rebalance after a coordinator change. You
> could
> > > > potentially force a rebalance if the group is too big and kick out
> the
> > > > slowest members or something. A more graceful solution is probably to
> > > just
> > > > accept the current size and prevent it from getting bigger. We could
> > log
> > > a
> > > > warning potentially.
> > > >
> > > > My thinking is that we should abstract away from conserving resources
> > and
> > > > > focus on giving control to the broker. The issue that spawned this
> > KIP
> > > > was
> > > > > a memory problem but I feel this change is useful in a more general
> > > way.
> > > >
> > > >
> > > > So you probably already know why I'm asking about this. For consumer
> > > groups
> > > > anyway, resource usage would typically be proportional to the number
> of
> > > > partitions that a group is reading from and not the number of
> members.
> > > For
> > > > example, consider the memory use in the offsets cache. The benefit of
> > > this
> > > > KIP is probably limited to preventing "runaway" consumer groups due
> to
> > > > leaks or some other application bug. That still seems useful though.
> > > >
> > > > I completely agree with this and I *ask everybody to chime in with
> > > opinions
> > > > > on a sensible default value*.
> > > >
> > > >
> > > > I think we would have to be very conservative. The group protocol is
> > > > generic in some sense, so there may be use cases we don't know of
> where
> > > > larger groups are reasonable. Probably we should make this an opt-in
> > > > feature so that we do not risk breaking anyone's application after an
> > > > upgrade. Either that, or use a very high default like 5,000.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > > > stanislav@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Jason and Boyang, those were important comments
> > > > >
> > > > > > One suggestion I have is that it would be helpful to put your
> > > reasoning
> > > > > on deciding the current default value. For example, in certain use
> > > cases
> > > > at
> > > > > Pinterest we are very likely to have more consumers than 250 when
> we
> > > > > configure 8 stream instances with 32 threads.
> > > > > > For the effectiveness of this KIP, we should encourage people to
> > > > discuss
> > > > > their opinions on the default setting and ideally reach a
> consensus.
> > > > >
> > > > > I completely agree with this and I *ask everybody to chime in with
> > > > opinions
> > > > > on a sensible default value*.
> > > > > My thought process was that in the current model rebalances in
> large
> > > > groups
> > > > > are more costly. I imagine most use cases in most Kafka users do
> not
> > > > > require more than 250 consumers.
> > > > > Boyang, you say that you are "likely to have... when we..." - do
> you
> > > have
> > > > > systems running with so many consumers in a group or are you
> planning
> > > > to? I
> > > > > guess what I'm asking is whether this has been tested in production
> > > with
> > > > > the current rebalance model (ignoring KIP-345)
> > > > >
> > > > > >  Can you clarify the compatibility impact here? What
> > > > > > will happen to groups that are already larger than the max size?
> > > > > This is a very important question.
> > > > > From my current understanding, when a coordinator broker gets shut
> > > > > down during a cluster rolling upgrade, a replica will take
> leadership
> > > of
> > > > > the `__offset_commits` partition. Clients will then find that
> > > coordinator
> > > > > and send `joinGroup` on it, effectively rebuilding the group, since
> > the
> > > > > cache of active consumers is not stored outside the Coordinator's
> > > memory.
> > > > > (please do say if that is incorrect)
> > > > > Then, I believe that working as if this is a new group is a
> > reasonable
> > > > > approach. Namely, fail joinGroups when the max.size is exceeded.
> > > > > What do you guys think about this? (I'll update the KIP after we
> > settle
> > > > on
> > > > > a solution)
> > > > >
> > > > > >  Also, just to be clear, the resource we are trying to conserve
> > here
> > > is
> > > > > what? Memory?
> > > > > My thinking is that we should abstract away from conserving
> resources
> > > and
> > > > > focus on giving control to the broker. The issue that spawned this
> > KIP
> > > > was
> > > > > a memory problem but I feel this change is useful in a more general
> > > way.
> > > > It
> > > > > limits the control clients have on the cluster and helps Kafka
> > become a
> > > > > more self-serving system. Admin/Ops teams can better control the
> > impact
> > > > > application developers can have on a Kafka cluster with this change
> > > > >
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > > >
> > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> jason@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi Stanislav,
> > > > > >
> > > > > > Thanks for the KIP. Can you clarify the compatibility impact
> here?
> > > What
> > > > > > will happen to groups that are already larger than the max size?
> > > Also,
> > > > > just
> > > > > > to be clear, the resource we are trying to conserve here is what?
> > > > Memory?
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <bchen11@outlook.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Thanks Stanislav for the update! One suggestion I have is that
> it
> > > > would
> > > > > > be
> > > > > > > helpful to put your
> > > > > > >
> > > > > > > reasoning on deciding the current default value. For example,
> in
> > > > > certain
> > > > > > > use cases at Pinterest we are very likely
> > > > > > >
> > > > > > > to have more consumers than 250 when we configure 8 stream
> > > instances
> > > > > with
> > > > > > > 32 threads.
> > > > > > >
> > > > > > >
> > > > > > > For the effectiveness of this KIP, we should encourage people
> to
> > > > > discuss
> > > > > > > their opinions on the default setting and ideally reach a
> > > consensus.
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Boyang
> > > > > > >
> > > > > > > ________________________________
> > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > > > > To: dev@kafka.apache.org
> > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > > member
> > > > > > > metadata growth
> > > > > > >
> > > > > > > Hey everybody,
> > > > > > >
> > > > > > > It's been a week since this KIP and not much discussion has
> been
> > > > made.
> > > > > > > I assume that this is a straight forward change and I will
> open a
> > > > > voting
> > > > > > > thread in the next couple of days if nobody has anything to
> > > suggest.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > > > > > > stanislav@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Greetings everybody,
> > > > > > > >
> > > > > > > > I have enriched the KIP a bit with a bigger Motivation
> section
> > > and
> > > > > also
> > > > > > > > renamed it.
> > > > > > > > KIP:
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > > > > >
> > > > > > > > I'm looking forward to discussions around it.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > > > > > > stanislav@confluent.io> wrote:
> > > > > > > >
> > > > > > > >> Hey there everybody,
> > > > > > > >>
> > > > > > > >> Thanks for the introduction Boyang. I appreciate the effort
> > you
> > > > are
> > > > > > > >> putting into improving consumer behavior in Kafka.
> > > > > > > >>
> > > > > > > >> @Matt
> > > > > > > >> I also believe the default value is high. In my opinion, we
> > > should
> > > > > aim
> > > > > > > to
> > > > > > > >> a default cap around 250. This is because in the current
> model
> > > any
> > > > > > > consumer
> > > > > > > >> rebalance is disrupting to every consumer. The bigger the
> > group,
> > > > the
> > > > > > > longer
> > > > > > > >> this period of disruption.
> > > > > > > >>
> > > > > > > >> If you have such a large consumer group, chances are that
> your
> > > > > > > >> client-side logic could be structured better and that you
> are
> > > not
> > > > > > using
> > > > > > > the
> > > > > > > >> high number of consumers to achieve high throughput.
> > > > > > > >> 250 can still be considered of a high upper bound, I believe
> > in
> > > > > > practice
> > > > > > > >> users should aim to not go over 100 consumers per consumer
> > > group.
> > > > > > > >>
> > > > > > > >> In regards to the cap being global/per-broker, I think that
> we
> > > > > should
> > > > > > > >> consider whether we want it to be global or *per-topic*. For
> > the
> > > > > time
> > > > > > > >> being, I believe that having it per-topic with a global
> > default
> > > > > might
> > > > > > be
> > > > > > > >> the best situation. Having it global only seems a bit
> > > restricting
> > > > to
> > > > > > me
> > > > > > > and
> > > > > > > >> it never hurts to support more fine-grained configurability
> > > (given
> > > > > > it's
> > > > > > > the
> > > > > > > >> same config, not a new one being introduced).
> > > > > > > >>
> > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> > > bchen11@outlook.com
> > > > >
> > > > > > > wrote:
> > > > > > > >>
> > > > > > > >>> Thanks Matt for the suggestion! I'm still open to any
> > > suggestion
> > > > to
> > > > > > > >>> change the default value. Meanwhile I just want to point
> out
> > > that
> > > > > > this
> > > > > > > >>> value is a just last line of defense, not a real scenario
> we
> > > > would
> > > > > > > expect.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> In the meanwhile, I discussed with Stanislav and he would
> be
> > > > > driving
> > > > > > > the
> > > > > > > >>> 389 effort from now on. Stanislav proposed the idea in the
> > > first
> > > > > > place
> > > > > > > and
> > > > > > > >>> had already come up a draft design, while I will keep
> > focusing
> > > on
> > > > > > > KIP-345
> > > > > > > >>> effort to ensure solving the edge case described in the
> JIRA<
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > > > > > >.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> Thank you Stanislav for making this happen!
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> Boyang
> > > > > > > >>>
> > > > > > > >>> ________________________________
> > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > > > > >>> To: dev@kafka.apache.org
> > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to
> cap
> > > > > member
> > > > > > > >>> metadata growth
> > > > > > > >>>
> > > > > > > >>> Thanks for the KIP.
> > > > > > > >>>
> > > > > > > >>> Will this cap be a global cap across the entire cluster or
> > per
> > > > > > broker?
> > > > > > > >>>
> > > > > > > >>> Either way the default value seems a bit high to me, but
> that
> > > > could
> > > > > > > just
> > > > > > > >>> be
> > > > > > > >>> from my own usage patterns. I'd have probably started with
> > 500
> > > or
> > > > > 1k
> > > > > > > but
> > > > > > > >>> could be easily convinced that's wrong.
> > > > > > > >>>
> > > > > > > >>> Thanks,
> > > > > > > >>> Matt
> > > > > > > >>>
> > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> > > bchen11@outlook.com
> > > > >
> > > > > > > wrote:
> > > > > > > >>>
> > > > > > > >>> > Hey folks,
> > > > > > > >>> >
> > > > > > > >>> >
> > > > > > > >>> > I would like to start a discussion on KIP-389:
> > > > > > > >>> >
> > > > > > > >>> >
> > > > > > > >>> >
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > > > > >>> >
> > > > > > > >>> >
> > > > > > > >>> > This is a pretty simple change to cap the consumer group
> > size
> > > > for
> > > > > > > >>> broker
> > > > > > > >>> > stability. Give me your valuable feedback when you got
> > time.
> > > > > > > >>> >
> > > > > > > >>> >
> > > > > > > >>> > Thank you!
> > > > > > > >>> >
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Best,
> > > > > > > >> Stanislav
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
>
>
> --
> Best,
> Stanislav
>


--
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hey Boyang,

I think we still need to take care of group shrinkage because even if users
change the config value we cannot guarantee that all consumer groups would
have been manually shrunk.

Regarding 2., I agree that forcefully triggering a rebalance might be the
most intuitive way to handle the situation.
What does a "trivial rebalance" mean? Sorry, I'm not familiar with the term.
I was thinking that maybe we could force a rebalance, which would cause
consumers to commit their offsets (given their rebalanceListener is
configured correctly) and subsequently reject some of the incoming
`joinGroup` requests. Does that sound like it would work?
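
Concretely, the rejection I have in mind would be something along these
lines (purely illustrative -- the class and method names are made up and do
not correspond to the real GroupCoordinator code):

public class GroupSizeGuard {
    private final int groupMaxSize; // the proposed cap; -1 means "no limit"

    public GroupSizeGuard(int groupMaxSize) {
        this.groupMaxSize = groupMaxSize;
    }

    // During the forced rebalance, the first groupMaxSize joinGroup requests are
    // accepted and any further ones are rejected, shrinking the group to the cap.
    public boolean canAccept(int membersAcceptedSoFar) {
        return groupMaxSize < 0 || membersAcceptedSoFar < groupMaxSize;
    }
}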

On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bc...@outlook.com> wrote:

> Hey Stanislav,
>
> I read the latest KIP and saw that we already changed the default value to
> -1. Do
> we still need to take care of the consumer group shrinking when doing the
> upgrade?
>
> However this is an interesting topic that worth discussing. Although
> rolling
> upgrade is fine, `consumer.group.max.size` could always have conflict with
> the current
> consumer group size which means we need to adhere to one source of truth.
>
> 1.Choose the current group size, which means we never interrupt the
> consumer group until
> it transits to PREPARE_REBALANCE. And we keep track of how many join group
> requests
> we have seen so far during PREPARE_REBALANCE. After reaching the consumer
> cap,
> we start to inform over provisioned consumers that you should send
> LeaveGroupRequest and
> fail yourself. Or with what Mayuresh proposed in KIP-345, we could mark
> extra members
> as hot backup and rebalance without them.
>
> 2.Choose the `consumer.group.max.size`. I feel incremental rebalancing
> (you proposed) could be of help here.
> When a new cap is enforced, leader should be notified. If the current
> group size is already over limit, leader
> shall trigger a trivial rebalance to shuffle some topic partitions and let
> a subset of consumers prepare the ownership
> transition. Until they are ready, we trigger a real rebalance to remove
> over-provisioned consumers. It is pretty much
> equivalent to `how do we scale down the consumer group without
> interrupting the current processing`.
>
> I personally feel inclined to 2 because we could kill two birds with one
> stone in a generic way. What do you think?
>
> Boyang
> ________________________________
> From: Stanislav Kozlovski <st...@confluent.io>
> Sent: Monday, December 3, 2018 8:35 PM
> To: dev@kafka.apache.org
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Hi Jason,
>
> > 2. Do you think we should make this a dynamic config?
> I'm not sure. Looking at the config from the perspective of a prescriptive
> config, we may get away with not updating it dynamically.
> But in my opinion, it always makes sense to have a config be dynamically
> configurable. As long as we limit it to being a cluster-wide config, we
> should be fine.
>
> > 1. I think it would be helpful to clarify the details on how the
> coordinator will shrink the group. It will need to choose which members to
> remove. Are we going to give current members an opportunity to commit
> offsets before kicking them from the group?
>
> This turns out to be somewhat tricky. I think that we may not be able to
> guarantee that consumers don't process a message twice.
> My initial approach was to do as much as we could to let consumers commit
> offsets.
>
> I was thinking that we mark a group to be shrunk, we could keep a map of
> consumer_id->boolean indicating whether they have committed offsets. I then
> thought we could delay the rebalance until every consumer commits (or some
> time passes).
> In the meantime, we would block all incoming fetch calls (by either
> returning empty records or a retriable error) and we would continue to
> accept offset commits (even twice for a single consumer)
>
> I see two problems with this approach:
> * We have async offset commits, which implies that we can receive fetch
> requests before the offset commit req has been handled, i.e. consumer sends
> fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in the
> broker. Meaning we could have saved the offsets for B but rebalance before
> the offsetCommit for the offsets processed in C come in.
> * KIP-392 Allow consumers to fetch from closest replica
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica>
> would
> make it significantly harder to block poll() calls on consumers whose
> groups are being shrunk. Even if we implemented a solution, the same race
> condition noted above seems to apply and probably others
>
>
> Given those constraints, I think that we can simply mark the group as
> `PreparingRebalance` with a rebalanceTimeout of the server setting `
> group.max.session.timeout.ms`. That's a bit long by default (5 minutes)
> but
> I can't seem to come up with a better alternative
>
> I'm interested in hearing your thoughts.
>
> Thanks,
> Stanislav
>
> On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hey Stanislav,
> >
> > What do you think about the use case I mentioned in my previous reply
> about
> > > a more resilient self-service Kafka? I believe the benefit there is
> > bigger.
> >
> >
> > I see this config as analogous to the open file limit. Probably this
> limit
> > was intended to be prescriptive at some point about what was deemed a
> > reasonable number of open files for an application. But mostly people
> treat
> > it as an annoyance which they have to work around. If it happens to be
> hit,
> > usually you just increase it because it is not tied to an actual resource
> > constraint. However, occasionally hitting the limit does indicate an
> > application bug such as a leak, so I wouldn't say it is useless.
> Similarly,
> > the issue in KAFKA-7610 was a consumer leak and having this limit would
> > have allowed the problem to be detected before it impacted the cluster.
> To
> > me, that's the main benefit. It's possible that it could be used
> > prescriptively to prevent poor usage of groups, but like the open file
> > limit, I suspect administrators will just set it large enough that users
> > are unlikely to complain.
> >
> > Anyway, just a couple additional questions:
> >
> > 1. I think it would be helpful to clarify the details on how the
> > coordinator will shrink the group. It will need to choose which members
> to
> > remove. Are we going to give current members an opportunity to commit
> > offsets before kicking them from the group?
> >
> > 2. Do you think we should make this a dynamic config?
> >
> > Thanks,
> > Jason
> >
> >
> >
> >
> > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> > stanislav@confluent.io>
> > wrote:
> >
> > > Hi Jason,
> > >
> > > You raise some very valid points.
> > >
> > > > The benefit of this KIP is probably limited to preventing "runaway"
> > > consumer groups due to leaks or some other application bug
> > > What do you think about the use case I mentioned in my previous reply
> > about
> > > a more resilient self-service Kafka? I believe the benefit there is
> > bigger
> > >
> > > * Default value
> > > You're right, we probably do need to be conservative. Big consumer
> groups
> > > are considered an anti-pattern and my goal was to also hint at this
> > through
> > > the config's default. Regardless, it is better to not have the
> potential
> > to
> > > break applications with an upgrade.
> > > Choosing between the default of something big like 5000 or an opt-in
> > > option, I think we should go with the *disabled default option*  (-1).
> > > The only benefit we would get from a big default of 5000 is default
> > > protection against buggy/malicious applications that hit the KAFKA-7610
> > > issue.
> > > While this KIP was spawned from that issue, I believe its value is
> > enabling
> > > the possibility of protection and helping move towards a more
> > self-service
> > > Kafka. I also think that a default value of 5000 might be misleading to
> > > users and lead them to think that big consumer groups (> 250) are a
> good
> > > thing.
> > >
> > > The good news is that KAFKA-7610 should be fully resolved and the
> > rebalance
> > > protocol should, in general, be more solid after the planned
> improvements
> > > in KIP-345 and KIP-394.
> > >
> > > * Handling bigger groups during upgrade
> > > I now see that we store the state of consumer groups in the log and
> why a
> > > rebalance isn't expected during a rolling upgrade.
> > > Since we're going with the default value of the max.size being
> disabled,
> > I
> > > believe we can afford to be more strict here.
> > > During state reloading of a new Coordinator with a defined
> max.group.size
> > > config, I believe we should *force* rebalances for groups that exceed
> the
> > > configured size. Then, only some consumers will be able to join and the
> > max
> > > size invariant will be satisfied.
> > >
> > > I updated the KIP with a migration plan, rejected alternatives and the
> > new
> > > default value.
> > >
> > > Thanks,
> > > Stanislav
> > >
> > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hey Stanislav,
> > > >
> > > > Clients will then find that coordinator
> > > > > and send `joinGroup` on it, effectively rebuilding the group, since
> > the
> > > > > cache of active consumers is not stored outside the Coordinator's
> > > memory.
> > > > > (please do say if that is incorrect)
> > > >
> > > >
> > > > Groups do not typically rebalance after a coordinator change. You
> could
> > > > potentially force a rebalance if the group is too big and kick out
> the
> > > > slowest members or something. A more graceful solution is probably to
> > > just
> > > > accept the current size and prevent it from getting bigger. We could
> > log
> > > a
> > > > warning potentially.
> > > >
> > > > My thinking is that we should abstract away from conserving resources
> > and
> > > > > focus on giving control to the broker. The issue that spawned this
> > KIP
> > > > was
> > > > > a memory problem but I feel this change is useful in a more general
> > > way.
> > > >
> > > >
> > > > So you probably already know why I'm asking about this. For consumer
> > > groups
> > > > anyway, resource usage would typically be proportional to the number
> of
> > > > partitions that a group is reading from and not the number of
> members.
> > > For
> > > > example, consider the memory use in the offsets cache. The benefit of
> > > this
> > > > KIP is probably limited to preventing "runaway" consumer groups due
> to
> > > > leaks or some other application bug. That still seems useful though.
> > > >
> > > > I completely agree with this and I *ask everybody to chime in with
> > > opinions
> > > > > on a sensible default value*.
> > > >
> > > >
> > > > I think we would have to be very conservative. The group protocol is
> > > > generic in some sense, so there may be use cases we don't know of
> where
> > > > larger groups are reasonable. Probably we should make this an opt-in
> > > > feature so that we do not risk breaking anyone's application after an
> > > > upgrade. Either that, or use a very high default like 5,000.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > > > stanislav@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Jason and Boyang, those were important comments
> > > > >
> > > > > > One suggestion I have is that it would be helpful to put your
> > > reasoning
> > > > > on deciding the current default value. For example, in certain use
> > > cases
> > > > at
> > > > > Pinterest we are very likely to have more consumers than 250 when
> we
> > > > > configure 8 stream instances with 32 threads.
> > > > > > For the effectiveness of this KIP, we should encourage people to
> > > > discuss
> > > > > their opinions on the default setting and ideally reach a
> consensus.
> > > > >
> > > > > I completely agree with this and I *ask everybody to chime in with
> > > > opinions
> > > > > on a sensible default value*.
> > > > > My thought process was that in the current model rebalances in
> large
> > > > groups
> > > > > are more costly. I imagine most use cases in most Kafka users do
> not
> > > > > require more than 250 consumers.
> > > > > Boyang, you say that you are "likely to have... when we..." - do
> you
> > > have
> > > > > systems running with so many consumers in a group or are you
> planning
> > > > to? I
> > > > > guess what I'm asking is whether this has been tested in production
> > > with
> > > > > the current rebalance model (ignoring KIP-345)
> > > > >
> > > > > >  Can you clarify the compatibility impact here? What
> > > > > > will happen to groups that are already larger than the max size?
> > > > > This is a very important question.
> > > > > From my current understanding, when a coordinator broker gets shut
> > > > > down during a cluster rolling upgrade, a replica will take
> leadership
> > > of
> > > > > the `__offset_commits` partition. Clients will then find that
> > > coordinator
> > > > > and send `joinGroup` on it, effectively rebuilding the group, since
> > the
> > > > > cache of active consumers is not stored outside the Coordinator's
> > > memory.
> > > > > (please do say if that is incorrect)
> > > > > Then, I believe that working as if this is a new group is a
> > reasonable
> > > > > approach. Namely, fail joinGroups when the max.size is exceeded.
> > > > > What do you guys think about this? (I'll update the KIP after we
> > settle
> > > > on
> > > > > a solution)
> > > > >
> > > > > >  Also, just to be clear, the resource we are trying to conserve
> > here
> > > is
> > > > > what? Memory?
> > > > > My thinking is that we should abstract away from conserving
> resources
> > > and
> > > > > focus on giving control to the broker. The issue that spawned this
> > KIP
> > > > was
> > > > > a memory problem but I feel this change is useful in a more general
> > > way.
> > > > It
> > > > > limits the control clients have on the cluster and helps Kafka
> > become a
> > > > > more self-serving system. Admin/Ops teams can better control the
> > impact
> > > > > application developers can have on a Kafka cluster with this change
> > > > >
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > > >
> > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <
> jason@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi Stanislav,
> > > > > >
> > > > > > Thanks for the KIP. Can you clarify the compatibility impact
> here?
> > > What
> > > > > > will happen to groups that are already larger than the max size?
> > > Also,
> > > > > just
> > > > > > to be clear, the resource we are trying to conserve here is what?
> > > > Memory?
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <bchen11@outlook.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Thanks Stanislav for the update! One suggestion I have is that
> it
> > > > would
> > > > > > be
> > > > > > > helpful to put your
> > > > > > >
> > > > > > > reasoning on deciding the current default value. For example,
> in
> > > > > certain
> > > > > > > use cases at Pinterest we are very likely
> > > > > > >
> > > > > > > to have more consumers than 250 when we configure 8 stream
> > > instances
> > > > > with
> > > > > > > 32 threads.
> > > > > > >
> > > > > > >
> > > > > > > For the effectiveness of this KIP, we should encourage people
> to
> > > > > discuss
> > > > > > > their opinions on the default setting and ideally reach a
> > > consensus.
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Boyang
> > > > > > >
> > > > > > > ________________________________
> > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > > > > To: dev@kafka.apache.org
> > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > > member
> > > > > > > metadata growth
> > > > > > >
> > > > > > > Hey everybody,
> > > > > > >
> > > > > > > It's been a week since this KIP and not much discussion has
> been
> > > > made.
> > > > > > > I assume that this is a straight forward change and I will
> open a
> > > > > voting
> > > > > > > thread in the next couple of days if nobody has anything to
> > > suggest.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > > > > > > stanislav@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Greetings everybody,
> > > > > > > >
> > > > > > > > I have enriched the KIP a bit with a bigger Motivation
> section
> > > and
> > > > > also
> > > > > > > > renamed it.
> > > > > > > > KIP:
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > > > > >
> > > > > > > > I'm looking forward to discussions around it.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > > > > > > stanislav@confluent.io> wrote:
> > > > > > > >
> > > > > > > >> Hey there everybody,
> > > > > > > >>
> > > > > > > >> Thanks for the introduction Boyang. I appreciate the effort
> > you
> > > > are
> > > > > > > >> putting into improving consumer behavior in Kafka.
> > > > > > > >>
> > > > > > > >> @Matt
> > > > > > > >> I also believe the default value is high. In my opinion, we
> > > should
> > > > > aim
> > > > > > > to
> > > > > > > >> a default cap around 250. This is because in the current
> model
> > > any
> > > > > > > consumer
> > > > > > > >> rebalance is disrupting to every consumer. The bigger the
> > group,
> > > > the
> > > > > > > longer
> > > > > > > >> this period of disruption.
> > > > > > > >>
> > > > > > > >> If you have such a large consumer group, chances are that
> your
> > > > > > > >> client-side logic could be structured better and that you
> are
> > > not
> > > > > > using
> > > > > > > the
> > > > > > > >> high number of consumers to achieve high throughput.
> > > > > > > >> 250 can still be considered of a high upper bound, I believe
> > in
> > > > > > practice
> > > > > > > >> users should aim to not go over 100 consumers per consumer
> > > group.
> > > > > > > >>
> > > > > > > >> In regards to the cap being global/per-broker, I think that
> we
> > > > > should
> > > > > > > >> consider whether we want it to be global or *per-topic*. For
> > the
> > > > > time
> > > > > > > >> being, I believe that having it per-topic with a global
> > default
> > > > > might
> > > > > > be
> > > > > > > >> the best situation. Having it global only seems a bit
> > > restricting
> > > > to
> > > > > > me
> > > > > > > and
> > > > > > > >> it never hurts to support more fine-grained configurability
> > > (given
> > > > > > it's
> > > > > > > the
> > > > > > > >> same config, not a new one being introduced).
> > > > > > > >>
> > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> > > bchen11@outlook.com
> > > > >
> > > > > > > wrote:
> > > > > > > >>
> > > > > > > >>> Thanks Matt for the suggestion! I'm still open to any
> > > suggestion
> > > > to
> > > > > > > >>> change the default value. Meanwhile I just want to point
> out
> > > that
> > > > > > this
> > > > > > > >>> value is a just last line of defense, not a real scenario
> we
> > > > would
> > > > > > > expect.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> In the meanwhile, I discussed with Stanislav and he would
> be
> > > > > driving
> > > > > > > the
> > > > > > > >>> 389 effort from now on. Stanislav proposed the idea in the
> > > first
> > > > > > place
> > > > > > > and
> > > > > > > >>> had already come up a draft design, while I will keep
> > focusing
> > > on
> > > > > > > KIP-345
> > > > > > > >>> effort to ensure solving the edge case described in the
> JIRA<
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > > > > > >.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> Thank you Stanislav for making this happen!
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> Boyang
> > > > > > > >>>
> > > > > > > >>> ________________________________
> > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > > > > >>> To: dev@kafka.apache.org
> > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to
> cap
> > > > > member
> > > > > > > >>> metadata growth
> > > > > > > >>>
> > > > > > > >>> Thanks for the KIP.
> > > > > > > >>>
> > > > > > > >>> Will this cap be a global cap across the entire cluster or
> > per
> > > > > > broker?
> > > > > > > >>>
> > > > > > > >>> Either way the default value seems a bit high to me, but
> that
> > > > could
> > > > > > > just
> > > > > > > >>> be
> > > > > > > >>> from my own usage patterns. I'd have probably started with
> > 500
> > > or
> > > > > 1k
> > > > > > > but
> > > > > > > >>> could be easily convinced that's wrong.
> > > > > > > >>>
> > > > > > > >>> Thanks,
> > > > > > > >>> Matt
> > > > > > > >>>
> > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> > > bchen11@outlook.com
> > > > >
> > > > > > > wrote:
> > > > > > > >>>
> > > > > > > >>> > Hey folks,
> > > > > > > >>> >
> > > > > > > >>> >
> > > > > > > >>> > I would like to start a discussion on KIP-389:
> > > > > > > >>> >
> > > > > > > >>> >
> > > > > > > >>> >
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > > > > >>> >
> > > > > > > >>> >
> > > > > > > >>> > This is a pretty simple change to cap the consumer group
> > size
> > > > for
> > > > > > > >>> broker
> > > > > > > >>> > stability. Give me your valuable feedback when you got
> > time.
> > > > > > > >>> >
> > > > > > > >>> >
> > > > > > > >>> > Thank you!
> > > > > > > >>> >
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Best,
> > > > > > > >> Stanislav
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
>
>
> --
> Best,
> Stanislav
>


-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Boyang Chen <bc...@outlook.com>.
Hey Stanislav,

I read the latest KIP and saw that we already changed the default value to -1. Do
we still need to take care of the consumer group shrinking when doing the upgrade?

However, this is an interesting topic that is worth discussing. Although a rolling
upgrade is fine, `consumer.group.max.size` could always conflict with the current
consumer group size, which means we need to adhere to one source of truth.

1. Choose the current group size, which means we never interrupt the consumer group until
it transitions to PREPARE_REBALANCE, and we keep track of how many join group requests
we have seen so far during PREPARE_REBALANCE. After reaching the consumer cap,
we start to inform over-provisioned consumers that they should send a LeaveGroupRequest and
fail themselves. Or, with what Mayuresh proposed in KIP-345, we could mark extra members
as hot backups and rebalance without them.

2. Choose the `consumer.group.max.size`. I feel incremental rebalancing (which you proposed) could be of help here.
When a new cap is enforced, the leader should be notified. If the current group size is already over the limit, the leader
shall trigger a trivial rebalance to shuffle some topic partitions and let a subset of consumers prepare the ownership
transition. Once they are ready, we trigger a real rebalance to remove the over-provisioned consumers. It is pretty much
equivalent to `how do we scale down the consumer group without interrupting the current processing`.

I personally feel inclined to 2 because we could kill two birds with one stone in a generic way. What do you think?
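
Just to make option 1 a bit more concrete, here is a rough sketch of the
bookkeeping I have in mind (all names here are made up for illustration;
this is not the actual GroupCoordinator code):

// Admits at most maxSize members per rebalance; the counter is reset every
// time the group enters PREPARE_REBALANCE.
class GroupSizeGate {
    private final int maxSize;        // consumer.group.max.size; -1 disables the cap
    private int joinRequestsSeen = 0;

    GroupSizeGate(int maxSize) {
        this.maxSize = maxSize;
    }

    void onPrepareRebalance() {
        joinRequestsSeen = 0;
    }

    // true  -> the member may join this generation
    // false -> the member is over-provisioned and should send a LeaveGroupRequest
    //          (or, with KIP-345, be kept aside as a hot backup)
    boolean tryAdmit() {
        if (maxSize < 0 || joinRequestsSeen < maxSize) {
            joinRequestsSeen++;
            return true;
        }
        return false;
    }
}

Option 2 would reuse the same check, just applied after the incremental
ownership hand-off has completed.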

Boyang
________________________________
From: Stanislav Kozlovski <st...@confluent.io>
Sent: Monday, December 3, 2018 8:35 PM
To: dev@kafka.apache.org
Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Hi Jason,

> 2. Do you think we should make this a dynamic config?
I'm not sure. Looking at the config from the perspective of a prescriptive
config, we may get away with not updating it dynamically.
But in my opinion, it always makes sense to have a config be dynamically
configurable. As long as we limit it to being a cluster-wide config, we
should be fine.

> 1. I think it would be helpful to clarify the details on how the
coordinator will shrink the group. It will need to choose which members to
remove. Are we going to give current members an opportunity to commit
offsets before kicking them from the group?

This turns out to be somewhat tricky. I think that we may not be able to
guarantee that consumers don't process a message twice.
My initial approach was to do as much as we could to let consumers commit
offsets.

I was thinking that we mark a group to be shrunk, we could keep a map of
consumer_id->boolean indicating whether they have committed offsets. I then
thought we could delay the rebalance until every consumer commits (or some
time passes).
In the meantime, we would block all incoming fetch calls (by either
returning empty records or a retriable error) and we would continue to
accept offset commits (even twice for a single consumer)

I see two problems with this approach:
* We have async offset commits, which implies that we can receive fetch
requests before the offset commit req has been handled, i.e. consumer sends
fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in the
broker. Meaning we could have saved the offsets for B but rebalance before
the offsetCommit for the offsets processed in C come in.
* KIP-392 Allow consumers to fetch from closest replica
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica>
would
make it significantly harder to block poll() calls on consumers whose
groups are being shrunk. Even if we implemented a solution, the same race
condition noted above seems to apply and probably others


Given those constraints, I think that we can simply mark the group as
`PreparingRebalance` with a rebalanceTimeout of the server setting `
group.max.session.timeout.ms`. That's a bit long by default (5 minutes) but
I can't seem to come up with a better alternative

I'm interested in hearing your thoughts.

Thanks,
Stanislav

On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <ja...@confluent.io> wrote:

> Hey Stanislav,
>
> What do you think about the use case I mentioned in my previous reply about
> > a more resilient self-service Kafka? I believe the benefit there is
> bigger.
>
>
> I see this config as analogous to the open file limit. Probably this limit
> was intended to be prescriptive at some point about what was deemed a
> reasonable number of open files for an application. But mostly people treat
> it as an annoyance which they have to work around. If it happens to be hit,
> usually you just increase it because it is not tied to an actual resource
> constraint. However, occasionally hitting the limit does indicate an
> application bug such as a leak, so I wouldn't say it is useless. Similarly,
> the issue in KAFKA-7610 was a consumer leak and having this limit would
> have allowed the problem to be detected before it impacted the cluster. To
> me, that's the main benefit. It's possible that it could be used
> prescriptively to prevent poor usage of groups, but like the open file
> limit, I suspect administrators will just set it large enough that users
> are unlikely to complain.
>
> Anyway, just a couple additional questions:
>
> 1. I think it would be helpful to clarify the details on how the
> coordinator will shrink the group. It will need to choose which members to
> remove. Are we going to give current members an opportunity to commit
> offsets before kicking them from the group?
>
> 2. Do you think we should make this a dynamic config?
>
> Thanks,
> Jason
>
>
>
>
> On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> stanislav@confluent.io>
> wrote:
>
> > Hi Jason,
> >
> > You raise some very valid points.
> >
> > > The benefit of this KIP is probably limited to preventing "runaway"
> > consumer groups due to leaks or some other application bug
> > What do you think about the use case I mentioned in my previous reply
> about
> > a more resilient self-service Kafka? I believe the benefit there is
> bigger
> >
> > * Default value
> > You're right, we probably do need to be conservative. Big consumer groups
> > are considered an anti-pattern and my goal was to also hint at this
> through
> > the config's default. Regardless, it is better to not have the potential
> to
> > break applications with an upgrade.
> > Choosing between the default of something big like 5000 or an opt-in
> > option, I think we should go with the *disabled default option*  (-1).
> > The only benefit we would get from a big default of 5000 is default
> > protection against buggy/malicious applications that hit the KAFKA-7610
> > issue.
> > While this KIP was spawned from that issue, I believe its value is
> enabling
> > the possibility of protection and helping move towards a more
> self-service
> > Kafka. I also think that a default value of 5000 might be misleading to
> > users and lead them to think that big consumer groups (> 250) are a good
> > thing.
> >
> > The good news is that KAFKA-7610 should be fully resolved and the
> rebalance
> > protocol should, in general, be more solid after the planned improvements
> > in KIP-345 and KIP-394.
> >
> > * Handling bigger groups during upgrade
> > I now see that we store the state of consumer groups in the log and why a
> > rebalance isn't expected during a rolling upgrade.
> > Since we're going with the default value of the max.size being disabled,
> I
> > believe we can afford to be more strict here.
> > During state reloading of a new Coordinator with a defined max.group.size
> > config, I believe we should *force* rebalances for groups that exceed the
> > configured size. Then, only some consumers will be able to join and the
> max
> > size invariant will be satisfied.
> >
> > I updated the KIP with a migration plan, rejected alternatives and the
> new
> > default value.
> >
> > Thanks,
> > Stanislav
> >
> > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hey Stanislav,
> > >
> > > Clients will then find that coordinator
> > > > and send `joinGroup` on it, effectively rebuilding the group, since
> the
> > > > cache of active consumers is not stored outside the Coordinator's
> > memory.
> > > > (please do say if that is incorrect)
> > >
> > >
> > > Groups do not typically rebalance after a coordinator change. You could
> > > potentially force a rebalance if the group is too big and kick out the
> > > slowest members or something. A more graceful solution is probably to
> > just
> > > accept the current size and prevent it from getting bigger. We could
> log
> > a
> > > warning potentially.
> > >
> > > My thinking is that we should abstract away from conserving resources
> and
> > > > focus on giving control to the broker. The issue that spawned this
> KIP
> > > was
> > > > a memory problem but I feel this change is useful in a more general
> > way.
> > >
> > >
> > > So you probably already know why I'm asking about this. For consumer
> > groups
> > > anyway, resource usage would typically be proportional to the number of
> > > partitions that a group is reading from and not the number of members.
> > For
> > > example, consider the memory use in the offsets cache. The benefit of
> > this
> > > KIP is probably limited to preventing "runaway" consumer groups due to
> > > leaks or some other application bug. That still seems useful though.
> > >
> > > I completely agree with this and I *ask everybody to chime in with
> > opinions
> > > > on a sensible default value*.
> > >
> > >
> > > I think we would have to be very conservative. The group protocol is
> > > generic in some sense, so there may be use cases we don't know of where
> > > larger groups are reasonable. Probably we should make this an opt-in
> > > feature so that we do not risk breaking anyone's application after an
> > > upgrade. Either that, or use a very high default like 5,000.
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > > stanislav@confluent.io>
> > > wrote:
> > >
> > > > Hey Jason and Boyang, those were important comments
> > > >
> > > > > One suggestion I have is that it would be helpful to put your
> > reasoning
> > > > on deciding the current default value. For example, in certain use
> > cases
> > > at
> > > > Pinterest we are very likely to have more consumers than 250 when we
> > > > configure 8 stream instances with 32 threads.
> > > > > For the effectiveness of this KIP, we should encourage people to
> > > discuss
> > > > their opinions on the default setting and ideally reach a consensus.
> > > >
> > > > I completely agree with this and I *ask everybody to chime in with
> > > opinions
> > > > on a sensible default value*.
> > > > My thought process was that in the current model rebalances in large
> > > groups
> > > > are more costly. I imagine most use cases in most Kafka users do not
> > > > require more than 250 consumers.
> > > > Boyang, you say that you are "likely to have... when we..." - do you
> > have
> > > > systems running with so many consumers in a group or are you planning
> > > to? I
> > > > guess what I'm asking is whether this has been tested in production
> > with
> > > > the current rebalance model (ignoring KIP-345)
> > > >
> > > > >  Can you clarify the compatibility impact here? What
> > > > > will happen to groups that are already larger than the max size?
> > > > This is a very important question.
> > > > From my current understanding, when a coordinator broker gets shut
> > > > down during a cluster rolling upgrade, a replica will take leadership
> > of
> > > > the `__offset_commits` partition. Clients will then find that
> > coordinator
> > > > and send `joinGroup` on it, effectively rebuilding the group, since
> the
> > > > cache of active consumers is not stored outside the Coordinator's
> > memory.
> > > > (please do say if that is incorrect)
> > > > Then, I believe that working as if this is a new group is a
> reasonable
> > > > approach. Namely, fail joinGroups when the max.size is exceeded.
> > > > What do you guys think about this? (I'll update the KIP after we
> settle
> > > on
> > > > a solution)
> > > >
> > > > >  Also, just to be clear, the resource we are trying to conserve
> here
> > is
> > > > what? Memory?
> > > > My thinking is that we should abstract away from conserving resources
> > and
> > > > focus on giving control to the broker. The issue that spawned this
> KIP
> > > was
> > > > a memory problem but I feel this change is useful in a more general
> > way.
> > > It
> > > > limits the control clients have on the cluster and helps Kafka
> become a
> > > > more self-serving system. Admin/Ops teams can better control the
> impact
> > > > application developers can have on a Kafka cluster with this change
> > > >
> > > > Best,
> > > > Stanislav
> > > >
> > > >
> > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hi Stanislav,
> > > > >
> > > > > Thanks for the KIP. Can you clarify the compatibility impact here?
> > What
> > > > > will happen to groups that are already larger than the max size?
> > Also,
> > > > just
> > > > > to be clear, the resource we are trying to conserve here is what?
> > > Memory?
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <bc...@outlook.com>
> > > wrote:
> > > > >
> > > > > > Thanks Stanislav for the update! One suggestion I have is that it
> > > would
> > > > > be
> > > > > > helpful to put your
> > > > > >
> > > > > > reasoning on deciding the current default value. For example, in
> > > > certain
> > > > > > use cases at Pinterest we are very likely
> > > > > >
> > > > > > to have more consumers than 250 when we configure 8 stream
> > instances
> > > > with
> > > > > > 32 threads.
> > > > > >
> > > > > >
> > > > > > For the effectiveness of this KIP, we should encourage people to
> > > > discuss
> > > > > > their opinions on the default setting and ideally reach a
> > consensus.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Boyang
> > > > > >
> > > > > > ________________________________
> > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > > > To: dev@kafka.apache.org
> > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > member
> > > > > > metadata growth
> > > > > >
> > > > > > Hey everybody,
> > > > > >
> > > > > > It's been a week since this KIP and not much discussion has been
> > > made.
> > > > > > I assume that this is a straight forward change and I will open a
> > > > voting
> > > > > > thread in the next couple of days if nobody has anything to
> > suggest.
> > > > > >
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > > > > > stanislav@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Greetings everybody,
> > > > > > >
> > > > > > > I have enriched the KIP a bit with a bigger Motivation section
> > and
> > > > also
> > > > > > > renamed it.
> > > > > > > KIP:
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > > > >
> > > > > > > I'm looking forward to discussions around it.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > > > > > stanislav@confluent.io> wrote:
> > > > > > >
> > > > > > >> Hey there everybody,
> > > > > > >>
> > > > > > >> Thanks for the introduction Boyang. I appreciate the effort
> you
> > > are
> > > > > > >> putting into improving consumer behavior in Kafka.
> > > > > > >>
> > > > > > >> @Matt
> > > > > > >> I also believe the default value is high. In my opinion, we
> > should
> > > > aim
> > > > > > to
> > > > > > >> a default cap around 250. This is because in the current model
> > any
> > > > > > consumer
> > > > > > >> rebalance is disrupting to every consumer. The bigger the
> group,
> > > the
> > > > > > longer
> > > > > > >> this period of disruption.
> > > > > > >>
> > > > > > >> If you have such a large consumer group, chances are that your
> > > > > > >> client-side logic could be structured better and that you are
> > not
> > > > > using
> > > > > > the
> > > > > > >> high number of consumers to achieve high throughput.
> > > > > > >> 250 can still be considered of a high upper bound, I believe
> in
> > > > > practice
> > > > > > >> users should aim to not go over 100 consumers per consumer
> > group.
> > > > > > >>
> > > > > > >> In regards to the cap being global/per-broker, I think that we
> > > > should
> > > > > > >> consider whether we want it to be global or *per-topic*. For
> the
> > > > time
> > > > > > >> being, I believe that having it per-topic with a global
> default
> > > > might
> > > > > be
> > > > > > >> the best situation. Having it global only seems a bit
> > restricting
> > > to
> > > > > me
> > > > > > and
> > > > > > >> it never hurts to support more fine-grained configurability
> > (given
> > > > > it's
> > > > > > the
> > > > > > >> same config, not a new one being introduced).
> > > > > > >>
> > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> > bchen11@outlook.com
> > > >
> > > > > > wrote:
> > > > > > >>
> > > > > > >>> Thanks Matt for the suggestion! I'm still open to any
> > suggestion
> > > to
> > > > > > >>> change the default value. Meanwhile I just want to point out
> > that
> > > > > this
> > > > > > >>> value is a just last line of defense, not a real scenario we
> > > would
> > > > > > expect.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> In the meanwhile, I discussed with Stanislav and he would be
> > > > driving
> > > > > > the
> > > > > > >>> 389 effort from now on. Stanislav proposed the idea in the
> > first
> > > > > place
> > > > > > and
> > > > > > >>> had already come up a draft design, while I will keep
> focusing
> > on
> > > > > > KIP-345
> > > > > > >>> effort to ensure solving the edge case described in the JIRA<
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > > > > >.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Thank you Stanislav for making this happen!
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Boyang
> > > > > > >>>
> > > > > > >>> ________________________________
> > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > > > >>> To: dev@kafka.apache.org
> > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > > > member
> > > > > > >>> metadata growth
> > > > > > >>>
> > > > > > >>> Thanks for the KIP.
> > > > > > >>>
> > > > > > >>> Will this cap be a global cap across the entire cluster or
> per
> > > > > broker?
> > > > > > >>>
> > > > > > >>> Either way the default value seems a bit high to me, but that
> > > could
> > > > > > just
> > > > > > >>> be
> > > > > > >>> from my own usage patterns. I'd have probably started with
> 500
> > or
> > > > 1k
> > > > > > but
> > > > > > >>> could be easily convinced that's wrong.
> > > > > > >>>
> > > > > > >>> Thanks,
> > > > > > >>> Matt
> > > > > > >>>
> > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> > bchen11@outlook.com
> > > >
> > > > > > wrote:
> > > > > > >>>
> > > > > > >>> > Hey folks,
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>> > I would like to start a discussion on KIP-389:
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>> > This is a pretty simple change to cap the consumer group
> size
> > > for
> > > > > > >>> broker
> > > > > > >>> > stability. Give me your valuable feedback when you got
> time.
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>> > Thank you!
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Best,
> > > > > > >> Stanislav
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>


--
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Patrick Williams <pa...@storageos.com>.
Please take me off this Discuss list

Best,
 
Patrick Williams
 
Sales Manager, UK & Ireland, Nordics & Israel
StorageOS
+44 (0)7549 676279
patrick.williams@storageos.com
 
20 Midtown
20 Proctor Street
Holborn
London WC1V 6NX
 
Twitter: @patch37
LinkedIn: linkedin.com/in/patrickwilliams4 <http://linkedin.com/in/patrickwilliams4>
 
https://slack.storageos.com/
 
 

On 03/12/2018, 12:35, "Stanislav Kozlovski" <st...@confluent.io> wrote:

    Hi Jason,
    
    > 2. Do you think we should make this a dynamic config?
    I'm not sure. Looking at the config from the perspective of a prescriptive
    config, we may get away with not updating it dynamically.
    But in my opinion, it always makes sense to have a config be dynamically
    configurable. As long as we limit it to being a cluster-wide config, we
    should be fine.
    
    > 1. I think it would be helpful to clarify the details on how the
    coordinator will shrink the group. It will need to choose which members to
    remove. Are we going to give current members an opportunity to commit
    offsets before kicking them from the group?
    
    This turns out to be somewhat tricky. I think that we may not be able to
    guarantee that consumers don't process a message twice.
    My initial approach was to do as much as we could to let consumers commit
    offsets.
    
    I was thinking that we mark a group to be shrunk, we could keep a map of
    consumer_id->boolean indicating whether they have committed offsets. I then
    thought we could delay the rebalance until every consumer commits (or some
    time passes).
    In the meantime, we would block all incoming fetch calls (by either
    returning empty records or a retriable error) and we would continue to
    accept offset commits (even twice for a single consumer)
    
    I see two problems with this approach:
    * We have async offset commits, which implies that we can receive fetch
    requests before the offset commit req has been handled, i.e. consumer sends
    fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in the
    broker. Meaning we could have saved the offsets for B but rebalance before
    the offsetCommit for the offsets processed in C come in.
    * KIP-392 Allow consumers to fetch from closest replica
    <https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica>
    would
    make it significantly harder to block poll() calls on consumers whose
    groups are being shrunk. Even if we implemented a solution, the same race
    condition noted above seems to apply and probably others
    
    
    Given those constraints, I think that we can simply mark the group as
    `PreparingRebalance` with a rebalanceTimeout of the server setting `
    group.max.session.timeout.ms`. That's a bit long by default (5 minutes) but
    I can't seem to come up with a better alternative
    
    I'm interested in hearing your thoughts.
    
    Thanks,
    Stanislav
    
    On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <ja...@confluent.io> wrote:
    
    > Hey Stanislav,
    >
    > What do you think about the use case I mentioned in my previous reply about
    > > a more resilient self-service Kafka? I believe the benefit there is
    > bigger.
    >
    >
    > I see this config as analogous to the open file limit. Probably this limit
    > was intended to be prescriptive at some point about what was deemed a
    > reasonable number of open files for an application. But mostly people treat
    > it as an annoyance which they have to work around. If it happens to be hit,
    > usually you just increase it because it is not tied to an actual resource
    > constraint. However, occasionally hitting the limit does indicate an
    > application bug such as a leak, so I wouldn't say it is useless. Similarly,
    > the issue in KAFKA-7610 was a consumer leak and having this limit would
    > have allowed the problem to be detected before it impacted the cluster. To
    > me, that's the main benefit. It's possible that it could be used
    > prescriptively to prevent poor usage of groups, but like the open file
    > limit, I suspect administrators will just set it large enough that users
    > are unlikely to complain.
    >
    > Anyway, just a couple additional questions:
    >
    > 1. I think it would be helpful to clarify the details on how the
    > coordinator will shrink the group. It will need to choose which members to
    > remove. Are we going to give current members an opportunity to commit
    > offsets before kicking them from the group?
    >
    > 2. Do you think we should make this a dynamic config?
    >
    > Thanks,
    > Jason
    >
    >
    >
    >
    > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
    > stanislav@confluent.io>
    > wrote:
    >
    > > Hi Jason,
    > >
    > > You raise some very valid points.
    > >
    > > > The benefit of this KIP is probably limited to preventing "runaway"
    > > consumer groups due to leaks or some other application bug
    > > What do you think about the use case I mentioned in my previous reply
    > about
    > > a more resilient self-service Kafka? I believe the benefit there is
    > bigger
    > >
    > > * Default value
    > > You're right, we probably do need to be conservative. Big consumer groups
    > > are considered an anti-pattern and my goal was to also hint at this
    > through
    > > the config's default. Regardless, it is better to not have the potential
    > to
    > > break applications with an upgrade.
    > > Choosing between the default of something big like 5000 or an opt-in
    > > option, I think we should go with the *disabled default option*  (-1).
    > > The only benefit we would get from a big default of 5000 is default
    > > protection against buggy/malicious applications that hit the KAFKA-7610
    > > issue.
    > > While this KIP was spawned from that issue, I believe its value is
    > enabling
    > > the possibility of protection and helping move towards a more
    > self-service
    > > Kafka. I also think that a default value of 5000 might be misleading to
    > > users and lead them to think that big consumer groups (> 250) are a good
    > > thing.
    > >
    > > The good news is that KAFKA-7610 should be fully resolved and the
    > rebalance
    > > protocol should, in general, be more solid after the planned improvements
    > > in KIP-345 and KIP-394.
    > >
    > > * Handling bigger groups during upgrade
    > > I now see that we store the state of consumer groups in the log and why a
    > > rebalance isn't expected during a rolling upgrade.
    > > Since we're going with the default value of the max.size being disabled,
    > I
    > > believe we can afford to be more strict here.
    > > During state reloading of a new Coordinator with a defined max.group.size
    > > config, I believe we should *force* rebalances for groups that exceed the
    > > configured size. Then, only some consumers will be able to join and the
    > max
    > > size invariant will be satisfied.
    > >
    > > I updated the KIP with a migration plan, rejected alternatives and the
    > new
    > > default value.
    > >
    > > Thanks,
    > > Stanislav
    > >
    > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io>
    > > wrote:
    > >
    > > > Hey Stanislav,
    > > >
    > > > Clients will then find that coordinator
    > > > > and send `joinGroup` on it, effectively rebuilding the group, since
    > the
    > > > > cache of active consumers is not stored outside the Coordinator's
    > > memory.
    > > > > (please do say if that is incorrect)
    > > >
    > > >
    > > > Groups do not typically rebalance after a coordinator change. You could
    > > > potentially force a rebalance if the group is too big and kick out the
    > > > slowest members or something. A more graceful solution is probably to
    > > just
    > > > accept the current size and prevent it from getting bigger. We could
    > log
    > > a
    > > > warning potentially.
    > > >
    > > > My thinking is that we should abstract away from conserving resources
    > and
    > > > > focus on giving control to the broker. The issue that spawned this
    > KIP
    > > > was
    > > > > a memory problem but I feel this change is useful in a more general
    > > way.
    > > >
    > > >
    > > > So you probably already know why I'm asking about this. For consumer
    > > groups
    > > > anyway, resource usage would typically be proportional to the number of
    > > > partitions that a group is reading from and not the number of members.
    > > For
    > > > example, consider the memory use in the offsets cache. The benefit of
    > > this
    > > > KIP is probably limited to preventing "runaway" consumer groups due to
    > > > leaks or some other application bug. That still seems useful though.
    > > >
    > > > I completely agree with this and I *ask everybody to chime in with
    > > opinions
    > > > > on a sensible default value*.
    > > >
    > > >
    > > > I think we would have to be very conservative. The group protocol is
    > > > generic in some sense, so there may be use cases we don't know of where
    > > > larger groups are reasonable. Probably we should make this an opt-in
    > > > feature so that we do not risk breaking anyone's application after an
    > > > upgrade. Either that, or use a very high default like 5,000.
    > > >
    > > > Thanks,
    > > > Jason
    > > >
    > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
    > > > stanislav@confluent.io>
    > > > wrote:
    > > >
    > > > > Hey Jason and Boyang, those were important comments
    > > > >
    > > > > > One suggestion I have is that it would be helpful to put your
    > > reasoning
    > > > > on deciding the current default value. For example, in certain use
    > > cases
    > > > at
    > > > > Pinterest we are very likely to have more consumers than 250 when we
    > > > > configure 8 stream instances with 32 threads.
    > > > > > For the effectiveness of this KIP, we should encourage people to
    > > > discuss
    > > > > their opinions on the default setting and ideally reach a consensus.
    > > > >
    > > > > I completely agree with this and I *ask everybody to chime in with
    > > > opinions
    > > > > on a sensible default value*.
    > > > > My thought process was that in the current model rebalances in large
    > > > groups
    > > > > are more costly. I imagine most use cases in most Kafka users do not
    > > > > require more than 250 consumers.
    > > > > Boyang, you say that you are "likely to have... when we..." - do you
    > > have
    > > > > systems running with so many consumers in a group or are you planning
    > > > to? I
    > > > > guess what I'm asking is whether this has been tested in production
    > > with
    > > > > the current rebalance model (ignoring KIP-345)
    > > > >
    > > > > >  Can you clarify the compatibility impact here? What
    > > > > > will happen to groups that are already larger than the max size?
    > > > > This is a very important question.
    > > > > From my current understanding, when a coordinator broker gets shut
    > > > > down during a cluster rolling upgrade, a replica will take leadership
    > > of
    > > > > the `__offset_commits` partition. Clients will then find that
    > > coordinator
    > > > > and send `joinGroup` on it, effectively rebuilding the group, since
    > the
    > > > > cache of active consumers is not stored outside the Coordinator's
    > > memory.
    > > > > (please do say if that is incorrect)
    > > > > Then, I believe that working as if this is a new group is a
    > reasonable
    > > > > approach. Namely, fail joinGroups when the max.size is exceeded.
    > > > > What do you guys think about this? (I'll update the KIP after we
    > settle
    > > > on
    > > > > a solution)
    > > > >
    > > > > >  Also, just to be clear, the resource we are trying to conserve
    > here
    > > is
    > > > > what? Memory?
    > > > > My thinking is that we should abstract away from conserving resources
    > > and
    > > > > focus on giving control to the broker. The issue that spawned this
    > KIP
    > > > was
    > > > > a memory problem but I feel this change is useful in a more general
    > > way.
    > > > It
    > > > > limits the control clients have on the cluster and helps Kafka
    > become a
    > > > > more self-serving system. Admin/Ops teams can better control the
    > impact
    > > > > application developers can have on a Kafka cluster with this change
    > > > >
    > > > > Best,
    > > > > Stanislav
    > > > >
    > > > >
    > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <ja...@confluent.io>
    > > > > wrote:
    > > > >
    > > > > > Hi Stanislav,
    > > > > >
    > > > > > Thanks for the KIP. Can you clarify the compatibility impact here?
    > > What
    > > > > > will happen to groups that are already larger than the max size?
    > > Also,
    > > > > just
    > > > > > to be clear, the resource we are trying to conserve here is what?
    > > > Memory?
    > > > > >
    > > > > > -Jason
    > > > > >
    > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <bc...@outlook.com>
    > > > wrote:
    > > > > >
    > > > > > > Thanks Stanislav for the update! One suggestion I have is that it
    > > > would
    > > > > > be
    > > > > > > helpful to put your
    > > > > > >
    > > > > > > reasoning on deciding the current default value. For example, in
    > > > > certain
    > > > > > > use cases at Pinterest we are very likely
    > > > > > >
    > > > > > > to have more consumers than 250 when we configure 8 stream
    > > instances
    > > > > with
    > > > > > > 32 threads.
    > > > > > >
    > > > > > >
    > > > > > > For the effectiveness of this KIP, we should encourage people to
    > > > > discuss
    > > > > > > their opinions on the default setting and ideally reach a
    > > consensus.
    > > > > > >
    > > > > > >
    > > > > > > Best,
    > > > > > >
    > > > > > > Boyang
    > > > > > >
    > > > > > > ________________________________
    > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
    > > > > > > Sent: Monday, November 26, 2018 6:14 PM
    > > > > > > To: dev@kafka.apache.org
    > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
    > > member
    > > > > > > metadata growth
    > > > > > >
    > > > > > > Hey everybody,
    > > > > > >
    > > > > > > It's been a week since this KIP and not much discussion has been
    > > > made.
    > > > > > > I assume that this is a straight forward change and I will open a
    > > > > voting
    > > > > > > thread in the next couple of days if nobody has anything to
    > > suggest.
    > > > > > >
    > > > > > > Best,
    > > > > > > Stanislav
    > > > > > >
    > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
    > > > > > > stanislav@confluent.io>
    > > > > > > wrote:
    > > > > > >
    > > > > > > > Greetings everybody,
    > > > > > > >
    > > > > > > > I have enriched the KIP a bit with a bigger Motivation section
    > > and
    > > > > also
    > > > > > > > renamed it.
    > > > > > > > KIP:
    > > > > > > >
    > > > > > >
    > > > > >
    > > > >
    > > >
    > >
    > https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
    > > > > > > >
    > > > > > > > I'm looking forward to discussions around it.
    > > > > > > >
    > > > > > > > Best,
    > > > > > > > Stanislav
    > > > > > > >
    > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
    > > > > > > > stanislav@confluent.io> wrote:
    > > > > > > >
    > > > > > > >> Hey there everybody,
    > > > > > > >>
    > > > > > > >> Thanks for the introduction Boyang. I appreciate the effort
    > you
    > > > are
    > > > > > > >> putting into improving consumer behavior in Kafka.
    > > > > > > >>
    > > > > > > >> @Matt
    > > > > > > >> I also believe the default value is high. In my opinion, we
    > > should
    > > > > aim
    > > > > > > to
    > > > > > > >> a default cap around 250. This is because in the current model
    > > any
    > > > > > > consumer
    > > > > > > >> rebalance is disrupting to every consumer. The bigger the
    > group,
    > > > the
    > > > > > > longer
    > > > > > > >> this period of disruption.
    > > > > > > >>
    > > > > > > >> If you have such a large consumer group, chances are that your
    > > > > > > >> client-side logic could be structured better and that you are
    > > not
    > > > > > using
    > > > > > > the
    > > > > > > >> high number of consumers to achieve high throughput.
    > > > > > > >> 250 can still be considered of a high upper bound, I believe
    > in
    > > > > > practice
    > > > > > > >> users should aim to not go over 100 consumers per consumer
    > > group.
    > > > > > > >>
    > > > > > > >> In regards to the cap being global/per-broker, I think that we
    > > > > should
    > > > > > > >> consider whether we want it to be global or *per-topic*. For
    > the
    > > > > time
    > > > > > > >> being, I believe that having it per-topic with a global
    > default
    > > > > might
    > > > > > be
    > > > > > > >> the best situation. Having it global only seems a bit
    > > restricting
    > > > to
    > > > > > me
    > > > > > > and
    > > > > > > >> it never hurts to support more fine-grained configurability
    > > (given
    > > > > > it's
    > > > > > > the
    > > > > > > >> same config, not a new one being introduced).
    > > > > > > >>
    > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
    > > bchen11@outlook.com
    > > > >
    > > > > > > wrote:
    > > > > > > >>
    > > > > > > >>> Thanks Matt for the suggestion! I'm still open to any
    > > suggestion
    > > > to
    > > > > > > >>> change the default value. Meanwhile I just want to point out
    > > that
    > > > > > this
    > > > > > > >>> value is a just last line of defense, not a real scenario we
    > > > would
    > > > > > > expect.
    > > > > > > >>>
    > > > > > > >>>
    > > > > > > >>> In the meanwhile, I discussed with Stanislav and he would be
    > > > > driving
    > > > > > > the
    > > > > > > >>> 389 effort from now on. Stanislav proposed the idea in the
    > > first
    > > > > > place
    > > > > > > and
    > > > > > > >>> had already come up a draft design, while I will keep
    > focusing
    > > on
    > > > > > > KIP-345
    > > > > > > >>> effort to ensure solving the edge case described in the JIRA<
    > > > > > > >>>
    > > > > > >
    > > > > >
    > > > >
    > > >
    > >
    > https://issues.apache.org/jira/browse/KAFKA-7610
    > > > > > > >.
    > > > > > > >>>
    > > > > > > >>>
    > > > > > > >>> Thank you Stanislav for making this happen!
    > > > > > > >>>
    > > > > > > >>>
    > > > > > > >>> Boyang
    > > > > > > >>>
    > > > > > > >>> ________________________________
    > > > > > > >>> From: Matt Farmer <ma...@frmr.me>
    > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
    > > > > > > >>> To: dev@kafka.apache.org
    > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
    > > > > member
    > > > > > > >>> metadata growth
    > > > > > > >>>
    > > > > > > >>> Thanks for the KIP.
    > > > > > > >>>
    > > > > > > >>> Will this cap be a global cap across the entire cluster or
    > per
    > > > > > broker?
    > > > > > > >>>
    > > > > > > >>> Either way the default value seems a bit high to me, but that
    > > > could
    > > > > > > just
    > > > > > > >>> be
    > > > > > > >>> from my own usage patterns. I’d have probably started with
    > 500
    > > or
    > > > > 1k
    > > > > > > but
    > > > > > > >>> could be easily convinced that’s wrong.
    > > > > > > >>>
    > > > > > > >>> Thanks,
    > > > > > > >>> Matt
    > > > > > > >>>
    > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
    > > bchen11@outlook.com
    > > > >
    > > > > > > wrote:
    > > > > > > >>>
    > > > > > > >>> > Hey folks,
    > > > > > > >>> >
    > > > > > > >>> >
    > > > > > > >>> > I would like to start a discussion on KIP-389:
    > > > > > > >>> >
    > > > > > > >>> >
    > > > > > > >>> >
    > > > > > > >>>
    > > > > > >
    > > > > >
    > > > >
    > > >
    > >
    > https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
    > > > > > > >>> >
    > > > > > > >>> >
    > > > > > > >>> > This is a pretty simple change to cap the consumer group
    > size
    > > > for
    > > > > > > >>> broker
    > > > > > > >>> > stability. Give me your valuable feedback when you got
    > time.
    > > > > > > >>> >
    > > > > > > >>> >
    > > > > > > >>> > Thank you!
    > > > > > > >>> >
    > > > > > > >>>
    > > > > > > >>
    > > > > > > >>
    > > > > > > >> --
    > > > > > > >> Best,
    > > > > > > >> Stanislav
    > > > > > > >>
    > > > > > > >
    > > > > > > >
    > > > > > > > --
    > > > > > > > Best,
    > > > > > > > Stanislav
    > > > > > > >
    > > > > > >
    > > > > > >
    > > > > > > --
    > > > > > > Best,
    > > > > > > Stanislav
    > > > > > >
    > > > > >
    > > > >
    > > > >
    > > > > --
    > > > > Best,
    > > > > Stanislav
    > > > >
    > > >
    > >
    > >
    > > --
    > > Best,
    > > Stanislav
    > >
    >
    
    
    -- 
    Best,
    Stanislav
    


Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hi Jason,

> 2. Do you think we should make this a dynamic config?
I'm not sure. If we treat it as a prescriptive config, we may get away with
not updating it dynamically.
But in my opinion, it always makes sense to have a config be dynamically
configurable. As long as we limit it to being a cluster-wide config, we
should be fine.
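
Just to illustrate what that could look like operationally - assuming we did
end up with a dynamic, cluster-wide broker config named `group.max.size`
(both of which are still open questions, not something this KIP commits to) -
an operator could bump the limit without a rolling restart roughly along the
lines of this hypothetical sketch:

import java.util.{Arrays, HashMap, Properties}
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, AlterConfigOp, ConfigEntry}
import org.apache.kafka.common.config.ConfigResource

object BumpGroupMaxSize extends App {
  val props = new Properties()
  props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  val admin = AdminClient.create(props)

  // An empty broker name addresses the cluster-wide default, so the new value
  // would apply to every broker rather than a single one.
  val clusterDefault = new ConfigResource(ConfigResource.Type.BROKER, "")
  val setMaxSize =
    new AlterConfigOp(new ConfigEntry("group.max.size", "100"), AlterConfigOp.OpType.SET)

  val configs = new HashMap[ConfigResource, java.util.Collection[AlterConfigOp]]()
  configs.put(clusterDefault, Arrays.asList(setMaxSize))

  admin.incrementalAlterConfigs(configs).all().get()
  admin.close()
}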

> 1. I think it would be helpful to clarify the details on how the
coordinator will shrink the group. It will need to choose which members to
remove. Are we going to give current members an opportunity to commit
offsets before kicking them from the group?

This turns out to be somewhat tricky. I think that we may not be able to
guarantee that consumers don't process a message twice.
My initial approach was to do as much as we could to let consumers commit
offsets.

I was thinking that once we mark a group to be shrunk, we could keep a map of
consumer_id->boolean indicating whether each member has committed offsets. We
could then delay the rebalance until every consumer commits (or until some
amount of time passes).
In the meantime, we would block all incoming fetch calls (by either returning
empty records or a retriable error) and we would continue to accept offset
commits (even twice for a single consumer).

I see two problems with this approach:
* We have async offset commits, which means we can receive fetch requests
before an offset commit request has been handled. E.g. a consumer sends
fetchReq A, offsetCommit B, fetchReq C - the broker may receive them as A, C,
B. We could then have saved the offsets from B but rebalance before the
offsetCommit covering the offsets processed in C comes in.
* KIP-392 Allow consumers to fetch from closest replica
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica>
would make it significantly harder to block poll() calls on consumers whose
groups are being shrunk. Even if we implemented a solution, the same race
condition noted above seems to apply, and probably others as well.


Given those constraints, I think we can simply mark the group as
`PreparingRebalance` with a rebalanceTimeout equal to the server setting
`group.max.session.timeout.ms`. That's a bit long by default (5 minutes), but I
can't seem to come up with a better alternative.
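
To make that concrete, here is a minimal, self-contained sketch of the
behaviour I have in mind. None of these types are the real coordinator
classes, and the eviction policy (keep the longest-lived members) is only a
placeholder - the point is the shrink-then-`PreparingRebalance` flow with the
maximum session timeout:

// Toy model of the proposed shrink path; not the broker's actual coordinator code.
final case class Member(memberId: String, joinedAtMs: Long)
final case class Group(groupId: String, members: Vector[Member], state: String,
                       rebalanceTimeoutMs: Int)

object GroupShrinkSketch {
  val GroupMaxSize = 100            // proposed group.max.size; -1 would disable the check
  val MaxSessionTimeoutMs = 300000  // group.max.session.timeout.ms default (5 minutes)

  def maybeShrink(group: Group): Group =
    if (GroupMaxSize > 0 && group.members.size > GroupMaxSize) {
      // Keep the longest-lived members; which members to evict is still an open question.
      val survivors = group.members.sortBy(_.joinedAtMs).take(GroupMaxSize)
      // Move the group to PreparingRebalance with the largest allowed timeout so evicted
      // consumers get the widest window to commit offsets. As noted above, this still
      // cannot rule out duplicate processing.
      group.copy(members = survivors,
                 state = "PreparingRebalance",
                 rebalanceTimeoutMs = MaxSessionTimeoutMs)
    } else group

  def main(args: Array[String]): Unit = {
    val oversized = Group("my-group",
      Vector.tabulate(150)(i => Member(s"consumer-$i", i.toLong)), "Stable", 30000)
    val shrunk = maybeShrink(oversized)
    println(s"${oversized.members.size} -> ${shrunk.members.size} members, state=${shrunk.state}")
  }
}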

I'm interested in hearing your thoughts.

Thanks,
Stanislav

On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <ja...@confluent.io> wrote:

> Hey Stanislav,
>
> What do you think about the use case I mentioned in my previous reply about
> > a more resilient self-service Kafka? I believe the benefit there is
> bigger.
>
>
> I see this config as analogous to the open file limit. Probably this limit
> was intended to be prescriptive at some point about what was deemed a
> reasonable number of open files for an application. But mostly people treat
> it as an annoyance which they have to work around. If it happens to be hit,
> usually you just increase it because it is not tied to an actual resource
> constraint. However, occasionally hitting the limit does indicate an
> application bug such as a leak, so I wouldn't say it is useless. Similarly,
> the issue in KAFKA-7610 was a consumer leak and having this limit would
> have allowed the problem to be detected before it impacted the cluster. To
> me, that's the main benefit. It's possible that it could be used
> prescriptively to prevent poor usage of groups, but like the open file
> limit, I suspect administrators will just set it large enough that users
> are unlikely to complain.
>
> Anyway, just a couple additional questions:
>
> 1. I think it would be helpful to clarify the details on how the
> coordinator will shrink the group. It will need to choose which members to
> remove. Are we going to give current members an opportunity to commit
> offsets before kicking them from the group?
>
> 2. Do you think we should make this a dynamic config?
>
> Thanks,
> Jason
>
>
>
>
> On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <
> stanislav@confluent.io>
> wrote:
>
> > Hi Jason,
> >
> > You raise some very valid points.
> >
> > > The benefit of this KIP is probably limited to preventing "runaway"
> > consumer groups due to leaks or some other application bug
> > What do you think about the use case I mentioned in my previous reply
> about
> > a more resilient self-service Kafka? I believe the benefit there is
> bigger
> >
> > * Default value
> > You're right, we probably do need to be conservative. Big consumer groups
> > are considered an anti-pattern and my goal was to also hint at this
> through
> > the config's default. Regardless, it is better to not have the potential
> to
> > break applications with an upgrade.
> > Choosing between the default of something big like 5000 or an opt-in
> > option, I think we should go with the *disabled default option*  (-1).
> > The only benefit we would get from a big default of 5000 is default
> > protection against buggy/malicious applications that hit the KAFKA-7610
> > issue.
> > While this KIP was spawned from that issue, I believe its value is
> enabling
> > the possibility of protection and helping move towards a more
> self-service
> > Kafka. I also think that a default value of 5000 might be misleading to
> > users and lead them to think that big consumer groups (> 250) are a good
> > thing.
> >
> > The good news is that KAFKA-7610 should be fully resolved and the
> rebalance
> > protocol should, in general, be more solid after the planned improvements
> > in KIP-345 and KIP-394.
> >
> > * Handling bigger groups during upgrade
> > I now see that we store the state of consumer groups in the log and why a
> > rebalance isn't expected during a rolling upgrade.
> > Since we're going with the default value of the max.size being disabled,
> I
> > believe we can afford to be more strict here.
> > During state reloading of a new Coordinator with a defined max.group.size
> > config, I believe we should *force* rebalances for groups that exceed the
> > configured size. Then, only some consumers will be able to join and the
> max
> > size invariant will be satisfied.
> >
> > I updated the KIP with a migration plan, rejected alternatives and the
> new
> > default value.
> >
> > Thanks,
> > Stanislav
> >
> > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hey Stanislav,
> > >
> > > Clients will then find that coordinator
> > > > and send `joinGroup` on it, effectively rebuilding the group, since
> the
> > > > cache of active consumers is not stored outside the Coordinator's
> > memory.
> > > > (please do say if that is incorrect)
> > >
> > >
> > > Groups do not typically rebalance after a coordinator change. You could
> > > potentially force a rebalance if the group is too big and kick out the
> > > slowest members or something. A more graceful solution is probably to
> > just
> > > accept the current size and prevent it from getting bigger. We could
> log
> > a
> > > warning potentially.
> > >
> > > My thinking is that we should abstract away from conserving resources
> and
> > > > focus on giving control to the broker. The issue that spawned this
> KIP
> > > was
> > > > a memory problem but I feel this change is useful in a more general
> > way.
> > >
> > >
> > > So you probably already know why I'm asking about this. For consumer
> > groups
> > > anyway, resource usage would typically be proportional to the number of
> > > partitions that a group is reading from and not the number of members.
> > For
> > > example, consider the memory use in the offsets cache. The benefit of
> > this
> > > KIP is probably limited to preventing "runaway" consumer groups due to
> > > leaks or some other application bug. That still seems useful though.
> > >
> > > I completely agree with this and I *ask everybody to chime in with
> > opinions
> > > > on a sensible default value*.
> > >
> > >
> > > I think we would have to be very conservative. The group protocol is
> > > generic in some sense, so there may be use cases we don't know of where
> > > larger groups are reasonable. Probably we should make this an opt-in
> > > feature so that we do not risk breaking anyone's application after an
> > > upgrade. Either that, or use a very high default like 5,000.
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > > stanislav@confluent.io>
> > > wrote:
> > >
> > > > Hey Jason and Boyang, those were important comments
> > > >
> > > > > One suggestion I have is that it would be helpful to put your
> > reasoning
> > > > on deciding the current default value. For example, in certain use
> > cases
> > > at
> > > > Pinterest we are very likely to have more consumers than 250 when we
> > > > configure 8 stream instances with 32 threads.
> > > > > For the effectiveness of this KIP, we should encourage people to
> > > discuss
> > > > their opinions on the default setting and ideally reach a consensus.
> > > >
> > > > I completely agree with this and I *ask everybody to chime in with
> > > opinions
> > > > on a sensible default value*.
> > > > My thought process was that in the current model rebalances in large
> > > groups
> > > > are more costly. I imagine most use cases in most Kafka users do not
> > > > require more than 250 consumers.
> > > > Boyang, you say that you are "likely to have... when we..." - do you
> > have
> > > > systems running with so many consumers in a group or are you planning
> > > to? I
> > > > guess what I'm asking is whether this has been tested in production
> > with
> > > > the current rebalance model (ignoring KIP-345)
> > > >
> > > > >  Can you clarify the compatibility impact here? What
> > > > > will happen to groups that are already larger than the max size?
> > > > This is a very important question.
> > > > From my current understanding, when a coordinator broker gets shut
> > > > down during a cluster rolling upgrade, a replica will take leadership
> > of
> > > > the `__offset_commits` partition. Clients will then find that
> > coordinator
> > > > and send `joinGroup` on it, effectively rebuilding the group, since
> the
> > > > cache of active consumers is not stored outside the Coordinator's
> > memory.
> > > > (please do say if that is incorrect)
> > > > Then, I believe that working as if this is a new group is a
> reasonable
> > > > approach. Namely, fail joinGroups when the max.size is exceeded.
> > > > What do you guys think about this? (I'll update the KIP after we
> settle
> > > on
> > > > a solution)
> > > >
> > > > >  Also, just to be clear, the resource we are trying to conserve
> here
> > is
> > > > what? Memory?
> > > > My thinking is that we should abstract away from conserving resources
> > and
> > > > focus on giving control to the broker. The issue that spawned this
> KIP
> > > was
> > > > a memory problem but I feel this change is useful in a more general
> > way.
> > > It
> > > > limits the control clients have on the cluster and helps Kafka
> become a
> > > > more self-serving system. Admin/Ops teams can better control the
> impact
> > > > application developers can have on a Kafka cluster with this change
> > > >
> > > > Best,
> > > > Stanislav
> > > >
> > > >
> > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hi Stanislav,
> > > > >
> > > > > Thanks for the KIP. Can you clarify the compatibility impact here?
> > What
> > > > > will happen to groups that are already larger than the max size?
> > Also,
> > > > just
> > > > > to be clear, the resource we are trying to conserve here is what?
> > > Memory?
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <bc...@outlook.com>
> > > wrote:
> > > > >
> > > > > > Thanks Stanislav for the update! One suggestion I have is that it
> > > would
> > > > > be
> > > > > > helpful to put your
> > > > > >
> > > > > > reasoning on deciding the current default value. For example, in
> > > > certain
> > > > > > use cases at Pinterest we are very likely
> > > > > >
> > > > > > to have more consumers than 250 when we configure 8 stream
> > instances
> > > > with
> > > > > > 32 threads.
> > > > > >
> > > > > >
> > > > > > For the effectiveness of this KIP, we should encourage people to
> > > > discuss
> > > > > > their opinions on the default setting and ideally reach a
> > consensus.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Boyang
> > > > > >
> > > > > > ________________________________
> > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > > > To: dev@kafka.apache.org
> > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > member
> > > > > > metadata growth
> > > > > >
> > > > > > Hey everybody,
> > > > > >
> > > > > > It's been a week since this KIP and not much discussion has been
> > > made.
> > > > > > I assume that this is a straight forward change and I will open a
> > > > voting
> > > > > > thread in the next couple of days if nobody has anything to
> > suggest.
> > > > > >
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > > > > > stanislav@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Greetings everybody,
> > > > > > >
> > > > > > > I have enriched the KIP a bit with a bigger Motivation section
> > and
> > > > also
> > > > > > > renamed it.
> > > > > > > KIP:
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > > > >
> > > > > > > I'm looking forward to discussions around it.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > > > > > stanislav@confluent.io> wrote:
> > > > > > >
> > > > > > >> Hey there everybody,
> > > > > > >>
> > > > > > >> Thanks for the introduction Boyang. I appreciate the effort
> you
> > > are
> > > > > > >> putting into improving consumer behavior in Kafka.
> > > > > > >>
> > > > > > >> @Matt
> > > > > > >> I also believe the default value is high. In my opinion, we
> > should
> > > > aim
> > > > > > to
> > > > > > >> a default cap around 250. This is because in the current model
> > any
> > > > > > consumer
> > > > > > >> rebalance is disrupting to every consumer. The bigger the
> group,
> > > the
> > > > > > longer
> > > > > > >> this period of disruption.
> > > > > > >>
> > > > > > >> If you have such a large consumer group, chances are that your
> > > > > > >> client-side logic could be structured better and that you are
> > not
> > > > > using
> > > > > > the
> > > > > > >> high number of consumers to achieve high throughput.
> > > > > > >> 250 can still be considered of a high upper bound, I believe
> in
> > > > > practice
> > > > > > >> users should aim to not go over 100 consumers per consumer
> > group.
> > > > > > >>
> > > > > > >> In regards to the cap being global/per-broker, I think that we
> > > > should
> > > > > > >> consider whether we want it to be global or *per-topic*. For
> the
> > > > time
> > > > > > >> being, I believe that having it per-topic with a global
> default
> > > > might
> > > > > be
> > > > > > >> the best situation. Having it global only seems a bit
> > restricting
> > > to
> > > > > me
> > > > > > and
> > > > > > >> it never hurts to support more fine-grained configurability
> > (given
> > > > > it's
> > > > > > the
> > > > > > >> same config, not a new one being introduced).
> > > > > > >>
> > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> > bchen11@outlook.com
> > > >
> > > > > > wrote:
> > > > > > >>
> > > > > > >>> Thanks Matt for the suggestion! I'm still open to any
> > suggestion
> > > to
> > > > > > >>> change the default value. Meanwhile I just want to point out
> > that
> > > > > this
> > > > > > >>> value is a just last line of defense, not a real scenario we
> > > would
> > > > > > expect.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> In the meanwhile, I discussed with Stanislav and he would be
> > > > driving
> > > > > > the
> > > > > > >>> 389 effort from now on. Stanislav proposed the idea in the
> > first
> > > > > place
> > > > > > and
> > > > > > >>> had already come up a draft design, while I will keep
> focusing
> > on
> > > > > > KIP-345
> > > > > > >>> effort to ensure solving the edge case described in the JIRA<
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > > > > >.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Thank you Stanislav for making this happen!
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Boyang
> > > > > > >>>
> > > > > > >>> ________________________________
> > > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > > > >>> To: dev@kafka.apache.org
> > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > > > member
> > > > > > >>> metadata growth
> > > > > > >>>
> > > > > > >>> Thanks for the KIP.
> > > > > > >>>
> > > > > > >>> Will this cap be a global cap across the entire cluster or
> per
> > > > > broker?
> > > > > > >>>
> > > > > > >>> Either way the default value seems a bit high to me, but that
> > > could
> > > > > > just
> > > > > > >>> be
> > > > > > >>> from my own usage patterns. I’d have probably started with
> 500
> > or
> > > > 1k
> > > > > > but
> > > > > > >>> could be easily convinced that’s wrong.
> > > > > > >>>
> > > > > > >>> Thanks,
> > > > > > >>> Matt
> > > > > > >>>
> > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> > bchen11@outlook.com
> > > >
> > > > > > wrote:
> > > > > > >>>
> > > > > > >>> > Hey folks,
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>> > I would like to start a discussion on KIP-389:
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>> > This is a pretty simple change to cap the consumer group
> size
> > > for
> > > > > > >>> broker
> > > > > > >>> > stability. Give me your valuable feedback when you got
> time.
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>> > Thank you!
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Best,
> > > > > > >> Stanislav
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>


-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Jason Gustafson <ja...@confluent.io>.
Hey Stanislav,

What do you think about the use case I mentioned in my previous reply about
> a more resilient self-service Kafka? I believe the benefit there is bigger.


I see this config as analogous to the open file limit. Probably this limit
was intended to be prescriptive at some point about what was deemed a
reasonable number of open files for an application. But mostly people treat
it as an annoyance which they have to work around. If it happens to be hit,
usually you just increase it because it is not tied to an actual resource
constraint. However, occasionally hitting the limit does indicate an
application bug such as a leak, so I wouldn't say it is useless. Similarly,
the issue in KAFKA-7610 was a consumer leak and having this limit would
have allowed the problem to be detected before it impacted the cluster. To
me, that's the main benefit. It's possible that it could be used
prescriptively to prevent poor usage of groups, but like the open file
limit, I suspect administrators will just set it large enough that users
are unlikely to complain.

Anyway, just a couple additional questions:

1. I think it would be helpful to clarify the details on how the
coordinator will shrink the group. It will need to choose which members to
remove. Are we going to give current members an opportunity to commit
offsets before kicking them from the group?

2. Do you think we should make this a dynamic config?

Thanks,
Jason




On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <st...@confluent.io>
wrote:

> Hi Jason,
>
> You raise some very valid points.
>
> > The benefit of this KIP is probably limited to preventing "runaway"
> consumer groups due to leaks or some other application bug
> What do you think about the use case I mentioned in my previous reply about
> a more resilient self-service Kafka? I believe the benefit there is bigger
>
> * Default value
> You're right, we probably do need to be conservative. Big consumer groups
> are considered an anti-pattern and my goal was to also hint at this through
> the config's default. Regardless, it is better to not have the potential to
> break applications with an upgrade.
> Choosing between the default of something big like 5000 or an opt-in
> option, I think we should go with the *disabled default option*  (-1).
> The only benefit we would get from a big default of 5000 is default
> protection against buggy/malicious applications that hit the KAFKA-7610
> issue.
> While this KIP was spawned from that issue, I believe its value is enabling
> the possibility of protection and helping move towards a more self-service
> Kafka. I also think that a default value of 5000 might be misleading to
> users and lead them to think that big consumer groups (> 250) are a good
> thing.
>
> The good news is that KAFKA-7610 should be fully resolved and the rebalance
> protocol should, in general, be more solid after the planned improvements
> in KIP-345 and KIP-394.
>
> * Handling bigger groups during upgrade
> I now see that we store the state of consumer groups in the log and why a
> rebalance isn't expected during a rolling upgrade.
> Since we're going with the default value of the max.size being disabled, I
> believe we can afford to be more strict here.
> During state reloading of a new Coordinator with a defined max.group.size
> config, I believe we should *force* rebalances for groups that exceed the
> configured size. Then, only some consumers will be able to join and the max
> size invariant will be satisfied.
>
> I updated the KIP with a migration plan, rejected alternatives and the new
> default value.
>
> Thanks,
> Stanislav
>
> On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hey Stanislav,
> >
> > Clients will then find that coordinator
> > > and send `joinGroup` on it, effectively rebuilding the group, since the
> > > cache of active consumers is not stored outside the Coordinator's
> memory.
> > > (please do say if that is incorrect)
> >
> >
> > Groups do not typically rebalance after a coordinator change. You could
> > potentially force a rebalance if the group is too big and kick out the
> > slowest members or something. A more graceful solution is probably to
> just
> > accept the current size and prevent it from getting bigger. We could log
> a
> > warning potentially.
> >
> > My thinking is that we should abstract away from conserving resources and
> > > focus on giving control to the broker. The issue that spawned this KIP
> > was
> > > a memory problem but I feel this change is useful in a more general
> way.
> >
> >
> > So you probably already know why I'm asking about this. For consumer
> groups
> > anyway, resource usage would typically be proportional to the number of
> > partitions that a group is reading from and not the number of members.
> For
> > example, consider the memory use in the offsets cache. The benefit of
> this
> > KIP is probably limited to preventing "runaway" consumer groups due to
> > leaks or some other application bug. That still seems useful though.
> >
> > I completely agree with this and I *ask everybody to chime in with
> opinions
> > > on a sensible default value*.
> >
> >
> > I think we would have to be very conservative. The group protocol is
> > generic in some sense, so there may be use cases we don't know of where
> > larger groups are reasonable. Probably we should make this an opt-in
> > feature so that we do not risk breaking anyone's application after an
> > upgrade. Either that, or use a very high default like 5,000.
> >
> > Thanks,
> > Jason
> >
> > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> > stanislav@confluent.io>
> > wrote:
> >
> > > Hey Jason and Boyang, those were important comments
> > >
> > > > One suggestion I have is that it would be helpful to put your
> reasoning
> > > on deciding the current default value. For example, in certain use
> cases
> > at
> > > Pinterest we are very likely to have more consumers than 250 when we
> > > configure 8 stream instances with 32 threads.
> > > > For the effectiveness of this KIP, we should encourage people to
> > discuss
> > > their opinions on the default setting and ideally reach a consensus.
> > >
> > > I completely agree with this and I *ask everybody to chime in with
> > opinions
> > > on a sensible default value*.
> > > My thought process was that in the current model rebalances in large
> > groups
> > > are more costly. I imagine most use cases in most Kafka users do not
> > > require more than 250 consumers.
> > > Boyang, you say that you are "likely to have... when we..." - do you
> have
> > > systems running with so many consumers in a group or are you planning
> > to? I
> > > guess what I'm asking is whether this has been tested in production
> with
> > > the current rebalance model (ignoring KIP-345)
> > >
> > > >  Can you clarify the compatibility impact here? What
> > > > will happen to groups that are already larger than the max size?
> > > This is a very important question.
> > > From my current understanding, when a coordinator broker gets shut
> > > down during a cluster rolling upgrade, a replica will take leadership
> of
> > > the `__offset_commits` partition. Clients will then find that
> coordinator
> > > and send `joinGroup` on it, effectively rebuilding the group, since the
> > > cache of active consumers is not stored outside the Coordinator's
> memory.
> > > (please do say if that is incorrect)
> > > Then, I believe that working as if this is a new group is a reasonable
> > > approach. Namely, fail joinGroups when the max.size is exceeded.
> > > What do you guys think about this? (I'll update the KIP after we settle
> > on
> > > a solution)
> > >
> > > >  Also, just to be clear, the resource we are trying to conserve here
> is
> > > what? Memory?
> > > My thinking is that we should abstract away from conserving resources
> and
> > > focus on giving control to the broker. The issue that spawned this KIP
> > was
> > > a memory problem but I feel this change is useful in a more general
> way.
> > It
> > > limits the control clients have on the cluster and helps Kafka become a
> > > more self-serving system. Admin/Ops teams can better control the impact
> > > application developers can have on a Kafka cluster with this change
> > >
> > > Best,
> > > Stanislav
> > >
> > >
> > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hi Stanislav,
> > > >
> > > > Thanks for the KIP. Can you clarify the compatibility impact here?
> What
> > > > will happen to groups that are already larger than the max size?
> Also,
> > > just
> > > > to be clear, the resource we are trying to conserve here is what?
> > Memory?
> > > >
> > > > -Jason
> > > >
> > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <bc...@outlook.com>
> > wrote:
> > > >
> > > > > Thanks Stanislav for the update! One suggestion I have is that it
> > would
> > > > be
> > > > > helpful to put your
> > > > >
> > > > > reasoning on deciding the current default value. For example, in
> > > certain
> > > > > use cases at Pinterest we are very likely
> > > > >
> > > > > to have more consumers than 250 when we configure 8 stream
> instances
> > > with
> > > > > 32 threads.
> > > > >
> > > > >
> > > > > For the effectiveness of this KIP, we should encourage people to
> > > discuss
> > > > > their opinions on the default setting and ideally reach a
> consensus.
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Boyang
> > > > >
> > > > > ________________________________
> > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > > To: dev@kafka.apache.org
> > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> member
> > > > > metadata growth
> > > > >
> > > > > Hey everybody,
> > > > >
> > > > > It's been a week since this KIP and not much discussion has been
> > made.
> > > > > I assume that this is a straight forward change and I will open a
> > > voting
> > > > > thread in the next couple of days if nobody has anything to
> suggest.
> > > > >
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > > > > stanislav@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Greetings everybody,
> > > > > >
> > > > > > I have enriched the KIP a bit with a bigger Motivation section
> and
> > > also
> > > > > > renamed it.
> > > > > > KIP:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > > >
> > > > > > I'm looking forward to discussions around it.
> > > > > >
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > > > > stanislav@confluent.io> wrote:
> > > > > >
> > > > > >> Hey there everybody,
> > > > > >>
> > > > > >> Thanks for the introduction Boyang. I appreciate the effort you
> > are
> > > > > >> putting into improving consumer behavior in Kafka.
> > > > > >>
> > > > > >> @Matt
> > > > > >> I also believe the default value is high. In my opinion, we
> should
> > > aim
> > > > > to
> > > > > >> a default cap around 250. This is because in the current model
> any
> > > > > consumer
> > > > > >> rebalance is disrupting to every consumer. The bigger the group,
> > the
> > > > > longer
> > > > > >> this period of disruption.
> > > > > >>
> > > > > >> If you have such a large consumer group, chances are that your
> > > > > >> client-side logic could be structured better and that you are
> not
> > > > using
> > > > > the
> > > > > >> high number of consumers to achieve high throughput.
> > > > > >> 250 can still be considered of a high upper bound, I believe in
> > > > practice
> > > > > >> users should aim to not go over 100 consumers per consumer
> group.
> > > > > >>
> > > > > >> In regards to the cap being global/per-broker, I think that we
> > > should
> > > > > >> consider whether we want it to be global or *per-topic*. For the
> > > time
> > > > > >> being, I believe that having it per-topic with a global default
> > > might
> > > > be
> > > > > >> the best situation. Having it global only seems a bit
> restricting
> > to
> > > > me
> > > > > and
> > > > > >> it never hurts to support more fine-grained configurability
> (given
> > > > it's
> > > > > the
> > > > > >> same config, not a new one being introduced).
> > > > > >>
> > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <
> bchen11@outlook.com
> > >
> > > > > wrote:
> > > > > >>
> > > > > >>> Thanks Matt for the suggestion! I'm still open to any
> suggestion
> > to
> > > > > >>> change the default value. Meanwhile I just want to point out
> that
> > > > this
> > > > > >>> value is a just last line of defense, not a real scenario we
> > would
> > > > > expect.
> > > > > >>>
> > > > > >>>
> > > > > >>> In the meanwhile, I discussed with Stanislav and he would be
> > > driving
> > > > > the
> > > > > >>> 389 effort from now on. Stanislav proposed the idea in the
> first
> > > > place
> > > > > and
> > > > > >>> had already come up a draft design, while I will keep focusing
> on
> > > > > KIP-345
> > > > > >>> effort to ensure solving the edge case described in the JIRA<
> > > > > >>>
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > > > >.
> > > > > >>>
> > > > > >>>
> > > > > >>> Thank you Stanislav for making this happen!
> > > > > >>>
> > > > > >>>
> > > > > >>> Boyang
> > > > > >>>
> > > > > >>> ________________________________
> > > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > > >>> To: dev@kafka.apache.org
> > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > > member
> > > > > >>> metadata growth
> > > > > >>>
> > > > > >>> Thanks for the KIP.
> > > > > >>>
> > > > > >>> Will this cap be a global cap across the entire cluster or per
> > > > broker?
> > > > > >>>
> > > > > >>> Either way the default value seems a bit high to me, but that
> > could
> > > > > just
> > > > > >>> be
> > > > > >>> from my own usage patterns. I’d have probably started with 500
> or
> > > 1k
> > > > > but
> > > > > >>> could be easily convinced that’s wrong.
> > > > > >>>
> > > > > >>> Thanks,
> > > > > >>> Matt
> > > > > >>>
> > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <
> bchen11@outlook.com
> > >
> > > > > wrote:
> > > > > >>>
> > > > > >>> > Hey folks,
> > > > > >>> >
> > > > > >>> >
> > > > > >>> > I would like to start a discussion on KIP-389:
> > > > > >>> >
> > > > > >>> >
> > > > > >>> >
> > > > > >>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > > >>> >
> > > > > >>> >
> > > > > >>> > This is a pretty simple change to cap the consumer group size
> > for
> > > > > >>> broker
> > > > > >>> > stability. Give me your valuable feedback when you got time.
> > > > > >>> >
> > > > > >>> >
> > > > > >>> > Thank you!
> > > > > >>> >
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> Best,
> > > > > >> Stanislav
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
>
>
> --
> Best,
> Stanislav
>

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hi Jason,

You raise some very valid points.

> The benefit of this KIP is probably limited to preventing "runaway"
consumer groups due to leaks or some other application bug
What do you think about the use case I mentioned in my previous reply about
a more resilient self-service Kafka? I believe the benefit there is bigger

* Default value
You're right, we probably do need to be conservative. Big consumer groups
are considered an anti-pattern and my goal was to also hint at this through
the config's default. Regardless, it is better to not have the potential to
break applications with an upgrade.
Choosing between the default of something big like 5000 or an opt-in
option, I think we should go with the *disabled default option*  (-1).
The only benefit we would get from a big default of 5000 is default
protection against buggy/malicious applications that hit the KAFKA-7610
issue.
While this KIP was spawned from that issue, I believe its value is enabling
the possibility of protection and helping move towards a more self-service
Kafka. I also think that a default value of 5000 might be misleading to
users and lead them to think that big consumer groups (> 250) are a good
thing.

The good news is that KAFKA-7610 should be fully resolved and the rebalance
protocol should, in general, be more solid after the planned improvements
in KIP-345 and KIP-394.

* Handling bigger groups during upgrade
I now see that we store the state of consumer groups in the log and why a
rebalance isn't expected during a rolling upgrade.
Since we're going with the default value of the max.size being disabled, I
believe we can afford to be more strict here.
During state reloading of a new Coordinator with a defined max.group.size
config, I believe we should *force* rebalances for groups that exceed the
configured size. Then, only some consumers will be able to join and the max
size invariant will be satisfied.
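
Roughly something along these lines (a toy, self-contained sketch only - the
type names and the dedicated rejection result are my own placeholders, not
agreed-upon API):

// Toy model of the proposed migration behaviour; not real broker code.
final case class LoadedGroup(groupId: String, memberIds: Set[String])

sealed trait JoinResult
case object Accepted extends JoinResult
case object RejectedGroupOverMaxSize extends JoinResult

class CoordinatorSketch(groupMaxSize: Int) {
  private var members = Set.empty[String]

  // Called when a new coordinator takes over and replays group metadata from the
  // internal offsets topic. An oversized group is not carried over as-is; it is
  // emptied and forced through a rebalance. Returns true if a rebalance is needed.
  def onGroupLoaded(group: LoadedGroup): Boolean = {
    val oversized = groupMaxSize > 0 && group.memberIds.size > groupMaxSize
    members = if (oversized) Set.empty[String] else group.memberIds
    oversized
  }

  // During the forced re-join only the first groupMaxSize members get back in;
  // the rest are rejected, so the invariant holds from then on.
  def onJoinGroup(memberId: String): JoinResult =
    if (groupMaxSize > 0 && !members.contains(memberId) && members.size >= groupMaxSize)
      RejectedGroupOverMaxSize
    else {
      members += memberId
      Accepted
    }
}

With the config at its disabled default (-1), both checks above are skipped
entirely.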

I updated the KIP with a migration plan, rejected alternatives and the new
default value.

Thanks,
Stanislav

On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io> wrote:

> Hey Stanislav,
>
> Clients will then find that coordinator
> > and send `joinGroup` on it, effectively rebuilding the group, since the
> > cache of active consumers is not stored outside the Coordinator's memory.
> > (please do say if that is incorrect)
>
>
> Groups do not typically rebalance after a coordinator change. You could
> potentially force a rebalance if the group is too big and kick out the
> slowest members or something. A more graceful solution is probably to just
> accept the current size and prevent it from getting bigger. We could log a
> warning potentially.
>
> My thinking is that we should abstract away from conserving resources and
> > focus on giving control to the broker. The issue that spawned this KIP
> was
> > a memory problem but I feel this change is useful in a more general way.
>
>
> So you probably already know why I'm asking about this. For consumer groups
> anyway, resource usage would typically be proportional to the number of
> partitions that a group is reading from and not the number of members. For
> example, consider the memory use in the offsets cache. The benefit of this
> KIP is probably limited to preventing "runaway" consumer groups due to
> leaks or some other application bug. That still seems useful though.
>
> I completely agree with this and I *ask everybody to chime in with opinions
> > on a sensible default value*.
>
>
> I think we would have to be very conservative. The group protocol is
> generic in some sense, so there may be use cases we don't know of where
> larger groups are reasonable. Probably we should make this an opt-in
> feature so that we do not risk breaking anyone's application after an
> upgrade. Either that, or use a very high default like 5,000.
>
> Thanks,
> Jason
>
> On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <
> stanislav@confluent.io>
> wrote:
>
> > Hey Jason and Boyang, those were important comments
> >
> > > One suggestion I have is that it would be helpful to put your reasoning
> > on deciding the current default value. For example, in certain use cases
> at
> > Pinterest we are very likely to have more consumers than 250 when we
> > configure 8 stream instances with 32 threads.
> > > For the effectiveness of this KIP, we should encourage people to
> discuss
> > their opinions on the default setting and ideally reach a consensus.
> >
> > I completely agree with this and I *ask everybody to chime in with
> opinions
> > on a sensible default value*.
> > My thought process was that in the current model rebalances in large
> groups
> > are more costly. I imagine most use cases in most Kafka users do not
> > require more than 250 consumers.
> > Boyang, you say that you are "likely to have... when we..." - do you have
> > systems running with so many consumers in a group or are you planning
> to? I
> > guess what I'm asking is whether this has been tested in production with
> > the current rebalance model (ignoring KIP-345)
> >
> > >  Can you clarify the compatibility impact here? What
> > > will happen to groups that are already larger than the max size?
> > This is a very important question.
> > From my current understanding, when a coordinator broker gets shut
> > down during a cluster rolling upgrade, a replica will take leadership of
> > the `__offset_commits` partition. Clients will then find that coordinator
> > and send `joinGroup` on it, effectively rebuilding the group, since the
> > cache of active consumers is not stored outside the Coordinator's memory.
> > (please do say if that is incorrect)
> > Then, I believe that working as if this is a new group is a reasonable
> > approach. Namely, fail joinGroups when the max.size is exceeded.
> > What do you guys think about this? (I'll update the KIP after we settle
> on
> > a solution)
> >
> > >  Also, just to be clear, the resource we are trying to conserve here is
> > what? Memory?
> > My thinking is that we should abstract away from conserving resources and
> > focus on giving control to the broker. The issue that spawned this KIP
> was
> > a memory problem but I feel this change is useful in a more general way.
> It
> > limits the control clients have on the cluster and helps Kafka become a
> > more self-serving system. Admin/Ops teams can better control the impact
> > application developers can have on a Kafka cluster with this change
> >
> > Best,
> > Stanislav
> >
> >
> > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hi Stanislav,
> > >
> > > Thanks for the KIP. Can you clarify the compatibility impact here? What
> > > will happen to groups that are already larger than the max size? Also,
> > just
> > > to be clear, the resource we are trying to conserve here is what?
> Memory?
> > >
> > > -Jason
> > >
> > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <bc...@outlook.com>
> wrote:
> > >
> > > > Thanks Stanislav for the update! One suggestion I have is that it
> would
> > > be
> > > > helpful to put your
> > > >
> > > > reasoning on deciding the current default value. For example, in
> > certain
> > > > use cases at Pinterest we are very likely
> > > >
> > > > to have more consumers than 250 when we configure 8 stream instances
> > with
> > > > 32 threads.
> > > >
> > > >
> > > > For the effectiveness of this KIP, we should encourage people to
> > discuss
> > > > their opinions on the default setting and ideally reach a consensus.
> > > >
> > > >
> > > > Best,
> > > >
> > > > Boyang
> > > >
> > > > ________________________________
> > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > To: dev@kafka.apache.org
> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > > metadata growth
> > > >
> > > > Hey everybody,
> > > >
> > > > It's been a week since this KIP and not much discussion has been
> made.
> > > > I assume that this is a straight forward change and I will open a
> > voting
> > > > thread in the next couple of days if nobody has anything to suggest.
> > > >
> > > > Best,
> > > > Stanislav
> > > >
> > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > > > stanislav@confluent.io>
> > > > wrote:
> > > >
> > > > > Greetings everybody,
> > > > >
> > > > > I have enriched the KIP a bit with a bigger Motivation section and
> > also
> > > > > renamed it.
> > > > > KIP:
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > >
> > > > > I'm looking forward to discussions around it.
> > > > >
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > > > stanislav@confluent.io> wrote:
> > > > >
> > > > >> Hey there everybody,
> > > > >>
> > > > >> Thanks for the introduction Boyang. I appreciate the effort you
> are
> > > > >> putting into improving consumer behavior in Kafka.
> > > > >>
> > > > >> @Matt
> > > > >> I also believe the default value is high. In my opinion, we should
> > aim
> > > > to
> > > > >> a default cap around 250. This is because in the current model any
> > > > consumer
> > > > >> rebalance is disrupting to every consumer. The bigger the group,
> the
> > > > longer
> > > > >> this period of disruption.
> > > > >>
> > > > >> If you have such a large consumer group, chances are that your
> > > > >> client-side logic could be structured better and that you are not
> > > using
> > > > the
> > > > >> high number of consumers to achieve high throughput.
> > > > >> 250 can still be considered of a high upper bound, I believe in
> > > practice
> > > > >> users should aim to not go over 100 consumers per consumer group.
> > > > >>
> > > > >> In regards to the cap being global/per-broker, I think that we
> > should
> > > > >> consider whether we want it to be global or *per-topic*. For the
> > time
> > > > >> being, I believe that having it per-topic with a global default
> > might
> > > be
> > > > >> the best situation. Having it global only seems a bit restricting
> to
> > > me
> > > > and
> > > > >> it never hurts to support more fine-grained configurability (given
> > > it's
> > > > the
> > > > >> same config, not a new one being introduced).
> > > > >>
> > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <bchen11@outlook.com
> >
> > > > wrote:
> > > > >>
> > > > >>> Thanks Matt for the suggestion! I'm still open to any suggestion
> to
> > > > >>> change the default value. Meanwhile I just want to point out that
> > > this
> > > > >>> value is a just last line of defense, not a real scenario we
> would
> > > > expect.
> > > > >>>
> > > > >>>
> > > > >>> In the meanwhile, I discussed with Stanislav and he would be
> > driving
> > > > the
> > > > >>> 389 effort from now on. Stanislav proposed the idea in the first
> > > place
> > > > and
> > > > >>> had already come up a draft design, while I will keep focusing on
> > > > KIP-345
> > > > >>> effort to ensure solving the edge case described in the JIRA<
> > > > >>>
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > > >.
> > > > >>>
> > > > >>>
> > > > >>> Thank you Stanislav for making this happen!
> > > > >>>
> > > > >>>
> > > > >>> Boyang
> > > > >>>
> > > > >>> ________________________________
> > > > >>> From: Matt Farmer <ma...@frmr.me>
> > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > >>> To: dev@kafka.apache.org
> > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > member
> > > > >>> metadata growth
> > > > >>>
> > > > >>> Thanks for the KIP.
> > > > >>>
> > > > >>> Will this cap be a global cap across the entire cluster or per
> > > broker?
> > > > >>>
> > > > >>> Either way the default value seems a bit high to me, but that
> could
> > > > just
> > > > >>> be
> > > > >>> from my own usage patterns. I’d have probably started with 500 or
> > 1k
> > > > but
> > > > >>> could be easily convinced that’s wrong.
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Matt
> > > > >>>
> > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <bchen11@outlook.com
> >
> > > > wrote:
> > > > >>>
> > > > >>> > Hey folks,
> > > > >>> >
> > > > >>> >
> > > > >>> > I would like to start a discussion on KIP-389:
> > > > >>> >
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > >>> >
> > > > >>> >
> > > > >>> > This is a pretty simple change to cap the consumer group size
> for
> > > > >>> broker
> > > > >>> > stability. Give me your valuable feedback when you got time.
> > > > >>> >
> > > > >>> >
> > > > >>> > Thank you!
> > > > >>> >
> > > > >>>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Best,
> > > > >> Stanislav
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>


-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Jason Gustafson <ja...@confluent.io>.
Hey Stanislav,

Clients will then find that coordinator
> and send `joinGroup` on it, effectively rebuilding the group, since the
> cache of active consumers is not stored outside the Coordinator's memory.
> (please do say if that is incorrect)


Groups do not typically rebalance after a coordinator change. You could
potentially force a rebalance if the group is too big and kick out the
slowest members or something. A more graceful solution is probably to just
accept the current size and prevent it from getting bigger. We could log a
warning potentially.
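
To make that concrete, something along these lines is what I mean. It is only a
sketch with made-up names (a tiny self-contained Java class, not the actual
GroupCoordinator code): existing members can always rejoin, only brand-new
members are rejected once the cap is hit, and we log a warning when that happens.

import java.util.HashSet;
import java.util.Set;

// Sketch only: a stand-in for the coordinator-side check, not real broker code.
class GroupSizeGuard {
    private final int groupMaxSize;
    private final Set<String> memberIds = new HashSet<>();

    GroupSizeGuard(int groupMaxSize) {
        this.groupMaxSize = groupMaxSize;
    }

    // Returns true if the join should be accepted.
    synchronized boolean tryJoin(String memberId) {
        if (memberIds.contains(memberId)) {
            return true;  // known member rejoining -- always accepted
        }
        if (memberIds.size() >= groupMaxSize) {
            System.err.println("WARN: group already has " + memberIds.size()
                    + " members (max " + groupMaxSize + "); rejecting " + memberId);
            return false; // surfaced to the client as a failed join
        }
        memberIds.add(memberId);
        return true;
    }
}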

My thinking is that we should abstract away from conserving resources and
> focus on giving control to the broker. The issue that spawned this KIP was
> a memory problem but I feel this change is useful in a more general way.


So you probably already know why I'm asking about this. For consumer groups
anyway, resource usage would typically be proportional to the number of
partitions that a group is reading from and not the number of members. For
example, consider the memory use in the offsets cache. The benefit of this
KIP is probably limited to preventing "runaway" consumer groups due to
leaks or some other application bug. That still seems useful though.
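
To put rough numbers on it: a group consuming 1,000 partitions keeps on the
order of 1,000 committed-offset entries in that cache whether it has 10 members
or 500, so the member count itself is not the dominant factor for that resource.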

I completely agree with this and I *ask everybody to chime in with opinions
> on a sensible default value*.


I think we would have to be very conservative. The group protocol is
generic in some sense, so there may be use cases we don't know of where
larger groups are reasonable. Probably we should make this an opt-in
feature so that we do not risk breaking anyone's application after an
upgrade. Either that, or use a very high default like 5,000.

Thanks,
Jason

On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <st...@confluent.io>
wrote:

> Hey Jason and Boyang, those were important comments
>
> > One suggestion I have is that it would be helpful to put your reasoning
> on deciding the current default value. For example, in certain use cases at
> Pinterest we are very likely to have more consumers than 250 when we
> configure 8 stream instances with 32 threads.
> > For the effectiveness of this KIP, we should encourage people to discuss
> their opinions on the default setting and ideally reach a consensus.
>
> I completely agree with this and I *ask everybody to chime in with opinions
> on a sensible default value*.
> My thought process was that in the current model rebalances in large groups
> are more costly. I imagine most use cases in most Kafka users do not
> require more than 250 consumers.
> Boyang, you say that you are "likely to have... when we..." - do you have
> systems running with so many consumers in a group or are you planning to? I
> guess what I'm asking is whether this has been tested in production with
> the current rebalance model (ignoring KIP-345)
>
> >  Can you clarify the compatibility impact here? What
> > will happen to groups that are already larger than the max size?
> This is a very important question.
> From my current understanding, when a coordinator broker gets shut
> down during a cluster rolling upgrade, a replica will take leadership of
> the `__offset_commits` partition. Clients will then find that coordinator
> and send `joinGroup` on it, effectively rebuilding the group, since the
> cache of active consumers is not stored outside the Coordinator's memory.
> (please do say if that is incorrect)
> Then, I believe that working as if this is a new group is a reasonable
> approach. Namely, fail joinGroups when the max.size is exceeded.
> What do you guys think about this? (I'll update the KIP after we settle on
> a solution)
>
> >  Also, just to be clear, the resource we are trying to conserve here is
> what? Memory?
> My thinking is that we should abstract away from conserving resources and
> focus on giving control to the broker. The issue that spawned this KIP was
> a memory problem but I feel this change is useful in a more general way. It
> limits the control clients have on the cluster and helps Kafka become a
> more self-serving system. Admin/Ops teams can better control the impact
> application developers can have on a Kafka cluster with this change
>
> Best,
> Stanislav
>
>
> On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hi Stanislav,
> >
> > Thanks for the KIP. Can you clarify the compatibility impact here? What
> > will happen to groups that are already larger than the max size? Also,
> just
> > to be clear, the resource we are trying to conserve here is what? Memory?
> >
> > -Jason
> >
> > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <bc...@outlook.com> wrote:
> >
> > > Thanks Stanislav for the update! One suggestion I have is that it would
> > be
> > > helpful to put your
> > >
> > > reasoning on deciding the current default value. For example, in
> certain
> > > use cases at Pinterest we are very likely
> > >
> > > to have more consumers than 250 when we configure 8 stream instances
> with
> > > 32 threads.
> > >
> > >
> > > For the effectiveness of this KIP, we should encourage people to
> discuss
> > > their opinions on the default setting and ideally reach a consensus.
> > >
> > >
> > > Best,
> > >
> > > Boyang
> > >
> > > ________________________________
> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > Sent: Monday, November 26, 2018 6:14 PM
> > > To: dev@kafka.apache.org
> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > metadata growth
> > >
> > > Hey everybody,
> > >
> > > It's been a week since this KIP and not much discussion has been made.
> > > I assume that this is a straight forward change and I will open a
> voting
> > > thread in the next couple of days if nobody has anything to suggest.
> > >
> > > Best,
> > > Stanislav
> > >
> > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > > stanislav@confluent.io>
> > > wrote:
> > >
> > > > Greetings everybody,
> > > >
> > > > I have enriched the KIP a bit with a bigger Motivation section and
> also
> > > > renamed it.
> > > > KIP:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > >
> > > > I'm looking forward to discussions around it.
> > > >
> > > > Best,
> > > > Stanislav
> > > >
> > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > > stanislav@confluent.io> wrote:
> > > >
> > > >> Hey there everybody,
> > > >>
> > > >> Thanks for the introduction Boyang. I appreciate the effort you are
> > > >> putting into improving consumer behavior in Kafka.
> > > >>
> > > >> @Matt
> > > >> I also believe the default value is high. In my opinion, we should
> aim
> > > to
> > > >> a default cap around 250. This is because in the current model any
> > > consumer
> > > >> rebalance is disrupting to every consumer. The bigger the group, the
> > > longer
> > > >> this period of disruption.
> > > >>
> > > >> If you have such a large consumer group, chances are that your
> > > >> client-side logic could be structured better and that you are not
> > using
> > > the
> > > >> high number of consumers to achieve high throughput.
> > > >> 250 can still be considered of a high upper bound, I believe in
> > practice
> > > >> users should aim to not go over 100 consumers per consumer group.
> > > >>
> > > >> In regards to the cap being global/per-broker, I think that we
> should
> > > >> consider whether we want it to be global or *per-topic*. For the
> time
> > > >> being, I believe that having it per-topic with a global default
> might
> > be
> > > >> the best situation. Having it global only seems a bit restricting to
> > me
> > > and
> > > >> it never hurts to support more fine-grained configurability (given
> > it's
> > > the
> > > >> same config, not a new one being introduced).
> > > >>
> > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <bc...@outlook.com>
> > > wrote:
> > > >>
> > > >>> Thanks Matt for the suggestion! I'm still open to any suggestion to
> > > >>> change the default value. Meanwhile I just want to point out that
> > this
> > > >>> value is a just last line of defense, not a real scenario we would
> > > expect.
> > > >>>
> > > >>>
> > > >>> In the meanwhile, I discussed with Stanislav and he would be
> driving
> > > the
> > > >>> 389 effort from now on. Stanislav proposed the idea in the first
> > place
> > > and
> > > >>> had already come up a draft design, while I will keep focusing on
> > > KIP-345
> > > >>> effort to ensure solving the edge case described in the JIRA<
> > > >>>
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > > >.
> > > >>>
> > > >>>
> > > >>> Thank you Stanislav for making this happen!
> > > >>>
> > > >>>
> > > >>> Boyang
> > > >>>
> > > >>> ________________________________
> > > >>> From: Matt Farmer <ma...@frmr.me>
> > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > >>> To: dev@kafka.apache.org
> > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> member
> > > >>> metadata growth
> > > >>>
> > > >>> Thanks for the KIP.
> > > >>>
> > > >>> Will this cap be a global cap across the entire cluster or per
> > broker?
> > > >>>
> > > >>> Either way the default value seems a bit high to me, but that could
> > > just
> > > >>> be
> > > >>> from my own usage patterns. I’d have probably started with 500 or
> 1k
> > > but
> > > >>> could be easily convinced that’s wrong.
> > > >>>
> > > >>> Thanks,
> > > >>> Matt
> > > >>>
> > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <bc...@outlook.com>
> > > wrote:
> > > >>>
> > > >>> > Hey folks,
> > > >>> >
> > > >>> >
> > > >>> > I would like to start a discussion on KIP-389:
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > >>> >
> > > >>> >
> > > >>> > This is a pretty simple change to cap the consumer group size for
> > > >>> broker
> > > >>> > stability. Give me your valuable feedback when you got time.
> > > >>> >
> > > >>> >
> > > >>> > Thank you!
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >> --
> > > >> Best,
> > > >> Stanislav
> > > >>
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
>
>
> --
> Best,
> Stanislav
>

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hey Jason and Boyang, those were important comments

> One suggestion I have is that it would be helpful to put your reasoning
on deciding the current default value. For example, in certain use cases at
Pinterest we are very likely to have more consumers than 250 when we
configure 8 stream instances with 32 threads.
> For the effectiveness of this KIP, we should encourage people to discuss
their opinions on the default setting and ideally reach a consensus.

I completely agree with this and I *ask everybody to chime in with opinions
on a sensible default value*.
My thought process was that in the current model rebalances in large groups
are more costly. I imagine most use cases in most Kafka users do not
require more than 250 consumers.
Boyang, you say that you are "likely to have... when we..." - do you have
systems running with so many consumers in a group or are you planning to? I
guess what I'm asking is whether this has been tested in production with
the current rebalance model (ignoring KIP-345).

>  Can you clarify the compatibility impact here? What
> will happen to groups that are already larger than the max size?
This is a very important question.
From my current understanding, when a coordinator broker gets shut
down during a cluster rolling upgrade, a replica will take leadership of
the `__consumer_offsets` partition. Clients will then find that coordinator
and send `joinGroup` on it, effectively rebuilding the group, since the
cache of active consumers is not stored outside the Coordinator's memory.
(please do say if that is incorrect)
Then, I believe that working as if this is a new group is a reasonable
approach. Namely, fail joinGroups when the max.size is exceeded.
What do you guys think about this? (I'll update the KIP after we settle on
a solution)
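
Just to illustrate the failure mode from the application's side: the exact error
type is still open, so the following is only a sketch (with assumed broker, topic
and group names) of how a consumer that cannot join a full group might surface
the problem.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.KafkaException;

// Sketch only: what an application might see if the broker fails its JoinGroup
// because the group is already at the configured maximum size.
public class JoinFullGroupExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "group-at-max-size");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("some-topic"));
            consumer.poll(Duration.ofSeconds(5)); // the group join happens here
        } catch (KafkaException e) {
            // With the proposed check the join fails instead of silently growing the
            // group; the application has to back off or reduce its consumer count.
            System.err.println("Could not join consumer group: " + e.getMessage());
        }
    }
}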

>  Also, just to be clear, the resource we are trying to conserve here is
what? Memory?
My thinking is that we should abstract away from conserving resources and
focus on giving control to the broker. The issue that spawned this KIP was
a memory problem but I feel this change is useful in a more general way. It
limits the control clients have over the cluster and helps Kafka become a
more self-serving system. Admin/Ops teams can better control the impact
application developers can have on a Kafka cluster with this change

Best,
Stanislav


On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <ja...@confluent.io> wrote:

> Hi Stanislav,
>
> Thanks for the KIP. Can you clarify the compatibility impact here? What
> will happen to groups that are already larger than the max size? Also, just
> to be clear, the resource we are trying to conserve here is what? Memory?
>
> -Jason
>
> On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <bc...@outlook.com> wrote:
>
> > Thanks Stanislav for the update! One suggestion I have is that it would
> be
> > helpful to put your
> >
> > reasoning on deciding the current default value. For example, in certain
> > use cases at Pinterest we are very likely
> >
> > to have more consumers than 250 when we configure 8 stream instances with
> > 32 threads.
> >
> >
> > For the effectiveness of this KIP, we should encourage people to discuss
> > their opinions on the default setting and ideally reach a consensus.
> >
> >
> > Best,
> >
> > Boyang
> >
> > ________________________________
> > From: Stanislav Kozlovski <st...@confluent.io>
> > Sent: Monday, November 26, 2018 6:14 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > metadata growth
> >
> > Hey everybody,
> >
> > It's been a week since this KIP and not much discussion has been made.
> > I assume that this is a straight forward change and I will open a voting
> > thread in the next couple of days if nobody has anything to suggest.
> >
> > Best,
> > Stanislav
> >
> > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > stanislav@confluent.io>
> > wrote:
> >
> > > Greetings everybody,
> > >
> > > I have enriched the KIP a bit with a bigger Motivation section and also
> > > renamed it.
> > > KIP:
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > >
> > > I'm looking forward to discussions around it.
> > >
> > > Best,
> > > Stanislav
> > >
> > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > stanislav@confluent.io> wrote:
> > >
> > >> Hey there everybody,
> > >>
> > >> Thanks for the introduction Boyang. I appreciate the effort you are
> > >> putting into improving consumer behavior in Kafka.
> > >>
> > >> @Matt
> > >> I also believe the default value is high. In my opinion, we should aim
> > to
> > >> a default cap around 250. This is because in the current model any
> > consumer
> > >> rebalance is disrupting to every consumer. The bigger the group, the
> > longer
> > >> this period of disruption.
> > >>
> > >> If you have such a large consumer group, chances are that your
> > >> client-side logic could be structured better and that you are not
> using
> > the
> > >> high number of consumers to achieve high throughput.
> > >> 250 can still be considered of a high upper bound, I believe in
> practice
> > >> users should aim to not go over 100 consumers per consumer group.
> > >>
> > >> In regards to the cap being global/per-broker, I think that we should
> > >> consider whether we want it to be global or *per-topic*. For the time
> > >> being, I believe that having it per-topic with a global default might
> be
> > >> the best situation. Having it global only seems a bit restricting to
> me
> > and
> > >> it never hurts to support more fine-grained configurability (given
> it's
> > the
> > >> same config, not a new one being introduced).
> > >>
> > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <bc...@outlook.com>
> > wrote:
> > >>
> > >>> Thanks Matt for the suggestion! I'm still open to any suggestion to
> > >>> change the default value. Meanwhile I just want to point out that
> this
> > >>> value is a just last line of defense, not a real scenario we would
> > expect.
> > >>>
> > >>>
> > >>> In the meanwhile, I discussed with Stanislav and he would be driving
> > the
> > >>> 389 effort from now on. Stanislav proposed the idea in the first
> place
> > and
> > >>> had already come up a draft design, while I will keep focusing on
> > KIP-345
> > >>> effort to ensure solving the edge case described in the JIRA<
> > >>>
> >
> https://issues.apache.org/jira/browse/KAFKA-7610
> > >.
> > >>>
> > >>>
> > >>> Thank you Stanislav for making this happen!
> > >>>
> > >>>
> > >>> Boyang
> > >>>
> > >>> ________________________________
> > >>> From: Matt Farmer <ma...@frmr.me>
> > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > >>> To: dev@kafka.apache.org
> > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > >>> metadata growth
> > >>>
> > >>> Thanks for the KIP.
> > >>>
> > >>> Will this cap be a global cap across the entire cluster or per
> broker?
> > >>>
> > >>> Either way the default value seems a bit high to me, but that could
> > just
> > >>> be
> > >>> from my own usage patterns. I’d have probably started with 500 or 1k
> > but
> > >>> could be easily convinced that’s wrong.
> > >>>
> > >>> Thanks,
> > >>> Matt
> > >>>
> > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <bc...@outlook.com>
> > wrote:
> > >>>
> > >>> > Hey folks,
> > >>> >
> > >>> >
> > >>> > I would like to start a discussion on KIP-389:
> > >>> >
> > >>> >
> > >>> >
> > >>>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > >>> >
> > >>> >
> > >>> > This is a pretty simple change to cap the consumer group size for
> > >>> broker
> > >>> > stability. Give me your valuable feedback when you got time.
> > >>> >
> > >>> >
> > >>> > Thank you!
> > >>> >
> > >>>
> > >>
> > >>
> > >> --
> > >> Best,
> > >> Stanislav
> > >>
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>


-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Jason Gustafson <ja...@confluent.io>.
Hi Stanislav,

Thanks for the KIP. Can you clarify the compatibility impact here? What
will happen to groups that are already larger than the max size? Also, just
to be clear, the resource we are trying to conserve here is what? Memory?

-Jason

On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <bc...@outlook.com> wrote:

> Thanks Stanislav for the update! One suggestion I have is that it would be
> helpful to put your
>
> reasoning on deciding the current default value. For example, in certain
> use cases at Pinterest we are very likely
>
> to have more consumers than 250 when we configure 8 stream instances with
> 32 threads.
>
>
> For the effectiveness of this KIP, we should encourage people to discuss
> their opinions on the default setting and ideally reach a consensus.
>
>
> Best,
>
> Boyang
>
> ________________________________
> From: Stanislav Kozlovski <st...@confluent.io>
> Sent: Monday, November 26, 2018 6:14 PM
> To: dev@kafka.apache.org
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Hey everybody,
>
> It's been a week since this KIP and not much discussion has been made.
> I assume that this is a straight forward change and I will open a voting
> thread in the next couple of days if nobody has anything to suggest.
>
> Best,
> Stanislav
>
> On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> stanislav@confluent.io>
> wrote:
>
> > Greetings everybody,
> >
> > I have enriched the KIP a bit with a bigger Motivation section and also
> > renamed it.
> > KIP:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> >
> > I'm looking forward to discussions around it.
> >
> > Best,
> > Stanislav
> >
> > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > stanislav@confluent.io> wrote:
> >
> >> Hey there everybody,
> >>
> >> Thanks for the introduction Boyang. I appreciate the effort you are
> >> putting into improving consumer behavior in Kafka.
> >>
> >> @Matt
> >> I also believe the default value is high. In my opinion, we should aim
> to
> >> a default cap around 250. This is because in the current model any
> consumer
> >> rebalance is disrupting to every consumer. The bigger the group, the
> longer
> >> this period of disruption.
> >>
> >> If you have such a large consumer group, chances are that your
> >> client-side logic could be structured better and that you are not using
> the
> >> high number of consumers to achieve high throughput.
> >> 250 can still be considered of a high upper bound, I believe in practice
> >> users should aim to not go over 100 consumers per consumer group.
> >>
> >> In regards to the cap being global/per-broker, I think that we should
> >> consider whether we want it to be global or *per-topic*. For the time
> >> being, I believe that having it per-topic with a global default might be
> >> the best situation. Having it global only seems a bit restricting to me
> and
> >> it never hurts to support more fine-grained configurability (given it's
> the
> >> same config, not a new one being introduced).
> >>
> >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <bc...@outlook.com>
> wrote:
> >>
> >>> Thanks Matt for the suggestion! I'm still open to any suggestion to
> >>> change the default value. Meanwhile I just want to point out that this
> >>> value is a just last line of defense, not a real scenario we would
> expect.
> >>>
> >>>
> >>> In the meanwhile, I discussed with Stanislav and he would be driving
> the
> >>> 389 effort from now on. Stanislav proposed the idea in the first place
> and
> >>> had already come up a draft design, while I will keep focusing on
> KIP-345
> >>> effort to ensure solving the edge case described in the JIRA<
> >>>
> https://issues.apache.org/jira/browse/KAFKA-7610
> >.
> >>>
> >>>
> >>> Thank you Stanislav for making this happen!
> >>>
> >>>
> >>> Boyang
> >>>
> >>> ________________________________
> >>> From: Matt Farmer <ma...@frmr.me>
> >>> Sent: Tuesday, November 20, 2018 10:24 AM
> >>> To: dev@kafka.apache.org
> >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> >>> metadata growth
> >>>
> >>> Thanks for the KIP.
> >>>
> >>> Will this cap be a global cap across the entire cluster or per broker?
> >>>
> >>> Either way the default value seems a bit high to me, but that could
> just
> >>> be
> >>> from my own usage patterns. I’d have probably started with 500 or 1k
> but
> >>> could be easily convinced that’s wrong.
> >>>
> >>> Thanks,
> >>> Matt
> >>>
> >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <bc...@outlook.com>
> wrote:
> >>>
> >>> > Hey folks,
> >>> >
> >>> >
> >>> > I would like to start a discussion on KIP-389:
> >>> >
> >>> >
> >>> >
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> >>> >
> >>> >
> >>> > This is a pretty simple change to cap the consumer group size for
> >>> broker
> >>> > stability. Give me your valuable feedback when you got time.
> >>> >
> >>> >
> >>> > Thank you!
> >>> >
> >>>
> >>
> >>
> >> --
> >> Best,
> >> Stanislav
> >>
> >
> >
> > --
> > Best,
> > Stanislav
> >
>
>
> --
> Best,
> Stanislav
>

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Boyang Chen <bc...@outlook.com>.
Thanks Stanislav for the update! One suggestion I have is that it would be helpful to explain your reasoning behind the chosen default value. For example, in certain use cases at Pinterest we are very likely to have more consumers than 250 in a single group when we configure 8 stream instances with 32 threads.
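(For reference, that is 8 instances * 32 threads = 256 consumers in one group, which already exceeds the proposed default of 250.)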


For the effectiveness of this KIP, we should encourage people to discuss their opinions on the default setting and ideally reach a consensus.


Best,

Boyang

________________________________
From: Stanislav Kozlovski <st...@confluent.io>
Sent: Monday, November 26, 2018 6:14 PM
To: dev@kafka.apache.org
Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Hey everybody,

It's been a week since this KIP and not much discussion has been made.
I assume that this is a straight forward change and I will open a voting
thread in the next couple of days if nobody has anything to suggest.

Best,
Stanislav

On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <st...@confluent.io>
wrote:

> Greetings everybody,
>
> I have enriched the KIP a bit with a bigger Motivation section and also
> renamed it.
> KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
>
> I'm looking forward to discussions around it.
>
> Best,
> Stanislav
>
> On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> stanislav@confluent.io> wrote:
>
>> Hey there everybody,
>>
>> Thanks for the introduction Boyang. I appreciate the effort you are
>> putting into improving consumer behavior in Kafka.
>>
>> @Matt
>> I also believe the default value is high. In my opinion, we should aim to
>> a default cap around 250. This is because in the current model any consumer
>> rebalance is disrupting to every consumer. The bigger the group, the longer
>> this period of disruption.
>>
>> If you have such a large consumer group, chances are that your
>> client-side logic could be structured better and that you are not using the
>> high number of consumers to achieve high throughput.
>> 250 can still be considered of a high upper bound, I believe in practice
>> users should aim to not go over 100 consumers per consumer group.
>>
>> In regards to the cap being global/per-broker, I think that we should
>> consider whether we want it to be global or *per-topic*. For the time
>> being, I believe that having it per-topic with a global default might be
>> the best situation. Having it global only seems a bit restricting to me and
>> it never hurts to support more fine-grained configurability (given it's the
>> same config, not a new one being introduced).
>>
>> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <bc...@outlook.com> wrote:
>>
>>> Thanks Matt for the suggestion! I'm still open to any suggestion to
>>> change the default value. Meanwhile I just want to point out that this
>>> value is a just last line of defense, not a real scenario we would expect.
>>>
>>>
>>> In the meanwhile, I discussed with Stanislav and he would be driving the
>>> 389 effort from now on. Stanislav proposed the idea in the first place and
>>> had already come up a draft design, while I will keep focusing on KIP-345
>>> effort to ensure solving the edge case described in the JIRA<
>>> https://issues.apache.org/jira/browse/KAFKA-7610>.
>>>
>>>
>>> Thank you Stanislav for making this happen!
>>>
>>>
>>> Boyang
>>>
>>> ________________________________
>>> From: Matt Farmer <ma...@frmr.me>
>>> Sent: Tuesday, November 20, 2018 10:24 AM
>>> To: dev@kafka.apache.org
>>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
>>> metadata growth
>>>
>>> Thanks for the KIP.
>>>
>>> Will this cap be a global cap across the entire cluster or per broker?
>>>
>>> Either way the default value seems a bit high to me, but that could just
>>> be
>>> from my own usage patterns. I’d have probably started with 500 or 1k but
>>> could be easily convinced that’s wrong.
>>>
>>> Thanks,
>>> Matt
>>>
>>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <bc...@outlook.com> wrote:
>>>
>>> > Hey folks,
>>> >
>>> >
>>> > I would like to start a discussion on KIP-389:
>>> >
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
>>> >
>>> >
>>> > This is a pretty simple change to cap the consumer group size for
>>> broker
>>> > stability. Give me your valuable feedback when you got time.
>>> >
>>> >
>>> > Thank you!
>>> >
>>>
>>
>>
>> --
>> Best,
>> Stanislav
>>
>
>
> --
> Best,
> Stanislav
>


--
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hey everybody,

It's been a week since this KIP was published and there has not been much discussion.
I assume that this is a straightforward change, and I will open a voting
thread in the next couple of days if nobody has anything to suggest.

Best,
Stanislav

On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <st...@confluent.io>
wrote:

> Greetings everybody,
>
> I have enriched the KIP a bit with a bigger Motivation section and also
> renamed it.
> KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
>
> I'm looking forward to discussions around it.
>
> Best,
> Stanislav
>
> On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> stanislav@confluent.io> wrote:
>
>> Hey there everybody,
>>
>> Thanks for the introduction Boyang. I appreciate the effort you are
>> putting into improving consumer behavior in Kafka.
>>
>> @Matt
>> I also believe the default value is high. In my opinion, we should aim to
>> a default cap around 250. This is because in the current model any consumer
>> rebalance is disrupting to every consumer. The bigger the group, the longer
>> this period of disruption.
>>
>> If you have such a large consumer group, chances are that your
>> client-side logic could be structured better and that you are not using the
>> high number of consumers to achieve high throughput.
>> 250 can still be considered of a high upper bound, I believe in practice
>> users should aim to not go over 100 consumers per consumer group.
>>
>> In regards to the cap being global/per-broker, I think that we should
>> consider whether we want it to be global or *per-topic*. For the time
>> being, I believe that having it per-topic with a global default might be
>> the best situation. Having it global only seems a bit restricting to me and
>> it never hurts to support more fine-grained configurability (given it's the
>> same config, not a new one being introduced).
>>
>> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <bc...@outlook.com> wrote:
>>
>>> Thanks Matt for the suggestion! I'm still open to any suggestion to
>>> change the default value. Meanwhile I just want to point out that this
>>> value is a just last line of defense, not a real scenario we would expect.
>>>
>>>
>>> In the meanwhile, I discussed with Stanislav and he would be driving the
>>> 389 effort from now on. Stanislav proposed the idea in the first place and
>>> had already come up a draft design, while I will keep focusing on KIP-345
>>> effort to ensure solving the edge case described in the JIRA<
>>> https://issues.apache.org/jira/browse/KAFKA-7610>.
>>>
>>>
>>> Thank you Stanislav for making this happen!
>>>
>>>
>>> Boyang
>>>
>>> ________________________________
>>> From: Matt Farmer <ma...@frmr.me>
>>> Sent: Tuesday, November 20, 2018 10:24 AM
>>> To: dev@kafka.apache.org
>>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
>>> metadata growth
>>>
>>> Thanks for the KIP.
>>>
>>> Will this cap be a global cap across the entire cluster or per broker?
>>>
>>> Either way the default value seems a bit high to me, but that could just
>>> be
>>> from my own usage patterns. I’d have probably started with 500 or 1k but
>>> could be easily convinced that’s wrong.
>>>
>>> Thanks,
>>> Matt
>>>
>>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <bc...@outlook.com> wrote:
>>>
>>> > Hey folks,
>>> >
>>> >
>>> > I would like to start a discussion on KIP-389:
>>> >
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
>>> >
>>> >
>>> > This is a pretty simple change to cap the consumer group size for
>>> broker
>>> > stability. Give me your valuable feedback when you got time.
>>> >
>>> >
>>> > Thank you!
>>> >
>>>
>>
>>
>> --
>> Best,
>> Stanislav
>>
>
>
> --
> Best,
> Stanislav
>


-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Greetings everybody,

I have enriched the KIP a bit with a bigger Motivation section and also
renamed it.
KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit

I'm looking forward to discussions around it.

Best,
Stanislav

On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <st...@confluent.io>
wrote:

> Hey there everybody,
>
> Thanks for the introduction Boyang. I appreciate the effort you are
> putting into improving consumer behavior in Kafka.
>
> @Matt
> I also believe the default value is high. In my opinion, we should aim to
> a default cap around 250. This is because in the current model any consumer
> rebalance is disrupting to every consumer. The bigger the group, the longer
> this period of disruption.
>
> If you have such a large consumer group, chances are that your client-side
> logic could be structured better and that you are not using the high number
> of consumers to achieve high throughput.
> 250 can still be considered of a high upper bound, I believe in practice
> users should aim to not go over 100 consumers per consumer group.
>
> In regards to the cap being global/per-broker, I think that we should
> consider whether we want it to be global or *per-topic*. For the time
> being, I believe that having it per-topic with a global default might be
> the best situation. Having it global only seems a bit restricting to me and
> it never hurts to support more fine-grained configurability (given it's the
> same config, not a new one being introduced).
>
> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <bc...@outlook.com> wrote:
>
>> Thanks Matt for the suggestion! I'm still open to any suggestion to
>> change the default value. Meanwhile I just want to point out that this
>> value is a just last line of defense, not a real scenario we would expect.
>>
>>
>> In the meanwhile, I discussed with Stanislav and he would be driving the
>> 389 effort from now on. Stanislav proposed the idea in the first place and
>> had already come up a draft design, while I will keep focusing on KIP-345
>> effort to ensure solving the edge case described in the JIRA<
>> https://issues.apache.org/jira/browse/KAFKA-7610>.
>>
>>
>> Thank you Stanislav for making this happen!
>>
>>
>> Boyang
>>
>> ________________________________
>> From: Matt Farmer <ma...@frmr.me>
>> Sent: Tuesday, November 20, 2018 10:24 AM
>> To: dev@kafka.apache.org
>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
>> metadata growth
>>
>> Thanks for the KIP.
>>
>> Will this cap be a global cap across the entire cluster or per broker?
>>
>> Either way the default value seems a bit high to me, but that could just
>> be
>> from my own usage patterns. I’d have probably started with 500 or 1k but
>> could be easily convinced that’s wrong.
>>
>> Thanks,
>> Matt
>>
>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <bc...@outlook.com> wrote:
>>
>> > Hey folks,
>> >
>> >
>> > I would like to start a discussion on KIP-389:
>> >
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
>> >
>> >
>> > This is a pretty simple change to cap the consumer group size for broker
>> > stability. Give me your valuable feedback when you got time.
>> >
>> >
>> > Thank you!
>> >
>>
>
>
> --
> Best,
> Stanislav
>


-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hey there everybody,

Thanks for the introduction Boyang. I appreciate the effort you are putting
into improving consumer behavior in Kafka.

@Matt
I also believe the default value is high. In my opinion, we should aim for a
default cap of around 250. This is because in the current model any consumer
rebalance is disruptive to every consumer: every member rejoins and waits for
the group to stabilize, so the bigger the group, the longer this period of
disruption lasts.

If you have such a large consumer group, chances are that your client-side
logic could be structured better and that you are not using the high number
of consumers to achieve high throughput.
250 can still be considered a high upper bound; I believe that in practice
users should aim not to go over 100 consumers per consumer group.

Regarding the cap being global/per-broker, I think that we should
consider whether we want it to be global or *per-topic*. For the time
being, I believe that having it per-topic with a global default might be
the best option. Having it global only seems a bit restrictive to me, and
it never hurts to support more fine-grained configurability (given it's the
same config, not a new one being introduced).

On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <bc...@outlook.com> wrote:

> Thanks Matt for the suggestion! I'm still open to any suggestion to change
> the default value. Meanwhile I just want to point out that this value is a
> just last line of defense, not a real scenario we would expect.
>
>
> In the meanwhile, I discussed with Stanislav and he would be driving the
> 389 effort from now on. Stanislav proposed the idea in the first place and
> had already come up a draft design, while I will keep focusing on KIP-345
> effort to ensure solving the edge case described in the JIRA<
> https://issues.apache.org/jira/browse/KAFKA-7610>.
>
>
> Thank you Stanislav for making this happen!
>
>
> Boyang
>
> ________________________________
> From: Matt Farmer <ma...@frmr.me>
> Sent: Tuesday, November 20, 2018 10:24 AM
> To: dev@kafka.apache.org
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Thanks for the KIP.
>
> Will this cap be a global cap across the entire cluster or per broker?
>
> Either way the default value seems a bit high to me, but that could just be
> from my own usage patterns. I’d have probably started with 500 or 1k but
> could be easily convinced that’s wrong.
>
> Thanks,
> Matt
>
> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <bc...@outlook.com> wrote:
>
> > Hey folks,
> >
> >
> > I would like to start a discussion on KIP-389:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> >
> >
> > This is a pretty simple change to cap the consumer group size for broker
> > stability. Give me your valuable feedback when you got time.
> >
> >
> > Thank you!
> >
>


-- 
Best,
Stanislav

Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Posted by Boyang Chen <bc...@outlook.com>.
Thanks Matt for the suggestion! I'm still open to suggestions to change the default value. Meanwhile, I just want to point out that this value is just a last line of defense, not a limit we expect to reach in a real scenario.


In the meantime, I discussed with Stanislav and he will be driving the KIP-389 effort from now on. Stanislav proposed the idea in the first place and had already come up with a draft design, while I will keep focusing on the KIP-345 effort to solve the edge case described in the JIRA<https://issues.apache.org/jira/browse/KAFKA-7610>.


Thank you Stanislav for making this happen!


Boyang

________________________________
From: Matt Farmer <ma...@frmr.me>
Sent: Tuesday, November 20, 2018 10:24 AM
To: dev@kafka.apache.org
Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Thanks for the KIP.

Will this cap be a global cap across the entire cluster or per broker?

Either way the default value seems a bit high to me, but that could just be
from my own usage patterns. I’d have probably started with 500 or 1k but
could be easily convinced that’s wrong.

Thanks,
Matt

On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <bc...@outlook.com> wrote:

> Hey folks,
>
>
> I would like to start a discussion on KIP-389:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
>
>
> This is a pretty simple change to cap the consumer group size for broker
> stability. Give me your valuable feedback when you got time.
>
>
> Thank you!
>