Posted to dev@kafka.apache.org by Jun Rao <ju...@confluent.io> on 2017/06/06 21:59:52 UTC

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Hi, Everyone,

Sorry for being late to this thread; I just came across it. I have
a couple of concerns. (1) The right amount of delay seems to be
application-specific, so wouldn't it be better for the delay to be a
client-side config instead of a server-side one? (2) When running the
console consumer in the quickstart, a minimum delay of 3 seconds seems like
a bad experience for our users.

Since we are getting late into the release cycle, it may be a bit too late
to make big changes in the 0.11 release. Perhaps we should at least
consider overriding the delay in config/server.properties to 0 to improve
the quickstart experience?

Thanks,

Jun


On Tue, Apr 11, 2017 at 12:19 AM, Damian Guy <da...@gmail.com> wrote:

> Hi Onur,
>
> It was in my previous email. But here it is again.
>
> ============================================================
>
> 1. Better rebalance timing. We will try to rebalance only when all the
> consumers in a group have joined. The challenge would be someone has to
> define what does ALL consumers mean, it could either be a time or number of
> consumers, etc.
>
> 2. Avoid frequent rebalance. For example, if there are 100 consumers in a
> group, today, in the worst case, we may end up with 100 rebalances even if
> all the consumers joined the group in a reasonably small amount of time.
> Frequent rebalance is also a bad thing for brokers.
>
> Having a client side configuration may solve problem 1 better because each
> consumer group can potentially configure their own timing. However, it does
> not really prevent frequent rebalance in general because some of the
> consumers can be misconfigured. (This may have something to do with KIP-124
> as well. But if quota is applied on the JoinGroup/SyncGroup request it may
> cause some unwanted cascading effects.)
>
> Having a broker side configuration may result in less flexibility for each
> consumer group, but it can prevent frequent rebalance better. I think with
> some reasonable design, the rebalance timing issue can be resolved on the
> broker side as well. Matthias had a good point on extending the delay when
> a new consumer joins a group (we actually did something similar to batch
> ISR change propagation). For example, let's say on the broker side, we will
> always delay 2 seconds each time we see a new consumer joining a consumer
> group. This would probably work for most of the consumer groups and will
> also limit the rebalance frequency to protect the brokers.
>
> I am not sure about the streams use case here, but if something like 2
> seconds of delay is acceptable for streams, I would prefer adding the
> configuration to the broker so that we can address both problems.
>
> On Thu, 6 Apr 2017 at 17:11 Onur Karaman <on...@gmail.com>
> wrote:
>
> > Hi Damian.
> >
> > Can you copy the point Becket made earlier that you say isn't addressed?
> >
> > On Thu, Apr 6, 2017 at 2:51 AM, Damian Guy <da...@gmail.com> wrote:
> >
> > > Thanks all, the vote is now closed and the KIP has been accepted with
> > > 9 +1s.
> > >
> > > 3 binding:
> > > Guozhang,
> > > Jason,
> > > Ismael
> > >
> > > 6 non-binding:
> > > Bill,
> > > Eno,
> > > Mathieu,
> > > Matthias,
> > > Dong,
> > > Mickael
> > >
> > > Thanks,
> > > Damian
> > >
> > > On Thu, 6 Apr 2017 at 09:26 Ismael Juma <is...@juma.me.uk> wrote:
> > >
> > > > Thanks for the KIP, +1 (binding).
> > > >
> > > > Ismael
> > > >
> > > > On Thu, Mar 30, 2017 at 8:55 PM, Jason Gustafson <jason@confluent.io>
> > > > wrote:
> > > >
> > > > > +1 Thanks for the KIP!
> > > > >
> > > > > On Thu, Mar 30, 2017 at 12:51 PM, Guozhang Wang <wangguoz@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > Sorry about the previous email, Gmail seems to be collapsing them
> > > > > > into a single thread in my inbox.
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > > On Thu, Mar 30, 2017 at 11:34 AM, Guozhang Wang <wangguoz@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Damian, could you create a new thread for the voting process?
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > > On Thu, Mar 30, 2017 at 10:33 AM, Bill Bejeck <bbejeck@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> +1(non-binding)
> > > > > > >>
> > > > > > >> On Thu, Mar 30, 2017 at 1:30 PM, Eno Thereska <eno.thereska@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > +1 (non binding)
> > > > > > >> >
> > > > > > >> > Thanks
> > > > > > >> > Eno
> > > > > > >> > > On 30 Mar 2017, at 18:01, Matthias J. Sax <matthias@confluent.io>
> > > > > > >> > > wrote:
> > > > > > >> > >
> > > > > > >> > > +1
> > > > > > >> > >
> > > > > > >> > > On 3/30/17 3:46 AM, Damian Guy wrote:
> > > > > > >> > >> Hi All,
> > > > > > >> > >>
> > > > > > >> > >> I'd like to start the voting thread on KIP-134:
> > > > > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
> > > > > > >> > >>
> > > > > > >> > >> Thanks,
> > > > > > >> > >> Damian
> > > > > > >> > >>
> > > > > > >> > >
> > > > > > >> >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Guozhang Wang <wa...@gmail.com>.
Becket,

Thanks for the replies. Now I see you want to optimize with the heuristic
that "if I see a JoinGroup shortly after a rebalance is completed,
then likely there are more JoinGroups coming". I agree that it will help
with a single-instance console consumer for debugging etc., but it may
still be an issue in some unit test cases where we do have more than one
instance in the group. Plus, this logic seems to be more complicated.

On the other hand, even without KIP-134 we already have the issue today
that large consumer groups may trigger consecutive rebalances, which adds
long latency, so if users mistakenly set the config to 0 it
will not be worse than what we already have today. So to me, having this
config on the client side would not introduce any regression. In addition,
we can extend this mechanism to apply not only to generation 0 (i.e. the
first time the group is formed) but to any rebalance, since all of them
could be vulnerable to consecutive rebalances when there is a topic / member
change (e.g. for MM, a rebalance can take a long time to stabilize even
after it has been running for a while).
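
To make the trade-off concrete, here is a minimal Python sketch (not Kafka
code). It assumes a simplified coordinator model in which every JoinGroup
that arrives outside a wait window triggers its own rebalance, and every
JoinGroup that arrives inside the window extends it by the configured delay.
The function name and the max_wait cap are hypothetical and only serve the
illustration.

```python
def rebalance_completions(join_times, delay, max_wait):
    """Return the times at which rebalances complete in a simplified model.

    join_times: sorted arrival times (seconds) of JoinGroup requests.
    delay:      how long the coordinator waits after each join (0 = no wait).
    max_wait:   hypothetical cap on the total wait for a single rebalance.
    """
    completions = []
    i, n = 0, len(join_times)
    while i < n:
        start = join_times[i]
        cap = start + max_wait
        deadline = min(start + delay, cap)
        i += 1
        # A join that arrives before the deadline is batched into the same
        # rebalance and pushes the deadline out again, up to the cap.
        while i < n and join_times[i] < deadline:
            deadline = min(join_times[i] + delay, cap)
            i += 1
        completions.append(deadline)
    return completions


# 100 consumers joining 50 ms apart.
joins = [i * 0.05 for i in range(100)]
print(len(rebalance_completions(joins, 0.0, 30.0)))  # 100 (one per join)
print(len(rebalance_completions(joins, 2.0, 30.0)))  # 1 (all joins batched)
```

In this model a delay of 0 reproduces the worst case of one rebalance per
join, while a 2-second delay collapses the same burst into a single
rebalance at the cost of waiting 2 seconds after the last join.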


Guozhang





-- 
-- Guozhang

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Becket Qin <be...@gmail.com>.
Hi Guozhang,

Sorry for the confusion. I actually meant to always "complete" the rebalance
immediately when the first consumer joins the group, i.e. the
configurable delta only kicks in after the first rebalance.
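
A minimal Python sketch of this variant (the two steps are spelled out in the
quoted Jul 13 message further down) may help. It is an illustration only, not
coordinator code. The function name is hypothetical, and one behavior is an
assumption the thread does not state: a join that arrives outside any open
window is treated like a first member, so it rebalances immediately and opens
a new window.

```python
def variant_rebalances(join_times, delta):
    """Return the times at which rebalances fire under the variant.

    join_times: sorted arrival times (seconds) of JoinGroup requests.
    delta:      the configurable window length in seconds.
    """
    fired = []
    window_end = None  # joins arriving before this time are deferred
    pending = None     # time at which the deferred rebalance will fire
    for t in join_times:
        if window_end is not None and t < window_end:
            # Step 2: a join inside the window defers the rebalance to
            # t + delta, and later joins keep extending it the same way.
            pending = t + delta
            window_end = pending
        else:
            # The deferred rebalance, if any, fired before this join arrived.
            if pending is not None:
                fired.append(pending)
                pending = None
            # Step 1 (and the assumption above): rebalance immediately.
            fired.append(t)
            window_end = t + delta
    if pending is not None:
        fired.append(pending)
    return fired


print(variant_rebalances([0.0], 2.0))            # [0.0]
print(variant_rebalances([0.0, 0.5, 1.0], 2.0))  # [0.0, 3.0]
```

A single console consumer sees no delay, while a burst of joins costs one
immediate rebalance plus one deferred rebalance.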

My concern was actually not the frequent rebalances for the users,
but the pressure on the broker side when frequent rebalances happen. For
example, if a big consumer group with many consumers (e.g. ETL,
MM, Streams, etc.) misconfigures the initial rebalance delay to 0, it may
cause hundreds or even thousands of rebalances to occur back to back, which
will likely take quite a bit of bandwidth. I am a little worried about the
performance impact in that case. Although a request quota might help to
throttle the rebalances, that does not seem like the ideal solution.

Thanks,

Jiangjie (Becket) Qin


On Mon, Jul 17, 2017 at 2:02 PM, Guozhang Wang <wa...@gmail.com> wrote:

> Becket:
>
> I think the question is, when a single member joins an unknown group for
> the first time ever, whether we want to complete the rebalance immediately
> or not. It does not matter whether we want to "start" the rebalance, since
> even now, if the group coordinator is in the SyncGroup phase waiting for
> the consumers to send their SyncGroup requests and it then receives a
> JoinGroup request, it will still cancel the current rebalance and fall
> back to the beginning of PrepareRebalance.
>
> So with the configurable delta, if it does prevent the started rebalance
> from completing, then the console consumer will still be affected; if it
> does not prevent the started rebalance from completing, then we may still
> get consecutive rebalances, since the first rebalance will usually complete
> very quickly.
>
> I think having the configuration on the client side instead of on the
> broker side does not mean that users now need to worry about the config:
> with a default value of, say, 0, as long as they do not observe any
> consecutive rebalance issues they may never need to be aware of such
> configs at all. And for some higher-level clients like Streams, we may
> decide to change the default to be larger than 0, as they may be more
> likely to hit the issue.
>
>
> Greg:
>
> Regarding notifying users of too-frequent rebalances, I think it would be
> a better mechanism for users to monitor a certain metric (say, rebalance
> rate) than to watch the config? Under normal operation this rebalance rate
> should be 0 with only a rare spike from time to time; if there are
> continuous non-zero values for this metric, then users can be notified. And
> we can educate them about configuring their apps with the recommended
> values in the web docs accordingly?
>
>
> Guozhang
>
> On Thu, Jul 13, 2017 at 7:37 AM, Becket Qin <be...@gmail.com> wrote:
>
> > I am a little hesitant to add the configuration to the client. It would be
> > more flexible, but this does not seem like something users should worry
> > about (I imagine many people would simply set backoff to 0 just for fast
> > rebalance). I am wondering if the following variant of the current
> > solution will address the problem.
> >
> > 1. The broker will start to rebalance immediately when the first member
> > joins the group at T0.
> >
> > 2. If another member joins the group at T1, which is between T0 and T0 +
> > delta (configurable), the broker will wait until T1 + delta and then do
> > the rebalance. Any additional member joining before the rebalance kicks
> > off would delay the rebalance with the same extension logic as we have
> > now. We can also try some exponential backoff if needed.
> >
> > This should help address the console consumer problem. Not sure if there
> > are other cases that need to be considered, though.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Mon, Jul 10, 2017 at 5:28 PM, Greg Fodor <gf...@gmail.com> wrote:
> >
> > > Found this thread after posting an alternative idea after we started
> > > hitting this issue ourselves for a job that has a lot of state stores
> > > and topic partitions. My suggestion was to have consumer groups have a
> > > configurable minimum member count before consumption begins, but that
> > > has its own trade-offs and benefits (maybe a different KIP).
> > >
> > > One suggestion I had: maybe there is some relatively foolproof
> > > heuristic that can cause Kafka Streams to emit an INFO/WARN to the log
> > > to inform the user of the configuration if it detects a rapid rebalance
> > > on startup due to new nodes joining? For example, if Streams detects a
> > > rebalance, before processors are initialized, that only adds new nodes,
> > > and the configuration has not been overridden, write to the log?
> > >
> > >
> > >
> > > On Thu, Jun 8, 2017 at 2:56 PM, Guozhang Wang <wa...@gmail.com>
> > > wrote:
> > >
> > > > Just recapping on client-side vs. broker-side config: we did discuss
> > > > adding this as a client-side config and bumping up the join-group
> > > > request (I think both Ismael and Ewen asked about it) so that it
> > > > carries this configured value to the broker. I cannot remember if
> > > > there were any strong motivations against going to the client-side
> > > > config, except that we felt a default non-zero value will benefit most
> > > > users, assuming they start with more than one member in their group,
> > > > but only advanced users would really realize this config exists and
> > > > tune it themselves.
> > > >
> > > > I agree that we could reconsider it for the next release if we observe
> > > > that it is actually affecting more users than benefiting them.
> > > >
> > > > Guozhang
> > > >
> > > > On Wed, Jun 7, 2017 at 2:26 AM, Damian Guy <da...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Jun/Ismael,
> > > > >
> > > > > Sounds good to me.
> > > > >
> > > > > Thanks,
> > > > > Damian
> > > > >
> > > > > On Tue, 6 Jun 2017 at 23:08 Ismael Juma <is...@juma.me.uk> wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > The console consumer issue also came up in a conversation I was
> > > > > > having recently. Seems like the config/server.properties change is
> > > > > > a reasonable compromise given that we have other defaults that are
> > > > > > for development.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Tue, Jun 6, 2017 at 10:59 PM, Jun Rao <ju...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi, Everyone,
> > > > > > >
> > > > > > > Sorry for being late on this thread. I just came across this
> > > > > > > thread. I have a couple of concerns on this. (1) It seems the
> > > > > > > amount of delay will be application specific. So, it seems that
> > > > > > > it's better for the delay to be a client side config instead of
> > > > > > > a server side one? (2) When running console consumer in
> > > > > > > quickstart, a minimum of 3 sec delay seems to be a bad
> > > > > > > experience for our users.
> > > > > > >
> > > > > > > Since we are getting late into the release cycle, it may be a
> > > > > > > bit too late to make big changes in the 0.11 release. Perhaps we
> > > > > > > should at least consider overriding the delay in
> > > > > > > config/server.properties to 0 to improve the quickstart
> > > > > > > experience?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 11, 2017 at 12:19 AM, Damian Guy <damian.guy@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Onur,
> > > > > > > >
> > > > > > > > It was in my previous email. But here it is again.
> > > > > > > >
> > > > > > > > ============================================================
> > > > > > > >
> > > > > > > > 1. Better rebalance timing. We will try to rebalance only when
> > > > > > > > all the consumers in a group have joined. The challenge would
> > > > > > > > be someone has to define what does ALL consumers mean, it
> > > > > > > > could either be a time or number of consumers, etc.
> > > > > > > >
> > > > > > > > 2. Avoid frequent rebalance. For example, if there are 100
> > > > > > > > consumers in a group, today, in the worst case, we may end up
> > > > > > > > with 100 rebalances even if all the consumers joined the group
> > > > > > > > in a reasonably small amount of time. Frequent rebalance is
> > > > > > > > also a bad thing for brokers.
> > > > > > > >
> > > > > > > > Having a client side configuration may solve problem 1 better
> > > > > > > > because each consumer group can potentially configure their
> > > > > > > > own timing. However, it does not really prevent frequent
> > > > > > > > rebalance in general because some of the consumers can be
> > > > > > > > misconfigured. (This may have something to do with KIP-124 as
> > > > > > > > well. But if quota is applied on the JoinGroup/SyncGroup
> > > > > > > > request it may cause some unwanted cascading effects.)
> > > > > > > >
> > > > > > > > Having a broker side configuration may result in less
> > > > > > > > flexibility for each consumer group, but it can prevent
> > > > > > > > frequent rebalance better. I think with some reasonable
> > > > > > > > design, the rebalance timing issue can be resolved on the
> > > > > > > > broker side as well. Matthias had a good point on extending
> > > > > > > > the delay when a new consumer joins a group (we actually did
> > > > > > > > something similar to batch ISR change propagation). For
> > > > > > > > example, let's say on the broker side, we will always delay 2
> > > > > > > > seconds each time we see a new consumer joining a consumer
> > > > > > > > group. This would probably work for most of the consumer
> > > > > > > > groups and will also limit the rebalance frequency to protect
> > > > > > > > the brokers.
> > > > > > > >
> > > > > > > > I am not sure about the streams use case here, but if
> > > > > > > > something like 2 seconds of delay is acceptable for streams, I
> > > > > > > > would prefer adding the configuration to the broker so that we
> > > > > > > > can address both problems.
> > > > > > > >
> > > > > > > > On Thu, 6 Apr 2017 at 17:11 Onur Karaman <
> > > > > onurkaraman.apache@gmail.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Damian.
> > > > > > > > >
> > > > > > > > > Can you copy the point Becket made earlier that you say
> isn't
> > > > > > > addressed?
> > > > > > > > >
> > > > > > > > > On Thu, Apr 6, 2017 at 2:51 AM, Damian Guy <
> > > damian.guy@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks all, the Vote is now closed and the KIP has been
> > > > accepted
> > > > > > > with 9
> > > > > > > > > +1s
> > > > > > > > > >
> > > > > > > > > > 3 binding::
> > > > > > > > > > Guozhang,
> > > > > > > > > > Jason,
> > > > > > > > > > Ismael
> > > > > > > > > >
> > > > > > > > > > 6 non-binding:
> > > > > > > > > > Bill,
> > > > > > > > > > Eno,
> > > > > > > > > > Mathieu,
> > > > > > > > > > Matthias,
> > > > > > > > > > Dong,
> > > > > > > > > > Mickael
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Damian
> > > > > > > > > >
> > > > > > > > > > On Thu, 6 Apr 2017 at 09:26 Ismael Juma <
> ismael@juma.me.uk
> > >
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for the KIP, +1 (binding).
> > > > > > > > > > >
> > > > > > > > > > > Ismael
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Mar 30, 2017 at 8:55 PM, Jason Gustafson <
> > > > > > > jason@confluent.io
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > +1 Thanks for the KIP!
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 30, 2017 at 12:51 PM, Guozhang Wang <
> > > > > > > > wangguoz@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > +1
> > > > > > > > > > > > >
> > > > > > > > > > > > > Sorry about the previous email, Gmail seems be
> > > collapsing
> > > > > > them
> > > > > > > > > into a
> > > > > > > > > > > > > single thread on my inbox.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Guozhang
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Mar 30, 2017 at 11:34 AM, Guozhang Wang <
> > > > > > > > > wangguoz@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Damian, could you create a new thread for the
> > voting
> > > > > > process?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Guozhang
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Mar 30, 2017 at 10:33 AM, Bill Bejeck <
> > > > > > > > bbejeck@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> +1(non-binding)
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On Thu, Mar 30, 2017 at 1:30 PM, Eno Thereska <
> > > > > > > > > > > eno.thereska@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> > +1 (non binding)
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Thanks
> > > > > > > > > > > > > >> > Eno
> > > > > > > > > > > > > >> > > On 30 Mar 2017, at 18:01, Matthias J. Sax <
> > > > > > > > > > > matthias@confluent.io>
> > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > +1
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > On 3/30/17 3:46 AM, Damian Guy wrote:
> > > > > > > > > > > > > >> > >> Hi All,
> > > > > > > > > > > > > >> > >>
> > > > > > > > > > > > > >> > >> I'd like to start the voting thread on
> > KIP-134:
> > > > > > > > > > > > > >> > >>
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > > > > > >> > 134%3A+Delay+initial+consumer+group+rebalance
> > > > > > > > > > > > > >> > >>
> > > > > > > > > > > > > >> > >> Thanks,
> > > > > > > > > > > > > >> > >> Damian
> > > > > > > > > > > > > >> > >>
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > -- Guozhang
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > -- Guozhang
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Guozhang Wang <wa...@gmail.com>.
Becket:

I think the question is whether, when a single member joins an unknown
group for the first time ever, we want to complete the rebalance
immediately or not. Whether we "start" the rebalance does not matter:
even today, if the group coordinator is in the SyncGroup phase waiting for
the consumers to send their SyncGroup requests and it then receives a
JoinGroup request, it will still cancel the current rebalance and fall back
to the beginning of PrepareRebalance.

So with the configurable delta, if it does prevent the started rebalance
from completing, then the console consumer will still be affected; if it
does not prevent the started rebalance from completing, then we may still
get consecutive rebalances, since the first rebalance will usually complete
very quickly.

I think having the configuration on the client side instead of on the
broker side does not mean that users now need to worry about the config:
with a default value of, say, 0, as long as they do not observe any
consecutive-rebalance issues they may never need to be aware of such
configs at all. And for some higher-level clients like Streams, we may
decide to change the default to be larger than 0, as they may be more
likely to hit the issue.
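
To make the precedence concrete, here is a minimal sketch of how such
defaults could layer. Everything in it is an assumption for illustration:
the key name `initial.rebalance.delay.ms` is hypothetical (no such client
config exists), and the 3000 ms Streams default is just a placeholder
value.

```python
# Hypothetical key name, used only for illustration.
DELAY_KEY = "initial.rebalance.delay.ms"

CONSUMER_DEFAULTS = {DELAY_KEY: 0}     # plain consumers: no delay unless asked for
STREAMS_DEFAULTS = {DELAY_KEY: 3000}   # assumed larger default for Streams


def effective_delay_ms(user_config, is_streams=False):
    """Resolve the delay: user setting > Streams default > consumer default."""
    merged = dict(CONSUMER_DEFAULTS)
    if is_streams:
        merged.update(STREAMS_DEFAULTS)
    merged.update(user_config)         # explicit user settings always win
    return merged[DELAY_KEY]
```

A console consumer that sets nothing would resolve to 0, a Streams app
that sets nothing would pick up the larger default, and anyone who sets
the key explicitly keeps their own value.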


Greg:

Regarding notifying users about too-frequent rebalances, I think it
would be a better mechanism for users to monitor a metric (say,
rebalance rate) than to watch the config? Under normal operation this
rebalance rate should be 0 with only a rare spike from time to time; if
the metric shows continuous non-zero values, then users can be
notified. And we can point them to the recommended configuration values
in the web docs accordingly?
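
As a rough sketch of what "continuous non-zero values" could mean in
practice, assuming the application samples the metric once per window:
the class `RebalanceRateMonitor`, its parameters, and the "three
consecutive non-zero windows" threshold are all hypothetical and are not
part of any existing Kafka metrics API.

```python
from collections import deque


class RebalanceRateMonitor:
    """Tracks rebalances per second over a sliding window (timestamps in seconds)."""

    def __init__(self, window_s=60.0, max_nonzero_windows=3):
        self.window_s = window_s
        self.max_nonzero_windows = max_nonzero_windows
        self._events = deque()       # timestamps of observed rebalances
        self._nonzero_streak = 0     # consecutive samples with rate > 0

    def record_rebalance(self, now):
        self._events.append(now)

    def sample(self, now):
        """Return (rate, alert) for the window ending at `now`.

        Intended to be called once per window; `alert` turns True after
        max_nonzero_windows consecutive samples with a non-zero rate.
        """
        # Drop rebalances that fell out of the window.
        while self._events and self._events[0] <= now - self.window_s:
            self._events.popleft()
        rate = len(self._events) / self.window_s
        self._nonzero_streak = self._nonzero_streak + 1 if rate > 0 else 0
        return rate, self._nonzero_streak >= self.max_nonzero_windows
```

A single spike resets to zero on the next quiet window, so only a
sustained run of rebalances would notify the user.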


Guozhang

On Thu, Jul 13, 2017 at 7:37 AM, Becket Qin <be...@gmail.com> wrote:

> I am a little hesitant to add the configuration to the client. It would be
> more flexible, but this does not seem like something users should have to
> worry about (I imagine many people would simply set the backoff to 0 just
> for fast rebalance). I am wondering if the following variant of the current
> solution would address the problem.
>
> 1. The broker will start the rebalance immediately when the first member
> joins the group at T0.
>
> 2. If another member joins the group at T1, which is between T0 and T0 +
> delta (configurable), the broker will wait until T1 + delta and then do the
> rebalance. Any additional member joining before the rebalance kicks off
> would delay the rebalance further, with the same extension logic as
> we have now. We can also try some exponential backoff if needed.
>
> This should help address the console consumer problem. Not sure if there
> are other cases that need to be considered, though.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Jul 10, 2017 at 5:28 PM, Greg Fodor <gf...@gmail.com> wrote:
>
> > Found this thread after posting an alternative idea after we started
> > hitting this issue ourselves for a job that has a lot of state stores and
> > topic partitions. My suggestion was to have consumer groups have a
> > configurable minimum member count before consumption begins, but that has
> > its own trade-offs and benefits (maybe a different KIP).
> >
> > One suggestion I had is maybe there is some relatively fool-proof heuristic
> > that can cause Kafka Streams to emit an INFO/WARN to the log to inform the
> > user of the configuration if it detects a rapid rebalance on startup due to
> > new nodes joining? For example, if Streams detects a rebalance, before
> > processors are initialized, that only adds new nodes, and the configuration
> > has not been overridden, write to the log?
> >



-- 
-- Guozhang

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Becket Qin <be...@gmail.com>.
I am a little hesitant to add the configuration to the client. It would be
more flexible, but this does not seem like something users should have to
worry about (I imagine many people would simply set the backoff to 0 just
for fast rebalance). I am wondering if the following variant of the current
solution would address the problem.

1. The broker will start the rebalance immediately when the first member
joins the group at T0.

2. If another member joins the group at T1, which is between T0 and T0 +
delta (configurable), the broker will wait until T1 + delta and then do the
rebalance. Any additional member joining before the rebalance kicks off
would delay the rebalance further, with the same extension logic as
we have now. We can also try some exponential backoff if needed.
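
The two steps above can be sketched roughly as follows. This is only an
illustration of the timing rule, not broker code: the function name
`rebalance_times` is hypothetical, times are plain numbers in seconds,
the upper bound on extensions and the exponential backoff are left out,
and the handling of a member that joins after the window with no delayed
rebalance pending (immediate rebalance) is an assumption, since the steps
do not cover that case.

```python
def rebalance_times(join_times, delta):
    """Return a list of (time, group_size) pairs, one per rebalance."""
    joins = sorted(join_times)
    if not joins:
        return []
    t0 = joins[0]
    rebalances = [(t0, 1)]   # step 1: the first member is rebalanced at once
    deadline = None          # time of the pending delayed rebalance, if any
    members = 1
    for t in joins[1:]:
        if deadline is not None and t >= deadline:
            # The delayed rebalance already fired before this member arrived.
            rebalances.append((deadline, members))
            deadline = None
        members += 1
        if deadline is not None or t < t0 + delta:
            # Step 2: wait until t + delta, extending any pending delay.
            deadline = t + delta
        else:
            # Assumption: outside the window with nothing pending, no delay.
            rebalances.append((t, members))
    if deadline is not None:
        rebalances.append((deadline, members))
    return rebalances
```

With delta = 3, a lone console consumer joining at t = 0 is rebalanced
immediately, while members joining at t = 0, 1 and 2 produce two
rebalances (at 0 and at 5) instead of one per join.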

This should help address the console consumer problem. Not sure if there
are other cases that need to be considered, though.

Thanks,

Jiangjie (Becket) Qin

On Mon, Jul 10, 2017 at 5:28 PM, Greg Fodor <gf...@gmail.com> wrote:

> Found this thread after posting an alternative idea after we started
> hitting this issue ourselves for a job that has a lot of state stores and
> topic partitions. My suggestion was to have consumer groups have a
> configurable minimum member count before consumption begins, but that has
> its own trade-offs and benefits (maybe a different KIP).
>
> One suggestion I had is maybe there is some relatively fool-proof heuristic
> that can cause Kafka Streams to emit an INFO/WARN to the log to inform the
> user of the configuration if it detects a rapid rebalance on startup due to
> new nodes joining? For example, if Streams detects a rebalance, before
> processors are initialized, that only adds new nodes, and the configuration
> has not been overridden, write to the log?
>

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Greg Fodor <gf...@gmail.com>.
Found this thread after posting an alternative idea after we started
hitting this issue ourselves for a job that has a lot of state stores and
topic partitions. My suggestion was to have consumer groups have a
configurable minimum member count before consumption begins, but that has
its own trade-offs and benefits (maybe a different KIP).

One suggestion I had is maybe there is some relatively fool-proof heuristic
that can cause Kafka Streams to emit an INFO/WARN to the log to inform the
user of the configuration if it detects a rapid rebalance on startup due to
new nodes joining? For example, if Streams detects a rebalance, before
processors are initialized, that only adds new nodes, and the configuration
has not been overridden, write to the log?
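
Something like the following is what I have in mind, as a sketch only.
The function name `should_warn_rapid_rebalance` and its inputs are
hypothetical; it assumes the client can see the member set before and
after the rebalance and knows whether the delay config was overridden.

```python
import logging

log = logging.getLogger("rapid-rebalance-check")


def should_warn_rapid_rebalance(prev_members, new_members,
                                processors_initialized, delay_overridden):
    """True if a startup rebalance only added members and the delay is at its default."""
    prev, new = set(prev_members), set(new_members)
    only_added = prev < new   # strict subset: members were added, none removed
    return only_added and not processors_initialized and not delay_overridden


def check_and_log(prev_members, new_members, processors_initialized, delay_overridden):
    """Log a hint when the heuristic fires; returns whether it fired."""
    fired = should_warn_rapid_rebalance(
        prev_members, new_members, processors_initialized, delay_overridden)
    if fired:
        log.warning("Rebalance during startup only added members; "
                    "consider raising the initial rebalance delay.")
    return fired
```

A rebalance that removes a member, or one that happens after processors
are initialized, would not trigger the message, which should keep the
false positives down.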



On Thu, Jun 8, 2017 at 2:56 PM, Guozhang Wang <wa...@gmail.com> wrote:

> Just recapping on client-side vs. broker-side config: we did discuss
> adding this as a client-side config and bumping up the join-group request (I
> think both Ismael and Ewen asked about it) to pass this configured value
> to the broker. I cannot remember any strong motivation against
> going with the client-side config, except that we felt a default non-zero
> value would benefit most users, assuming they start with more than one member
> in their group, while only advanced users would really notice that this
> config exists and tune it themselves.
>
> I agree that we could re-consider it for the next release if we observe
> that it is actually affecting more users than benefiting them.
>
> Guozhang
>
> On Wed, Jun 7, 2017 at 2:26 AM, Damian Guy <da...@gmail.com> wrote:
>
> > Hi Jun/Ismael,
> >
> > Sounds good to me.
> >
> > Thanks,
> > Damian
> >
> > On Tue, 6 Jun 2017 at 23:08 Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > Hi Jun,
> > >
> > > The console consumer issue also came up in a conversation I was having
> > > recently. Seems like the config/server.properties change is a
> reasonable
> > > compromise given that we have other defaults that are for development.
> > >
> > > Ismael
> > >
> > > On Tue, Jun 6, 2017 at 10:59 PM, Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Everyone,
> > > >
> > > > Sorry for being late on this thread. I just came across this thread.
> I
> > > have
> > > > a couple of concerns on this. (1) It seems the amount of delay will
> be
> > > > application specific. So, it seems that it's better for the delay to
> > be a
> > > > client side config instead of a server side one? (2) When running
> > console
> > > > consumer in quickstart, a minimum of 3 sec delay seems to be a bad
> > > > experience for our users.
> > > >
> > > > Since we are getting late into the release cycle, it may be a bit too
> > > late
> > > > to make big changes in the 0.11 release. Perhaps we should at least
> > > > consider overriding the delay in config/server.properties to 0 to
> > improve
> > > > the quickstart experience?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Tue, Apr 11, 2017 at 12:19 AM, Damian Guy <da...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Onur,
> > > > >
> > > > > It was in my previous email. But here it is again.
> > > > >
> > > > > ============================================================
> > > > >
> > > > > 1. Better rebalance timing. We will try to rebalance only when all
> > the
> > > > > consumers in a group have joined. The challenge would be someone
> has
> > to
> > > > > define what does ALL consumers mean, it could either be a time or
> > > number
> > > > of
> > > > > consumers, etc.
> > > > >
> > > > > 2. Avoid frequent rebalance. For example, if there are 100
> consumers
> > > in a
> > > > > group, today, in the worst case, we may end up with 100 rebalances
> > even
> > > > if
> > > > > all the consumers joined the group in a reasonably small amount of
> > > time.
> > > > > Frequent rebalance is also a bad thing for brokers.
> > > > >
> > > > > Having a client side configuration may solve problem 1 better
> because
> > > > each
> > > > > consumer group can potentially configure their own timing. However,
> > it
> > > > does
> > > > > not really prevent frequent rebalance in general because some of
> the
> > > > > consumers can be misconfigured. (This may have something to do with
> > > > KIP-124
> > > > > as well. But if quota is applied on the JoinGroup/SyncGroup request
> > it
> > > > may
> > > > > cause some unwanted cascading effects.)
> > > > >
> > > > > Having a broker side configuration may result in less flexibility
> for
> > > > each
> > > > > consumer group, but it can prevent frequent rebalance better. I
> think
> > > > with
> > > > > some reasonable design, the rebalance timing issue can be resolved
> on
> > > the
> > > > > broker side as well. Matthias had a good point on extending the
> delay
> > > > when
> > > > > a new consumer joins a group (we actually did something similar to
> > > batch
> > > > > ISR change propagation). For example, let's say on the broker side,
> > we
> > > > will
> > > > > always delay 2 seconds each time we see a new consumer joining a
> > > consumer
> > > > > group. This would probably work for most of the consumer groups and
> > > will
> > > > > also limit the rebalance frequency to protect the brokers.
> > > > >
> > > > > I am not sure about the streams use case here, but if something
> like
> > 2
> > > > > seconds of delay is acceptable for streams, I would prefer adding
> the
> > > > > configuration to the broker so that we can address both problems.
> > > > >
> > > > > On Thu, 6 Apr 2017 at 17:11 Onur Karaman <
> > onurkaraman.apache@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Damian.
> > > > > >
> > > > > > Can you copy the point Becket made earlier that you say isn't
> > > > addressed?
> > > > > >
> > > > > > On Thu, Apr 6, 2017 at 2:51 AM, Damian Guy <damian.guy@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Thanks all, the Vote is now closed and the KIP has been
> accepted
> > > > with 9
> > > > > > +1s
> > > > > > >
> > > > > > > 3 binding::
> > > > > > > Guozhang,
> > > > > > > Jason,
> > > > > > > Ismael
> > > > > > >
> > > > > > > 6 non-binding:
> > > > > > > Bill,
> > > > > > > Eno,
> > > > > > > Mathieu,
> > > > > > > Matthias,
> > > > > > > Dong,
> > > > > > > Mickael
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Damian
> > > > > > >
> > > > > > > On Thu, 6 Apr 2017 at 09:26 Ismael Juma <is...@juma.me.uk>
> > wrote:
> > > > > > >
> > > > > > > > Thanks for the KIP, +1 (binding).
> > > > > > > >
> > > > > > > > Ismael
> > > > > > > >
> > > > > > > > On Thu, Mar 30, 2017 at 8:55 PM, Jason Gustafson <
> > > > jason@confluent.io
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 Thanks for the KIP!
> > > > > > > > >
> > > > > > > > > On Thu, Mar 30, 2017 at 12:51 PM, Guozhang Wang <
> > > > > wangguoz@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1
> > > > > > > > > >
> > > > > > > > > > Sorry about the previous email, Gmail seems be collapsing
> > > them
> > > > > > into a
> > > > > > > > > > single thread on my inbox.
> > > > > > > > > >
> > > > > > > > > > Guozhang
> > > > > > > > > >
> > > > > > > > > > On Thu, Mar 30, 2017 at 11:34 AM, Guozhang Wang <
> > > > > > wangguoz@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Damian, could you create a new thread for the voting
> > > process?
> > > > > > > > > > >
> > > > > > > > > > > Thanks!
> > > > > > > > > > >
> > > > > > > > > > > Guozhang
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Mar 30, 2017 at 10:33 AM, Bill Bejeck <
> > > > > bbejeck@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> +1(non-binding)
> > > > > > > > > > >>
> > > > > > > > > > >> On Thu, Mar 30, 2017 at 1:30 PM, Eno Thereska <
> > > > > > > > eno.thereska@gmail.com
> > > > > > > > > >
> > > > > > > > > > >> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> > +1 (non binding)
> > > > > > > > > > >> >
> > > > > > > > > > >> > Thanks
> > > > > > > > > > >> > Eno
> > > > > > > > > > >> > > On 30 Mar 2017, at 18:01, Matthias J. Sax <
> > > > > > > > matthias@confluent.io>
> > > > > > > > > > >> wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > +1
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > On 3/30/17 3:46 AM, Damian Guy wrote:
> > > > > > > > > > >> > >> Hi All,
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> I'd like to start the voting thread on KIP-134:
> > > > > > > > > > >> > >>
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > > >> > 134%3A+Delay+initial+consumer+group+rebalance
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Thanks,
> > > > > > > > > > >> > >> Damian
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >
> > > > > > > > > > >> >
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > -- Guozhang
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > -- Guozhang
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Guozhang Wang <wa...@gmail.com>.
Just recapping on client-side vs. broker-side config: we did discuss
adding this as a client-side config and bumping up the join-group request (I
think both Ismael and Ewen asked about it) to pass this configured value
to the broker. I cannot remember whether there were any strong motivations
against going with the client-side config, except that we felt a default
non-zero value would benefit most users, assuming they start with more than
one member in their group, whereas only advanced users would realize this
config exists and tune it themselves.

I agree that we could reconsider it for the next release if we observe
that it is actually hurting more users than it benefits.

Guozhang


-- 
-- Guozhang

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Damian Guy <da...@gmail.com>.
Hi Jun/Ismael,

Sounds good to me.

Thanks,
Damian


Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Ismael Juma <is...@juma.me.uk>.
Hi Jun,

The console consumer issue also came up in a conversation I was having
recently. Seems like the config/server.properties change is a reasonable
compromise given that we have other defaults that are for development.
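
As a rough sketch of what that compromise amounts to: the quickstart
config/server.properties would carry an explicit zero for the delay, while a
broker without that line falls back to the built-in default. The key name
group.initial.rebalance.delay.ms and the 3000 ms default are taken from
KIP-134 rather than from this message, and the class QuickstartDelayConfig
is purely illustrative (it is not how the broker parses its config).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative only; the broker's real config handling is not shown here. */
final class QuickstartDelayConfig {
    // Broker setting introduced by KIP-134 (name assumed from the KIP).
    static final String DELAY_KEY = "group.initial.rebalance.delay.ms";
    // Default delay assumed from the "3 sec" mentioned in the thread.
    static final long ASSUMED_DEFAULT_MS = 3000L;

    private QuickstartDelayConfig() {}

    /** Returns the delay a broker would use given the text of a properties file. */
    static long effectiveDelayMs(String serverPropertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(serverPropertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(DELAY_KEY);
        return value == null ? ASSUMED_DEFAULT_MS : Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        // Quickstart file with the override: a single console consumer joins at once.
        String quickstart = "broker.id=0\n" + DELAY_KEY + "=0\n";
        // A file without the override keeps the non-zero default.
        String elsewhere = "broker.id=0\n";
        System.out.println(effectiveDelayMs(quickstart));
        System.out.println(effectiveDelayMs(elsewhere));
    }
}
```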

Ismael


Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Ismael Juma <is...@juma.me.uk>.
I forgot to address the broker versus client-side point that Jun raised. We
discussed passing the delay via the request and we were unable to make a
strong case for it. It seems like tools that use a single consumer and
group management (like the Console Consumer) are such a case though.
Changing the default in server.properties helps with quick start, but
doesn't help if the broker is configured elsewhere (say a Cloud-like
environment).

It's probably too late for this release, but we should consider it for the
next release.

Ismael

On Tue, Jun 6, 2017 at 10:59 PM, Jun Rao <ju...@confluent.io> wrote:

> Hi, Everyone,
>
> Sorry for being late on this thread. I just came across this thread. I have
> a couple of concerns on this. (1) It seems the amount of delay will be
> application specific. So, it seems that it's better for the delay to be a
> client side config instead of a server side one? (2) When running console
> consumer in quickstart, a minimum of 3 sec delay seems to be a bad
> experience for our users.
>
> Since we are getting late into the release cycle, it may be a bit too late
> to make big changes in the 0.11 release. Perhaps we should at least
> consider overriding the delay in config/server.properties to 0 to improve
> the quickstart experience?
>
> Thanks,
>
> Jun
>
>
> On Tue, Apr 11, 2017 at 12:19 AM, Damian Guy <da...@gmail.com> wrote:
>
> > Hi Onur,
> >
> > It was in my previous email. But here it is again.
> >
> > ============================================================
> >
> > 1. Better rebalance timing. We will try to rebalance only when all the
> > consumers in a group have joined. The challenge would be someone has to
> > define what does ALL consumers mean, it could either be a time or number
> of
> > consumers, etc.
> >
> > 2. Avoid frequent rebalance. For example, if there are 100 consumers in a
> > group, today, in the worst case, we may end up with 100 rebalances even
> if
> > all the consumers joined the group in a reasonably small amount of time.
> > Frequent rebalance is also a bad thing for brokers.
> >
> > Having a client side configuration may solve problem 1 better because
> each
> > consumer group can potentially configure their own timing. However, it
> does
> > not really prevent frequent rebalance in general because some of the
> > consumers can be misconfigured. (This may have something to do with
> KIP-124
> > as well. But if quota is applied on the JoinGroup/SyncGroup request it
> may
> > cause some unwanted cascading effects.)
> >
> > Having a broker side configuration may result in less flexibility for
> > each consumer group, but it can prevent frequent rebalance better. I
> > think with some reasonable design, the rebalance timing issue can be
> > resolved on the broker side as well. Matthias had a good point on
> > extending the delay when a new consumer joins a group (we actually did
> > something similar to batch ISR change propagation). For example, let's
> > say on the broker side, we will always delay 2 seconds each time we see a
> > new consumer joining a consumer group. This would probably work for most
> > of the consumer groups and will also limit the rebalance frequency to
> > protect the brokers.
> >
> > I am not sure about the streams use case here, but if something like 2
> > seconds of delay is acceptable for streams, I would prefer adding the
> > configuration to the broker so that we can address both problems.
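
To make the quoted trade-off concrete, here is a rough simulation of the "extend the delay each time a new consumer joins" idea. It is a simplified model under stated assumptions, not the broker's actual coordinator logic: the function name count_rebalances and the optional max_wait cap are hypothetical, and join times are plain seconds.

```python
def count_rebalances(join_times, delay, max_wait=None):
    """Count rebalances in a simplified model of a delayed initial rebalance.

    join_times: times (in seconds) at which consumers join the group.
    delay:      how long the broker waits after each new join (assumed model).
    max_wait:   optional, hypothetical cap on the total wait per rebalance.
    """
    rebalances = 0
    window_start = None  # time of the first join in the pending window
    deadline = None      # time at which the pending rebalance would fire

    for t in sorted(join_times):
        if deadline is not None and t < deadline:
            # A new consumer arrived before the pending rebalance fired:
            # push the deadline out by another `delay`.
            deadline = t + delay
            if max_wait is not None:
                deadline = min(deadline, window_start + max_wait)
        else:
            # Any pending rebalance has already fired; start a new window.
            if deadline is not None:
                rebalances += 1
            window_start = t
            deadline = t + delay

    if deadline is not None:
        rebalances += 1  # the last pending rebalance eventually fires
    return rebalances


if __name__ == "__main__":
    # 100 consumers joining 50 ms apart, as in the worst case described above.
    joins = [i * 0.05 for i in range(100)]
    print("no delay:", count_rebalances(joins, delay=0.0))
    print("2s delay:", count_rebalances(joins, delay=2.0))
```

In this model, with no delay every join triggers its own rebalance, while a 2 second delay folds all 100 joins into a single one. The price is that even a lone consumer waits out the delay, which is the quickstart concern raised above.
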
> >
> > On Thu, 6 Apr 2017 at 17:11 Onur Karaman <on...@gmail.com>
> > wrote:
> >
> > > Hi Damian.
> > >
> > > > Can you copy the point Becket made earlier that you say isn't
> > > > addressed?
> > >
> > > > On Thu, Apr 6, 2017 at 2:51 AM, Damian Guy <da...@gmail.com> wrote:
> > >
> > > > > Thanks all, the Vote is now closed and the KIP has been accepted
> > > > > with 9 +1s
> > > >
> > > > > 3 binding:
> > > > Guozhang,
> > > > Jason,
> > > > Ismael
> > > >
> > > > 6 non-binding:
> > > > Bill,
> > > > Eno,
> > > > Mathieu,
> > > > Matthias,
> > > > Dong,
> > > > Mickael
> > > >
> > > > Thanks,
> > > > Damian
> > > >
> > > > On Thu, 6 Apr 2017 at 09:26 Ismael Juma <is...@juma.me.uk> wrote:
> > > >
> > > > > Thanks for the KIP, +1 (binding).
> > > > >
> > > > > Ismael
> > > > >
> > > > > > > On Thu, Mar 30, 2017 at 8:55 PM, Jason Gustafson
> > > > > > > <jason@confluent.io> wrote:
> > > > >
> > > > > > +1 Thanks for the KIP!
> > > > > >
> > > > > > > > On Thu, Mar 30, 2017 at 12:51 PM, Guozhang Wang
> > > > > > > > <wangguoz@gmail.com> wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > > > Sorry about the previous email, Gmail seems to be collapsing
> > > > > > > > > them into a single thread in my inbox.
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > > > > On Thu, Mar 30, 2017 at 11:34 AM, Guozhang Wang
> > > > > > > > > <wangguoz@gmail.com> wrote:
> > > > > > >
> > > > > > > > Damian, could you create a new thread for the voting process?
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > Guozhang
> > > > > > > >
> > > > > > > > > > On Thu, Mar 30, 2017 at 10:33 AM, Bill Bejeck
> > > > > > > > > > <bbejeck@gmail.com> wrote:
> > > > > > > >
> > > > > > > >> +1(non-binding)
> > > > > > > >>
> > > > > > > > > >> On Thu, Mar 30, 2017 at 1:30 PM, Eno Thereska
> > > > > > > > > >> <eno.thereska@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> > +1 (non binding)
> > > > > > > >> >
> > > > > > > >> > Thanks
> > > > > > > >> > Eno
> > > > > > > > > >> > > On 30 Mar 2017, at 18:01, Matthias J. Sax
> > > > > > > > > >> > > <matthias@confluent.io> wrote:
> > > > > > > >> > >
> > > > > > > >> > > +1
> > > > > > > >> > >
> > > > > > > >> > > On 3/30/17 3:46 AM, Damian Guy wrote:
> > > > > > > >> > >> Hi All,
> > > > > > > >> > >>
> > > > > > > >> > >> I'd like to start the voting thread on KIP-134:
> > > > > > > > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
> > > > > > > >> > >>
> > > > > > > >> > >> Thanks,
> > > > > > > >> > >> Damian
> > > > > > > >> > >>
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Guozhang
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>