Posted to dev@kafka.apache.org by Mathieu Fenniak <ma...@replicon.com> on 2017/04/03 13:53:00 UTC

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

+1 (non-binding)

This will be very helpful for me, looking forward to it! :-)

On Thu, Mar 30, 2017 at 4:46 AM, Damian Guy <da...@gmail.com> wrote:

> Hi All,
>
> I'd like to start the voting thread on KIP-134:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
>
> Thanks,
> Damian
>

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Damian Guy <da...@gmail.com>.
Hi Onur,

Thanks for the update. I misunderstood what you said before. I believe what
you are suggesting sounds OK, though I don't think it addresses the point
Becket made earlier in the discussion thread. See below.

Thanks,
Damian

============================================================

1. Better rebalance timing. We will try to rebalance only when all the
consumers in a group have joined. The challenge is that someone has to
define what ALL consumers means; it could be a time, a number of
consumers, etc.

2. Avoid frequent rebalance. For example, if there are 100 consumers in a
group, today, in the worst case, we may end up with 100 rebalances even if
all the consumers joined the group in a reasonably small amount of time.
Frequent rebalance is also a bad thing for brokers.

Having a client side configuration may solve problem 1 better because each
consumer group can potentially configure its own timing. However, it does
not really prevent frequent rebalances in general because some of the
consumers can be misconfigured. (This may have something to do with KIP-124
as well. But if a quota is applied to the JoinGroup/SyncGroup requests, it
may cause some unwanted cascading effects.)

Having a broker side configuration may result in less flexibility for each
consumer group, but it can prevent frequent rebalance better. I think with
some reasonable design, the rebalance timing issue can be resolved on the
broker side as well. Matthias had a good point on extending the delay when
a new consumer joins a group (we actually did something similar to batch
ISR change propagation). For example, let's say on the broker side, we will
always delay 2 seconds each time we see a new consumer joining a consumer
group. This would probably work for most of the consumer groups and will
also limit the rebalance frequency to protect the brokers.

I am not sure about the streams use case here, but if something like 2
seconds of delay is acceptable for streams, I would prefer adding the
configuration to the broker so that we can address both problems.
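
As a rough illustration of the delay-extension idea above (this is not code
from the KIP; the function name and the millisecond values are made up for
the sketch), each join restarts a timer, and a rebalance fires only once the
timer expires with no further joins. With a delay of 0 this reduces to the
worst case of one rebalance per join described in point 2.

```python
def count_rebalances(join_times_ms, delay_ms):
    """Count rebalances when every join restarts a delay_ms timer.

    Hypothetical model: a rebalance fires once delay_ms has elapsed
    without a new member joining. delay_ms == 0 means no batching.
    """
    rebalances = 0
    deadline = None  # time at which the pending rebalance would fire
    for t in sorted(join_times_ms):
        if deadline is not None and t >= deadline:
            rebalances += 1        # timer expired before this join arrived
        deadline = t + delay_ms    # (re)start the timer for this join
    if deadline is not None:
        rebalances += 1            # the last pending rebalance fires
    return rebalances


# 100 consumers joining 50 ms apart (an assumed arrival pattern).
joins = [i * 50 for i in range(100)]
no_delay = count_rebalances(joins, 0)
with_delay = count_rebalances(joins, 2000)
```

Note that this sketch lets the delay extend without bound as long as joins
keep arriving; a real coordinator would presumably want an upper limit.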

On Mon, 3 Apr 2017 at 21:41 Onur Karaman <on...@gmail.com> wrote:

Delaying the SyncGroupRequest is not what I had in mind.

What I was thinking was essentially a client-side stabilization window
where the client does nothing other than participate in the group
membership protocol and wait a bit for the group to stabilize.

During this window, several rounds of rebalance can take place. Clients
would participate in these rebalances (they'd get notified of the rebalance
from the heartbeats they've been sending during this stabilization window),
but they would not run ConsumerRebalanceListener.onPartitionsAssigned or
process messages until the window has closed, or until the rebalance
finishes if the window ends during a rebalance.

So something like:
T0: client A is processing messages
T1: new client B joins
T2: client A gets notified and rejoins the group.
T3: rebalance finishes with the group consisting of A and B. This is where
the stabilization window begins for both A and B. Stabilization window
duration is W.
T4: new client C joins.
T5: clients A and B get notified and they rejoin the group.
T6: rebalance finishes with the group consisting of A, B, and C.
T3+W: clients A, B, and C finally run their
ConsumerRebalanceListener.onPartitionsAssigned and begin processing
messages.

If T3+W falls in the middle of a rebalance, then we wait until that
rebalance round finishes. Otherwise, we just run the
ConsumerRebalanceListener.onPartitionsAssigned and begin processing
messages.
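
The rule in the timeline above can be sketched as a small function. This is
only an illustration under assumed names: the function, its parameters, and
the abstract time units are hypothetical and not part of any Kafka API. It
returns the time at which ConsumerRebalanceListener.onPartitionsAssigned
would run, given when the window started, its duration W, and the
(start, end) intervals of the rebalance rounds.

```python
def assignment_callback_time(window_start, window, rebalances):
    """Return when onPartitionsAssigned would run under the proposed rule.

    window_start: time the first rebalance finished (T3 in the timeline)
    window:       stabilization window duration (W)
    rebalances:   list of (start, end) intervals of later rebalance rounds
    """
    window_end = window_start + window
    for start, end in rebalances:
        if start <= window_end < end:
            # The window closed mid-rebalance: wait for that round to finish.
            return end
    # No rebalance in progress when the window closes: run immediately.
    return window_end


# T3 = 3, and a second rebalance runs from T4 = 4 to T6 = 6.
after_window = assignment_callback_time(3, 5, [(4, 6)])    # window ends at 8
mid_rebalance = assignment_callback_time(3, 2, [(4, 6)])   # window ends at 5
```

In the first call the window outlasts the second rebalance, so the callback
runs at T3+W; in the second the window ends during that rebalance, so the
callback waits until the round finishes.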

On Mon, Apr 3, 2017 at 11:40 AM, Becket Qin <be...@gmail.com> wrote:

> Hey Onur,
>
> Are you suggesting letting the consumers hold back on sending
> SyncGroupRequest on the first rebalance? I am not sure how exactly that
> works. But it looks like having the group coordinator control the
> rebalance progress would be clearer and probably safer than letting the
> group members guess the state of a group. Can you elaborate a little bit
> on your idea?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Apr 3, 2017 at 8:16 AM, Onur Karaman <onurkaraman.apache@gmail.com>
> wrote:
>
> > Hi Damian.
> >
> > After reading the discussion thread again, it still doesn't seem like the
> > thread discussed the option I mentioned earlier.
> >
> > What I had understood from the broker-side vs. client-side config
> > debate was that the client-side config from the discussion would cause a
> > wire format change, while the client-side config change that I had
> > suggested would not.
> >
> > I just want to make sure we don't accidentally skip past it due to a
> > potential misunderstanding.
> >
> > On Mon, Apr 3, 2017 at 8:10 AM, Bill Bejeck <bb...@gmail.com> wrote:
> >
> > > +1 (non-binding)
> > >
> > > On Mon, Apr 3, 2017 at 9:53 AM, Mathieu Fenniak <mathieu.fenniak@replicon.com> wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > This will be very helpful for me, looking forward to it! :-)
> > > >
> > > > On Thu, Mar 30, 2017 at 4:46 AM, Damian Guy <da...@gmail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I'd like to start the voting thread on KIP-134:
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
> > > > >
> > > > > Thanks,
> > > > > Damian
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Onur Karaman <on...@gmail.com>.
Delaying the SyncGroupRequest is not what I had in mind.

What I was thinking was essentially a client-side stabilization window
where the client does nothing other than participate in the group
membership protocol and wait a bit for the group to stabilize.

During this window, several rounds of rebalance can take place. Clients
would participate in these rebalances (they'd get notified of the rebalance
from the heartbeats they've been sending during this stabilization window),
but they would not run ConsumerRebalanceListener.onPartitionsAssigned or
process messages until the window has closed, or until the rebalance
finishes if the window ends during a rebalance.

So something like:
T0: client A is processing messages
T1: new client B joins
T2: client A gets notified and rejoins the group.
T3: rebalance finishes with the group consisting of A and B. This is where
the stabilization window begins for both A and B. Stabilization window
duration is W.
T4: new client C joins.
T5: clients A and B get notified and they rejoin the group.
T6: rebalance finishes with the group consisting of A, B, and C.
T3+W: clients A, B, and C finally run their
ConsumerRebalanceListener.onPartitionsAssigned and begin processing
messages.

If T3+W falls in the middle of a rebalance, then we wait until that
rebalance round finishes. Otherwise, we just run the
ConsumerRebalanceListener.onPartitionsAssigned and begin processing
messages.

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Becket Qin <be...@gmail.com>.
Hey Onur,

Are you suggesting letting the consumers hold back on sending
SyncGroupRequest on the first rebalance? I am not sure how exactly that
works. But it looks like having the group coordinator control the
rebalance progress would be clearer and probably safer than letting the
group members guess the state of a group. Can you elaborate a little bit
on your idea?

Thanks,

Jiangjie (Becket) Qin

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Onur Karaman <on...@gmail.com>.
Hi Damian.

After reading the discussion thread again, it still doesn't seem like the
thread discussed the option I mentioned earlier.

What I had understood from the broker-side vs. client-side config
debate was that the client-side config from the discussion would cause a
wire format change, while the client-side config change that I had
suggested would not.

I just want to make sure we don't accidentally skip past it due to a
potential misunderstanding.

Re: [VOTE] KIP-134: Delay initial consumer group rebalance

Posted by Bill Bejeck <bb...@gmail.com>.
+1 (non-binding)
