Posted to users@kafka.apache.org by Marcos Juarez <mj...@gmail.com> on 2019/01/24 17:51:10 UTC

Kafka consumer configuration to minimize rebalance time

One of our internal customers is working on a service that spans around 120
Kubernetes pods.  Due to design constraints, every one of these pods has a
single Kafka consumer, and they're all using the same consumer group id.
Since it's Kubernetes, and the service is sized according to volume
Since it's kubernetes, and the service is sized according to volume
throughout the day, pods are added/removed constantly, at least a few times
per hour.

What we are seeing with initial testing is that, whenever a single pod
joins or leaves the consumer group, it triggers a rebalance that sometimes
takes 60+ seconds to resolve.  Consumption resumes after the
rebalance event, but of course there's now a 60+ second lag in consumption
for that topic.  Whenever there's a code deploy to these pods, and we need
to re-create all 120 pods, the problem seems to be exacerbated, and we run
into rebalances taking 200+ seconds.  This particular service is somewhat
sensitive to lag, so we'd like to keep the rebalance time to a minimum.

With that context, which Kafka configs should we focus on, on the consumer
side (and maybe the broker side?), to minimize the time spent on the
rebalance?

Thanks,

Marcos Juarez

Re: Kafka consumer configuration to minimize rebalance time

Posted by Harsha Chintalapani <ka...@harsha.io>.
Hi Marcos,

I think what you need is static membership, which reduces the number of rebalances required. There is active discussion and work under way on this KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A+Introduce+static+membership+protocol+to+reduce+consumer+rebalances

-Harsha
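
For illustration only, here is a minimal sketch of what opting into static membership can look like on the consumer side. KIP-345 was still under discussion when this thread was written; it later shipped in Apache Kafka 2.3, where the consumer setting is named group.instance.id. The helper name static_member_config, the POD_NAME environment variable, the broker address, the group name, and the 60-second session timeout are all assumptions made for this example and do not come from the thread. The sketch only builds a plain dict keyed by Kafka config names; passing it to a real client is left out.

```python
import os


def static_member_config(bootstrap_servers, group_id, instance_id,
                         session_timeout_ms=60000):
    """Build a consumer config dict that opts into static membership.

    Hypothetical helper: the keys are standard Kafka consumer config names,
    but the function itself is only an illustration.
    """
    if not instance_id:
        # Without an instance id the consumer is an ordinary dynamic member.
        raise ValueError("instance_id must be a non-empty, stable identifier")
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        # Must be unique per consumer and stable across restarts of the
        # same logical instance.
        "group.instance.id": instance_id,
        # A static member that comes back within this window keeps its
        # assignment without triggering a rebalance (value is an assumption).
        "session.timeout.ms": session_timeout_ms,
    }


# Assumption: each pod exposes a stable name (for example via a StatefulSet
# and the Downward API) in a POD_NAME environment variable.
config = static_member_config(
    "broker:9092",
    "example-group",
    os.environ.get("POD_NAME", "consumer-0"),
)
```

The trade-off is that a static member that really is gone is only removed from the group once session.timeout.ms expires, so its partitions sit idle for that long. The id also has to survive restarts, which pods created by a plain Deployment do not give you by default, since their names change on every re-creation.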
