Posted to users@kafka.apache.org by Robert Conrad <ro...@crunchbase.com> on 2016/12/11 19:39:44 UTC

Struggling with Kafka Streams rebalances under load / in production

Hi All,

I have a relatively complex streaming application that seems to struggle
terribly with rebalance issues while under load. Does anyone have any tips
for investigating what is triggering these frequent rebalances or
particular settings I could experiment with to try to eliminate them?

Originally I thought it had to do with exceeding the heartbeat timeout with
heavy work threads, but the 0.10.1 release solved that by adding the background
heartbeat thread
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread>.
Now rebalances just seem to strike randomly, with no insight into
what triggered them (all nodes are happy, everything seems to be running
smoothly).

Any help or insight is greatly appreciated!

Rob

Re: Struggling with Kafka Streams rebalances under load / in production

Posted by Guozhang Wang <wa...@gmail.com>.
Robert,

To confirm that a rebalance happened, check the server-side (broker) logs
for entries starting with "Preparing to restabilize group %s with old
generation..". If the rebalance was triggered by a detected consumer
failure, there will be entries like "Member XX in group YY has failed"
before the "Preparing" line. Note that you need to turn on TRACE level
logging in order to do such fine-grained debugging.
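
A minimal sketch of that check, assuming the broker log is available as a
plain text file. The class and method names are hypothetical (not part of
Kafka), and the only strings matched are the two quoted above; the rest of
each log line is not assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Kafka: classifies rebalances found in a broker log.
final class RebalanceLogScan {
    // Substrings taken from the log messages quoted in the thread.
    static final String REBALANCE = "Preparing to restabilize group";
    static final String MEMBER = "Member ";
    static final String FAILED = " has failed";

    // Returns one entry per rebalance line, in log order: "member-failure" if a
    // "Member ... has failed" line appeared since the previous rebalance line,
    // otherwise "other".
    static List<String> classify(List<String> lines) {
        List<String> causes = new ArrayList<>();
        boolean failureSeen = false;
        for (String line : lines) {
            if (line.contains(MEMBER) && line.contains(FAILED)) {
                failureSeen = true;
            } else if (line.contains(REBALANCE)) {
                causes.add(failureSeen ? "member-failure" : "other");
                failureSeen = false;
            }
        }
        return causes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: RebalanceLogScan <broker-log-file>");
            return;
        }
        List<String> causes = classify(Files.readAllLines(Paths.get(args[0])));
        for (int i = 0; i < causes.size(); i++) {
            System.out.println("rebalance " + (i + 1) + ": " + causes.get(i));
        }
    }
}
```

A rebalance reported as "other" was not preceded by a member-failure entry,
so the cause has to be looked for elsewhere, for example a member joining or
leaving the group.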

Guozhang


On Mon, Dec 12, 2016 at 9:50 AM, Jay Kreps <ja...@confluent.io> wrote:

> I think the most common cause of rebalancing is still GC that exceeds the
> consumer liveness timeout you've configured. Might be worth enabling GC
> logging in java and then checking the pause times. If they exceed the
> timeout you have for liveness then you will detect that as a process
> failure and rebalance.
>
> -Jay

Re: Struggling with Kafka Streams rebalances under load / in production

Posted by Jay Kreps <ja...@confluent.io>.
I think the most common cause of rebalancing is still GC pauses that exceed
the consumer liveness timeout you've configured. It might be worth enabling
GC logging in Java and then checking the pause times. If a pause exceeds the
liveness timeout, it will be detected as a process failure and trigger a
rebalance.
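
A minimal sketch of that comparison, assuming a JDK 8 style GC log written
with -XX:+PrintGCApplicationStoppedTime, which emits lines containing
"Total time for which application threads were stopped: N seconds". Other
JVM versions and flags produce different formats, so the pattern below is an
assumption to adjust. The class and method names are hypothetical, and the
timeout passed in is whatever liveness timeout the consumer is configured
with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds stop-the-world pauses longer than a liveness timeout.
final class GcPauseCheck {
    // Assumed JDK 8 log format; adjust for other JVMs or logging flags.
    private static final Pattern STOPPED =
        Pattern.compile("application threads were stopped: ([0-9]+\\.[0-9]+) seconds");

    // Returns the pauses (in milliseconds, log order) that exceed timeoutMs.
    static List<Long> pausesOverTimeoutMs(List<String> gcLogLines, long timeoutMs) {
        List<Long> offenders = new ArrayList<>();
        for (String line : gcLogLines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                long pauseMs = Math.round(Double.parseDouble(m.group(1)) * 1000.0);
                if (pauseMs > timeoutMs) {
                    offenders.add(pauseMs);
                }
            }
        }
        return offenders;
    }
}
```

An empty result for the configured timeout suggests that GC pauses alone do
not explain the rebalances.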

-Jay


Re: Struggling with Kafka Streams rebalances under load / in production

Posted by Damian Guy <da...@gmail.com>.
Hi Rob,

Do you have any further information you can provide, such as logs?
Have you configured max.poll.interval.ms?
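
For reference, a sketch of the consumer timeouts involved after KIP-62:
max.poll.interval.ms bounds the time between poll() calls, while
session.timeout.ms and heartbeat.interval.ms govern the background heartbeat.
The config keys are the real consumer setting names; the class name, method
name, and the values in main are placeholders for illustration, not
recommendations.

```java
import java.util.Properties;

// Hypothetical helper: builds the rebalance-related consumer timeouts.
final class RebalanceTimeouts {
    static Properties consumerTimeouts(long maxPollIntervalMs,
                                       long sessionTimeoutMs,
                                       long heartbeatIntervalMs) {
        // The heartbeat interval must be lower than the session timeout,
        // otherwise the member would expire between heartbeats.
        if (heartbeatIntervalMs >= sessionTimeoutMs) {
            throw new IllegalArgumentException(
                "heartbeat.interval.ms must be lower than session.timeout.ms");
        }
        Properties props = new Properties();
        props.setProperty("max.poll.interval.ms", Long.toString(maxPollIntervalMs));
        props.setProperty("session.timeout.ms", Long.toString(sessionTimeoutMs));
        props.setProperty("heartbeat.interval.ms", Long.toString(heartbeatIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values only; tune them against observed processing
        // times and GC pauses.
        Properties props = consumerTimeouts(300_000L, 30_000L, 10_000L);
        props.list(System.out);
    }
}
```

These properties can be merged into the properties the streams application
is already started with.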

Thanks,
Damian
