Posted to users@kafka.apache.org by "Matthias J. Sax" <ma...@confluent.io> on 2019/10/24 05:26:37 UTC

Re: Kafka Streams TimeoutException and keeps rebalancing

>> 1) The app has a huge consumer lag on both the source topic and the
>> internal repartition topics, ~5M messages and growing. Will the lag
>> lead to this timeout exception? My understanding is that the app polls
>> too many messages before it can send them out, even though the lag
>> indicates it is still polling too slowly?

Polling and writing messages are independent. Also, an increasing
consumer lag by itself would not lead to a TimeoutException.

It's hard to say in general, but it might be a network issue? Did you
try to increase the `retries` config as suggested by the error message?
Did you also check the health of the brokers? Even if the network is
stable, unhealthy brokers can also lead to timeout exceptions.
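
For example, something along these lines would raise both `retries` and
`retry.backoff.ms` for the internal producers (an untested sketch; the
values and the application/broker names are only placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class ProducerRetrySketch {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");  // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // placeholder
        // The producer prefix routes these settings to the internal producers
        // that write to repartition and changelog topics.
        props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRIES_CONFIG), 10);
        props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRY_BACKOFF_MS_CONFIG), 1000L);
        return props;
    }
}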

For the increasing consumer lag, scaling out the application should
actually help.
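
Scaling out can mean starting additional instances with the same
application.id, or raising the thread count per instance; a rough sketch
of the latter (the value is only an example):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class ScaleOutSketch {
    public static void addThreads(Properties props) {
        // More stream threads per instance; starting more instances with the
        // same application.id spreads the partitions across hosts instead.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 6);  // example value
    }
}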


-Matthias


On 9/27/19 2:03 PM, Xiyuan Hu wrote:
> Hi,
> 
> I'm running Kafka Streams v2.1.0 with a windowing function and 3 threads
> per node. The input traffic is about 120K messages/sec. Once deployed,
> after a couple of minutes, some threads get a TimeoutException and go to
> the DEAD state.
> 
> 2019-09-27 13:04:34,449 ERROR [client-StreamThread-2]
> o.a.k.s.p.i.AssignedStreamsTasks stream-thread [client-StreamThread-2]
> Failed to commit stream task 1_35 due to the following error:
> org.apache.kafka.streams.errors.StreamsException: task [1_35] Abort
> sending since an error caught with a previous record (key
> 174044298638db0 value [B@33423c0 timestamp 1569613689747) to topic
> XXX-KSTREAM-REDUCE-STATE-STORE-0000000003-changelog due to
> org.apache.kafka.common.errors.TimeoutException: Expiring 45 record(s)
> for XXX-KSTREAM-REDUCE-STATE-STORE-0000000003-changelog-35:300102 ms
> has passed since batch creation
> You can increase producer parameter `retries` and `retry.backoff.ms`
> to avoid this error.
> at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:133)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring
> 45 record(s) for
> XXX-KSTREAM-REDUCE-STATE-STORE-0000000003-changelog-35:300102 ms has
> passed since batch creation
> 
> I changed linger.ms to 100 and request.timeout.ms to 300000.
> 
> A couple questions:
> 1) The app has a huge consumer lag on both the source topic and the
> internal repartition topics, ~5M messages and growing. Will the lag
> lead to this timeout exception? My understanding is that the app polls
> too many messages before it can send them out, even though the lag
> indicates it is still polling too slowly?
> 
> 2) I checked the thread status from streams.localThreadsMetadata() and
> found that a lot of threads are switching between RUNNING and
> PARTITIONS_REVOKED. Eventually they get stuck at PARTITIONS_REVOKED. I
> didn't see any error messages in the log related to this. What might be
> a good approach to trace the cause?
> 
> Thanks a lot! Any help is appreciated!
> 


Re: Kafka Streams TimeoutException and keeps rebalancing

Posted by Xiyuan Hu <xi...@gmail.com>.
Thanks Matthias,

I think it's related to
https://issues.apache.org/jira/browse/KAFKA-8802. Once I disabled the
cache, bytes out went up a lot. Before that, bytes out stayed at about
14 MB/s no matter whether bytes in was high or low.
I'm going to download the 2.3.1-rc2 build and try it again.
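
For reference, turning the Streams record cache off globally amounts to
setting cache.max.bytes.buffering to 0; a minimal sketch (the class and
method names here are just placeholders):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class DisableCacheSketch {
    public static Properties withCacheDisabled(Properties props) {
        // A 0-byte record cache means every update is forwarded downstream
        // immediately instead of being buffered and deduplicated first.
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0L);
        return props;
    }
}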
