Posted to dev@kafka.apache.org by "Abuli Palagashvili (Jira)" <ji...@apache.org> on 2020/09/14 15:06:00 UTC

[jira] [Created] (KAFKA-10480) Kafka Streams application gets stuck

Abuli Palagashvili created KAFKA-10480:
------------------------------------------

             Summary: Kafka Streams application gets stuck
                 Key: KAFKA-10480
                 URL: https://issues.apache.org/jira/browse/KAFKA-10480
             Project: Kafka
          Issue Type: Bug
          Components: consumer, streams
    Affects Versions: 2.2.0, 0.10.2.1
            Reporter: Abuli Palagashvili


*prerequisites:*
 * Kafka cluster running on version 0.10.2.1
 * Topic with 24 partitions, load up to 20k RPS, storing bare Strings with null keys
 * Kafka Streams application that reads records from the source topic and writes them to another; the target partition is derived from a key extracted from the record. Uses library version 2.2.0
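The routing step described above can be sketched roughly as follows. Both `extractKey` and the hash are hypothetical stand-ins: the report does not show the actual extraction logic, and Kafka's default partitioner uses murmur2 rather than the plain hash used here to keep the sketch self-contained:

```java
import java.nio.charset.StandardCharsets;

public class KeyPartitionSketch {
    // Hypothetical stand-in for the key-extraction logic mentioned above:
    // take the first comma-separated field of the record as the routing key.
    static String extractKey(String record) {
        int comma = record.indexOf(',');
        return comma >= 0 ? record.substring(0, comma) : record;
    }

    // Map the extracted key onto one of the target topic's partitions.
    // Kafka's default partitioner hashes the serialized key with murmur2;
    // a simple polynomial hash is used here only so the sketch compiles alone.
    static int partitionFor(String key, int numPartitions) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = 0;
        for (byte b : bytes) {
            hash = 31 * hash + b;
        }
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        String record = "user-42,some payload";
        String key = extractKey(record);
        int partition = partitionFor(key, 24); // the topic has 24 partitions
        System.out.println(key + " -> partition " + partition);
    }
}
```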

*Problem:*

After the application starts, everything goes OK, but sometimes I get this message:

2020-09-10 20:09:41 WARN AbstractCoordinator:1119 - [Consumer clientId=sharder-application-1-8545e058-3494-4951-93d3-94bb4833be44-StreamThread-5-consumer, groupId=sharder-application-1] This member will leave the group because consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.

I tried playing with these config properties, but the problem persists.
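For reference, these are the two properties the warning suggests tuning; the values below are only illustrative, not the ones I actually use:

```properties
# Allow more time between poll() calls before the member is removed from the group
max.poll.interval.ms=600000
# Return fewer records per poll() so each batch is processed faster
max.poll.records=100
```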
My application processes records quickly, so I suspect it simply loses its connection to the Kafka cluster. Another problem is that I can't handle a group member leaving: the application doesn't throw any exception and doesn't change its state, so I can't detect that situation the way it is done here:
https://dzone.com/articles/whats-the-proper-kubernetes-health-check-for-a-kaf
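A minimal sketch of the pattern from that article: register a state listener and expose a health flag for a Kubernetes probe. The real hook is `KafkaStreams#setStateListener`; the tiny `State` enum and `StateListener` interface below are stdlib-only stand-ins for the Kafka Streams types so the sketch stays self-contained:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class HealthCheckSketch {
    // Stand-ins for org.apache.kafka.streams.KafkaStreams.State
    // and KafkaStreams.StateListener.
    enum State { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR }

    interface StateListener {
        void onChange(State newState, State oldState);
    }

    private final AtomicBoolean healthy = new AtomicBoolean(false);

    // Listener a liveness/readiness probe could consult: report healthy only
    // while the Streams instance is RUNNING (or transiently REBALANCING).
    final StateListener listener = (newState, oldState) ->
            healthy.set(newState == State.RUNNING || newState == State.REBALANCING);

    boolean isHealthy() {
        return healthy.get();
    }
}
```

Note this only helps when the state actually transitions; the complaint above is precisely that a silent member leave may not move the instance out of RUNNING.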

Has anybody else faced this?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)