Posted to jira@kafka.apache.org by "Raman Gupta (Jira)" <ji...@apache.org> on 2020/07/14 18:41:00 UTC

[jira] [Resolved] (KAFKA-10229) Kafka stream dies for no apparent reason, no errors logged on client or server

     [ https://issues.apache.org/jira/browse/KAFKA-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raman Gupta resolved KAFKA-10229.
---------------------------------
    Resolution: Invalid

Not an issue with Kafka -- the code run by the stream was blocked.
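
For anyone who lands here with the same symptom, one way to keep a blocking call from stalling the processing thread indefinitely is to run it under a timeout, so that a hang surfaces as an exception instead of silence. The following is only a minimal sketch in plain Java; `BlockingGuard` and `callWithTimeout` are hypothetical names that are not part of Kafka, and the timeout values are arbitrary.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not a Kafka API): runs a potentially blocking task
// on a worker thread and gives up after a timeout.
public class BlockingGuard {
    // Daemon threads so that a stuck worker cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "blocking-guard");
        t.setDaemon(true);
        return t;
    });

    public static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(task);
        try {
            // Wait at most `timeout` for the result.
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupt the worker; this only helps if the task responds to interruption.
            future.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callWithTimeout(() -> "done", 1, TimeUnit.SECONDS));
        try {
            callWithTimeout(() -> { Thread.sleep(10_000); return "never"; },
                    100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

With a guard like this, a hung call would throw a `TimeoutException` in the calling thread, which can then be logged or rethrown rather than leaving the thread parked with nothing in the logs.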

> Kafka stream dies for no apparent reason, no errors logged on client or server
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-10229
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10229
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.4.1
>            Reporter: Raman Gupta
>            Priority: Major
>
> My broker and clients are 2.4.1, and I'm currently running a single broker. I have a Kafka stream with exactly-once processing turned on and an uncaught exception handler defined on the client. I noticed that one stream was lagging and, on investigation, found that its consumer group was empty.
> On restarting the consumers, the consumer group re-established itself, but after about 8 minutes, the group became empty again. Nothing is logged on the client side about any stream errors, despite the existence of an uncaught exception handler.
> In the broker logs, I see the following about 8 minutes after the clients restart and the stream goes to the RUNNING state:
> ```
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Member cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 in group produs-cisFileIndexer-stream has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Preparing to rebalance group produs-cisFileIndexer-stream in state PreparingRebalance with old generation 228 (__consumer_offsets-3) (reason: removing member cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
> ```
> According to this, the consumer heartbeat expired. I don't know why that would be: logging shows that the stream was running and processing messages normally, then stopped processing anything about 4 minutes before it died, with no apparent errors and nothing logged via the uncaught exception handler.
> It doesn't appear to be related to any specific poison-pill messages: restarting the stream causes it to reprocess more messages from the backlog and then die again approximately 8 minutes later. At the time of the last message consumed by the stream, there are no logs at `INFO` level or above in either the client or the broker, and no errors whatsoever. Stream consumption simply stops.
> There are two consumers -- even if I limit consumption to only a single consumer, the same thing happens.
> The runtime environment is Kubernetes.
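
A blocked thread throws nothing, so an uncaught exception handler has nothing to report in a case like the one described above. One way to see where a thread is stuck is to dump its stack trace from inside the JVM. The sketch below uses only the standard `Thread.getAllStackTraces()` call; the `ThreadDump` class is a hypothetical helper, and filtering on the name fragment `StreamThread` is an assumption about how the threads of interest are named.

```java
import java.util.Map;

// Hypothetical diagnostic helper (not a Kafka API): prints the state and
// stack trace of every live thread whose name contains a given fragment.
public class ThreadDump {
    public static String dump(String nameFragment) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getName().contains(nameFragment)) {
                continue; // skip threads we are not interested in
            }
            out.append(t.getName()).append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed default filter; pass another fragment as the first argument.
        System.out.print(dump(args.length > 0 ? args[0] : "StreamThread"));
    }
}
```

A thread that stays in `WAITING`, `TIMED_WAITING`, or `BLOCKED` at the same frame across several dumps is a likely candidate for the call that is hanging. Running `jstack` against the process gives similar information without any code changes.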



--
This message was sent by Atlassian Jira
(v8.3.4#803005)