Posted to dev@kafka.apache.org by "Ismael Juma (JIRA)" <ji...@apache.org> on 2016/11/30 15:39:58 UTC

[jira] [Updated] (KAFKA-4460) Consumer stops getting messages when partition leader dies

     [ https://issues.apache.org/jira/browse/KAFKA-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismael Juma updated KAFKA-4460:
-------------------------------
    Labels: reliability  (was: )

> Consumer stops getting messages when partition leader dies
> ----------------------------------------------------------
>
>                 Key: KAFKA-4460
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4460
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.0.1
>            Reporter: Bernhard Bonigl
>              Labels: reliability
>
> I have a setup consisting of two Kafka brokers (0 and 1) backed by ZooKeeper, a Spring Boot application with producers, and a Spring Boot application with consumers.
> The topic has 5 partitions and a replication factor of 2. Both brokers are in sync, and the partitions have alternating leaders (although that doesn't matter).
> The Spring Boot Kafka configuration is set up as follows:
> {code}
> kafka.address: localhost:9092,localhost:9093
> kafka.numberOfConsumers: 20
> {code}
> Broker 0 uses port 9092 and Broker 1 uses port 9093.
> ----
> Events are consumed just fine while both brokers are up. When Broker 0 is killed, Broker 1 becomes the leader for all partitions, but the consumers stop consuming events until Broker 0 is back. This happens nearly every time; it usually takes at most 3 attempts of alternately killing the leading broker to reach the error state.
> The console log gets spammed by the coordinators. It looks like the coordinator representing Broker 0 is marked as dead but is instantly rediscovered and used again, many times over, and only at the end is the other broker discovered. When the switch works, the log is only minimally spammed and the other broker is discovered very quickly.
> This gist contains the application log from when the problem occurs. The first line is one of our own log entries indicating a successfully consumed message. After that, Broker 0 (localhost:9092) is killed, and you can see the log spam I was talking about. At the end, localhost:9093 is discovered, but no further messages are consumed. After that I killed the application.
> ----
> I also discovered this unresolved Stack Overflow question, which seems to describe the same problem.
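
For anyone trying to reproduce the report, the two-broker setup described above can be sketched as plain consumer properties. This is only an illustration under assumptions: `kafka.address` and `kafka.numberOfConsumers` are keys of the reporter's own application, and mapping `kafka.address` to the client's `bootstrap.servers` setting is a guess at how that application is wired. The class name, method name, and group id below are hypothetical; the property keys and the deserializer class name are standard Kafka consumer settings.

```java
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper, not taken from the reporter's application.
class ConsumerConfigSketch {

    // Builds consumer properties that list both brokers, so the client can
    // still bootstrap when one of them is down.
    static Properties consumerProps(String brokers, String groupId) {
        Properties props = new Properties();
        // Assumption: the application passes kafka.address through unchanged.
        props.setProperty("bootstrap.servers", brokers);
        // Hypothetical group id; the report does not name one.
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props =
                consumerProps("localhost:9092,localhost:9093", "example-group");
        // Print the settings in a stable (sorted) order.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

These properties could be handed to a `KafkaConsumer` constructor in a project that has the Kafka client library on its classpath. The sketch only pins down the configuration the report describes; it does not attempt to reproduce or work around the failover problem.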



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)