Posted to jira@kafka.apache.org by "Christoffer Hammarström (JIRA)" <ji...@apache.org> on 2019/02/11 14:34:00 UTC

[jira] [Commented] (KAFKA-7913) Kafka broker halts and messes up the whole cluster

    [ https://issues.apache.org/jira/browse/KAFKA-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765007#comment-16765007 ] 

Christoffer Hammarström commented on KAFKA-7913:
------------------------------------------------

This is bug KAFKA-7697.

> Kafka broker halts and messes up the whole cluster
> --------------------------------------------------
>
>                 Key: KAFKA-7913
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7913
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.1.0
>         Environment: kafka_2.12-2.1.0, 
> openjdk version "11.0.1" 2018-10-16 LTS
> OpenJDK Runtime Environment 18.9 (build 11.0.1+13-LTS),
> CentOS Linux release 7.3.1611 (Core),
> linux 3.10.0-514.26.2.el7.x86_64
>            Reporter: Andrej Urvantsev
>            Priority: Major
>
> We recently upgraded our cluster and are now running Kafka 2.1.0 on Java 11.
> For a while everything went fine, but then random brokers started to halt from time to time.
> When this happens, the broker still looks alive to the other brokers, but it stops receiving network traffic. The other brokers then throw an IOException:
> {noformat}
> java.io.IOException: Connection to 36155 was disconnected before the response was read
>         at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
>         at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
>         at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
>         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:241)
>         at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
>         at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
>         at scala.Option.foreach(Option.scala:257)
>         at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
>         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> {noformat}
> On the problematic broker all logging stops. No errors, no exceptions - nothing.
> This also "breaks" the whole cluster: since clients and other brokers "think" that the broker is still alive,
> they keep trying to connect to it, and it seems that leader election leaves the problematic broker as a leader.
>  
> I would be glad to provide any further details if somebody could advise what to investigate the next time it happens.


