Posted to jira@kafka.apache.org by "Shilin Lu (JIRA)" <ji...@apache.org> on 2018/12/25 08:45:00 UTC

[jira] [Commented] (KAFKA-5778) Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers

    [ https://issues.apache.org/jira/browse/KAFKA-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728624#comment-16728624 ] 

Shilin Lu commented on KAFKA-5778:
----------------------------------

Hello, we hit the same problem in our production environment when the controller was re-elected. The symptoms are a rapid increase in TCP connections in the CLOSE_WAIT state and shrinking ISRs. I don't think this can be resolved by changing Linux sysctl settings; it may be a bug in the Kafka code.

How can it be resolved? Have you found anything new? Thank you!
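
For anyone who wants to track the CLOSE_WAIT growth described above, here is a minimal sketch that counts sockets per TCP state from Linux /proc/net/tcp-style text. The function name count_tcp_states is hypothetical, and port 9092 in the usage note is only an assumption (the default Kafka listener port); adjust it to your own listener. It is a monitoring aid, not a fix for the underlying issue.

```python
from collections import Counter

# Subset of the hex state codes the Linux kernel prints in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_tcp_states(proc_net_tcp_text, local_port=None):
    """Count sockets per TCP state from /proc/net/tcp-style text.

    If local_port is given, only sockets whose local port matches are counted.
    States outside TCP_STATES are counted under their raw hex code.
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_addr, state = fields[1], fields[3]
        # The local address is HEXIP:HEXPORT; the port is hexadecimal.
        port = int(local_addr.rsplit(":", 1)[1], 16)
        if local_port is not None and port != local_port:
            continue
        counts[TCP_STATES.get(state, state)] += 1
    return counts
```

On a broker host this could be called as count_tcp_states(open("/proc/net/tcp").read(), local_port=9092). If the JVM listens on IPv6 sockets, the connections appear in /proc/net/tcp6 instead, which uses the same column layout. Running it periodically and logging the CLOSE_WAIT count would show whether the increase lines up with controller re-election.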

> Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-5778
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5778
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.0.1
>            Reporter: saichand
>            Priority: Critical
>
> In a cluster of 3 brokers, one broker (Broker-1) hung, and from then on the other two brokers had connections in CLOSE_WAIT for Java producer/consumer clients; some broker-to-broker connections between those two brokers were also in CLOSE_WAIT.
> Kafka version: 0.10.0.1
> In the logs I found connection-refused exceptions from the replica fetcher threads:
> In broker 0 : replica fetcher 0-1, replica fetcher 0-2
> In broker 2 : replica fetcher 0-0, replica fetcher 0-1
> In broker 1: it was hung, so no logs were available at that time.
> We tried restarting Kafka on broker 2, but it was not successful; it terminated with a ZooKeeper timeout.
> Then we tried restarting Kafka on broker 0 and got the same error.
> Broker 1 was hung, so we could not even log in to it,
> so we restarted the broker 1 machine.
> Then we restarted all ZooKeeper nodes and then the Kafka brokers; now everything is fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)