Posted to dev@kafka.apache.org by "Ankur C (JIRA)" <ji...@apache.org> on 2016/12/28 18:22:58 UTC

[jira] [Created] (KAFKA-4573) Producer sporadic timeout

Ankur C created KAFKA-4573:
------------------------------

             Summary: Producer sporadic timeout
                 Key: KAFKA-4573
                 URL: https://issues.apache.org/jira/browse/KAFKA-4573
             Project: Kafka
          Issue Type: Bug
            Reporter: Ankur C


We had a production outage caused by sporadic Kafka producer timeouts. About 1 to 2% of messages timed out throughout the incident.

Kafka version - 0.9.0.1
Number of Kafka brokers - 5
Replication factor for each topic - 3
Number of topics - ~30
Number of partitions - ~300


We had Kafka 0.9.0.1 running in our 5-broker cluster for 1 month without any issues. However, on Dec 23rd we saw sporadic Kafka producer timeouts.

The issue began around 6:51am and continued until we bounced the Kafka brokers.

6:51am Under-replication started on a small number of topics
6:53am All under-replication recovered
11:00am We restarted all Kafka producer writer apps, but this didn't resolve the sporadic producer timeouts
12:01pm We restarted all Kafka brokers, after which the issue was resolved
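
For reference, the elapsed times implied by the timeline above can be worked out with a short sketch. The times come from the report; the year 2016 is an assumption based on the posting date, and the event labels are illustrative names only.

```python
from datetime import datetime

# Times are taken from the timeline above. The year 2016 is an assumption
# based on the posting date; the event names are illustrative labels.
DAY = "2016-12-23 "
events = {
    "underreplication_start": "06:51",
    "underreplication_recovered": "06:53",
    "producers_restarted": "11:00",
    "brokers_restarted": "12:01",
}
times = {
    name: datetime.strptime(DAY + hhmm, "%Y-%m-%d %H:%M")
    for name, hhmm in events.items()
}


def minutes_between(start, end):
    """Whole minutes elapsed between two named events."""
    return int((times[end] - times[start]).total_seconds() // 60)


underreplicated_min = minutes_between(
    "underreplication_start", "underreplication_recovered"
)
outage_min = minutes_between("underreplication_start", "brokers_restarted")

print(underreplicated_min)  # 2
print(outage_min)           # 310
```

So under-replication was visible for about 2 minutes, while the producer timeouts lasted roughly 5 hours 10 minutes, ending only with the broker restart.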

Kafka metrics and Kafka logs don't show any major issue. There were no offline partitions during the outage, and the number of controllers was exactly 1.

The only exception we saw on the Kafka brokers was the following, in controller.log. It was present on all brokers, 0 to 4.

java.io.IOException: Connection to 2 was disconnected before the response was read
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
    at scala.Option.foreach(Option.scala:236)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
    at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
    at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
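
To check whether these disconnects cluster on particular brokers, the exception above could be tallied per log file. The following is a minimal sketch: it assumes the message wording is identical for every occurrence in controller.log and that the number in the message is the destination broker id. The function name and sample lines are hypothetical.

```python
import re
from collections import Counter

# Pattern taken from the exception message quoted above. That every
# occurrence in controller.log uses exactly this wording is an assumption.
DISCONNECT_RE = re.compile(
    r"Connection to (\d+) was disconnected before the response was read"
)


def count_disconnects(lines):
    """Return a Counter mapping destination id -> number of disconnect errors."""
    counts = Counter()
    for line in lines:
        match = DISCONNECT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample; in practice, pass an open controller.log file instead.
sample = [
    "java.io.IOException: Connection to 2 was disconnected before the response was read",
    "    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)",
]
print(count_disconnects(sample))  # Counter({2: 1})
```

Running this against controller.log from each of brokers 0 to 4 would show whether the errors point at a single destination or are spread across the cluster.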

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)