Posted to dev@kafka.apache.org by "Ankur C (JIRA)" <ji...@apache.org> on 2016/12/28 18:36:58 UTC

[jira] [Updated] (KAFKA-4573) Producer sporadic timeout

     [ https://issues.apache.org/jira/browse/KAFKA-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur C updated KAFKA-4573:
---------------------------
    Affects Version/s: 0.9.0.1

> Producer sporadic timeout
> -------------------------
>
>                 Key: KAFKA-4573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4573
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.1
>            Reporter: Ankur C
>
> We had a production outage due to sporadic Kafka producer timeouts. About 1 to 2% of messages would time out continuously.
> Kafka version - 0.9.0.1
> Number of Kafka brokers - 5
> Replication factor for each topic - 3
> Number of topics - ~30
> Number of partitions - ~300
> We had been running Kafka 0.9.0.1 on our 5-broker cluster for 1 month without any issues. However, on Dec 23rd we saw sporadic Kafka producer timeouts.
> The issue began around 6:51am and continued until we bounced the Kafka brokers.
> 6:51am Under-replication started on a small number of topics
> 6:53am All under-replicated partitions recovered
> 11:00am We restarted all Kafka producer writer apps, but this didn't solve the sporadic producer timeout issue
> 12:01pm We restarted all Kafka brokers; after this the issue was resolved.
> Kafka metrics and Kafka logs didn't show any major issue. There were no offline partitions during the outage, and the active controller count was exactly 1.
> The only exception we saw was the following one in the brokers' controller.log. It was present on all brokers, 0 to 4.
> java.io.IOException: Connection to 2 was disconnected before the response was read
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
>     at scala.Option.foreach(Option.scala:236)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
>     at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
>     at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
>     at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
>     at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
>     at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
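
One way to see whether the controller lost contact with a single broker or with several is to tally the quoted exception per target broker id across every broker's controller.log. The number in the message appears to be the id of the broker the controller's request thread was sending to. The following is a minimal Python sketch, assuming only that the message text matches the exception quoted above; the script, the function name `count_disconnects`, and the command-line usage are hypothetical and not part of Kafka's tooling.

```python
import re
import sys
from collections import Counter

# Assumption: matches only the message text of the exception quoted above;
# timestamps and log-level prefixes around it are ignored.
DISCONNECT_RE = re.compile(
    r"Connection to (\d+) was disconnected before the response was read"
)


def count_disconnects(lines):
    """Count controller-side disconnect exceptions per target broker id."""
    counts = Counter()
    for line in lines:
        for match in DISCONNECT_RE.finditer(line):
            counts[int(match.group(1))] += 1
    return counts


def main(paths):
    # Hypothetical usage: python count_disconnects.py controller.log ...
    totals = Counter()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            totals.update(count_disconnects(handle))
    for broker_id, n in sorted(totals.items()):
        print(f"broker {broker_id}: {n} disconnect(s)")


if __name__ == "__main__":
    main(sys.argv[1:])
```

If one broker id dominates the counts on every broker's log, that broker is the first place to look; an even spread would point more toward a cluster-wide network or controller problem.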



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)