Posted to dev@kafka.apache.org by "huxi (JIRA)" <ji...@apache.org> on 2016/12/29 06:52:58 UTC

[jira] [Commented] (KAFKA-4573) Producer sporadic timeout

    [ https://issues.apache.org/jira/browse/KAFKA-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784692#comment-15784692 ] 

huxi commented on KAFKA-4573:
-----------------------------

Is it possible it's caused by a transient network error? 
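
One rough way to check this, offered only as a sketch: count the controller's "fails to send request" warnings per minute and per target broker in controller.log, and see whether they cluster around one broker or one short time window. The regular expression below is modelled solely on the single WARN line quoted in the report, so the log layout is an assumption and may differ with other versions or log4j settings. The function name count_failed_sends is hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the one WARN line quoted in the report, e.g.
# "[2016-12-23 06:51:37,384] WARN [Controller-2-to-broker-2-send-thread], ...
#  fails to send request {...} to broker Node(2, ...). Reconnecting to broker."
# Assumption: other failed-send warnings in controller.log follow this layout.
FAILED_SEND = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:,]+)\] WARN "
    r"\[Controller-(?P<controller>\d+)-to-broker-(?P<broker>\d+)-send-thread\]"
    r".*fails to send request"
)


def count_failed_sends(lines):
    """Count failed controller sends per (minute, target broker id)."""
    counts = Counter()
    for line in lines:
        match = FAILED_SEND.match(line.strip())
        if match:
            minute = match.group("ts")[:16]  # keep "YYYY-MM-DD HH:MM"
            counts[(minute, int(match.group("broker")))] += 1
    return counts
```

Passing an open controller.log file object to count_failed_sends and sorting the result by key gives a per-minute view for each broker. A burst confined to a minute or two would be consistent with a transient network problem, though it would not prove one.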

> Producer sporadic timeout
> -------------------------
>
>                 Key: KAFKA-4573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4573
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.1
>            Reporter: Ankur C
>
> We had a production outage due to sporadic Kafka producer timeouts. About 1 to 2% of messages timed out continuously.
> Kafka version - 0.9.0.1
> #Kafka brokers - 5
> #Replication for each topic - 3
> #Number of topics  - ~30
> #Number of partitions - ~300
> We had Kafka 0.9.0.1 running in our 5-broker cluster for 1 month without any issues. However, on Dec 23rd we saw sporadic Kafka producer timeouts.
> The issue began around 6:51am and continued until we bounced the Kafka brokers.
> 6:51am Under-replication started on a small number of topics
> 6:53am All under-replication recovered
> 11:00am We restarted all Kafka producer writer apps, but this didn't solve the sporadic Kafka producer timeout issue
> 12:01pm We restarted all Kafka brokers; after this the issue was resolved.
> Kafka metrics and Kafka logs didn't show any major issue. There were no offline partitions during the outage and the controller count was exactly 1.
> We only saw the following exception in controller.log on the Kafka brokers. It was present on all brokers, 0 to 4.
> java.io.IOException: Connection to 2 was disconnected before the response was read
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
>     at scala.Option.foreach(Option.scala:236)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
>     at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
>     at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
>     at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
>     at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
>     at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>  [2016-12-23 06:51:37,384] WARN [Controller-2-to-broker-2-send-thread], Controller 2 epoch 18 fails to send request {controller_id=2,controller_epoch=18,partition_states=[{topic=compliance_pipeline_fast_green,partition=4,controller_epoch=18,leader=4,leader_epoch=53,isr=[2,4],zk_version=111,replicas=[4,1,2]}],live_brokers=[{id=3,end_points=[{port=31161,host=10.126.144.73,security_protocol_type=0}]},{id=4,end_points=[{port=31355,host=10.126.144.233,security_protocol_type=0}]},{id=2,end_points=[{port=31293,host=10.126.144.137,security_protocol_type=0}]},{id=1,end_points=[{port=31824,host=10.126.144.169,security_protocol_type=0}]},{id=0,end_points=[{port=31139,host=10.126.144.201,security_protocol_type=0}]}]} to broker Node(2, 10.126.144.137, 31293). Reconnecting to broker. (kafka.controller.RequestSendThread)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)