Posted to dev@kafka.apache.org by "Ankur C (JIRA)" <ji...@apache.org> on 2017/01/05 17:24:58 UTC

[jira] [Comment Edited] (KAFKA-4573) Producer sporadic timeout

    [ https://issues.apache.org/jira/browse/KAFKA-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798700#comment-15798700 ] 

Ankur C edited comment on KAFKA-4573 at 1/5/17 5:24 PM:
--------------------------------------------------------

The outage lasted 5 hours, and the problem was fixed after we restarted the brokers. It may have been a transient network error for a short time, but definitely not for hours. Even if it was a transient network error, Kafka should have recovered from it.


was (Author: maverick2202):
Outgage lasted 5 hours and after we restarted broker the problem was fixed.

> Producer sporadic timeout
> -------------------------
>
>                 Key: KAFKA-4573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4573
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.1
>            Reporter: Ankur C
>
> We had a production outage due to sporadic Kafka producer timeouts. About 1 to 2% of messages timed out continuously.
> Kafka version - 0.9.0.1
> #Kafka brokers - 5
> #Replication for each topic - 3
> #Number of topics  - ~30
> #Number of partition - ~300
> We had Kafka 0.9.0.1 running in our 5-broker cluster for 1 month without any issues. However, on Dec 23rd we saw sporadic Kafka producer timeouts.
> The issue began around 6:51am and continued until we bounced the Kafka brokers.
> 6:51am Under-replication started on a small number of topics
> 6:53am All under-replication recovered
> 11:00am We restarted all Kafka producer writer apps, but this didn't solve the sporadic producer timeout issue
> 12:01pm We restarted all Kafka brokers; after this the issue was resolved.
> Kafka metrics and Kafka logs don't show any major issue. There were no offline partitions during the outage, and the controller count (#controller) was exactly 1.
> The only exception we saw on the Kafka brokers was the following one in controller.log. This log entry was present on all brokers, 0 to 4.
> java.io.IOException: Connection to 2 was disconnected before the response was read
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
>     at scala.Option.foreach(Option.scala:236)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
>     at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
>     at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
>     at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
>     at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
>     at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>  [2016-12-23 06:51:37,384] WARN [Controller-2-to-broker-2-send-thread], Controller 2 epoch 18 fails to send request {controller_id=2,controller_epoch=18,partition_states=[{topic=compliance_pipeline_fast_green,partition=4,controller_epoch=18,leader=4,leader_epoch=53,isr=[2,4],zk_version=111,replicas=[4,1,2]}],live_brokers=[{id=3,end_points=[{port=31161,host=10.126.144.73,security_protocol_type=0}]},{id=4,end_points=[{port=31355,host=10.126.144.233,security_protocol_type=0}]},{id=2,end_points=[{port=31293,host=10.126.144.137,security_protocol_type=0}]},{id=1,end_points=[{port=31824,host=10.126.144.169,security_protocol_type=0}]},{id=0,end_points=[{port=31139,host=10.126.144.201,security_protocol_type=0}]}]} to broker Node(2, 10.126.144.137, 31293). Reconnecting to broker. (kafka.controller.RequestSendThread)
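
For anyone triaging similar controller.log entries, the partition state in the WARN line above can be checked mechanically. The following is only a rough sketch, not part of the original report: the regular expression and the helper names (parse_ids, out_of_sync_replicas) are hypothetical and written for this one log format. It compares the isr list against the replicas list to see which replicas had fallen out of sync at the time of the log entry.

```python
import re

# Excerpt of the partition_states entry copied from the WARN line above.
LOG_EXCERPT = (
    "partition_states=[{topic=compliance_pipeline_fast_green,partition=4,"
    "controller_epoch=18,leader=4,leader_epoch=53,isr=[2,4],zk_version=111,"
    "replicas=[4,1,2]}]"
)

# Hypothetical pattern: assumes the field order seen in this log line
# (topic, partition, ..., isr, ..., replicas).
PARTITION_STATE = re.compile(
    r"topic=(?P<topic>[^,]+),partition=(?P<partition>\d+),.*?"
    r"isr=\[(?P<isr>[\d,]*)\],.*?replicas=\[(?P<replicas>[\d,]*)\]"
)


def parse_ids(text):
    """Turn a comma-separated id list such as '4,1,2' into [4, 1, 2]."""
    return [int(x) for x in text.split(",") if x]


def out_of_sync_replicas(log_text):
    """Map (topic, partition) to the assigned replicas missing from the ISR."""
    result = {}
    for m in PARTITION_STATE.finditer(log_text):
        isr = set(parse_ids(m.group("isr")))
        replicas = parse_ids(m.group("replicas"))
        missing = [r for r in replicas if r not in isr]
        result[(m.group("topic"), int(m.group("partition")))] = missing
    return result


if __name__ == "__main__":
    for (topic, partition), missing in out_of_sync_replicas(LOG_EXCERPT).items():
        print(topic, partition, "replicas not in ISR:", missing)
```

For the entry above, replica 1 is assigned to the partition but absent from the ISR, which is consistent with the under-replication reported at 6:51am. It says nothing about why producers kept timing out after the ISR recovered.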



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)