Posted to dev@kafka.apache.org by "Chaitanya GSK (JIRA)" <ji...@apache.org> on 2017/10/16 19:34:00 UTC

[jira] [Created] (KAFKA-6064) Cluster hung when the controller tried to delete a bunch of topics

Chaitanya GSK created KAFKA-6064:
------------------------------------

             Summary: Cluster hung when the controller tried to delete a bunch of topics 
                 Key: KAFKA-6064
                 URL: https://issues.apache.org/jira/browse/KAFKA-6064
             Project: Kafka
          Issue Type: Bug
          Components: controller
    Affects Versions: 0.8.2.1
         Environment: RHEL 6, 12 cores, 48 GB
            Reporter: Chaitanya GSK


Hi, 

We have been running 0.8.2.1 in our Kafka cluster and had a full cluster outage when we programmatically tried to delete 220 topics: the controller hung and went out of memory, which somehow brought the rest of the cluster down with it, and clients could no longer push data at the expected rate. AFAIK, the controller should not affect the write rate of the other brokers, but in this case it did. The client-side error is below; a sketch of how such a bulk deletion is typically issued follows the stack trace.

[WARN] Failed to send kafka.producer.async request with correlation id 1613935688 to broker 44 with data for partitions [topic_2,65],[topic_2,167],[topic_3,2],[topic_4,0],[topic_5,30],[topic_2,48],[topic_2,150]
java.io.IOException: Broken pipe
	at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) ~[?:1.8.0_60]
	at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) ~[?:1.8.0_60]
	at sun.nio.ch.IOUtil.write(IOUtil.java:148) ~[?:1.8.0_60]
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504) ~[?:1.8.0_60]
	at java.nio.channels.SocketChannel.write(SocketChannel.java:502) ~[?:1.8.0_60]
	at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56) ~[stormjar.jar:?]
	at kafka.network.Send$class.writeCompletely(Transmission.scala:75) ~[stormjar.jar:?]
	at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26) ~[stormjar.jar:?]
	at kafka.network.BlockingChannel.send(BlockingChannel.scala:103) ~[stormjar.jar:?]
	at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73) ~[stormjar.jar:?]
	at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72) ~[stormjar.jar:?]
	at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:103) ~[stormjar.jar:?]
	at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:103) ~[stormjar.jar:?]
	at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:103) ~[stormjar.jar:?]
	at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) ~[stormjar.jar:?]
	at kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:102) ~[stormjar.jar:?]
	at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:102) ~[stormjar.jar:?]
	at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:102) ~[stormjar.jar:?]
	at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) ~[stormjar.jar:?]
	at kafka.producer.SyncProducer.send(SyncProducer.scala:101) ~[stormjar.jar:?]
	at kafka.producer.async.YamasKafkaEventHandler.kafka$producer$async$YamasKafkaEventHandler$$send(YamasKafkaEventHandler.scala:481) [stormjar.jar:?]
	at kafka.producer.async.YamasKafkaEventHandler$$anonfun$dispatchSerializedData$2.apply(YamasKafkaEventHandler.scala:144) [stormjar.jar:?]
	at kafka.producer.async.YamasKafkaEventHandler$$anonfun$dispatchSerializedData$2.apply(YamasKafkaEventHandler.scala:138) [stormjar.jar:?]
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) [stormjar.jar:?]
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) [stormjar.jar:?]
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) [stormjar.jar:?]
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) [stormjar.jar:?]
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) [stormjar.jar:?]
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) [stormjar.jar:?]
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) [stormjar.jar:?]
	at kafka.producer.async.YamasKafkaEventHandler.dispatchSerializedData(YamasKafkaEventHandler.scala:138) [stormjar.jar:?]
	at kafka.producer.async.YamasKafkaEventHandler.handle(YamasKafkaEventHandler.scala:79) [stormjar.jar:?]
	at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:105) [stormjar.jar:?]
	at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:88) [stormjar.jar:?]
	at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:68) [stormjar.jar:?]
	at scala.collection.immutable.Stream.foreach(Stream.scala:547) [stormjar.jar:?]
	at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:67) [stormjar.jar:?]
	at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:45) [stormjar.jar:?]
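
The report does not include the deletion code itself; the following is a minimal sketch, assuming the deletions were issued through the ZooKeeper-backed admin API that ships with 0.8.x (kafka.admin.AdminUtils). The topic names and ZooKeeper connect string are placeholders, and the brokers would need delete.topic.enable=true for the controller to act on the delete markers at all.

import kafka.admin.AdminUtils
import kafka.utils.ZKStringSerializer
import org.I0Itec.zkclient.ZkClient

object BulkDeleteTopics {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details and topic list (the incident involved ~220 topics).
    val zkClient = new ZkClient("zk1:2181", 30000, 30000, ZKStringSerializer)
    val topicsToDelete = Seq("topic_2", "topic_3", "topic_4", "topic_5")
    try {
      // AdminUtils.deleteTopic only writes a marker under /admin/delete_topics;
      // the actual deletion is carried out asynchronously by the controller,
      // which is the component that hung in this incident.
      topicsToDelete.foreach(topic => AdminUtils.deleteTopic(zkClient, topic))
    } finally {
      zkClient.close()
    }
  }
}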

We tried moving the controller to a different broker, but that did not help. Ultimately we had to clean up the Kafka cluster to stabilize it.
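
The report does not say how the controller was moved; in 0.8.x the usual way to force a controller re-election is to delete the ephemeral /controller znode in ZooKeeper so another live broker can win the election. A minimal sketch, assuming that approach and a placeholder ZooKeeper connect string:

import kafka.utils.ZKStringSerializer
import org.I0Itec.zkclient.ZkClient

object ForceControllerReelection {
  def main(args: Array[String]): Unit = {
    val zkClient = new ZkClient("zk1:2181", 30000, 30000, ZKStringSerializer)
    try {
      // The current controller owns an ephemeral znode at /controller;
      // deleting it triggers a new election among the live brokers.
      zkClient.delete("/controller")
    } finally {
      zkClient.close()
    }
  }
}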

Wondering if this is a known issue; if not, we would appreciate it if anyone in the community could provide insight into why a hung controller would bring down the whole cluster, and why deleting the topics would cause the controller to hang in the first place.


