You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Po Zhou (JIRA)" <ji...@apache.org> on 2015/03/14 04:22:38 UTC

[jira] [Comment Edited] (KAFKA-1460) NoReplicaOnlineException: No replica for partition

    [ https://issues.apache.org/jira/browse/KAFKA-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361561#comment-14361561 ] 

Po Zhou edited comment on KAFKA-1460 at 3/14/15 3:22 AM:
---------------------------------------------------------

My colleague found errors in controller.log further (attached below [Error1]), where we could see:
1. "Broker change listener fired for path /brokers/ids with children 1"
2. "deleted brokers: 3,2,4, all live brokers: 1"and broker-2,3,4-send-thread were all shut down by controller.

These facts cause "DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [xxxx.xxxxx.xxxxxx.xxxxxxxxx,1]".

He also found error log in ZooKeeper: "...negotiated timeout 6000 for client..." & "caught end of stream exception
EndOfStreamException: Unable to read additional data from client...". So we modify "zookeeper.session.timeout.ms" and "zookeeper.connection.timeout.ms" from 6000 to 60000 in server.properties. After that our Kafka has not encountered the above error by now.

[Error 1]
[2015-02-13 10:47:49,574] INFO [BrokerChangeListener on Controller 4]: Broker change listener fired for path /brok
ers/ids with children 1 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2015-02-13 10:47:49,577] INFO [BrokerChangeListener on Controller 4]: Newly added brokers: , deleted brokers: 3,2
,4, all live brokers: 1 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-3-send-thread], Shutting down (kafka.controller.RequestSend
Thread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-3-send-thread], Stopped  (kafka.controller.RequestSendThrea
d)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-3-send-thread], Shutdown completed (kafka.controller.Reques
tSendThread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-2-send-thread], Shutting down (kafka.controller.RequestSendThread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-2-send-thread], Stopped  (kafka.controller.RequestSendThread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-2-send-thread], Shutdown completed (kafka.controller.RequestSendThread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-4-send-thread], Shutting down (kafka.controller.RequestSendThread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-4-send-thread], Stopped  (kafka.controller.RequestSendThread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-4-send-thread], Shutdown completed (kafka.controller.RequestSendThread)
[2015-02-13 10:47:49,577] INFO [Controller 4]: Broker failure callback for 3,2,4 (kafka.controller.KafkaController)


was (Author: zhoupo):
My colleague found errors in controller.log further (attached below [Error1]), where we could see:
1. "Broker change listener fired for path /brokers/ids with children 1"
2. "deleted brokers: 3,2,4, all live brokers: 1"and broker-{2,3,4}-send-thread were all shut down by controller.

These facts cause "DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [xxxx.xxxxx.xxxxxx.xxxxxxxxx,1]".

He also found error log in ZooKeeper: "...negotiated timeout 6000 for client..." & "caught end of stream exception
EndOfStreamException: Unable to read additional data from client...". So we modify "zookeeper.session.timeout.ms" and "zookeeper.connection.timeout.ms" from 6000 to 60000 in server.properties. After that our Kafka has not encountered the above error by now.

[Error 1]
[2015-02-13 10:47:49,574] INFO [BrokerChangeListener on Controller 4]: Broker change listener fired for path /brok
ers/ids with children 1 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2015-02-13 10:47:49,577] INFO [BrokerChangeListener on Controller 4]: Newly added brokers: , deleted brokers: 3,2
,4, all live brokers: 1 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-3-send-thread], Shutting down (kafka.controller.RequestSend
Thread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-3-send-thread], Stopped  (kafka.controller.RequestSendThrea
d)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-3-send-thread], Shutdown completed (kafka.controller.Reques
tSendThread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-2-send-thread], Shutting down (kafka.controller.RequestSendThread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-2-send-thread], Stopped  (kafka.controller.RequestSendThread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-2-send-thread], Shutdown completed (kafka.controller.RequestSendThread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-4-send-thread], Shutting down (kafka.controller.RequestSendThread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-4-send-thread], Stopped  (kafka.controller.RequestSendThread)
[2015-02-13 10:47:49,577] INFO [Controller-4-to-broker-4-send-thread], Shutdown completed (kafka.controller.RequestSendThread)
[2015-02-13 10:47:49,577] INFO [Controller 4]: Broker failure callback for 3,2,4 (kafka.controller.KafkaController)

> NoReplicaOnlineException: No replica for partition
> --------------------------------------------------
>
>                 Key: KAFKA-1460
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1460
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1.1
>            Reporter: Artur Denysenko
>            Priority: Critical
>         Attachments: state-change.log
>
>
> We have a standalone kafka server.
> After several days of running we get:
> {noformat}
> kafka.common.NoReplicaOnlineException: No replica for partition [gk.q.module,1] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
> 	at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)
> 	at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:336)
> 	at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:185)
> 	at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99)
> 	at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96)
> 	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
> 	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
> 	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:772)
> 	at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
> 	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
> 	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
> 	at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
> 	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
> 	at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:96)
> 	at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:68)
> 	at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:312)
> 	at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:162)
> 	at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:63)
> 	at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply$mcZ$sp(KafkaController.scala:1068)
> 	at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1066)
> 	at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1066)
> 	at kafka.utils.Utils$.inLock(Utils.scala:538)
> 	at kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1066)
> 	at org.I0Itec.zkclient.ZkClient$4.run(ZkClient.java:472)
> 	at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {noformat}
> Please see attached [state-change.log]
> You can find all server logs (450mb) here: http://46.4.114.35:9999/deploy/kafka-logs.2014-05-14-16.tgz
> On client we get:
> {noformat}
> 16:28:36,843 [ool-12-thread-2] WARN  ZookeeperConsumerConnector - [dev_dev-1400257716132-e7b8240c], no brokers found when trying to rebalance.
> {noformat}
> If we try to send message using 'kafka-console-producer.sh':
> {noformat}
> [root@dev kafka]# /srv/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> message
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> [2014-05-16 19:45:30,950] WARN Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [id:0,host:localhost,port:9092] failed (kafka.client.ClientUtils$)
> java.net.SocketTimeoutException
>         at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
>         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>         at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>         at kafka.utils.Utils$.read(Utils.scala:375)
>         at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>         at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>         at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
>         at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:74)
>         at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71)
>         at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
>         at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
>         at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
>         at kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:67)
>         at kafka.utils.Utils$.swallow(Utils.scala:167)
>         at kafka.utils.Logging$class.swallowError(Logging.scala:106)
>         at kafka.utils.Utils$.swallowError(Utils.scala:46)
>         at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:67)
>         at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:104)
>         at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:87)
>         at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:67)
>         at scala.collection.immutable.Stream.foreach(Stream.scala:526)
>         at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:66)
>         at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:44)
> {noformat}
> If we try to receive message using 'kafka-console-consumer.sh':
> {noformat}
> [root@dev kafka]# /srv/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> [2014-05-16 19:46:23,029] WARN [console-consumer-69449_dev-1400262382648-1c9bfcd3], no brokers found when trying to rebalance. (kafka.consumer.ZookeeperConsumerConnector)
> {noformat}
> Port 9092 is open:
> {noformat}
> [root@dev kafka]# telnet localhost 9092
> Trying 127.0.0.1...
> Connected to localhost.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)