Posted to dev@kafka.apache.org by "Po Zhou (JIRA)" <ji...@apache.org> on 2015/02/27 12:23:05 UTC

[jira] [Comment Edited] (KAFKA-1460) NoReplicaOnlineException: No replica for partition

    [ https://issues.apache.org/jira/browse/KAFKA-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338303#comment-14338303 ] 

Po Zhou edited comment on KAFKA-1460 at 2/27/15 11:22 AM:
----------------------------------------------------------

I occasionally encountered the issue with the following error log:

ERROR Controller 4 epoch 61 initiated state change for partition [xxxx.xxxxx.xxxxxx.xxxxxxxxx,1] from OfflinePartition to OnlinePartition failed (state.change.logger)
kafka.common.NoReplicaOnlineException: No replica for partition [xxxx.xxxxx.xxxxxx.xxxxxxxxx,1] is alive. Live brokers are: [Set(1)], Assigned replicas are: [List(2, 3)]
        at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
        at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
        at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
        at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:35
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
        at kafka.utils.Utils$.inLock(Utils.scala:535)
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

My Kafka cluster is deployed on 4 servers and configured with the parameters "--partitions 4 --replication-factor 2". The exception is thrown in "OfflinePartitionLeaderSelector". According to the source code linked below, "If no broker in the assigned replica list is alive, it throws NoReplicaOnlineException", and the more specific condition is "liveAssignedReplicasToThisPartition.isEmpty". In the log above, the only live broker is 1 while the assigned replicas are 2 and 3, so that condition holds. However, how to avoid or resolve the exception is still unknown, and the issue remains "Unresolved".

https://apache.googlesource.com/kafka/+/855340a2e65ffbb79520c49d0b9a231b94acd538/core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala
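
For illustration, the condition quoted above can be modeled roughly as follows. This is a simplified Python sketch, not the Kafka source: the names select_leader and NoReplicaOnlineError are hypothetical, and the sketch only covers the final "no live assigned replica" check, ignoring the in-sync replica handling that the real selector performs first.

```python
class NoReplicaOnlineError(Exception):
    """Hypothetical stand-in for kafka.common.NoReplicaOnlineException."""


def select_leader(partition, assigned_replicas, live_brokers):
    """Pick a leader from the assigned replicas that are currently alive.

    Simplified model (assumption): the first live assigned replica wins.
    Raises NoReplicaOnlineError when none of the assigned replicas is alive,
    which mirrors the "liveAssignedReplicasToThisPartition.isEmpty" condition.
    """
    # Keep only the assigned replicas whose broker id is in the live set.
    live_assigned = [r for r in assigned_replicas if r in live_brokers]
    if not live_assigned:
        raise NoReplicaOnlineError(
            "No replica for partition %s is alive. Live brokers are: %s, "
            "Assigned replicas are: %s"
            % (partition, sorted(live_brokers), list(assigned_replicas))
        )
    return live_assigned[0]


if __name__ == "__main__":
    # Values taken from the log above: live broker 1, assigned replicas 2 and 3.
    try:
        select_leader(("xxxx", 1), [2, 3], {1})
    except NoReplicaOnlineError as err:
        print("election failed:", err)
```

With the values from the log (live brokers {1}, assigned replicas [2, 3]) the intersection is empty, so the sketch raises. As soon as broker 2 or 3 is back in the live set, a leader can be chosen again.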


was (Author: zhoupo):
I occasionally encountered the issue with the following error log:
kafka.common.NoReplicaOnlineException: No replica for partition [xxxx.xxxxx.xxxxxx.xxxxxxxxx,1] is alive. Live brokers are: [Set(1)], Assigned replicas are: [List(2, 3)]
        at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
        at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
        at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
        at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:35
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
        at kafka.utils.Utils$.inLock(Utils.scala:535)
        at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

My Kafka cluster is deployed on 4 servers and configured with the parameters "--partitions 4 --replication-factor 2". The exception is thrown in "OfflinePartitionLeaderSelector". According to the source code linked below, "If no broker in the assigned replica list is alive, it throws NoReplicaOnlineException", and the more specific condition is "liveAssignedReplicasToThisPartition.isEmpty". However, how to avoid or resolve the exception is still unknown, and the issue remains "Unresolved".

https://apache.googlesource.com/kafka/+/855340a2e65ffbb79520c49d0b9a231b94acd538/core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala

> NoReplicaOnlineException: No replica for partition
> --------------------------------------------------
>
>                 Key: KAFKA-1460
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1460
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1.1
>            Reporter: Artur Denysenko
>            Priority: Critical
>         Attachments: state-change.log
>
>
> We have a standalone kafka server.
> After several days of running we get:
> {noformat}
> kafka.common.NoReplicaOnlineException: No replica for partition [gk.q.module,1] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
> 	at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)
> 	at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:336)
> 	at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:185)
> 	at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99)
> 	at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96)
> 	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
> 	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
> 	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:772)
> 	at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
> 	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
> 	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
> 	at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
> 	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
> 	at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:96)
> 	at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:68)
> 	at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:312)
> 	at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:162)
> 	at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:63)
> 	at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply$mcZ$sp(KafkaController.scala:1068)
> 	at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1066)
> 	at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1066)
> 	at kafka.utils.Utils$.inLock(Utils.scala:538)
> 	at kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1066)
> 	at org.I0Itec.zkclient.ZkClient$4.run(ZkClient.java:472)
> 	at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {noformat}
> Please see attached [state-change.log]
> You can find all server logs (450mb) here: http://46.4.114.35:9999/deploy/kafka-logs.2014-05-14-16.tgz
> On the client we get:
> {noformat}
> 16:28:36,843 [ool-12-thread-2] WARN  ZookeeperConsumerConnector - [dev_dev-1400257716132-e7b8240c], no brokers found when trying to rebalance.
> {noformat}
> If we try to send a message using 'kafka-console-producer.sh':
> {noformat}
> [root@dev kafka]# /srv/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> message
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> [2014-05-16 19:45:30,950] WARN Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [id:0,host:localhost,port:9092] failed (kafka.client.ClientUtils$)
> java.net.SocketTimeoutException
>         at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
>         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>         at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>         at kafka.utils.Utils$.read(Utils.scala:375)
>         at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>         at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>         at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
>         at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:74)
>         at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71)
>         at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
>         at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
>         at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
>         at kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:67)
>         at kafka.utils.Utils$.swallow(Utils.scala:167)
>         at kafka.utils.Logging$class.swallowError(Logging.scala:106)
>         at kafka.utils.Utils$.swallowError(Utils.scala:46)
>         at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:67)
>         at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:104)
>         at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:87)
>         at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:67)
>         at scala.collection.immutable.Stream.foreach(Stream.scala:526)
>         at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:66)
>         at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:44)
> {noformat}
> If we try to receive a message using 'kafka-console-consumer.sh':
> {noformat}
> [root@dev kafka]# /srv/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> [2014-05-16 19:46:23,029] WARN [console-consumer-69449_dev-1400262382648-1c9bfcd3], no brokers found when trying to rebalance. (kafka.consumer.ZookeeperConsumerConnector)
> {noformat}
> Port 9092 is open:
> {noformat}
> [root@dev kafka]# telnet localhost 9092
> Trying 127.0.0.1...
> Connected to localhost.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)