Posted to users@kafka.apache.org by Steven Wu <st...@gmail.com> on 2014/09/09 06:26:23 UTC

zookeeper shows partition in-sync but local disk doesn't have the data dirs

I did a push in the cloud. After the new instance for broker 0 came up, I see
a lot of WARNs in the log file:

2014-09-09 04:21:09,271 WARN  kafka.utils.Logging$class:83
[request-expiration-task] [warn] [KafkaApi-0] Fetch request with
correlation id 51893 from client 1409779957450-6014fc32-0-0 on partition
[foo,1] failed due to Partition [foo,1] doesn't exist on 0
2014-09-09 04:21:09,383 WARN  kafka.utils.Logging$class:83
[kafka-request-handler-5] [warn] [KafkaApi-0] Fetch request with
correlation id 51894 from client 1409779957450-6014fc32-0-0 on partition
[foo,2] failed due to Partition [foo,2] doesn't exist on 0

ZooKeeper shows broker 0 as the leader after the new instance comes back:

{"controller_epoch":5,"leader":0,"version":1,"leader_epoch":0,"isr":[0,5]}
{"controller_epoch":5,"leader":0,"version":1,"leader_epoch":1,"isr":[0,1]}

But when I check the data dirs, those two partitions aren't there.
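
For reference, the two views can be compared side by side. A minimal sketch,
assuming the Kafka bin/ directory is on the PATH, ZooKeeper is reachable at
zk1:2181, and log.dirs points at /data/kafka (all placeholders):

# the per-partition state znode that the controller writes (same JSON as above)
zookeeper-shell.sh zk1:2181 get /brokers/topics/foo/partitions/1/state

# every replica a broker hosts should have a <topic>-<partition> directory
# with .log/.index segment files; here foo-1 and foo-2 are missing on broker 0
ls -l /data/kafka/foo-1 /data/kafka/foo-2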

any idea?

Thanks,
Steven

Re: zookeeper shows partition in-sync but local disk doesn't have the data dirs

Posted by Steven Wu <st...@gmail.com>.
The topic foo was auto-created on Sep 3. I looked at the old log files on
controller broker 5; it has some errors regarding partitions 1 and 2 from
when the topic was auto-created. Who creates/updates the LeaderAndIsr
record?
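
For reference, that record is the per-partition state znode holding exactly
the JSON shown in the first message; a minimal sketch for dumping it, with
zk1:2181 as a placeholder connect string:

# replica assignment recorded at topic creation time
zookeeper-shell.sh zk1:2181 get /brokers/topics/foo

# per-partition leader/ISR record referenced by the "LeaderAndIsr path
# already exists" errors below
zookeeper-shell.sh zk1:2181 get /brokers/topics/foo/partitions/1/state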

errors regarding partition 1
===================================================
[2014-09-03 21:18:41,164] ERROR Controller 5 epoch 5 encountered error
while changing partition [foo,1]'s state from New to Online since
LeaderAndIsr path already exists with value
{"leader":0,"leader_epoch":0,"isr":[0,5]} and controller epoch 5
(state.change.logger)
[2014-09-03 21:18:41,164] ERROR Controller 5 epoch 5 initiated state change
for partition [foo,1] from NewPartition to OnlinePartition failed
(state.change.logger)
kafka.common.StateChangeFailedException: encountered error while changing
partition [foo,1]'s state from New to Online since LeaderAndIsr path
already exists with value {"leader":0,"leader_epoch":0,"isr":[0,5]} and
controller epoch 5
        at
kafka.controller.PartitionStateMachine.initializeLeaderAndIsrForPartition(PartitionStateMachine.scala:302)
        at
kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:183)
        at
kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:125)
        at
kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:124)
        at
scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:130)
        at
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:275)
        at
kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:124)
        at
kafka.controller.KafkaController.onNewPartitionCreation(KafkaController.scala:473)
        at
kafka.controller.KafkaController.onNewTopicCreation(KafkaController.scala:460)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(PartitionStateMachine.scala:422)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply(PartitionStateMachine.scala:404)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply(PartitionStateMachine.scala:404)
        at kafka.utils.Utils$.inLock(Utils.scala:538)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener.handleChildChange(PartitionStateMachine.scala:403)
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

errors regarding partition 2
===================================================
[2014-09-03 21:18:41,096] ERROR Controller 5 epoch 5 initiated state change
of replica 1 for partition [foo,2] from NonExistentReplica to NewReplica
failed (state.change.logger)
kafka.common.StateChangeFailedException: Replica 1 for partition [foo,2]
cannot be moved to NewReplicastate as it is being requested to become leader
        at
kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:160)
        at
kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:96)
        at
kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:96)
        at
scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:130)
        at
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:275)
        at
kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:96)
        at
kafka.controller.KafkaController.onNewPartitionCreation(KafkaController.scala:472)
        at
kafka.controller.KafkaController.onNewTopicCreation(KafkaController.scala:460)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(PartitionStateMachine.scala:422)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply(PartitionStateMachine.scala:404)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply(PartitionStateMachine.scala:404)
        at kafka.utils.Utils$.inLock(Utils.scala:538)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener.handleChildChange(PartitionStateMachine.scala:403)
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)


[2014-09-03 21:18:41,112] ERROR Controller 5 epoch 5 encountered error
while changing partition [foo,2]'s state from New to Online since
LeaderAndIsr path already exists with value
{"leader":1,"leader_epoch":0,"isr":[1,0]} and controller epoch 5
(state.change.logger)
[2014-09-03 21:18:41,112] ERROR Controller 5 epoch 5 initiated state change
for partition [foo,2] from NewPartition to OnlinePartition failed
(state.change.logger)
kafka.common.StateChangeFailedException: encountered error while changing
partition [foo,2]'s state from New to Online since LeaderAndIsr path
already exists with value {"leader":1,"leader_epoch":0,"isr":[1,0]} and
controller epoch 5
        at
kafka.controller.PartitionStateMachine.initializeLeaderAndIsrForPartition(PartitionStateMachine.scala:302)
        at
kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:183)
        at
kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:125)
        at
kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:124)
        at
scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:130)
        at
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:275)
        at
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:275)
        at
kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:124)
        at
kafka.controller.KafkaController.onNewPartitionCreation(KafkaController.scala:473)
        at
kafka.controller.KafkaController.onNewTopicCreation(KafkaController.scala:460)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(PartitionStateMachine.scala:422)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply(PartitionStateMachine.scala:404)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply(PartitionStateMachine.scala:404)
        at kafka.utils.Utils$.inLock(Utils.scala:538)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener.handleChildChange(PartitionStateMachine.scala:403)
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

[2014-09-03 21:18:41,168] ERROR Controller 5 epoch 5 initiated state change
of replica 1 for partition [foo,2] from NonExistentReplica to OnlineReplica
failed (state.change.logger)
java.lang.AssertionError: assertion failed: Replica
[Topic=foo,Partition=2,Replica=1] should be in the
NewReplica,OnlineReplica,OfflineReplica,ReplicaDeletionIneligible states
before moving to OnlineReplica state. Instead it is in NonExistentReplica
state
        at scala.Predef$.assert(Predef.scala:160)
        at
kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:284)
        at
kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:198)
        at
kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:96)
        at
kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:96)
        at
scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:130)
        at
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:275)
        at
kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:96)
        at
kafka.controller.KafkaController.onNewPartitionCreation(KafkaController.scala:474)
        at
kafka.controller.KafkaController.onNewTopicCreation(KafkaController.scala:460)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(PartitionStateMachine.scala:422)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply(PartitionStateMachine.scala:404)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply(PartitionStateMachine.scala:404)
        at kafka.utils.Utils$.inLock(Utils.scala:538)
        at
kafka.controller.PartitionStateMachine$TopicChangeListener.handleChildChange(PartitionStateMachine.scala:403)
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)




On Tue, Sep 9, 2014 at 7:18 PM, Steven Wu <st...@gmail.com> wrote:

> previous email is from state-change.log. also found this WARN in
> controller.log
>
> [2014-09-09 01:10:53,669] WARN [Controller 5]: Cannot remove replica 0
> from ISR of partition [cdn_selection_runtime_data,8] since it is not in the
> ISR. Leader = 1 ; ISR = List(1) (kafka.controller.KafkaController)
>
>
>
> On Tue, Sep 9, 2014 at 4:14 PM, Steven Wu <st...@gmail.com> wrote:
>
>> ah. maybe you mean the controller log on leader/controller broker 5. yes.
>> I do noticed some errors regarding these two partitions.
>>
>>
>> [2014-09-09 01:10:53,651] ERROR Controller 5 epoch 5 encountered error
>> while changing partition [foo,2]'s state from New to Online since
>> LeaderAndIsr path alrea
>> dy exists with value {"leader":0,"leader_epoch":1,"isr":[0,1]} and
>> controller epoch 5 (state.change.logger)
>> [2014-09-09 01:10:53,652] ERROR Controller 5 epoch 5 initiated state
>> change for partition [foo,2] from NewPartition to OnlinePartition failed
>> (state.change.logg
>> er)
>> kafka.common.StateChangeFailedException: encountered error while changing
>> partition [foo,2]'s state from New to Online since LeaderAndIsr path
>> already exists wi
>> th value {"leader":0,"leader_epoch":1,"isr":[0,1]} and controller epoch 5
>>         at
>> kafka.controller.PartitionStateMachine.initializeLeaderAndIsrForPartition(PartitionStateMachine.scala:302)
>>         at
>> kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:183)
>>         at
>> kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99)
>>         at
>> kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96)
>>         at
>> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
>>         at
>> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
>>         at
>> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
>>         at scala.collection.Iterator$class.foreach(Iterator.scala:772)
>>         at
>> scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
>>         at
>> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
>>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
>>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
>>         at
>> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
>>         at
>> kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:96)
>>         at
>> kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:433)
>>         at
>> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:344)
>>         at
>> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:330)
>>         at
>> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:330)
>>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>>         at
>> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:329)
>>         at
>> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:328)
>>         at
>> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:328)
>>         at kafka.utils.Utils$.inLock(Utils.scala:538)
>>         at
>> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:327)
>>         at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
>>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>>
>>
>> On Tue, Sep 9, 2014 at 4:08 PM, Steven Wu <st...@gmail.com> wrote:
>>
>>> assuming you are talking about controller log on broker 0, there is
>>> nothing there.
>>>
>>> $ less kafka-controller
>>> 2014-09-09 01:15:02,600 INFO  kafka.utils.Logging$class:68 [main] [info]
>>> [ControllerEpochListener on 0]: Initialized controller epoch to 5 and zk
>>> version 4
>>> 2014-09-09 01:15:02,655 INFO  kafka.utils.Logging$class:68 [main] [info]
>>> [Controller 0]: Controller starting up
>>> 2014-09-09 01:15:02,692 INFO  kafka.utils.Logging$class:68 [main] [info]
>>> [Controller 0]: Controller startup complete
>>>
>>>
>>> On Tue, Sep 9, 2014 at 4:06 PM, Jun Rao <ju...@gmail.com> wrote:
>>>
>>>> In the controller log, do you see broker 0 being detected as the new
>>>> broker
>>>> when broker 0 comes up?
>>>>
>>>> Thanks,
>>>>
>>>> Jun
>>>>
>>>> On Tue, Sep 9, 2014 at 3:51 PM, Steven Wu <st...@gmail.com> wrote:
>>>>
>>>> > nope. sate-change log files only had some warnings regarding other
>>>> > partitions. nothing related to these two partitions.
>>>> >
>>>> > 2014-09-09 02:54:30,579 WARN  kafka.utils.Logging$class:83
>>>> > [kafka-request-handler-1] [warn] Broker 0 received invalid
>>>> LeaderAndIsr
>>>> > request with correlation id 497 from controller 5 epoch 5 with an
>>>> older
>>>> > leader epoch 8 for partition [bar,1], current leader epoch is 8
>>>> >
>>>> >
>>>> > On Tue, Sep 9, 2014 at 3:37 PM, Jun Rao <ju...@gmail.com> wrote:
>>>> >
>>>> > > Hmm, that seems to indicate that the leader info is not propagated
>>>> > properly
>>>> > > from the controller to broker 0. In the state-change log of broker
>>>> 0, do
>>>> > > you see anything related to these 2 partitions when broker 0 comes
>>>> up?
>>>> > >
>>>> > > Thanks,
>>>> > >
>>>> > > Jun
>>>> > >
>>>> > > On Tue, Sep 9, 2014 at 9:41 AM, Steven Wu <st...@gmail.com>
>>>> wrote:
>>>> > >
>>>> > > > noticed one important thing. topic foo's partition 1 and 2 have
>>>> empty
>>>> > > .log
>>>> > > > file on replicas. I suspect replication doesn't create the
>>>> partition
>>>> > dir
>>>> > > on
>>>> > > > broker 0 in this case, which then cause the WARN logs.
>>>> > > >
>>>> > > >
>>>> > > > On Mon, Sep 8, 2014 at 11:11 PM, Steven Wu <st...@gmail.com>
>>>> > wrote:
>>>> > > >
>>>> > > > > sorry. forgot to mention that I am running 0.8.1.1
>>>> > > > >
>>>> > > > >
>>>> > > > > On Mon, Sep 8, 2014 at 9:26 PM, Steven Wu <stevenz3wu@gmail.com
>>>> >
>>>> > > wrote:
>>>> > > > >
>>>> > > > >> did a push in cloud. after new instance for broker 0 comes up,
>>>> I
>>>> > see a
>>>> > > > >> lot of WARNs in log file.
>>>> > > > >>
>>>> > > > >> 2014-09-09 04:21:09,271 WARN  kafka.utils.Logging$class:83
>>>> > > > >> [request-expiration-task] [warn] [KafkaApi-0] Fetch request
>>>> with
>>>> > > > >> correlation id 51893 from client 1409779957450-6014fc32-0-0 on
>>>> > > partition
>>>> > > > >> [foo,1] failed due to Partition [foo,1] doesn't exist on 0
>>>> > > > >> 2014-09-09 04:21:09,383 WARN  kafka.utils.Logging$class:83
>>>> > > > >> [kafka-request-handler-5] [warn] [KafkaApi-0] Fetch request
>>>> with
>>>> > > > >> correlation id 51894 from client 1409779957450-6014fc32-0-0 on
>>>> > > partition
>>>> > > > >> [foo,2] failed due to Partition [foo,2] doesn't exist on 0
>>>> > > > >>
>>>> > > > >> zookeeper shows it is the leader after this new instance come
>>>> back.
>>>> > > > >>
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":0,"isr":[0,5]}
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":1,"isr":[0,1]}
>>>> > > > >>
>>>> > > > >> but we I check the data dirs. those two partitions aren't
>>>> there.
>>>> > > > >>
>>>> > > > >> any idea?
>>>> > > > >>
>>>> > > > >> Thanks,
>>>> > > > >> Steven
>>>> > > > >>
>>>> > > > >>
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>

Re: zookeeper shows partition in-sync but local disk doesn't have the data dirs

Posted by Steven Wu <st...@gmail.com>.
The previous email is from state-change.log. I also found this WARN in
controller.log:

[2014-09-09 01:10:53,669] WARN [Controller 5]: Cannot remove replica 0 from
ISR of partition [cdn_selection_runtime_data,8] since it is not in the ISR.
Leader = 1 ; ISR = List(1) (kafka.controller.KafkaController)
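
A quick way to cross-check the current leader/ISR view for that partition; a
sketch, with the ZooKeeper connect string as a placeholder:

# the Leader and Isr columns should line up with the values in the warning above
kafka-topics.sh --describe --zookeeper zk1:2181 --topic cdn_selection_runtime_data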


On Tue, Sep 9, 2014 at 4:14 PM, Steven Wu <st...@gmail.com> wrote:

> ah. maybe you mean the controller log on leader/controller broker 5. yes.
> I do noticed some errors regarding these two partitions.
>
>
> [2014-09-09 01:10:53,651] ERROR Controller 5 epoch 5 encountered error
> while changing partition [foo,2]'s state from New to Online since
> LeaderAndIsr path alrea
> dy exists with value {"leader":0,"leader_epoch":1,"isr":[0,1]} and
> controller epoch 5 (state.change.logger)
> [2014-09-09 01:10:53,652] ERROR Controller 5 epoch 5 initiated state
> change for partition [foo,2] from NewPartition to OnlinePartition failed
> (state.change.logg
> er)
> kafka.common.StateChangeFailedException: encountered error while changing
> partition [foo,2]'s state from New to Online since LeaderAndIsr path
> already exists wi
> th value {"leader":0,"leader_epoch":1,"isr":[0,1]} and controller epoch 5
>         at
> kafka.controller.PartitionStateMachine.initializeLeaderAndIsrForPartition(PartitionStateMachine.scala:302)
>         at
> kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:183)
>         at
> kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99)
>         at
> kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96)
>         at
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
>         at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
>         at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:772)
>         at
> scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
>         at
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
>         at
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
>         at
> kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:96)
>         at
> kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:433)
>         at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:344)
>         at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:330)
>         at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:330)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:329)
>         at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:328)
>         at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:328)
>         at kafka.utils.Utils$.inLock(Utils.scala:538)
>         at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:327)
>         at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>
>
> On Tue, Sep 9, 2014 at 4:08 PM, Steven Wu <st...@gmail.com> wrote:
>
>> assuming you are talking about controller log on broker 0, there is
>> nothing there.
>>
>> $ less kafka-controller
>> 2014-09-09 01:15:02,600 INFO  kafka.utils.Logging$class:68 [main] [info]
>> [ControllerEpochListener on 0]: Initialized controller epoch to 5 and zk
>> version 4
>> 2014-09-09 01:15:02,655 INFO  kafka.utils.Logging$class:68 [main] [info]
>> [Controller 0]: Controller starting up
>> 2014-09-09 01:15:02,692 INFO  kafka.utils.Logging$class:68 [main] [info]
>> [Controller 0]: Controller startup complete
>>
>>
>> On Tue, Sep 9, 2014 at 4:06 PM, Jun Rao <ju...@gmail.com> wrote:
>>
>>> In the controller log, do you see broker 0 being detected as the new
>>> broker
>>> when broker 0 comes up?
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Tue, Sep 9, 2014 at 3:51 PM, Steven Wu <st...@gmail.com> wrote:
>>>
>>> > nope. sate-change log files only had some warnings regarding other
>>> > partitions. nothing related to these two partitions.
>>> >
>>> > 2014-09-09 02:54:30,579 WARN  kafka.utils.Logging$class:83
>>> > [kafka-request-handler-1] [warn] Broker 0 received invalid LeaderAndIsr
>>> > request with correlation id 497 from controller 5 epoch 5 with an older
>>> > leader epoch 8 for partition [bar,1], current leader epoch is 8
>>> >
>>> >
>>> > On Tue, Sep 9, 2014 at 3:37 PM, Jun Rao <ju...@gmail.com> wrote:
>>> >
>>> > > Hmm, that seems to indicate that the leader info is not propagated
>>> > properly
>>> > > from the controller to broker 0. In the state-change log of broker
>>> 0, do
>>> > > you see anything related to these 2 partitions when broker 0 comes
>>> up?
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Jun
>>> > >
>>> > > On Tue, Sep 9, 2014 at 9:41 AM, Steven Wu <st...@gmail.com>
>>> wrote:
>>> > >
>>> > > > noticed one important thing. topic foo's partition 1 and 2 have
>>> empty
>>> > > .log
>>> > > > file on replicas. I suspect replication doesn't create the
>>> partition
>>> > dir
>>> > > on
>>> > > > broker 0 in this case, which then cause the WARN logs.
>>> > > >
>>> > > >
>>> > > > On Mon, Sep 8, 2014 at 11:11 PM, Steven Wu <st...@gmail.com>
>>> > wrote:
>>> > > >
>>> > > > > sorry. forgot to mention that I am running 0.8.1.1
>>> > > > >
>>> > > > >
>>> > > > > On Mon, Sep 8, 2014 at 9:26 PM, Steven Wu <st...@gmail.com>
>>> > > wrote:
>>> > > > >
>>> > > > >> did a push in cloud. after new instance for broker 0 comes up, I
>>> > see a
>>> > > > >> lot of WARNs in log file.
>>> > > > >>
>>> > > > >> 2014-09-09 04:21:09,271 WARN  kafka.utils.Logging$class:83
>>> > > > >> [request-expiration-task] [warn] [KafkaApi-0] Fetch request with
>>> > > > >> correlation id 51893 from client 1409779957450-6014fc32-0-0 on
>>> > > partition
>>> > > > >> [foo,1] failed due to Partition [foo,1] doesn't exist on 0
>>> > > > >> 2014-09-09 04:21:09,383 WARN  kafka.utils.Logging$class:83
>>> > > > >> [kafka-request-handler-5] [warn] [KafkaApi-0] Fetch request with
>>> > > > >> correlation id 51894 from client 1409779957450-6014fc32-0-0 on
>>> > > partition
>>> > > > >> [foo,2] failed due to Partition [foo,2] doesn't exist on 0
>>> > > > >>
>>> > > > >> zookeeper shows it is the leader after this new instance come
>>> back.
>>> > > > >>
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":0,"isr":[0,5]}
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":1,"isr":[0,1]}
>>> > > > >>
>>> > > > >> but we I check the data dirs. those two partitions aren't there.
>>> > > > >>
>>> > > > >> any idea?
>>> > > > >>
>>> > > > >> Thanks,
>>> > > > >> Steven
>>> > > > >>
>>> > > > >>
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Re: zookeeper shows partition in-sync but local disk doesn't have the data dirs

Posted by Steven Wu <st...@gmail.com>.
Ah, maybe you mean the controller log on the leader/controller broker 5.
Yes, I do notice some errors regarding these two partitions.


[2014-09-09 01:10:53,651] ERROR Controller 5 epoch 5 encountered error
while changing partition [foo,2]'s state from New to Online since
LeaderAndIsr path already exists with value
{"leader":0,"leader_epoch":1,"isr":[0,1]} and controller epoch 5
(state.change.logger)
[2014-09-09 01:10:53,652] ERROR Controller 5 epoch 5 initiated state change
for partition [foo,2] from NewPartition to OnlinePartition failed
(state.change.logger)
kafka.common.StateChangeFailedException: encountered error while changing
partition [foo,2]'s state from New to Online since LeaderAndIsr path
already exists with value {"leader":0,"leader_epoch":1,"isr":[0,1]} and
controller epoch 5
        at
kafka.controller.PartitionStateMachine.initializeLeaderAndIsrForPartition(PartitionStateMachine.scala:302)
        at
kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:183)
        at
kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99)
        at
kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96)
        at
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
        at
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at
scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
        at
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
        at
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
        at
kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:96)
        at
kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:433)
        at
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:344)
        at
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:330)
        at
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:330)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        at
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:329)
        at
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:328)
        at
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:328)
        at kafka.utils.Utils$.inLock(Utils.scala:538)
        at
kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:327)
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)


On Tue, Sep 9, 2014 at 4:08 PM, Steven Wu <st...@gmail.com> wrote:

> assuming you are talking about controller log on broker 0, there is
> nothing there.
>
> $ less kafka-controller
> 2014-09-09 01:15:02,600 INFO  kafka.utils.Logging$class:68 [main] [info]
> [ControllerEpochListener on 0]: Initialized controller epoch to 5 and zk
> version 4
> 2014-09-09 01:15:02,655 INFO  kafka.utils.Logging$class:68 [main] [info]
> [Controller 0]: Controller starting up
> 2014-09-09 01:15:02,692 INFO  kafka.utils.Logging$class:68 [main] [info]
> [Controller 0]: Controller startup complete
>
>
> On Tue, Sep 9, 2014 at 4:06 PM, Jun Rao <ju...@gmail.com> wrote:
>
>> In the controller log, do you see broker 0 being detected as the new
>> broker
>> when broker 0 comes up?
>>
>> Thanks,
>>
>> Jun
>>
>> On Tue, Sep 9, 2014 at 3:51 PM, Steven Wu <st...@gmail.com> wrote:
>>
>> > nope. sate-change log files only had some warnings regarding other
>> > partitions. nothing related to these two partitions.
>> >
>> > 2014-09-09 02:54:30,579 WARN  kafka.utils.Logging$class:83
>> > [kafka-request-handler-1] [warn] Broker 0 received invalid LeaderAndIsr
>> > request with correlation id 497 from controller 5 epoch 5 with an older
>> > leader epoch 8 for partition [bar,1], current leader epoch is 8
>> >
>> >
>> > On Tue, Sep 9, 2014 at 3:37 PM, Jun Rao <ju...@gmail.com> wrote:
>> >
>> > > Hmm, that seems to indicate that the leader info is not propagated
>> > properly
>> > > from the controller to broker 0. In the state-change log of broker 0,
>> do
>> > > you see anything related to these 2 partitions when broker 0 comes up?
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> > >
>> > > On Tue, Sep 9, 2014 at 9:41 AM, Steven Wu <st...@gmail.com>
>> wrote:
>> > >
>> > > > noticed one important thing. topic foo's partition 1 and 2 have
>> empty
>> > > .log
>> > > > file on replicas. I suspect replication doesn't create the partition
>> > dir
>> > > on
>> > > > broker 0 in this case, which then cause the WARN logs.
>> > > >
>> > > >
>> > > > On Mon, Sep 8, 2014 at 11:11 PM, Steven Wu <st...@gmail.com>
>> > wrote:
>> > > >
>> > > > > sorry. forgot to mention that I am running 0.8.1.1
>> > > > >
>> > > > >
>> > > > > On Mon, Sep 8, 2014 at 9:26 PM, Steven Wu <st...@gmail.com>
>> > > wrote:
>> > > > >
>> > > > >> did a push in cloud. after new instance for broker 0 comes up, I
>> > see a
>> > > > >> lot of WARNs in log file.
>> > > > >>
>> > > > >> 2014-09-09 04:21:09,271 WARN  kafka.utils.Logging$class:83
>> > > > >> [request-expiration-task] [warn] [KafkaApi-0] Fetch request with
>> > > > >> correlation id 51893 from client 1409779957450-6014fc32-0-0 on
>> > > partition
>> > > > >> [foo,1] failed due to Partition [foo,1] doesn't exist on 0
>> > > > >> 2014-09-09 04:21:09,383 WARN  kafka.utils.Logging$class:83
>> > > > >> [kafka-request-handler-5] [warn] [KafkaApi-0] Fetch request with
>> > > > >> correlation id 51894 from client 1409779957450-6014fc32-0-0 on
>> > > partition
>> > > > >> [foo,2] failed due to Partition [foo,2] doesn't exist on 0
>> > > > >>
>> > > > >> zookeeper shows it is the leader after this new instance come
>> back.
>> > > > >>
>> > > > >>
>> > > >
>> > >
>> >
>> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":0,"isr":[0,5]}
>> > > > >>
>> > > >
>> > >
>> >
>> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":1,"isr":[0,1]}
>> > > > >>
>> > > > >> but we I check the data dirs. those two partitions aren't there.
>> > > > >>
>> > > > >> any idea?
>> > > > >>
>> > > > >> Thanks,
>> > > > >> Steven
>> > > > >>
>> > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: zookeeper shows partition in-sync but local disk doesn't have the data dirs

Posted by Steven Wu <st...@gmail.com>.
Assuming you are talking about the controller log on broker 0, there is
nothing there.

$ less kafka-controller
2014-09-09 01:15:02,600 INFO  kafka.utils.Logging$class:68 [main] [info]
[ControllerEpochListener on 0]: Initialized controller epoch to 5 and zk
version 4
2014-09-09 01:15:02,655 INFO  kafka.utils.Logging$class:68 [main] [info]
[Controller 0]: Controller starting up
2014-09-09 01:15:02,692 INFO  kafka.utils.Logging$class:68 [main] [info]
[Controller 0]: Controller startup complete
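
Which broker is currently the active controller can be confirmed from
ZooKeeper instead of being inferred from the logs; a minimal sketch (connect
string is a placeholder):

# the active controller's broker id is stored in this ephemeral znode
zookeeper-shell.sh zk1:2181 get /controller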


On Tue, Sep 9, 2014 at 4:06 PM, Jun Rao <ju...@gmail.com> wrote:

> In the controller log, do you see broker 0 being detected as the new broker
> when broker 0 comes up?
>
> Thanks,
>
> Jun
>
> On Tue, Sep 9, 2014 at 3:51 PM, Steven Wu <st...@gmail.com> wrote:
>
> > nope. sate-change log files only had some warnings regarding other
> > partitions. nothing related to these two partitions.
> >
> > 2014-09-09 02:54:30,579 WARN  kafka.utils.Logging$class:83
> > [kafka-request-handler-1] [warn] Broker 0 received invalid LeaderAndIsr
> > request with correlation id 497 from controller 5 epoch 5 with an older
> > leader epoch 8 for partition [bar,1], current leader epoch is 8
> >
> >
> > On Tue, Sep 9, 2014 at 3:37 PM, Jun Rao <ju...@gmail.com> wrote:
> >
> > > Hmm, that seems to indicate that the leader info is not propagated
> > properly
> > > from the controller to broker 0. In the state-change log of broker 0,
> do
> > > you see anything related to these 2 partitions when broker 0 comes up?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Sep 9, 2014 at 9:41 AM, Steven Wu <st...@gmail.com>
> wrote:
> > >
> > > > noticed one important thing. topic foo's partition 1 and 2 have empty
> > > .log
> > > > file on replicas. I suspect replication doesn't create the partition
> > dir
> > > on
> > > > broker 0 in this case, which then cause the WARN logs.
> > > >
> > > >
> > > > On Mon, Sep 8, 2014 at 11:11 PM, Steven Wu <st...@gmail.com>
> > wrote:
> > > >
> > > > > sorry. forgot to mention that I am running 0.8.1.1
> > > > >
> > > > >
> > > > > On Mon, Sep 8, 2014 at 9:26 PM, Steven Wu <st...@gmail.com>
> > > wrote:
> > > > >
> > > > >> did a push in cloud. after new instance for broker 0 comes up, I
> > see a
> > > > >> lot of WARNs in log file.
> > > > >>
> > > > >> 2014-09-09 04:21:09,271 WARN  kafka.utils.Logging$class:83
> > > > >> [request-expiration-task] [warn] [KafkaApi-0] Fetch request with
> > > > >> correlation id 51893 from client 1409779957450-6014fc32-0-0 on
> > > partition
> > > > >> [foo,1] failed due to Partition [foo,1] doesn't exist on 0
> > > > >> 2014-09-09 04:21:09,383 WARN  kafka.utils.Logging$class:83
> > > > >> [kafka-request-handler-5] [warn] [KafkaApi-0] Fetch request with
> > > > >> correlation id 51894 from client 1409779957450-6014fc32-0-0 on
> > > partition
> > > > >> [foo,2] failed due to Partition [foo,2] doesn't exist on 0
> > > > >>
> > > > >> zookeeper shows it is the leader after this new instance come
> back.
> > > > >>
> > > > >>
> > > >
> > >
> >
> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":0,"isr":[0,5]}
> > > > >>
> > > >
> > >
> >
> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":1,"isr":[0,1]}
> > > > >>
> > > > >> but we I check the data dirs. those two partitions aren't there.
> > > > >>
> > > > >> any idea?
> > > > >>
> > > > >> Thanks,
> > > > >> Steven
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: zookeeper shows partition in-sync but local disk doesn't have the data dirs

Posted by Jun Rao <ju...@gmail.com>.
In the controller log, do you see broker 0 being detected as the new broker
when broker 0 comes up?

Thanks,

Jun
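
One way to check both sides of that question; a sketch, where the connect
string and log path are placeholders and the "Newly added brokers" wording is
assumed from the 0.8.x controller and may differ between versions:

# broker 0's registration znode, recreated when the new instance comes up
zookeeper-shell.sh zk1:2181 get /brokers/ids/0

# controller-side evidence that the broker-change listener saw broker 0 return
grep -i "newly added brokers" logs/controller.log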

On Tue, Sep 9, 2014 at 3:51 PM, Steven Wu <st...@gmail.com> wrote:

> nope. sate-change log files only had some warnings regarding other
> partitions. nothing related to these two partitions.
>
> 2014-09-09 02:54:30,579 WARN  kafka.utils.Logging$class:83
> [kafka-request-handler-1] [warn] Broker 0 received invalid LeaderAndIsr
> request with correlation id 497 from controller 5 epoch 5 with an older
> leader epoch 8 for partition [bar,1], current leader epoch is 8
>
>
> On Tue, Sep 9, 2014 at 3:37 PM, Jun Rao <ju...@gmail.com> wrote:
>
> > Hmm, that seems to indicate that the leader info is not propagated
> properly
> > from the controller to broker 0. In the state-change log of broker 0, do
> > you see anything related to these 2 partitions when broker 0 comes up?
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Sep 9, 2014 at 9:41 AM, Steven Wu <st...@gmail.com> wrote:
> >
> > > noticed one important thing. topic foo's partition 1 and 2 have empty
> > .log
> > > file on replicas. I suspect replication doesn't create the partition
> dir
> > on
> > > broker 0 in this case, which then cause the WARN logs.
> > >
> > >
> > > On Mon, Sep 8, 2014 at 11:11 PM, Steven Wu <st...@gmail.com>
> wrote:
> > >
> > > > sorry. forgot to mention that I am running 0.8.1.1
> > > >
> > > >
> > > > On Mon, Sep 8, 2014 at 9:26 PM, Steven Wu <st...@gmail.com>
> > wrote:
> > > >
> > > >> did a push in cloud. after new instance for broker 0 comes up, I
> see a
> > > >> lot of WARNs in log file.
> > > >>
> > > >> 2014-09-09 04:21:09,271 WARN  kafka.utils.Logging$class:83
> > > >> [request-expiration-task] [warn] [KafkaApi-0] Fetch request with
> > > >> correlation id 51893 from client 1409779957450-6014fc32-0-0 on
> > partition
> > > >> [foo,1] failed due to Partition [foo,1] doesn't exist on 0
> > > >> 2014-09-09 04:21:09,383 WARN  kafka.utils.Logging$class:83
> > > >> [kafka-request-handler-5] [warn] [KafkaApi-0] Fetch request with
> > > >> correlation id 51894 from client 1409779957450-6014fc32-0-0 on
> > partition
> > > >> [foo,2] failed due to Partition [foo,2] doesn't exist on 0
> > > >>
> > > >> zookeeper shows it is the leader after this new instance come back.
> > > >>
> > > >>
> > >
> >
> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":0,"isr":[0,5]}
> > > >>
> > >
> >
> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":1,"isr":[0,1]}
> > > >>
> > > >> but we I check the data dirs. those two partitions aren't there.
> > > >>
> > > >> any idea?
> > > >>
> > > >> Thanks,
> > > >> Steven
> > > >>
> > > >>
> > > >
> > >
> >
>

Re: zookeeper shows partition in-sync but local disk doesn't have the data dirs

Posted by Steven Wu <st...@gmail.com>.
Nope. The state-change log files only had some warnings regarding other
partitions, nothing related to these two partitions:

2014-09-09 02:54:30,579 WARN  kafka.utils.Logging$class:83
[kafka-request-handler-1] [warn] Broker 0 received invalid LeaderAndIsr
request with correlation id 497 from controller 5 epoch 5 with an older
leader epoch 8 for partition [bar,1], current leader epoch is 8
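
A targeted way to confirm that, restricted to the two affected partitions; a
sketch, assuming the broker 0 log file lives at logs/state-change.log:

# any state-change activity recorded for [foo,1] or [foo,2] on this broker
grep -E "\[foo,(1|2)\]" logs/state-change.log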


On Tue, Sep 9, 2014 at 3:37 PM, Jun Rao <ju...@gmail.com> wrote:

> Hmm, that seems to indicate that the leader info is not propagated properly
> from the controller to broker 0. In the state-change log of broker 0, do
> you see anything related to these 2 partitions when broker 0 comes up?
>
> Thanks,
>
> Jun
>
> On Tue, Sep 9, 2014 at 9:41 AM, Steven Wu <st...@gmail.com> wrote:
>
> > noticed one important thing. topic foo's partition 1 and 2 have empty
> .log
> > file on replicas. I suspect replication doesn't create the partition dir
> on
> > broker 0 in this case, which then cause the WARN logs.
> >
> >
> > On Mon, Sep 8, 2014 at 11:11 PM, Steven Wu <st...@gmail.com> wrote:
> >
> > > sorry. forgot to mention that I am running 0.8.1.1
> > >
> > >
> > > On Mon, Sep 8, 2014 at 9:26 PM, Steven Wu <st...@gmail.com>
> wrote:
> > >
> > >> did a push in cloud. after new instance for broker 0 comes up, I see a
> > >> lot of WARNs in log file.
> > >>
> > >> 2014-09-09 04:21:09,271 WARN  kafka.utils.Logging$class:83
> > >> [request-expiration-task] [warn] [KafkaApi-0] Fetch request with
> > >> correlation id 51893 from client 1409779957450-6014fc32-0-0 on
> partition
> > >> [foo,1] failed due to Partition [foo,1] doesn't exist on 0
> > >> 2014-09-09 04:21:09,383 WARN  kafka.utils.Logging$class:83
> > >> [kafka-request-handler-5] [warn] [KafkaApi-0] Fetch request with
> > >> correlation id 51894 from client 1409779957450-6014fc32-0-0 on
> partition
> > >> [foo,2] failed due to Partition [foo,2] doesn't exist on 0
> > >>
> > >> zookeeper shows it is the leader after this new instance come back.
> > >>
> > >>
> >
> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":0,"isr":[0,5]}
> > >>
> >
> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":1,"isr":[0,1]}
> > >>
> > >> but we I check the data dirs. those two partitions aren't there.
> > >>
> > >> any idea?
> > >>
> > >> Thanks,
> > >> Steven
> > >>
> > >>
> > >
> >
>

Re: zookeeper shows partition in-sync but local disk doesn't have the data dirs

Posted by Jun Rao <ju...@gmail.com>.
Hmm, that seems to indicate that the leader info is not propagated properly
from the controller to broker 0. In the state-change log of broker 0, do
you see anything related to these 2 partitions when broker 0 comes up?

Thanks,

Jun

On Tue, Sep 9, 2014 at 9:41 AM, Steven Wu <st...@gmail.com> wrote:

> noticed one important thing. topic foo's partition 1 and 2 have empty .log
> file on replicas. I suspect replication doesn't create the partition dir on
> broker 0 in this case, which then cause the WARN logs.
>
>
> On Mon, Sep 8, 2014 at 11:11 PM, Steven Wu <st...@gmail.com> wrote:
>
> > sorry. forgot to mention that I am running 0.8.1.1
> >
> >
> > On Mon, Sep 8, 2014 at 9:26 PM, Steven Wu <st...@gmail.com> wrote:
> >
> >> did a push in cloud. after new instance for broker 0 comes up, I see a
> >> lot of WARNs in log file.
> >>
> >> 2014-09-09 04:21:09,271 WARN  kafka.utils.Logging$class:83
> >> [request-expiration-task] [warn] [KafkaApi-0] Fetch request with
> >> correlation id 51893 from client 1409779957450-6014fc32-0-0 on partition
> >> [foo,1] failed due to Partition [foo,1] doesn't exist on 0
> >> 2014-09-09 04:21:09,383 WARN  kafka.utils.Logging$class:83
> >> [kafka-request-handler-5] [warn] [KafkaApi-0] Fetch request with
> >> correlation id 51894 from client 1409779957450-6014fc32-0-0 on partition
> >> [foo,2] failed due to Partition [foo,2] doesn't exist on 0
> >>
> >> zookeeper shows it is the leader after this new instance come back.
> >>
> >>
> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":0,"isr":[0,5]}
> >>
> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":1,"isr":[0,1]}
> >>
> >> but we I check the data dirs. those two partitions aren't there.
> >>
> >> any idea?
> >>
> >> Thanks,
> >> Steven
> >>
> >>
> >
>

Re: zookeeper shows partition in-sync but local disk doesn't have the data dirs

Posted by Steven Wu <st...@gmail.com>.
Noticed one important thing: topic foo's partitions 1 and 2 have empty .log
files on the replicas. I suspect replication doesn't create the partition
dir on broker 0 in this case, which then causes the WARN logs.
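
To make that concrete, the on-disk layout can be compared between a replica
that hosts the partitions and broker 0; a sketch, with /data/kafka standing
in for whatever log.dirs is set to:

# on a replica that hosts the partitions (e.g. broker 1): directories exist,
# but the segment files are 0 bytes
ls -l /data/kafka/foo-1/ /data/kafka/foo-2/

# on broker 0: the directories are missing entirely, which matches the
# "doesn't exist on 0" fetch warnings
ls -ld /data/kafka/foo-1 /data/kafka/foo-2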


On Mon, Sep 8, 2014 at 11:11 PM, Steven Wu <st...@gmail.com> wrote:

> sorry. forgot to mention that I am running 0.8.1.1
>
>
> On Mon, Sep 8, 2014 at 9:26 PM, Steven Wu <st...@gmail.com> wrote:
>
>> did a push in cloud. after new instance for broker 0 comes up, I see a
>> lot of WARNs in log file.
>>
>> 2014-09-09 04:21:09,271 WARN  kafka.utils.Logging$class:83
>> [request-expiration-task] [warn] [KafkaApi-0] Fetch request with
>> correlation id 51893 from client 1409779957450-6014fc32-0-0 on partition
>> [foo,1] failed due to Partition [foo,1] doesn't exist on 0
>> 2014-09-09 04:21:09,383 WARN  kafka.utils.Logging$class:83
>> [kafka-request-handler-5] [warn] [KafkaApi-0] Fetch request with
>> correlation id 51894 from client 1409779957450-6014fc32-0-0 on partition
>> [foo,2] failed due to Partition [foo,2] doesn't exist on 0
>>
>> zookeeper shows it is the leader after this new instance come back.
>>
>> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":0,"isr":[0,5]}
>> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":1,"isr":[0,1]}
>>
>> but we I check the data dirs. those two partitions aren't there.
>>
>> any idea?
>>
>> Thanks,
>> Steven
>>
>>
>

Re: zookeeper shows partition in-sync but local disk doesn't have the data dirs

Posted by Steven Wu <st...@gmail.com>.
Sorry, forgot to mention that I am running 0.8.1.1.


On Mon, Sep 8, 2014 at 9:26 PM, Steven Wu <st...@gmail.com> wrote:

> did a push in cloud. after new instance for broker 0 comes up, I see a lot
> of WARNs in log file.
>
> 2014-09-09 04:21:09,271 WARN  kafka.utils.Logging$class:83
> [request-expiration-task] [warn] [KafkaApi-0] Fetch request with
> correlation id 51893 from client 1409779957450-6014fc32-0-0 on partition
> [foo,1] failed due to Partition [foo,1] doesn't exist on 0
> 2014-09-09 04:21:09,383 WARN  kafka.utils.Logging$class:83
> [kafka-request-handler-5] [warn] [KafkaApi-0] Fetch request with
> correlation id 51894 from client 1409779957450-6014fc32-0-0 on partition
> [foo,2] failed due to Partition [foo,2] doesn't exist on 0
>
> zookeeper shows it is the leader after this new instance come back.
>
> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":0,"isr":[0,5]}
> {"controller_epoch":5,"leader":0,"version":1,"leader_epoch":1,"isr":[0,1]}
>
> but we I check the data dirs. those two partitions aren't there.
>
> any idea?
>
> Thanks,
> Steven
>
>