Posted to users@kafka.apache.org by Zakee <kz...@netzero.net> on 2015/03/07 00:14:51 UTC

Re: Broker Exceptions

Yes, Jiangjie, I do see lots of these "Starting preferred replica leader election for partitions" entries in the logs. I also see a lot of produce request failure warnings with the NotLeaderForPartition exception.

I tried switching off auto leader rebalancing by setting the property to false. I am still noticing the rebalance happening. My understanding was that the rebalance would not happen when this is set to false.
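
For reference, here is a rough sketch of the server.properties entries involved (property names as documented for Kafka 0.8.2; the values shown are the documented defaults, not necessarily what I have in my files):

    # Disable automatic preferred-leader rebalancing by the controller.
    auto.leader.rebalance.enable=false
    # Only consulted when the flag above is true:
    #leader.imbalance.per.broker.percentage=10
    #leader.imbalance.check.interval.seconds=300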

Thanks
Zakee



> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <jq...@linkedin.com.INVALID> wrote:
> 
> I don’t think num.replica.fetchers will help in this case. Increasing
> number of fetcher threads will only help in cases where you have a large
> amount of data coming into a broker and more replica fetcher threads will
> help keep up. We usually only use 1-2 for each broker. But in your case,
> it looks that leader migration cause issue.
> Do you see anything else in the log? Like preferred leader election?
> 
> Jiangjie (Becket) Qin
> 
> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
> 
>> Thanks, Jiangjie.
>> 
>> Yes, I do see under partitions usually shooting every hour. Anythings that
>> I could try to reduce it?
>> 
>> How does "num.replica.fetchers" affect the replica sync? Currently have
>> configured 7 each of 5 brokers.
>> 
>> -Zakee
>> 
>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
>> wrote:
>> 
>>> These messages are usually caused by leader migration. I think as long
>>> as
>>> you don't see this lasting for ever and got a bunch of under replicated
>>> partitions, it should be fine.
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>> On 2/25/15, 4:07 PM, "Zakee" <kz...@netzero.net> wrote:
>>> 
>>>> Need to know if I should I be worried about this or ignore them.
>>>> 
>>>> I see tons of these exceptions/warnings in the broker logs, not sure
>>> what
>>>> causes them and what could be done to fix them.
>>>> 
>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to
>>>> broker
>>>> 5:class kafka.common.NotLeaderForPartitionException
>>>> (kafka.server.ReplicaFetcherThread)
>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for
>>>> partition [TestTopic] to broker 5:class
>>>> kafka.common.NotLeaderForPartitionException
>>>> (kafka.server.ReplicaFetcherThread)
>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
>>>> request
>>>> with correlation id 950084 from client ReplicaFetcherThread-1-2 on
>>>> partition [TestTopic,2] failed due to Leader not local for partition
>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>> 
>>>> 
>>>> Any ideas?
>>>> 
>>>> -Zakee
>>> 
>>> 
> 
> 

Re: Broker Exceptions

Posted by Zakee <kz...@netzero.net>.
Thanks, Jiangjie, I will try with a clean cluster again.

Thanks
Zakee



> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <jq...@linkedin.com.INVALID> wrote:
> 
> Yes, the rebalance should not happen in that case. That is a little bit
> strange. Could you try to launch a clean Kafka cluster with
> auto.leader.election disabled and try push data?
> When leader migration occurs, NotLeaderForPartition exception is expected.
> 
> Jiangjie (Becket) Qin
> 
> 
> On 3/6/15, 3:14 PM, "Zakee" <kz...@netzero.net> wrote:
> 
>> Yes, Jiangjie, I do see lots of these errors "Starting preferred replica
>> leader election for partitions” in logs. I also see lot of Produce
>> request failure warnings in with the NotLeader Exception.
>> 
>> I tried switching off the auto.leader.relabalance to false. I am still
>> noticing the rebalance happening. My understanding was the rebalance will
>> not happen when this is set to false.
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <jq...@linkedin.com.INVALID>
>>> wrote:
>>> 
>>> I don’t think num.replica.fetchers will help in this case. Increasing
>>> number of fetcher threads will only help in cases where you have a large
>>> amount of data coming into a broker and more replica fetcher threads
>>> will
>>> help keep up. We usually only use 1-2 for each broker. But in your case,
>>> it looks that leader migration cause issue.
>>> Do you see anything else in the log? Like preferred leader election?
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net
>>> <ma...@netzero.net>> wrote:
>>> 
>>>> Thanks, Jiangjie.
>>>> 
>>>> Yes, I do see under partitions usually shooting every hour. Anythings
>>>> that
>>>> I could try to reduce it?
>>>> 
>>>> How does "num.replica.fetchers" affect the replica sync? Currently have
>>>> configured 7 each of 5 brokers.
>>>> 
>>>> -Zakee
>>>> 
>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>> <jq...@linkedin.com.invalid>
>>>> wrote:
>>>> 
>>>>> These messages are usually caused by leader migration. I think as long
>>>>> as
>>>>> you don't see this lasting for ever and got a bunch of under
>>>>> replicated
>>>>> partitions, it should be fine.
>>>>> 
>>>>> Jiangjie (Becket) Qin
>>>>> 
>>>>> On 2/25/15, 4:07 PM, "Zakee" <kz...@netzero.net> wrote:
>>>>> 
>>>>>> Need to know if I should I be worried about this or ignore them.
>>>>>> 
>>>>>> I see tons of these exceptions/warnings in the broker logs, not sure
>>>>> what
>>>>>> causes them and what could be done to fix them.
>>>>>> 
>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to
>>>>>> broker
>>>>>> 5:class kafka.common.NotLeaderForPartitionException
>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for
>>>>>> partition [TestTopic] to broker 5:class
>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
>>>>>> request
>>>>>> with correlation id 950084 from client ReplicaFetcherThread-1-2 on
>>>>>> partition [TestTopic,2] failed due to Leader not local for partition
>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>> 
>>>>>> 
>>>>>> Any ideas?
>>>>>> 
>>>>>> -Zakee
>>>>> 
>>>>> 
>>> 
>>> 
> 
> 


Kafka Issue #2011: https://issues.apache.org/jira/browse/KAFKA-2011

Posted by Zakee <kz...@netzero.net>.
Opened a Kafka issue for the rebalance happening even with auto leader rebalance set to false.
https://issues.apache.org/jira/browse/KAFKA-2011
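
For reference, leader placement and under-replication can be checked between occurrences with something like the following (the ZooKeeper connect string is illustrative):

    bin/kafka-topics.sh --describe --zookeeper zkhost:2181 --under-replicated-partitions
    bin/kafka-topics.sh --describe --zookeeper zkhost:2181 --topic Topic-11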

>> Logs for rebalance:
>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election:  (kafka.controller.KafkaController)
>> …
>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
>> ...
>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
>> ...
>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions  (kafka.controller.KafkaController)
>> ...
>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election:  (kafka.controller.KafkaController)
>> 
>> Also, I still see lots of below errors (~69k) going on in the logs since the restart. Is there any other reason than rebalance for these errors?
>> 
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)

Thanks
Zakee


Re: Broker Exceptions

Posted by Kazim Zakee <ka...@apple.com>.
No broker restarts.

Created a kafka issue: https://issues.apache.org/jira/browse/KAFKA-2011 <https://issues.apache.org/jira/browse/KAFKA-2011>

>> Logs for rebalance:
>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election:  (kafka.controller.KafkaController)
>> …
>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
>> ...
>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
>> ...
>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions  (kafka.controller.KafkaController)
>> ...
>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election:  (kafka.controller.KafkaController)
>> 
>> Also, I still see lots of below errors (~69k) going on in the logs since the restart. Is there any other reason than rebalance for these errors?
>> 
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)


>  Could you paste the
> related logs in controller.log?
What specifically should I search for in the logs?
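
For example, would something along these lines be the right way to look (the path is illustrative, wherever log4j writes controller.log)?

    grep -i "preferred replica" /path/to/kafka/logs/controller.log
    grep -i "leader election" /path/to/kafka/logs/controller.log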

Thanks,
Kazim Zakee



> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin <jq...@linkedin.com.INVALID> wrote:
> 
> Is there anything wrong with brokers around that time? E.g. Broker restart?
> The log you pasted are actually from replica fetchers. Could you paste the
> related logs in controller.log?
> 
> Thanks.
> 
> Jiangjie (Becket) Qin
> 
> On 3/9/15, 10:32 AM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
> 
>> Correction: Actually  the rebalance happened quite until 24 hours after
>> the start, and thats where below errors were found. Ideally rebalance
>> should not have happened at all.
>> 
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>>> On Mar 9, 2015, at 10:28 AM, Zakee <kz...@netzero.net> wrote:
>>> 
>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>>> here?
>>> Thanks for you suggestions.
>>> It looks like the rebalance actually happened only once soon after I
>>> started with clean cluster and data was pushed, it didn’t happen again
>>> so far, and I see the partitions leader counts on brokers did not change
>>> since then. One of the brokers was constantly showing 0 for partition
>>> leader count. Is that normal?
>>> 
>>> Also, I still see lots of below errors (~69k) going on in the logs
>>> since the restart. Is there any other reason than rebalance for these
>>> errors?
>>> 
>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
>>> partition [Topic-11,7] to broker 5:class
>>> kafka.common.NotLeaderForPartitionException
>>> (kafka.server.ReplicaFetcherThread)
>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
>>> partition [Topic-2,25] to broker 5:class
>>> kafka.common.NotLeaderForPartitionException
>>> (kafka.server.ReplicaFetcherThread)
>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
>>> partition [Topic-2,21] to broker 5:class
>>> kafka.common.NotLeaderForPartitionException
>>> (kafka.server.ReplicaFetcherThread)
>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
>>> partition [Topic-22,9] to broker 5:class
>>> kafka.common.NotLeaderForPartitionException
>>> (kafka.server.ReplicaFetcherThread)
>>> 
>>>> Some other things to check are:
>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>> confirm.
>>> Yes 
>>> 
>>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>>> does not exist?
>>> ls /admin
>>> [delete_topics]
>>> ls /admin/preferred_replica_election
>>> Node does not exist: /admin/preferred_replica_election
>>> 
>>> 
>>> Thanks
>>> Zakee
>>> 
>>> 
>>> 
>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <jq...@linkedin.com.INVALID>
>>>> wrote:
>>>> 
>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>>> here?
>>>> Some other things to check are:
>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>> confirm.
>>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>>> does not exist?
>>>> 
>>>> Jiangjie (Becket) Qin
>>>> 
>>>> On 3/7/15, 10:24 PM, "Zakee" <kz...@netzero.net> wrote:
>>>> 
>>>>> I started with  clean cluster and started to push data. It still does
>>>>> the
>>>>> rebalance at random durations even though the auto.leader.relabalance
>>>>> is
>>>>> set to false.
>>>>> 
>>>>> Thanks
>>>>> Zakee
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <jq...@linkedin.com.INVALID>
>>>>>> wrote:
>>>>>> 
>>>>>> Yes, the rebalance should not happen in that case. That is a little
>>>>>> bit
>>>>>> strange. Could you try to launch a clean Kafka cluster with
>>>>>> auto.leader.election disabled and try push data?
>>>>>> When leader migration occurs, NotLeaderForPartition exception is
>>>>>> expected.
>>>>>> 
>>>>>> Jiangjie (Becket) Qin
>>>>>> 
>>>>>> 
>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kz...@netzero.net> wrote:
>>>>>> 
>>>>>>> Yes, Jiangjie, I do see lots of these errors "Starting preferred
>>>>>>> replica
>>>>>>> leader election for partitions” in logs. I also see lot of Produce
>>>>>>> request failure warnings in with the NotLeader Exception.
>>>>>>> 
>>>>>>> I tried switching off the auto.leader.relabalance to false. I am
>>>>>>> still
>>>>>>> noticing the rebalance happening. My understanding was the rebalance
>>>>>>> will
>>>>>>> not happen when this is set to false.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Zakee
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
>>>>>>>> <jq...@linkedin.com.INVALID>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> I don’t think num.replica.fetchers will help in this case.
>>>>>>>> Increasing
>>>>>>>> number of fetcher threads will only help in cases where you have a
>>>>>>>> large
>>>>>>>> amount of data coming into a broker and more replica fetcher
>>>>>>>> threads
>>>>>>>> will
>>>>>>>> help keep up. We usually only use 1-2 for each broker. But in your
>>>>>>>> case,
>>>>>>>> it looks that leader migration cause issue.
>>>>>>>> Do you see anything else in the log? Like preferred leader
>>>>>>>> election?
>>>>>>>> 
>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>> 
>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net
>>>>>>>> <ma...@netzero.net>> wrote:
>>>>>>>> 
>>>>>>>>> Thanks, Jiangjie.
>>>>>>>>> 
>>>>>>>>> Yes, I do see under partitions usually shooting every hour.
>>>>>>>>> Anythings
>>>>>>>>> that
>>>>>>>>> I could try to reduce it?
>>>>>>>>> 
>>>>>>>>> How does "num.replica.fetchers" affect the replica sync? Currently
>>>>>>>>> have
>>>>>>>>> configured 7 each of 5 brokers.
>>>>>>>>> 
>>>>>>>>> -Zakee
>>>>>>>>> 
>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>>>>>>> <jq...@linkedin.com.invalid>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> These messages are usually caused by leader migration. I think as
>>>>>>>>>> long
>>>>>>>>>> as
>>>>>>>>>> you don't see this lasting for ever and got a bunch of
>>>>>>>>>> replicated
>>>>>>>>>> partitions, it should be fine.
>>>>>>>>>> 
>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>> 
>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kz...@netzero.net> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Need to know if I should I be worried about this or ignore them.
>>>>>>>>>>> 
>>>>>>>>>>> I see tons of these exceptions/warnings in the broker logs, not
>>>>>>>>>>> sure
>>>>>>>>>> what
>>>>>>>>>>> causes them and what could be done to fix them.
>>>>>>>>>>> 
>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition
>>>>>>>>>>> [TestTopic]
>>>>>>>>>>> to
>>>>>>>>>>> broker
>>>>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5],
>>>>>>>>>>> Error
>>>>>>>>>>> for
>>>>>>>>>>> partition [TestTopic] to broker 5:class
>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]:
>>>>>>>>>>> Fetch
>>>>>>>>>>> request
>>>>>>>>>>> with correlation id 950084 from client ReplicaFetcherThread-1-2
>>>>>>>>>>> on
>>>>>>>>>>> partition [TestTopic,2] failed due to Leader not local for
>>>>>>>>>>> partition
>>>>>>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Any ideas?
>>>>>>>>>>> 
>>>>>>>>>>> -Zakee
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
> 

Re: Broker Exceptions

Posted by Zakee <kz...@netzero.net>.
It was last seen happening at 3/15/15 11:01:04.218 AM. Not sure what steps would reproduce it, though.

Logs are attached for the event at 3/15/15 11:01:04.218 AM, if that helps.



Thanks
Zakee



> On Mar 17, 2015, at 4:06 PM, Mayuresh Gharat <gh...@gmail.com> wrote:
> 
> We are trying to see what might have caused it.
> 
> We had some questions :
> 1) Is this reproducible? That way we can dig deep.
> 
> 
> This looks interesting problem to solve and you might have caught a bug,
> but we need to verify the root cause before filing a ticket.
> 
> Thanks,
> 
> Mayuresh
> 
> On Tue, Mar 17, 2015 at 2:10 PM, Zakee <kzakee1@netzero.net <ma...@netzero.net>> wrote:
> 
>>> What version are you running ?
>> 
>> Version 0.8.2.0
>> 
>>> Your case is 2). But the only thing weird is your replica (broker 3) is
>>> requesting for offset which is greater than the leaders log end offset.
>> 
>> 
>> So what could be the cause?
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>>> On Mar 17, 2015, at 11:45 AM, Mayuresh Gharat <
>> gharatmayuresh15@gmail.com> wrote:
>>> 
>>> What version are you running ?
>>> 
>>> The code for latest version says that :
>>> 
>>> 1) if the log end offset of the replica is greater than the leaders log
>> end
>>> offset, the replicas offset will be reset to logEndOffset of the leader.
>>> 
>>> 2) Else if the log end offset of the replica is smaller than the leaders
>>> log end offset and its out of range, the replicas offset will be reset to
>>> logStartOffset of the leader.
>>> 
>>> Your case is 2). But the only thing weird is your replica (broker 3) is
>>> requesting for offset which is greater than the leaders log end offset.
>>> 
>>> Thanks,
>>> 
>>> Mayuresh
>>> 
>>> 
>>> On Tue, Mar 17, 2015 at 10:26 AM, Mayuresh Gharat <
>>> gharatmayuresh15@gmail.com <ma...@gmail.com> <mailto:gharatmayuresh15@gmail.com <ma...@gmail.com>>> wrote:
>>> 
>>>> cool.
>>>> 
>>>> On Tue, Mar 17, 2015 at 10:15 AM, Zakee <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>> 
>>>>> Hi Mayuresh,
>>>>> 
>>>>> The logs are already attached and are in reverse order starting
>> backwards
>>>>> from [2015-03-14 07:46:52,517] to the time when brokers were started.
>>>>> 
>>>>> Thanks
>>>>> Zakee
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Mar 17, 2015, at 12:07 AM, Mayuresh Gharat <
>>>>> gharatmayuresh15@gmail.com <ma...@gmail.com>> wrote:
>>>>>> 
>>>>>> Hi Zakee,
>>>>>> 
>>>>>> Thanks for the logs. Can you paste earlier logs from broker-3 up to :
>>>>>> 
>>>>>> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
>>>>>> offset 1754769769 for partition [Topic22kv,5] out of range; reset
>>>>>> offset to 1400864851 (kafka.server.ReplicaFetcherThread)
>>>>>> 
>>>>>> That would help us figure out what was happening on this broker before
>>>>> it
>>>>>> issued a replicaFetch request to broker-4.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Mayuresh
>>>>>> 
>>>>>> On Mon, Mar 16, 2015 at 11:32 PM, Zakee <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>>> 
>>>>>>> Hi Mayuresh,
>>>>>>> 
>>>>>>> Here are the logs.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Kazim Zakee
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Mar 16, 2015, at 10:48 AM, Mayuresh Gharat <
>>>>>>> gharatmayuresh15@gmail.com <ma...@gmail.com>> wrote:
>>>>>>>> 
>>>>>>>> Can you provide more logs (complete) on Broker 3 till time :
>>>>>>>> 
>>>>>>>> *[2015-03-14 07:46:52,517*] WARN [ReplicaFetcherThread-2-4],
>> Replica 3
>>>>>>> for
>>>>>>>> partition [Topic22kv,5] reset its fetch offset from 1400864851 to
>>>>> current
>>>>>>>> leader 4's start offset 1400864851
>> (kafka.server.ReplicaFetcherThread)
>>>>>>>> 
>>>>>>>> I would like to see logs from time much before it sent the fetch
>>>>> request
>>>>>>> to
>>>>>>>> Broker 4 to the time above. I want to check if in any case Broker 3
>>>>> was a
>>>>>>>> leader before broker 4 took over.
>>>>>>>> 
>>>>>>>> Additional logs will help.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> Mayuresh
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sat, Mar 14, 2015 at 8:35 PM, Zakee <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>>>>> 
>>>>>>>>> log.cleanup.policy is delete not compact.
>>>>>>>>> log.cleaner.enable=true
>>>>>>>>> log.cleaner.threads=5
>>>>>>>>> log.cleanup.policy=delete
>>>>>>>>> log.flush.scheduler.interval.ms=3000
>>>>>>>>> log.retention.minutes=1440
>>>>>>>>> log.segment.bytes=1073741824  (1gb)
>>>>>>>>> 
>>>>>>>>> Messages are keyed but not compressed, producer async and uses
>> kafka
>>>>>>>>> default partitioner.
>>>>>>>>> String message = msg.getString();
>>>>>>>>> String uniqKey = ""+rnd.nextInt();// random key
>>>>>>>>> String partKey = getPartitionKey();// partition key
>>>>>>>>> KeyedMessage<String, String> data = new KeyedMessage<String,
>>>>>>>>> String>(this.topicName, uniqKey, partKey, message);
>>>>>>>>> producer.send(data);
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> Zakee
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Mar 14, 2015, at 4:23 PM, gharatmayuresh15@gmail.com <ma...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Is your topic log compacted? Also if it is are the messages keyed?
>>>>> Or
>>>>>>>>> are the messages compressed?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> 
>>>>>>>>>> Mayuresh
>>>>>>>>>> 
>>>>>>>>>> Sent from my iPhone
>>>>>>>>>> 
>>>>>>>>>>> On Mar 14, 2015, at 2:02 PM, Zakee <kzakee1@netzero.net <ma...@netzero.net> <mailto:
>>>>>>>>> kzakee1@netzero.net <ma...@netzero.net>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Thanks, Jiangjie for helping resolve the kafka controller
>> migration
>>>>>>>>> driven partition leader rebalance issue. The logs are much cleaner
>>>>> now.
>>>>>>>>>>> 
>>>>>>>>>>> There are a few incidences of Out of range offset even though
>>>>> there
>>>>>>> is
>>>>>>>>> no consumers running, only producers and replica fetchers. I was
>>>>> trying
>>>>>>> to
>>>>>>>>> relate to a cause, looks like compaction (log segment deletion)
>>>>> causing
>>>>>>>>> this. Not sure whether this is expected behavior.
>>>>>>>>>>> 
>>>>>>>>>>> Broker-4:
>>>>>>>>>>> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]:
>>>>> Error
>>>>>>>>> when processing fetch request for partition [Topic22kv,5] offset
>>>>>>> 1754769769
>>>>>>>>> from follower with correlation id 1645671. Possible cause: Request
>>>>> for
>>>>>>>>> offset 1754769769 but we only have log segments in the range
>>>>> 1400864851
>>>>>>> to
>>>>>>>>> 1754769732. (kafka.server.ReplicaManager)
>>>>>>>>>>> 
>>>>>>>>>>> Broker-3:
>>>>>>>>>>> [2015-03-14 07:46:52,356] INFO The cleaning for partition
>>>>>>> [Topic22kv,5]
>>>>>>>>> is aborted and paused (kafka.log.LogCleaner)
>>>>>>>>>>> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851
>>>>> for
>>>>>>>>> log Topic22kv-5 for deletion. (kafka.log.Log)
>>>>>>>>>>> …
>>>>>>>>>>> [2015-03-14 07:46:52,421] INFO Compaction for partition
>>>>> [Topic22kv,5]
>>>>>>>>> is resumed (kafka.log.LogCleaner)
>>>>>>>>>>> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4],
>> Current
>>>>>>>>> offset 1754769769 for partition [Topic22kv,5] out of range; reset
>>>>>>> offset to
>>>>>>>>> 1400864851 (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4],
>> Replica
>>>>> 3
>>>>>>>>> for partition [Topic22kv,5] reset its fetch offset from 1400864851
>> to
>>>>>>>>> current leader 4's start offset 1400864851
>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>> 
>>>>>>>>>>> <topic22kv_746a_314_logs.txt>
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Thanks
>>>>>>>>>>> Zakee
>>>>>>>>>>> 
>>>>>>>>>>>> On Mar 9, 2015, at 12:18 PM, Zakee <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> No broker restarts.
>>>>>>>>>>>> 
>>>>>>>>>>>> Created a kafka issue:
>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-2011 <https://issues.apache.org/jira/browse/KAFKA-2011> <
>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-2011 <https://issues.apache.org/jira/browse/KAFKA-2011>>
>>>>>>>>>>>> 
>>>>>>>>>>>>>> Logs for rebalance:
>>>>>>>>>>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming
>>>>> preferred
>>>>>>>>> replica election for partitions: (kafka.controller.KafkaController)
>>>>>>>>>>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that
>>>>>>>>> completed preferred replica election:
>>>>> (kafka.controller.KafkaController)
>>>>>>>>>>>>>> …
>>>>>>>>>>>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming
>>>>> preferred
>>>>>>>>> replica election for partitions: (kafka.controller.KafkaController)
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming
>>>>> preferred
>>>>>>>>> replica election for partitions: (kafka.controller.KafkaController)
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting
>>>>> preferred
>>>>>>>>> replica leader election for partitions
>>>>>>> (kafka.controller.KafkaController)
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions
>>>>>>> undergoing
>>>>>>>>> preferred replica election:  (kafka.controller.KafkaController)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Also, I still see lots of below errors (~69k) going on in the
>>>>> logs
>>>>>>>>> since the restart. Is there any other reason than rebalance for
>> these
>>>>>>>>> errors?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
>>>>> Error
>>>>>>>>> for partition [Topic-11,7] to broker 5:class
>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
>>>>> Error
>>>>>>>>> for partition [Topic-2,25] to broker 5:class
>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
>>>>> Error
>>>>>>>>> for partition [Topic-2,21] to broker 5:class
>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
>>>>> Error
>>>>>>>>> for partition [Topic-22,9] to broker 5:class
>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> Could you paste the related logs in controller.log?
>>>>>>>>>>>> What specifically should I search for in the logs?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Zakee
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin
>>>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>
>>>>>>>>> <mailto:jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Is there anything wrong with brokers around that time? E.g.
>>>>> Broker
>>>>>>>>> restart?
>>>>>>>>>>>>> The log you pasted are actually from replica fetchers. Could
>> you
>>>>>>>>> paste the
>>>>>>>>>>>>> related logs in controller.log?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 3/9/15, 10:32 AM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net> <mailto:
>>>>>>>>> kzakee1@netzero.net <ma...@netzero.net>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Correction: Actually  the rebalance happened quite until 24
>>>>> hours
>>>>>>>>> after
>>>>>>>>>>>>>> the start, and thats where below errors were found. Ideally
>>>>>>> rebalance
>>>>>>>>>>>>>> should not have happened at all.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Zakee
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzakee1@netzero.net <ma...@netzero.net>
>>>>> <mailto:
>>>>>>>>> kzakee1@netzero.net <ma...@netzero.net>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
>>>>>>>>> rebalance
>>>>>>>>>>>>>>>> here?
>>>>>>>>>>>>>>> Thanks for you suggestions.
>>>>>>>>>>>>>>> It looks like the rebalance actually happened only once soon
>>>>>>> after I
>>>>>>>>>>>>>>> started with clean cluster and data was pushed, it didn’t
>>>>> happen
>>>>>>>>> again
>>>>>>>>>>>>>>> so far, and I see the partitions leader counts on brokers did
>>>>> not
>>>>>>>>> change
>>>>>>>>>>>>>>> since then. One of the brokers was constantly showing 0 for
>>>>>>>>> partition
>>>>>>>>>>>>>>> leader count. Is that normal?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Also, I still see lots of below errors (~69k) going on in the
>>>>> logs
>>>>>>>>>>>>>>> since the restart. Is there any other reason than rebalance
>> for
>>>>>>>>> these
>>>>>>>>>>>>>>> errors?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
>>>>> Error
>>>>>>>>> for
>>>>>>>>>>>>>>> partition [Topic-11,7] to broker 5:class
>>>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
>>>>> Error
>>>>>>>>> for
>>>>>>>>>>>>>>> partition [Topic-2,25] to broker 5:class
>>>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
>>>>> Error
>>>>>>>>> for
>>>>>>>>>>>>>>> partition [Topic-2,21] to broker 5:class
>>>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
>>>>> Error
>>>>>>>>> for
>>>>>>>>>>>>>>> partition [Topic-22,9] to broker 5:class
>>>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Some other things to check are:
>>>>>>>>>>>>>>>> 1. The actual property name is auto.leader.rebalance.enable,
>>>>> not
>>>>>>>>>>>>>>>> auto.leader.rebalance. You’ve probably known this, just to
>>>>> double
>>>>>>>>>>>>>>>> confirm.
>>>>>>>>>>>>>>> Yes
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 2. In zookeeper path, can you verify
>>>>>>>>> /admin/preferred_replica_election
>>>>>>>>>>>>>>>> does not exist?
>>>>>>>>>>>>>>> ls /admin
>>>>>>>>>>>>>>> [delete_topics]
>>>>>>>>>>>>>>> ls /admin/preferred_replica_election
>>>>>>>>>>>>>>> Node does not exist: /admin/preferred_replica_election
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> Zakee
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin
>>>>>>>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID> <mailto:jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
>>>>>>>>> rebalance
>>>>>>>>>>>>>>>> here?
>>>>>>>>>>>>>>>> Some other things to check are:
>>>>>>>>>>>>>>>> 1. The actual property name is auto.leader.rebalance.enable,
>>>>> not
>>>>>>>>>>>>>>>> auto.leader.rebalance. You’ve probably known this, just to
>>>>> double
>>>>>>>>>>>>>>>> confirm.
>>>>>>>>>>>>>>>> 2. In zookeeper path, can you verify
>>>>>>>>> /admin/preferred_replica_election
>>>>>>>>>>>>>>>> does not exist?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net> <mailto:
>>>>>>>>> kzakee1@netzero.net <ma...@netzero.net>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I started with  clean cluster and started to push data. It
>>>>> still
>>>>>>>>> does
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> rebalance at random durations even though the
>>>>>>>>> auto.leader.relabalance
>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> set to false.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Zakee
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin
>>>>>>>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID> <mailto:jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Yes, the rebalance should not happen in that case. That
>> is a
>>>>>>>>> little
>>>>>>>>>>>>>>>>>> bit
>>>>>>>>>>>>>>>>>> strange. Could you try to launch a clean Kafka cluster
>> with
>>>>>>>>>>>>>>>>>> auto.leader.election disabled and try push data?
>>>>>>>>>>>>>>>>>> When leader migration occurs, NotLeaderForPartition
>>>>> exception
>>>>>>> is
>>>>>>>>>>>>>>>>>> expected.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>
>> <mailto:
>>>>>>>>> kzakee1@netzero.net <ma...@netzero.net>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Yes, Jiangjie, I do see lots of these errors "Starting
>>>>>>> preferred
>>>>>>>>>>>>>>>>>>> replica
>>>>>>>>>>>>>>>>>>> leader election for partitions” in logs. I also see lot
>> of
>>>>>>>>> Produce
>>>>>>>>>>>>>>>>>>> request failure warnings in with the NotLeader Exception.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I tried switching off the auto.leader.relabalance to
>>>>> false. I
>>>>>>> am
>>>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>>> noticing the rebalance happening. My understanding was
>> the
>>>>>>>>> rebalance
>>>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>> not happen when this is set to false.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>> Zakee
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
>>>>>>>>>>>>>>>>>>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID> <mailto:
>>>>> jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>
>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I don’t think num.replica.fetchers will help in this
>> case.
>>>>>>>>>>>>>>>>>>>> Increasing
>>>>>>>>>>>>>>>>>>>> number of fetcher threads will only help in cases where
>>>>> you
>>>>>>>>> have a
>>>>>>>>>>>>>>>>>>>> large
>>>>>>>>>>>>>>>>>>>> amount of data coming into a broker and more replica
>>>>> fetcher
>>>>>>>>>>>>>>>>>>>> threads
>>>>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>>> help keep up. We usually only use 1-2 for each broker.
>>>>> But in
>>>>>>>>> your
>>>>>>>>>>>>>>>>>>>> case,
>>>>>>>>>>>>>>>>>>>> it looks that leader migration cause issue.
>>>>>>>>>>>>>>>>>>>> Do you see anything else in the log? Like preferred
>> leader
>>>>>>>>>>>>>>>>>>>> election?
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>
>>>>> <mailto:
>>>>>>>>> kzakee1@netzero.net <ma...@netzero.net>>
>>>>>>>>>>>>>>>>>>>> <mailto:kzakee1@netzero.net <ma...@netzero.net> <mailto:kzakee1@netzero.net <ma...@netzero.net>
>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks, Jiangjie.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Yes, I do see under partitions usually shooting every
>>>>> hour.
>>>>>>>>>>>>>>>>>>>>> Anythings
>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> I could try to reduce it?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> How does "num.replica.fetchers" affect the replica
>> sync?
>>>>>>>>> Currently
>>>>>>>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>>>>>> configured 7 each of 5 brokers.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> -Zakee
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>>>>>>>>>>>>>>>>>>> <jqin@linkedin.com.invalid <ma...@linkedin.com.invalid> <mailto:
>>>>>>> jqin@linkedin.com.invalid <ma...@linkedin.com.invalid>
>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> These messages are usually caused by leader
>> migration. I
>>>>>>>>> think as
>>>>>>>>>>>>>>>>>>>>>> long
>>>>>>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>> you don't see this lasting for ever and got a bunch of
>>>>>>> under
>>>>>>>>>>>>>>>>>>>>>> replicated
>>>>>>>>>>>>>>>>>>>>>> partitions, it should be fine.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>
>>>>>>> <mailto:
>>>>>>>>> kzakee1@netzero.net <ma...@netzero.net>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Need to know if I should I be worried about this or
>>>>> ignore
>>>>>>>>> them.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker
>>>>>>> logs,
>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>> sure
>>>>>>>>>>>>>>>>>>>>>> what
>>>>>>>>>>>>>>>>>>>>>>> causes them and what could be done to fix them.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition
>>>>>>>>>>>>>>>>>>>>>>> [TestTopic]
>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> broker
>>>>>>>>>>>>>>>>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR
>>>>>>> [ReplicaFetcherThread-3-5],
>>>>>>>>>>>>>>>>>>>>>>> Error
>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>> partition [TestTopic] to broker 5:class
>>>>>>>>>>>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on
>>>>> Broker
>>>>>>>>> 2]:
>>>>>>>>>>>>>>>>>>>>>>> Fetch
>>>>>>>>>>>>>>>>>>>>>>> request
>>>>>>>>>>>>>>>>>>>>>>> with correlation id 950084 from client
>>>>>>>>> ReplicaFetcherThread-1-2
>>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>> partition [TestTopic,2] failed due to Leader not
>> local
>>>>> for
>>>>>>>>>>>>>>>>>>>>>>> partition
>>>>>>>>>>>>>>>>>>>>>>> [TestTopic,2] on broker 2
>> (kafka.server.ReplicaManager)
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> -Zakee
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Zakee
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> -Regards,
>>>>>>>> Mayuresh R. Gharat
>>>>>>>> (862) 250-7125
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Zakee
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> -Regards,
>>>>>> Mayuresh R. Gharat
>>>>>> (862) 250-7125
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> -Regards,
>>>> Mayuresh R. Gharat
>>>> (862) 250-7125
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> -Regards,
>>> Mayuresh R. Gharat
>>> (862) 250-7125
>> 
> 
> 
> 
> -- 
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125

Re: Broker Exceptions

Posted by Mayuresh Gharat <gh...@gmail.com>.
We are trying to see what might have caused it.

We had some questions:
1) Is this reproducible? That way we can dig deep.


This looks like an interesting problem to solve, and you might have caught a
bug, but we need to verify the root cause before filing a ticket.

Thanks,

Mayuresh

On Tue, Mar 17, 2015 at 2:10 PM, Zakee <kz...@netzero.net> wrote:

> > What version are you running ?
>
> Version 0.8.2.0
>
> > Your case is 2). But the only thing weird is your replica (broker 3) is
> > requesting for offset which is greater than the leaders log end offset.
>
>
> So what could be the cause?
>
> Thanks
> Zakee
>
>
>
> > On Mar 17, 2015, at 11:45 AM, Mayuresh Gharat <
> gharatmayuresh15@gmail.com> wrote:
> >
> > What version are you running ?
> >
> > The code for latest version says that :
> >
> > 1) if the log end offset of the replica is greater than the leaders log
> end
> > offset, the replicas offset will be reset to logEndOffset of the leader.
> >
> > 2) Else if the log end offset of the replica is smaller than the leaders
> > log end offset and its out of range, the replicas offset will be reset to
> > logStartOffset of the leader.
> >
> > Your case is 2). But the only thing weird is your replica (broker 3) is
> > requesting for offset which is greater than the leaders log end offset.
> >
> > Thanks,
> >
> > Mayuresh
> >
> >
> > On Tue, Mar 17, 2015 at 10:26 AM, Mayuresh Gharat <
> > gharatmayuresh15@gmail.com <ma...@gmail.com>> wrote:
> >
> >> cool.
> >>
> >> On Tue, Mar 17, 2015 at 10:15 AM, Zakee <kz...@netzero.net> wrote:
> >>
> >>> Hi Mayuresh,
> >>>
> >>> The logs are already attached and are in reverse order starting
> backwards
> >>> from [2015-03-14 07:46:52,517] to the time when brokers were started.
> >>>
> >>> Thanks
> >>> Zakee
> >>>
> >>>
> >>>
> >>>> On Mar 17, 2015, at 12:07 AM, Mayuresh Gharat <
> >>> gharatmayuresh15@gmail.com> wrote:
> >>>>
> >>>> Hi Zakee,
> >>>>
> >>>> Thanks for the logs. Can you paste earlier logs from broker-3 up to :
> >>>>
> >>>> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
> >>>> offset 1754769769 for partition [Topic22kv,5] out of range; reset
> >>>> offset to 1400864851 (kafka.server.ReplicaFetcherThread)
> >>>>
> >>>> That would help us figure out what was happening on this broker before
> >>> it
> >>>> issued a replicaFetch request to broker-4.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Mayuresh
> >>>>
> >>>> On Mon, Mar 16, 2015 at 11:32 PM, Zakee <kz...@netzero.net> wrote:
> >>>>
> >>>>> Hi Mayuresh,
> >>>>>
> >>>>> Here are the logs.
> >>>>>
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Kazim Zakee
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Mar 16, 2015, at 10:48 AM, Mayuresh Gharat <
> >>>>> gharatmayuresh15@gmail.com> wrote:
> >>>>>>
> >>>>>> Can you provide more logs (complete) on Broker 3 till time :
> >>>>>>
> >>>>>> *[2015-03-14 07:46:52,517*] WARN [ReplicaFetcherThread-2-4],
> Replica 3
> >>>>> for
> >>>>>> partition [Topic22kv,5] reset its fetch offset from 1400864851 to
> >>> current
> >>>>>> leader 4's start offset 1400864851
> (kafka.server.ReplicaFetcherThread)
> >>>>>>
> >>>>>> I would like to see logs from time much before it sent the fetch
> >>> request
> >>>>> to
> >>>>>> Broker 4 to the time above. I want to check if in any case Broker 3
> >>> was a
> >>>>>> leader before broker 4 took over.
> >>>>>>
> >>>>>> Additional logs will help.
> >>>>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Mayuresh
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Mar 14, 2015 at 8:35 PM, Zakee <kz...@netzero.net> wrote:
> >>>>>>
> >>>>>>> log.cleanup.policy is delete not compact.
> >>>>>>> log.cleaner.enable=true
> >>>>>>> log.cleaner.threads=5
> >>>>>>> log.cleanup.policy=delete
> >>>>>>> log.flush.scheduler.interval.ms=3000
> >>>>>>> log.retention.minutes=1440
> >>>>>>> log.segment.bytes=1073741824  (1gb)
> >>>>>>>
> >>>>>>> Messages are keyed but not compressed, producer async and uses
> kafka
> >>>>>>> default partitioner.
> >>>>>>> String message = msg.getString();
> >>>>>>> String uniqKey = ""+rnd.nextInt();// random key
> >>>>>>> String partKey = getPartitionKey();// partition key
> >>>>>>> KeyedMessage<String, String> data = new KeyedMessage<String,
> >>>>>>> String>(this.topicName, uniqKey, partKey, message);
> >>>>>>> producer.send(data);
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Zakee
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Mar 14, 2015, at 4:23 PM, gharatmayuresh15@gmail.com wrote:
> >>>>>>>>
> >>>>>>>> Is your topic log compacted? Also if it is are the messages keyed?
> >>> Or
> >>>>>>> are the messages compressed?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Mayuresh
> >>>>>>>>
> >>>>>>>> Sent from my iPhone
> >>>>>>>>
> >>>>>>>>> On Mar 14, 2015, at 2:02 PM, Zakee <kzakee1@netzero.net <mailto:
> >>>>>>> kzakee1@netzero.net>> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks, Jiangjie for helping resolve the kafka controller
> migration
> >>>>>>> driven partition leader rebalance issue. The logs are much cleaner
> >>> now.
> >>>>>>>>>
> >>>>>>>>> There are a few incidences of Out of range offset even though
> >>> there
> >>>>> is
> >>>>>>> no consumers running, only producers and replica fetchers. I was
> >>> trying
> >>>>> to
> >>>>>>> relate to a cause, looks like compaction (log segment deletion)
> >>> causing
> >>>>>>> this. Not sure whether this is expected behavior.
> >>>>>>>>>
> >>>>>>>>> Broker-4:
> >>>>>>>>> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]:
> >>> Error
> >>>>>>> when processing fetch request for partition [Topic22kv,5] offset
> >>>>> 1754769769
> >>>>>>> from follower with correlation id 1645671. Possible cause: Request
> >>> for
> >>>>>>> offset 1754769769 but we only have log segments in the range
> >>> 1400864851
> >>>>> to
> >>>>>>> 1754769732. (kafka.server.ReplicaManager)
> >>>>>>>>>
> >>>>>>>>> Broker-3:
> >>>>>>>>> [2015-03-14 07:46:52,356] INFO The cleaning for partition
> >>>>> [Topic22kv,5]
> >>>>>>> is aborted and paused (kafka.log.LogCleaner)
> >>>>>>>>> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851
> >>> for
> >>>>>>> log Topic22kv-5 for deletion. (kafka.log.Log)
> >>>>>>>>> …
> >>>>>>>>> [2015-03-14 07:46:52,421] INFO Compaction for partition
> >>> [Topic22kv,5]
> >>>>>>> is resumed (kafka.log.LogCleaner)
> >>>>>>>>> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4],
> Current
> >>>>>>> offset 1754769769 for partition [Topic22kv,5] out of range; reset
> >>>>> offset to
> >>>>>>> 1400864851 (kafka.server.ReplicaFetcherThread)
> >>>>>>>>> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4],
> Replica
> >>> 3
> >>>>>>> for partition [Topic22kv,5] reset its fetch offset from 1400864851
> to
> >>>>>>> current leader 4's start offset 1400864851
> >>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>
> >>>>>>>>> ____________________________________________________________
> >>>>>>>>> Old School Yearbook Pics
> >>>>>>>>> View Class Yearbooks Online Free. Search by School & Year. Look
> >>> Now!
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> http://thirdpartyoffers.netzero.net/TGL3231/5504a2032e49422021991st02vuc
> >>> <
> >>>>>>>
> >>>>>
> >>>
> http://thirdpartyoffers.netzero.net/TGL3231/5504a2032e49422021991st02vuc>
> >>>>>>>>> <topic22kv_746a_314_logs.txt>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> Zakee
> >>>>>>>>>
> >>>>>>>>>> On Mar 9, 2015, at 12:18 PM, Zakee <kz...@netzero.net> wrote:
> >>>>>>>>>>
> >>>>>>>>>> No broker restarts.
> >>>>>>>>>>
> >>>>>>>>>> Created a kafka issue:
> >>>>>>> https://issues.apache.org/jira/browse/KAFKA-2011 <
> >>>>>>> https://issues.apache.org/jira/browse/KAFKA-2011>
> >>>>>>>>>>
> >>>>>>>>>>>> Logs for rebalance:
> >>>>>>>>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming
> >>> preferred
> >>>>>>> replica election for partitions: (kafka.controller.KafkaController)
> >>>>>>>>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that
> >>>>>>> completed preferred replica election:
> >>> (kafka.controller.KafkaController)
> >>>>>>>>>>>> …
> >>>>>>>>>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming
> >>> preferred
> >>>>>>> replica election for partitions: (kafka.controller.KafkaController)
> >>>>>>>>>>>> ...
> >>>>>>>>>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming
> >>> preferred
> >>>>>>> replica election for partitions: (kafka.controller.KafkaController)
> >>>>>>>>>>>> ...
> >>>>>>>>>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting
> >>> preferred
> >>>>>>> replica leader election for partitions
> >>>>> (kafka.controller.KafkaController)
> >>>>>>>>>>>> ...
> >>>>>>>>>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions
> >>>>> undergoing
> >>>>>>> preferred replica election:  (kafka.controller.KafkaController)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Also, I still see lots of below errors (~69k) going on in the
> >>> logs
> >>>>>>> since the restart. Is there any other reason than rebalance for
> these
> >>>>>>> errors?
> >>>>>>>>>>>>
> >>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
> >>> Error
> >>>>>>> for partition [Topic-11,7] to broker 5:class
> >>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
> >>> Error
> >>>>>>> for partition [Topic-2,25] to broker 5:class
> >>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
> >>> Error
> >>>>>>> for partition [Topic-2,21] to broker 5:class
> >>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
> >>> Error
> >>>>>>> for partition [Topic-22,9] to broker 5:class
> >>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> Could you paste the related logs in controller.log?
> >>>>>>>>>> What specifically should I search for in the logs?
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Zakee
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin
> >>> <jqin@linkedin.com.INVALID
> >>>>>>> <ma...@linkedin.com.INVALID>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Is there anything wrong with brokers around that time? E.g.
> >>> Broker
> >>>>>>> restart?
> >>>>>>>>>>> The log you pasted are actually from replica fetchers. Could
> you
> >>>>>>> paste the
> >>>>>>>>>>> related logs in controller.log?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks.
> >>>>>>>>>>>
> >>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>
> >>>>>>>>>>>> On 3/9/15, 10:32 AM, "Zakee" <kzakee1@netzero.net <mailto:
> >>>>>>> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Correction: Actually  the rebalance happened quite until 24
> >>> hours
> >>>>>>> after
> >>>>>>>>>>>> the start, and thats where below errors were found. Ideally
> >>>>> rebalance
> >>>>>>>>>>>> should not have happened at all.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks
> >>>>>>>>>>>> Zakee
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzakee1@netzero.net
> >>> <mailto:
> >>>>>>> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
> >>>>>>> rebalance
> >>>>>>>>>>>>>> here?
> >>>>>>>>>>>>> Thanks for you suggestions.
> >>>>>>>>>>>>> It looks like the rebalance actually happened only once soon
> >>>>> after I
> >>>>>>>>>>>>> started with clean cluster and data was pushed, it didn’t
> >>> happen
> >>>>>>> again
> >>>>>>>>>>>>> so far, and I see the partitions leader counts on brokers did
> >>> not
> >>>>>>> change
> >>>>>>>>>>>>> since then. One of the brokers was constantly showing 0 for
> >>>>>>> partition
> >>>>>>>>>>>>> leader count. Is that normal?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Also, I still see lots of below errors (~69k) going on in the
> >>> logs
> >>>>>>>>>>>>> since the restart. Is there any other reason than rebalance
> for
> >>>>>>> these
> >>>>>>>>>>>>> errors?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
> >>> Error
> >>>>>>> for
> >>>>>>>>>>>>> partition [Topic-11,7] to broker 5:class
> >>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
> >>> Error
> >>>>>>> for
> >>>>>>>>>>>>> partition [Topic-2,25] to broker 5:class
> >>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
> >>> Error
> >>>>>>> for
> >>>>>>>>>>>>> partition [Topic-2,21] to broker 5:class
> >>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
> >>> Error
> >>>>>>> for
> >>>>>>>>>>>>> partition [Topic-22,9] to broker 5:class
> >>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Some other things to check are:
> >>>>>>>>>>>>>> 1. The actual property name is auto.leader.rebalance.enable,
> >>> not
> >>>>>>>>>>>>>> auto.leader.rebalance. You’ve probably known this, just to
> >>> double
> >>>>>>>>>>>>>> confirm.
> >>>>>>>>>>>>> Yes
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> 2. In zookeeper path, can you verify
> >>>>>>> /admin/preferred_replica_election
> >>>>>>>>>>>>>> does not exist?
> >>>>>>>>>>>>> ls /admin
> >>>>>>>>>>>>> [delete_topics]
> >>>>>>>>>>>>> ls /admin/preferred_replica_election
> >>>>>>>>>>>>> Node does not exist: /admin/preferred_replica_election
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>> Zakee
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin
> >>>>>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
> >>>>>>> rebalance
> >>>>>>>>>>>>>> here?
> >>>>>>>>>>>>>> Some other things to check are:
> >>>>>>>>>>>>>> 1. The actual property name is auto.leader.rebalance.enable,
> >>> not
> >>>>>>>>>>>>>> auto.leader.rebalance. You’ve probably known this, just to
> >>> double
> >>>>>>>>>>>>>> confirm.
> >>>>>>>>>>>>>> 2. In zookeeper path, can you verify
> >>>>>>> /admin/preferred_replica_election
> >>>>>>>>>>>>>> does not exist?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzakee1@netzero.net <mailto:
> >>>>>>> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I started with  clean cluster and started to push data. It
> >>> still
> >>>>>>> does
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> rebalance at random durations even though the
> >>>>>>> auto.leader.relabalance
> >>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>> set to false.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>>> Zakee
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin
> >>>>>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Yes, the rebalance should not happen in that case. That
> is a
> >>>>>>> little
> >>>>>>>>>>>>>>>> bit
> >>>>>>>>>>>>>>>> strange. Could you try to launch a clean Kafka cluster
> with
> >>>>>>>>>>>>>>>> auto.leader.election disabled and try push data?
> >>>>>>>>>>>>>>>> When leader migration occurs, NotLeaderForPartition
> >>> exception
> >>>>> is
> >>>>>>>>>>>>>>>> expected.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzakee1@netzero.net
> <mailto:
> >>>>>>> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Yes, Jiangjie, I do see lots of these errors "Starting
> >>>>> preferred
> >>>>>>>>>>>>>>>>> replica
> >>>>>>>>>>>>>>>>> leader election for partitions” in logs. I also see lot
> of
> >>>>>>> Produce
> >>>>>>>>>>>>>>>>> request failure warnings in with the NotLeader Exception.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I tried switching off the auto.leader.relabalance to
> >>> false. I
> >>>>> am
> >>>>>>>>>>>>>>>>> still
> >>>>>>>>>>>>>>>>> noticing the rebalance happening. My understanding was
> the
> >>>>>>> rebalance
> >>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>> not happen when this is set to false.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>>>>> Zakee
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
> >>>>>>>>>>>>>>>>>> <jqin@linkedin.com.INVALID <mailto:
> >>> jqin@linkedin.com.INVALID
> >>>>>>>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I don’t think num.replica.fetchers will help in this
> case.
> >>>>>>>>>>>>>>>>>> Increasing
> >>>>>>>>>>>>>>>>>> number of fetcher threads will only help in cases where
> >>> you
> >>>>>>> have a
> >>>>>>>>>>>>>>>>>> large
> >>>>>>>>>>>>>>>>>> amount of data coming into a broker and more replica
> >>> fetcher
> >>>>>>>>>>>>>>>>>> threads
> >>>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>> help keep up. We usually only use 1-2 for each broker.
> >>> But in
> >>>>>>> your
> >>>>>>>>>>>>>>>>>> case,
> >>>>>>>>>>>>>>>>>> it looks that leader migration cause issue.
> >>>>>>>>>>>>>>>>>> Do you see anything else in the log? Like preferred
> leader
> >>>>>>>>>>>>>>>>>> election?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net
> >>> <mailto:
> >>>>>>> kzakee1@netzero.net>
> >>>>>>>>>>>>>>>>>> <mailto:kzakee1@netzero.net <mailto:kzakee1@netzero.net
> >>>>>>
> >>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks, Jiangjie.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Yes, I do see under partitions usually shooting every
> >>> hour.
> >>>>>>>>>>>>>>>>>>> Anythings
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>> I could try to reduce it?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> How does "num.replica.fetchers" affect the replica
> sync?
> >>>>>>> Currently
> >>>>>>>>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>>>>> configured 7 each of 5 brokers.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> -Zakee
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
> >>>>>>>>>>>>>>>>>>> <jqin@linkedin.com.invalid <mailto:
> >>>>> jqin@linkedin.com.invalid
> >>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> These messages are usually caused by leader
> migration. I
> >>>>>>> think as
> >>>>>>>>>>>>>>>>>>>> long
> >>>>>>>>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>>>>> you don¹t see this lasting for ever and got a bunch of
> >>>>> under
> >>>>>>>>>>>>>>>>>>>> replicated
> >>>>>>>>>>>>>>>>>>>> partitions, it should be fine.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzakee1@netzero.net
> >>>>> <mailto:
> >>>>>>> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Need to know if I should I be worried about this or
> >>> ignore
> >>>>>>> them.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker
> >>>>> logs,
> >>>>>>> not
> >>>>>>>>>>>>>>>>>>>>> sure
> >>>>>>>>>>>>>>>>>>>> what
> >>>>>>>>>>>>>>>>>>>>> causes them and what could be done to fix them.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition
> >>>>>>>>>>>>>>>>>>>>> [TestTopic]
> >>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> broker
> >>>>>>>>>>>>>>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
> >>>>>>>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR
> >>>>> [ReplicaFetcherThread-3-5],
> >>>>>>>>>>>>>>>>>>>>> Error
> >>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>> partition [TestTopic] to broker 5:class
> >>>>>>>>>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on
> >>> Broker
> >>>>>>> 2]:
> >>>>>>>>>>>>>>>>>>>>> Fetch
> >>>>>>>>>>>>>>>>>>>>> request
> >>>>>>>>>>>>>>>>>>>>> with correlation id 950084 from client
> >>>>>>> ReplicaFetcherThread-1-2
> >>>>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>> partition [TestTopic,2] failed due to Leader not
> local
> >>> for
> >>>>>>>>>>>>>>>>>>>>> partition
> >>>>>>>>>>>>>>>>>>>>> [TestTopic,2] on broker 2
> (kafka.server.ReplicaManager)
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Any ideas?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> -Zakee
> >>>>>>>>>>>>>>>>>>>>>
> >>>>> ____________________________________________________________
> >>>>>>>>>>>>>>>>>>>>> Next Apple Sensation
> >>>>>>>>>>>>>>>>>>>>> 1 little-known path to big profits
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>> http://thirdpartyoffers.netzero.net/TGL3231/54ee63b9e704b63b94061
> <
> >>>>>>> http://thirdpartyoffers.netzero.net/TGL3231/54ee63b9e704b63b94061>
> >>>>>>>>>>>>>>>>>>>>> st0
> >>>>>>>>>>>>>>>>>>>>> 3v
> >>>>>>>>>>>>>>>>>>>>> uc
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>> ____________________________________________________________
> >>>>>>>>>>>>>>>>>>>> Extended Stay America
> >>>>>>>>>>>>>>>>>>>> Get Fantastic Amenities, low rates! Kitchen, Ample
> >>>>> Workspace,
> >>>>>>>>>>>>>>>>>>>> Free
> >>>>>>>>>>>>>>>>>>>> WIFI
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>> http://thirdpartyoffers.netzero.net/TGL3255/54ee66f26da6f66f10ad4m
> <
> >>>>>>> http://thirdpartyoffers.netzero.net/TGL3255/54ee66f26da6f66f10ad4m
> >
> >>>>>>>>>>>>>>>>>>>> p02
> >>>>>>>>>>>>>>>>>>>> du
> >>>>>>>>>>>>>>>>>>>> c
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>> ____________________________________________________________
> >>>>>>>>>>>>>>>>>> Extended Stay America
> >>>>>>>>>>>>>>>>>> Official Site. Free WIFI, Kitchens. Our best rates here,
> >>>>>>>>>>>>>>>>>> guaranteed.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>
> http://thirdpartyoffers.netzero.net/TGL3255/54ee80744cfa7747461mp13d
> >>> <
> >>>>>>>
> http://thirdpartyoffers.netzero.net/TGL3255/54ee80744cfa7747461mp13d
> >>>>
> >>>>>>>>>>>>>>>>>> uc
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> <
> >>>>>>>
> http://thirdpartyoffers.netzero.net/TGL3255/54ee80744cfa7747461mp13
> >>>>>>>>>>>>>>>>>> duc
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> ____________________________________________________________
> >>>>>>>>>>>>>>>> The WORST exercise for aging
> >>>>>>>>>>>>>>>> Avoid this &#34;healthy&#34; exercise to look & feel 5-10
> >>> years
> >>>>>>>>>>>>>>>> YOUNGER
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>
> >>> http://thirdpartyoffers.netzero.net/TGL3255/54fa40e98a0e640e81196mp07d
> >>>>> <
> >>>>>>>
> >>> http://thirdpartyoffers.netzero.net/TGL3255/54fa40e98a0e640e81196mp07d
> >
> >>>>>>>>>>>>>>>> uc
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ____________________________________________________________
> >>>>>>>>>>>>>> Seabourn Luxury Cruises
> >>>>>>>>>>>>>> Receive special offers from the World&#39;s Finest
> Small-Ship
> >>>>>>> Cruise
> >>>>>>>>>>>>>> Line!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> http://thirdpartyoffers.netzero.net/TGL3255/54fbf3b0f058073b02901mp14duc
> >>> <
> >>>>>>>
> >>>>>
> >>>
> http://thirdpartyoffers.netzero.net/TGL3255/54fbf3b0f058073b02901mp14duc>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> ____________________________________________________________
> >>>>>>>>>>> Discover Seabourn
> >>>>>>>>>>> A journey as beautiful as the destination, request a brochure
> >>> today!
> >>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> http://thirdpartyoffers.netzero.net/TGL3255/54fdebfe6a2a36bfb0bb3mp10duc
> >>> <
> >>>>>>>
> >>>>>
> >>>
> http://thirdpartyoffers.netzero.net/TGL3255/54fdebfe6a2a36bfb0bb3mp10duc>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>> Zakee
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> ____________________________________________________________
> >>>>>>>>>> Want to place your ad here?
> >>>>>>>>>> Advertise on United Online
> >>>>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> http://thirdpartyoffers.netzero.net/TGL3255/54fdf80bc575a780b0397mp05duc
> >>>>>>>>>
> >>>>>>>> ____________________________________________________________
> >>>>>>>> What's your flood risk?
> >>>>>>>> Find flood maps, interactive tools, FAQs, and agents in your area.
> >>>>>>>>
> >>>>>
> >>>
> http://thirdpartyoffers.netzero.net/TGL3255/5504cccfca43a4ccf0a56mp08duc
> >>>>>>> <
> >>>>>
> >>>
> http://thirdpartyoffers.netzero.net/TGL3255/5504cccfca43a4ccf0a56mp08duc>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> -Regards,
> >>>>>> Mayuresh R. Gharat
> >>>>>> (862) 250-7125
> >>>>>> ____________________________________________________________
> >>>>>> What's your flood risk?
> >>>>>> Find flood maps, interactive tools, FAQs, and agents in your area.
> >>>>>>
> >>>
> http://thirdpartyoffers.netzero.net/TGL3255/55072125266de21244da8mp12duc
> >>>>>
> >>>>> Thanks
> >>>>> Zakee
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> -Regards,
> >>>> Mayuresh R. Gharat
> >>>> (862) 250-7125
> >>>> ____________________________________________________________
> >>>> High School Yearbooks
> >>>> View Class Yearbooks Online Free. Reminisce & Buy a Reprint Today!
> >>>>
> >>>
> http://thirdpartyoffers.netzero.net/TGL3255/5507e24f3050f624f0e4amp01duc
> >>>
> >>>
> >>
> >>
> >> --
> >> -Regards,
> >> Mayuresh R. Gharat
> >> (862) 250-7125
> >>
> >
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> > ____________________________________________________________
> > What's your flood risk?
> > Find flood maps, interactive tools, FAQs, and agents in your area.
> > http://thirdpartyoffers.netzero.net/TGL3255/5508867f356467f4946mp08duc <
> http://thirdpartyoffers.netzero.net/TGL3255/5508867f356467f4946mp08duc>
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: Broker Exceptions

Posted by Zakee <kz...@netzero.net>.
> What version are you running?

Version 0.8.2.0

> Your case is 2). But the only weird thing is that your replica (broker 3) is
> requesting an offset which is greater than the leader's log end offset.


So what could be the cause?

Thanks
Zakee



> On Mar 17, 2015, at 11:45 AM, Mayuresh Gharat <gh...@gmail.com> wrote:
> 
> What version are you running?
> 
> The code for the latest version says that:
> 
> 1) If the log end offset of the replica is greater than the leader's log end
> offset, the replica's offset will be reset to the logEndOffset of the leader.
> 
> 2) Else, if the log end offset of the replica is smaller than the leader's
> log end offset and it is out of range, the replica's offset will be reset to
> the logStartOffset of the leader.
> 
> Your case is 2). But the only weird thing is that your replica (broker 3) is
> requesting an offset which is greater than the leader's log end offset.
> 
> Thanks,
> 
> Mayuresh
> 
> 

Re: Broker Exceptions

Posted by Mayuresh Gharat <gh...@gmail.com>.
What version are you running?

The code for the latest version says that:

1) If the log end offset of the replica is greater than the leader's log end
offset, the replica's offset will be reset to the logEndOffset of the leader.

2) Else, if the log end offset of the replica is smaller than the leader's
log end offset and it is out of range, the replica's offset will be reset to
the logStartOffset of the leader.

Your case is 2). But the only weird thing is that your replica (broker 3) is
requesting an offset which is greater than the leader's log end offset.
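
A rough sketch, in plain Java, of the two rules described above (illustrative
only: this is not the actual kafka.server.ReplicaFetcherThread code, and the
class, method, and variable names below are made up):

    public class OutOfRangeOffsetResetSketch {

        // Offset the follower should resume fetching from after the leader
        // rejects its fetch offset as out of range.
        static long resetOffset(long replicaLogEndOffset,
                                long leaderLogEndOffset,
                                long leaderLogStartOffset) {
            if (replicaLogEndOffset > leaderLogEndOffset) {
                // Case 1: the follower is ahead of the leader, so it truncates
                // back to the leader's log end offset.
                return leaderLogEndOffset;
            }
            // Case 2: the follower is behind and its offset is out of range
            // (for example, the old segments were deleted by retention), so it
            // restarts from the leader's log start offset.
            return leaderLogStartOffset;
        }

        public static void main(String[] args) {
            // Follower asks for offset 500 while the leader only has 1000..2000:
            // case 2 applies and fetching restarts at the leader's start offset.
            System.out.println(resetOffset(500L, 2000L, 1000L)); // prints 1000
        }
    }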

Thanks,

Mayuresh


On Tue, Mar 17, 2015 at 10:26 AM, Mayuresh Gharat <
gharatmayuresh15@gmail.com> wrote:

> cool.
>
> On Tue, Mar 17, 2015 at 10:15 AM, Zakee <kz...@netzero.net> wrote:
>
>> Hi Mayuresh,
>>
>> The logs are already attached and are in reverse order starting backwards
>> from [2015-03-14 07:46:52,517] to the time when brokers were started.
>>
>> Thanks
>> Zakee
>>
>>
>>
>> > On Mar 17, 2015, at 12:07 AM, Mayuresh Gharat <
>> gharatmayuresh15@gmail.com> wrote:
>> >
>> > Hi Zakee,
>> >
>> > Thanks for the logs. Can you paste earlier logs from broker-3 up to :
>> >
>> > [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
>> > offset 1754769769 for partition [Topic22kv,5] out of range; reset
>> > offset to 1400864851 (kafka.server.ReplicaFetcherThread)
>> >
>> > That would help us figure out what was happening on this broker before
>> it
>> > issued a replicaFetch request to broker-4.
>> >
>> > Thanks,
>> >
>> > Mayuresh
>> >
>> > On Mon, Mar 16, 2015 at 11:32 PM, Zakee <kz...@netzero.net> wrote:
>> >
>> >> Hi Mayuresh,
>> >>
>> >> Here are the logs.
>> >>
>> >>
>> >>
>> >> Thanks,
>> >> Kazim Zakee
>> >>
>> >>
>> >>
>> >>> On Mar 16, 2015, at 10:48 AM, Mayuresh Gharat <
>> >> gharatmayuresh15@gmail.com> wrote:
>> >>>
>> >>> Can you provide more logs (complete) on Broker 3 till time :
>> >>>
>> >>> *[2015-03-14 07:46:52,517*] WARN [ReplicaFetcherThread-2-4], Replica 3
>> >> for
>> >>> partition [Topic22kv,5] reset its fetch offset from 1400864851 to
>> current
>> >>> leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
>> >>>
>> >>> I would like to see logs from time much before it sent the fetch
>> request
>> >> to
>> >>> Broker 4 to the time above. I want to check if in any case Broker 3
>> was a
>> >>> leader before broker 4 took over.
>> >>>
>> >>> Additional logs will help.
>> >>>
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Mayuresh
>> >>>
>> >>>
>> >>>
>> >>> On Sat, Mar 14, 2015 at 8:35 PM, Zakee <kz...@netzero.net> wrote:
>> >>>
>> >>>> log.cleanup.policy is delete not compact.
>> >>>> log.cleaner.enable=true
>> >>>> log.cleaner.threads=5
>> >>>> log.cleanup.policy=delete
>> >>>> log.flush.scheduler.interval.ms=3000
>> >>>> log.retention.minutes=1440
>> >>>> log.segment.bytes=1073741824  (1gb)
>> >>>>
>> >>>> Messages are keyed but not compressed, producer async and uses kafka
>> >>>> default partitioner.
>> >>>> String message = msg.getString();
>> >>>> String uniqKey = ""+rnd.nextInt();// random key
>> >>>> String partKey = getPartitionKey();// partition key
>> >>>> KeyedMessage<String, String> data = new KeyedMessage<String,
>> >>>> String>(this.topicName, uniqKey, partKey, message);
>> >>>> producer.send(data);
>> >>>>
>> >>>> Thanks
>> >>>> Zakee
>> >>>>
>> >>>>
>> >>>>
>> >>>>> On Mar 14, 2015, at 4:23 PM, gharatmayuresh15@gmail.com wrote:
>> >>>>>
>> >>>>> Is your topic log compacted? Also if it is are the messages keyed?
>> Or
>> >>>> are the messages compressed?
>> >>>>>
>> >>>>> Thanks,
>> >>>>>
>> >>>>> Mayuresh
>> >>>>>
>> >>>>> Sent from my iPhone
>> >>>>>
>> >>>>>> On Mar 14, 2015, at 2:02 PM, Zakee <kzakee1@netzero.net <mailto:
>> >>>> kzakee1@netzero.net>> wrote:
>> >>>>>>
>> >>>>>> Thanks, Jiangjie for helping resolve the kafka controller migration
>> >>>> driven partition leader rebalance issue. The logs are much cleaner
>> now.
>> >>>>>>
>> >>>>>> There are a few incidences of Out of range offset even though
>> there
>> >> is
>> >>>> no consumers running, only producers and replica fetchers. I was
>> trying
>> >> to
>> >>>> relate to a cause, looks like compaction (log segment deletion)
>> causing
>> >>>> this. Not sure whether this is expected behavior.
>> >>>>>>
>> >>>>>> Broker-4:
>> >>>>>> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]:
>> Error
>> >>>> when processing fetch request for partition [Topic22kv,5] offset
>> >> 1754769769
>> >>>> from follower with correlation id 1645671. Possible cause: Request
>> for
>> >>>> offset 1754769769 but we only have log segments in the range
>> 1400864851
>> >> to
>> >>>> 1754769732. (kafka.server.ReplicaManager)
>> >>>>>>
>> >>>>>> Broker-3:
>> >>>>>> [2015-03-14 07:46:52,356] INFO The cleaning for partition
>> >> [Topic22kv,5]
>> >>>> is aborted and paused (kafka.log.LogCleaner)
>> >>>>>> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851
>> for
>> >>>> log Topic22kv-5 for deletion. (kafka.log.Log)
>> >>>>>> …
>> >>>>>> [2015-03-14 07:46:52,421] INFO Compaction for partition
>> [Topic22kv,5]
>> >>>> is resumed (kafka.log.LogCleaner)
>> >>>>>> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
>> >>>> offset 1754769769 for partition [Topic22kv,5] out of range; reset
>> >> offset to
>> >>>> 1400864851 (kafka.server.ReplicaFetcherThread)
>> >>>>>> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica
>> 3
>> >>>> for partition [Topic22kv,5] reset its fetch offset from 1400864851 to
>> >>>> current leader 4's start offset 1400864851
>> >>>> (kafka.server.ReplicaFetcherThread)
>> >>>>>>
>> >>>>>> <topic22kv_746a_314_logs.txt>
>> >>>>>>
>> >>>>>>
>> >>>>>> Thanks
>> >>>>>> Zakee
>> >>>>>>
>> >>>>>>> On Mar 9, 2015, at 12:18 PM, Zakee <kz...@netzero.net> wrote:
>> >>>>>>>
>> >>>>>>> No broker restarts.
>> >>>>>>>
>> >>>>>>> Created a kafka issue:
>> >>>> https://issues.apache.org/jira/browse/KAFKA-2011 <
>> >>>> https://issues.apache.org/jira/browse/KAFKA-2011>
>> >>>>>>>
>> >>>>>>>>> Logs for rebalance:
>> >>>>>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming
>> preferred
>> >>>> replica election for partitions: (kafka.controller.KafkaController)
>> >>>>>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that
>> >>>> completed preferred replica election:
>> (kafka.controller.KafkaController)
>> >>>>>>>>> …
>> >>>>>>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming
>> preferred
>> >>>> replica election for partitions: (kafka.controller.KafkaController)
>> >>>>>>>>> ...
>> >>>>>>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming
>> preferred
>> >>>> replica election for partitions: (kafka.controller.KafkaController)
>> >>>>>>>>> ...
>> >>>>>>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting
>> preferred
>> >>>> replica leader election for partitions
>> >> (kafka.controller.KafkaController)
>> >>>>>>>>> ...
>> >>>>>>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions
>> >> undergoing
>> >>>> preferred replica election:  (kafka.controller.KafkaController)
>> >>>>>>>>>
>> >>>>>>>>> Also, I still see lots of below errors (~69k) going on in the
>> logs
>> >>>> since the restart. Is there any other reason than rebalance for these
>> >>>> errors?
>> >>>>>>>>>
>> >>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
>> Error
>> >>>> for partition [Topic-11,7] to broker 5:class
>> >>>> kafka.common.NotLeaderForPartitionException
>> >>>> (kafka.server.ReplicaFetcherThread)
>> >>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
>> Error
>> >>>> for partition [Topic-2,25] to broker 5:class
>> >>>> kafka.common.NotLeaderForPartitionException
>> >>>> (kafka.server.ReplicaFetcherThread)
>> >>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
>> Error
>> >>>> for partition [Topic-2,21] to broker 5:class
>> >>>> kafka.common.NotLeaderForPartitionException
>> >>>> (kafka.server.ReplicaFetcherThread)
>> >>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
>> Error
>> >>>> for partition [Topic-22,9] to broker 5:class
>> >>>> kafka.common.NotLeaderForPartitionException
>> >>>> (kafka.server.ReplicaFetcherThread)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> Could you paste the related logs in controller.log?
>> >>>>>>> What specifically should I search for in the logs?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> Zakee
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin
>> <jqin@linkedin.com.INVALID
>> >>>> <ma...@linkedin.com.INVALID>> wrote:
>> >>>>>>>>
>> >>>>>>>> Is there anything wrong with brokers around that time? E.g.
>> Broker
>> >>>> restart?
>> >>>>>>>> The log you pasted are actually from replica fetchers. Could you
>> >>>> paste the
>> >>>>>>>> related logs in controller.log?
>> >>>>>>>>
>> >>>>>>>> Thanks.
>> >>>>>>>>
>> >>>>>>>> Jiangjie (Becket) Qin
>> >>>>>>>>
>> >>>>>>>>> On 3/9/15, 10:32 AM, "Zakee" <kzakee1@netzero.net <mailto:
>> >>>> kzakee1@netzero.net>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Correction: Actually  the rebalance happened quite until 24
>> hours
>> >>>> after
>> >>>>>>>>> the start, and thats where below errors were found. Ideally
>> >> rebalance
>> >>>>>>>>> should not have happened at all.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Thanks
>> >>>>>>>>> Zakee
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzakee1@netzero.net
>> <mailto:
>> >>>> kzakee1@netzero.net>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
>> >>>> rebalance
>> >>>>>>>>>>> here?
>> >>>>>>>>>> Thanks for your suggestions.
>> >>>>>>>>>> It looks like the rebalance actually happened only once soon
>> >> after I
>> >>>>>>>>>> started with clean cluster and data was pushed, it didn’t
>> happen
>> >>>> again
>> >>>>>>>>>> so far, and I see the partitions leader counts on brokers did
>> not
>> >>>> change
>> >>>>>>>>>> since then. One of the brokers was constantly showing 0 for
>> >>>> partition
>> >>>>>>>>>> leader count. Is that normal?
>> >>>>>>>>>>
>> >>>>>>>>>> Also, I still see lots of below errors (~69k) going on in the
>> logs
>> >>>>>>>>>> since the restart. Is there any other reason than rebalance for
>> >>>> these
>> >>>>>>>>>> errors?
>> >>>>>>>>>>
>> >>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
>> Error
>> >>>> for
>> >>>>>>>>>> partition [Topic-11,7] to broker 5:class
>> >>>>>>>>>> kafka.common.NotLeaderForPartitionException
>> >>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>> >>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
>> Error
>> >>>> for
>> >>>>>>>>>> partition [Topic-2,25] to broker 5:class
>> >>>>>>>>>> kafka.common.NotLeaderForPartitionException
>> >>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>> >>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
>> Error
>> >>>> for
>> >>>>>>>>>> partition [Topic-2,21] to broker 5:class
>> >>>>>>>>>> kafka.common.NotLeaderForPartitionException
>> >>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>> >>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
>> Error
>> >>>> for
>> >>>>>>>>>> partition [Topic-22,9] to broker 5:class
>> >>>>>>>>>> kafka.common.NotLeaderForPartitionException
>> >>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>> >>>>>>>>>>
>> >>>>>>>>>>> Some other things to check are:
>> >>>>>>>>>>> 1. The actual property name is auto.leader.rebalance.enable,
>> not
>> >>>>>>>>>>> auto.leader.rebalance. You’ve probably known this, just to
>> double
>> >>>>>>>>>>> confirm.
>> >>>>>>>>>> Yes
>> >>>>>>>>>>
>> >>>>>>>>>>> 2. In zookeeper path, can you verify
>> >>>> /admin/preferred_replica_election
>> >>>>>>>>>>> does not exist?
>> >>>>>>>>>> ls /admin
>> >>>>>>>>>> [delete_topics]
>> >>>>>>>>>> ls /admin/preferred_replica_election
>> >>>>>>>>>> Node does not exist: /admin/preferred_replica_election
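If it helps, the same check can be scripted against ZooKeeper directly instead of using the shell. This is only a sketch using the standard ZooKeeper Java client; the connect string is a placeholder (use the brokers' zookeeper.connect value):

import org.apache.zookeeper.ZooKeeper;

public class CheckPreferredReplicaElection {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string.
        ZooKeeper zk = new ZooKeeper("zk1.example.com:2181", 30000, event -> { });
        // This znode is created when a preferred replica election is requested and is
        // removed by the controller once the election completes, so "does not exist"
        // is the expected steady state.
        boolean exists = zk.exists("/admin/preferred_replica_election", false) != null;
        System.out.println("/admin/preferred_replica_election " + (exists ? "exists" : "does not exist"));
        zk.close();
    }
}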
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Thanks
>> >>>>>>>>>> Zakee
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin
>> >>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
>> >>>> rebalance
>> >>>>>>>>>>> here?
>> >>>>>>>>>>> Some other things to check are:
>> >>>>>>>>>>> 1. The actual property name is auto.leader.rebalance.enable,
>> not
>> >>>>>>>>>>> auto.leader.rebalance. You’ve probably known this, just to
>> double
>> >>>>>>>>>>> confirm.
>> >>>>>>>>>>> 2. In zookeeper path, can you verify
>> >>>> /admin/preferred_replica_election
>> >>>>>>>>>>> does not exist?
>> >>>>>>>>>>>
>> >>>>>>>>>>> Jiangjie (Becket) Qin
>> >>>>>>>>>>>
>> >>>>>>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzakee1@netzero.net <mailto:
>> >>>> kzakee1@netzero.net>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I started with  clean cluster and started to push data. It
>> still
>> >>>> does
>> >>>>>>>>>>>> the
>> >>>>>>>>>>>> rebalance at random durations even though the
>> >>>> auto.leader.relabalance
>> >>>>>>>>>>>> is
>> >>>>>>>>>>>> set to false.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Thanks
>> >>>>>>>>>>>> Zakee
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin
>> >>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
>> >>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Yes, the rebalance should not happen in that case. That is a
>> >>>> little
>> >>>>>>>>>>>>> bit
>> >>>>>>>>>>>>> strange. Could you try to launch a clean Kafka cluster with
>> >>>>>>>>>>>>> auto.leader.election disabled and try push data?
>> >>>>>>>>>>>>> When leader migration occurs, NotLeaderForPartition
>> exception
>> >> is
>> >>>>>>>>>>>>> expected.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Jiangjie (Becket) Qin
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzakee1@netzero.net <mailto:
>> >>>> kzakee1@netzero.net>> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Yes, Jiangjie, I do see lots of these errors "Starting
>> >> preferred
>> >>>>>>>>>>>>>> replica
>> >>>>>>>>>>>>>> leader election for partitions” in logs. I also see lot of
>> >>>> Produce
>> >>>>>>>>>>>>>> request failure warnings in with the NotLeader Exception.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I tried switching off the auto.leader.relabalance to
>> false. I
>> >> am
>> >>>>>>>>>>>>>> still
>> >>>>>>>>>>>>>> noticing the rebalance happening. My understanding was the
>> >>>> rebalance
>> >>>>>>>>>>>>>> will
>> >>>>>>>>>>>>>> not happen when this is set to false.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks
>> >>>>>>>>>>>>>> Zakee
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
>> >>>>>>>>>>>>>>> <jqin@linkedin.com.INVALID <mailto:
>> jqin@linkedin.com.INVALID
>> >>>>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> I don’t think num.replica.fetchers will help in this case.
>> >>>>>>>>>>>>>>> Increasing
>> >>>>>>>>>>>>>>> number of fetcher threads will only help in cases where
>> you
>> >>>> have a
>> >>>>>>>>>>>>>>> large
>> >>>>>>>>>>>>>>> amount of data coming into a broker and more replica
>> fetcher
>> >>>>>>>>>>>>>>> threads
>> >>>>>>>>>>>>>>> will
>> >>>>>>>>>>>>>>> help keep up. We usually only use 1-2 for each broker.
>> But in
>> >>>> your
>> >>>>>>>>>>>>>>> case,
>> >>>>>>>>>>>>>>> it looks that leader migration cause issue.
>> >>>>>>>>>>>>>>> Do you see anything else in the log? Like preferred leader
>> >>>>>>>>>>>>>>> election?
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net
>> <mailto:
>> >>>> kzakee1@netzero.net>
>> >>>>>>>>>>>>>>> <mailto:kzakee1@netzero.net <mailto:kzakee1@netzero.net
>> >>>
>> >>>> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Thanks, Jiangjie.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Yes, I do see under partitions usually shooting every
>> hour.
>> >>>>>>>>>>>>>>>> Anythings
>> >>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>> I could try to reduce it?
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> How does "num.replica.fetchers" affect the replica sync?
>> >>>> Currently
>> >>>>>>>>>>>>>>>> have
>> >>>>>>>>>>>>>>>> configured 7 each of 5 brokers.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> -Zakee
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>> >>>>>>>>>>>>>>>> <jqin@linkedin.com.invalid <mailto:
>> >> jqin@linkedin.com.invalid
>> >>>>>>
>> >>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> These messages are usually caused by leader migration. I
>> >>>> think as
>> >>>>>>>>>>>>>>>>> long
>> >>>>>>>>>>>>>>>>> as
>> >>>>>>>>>>>>>>>>> you don't see this lasting for ever and got a bunch of
>> >> under
>> >>>>>>>>>>>>>>>>> replicated
>> >>>>>>>>>>>>>>>>> partitions, it should be fine.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzakee1@netzero.net
>> >> <mailto:
>> >>>> kzakee1@netzero.net>> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Need to know if I should I be worried about this or
>> ignore
>> >>>> them.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker
>> >> logs,
>> >>>> not
>> >>>>>>>>>>>>>>>>>> sure
>> >>>>>>>>>>>>>>>>> what
>> >>>>>>>>>>>>>>>>>> causes them and what could be done to fix them.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition
>> >>>>>>>>>>>>>>>>>> [TestTopic]
>> >>>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>> broker
>> >>>>>>>>>>>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
>> >>>>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>> >>>>>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR
>> >> [ReplicaFetcherThread-3-5],
>> >>>>>>>>>>>>>>>>>> Error
>> >>>>>>>>>>>>>>>>>> for
>> >>>>>>>>>>>>>>>>>> partition [TestTopic] to broker 5:class
>> >>>>>>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>> >>>>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>> >>>>>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on
>> Broker
>> >>>> 2]:
>> >>>>>>>>>>>>>>>>>> Fetch
>> >>>>>>>>>>>>>>>>>> request
>> >>>>>>>>>>>>>>>>>> with correlation id 950084 from client
>> >>>> ReplicaFetcherThread-1-2
>> >>>>>>>>>>>>>>>>>> on
>> >>>>>>>>>>>>>>>>>> partition [TestTopic,2] failed due to Leader not local
>> for
>> >>>>>>>>>>>>>>>>>> partition
>> >>>>>>>>>>>>>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Any ideas?
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> -Zakee
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Thanks
>> >>>>>>> Zakee
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> -Regards,
>> >>> Mayuresh R. Gharat
>> >>> (862) 250-7125
>> >>
>> >> Thanks
>> >> Zakee
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> > --
>> > -Regards,
>> > Mayuresh R. Gharat
>> > (862) 250-7125
>>
>>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: Broker Exceptions

Posted by Mayuresh Gharat <gh...@gmail.com>.
cool.

On Tue, Mar 17, 2015 at 10:15 AM, Zakee <kz...@netzero.net> wrote:

> Hi Mayuresh,
>
> The logs are already attached; they are in reverse chronological order, from
> [2015-03-14 07:46:52,517] back to the time when the brokers were started.
>
> Thanks
> Zakee
>
>
>
> > On Mar 17, 2015, at 12:07 AM, Mayuresh Gharat <
> gharatmayuresh15@gmail.com> wrote:
> >
> > Hi Zakee,
> >
> > Thanks for the logs. Can you paste earlier logs from broker-3 up to :
> >
> > [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
> > offset 1754769769 for partition [Topic22kv,5] out of range; reset
> > offset to 1400864851 (kafka.server.ReplicaFetcherThread)
> >
> > That would help us figure out what was happening on this broker before it
> > issued a replicaFetch request to broker-4.
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Mon, Mar 16, 2015 at 11:32 PM, Zakee <kz...@netzero.net> wrote:
> >
> >> Hi Mayuresh,
> >>
> >> Here are the logs.
> >>
> >>
> >>
> >> Thanks,
> >> Kazim Zakee
> >>
> >>
> >>
> >>> On Mar 16, 2015, at 10:48 AM, Mayuresh Gharat <
> >> gharatmayuresh15@gmail.com> wrote:
> >>>
> >>> Can you provide more logs (complete) on Broker 3 till time :
> >>>
> >>> *[2015-03-14 07:46:52,517*] WARN [ReplicaFetcherThread-2-4], Replica 3
> >> for
> >>> partition [Topic22kv,5] reset its fetch offset from 1400864851 to
> current
> >>> leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
> >>>
> >>> I would like to see logs from time much before it sent the fetch
> request
> >> to
> >>> Broker 4 to the time above. I want to check if in any case Broker 3
> was a
> >>> leader before broker 4 took over.
> >>>
> >>> Additional logs will help.
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Mayuresh
> >>>
> >>>
> >>>
> >>> On Sat, Mar 14, 2015 at 8:35 PM, Zakee <kz...@netzero.net> wrote:
> >>>
> >>>> log.cleanup.policy is delete not compact.
> >>>> log.cleaner.enable=true
> >>>> log.cleaner.threads=5
> >>>> log.cleanup.policy=delete
> >>>> log.flush.scheduler.interval.ms=3000
> >>>> log.retention.minutes=1440
> >>>> log.segment.bytes=1073741824  (1gb)
> >>>>
> >>>> Messages are keyed but not compressed, producer async and uses kafka
> >>>> default partitioner.
> >>>> String message = msg.getString();
> >>>> String uniqKey = ""+rnd.nextInt();// random key
> >>>> String partKey = getPartitionKey();// partition key
> >>>> KeyedMessage<String, String> data = new KeyedMessage<String,
> >>>> String>(this.topicName, uniqKey, partKey, message);
> >>>> producer.send(data);
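For context, a self-contained sketch of the producer pattern quoted above (the old 0.8.x producer API) might look like the following; the broker list, topic name, serializer settings, and payload are placeholders rather than Zakee's actual configuration:

import java.util.Properties;
import java.util.Random;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class AsyncKeyedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");      // placeholder broker list
        props.put("serializer.class", "kafka.serializer.StringEncoder");      // string message values
        props.put("key.serializer.class", "kafka.serializer.StringEncoder");  // string keys
        props.put("producer.type", "async");                                  // async, as described above

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));

        String message = "example payload";            // placeholder payload
        String uniqKey = "" + new Random().nextInt();  // random key, as in the snippet
        String partKey = "example-partition-key";      // placeholder partitioning key
        // Four-argument KeyedMessage: (topic, key, partKey, message). The default
        // partitioner hashes partKey to pick the partition; uniqKey is stored as the message key.
        producer.send(new KeyedMessage<String, String>("Topic22kv", uniqKey, partKey, message));
        producer.close();
    }
}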
> >>>>
> >>>> Thanks
> >>>> Zakee
> >>>>
> >>>>
> >>>>
> >>>>> On Mar 14, 2015, at 4:23 PM, gharatmayuresh15@gmail.com wrote:
> >>>>>
> >>>>> Is your topic log compacted? Also if it is are the messages keyed? Or
> >>>> are the messages compressed?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Mayuresh
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>>> On Mar 14, 2015, at 2:02 PM, Zakee <kzakee1@netzero.net <mailto:
> >>>> kzakee1@netzero.net>> wrote:
> >>>>>>
> >>>>>> Thanks, Jiangjie for helping resolve the kafka controller migration
> >>>> driven partition leader rebalance issue. The logs are much cleaner
> now.
> >>>>>>
> >>>>>> There are a few instances of out-of-range offset errors even though there
> >>>>>> are no consumers running, only producers and replica fetchers. I was trying
> >>>>>> to relate them to a cause; it looks like compaction (log segment deletion)
> >>>>>> is causing this. Not sure whether this is expected behavior.
> >>>> this. Not sure whether this is expected behavior.
> >>>>>>
> >>>>>> Broker-4:
> >>>>>> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error
> >>>> when processing fetch request for partition [Topic22kv,5] offset
> >> 1754769769
> >>>> from follower with correlation id 1645671. Possible cause: Request for
> >>>> offset 1754769769 but we only have log segments in the range
> 1400864851
> >> to
> >>>> 1754769732. (kafka.server.ReplicaManager)
> >>>>>>
> >>>>>> Broker-3:
> >>>>>> [2015-03-14 07:46:52,356] INFO The cleaning for partition
> >> [Topic22kv,5]
> >>>> is aborted and paused (kafka.log.LogCleaner)
> >>>>>> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for
> >>>> log Topic22kv-5 for deletion. (kafka.log.Log)
> >>>>>> …
> >>>>>> [2015-03-14 07:46:52,421] INFO Compaction for partition
> [Topic22kv,5]
> >>>> is resumed (kafka.log.LogCleaner)
> >>>>>> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
> >>>> offset 1754769769 for partition [Topic22kv,5] out of range; reset
> >> offset to
> >>>> 1400864851 (kafka.server.ReplicaFetcherThread)
> >>>>>> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3
> >>>> for partition [Topic22kv,5] reset its fetch offset from 1400864851 to
> >>>> current leader 4's start offset 1400864851
> >>>> (kafka.server.ReplicaFetcherThread)
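For what it's worth, the ERROR/WARN pair above shows the follower asking broker 4 for offset 1754769769 while broker 4 only retained segments covering 1400864851 to 1754769732, after which the replica fetcher reset itself to the leader's start offset. One way to confirm the offset range a leader currently retains for a partition is the 0.8 SimpleConsumer offset API (the same numbers kafka.tools.GetOffsetShell prints). This is only a sketch; the host, port, topic, and partition are placeholders:

import java.util.HashMap;
import java.util.Map;

import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.TopicAndPartition;
import kafka.javaapi.OffsetRequest;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class OffsetRangeCheck {
    public static void main(String[] args) {
        // Placeholders: point this at the current leader for the partition in question.
        SimpleConsumer consumer =
            new SimpleConsumer("broker4.example.com", 9092, 100000, 64 * 1024, "offset-range-check");
        TopicAndPartition tp = new TopicAndPartition("Topic22kv", 5);

        System.out.println("log start offset: " + fetchOffset(consumer, tp, kafka.api.OffsetRequest.EarliestTime()));
        System.out.println("log end offset:   " + fetchOffset(consumer, tp, kafka.api.OffsetRequest.LatestTime()));
        consumer.close();
    }

    // Asks the broker for a single offset at the given time (-2 = earliest retained, -1 = latest).
    private static long fetchOffset(SimpleConsumer consumer, TopicAndPartition tp, long time) {
        Map<TopicAndPartition, PartitionOffsetRequestInfo> info =
            new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
        info.put(tp, new PartitionOffsetRequestInfo(time, 1));
        OffsetRequest request =
            new OffsetRequest(info, kafka.api.OffsetRequest.CurrentVersion(), "offset-range-check");
        OffsetResponse response = consumer.getOffsetsBefore(request);
        return response.offsets(tp.topic(), tp.partition())[0];
    }
}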
> >>>>>>
> >>>>>> <topic22kv_746a_314_logs.txt>
> >>>>>>
> >>>>>>
> >>>>>> Thanks
> >>>>>> Zakee
> >>>>>>
> >>>>>>> On Mar 9, 2015, at 12:18 PM, Zakee <kz...@netzero.net> wrote:
> >>>>>>>
> >>>>>>> No broker restarts.
> >>>>>>>
> >>>>>>> Created a kafka issue:
> >>>> https://issues.apache.org/jira/browse/KAFKA-2011 <
> >>>> https://issues.apache.org/jira/browse/KAFKA-2011>
> >>>>>>>
> >>>>>>>>> Logs for rebalance:
> >>>>>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred
> >>>> replica election for partitions: (kafka.controller.KafkaController)
> >>>>>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that
> >>>> completed preferred replica election:
> (kafka.controller.KafkaController)
> >>>>>>>>> …
> >>>>>>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred
> >>>> replica election for partitions: (kafka.controller.KafkaController)
> >>>>>>>>> ...
> >>>>>>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred
> >>>> replica election for partitions: (kafka.controller.KafkaController)
> >>>>>>>>> ...
> >>>>>>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred
> >>>> replica leader election for partitions
> >> (kafka.controller.KafkaController)
> >>>>>>>>> ...
> >>>>>>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions
> >> undergoing
> >>>> preferred replica election:  (kafka.controller.KafkaController)
> >>>>>>>>>
> >>>>>>>>> Also, I still see lots of below errors (~69k) going on in the
> logs
> >>>> since the restart. Is there any other reason than rebalance for these
> >>>> errors?
> >>>>>>>>>
> >>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
> >>>> for partition [Topic-11,7] to broker 5:class
> >>>> kafka.common.NotLeaderForPartitionException
> >>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
> >>>> for partition [Topic-2,25] to broker 5:class
> >>>> kafka.common.NotLeaderForPartitionException
> >>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
> >>>> for partition [Topic-2,21] to broker 5:class
> >>>> kafka.common.NotLeaderForPartitionException
> >>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
> >>>> for partition [Topic-22,9] to broker 5:class
> >>>> kafka.common.NotLeaderForPartitionException
> >>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>
> >>>>>>>
> >>>>>>>> Could you paste the related logs in controller.log?
> >>>>>>> What specifically should I search for in the logs?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Zakee
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin
> <jqin@linkedin.com.INVALID
> >>>> <ma...@linkedin.com.INVALID>> wrote:
> >>>>>>>>
> >>>>>>>> Is there anything wrong with brokers around that time? E.g. Broker
> >>>> restart?
> >>>>>>>> The log you pasted are actually from replica fetchers. Could you
> >>>> paste the
> >>>>>>>> related logs in controller.log?
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>
> >>>>>>>>> On 3/9/15, 10:32 AM, "Zakee" <kzakee1@netzero.net <mailto:
> >>>> kzakee1@netzero.net>> wrote:
> >>>>>>>>>
> >>>>>>>>> Correction: Actually  the rebalance happened quite until 24 hours
> >>>> after
> >>>>>>>>> the start, and thats where below errors were found. Ideally
> >> rebalance
> >>>>>>>>> should not have happened at all.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> Zakee
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzakee1@netzero.net
> <mailto:
> >>>> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
> >>>> rebalance
> >>>>>>>>>>> here?
> >>>>>>>>>> Thanks for your suggestions.
> >>>>>>>>>> It looks like the rebalance actually happened only once soon
> >> after I
> >>>>>>>>>> started with clean cluster and data was pushed, it didn’t happen
> >>>> again
> >>>>>>>>>> so far, and I see the partitions leader counts on brokers did
> not
> >>>> change
> >>>>>>>>>> since then. One of the brokers was constantly showing 0 for
> >>>> partition
> >>>>>>>>>> leader count. Is that normal?
> >>>>>>>>>>
> >>>>>>>>>> Also, I still see lots of below errors (~69k) going on in the
> logs
> >>>>>>>>>> since the restart. Is there any other reason than rebalance for
> >>>> these
> >>>>>>>>>> errors?
> >>>>>>>>>>
> >>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
> Error
> >>>> for
> >>>>>>>>>> partition [Topic-11,7] to broker 5:class
> >>>>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
> Error
> >>>> for
> >>>>>>>>>> partition [Topic-2,25] to broker 5:class
> >>>>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5],
> Error
> >>>> for
> >>>>>>>>>> partition [Topic-2,21] to broker 5:class
> >>>>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5],
> Error
> >>>> for
> >>>>>>>>>> partition [Topic-22,9] to broker 5:class
> >>>>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>
> >>>>>>>>>>> Some other things to check are:
> >>>>>>>>>>> 1. The actual property name is auto.leader.rebalance.enable,
> not
> >>>>>>>>>>> auto.leader.rebalance. You’ve probably known this, just to
> double
> >>>>>>>>>>> confirm.
> >>>>>>>>>> Yes
> >>>>>>>>>>
> >>>>>>>>>>> 2. In zookeeper path, can you verify
> >>>> /admin/preferred_replica_election
> >>>>>>>>>>> does not exist?
> >>>>>>>>>> ls /admin
> >>>>>>>>>> [delete_topics]
> >>>>>>>>>> ls /admin/preferred_replica_election
> >>>>>>>>>> Node does not exist: /admin/preferred_replica_election
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>> Zakee
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin
> >>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
> >>>> rebalance
> >>>>>>>>>>> here?
> >>>>>>>>>>> Some other things to check are:
> >>>>>>>>>>> 1. The actual property name is auto.leader.rebalance.enable,
> not
> >>>>>>>>>>> auto.leader.rebalance. You’ve probably known this, just to
> double
> >>>>>>>>>>> confirm.
> >>>>>>>>>>> 2. In zookeeper path, can you verify
> >>>> /admin/preferred_replica_election
> >>>>>>>>>>> does not exist?
> >>>>>>>>>>>
> >>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>
> >>>>>>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzakee1@netzero.net <mailto:
> >>>> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I started with  clean cluster and started to push data. It
> still
> >>>> does
> >>>>>>>>>>>> the
> >>>>>>>>>>>> rebalance at random durations even though the
> >>>> auto.leader.relabalance
> >>>>>>>>>>>> is
> >>>>>>>>>>>> set to false.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks
> >>>>>>>>>>>> Zakee
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin
> >>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Yes, the rebalance should not happen in that case. That is a
> >>>> little
> >>>>>>>>>>>>> bit
> >>>>>>>>>>>>> strange. Could you try to launch a clean Kafka cluster with
> >>>>>>>>>>>>> auto.leader.election disabled and try push data?
> >>>>>>>>>>>>> When leader migration occurs, NotLeaderForPartition exception
> >> is
> >>>>>>>>>>>>> expected.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzakee1@netzero.net <mailto:
> >>>> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, Jiangjie, I do see lots of these errors "Starting
> >> preferred
> >>>>>>>>>>>>>> replica
> >>>>>>>>>>>>>> leader election for partitions” in logs. I also see lot of
> >>>> Produce
> >>>>>>>>>>>>>> request failure warnings in with the NotLeader Exception.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I tried switching off the auto.leader.relabalance to false.
> I
> >> am
> >>>>>>>>>>>>>> still
> >>>>>>>>>>>>>> noticing the rebalance happening. My understanding was the
> >>>> rebalance
> >>>>>>>>>>>>>> will
> >>>>>>>>>>>>>> not happen when this is set to false.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>> Zakee
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
> >>>>>>>>>>>>>>> <jqin@linkedin.com.INVALID <mailto:
> jqin@linkedin.com.INVALID
> >>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I don’t think num.replica.fetchers will help in this case.
> >>>>>>>>>>>>>>> Increasing
> >>>>>>>>>>>>>>> number of fetcher threads will only help in cases where you
> >>>> have a
> >>>>>>>>>>>>>>> large
> >>>>>>>>>>>>>>> amount of data coming into a broker and more replica
> fetcher
> >>>>>>>>>>>>>>> threads
> >>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>> help keep up. We usually only use 1-2 for each broker. But
> in
> >>>> your
> >>>>>>>>>>>>>>> case,
> >>>>>>>>>>>>>>> it looks that leader migration cause issue.
> >>>>>>>>>>>>>>> Do you see anything else in the log? Like preferred leader
> >>>>>>>>>>>>>>> election?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net <mailto:
> >>>> kzakee1@netzero.net>
> >>>>>>>>>>>>>>> <mailto:kzakee1@netzero.net <ma...@netzero.net>>>
> >>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks, Jiangjie.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Yes, I do see under partitions usually shooting every
> hour.
> >>>>>>>>>>>>>>>> Anythings
> >>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>> I could try to reduce it?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> How does "num.replica.fetchers" affect the replica sync?
> >>>> Currently
> >>>>>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>> configured 7 each of 5 brokers.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Zakee
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
> >>>>>>>>>>>>>>>> <jqin@linkedin.com.invalid <mailto:
> >> jqin@linkedin.com.invalid
> >>>>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> These messages are usually caused by leader migration. I
> >>>> think as
> >>>>>>>>>>>>>>>>> long
> >>>>>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>> you don't see this lasting for ever and got a bunch of
> >> under
> >>>>>>>>>>>>>>>>> replicated
> >>>>>>>>>>>>>>>>> partitions, it should be fine.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzakee1@netzero.net
> >> <mailto:
> >>>> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Need to know if I should I be worried about this or
> ignore
> >>>> them.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker
> >> logs,
> >>>> not
> >>>>>>>>>>>>>>>>>> sure
> >>>>>>>>>>>>>>>>> what
> >>>>>>>>>>>>>>>>>> causes them and what could be done to fix them.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition
> >>>>>>>>>>>>>>>>>> [TestTopic]
> >>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> broker
> >>>>>>>>>>>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
> >>>>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR
> >> [ReplicaFetcherThread-3-5],
> >>>>>>>>>>>>>>>>>> Error
> >>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>> partition [TestTopic] to broker 5:class
> >>>>>>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on
> Broker
> >>>> 2]:
> >>>>>>>>>>>>>>>>>> Fetch
> >>>>>>>>>>>>>>>>>> request
> >>>>>>>>>>>>>>>>>> with correlation id 950084 from client
> >>>> ReplicaFetcherThread-1-2
> >>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>> partition [TestTopic,2] failed due to Leader not local
> for
> >>>>>>>>>>>>>>>>>> partition
> >>>>>>>>>>>>>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Any ideas?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> -Zakee
> >>>>>>>>>>>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Zakee
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> -Regards,
> >>> Mayuresh R. Gharat
> >>> (862) 250-7125
> >>
> >> Thanks
> >> Zakee
> >>
> >>
> >>
> >>
> >>
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
>
>


-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: Broker Exceptions

Posted by Zakee <kz...@netzero.net>.
Hi Mayuresh,

The logs are already attached; they are in reverse chronological order, from [2015-03-14 07:46:52,517] back to the time when the brokers were started.

Thanks
Zakee



> On Mar 17, 2015, at 12:07 AM, Mayuresh Gharat <gh...@gmail.com> wrote:
> 
> Hi Zakee,
> 
> Thanks for the logs. Can you paste earlier logs from broker-3 up to :
> 
> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
> offset 1754769769 for partition [Topic22kv,5] out of range; reset
> offset to 1400864851 (kafka.server.ReplicaFetcherThread)
> 
> That would help us figure out what was happening on this broker before it
> issued a replicaFetch request to broker-4.
> 
> Thanks,
> 
> Mayuresh
> 
> On Mon, Mar 16, 2015 at 11:32 PM, Zakee <kz...@netzero.net> wrote:
> 
>> Hi Mayuresh,
>> 
>> Here are the logs.
>> 
>> 
>> 
>> Thanks,
>> Kazim Zakee
>> 
>> 
>> 
>>> On Mar 16, 2015, at 10:48 AM, Mayuresh Gharat <
>> gharatmayuresh15@gmail.com> wrote:
>>> 
>>> Can you provide more logs (complete) on Broker 3 till time :
>>> 
>>> *[2015-03-14 07:46:52,517*] WARN [ReplicaFetcherThread-2-4], Replica 3
>> for
>>> partition [Topic22kv,5] reset its fetch offset from 1400864851 to current
>>> leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
>>> 
>>> I would like to see logs from time much before it sent the fetch request
>> to
>>> Broker 4 to the time above. I want to check if in any case Broker 3 was a
>>> leader before broker 4 took over.
>>> 
>>> Additional logs will help.
>>> 
>>> 
>>> Thanks,
>>> 
>>> Mayuresh
>>> 
>>> 
>>> 
>>> On Sat, Mar 14, 2015 at 8:35 PM, Zakee <kz...@netzero.net> wrote:
>>> 
>>>> log.cleanup.policy is delete not compact.
>>>> log.cleaner.enable=true
>>>> log.cleaner.threads=5
>>>> log.cleanup.policy=delete
>>>> log.flush.scheduler.interval.ms=3000
>>>> log.retention.minutes=1440
>>>> log.segment.bytes=1073741824  (1gb)
>>>> 
>>>> Messages are keyed but not compressed, producer async and uses kafka
>>>> default partitioner.
>>>> String message = msg.getString();
>>>> String uniqKey = ""+rnd.nextInt();// random key
>>>> String partKey = getPartitionKey();// partition key
>>>> KeyedMessage<String, String> data = new KeyedMessage<String,
>>>> String>(this.topicName, uniqKey, partKey, message);
>>>> producer.send(data);
>>>> 
>>>> Thanks
>>>> Zakee
>>>> 
>>>> 
>>>> 
>>>>> On Mar 14, 2015, at 4:23 PM, gharatmayuresh15@gmail.com wrote:
>>>>> 
>>>>> Is your topic log compacted? Also if it is are the messages keyed? Or
>>>> are the messages compressed?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Mayuresh
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On Mar 14, 2015, at 2:02 PM, Zakee <kzakee1@netzero.net <mailto:
>>>> kzakee1@netzero.net>> wrote:
>>>>>> 
>>>>>> Thanks, Jiangjie for helping resolve the kafka controller migration
>>>> driven partition leader rebalance issue. The logs are much cleaner now.
>>>>>> 
>>>>>> There are a few instances of out-of-range offset errors even though there
>>>>>> are no consumers running, only producers and replica fetchers. I was trying
>>>>>> to relate them to a cause; it looks like compaction (log segment deletion)
>>>>>> is causing this. Not sure whether this is expected behavior.
>>>>>> 
>>>>>> Broker-4:
>>>>>> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error
>>>> when processing fetch request for partition [Topic22kv,5] offset
>> 1754769769
>>>> from follower with correlation id 1645671. Possible cause: Request for
>>>> offset 1754769769 but we only have log segments in the range 1400864851
>> to
>>>> 1754769732. (kafka.server.ReplicaManager)
>>>>>> 
>>>>>> Broker-3:
>>>>>> [2015-03-14 07:46:52,356] INFO The cleaning for partition
>> [Topic22kv,5]
>>>> is aborted and paused (kafka.log.LogCleaner)
>>>>>> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for
>>>> log Topic22kv-5 for deletion. (kafka.log.Log)
>>>>>> …
>>>>>> [2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5]
>>>> is resumed (kafka.log.LogCleaner)
>>>>>> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
>>>> offset 1754769769 for partition [Topic22kv,5] out of range; reset
>> offset to
>>>> 1400864851 (kafka.server.ReplicaFetcherThread)
>>>>>> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3
>>>> for partition [Topic22kv,5] reset its fetch offset from 1400864851 to
>>>> current leader 4's start offset 1400864851
>>>> (kafka.server.ReplicaFetcherThread)
>>>>>> 
>>>>>> <topic22kv_746a_314_logs.txt>
>>>>>> 
>>>>>> 
>>>>>> Thanks
>>>>>> Zakee
>>>>>> 
>>>>>>> On Mar 9, 2015, at 12:18 PM, Zakee <kz...@netzero.net> wrote:
>>>>>>> 
>>>>>>> No broker restarts.
>>>>>>> 
>>>>>>> Created a kafka issue:
>>>> https://issues.apache.org/jira/browse/KAFKA-2011 <
>>>> https://issues.apache.org/jira/browse/KAFKA-2011>
>>>>>>> 
>>>>>>>>> Logs for rebalance:
>>>>>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred
>>>> replica election for partitions: (kafka.controller.KafkaController)
>>>>>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that
>>>> completed preferred replica election: (kafka.controller.KafkaController)
>>>>>>>>> …
>>>>>>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred
>>>> replica election for partitions: (kafka.controller.KafkaController)
>>>>>>>>> ...
>>>>>>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred
>>>> replica election for partitions: (kafka.controller.KafkaController)
>>>>>>>>> ...
>>>>>>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred
>>>> replica leader election for partitions
>> (kafka.controller.KafkaController)
>>>>>>>>> ...
>>>>>>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions
>> undergoing
>>>> preferred replica election:  (kafka.controller.KafkaController)
>>>>>>>>> 
>>>>>>>>> Also, I still see lots of below errors (~69k) going on in the logs
>>>> since the restart. Is there any other reason than rebalance for these
>>>> errors?
>>>>>>>>> 
>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
>>>> for partition [Topic-11,7] to broker 5:class
>>>> kafka.common.NotLeaderForPartitionException
>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
>>>> for partition [Topic-2,25] to broker 5:class
>>>> kafka.common.NotLeaderForPartitionException
>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
>>>> for partition [Topic-2,21] to broker 5:class
>>>> kafka.common.NotLeaderForPartitionException
>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
>>>> for partition [Topic-22,9] to broker 5:class
>>>> kafka.common.NotLeaderForPartitionException
>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>> 
>>>>>>> 
>>>>>>>> Could you paste the related logs in controller.log?
>>>>>>> What specifically should I search for in the logs?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Zakee
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin <jqin@linkedin.com.INVALID
>>>> <ma...@linkedin.com.INVALID>> wrote:
>>>>>>>> 
>>>>>>>> Is there anything wrong with brokers around that time? E.g. Broker
>>>> restart?
>>>>>>>> The log you pasted are actually from replica fetchers. Could you
>>>> paste the
>>>>>>>> related logs in controller.log?
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>> 
>>>>>>>>> On 3/9/15, 10:32 AM, "Zakee" <kzakee1@netzero.net <mailto:
>>>> kzakee1@netzero.net>> wrote:
>>>>>>>>> 
>>>>>>>>> Correction: Actually  the rebalance happened quite until 24 hours
>>>> after
>>>>>>>>> the start, and thats where below errors were found. Ideally
>> rebalance
>>>>>>>>> should not have happened at all.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> Zakee
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzakee1@netzero.net <mailto:
>>>> kzakee1@netzero.net>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
>>>> rebalance
>>>>>>>>>>> here?
>>>>>>>>>> Thanks for your suggestions.
>>>>>>>>>> It looks like the rebalance actually happened only once soon
>> after I
>>>>>>>>>> started with clean cluster and data was pushed, it didn’t happen
>>>> again
>>>>>>>>>> so far, and I see the partitions leader counts on brokers did not
>>>> change
>>>>>>>>>> since then. One of the brokers was constantly showing 0 for
>>>> partition
>>>>>>>>>> leader count. Is that normal?
>>>>>>>>>> 
>>>>>>>>>> Also, I still see lots of below errors (~69k) going on in the logs
>>>>>>>>>> since the restart. Is there any other reason than rebalance for
>>>> these
>>>>>>>>>> errors?
>>>>>>>>>> 
>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
>>>> for
>>>>>>>>>> partition [Topic-11,7] to broker 5:class
>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
>>>> for
>>>>>>>>>> partition [Topic-2,25] to broker 5:class
>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
>>>> for
>>>>>>>>>> partition [Topic-2,21] to broker 5:class
>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
>>>> for
>>>>>>>>>> partition [Topic-22,9] to broker 5:class
>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>> 
>>>>>>>>>>> Some other things to check are:
>>>>>>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>>>>>>>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>>>>>>>>> confirm.
>>>>>>>>>> Yes
>>>>>>>>>> 
>>>>>>>>>>> 2. In zookeeper path, can you verify
>>>> /admin/preferred_replica_election
>>>>>>>>>>> does not exist?
>>>>>>>>>> ls /admin
>>>>>>>>>> [delete_topics]
>>>>>>>>>> ls /admin/preferred_replica_election
>>>>>>>>>> Node does not exist: /admin/preferred_replica_election
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> Zakee
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin
>>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
>>>> rebalance
>>>>>>>>>>> here?
>>>>>>>>>>> Some other things to check are:
>>>>>>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>>>>>>>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>>>>>>>>> confirm.
>>>>>>>>>>> 2. In zookeeper path, can you verify
>>>> /admin/preferred_replica_election
>>>>>>>>>>> does not exist?
>>>>>>>>>>> 
>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>> 
>>>>>>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzakee1@netzero.net <mailto:
>>>> kzakee1@netzero.net>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I started with  clean cluster and started to push data. It still
>>>> does
>>>>>>>>>>>> the
>>>>>>>>>>>> rebalance at random durations even though the
>>>> auto.leader.relabalance
>>>>>>>>>>>> is
>>>>>>>>>>>> set to false.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Zakee
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin
>>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, the rebalance should not happen in that case. That is a
>>>> little
>>>>>>>>>>>>> bit
>>>>>>>>>>>>> strange. Could you try to launch a clean Kafka cluster with
>>>>>>>>>>>>> auto.leader.election disabled and try push data?
>>>>>>>>>>>>> When leader migration occurs, NotLeaderForPartition exception
>> is
>>>>>>>>>>>>> expected.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzakee1@netzero.net <mailto:
>>>> kzakee1@netzero.net>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes, Jiangjie, I do see lots of these errors "Starting
>> preferred
>>>>>>>>>>>>>> replica
>>>>>>>>>>>>>> leader election for partitions” in logs. I also see lot of
>>>> Produce
>>>>>>>>>>>>>> request failure warnings in with the NotLeader Exception.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I tried switching off the auto.leader.relabalance to false. I
>> am
>>>>>>>>>>>>>> still
>>>>>>>>>>>>>> noticing the rebalance happening. My understanding was the
>>>> rebalance
>>>>>>>>>>>>>> will
>>>>>>>>>>>>>> not happen when this is set to false.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Zakee
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
>>>>>>>>>>>>>>> <jqin@linkedin.com.INVALID <mailto:jqin@linkedin.com.INVALID
>>>> 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I don’t think num.replica.fetchers will help in this case.
>>>>>>>>>>>>>>> Increasing
>>>>>>>>>>>>>>> number of fetcher threads will only help in cases where you
>>>> have a
>>>>>>>>>>>>>>> large
>>>>>>>>>>>>>>> amount of data coming into a broker and more replica fetcher
>>>>>>>>>>>>>>> threads
>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>> help keep up. We usually only use 1-2 for each broker. But in
>>>> your
>>>>>>>>>>>>>>> case,
>>>>>>>>>>>>>>> it looks that leader migration cause issue.
>>>>>>>>>>>>>>> Do you see anything else in the log? Like preferred leader
>>>>>>>>>>>>>>> election?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net <mailto:
>>>> kzakee1@netzero.net>
>>>>>>>>>>>>>>> <mailto:kzakee1@netzero.net <ma...@netzero.net>>>
>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks, Jiangjie.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes, I do see under partitions usually shooting every hour.
>>>>>>>>>>>>>>>> Anythings
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> I could try to reduce it?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> How does "num.replica.fetchers" affect the replica sync?
>>>> Currently
>>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>> configured 7 each of 5 brokers.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -Zakee
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>>>>>>>>>>>>>> <jqin@linkedin.com.invalid <mailto:
>> jqin@linkedin.com.invalid
>>>>>> 
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> These messages are usually caused by leader migration. I
>>>> think as
>>>>>>>>>>>>>>>>> long
>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>> you don't see this lasting for ever and got a bunch of
>> under
>>>>>>>>>>>>>>>>> replicated
>>>>>>>>>>>>>>>>> partitions, it should be fine.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzakee1@netzero.net
>> <mailto:
>>>> kzakee1@netzero.net>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Need to know if I should I be worried about this or ignore
>>>> them.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker
>> logs,
>>>> not
>>>>>>>>>>>>>>>>>> sure
>>>>>>>>>>>>>>>>> what
>>>>>>>>>>>>>>>>>> causes them and what could be done to fix them.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition
>>>>>>>>>>>>>>>>>> [TestTopic]
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> broker
>>>>>>>>>>>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR
>> [ReplicaFetcherThread-3-5],
>>>>>>>>>>>>>>>>>> Error
>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>> partition [TestTopic] to broker 5:class
>>>>>>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker
>>>> 2]:
>>>>>>>>>>>>>>>>>> Fetch
>>>>>>>>>>>>>>>>>> request
>>>>>>>>>>>>>>>>>> with correlation id 950084 from client
>>>> ReplicaFetcherThread-1-2
>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>> partition [TestTopic,2] failed due to Leader not local for
>>>>>>>>>>>>>>>>>> partition
>>>>>>>>>>>>>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> -Zakee
>>>>>>>>>>>>>>>>>> 
>> ____________________________________________________________
>>>>>>>>>>>>>>>>>> Next Apple Sensation
>>>>>>>>>>>>>>>>>> 1 little-known path to big profits
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>> http://thirdpartyoffers.netzero.net/TGL3231/54ee63b9e704b63b94061 <
>>>> http://thirdpartyoffers.netzero.net/TGL3231/54ee63b9e704b63b94061>
>>>>>>>>>>>>>>>>>> st0
>>>>>>>>>>>>>>>>>> 3v
>>>>>>>>>>>>>>>>>> uc
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>> ____________________________________________________________
>>>>>>>>>>>>>>>>> Extended Stay America
>>>>>>>>>>>>>>>>> Get Fantastic Amenities, low rates! Kitchen, Ample
>> Workspace,
>>>>>>>>>>>>>>>>> Free
>>>>>>>>>>>>>>>>> WIFI
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>> http://thirdpartyoffers.netzero.net/TGL3255/54ee66f26da6f66f10ad4m <
>>>> http://thirdpartyoffers.netzero.net/TGL3255/54ee66f26da6f66f10ad4m>
>>>>>>>>>>>>>>>>> p02
>>>>>>>>>>>>>>>>> du
>>>>>>>>>>>>>>>>> c
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> ____________________________________________________________
>>>>>>>>>>>>>>> Extended Stay America
>>>>>>>>>>>>>>> Official Site. Free WIFI, Kitchens. Our best rates here,
>>>>>>>>>>>>>>> guaranteed.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>> http://thirdpartyoffers.netzero.net/TGL3255/54ee80744cfa7747461mp13d <
>>>> http://thirdpartyoffers.netzero.net/TGL3255/54ee80744cfa7747461mp13d>
>>>>>>>>>>>>>>> uc
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> <
>>>> http://thirdpartyoffers.netzero.net/TGL3255/54ee80744cfa7747461mp13
>>>>>>>>>>>>>>> duc
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ____________________________________________________________
>>>>>>>>>>>>> The WORST exercise for aging
>>>>>>>>>>>>> Avoid this &#34;healthy&#34; exercise to look & feel 5-10 years
>>>>>>>>>>>>> YOUNGER
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>> http://thirdpartyoffers.netzero.net/TGL3255/54fa40e98a0e640e81196mp07d
>> <
>>>> http://thirdpartyoffers.netzero.net/TGL3255/54fa40e98a0e640e81196mp07d>
>>>>>>>>>>>>> uc
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> ____________________________________________________________
>>>>>>>>>>> Seabourn Luxury Cruises
>>>>>>>>>>> Receive special offers from the World&#39;s Finest Small-Ship
>>>> Cruise
>>>>>>>>>>> Line!
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>> 
>> http://thirdpartyoffers.netzero.net/TGL3255/54fbf3b0f058073b02901mp14duc <
>>>> 
>> http://thirdpartyoffers.netzero.net/TGL3255/54fbf3b0f058073b02901mp14duc>
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Zakee
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> -Regards,
>>> Mayuresh R. Gharat
>>> (862) 250-7125
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125


Re: Broker Exceptions

Posted by Mayuresh Gharat <gh...@gmail.com>.
Hi Zakee,

Thanks for the logs. Can you paste earlier logs from broker-3 up to:

[2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
offset 1754769769 for partition [Topic22kv,5] out of range; reset
offset to 1400864851 (kafka.server.ReplicaFetcherThread)

That would help us figure out what was happening on this broker before it
issued a replicaFetch request to broker-4.

Thanks,

Mayuresh

On Mon, Mar 16, 2015 at 11:32 PM, Zakee <kz...@netzero.net> wrote:

> Hi Mayuresh,
>
> Here are the logs.
>
>
>
> Thanks,
> Kazim Zakee
>
>
>
> > On Mar 16, 2015, at 10:48 AM, Mayuresh Gharat <
> gharatmayuresh15@gmail.com> wrote:
> >
> > Can you provide more logs (complete) on Broker 3 till time :
> >
> > *[2015-03-14 07:46:52,517*] WARN [ReplicaFetcherThread-2-4], Replica 3
> for
> > partition [Topic22kv,5] reset its fetch offset from 1400864851 to current
> > leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
> >
> > I would like to see logs from time much before it sent the fetch request
> to
> > Broker 4 to the time above. I want to check if in any case Broker 3 was a
> > leader before broker 4 took over.
> >
> > Additional logs will help.
> >
> >
> > Thanks,
> >
> > Mayuresh
> >
> >
> >
> > On Sat, Mar 14, 2015 at 8:35 PM, Zakee <kz...@netzero.net> wrote:
> >
> >> log.cleanup.policy is delete not compact.
> >> log.cleaner.enable=true
> >> log.cleaner.threads=5
> >> log.cleanup.policy=delete
> >> log.flush.scheduler.interval.ms=3000
> >> log.retention.minutes=1440
> >> log.segment.bytes=1073741824  (1gb)
> >>
> >> Messages are keyed but not compressed, producer async and uses kafka
> >> default partitioner.
> >> String message = msg.getString();
> >> String uniqKey = ""+rnd.nextInt();// random key
> >> String partKey = getPartitionKey();// partition key
> >> KeyedMessage<String, String> data = new KeyedMessage<String,
> >> String>(this.topicName, uniqKey, partKey, message);
> >> producer.send(data);
> >>
> >> Thanks
> >> Zakee
> >>
> >>
> >>
> >>> On Mar 14, 2015, at 4:23 PM, gharatmayuresh15@gmail.com wrote:
> >>>
> >>> Is your topic log compacted? Also if it is are the messages keyed? Or
> >> are the messages compressed?
> >>>
> >>> Thanks,
> >>>
> >>> Mayuresh
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On Mar 14, 2015, at 2:02 PM, Zakee <kzakee1@netzero.net <mailto:
> >> kzakee1@netzero.net>> wrote:
> >>>>
> >>>> Thanks, Jiangjie for helping resolve the kafka controller migration
> >> driven partition leader rebalance issue. The logs are much cleaner now.
> >>>>
> >>>> There are a few incidences of Out of range offset even though  there
> is
> >> no consumers running, only producers and replica fetchers. I was trying
> to
> >> relate to a cause, looks like compaction (log segment deletion) causing
> >> this. Not sure whether this is expected behavior.
> >>>>
> >>>> Broker-4:
> >>>> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error
> >> when processing fetch request for partition [Topic22kv,5] offset
> 1754769769
> >> from follower with correlation id 1645671. Possible cause: Request for
> >> offset 1754769769 but we only have log segments in the range 1400864851
> to
> >> 1754769732. (kafka.server.ReplicaManager)
> >>>>
> >>>> Broker-3:
> >>>> [2015-03-14 07:46:52,356] INFO The cleaning for partition
> [Topic22kv,5]
> >> is aborted and paused (kafka.log.LogCleaner)
> >>>> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for
> >> log Topic22kv-5 for deletion. (kafka.log.Log)
> >>>> …
> >>>> [2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5]
> >> is resumed (kafka.log.LogCleaner)
> >>>> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
> >> offset 1754769769 for partition [Topic22kv,5] out of range; reset
> offset to
> >> 1400864851 (kafka.server.ReplicaFetcherThread)
> >>>> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3
> >> for partition [Topic22kv,5] reset its fetch offset from 1400864851 to
> >> current leader 4's start offset 1400864851
> >> (kafka.server.ReplicaFetcherThread)
> >>>>
> >>>> <topic22kv_746a_314_logs.txt>
> >>>>
> >>>>
> >>>> Thanks
> >>>> Zakee
> >>>>
> >>>>> On Mar 9, 2015, at 12:18 PM, Zakee <kz...@netzero.net> wrote:
> >>>>>
> >>>>> No broker restarts.
> >>>>>
> >>>>> Created a kafka issue:
> >> https://issues.apache.org/jira/browse/KAFKA-2011 <
> >> https://issues.apache.org/jira/browse/KAFKA-2011>
> >>>>>
> >>>>>>> Logs for rebalance:
> >>>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred
> >> replica election for partitions: (kafka.controller.KafkaController)
> >>>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that
> >> completed preferred replica election: (kafka.controller.KafkaController)
> >>>>>>> …
> >>>>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred
> >> replica election for partitions: (kafka.controller.KafkaController)
> >>>>>>> ...
> >>>>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred
> >> replica election for partitions: (kafka.controller.KafkaController)
> >>>>>>> ...
> >>>>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred
> >> replica leader election for partitions
> (kafka.controller.KafkaController)
> >>>>>>> ...
> >>>>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions
> undergoing
> >> preferred replica election:  (kafka.controller.KafkaController)
> >>>>>>>
> >>>>>>> Also, I still see lots of below errors (~69k) going on in the logs
> >> since the restart. Is there any other reason than rebalance for these
> >> errors?
> >>>>>>>
> >>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
> >> for partition [Topic-11,7] to broker 5:class
> >> kafka.common.NotLeaderForPartitionException
> >> (kafka.server.ReplicaFetcherThread)
> >>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
> >> for partition [Topic-2,25] to broker 5:class
> >> kafka.common.NotLeaderForPartitionException
> >> (kafka.server.ReplicaFetcherThread)
> >>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
> >> for partition [Topic-2,21] to broker 5:class
> >> kafka.common.NotLeaderForPartitionException
> >> (kafka.server.ReplicaFetcherThread)
> >>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
> >> for partition [Topic-22,9] to broker 5:class
> >> kafka.common.NotLeaderForPartitionException
> >> (kafka.server.ReplicaFetcherThread)
> >>>>>
> >>>>>
> >>>>>> Could you paste the related logs in controller.log?
> >>>>> What specifically should I search for in the logs?
> >>>>>
> >>>>> Thanks,
> >>>>> Zakee
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin <jqin@linkedin.com.INVALID
> >> <ma...@linkedin.com.INVALID>> wrote:
> >>>>>>
> >>>>>> Is there anything wrong with brokers around that time? E.g. Broker
> >> restart?
> >>>>>> The log you pasted are actually from replica fetchers. Could you
> >> paste the
> >>>>>> related logs in controller.log?
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> Jiangjie (Becket) Qin
> >>>>>>
> >>>>>>> On 3/9/15, 10:32 AM, "Zakee" <kzakee1@netzero.net <mailto:
> >> kzakee1@netzero.net>> wrote:
> >>>>>>>
> >>>>>>> Correction: Actually  the rebalance happened quite until 24 hours
> >> after
> >>>>>>> the start, and thats where below errors were found. Ideally
> rebalance
> >>>>>>> should not have happened at all.
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Zakee
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzakee1@netzero.net <mailto:
> >> kzakee1@netzero.net>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
> >> rebalance
> >>>>>>>>> here?
> >>>>>>>> Thanks for you suggestions.
> >>>>>>>> It looks like the rebalance actually happened only once soon
> after I
> >>>>>>>> started with clean cluster and data was pushed, it didn’t happen
> >> again
> >>>>>>>> so far, and I see the partitions leader counts on brokers did not
> >> change
> >>>>>>>> since then. One of the brokers was constantly showing 0 for
> >> partition
> >>>>>>>> leader count. Is that normal?
> >>>>>>>>
> >>>>>>>> Also, I still see lots of below errors (~69k) going on in the logs
> >>>>>>>> since the restart. Is there any other reason than rebalance for
> >> these
> >>>>>>>> errors?
> >>>>>>>>
> >>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
> >> for
> >>>>>>>> partition [Topic-11,7] to broker 5:class
> >>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
> >> for
> >>>>>>>> partition [Topic-2,25] to broker 5:class
> >>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
> >> for
> >>>>>>>> partition [Topic-2,21] to broker 5:class
> >>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
> >> for
> >>>>>>>> partition [Topic-22,9] to broker 5:class
> >>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>
> >>>>>>>>> Some other things to check are:
> >>>>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
> >>>>>>>>> auto.leader.rebalance. You’ve probably known this, just to double
> >>>>>>>>> confirm.
> >>>>>>>> Yes
> >>>>>>>>
> >>>>>>>>> 2. In zookeeper path, can you verify
> >> /admin/preferred_replica_election
> >>>>>>>>> does not exist?
> >>>>>>>> ls /admin
> >>>>>>>> [delete_topics]
> >>>>>>>> ls /admin/preferred_replica_election
> >>>>>>>> Node does not exist: /admin/preferred_replica_election
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> Zakee
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin
> >> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
> >> rebalance
> >>>>>>>>> here?
> >>>>>>>>> Some other things to check are:
> >>>>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
> >>>>>>>>> auto.leader.rebalance. You’ve probably known this, just to double
> >>>>>>>>> confirm.
> >>>>>>>>> 2. In zookeeper path, can you verify
> >> /admin/preferred_replica_election
> >>>>>>>>> does not exist?
> >>>>>>>>>
> >>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>
> >>>>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzakee1@netzero.net <mailto:
> >> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I started with  clean cluster and started to push data. It still
> >> does
> >>>>>>>>>> the
> >>>>>>>>>> rebalance at random durations even though the
> >> auto.leader.relabalance
> >>>>>>>>>> is
> >>>>>>>>>> set to false.
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>> Zakee
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin
> >> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Yes, the rebalance should not happen in that case. That is a
> >> little
> >>>>>>>>>>> bit
> >>>>>>>>>>> strange. Could you try to launch a clean Kafka cluster with
> >>>>>>>>>>> auto.leader.election disabled and try push data?
> >>>>>>>>>>> When leader migration occurs, NotLeaderForPartition exception
> is
> >>>>>>>>>>> expected.
> >>>>>>>>>>>
> >>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzakee1@netzero.net <mailto:
> >> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes, Jiangjie, I do see lots of these errors "Starting
> preferred
> >>>>>>>>>>>> replica
> >>>>>>>>>>>> leader election for partitions” in logs. I also see lot of
> >> Produce
> >>>>>>>>>>>> request failure warnings in with the NotLeader Exception.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I tried switching off the auto.leader.relabalance to false. I
> am
> >>>>>>>>>>>> still
> >>>>>>>>>>>> noticing the rebalance happening. My understanding was the
> >> rebalance
> >>>>>>>>>>>> will
> >>>>>>>>>>>> not happen when this is set to false.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks
> >>>>>>>>>>>> Zakee
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
> >>>>>>>>>>>>> <jqin@linkedin.com.INVALID <mailto:jqin@linkedin.com.INVALID
> >>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I don’t think num.replica.fetchers will help in this case.
> >>>>>>>>>>>>> Increasing
> >>>>>>>>>>>>> number of fetcher threads will only help in cases where you
> >> have a
> >>>>>>>>>>>>> large
> >>>>>>>>>>>>> amount of data coming into a broker and more replica fetcher
> >>>>>>>>>>>>> threads
> >>>>>>>>>>>>> will
> >>>>>>>>>>>>> help keep up. We usually only use 1-2 for each broker. But in
> >> your
> >>>>>>>>>>>>> case,
> >>>>>>>>>>>>> it looks that leader migration cause issue.
> >>>>>>>>>>>>> Do you see anything else in the log? Like preferred leader
> >>>>>>>>>>>>> election?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net <mailto:
> >> kzakee1@netzero.net>
> >>>>>>>>>>>>> <mailto:kzakee1@netzero.net <ma...@netzero.net>>>
> >> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks, Jiangjie.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, I do see under partitions usually shooting every hour.
> >>>>>>>>>>>>>> Anythings
> >>>>>>>>>>>>>> that
> >>>>>>>>>>>>>> I could try to reduce it?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> How does "num.replica.fetchers" affect the replica sync?
> >> Currently
> >>>>>>>>>>>>>> have
> >>>>>>>>>>>>>> configured 7 each of 5 brokers.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Zakee
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
> >>>>>>>>>>>>>> <jqin@linkedin.com.invalid <mailto:
> jqin@linkedin.com.invalid
> >>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> These messages are usually caused by leader migration. I
> >> think as
> >>>>>>>>>>>>>>> long
> >>>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>> you don't see this lasting for ever and got a bunch of
> under
> >>>>>>>>>>>>>>> replicated
> >>>>>>>>>>>>>>> partitions, it should be fine.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzakee1@netzero.net
> <mailto:
> >> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Need to know if I should I be worried about this or ignore
> >> them.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker
> logs,
> >> not
> >>>>>>>>>>>>>>>> sure
> >>>>>>>>>>>>>>> what
> >>>>>>>>>>>>>>>> causes them and what could be done to fix them.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition
> >>>>>>>>>>>>>>>> [TestTopic]
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> broker
> >>>>>>>>>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
> >>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR
> [ReplicaFetcherThread-3-5],
> >>>>>>>>>>>>>>>> Error
> >>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>> partition [TestTopic] to broker 5:class
> >>>>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker
> >> 2]:
> >>>>>>>>>>>>>>>> Fetch
> >>>>>>>>>>>>>>>> request
> >>>>>>>>>>>>>>>> with correlation id 950084 from client
> >> ReplicaFetcherThread-1-2
> >>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>> partition [TestTopic,2] failed due to Leader not local for
> >>>>>>>>>>>>>>>> partition
> >>>>>>>>>>>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Any ideas?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Zakee
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> Thanks
> >>>>> Zakee
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>
> >
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
>
> Thanks
> Zakee
>
>
>
>
>


-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: Broker Exceptions

Posted by Zakee <kz...@netzero.net>.
Hi Mayuresh,

Here are the logs.


Re: Broker Exceptions

Posted by Kazim Zakee <ka...@apple.com>.
Hi Mayuresh,

Here are the logs.


Re: Broker Exceptions

Posted by Mayuresh Gharat <gh...@gmail.com>.
Can you provide more logs (complete) on Broker 3 up to this time:

*[2015-03-14 07:46:52,517*] WARN [ReplicaFetcherThread-2-4], Replica 3 for
partition [Topic22kv,5] reset its fetch offset from 1400864851 to current
leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)

I would like to see logs from well before it sent the fetch request to
Broker 4, up to the time above. I want to check whether Broker 3 was at any
point the leader before broker 4 took over.

Additional logs will help.


Thanks,

Mayuresh



On Sat, Mar 14, 2015 at 8:35 PM, Zakee <kz...@netzero.net> wrote:

> log.cleanup.policy is delete not compact.
> log.cleaner.enable=true
> log.cleaner.threads=5
> log.cleanup.policy=delete
> log.flush.scheduler.interval.ms=3000
> log.retention.minutes=1440
> log.segment.bytes=1073741824  (1gb)
>
> Messages are keyed but not compressed, producer async and uses kafka
> default partitioner.
> String message = msg.getString();
> String uniqKey = ""+rnd.nextInt();// random key
> String partKey = getPartitionKey();// partition key
> KeyedMessage<String, String> data = new KeyedMessage<String,
> String>(this.topicName, uniqKey, partKey, message);
> producer.send(data);
>
> Thanks
> Zakee
>
>
>
> > On Mar 14, 2015, at 4:23 PM, gharatmayuresh15@gmail.com wrote:
> >
> > Is your topic log compacted? Also if it is are the messages keyed? Or
> are the messages compressed?
> >
> > Thanks,
> >
> > Mayuresh
> >
> > Sent from my iPhone
> >
> >> On Mar 14, 2015, at 2:02 PM, Zakee <kzakee1@netzero.net <mailto:
> kzakee1@netzero.net>> wrote:
> >>
> >> Thanks, Jiangjie for helping resolve the kafka controller migration
> driven partition leader rebalance issue. The logs are much cleaner now.
> >>
> >> There are a few incidences of Out of range offset even though  there is
> no consumers running, only producers and replica fetchers. I was trying to
> relate to a cause, looks like compaction (log segment deletion) causing
> this. Not sure whether this is expected behavior.
> >>
> >> Broker-4:
> >> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error
> when processing fetch request for partition [Topic22kv,5] offset 1754769769
> from follower with correlation id 1645671. Possible cause: Request for
> offset 1754769769 but we only have log segments in the range 1400864851 to
> 1754769732. (kafka.server.ReplicaManager)
> >>
> >> Broker-3:
> >> [2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5]
> is aborted and paused (kafka.log.LogCleaner)
> >> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for
> log Topic22kv-5 for deletion. (kafka.log.Log)
> >> …
> >> [2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5]
> is resumed (kafka.log.LogCleaner)
> >> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
> offset 1754769769 for partition [Topic22kv,5] out of range; reset offset to
> 1400864851 (kafka.server.ReplicaFetcherThread)
> >> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3
> for partition [Topic22kv,5] reset its fetch offset from 1400864851 to
> current leader 4's start offset 1400864851
> (kafka.server.ReplicaFetcherThread)
> >>
> >> <topic22kv_746a_314_logs.txt>
> >>
> >>
> >> Thanks
> >> Zakee
> >>
> >>> On Mar 9, 2015, at 12:18 PM, Zakee <kz...@netzero.net> wrote:
> >>>
> >>> No broker restarts.
> >>>
> >>> Created a kafka issue:
> https://issues.apache.org/jira/browse/KAFKA-2011 <
> https://issues.apache.org/jira/browse/KAFKA-2011>
> >>>
> >>>>> Logs for rebalance:
> >>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred
> replica election for partitions: (kafka.controller.KafkaController)
> >>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that
> completed preferred replica election: (kafka.controller.KafkaController)
> >>>>> …
> >>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred
> replica election for partitions: (kafka.controller.KafkaController)
> >>>>> ...
> >>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred
> replica election for partitions: (kafka.controller.KafkaController)
> >>>>> ...
> >>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred
> replica leader election for partitions (kafka.controller.KafkaController)
> >>>>> ...
> >>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing
> preferred replica election:  (kafka.controller.KafkaController)
> >>>>>
> >>>>> Also, I still see lots of below errors (~69k) going on in the logs
> since the restart. Is there any other reason than rebalance for these
> errors?
> >>>>>
> >>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
> for partition [Topic-11,7] to broker 5:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
> >>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
> for partition [Topic-2,25] to broker 5:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
> >>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
> for partition [Topic-2,21] to broker 5:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
> >>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
> for partition [Topic-22,9] to broker 5:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
> >>>
> >>>
> >>>> Could you paste the related logs in controller.log?
> >>> What specifically should I search for in the logs?
> >>>
> >>> Thanks,
> >>> Zakee
> >>>
> >>>
> >>>
> >>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin <jqin@linkedin.com.INVALID
> <ma...@linkedin.com.INVALID>> wrote:
> >>>>
> >>>> Is there anything wrong with brokers around that time? E.g. Broker
> restart?
> >>>> The log you pasted are actually from replica fetchers. Could you
> paste the
> >>>> related logs in controller.log?
> >>>>
> >>>> Thanks.
> >>>>
> >>>> Jiangjie (Becket) Qin
> >>>>
> >>>>> On 3/9/15, 10:32 AM, "Zakee" <kzakee1@netzero.net <mailto:
> kzakee1@netzero.net>> wrote:
> >>>>>
> >>>>> Correction: Actually  the rebalance happened quite until 24 hours
> after
> >>>>> the start, and thats where below errors were found. Ideally rebalance
> >>>>> should not have happened at all.
> >>>>>
> >>>>>
> >>>>> Thanks
> >>>>> Zakee
> >>>>>
> >>>>>
> >>>>>
> >>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzakee1@netzero.net <mailto:
> kzakee1@netzero.net>> wrote:
> >>>>>>>
> >>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
> rebalance
> >>>>>>> here?
> >>>>>> Thanks for you suggestions.
> >>>>>> It looks like the rebalance actually happened only once soon after I
> >>>>>> started with clean cluster and data was pushed, it didn’t happen
> again
> >>>>>> so far, and I see the partitions leader counts on brokers did not
> change
> >>>>>> since then. One of the brokers was constantly showing 0 for
> partition
> >>>>>> leader count. Is that normal?
> >>>>>>
> >>>>>> Also, I still see lots of below errors (~69k) going on in the logs
> >>>>>> since the restart. Is there any other reason than rebalance for
> these
> >>>>>> errors?
> >>>>>>
> >>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
> for
> >>>>>> partition [Topic-11,7] to broker 5:class
> >>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
> for
> >>>>>> partition [Topic-2,25] to broker 5:class
> >>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
> for
> >>>>>> partition [Topic-2,21] to broker 5:class
> >>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
> for
> >>>>>> partition [Topic-22,9] to broker 5:class
> >>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>
> >>>>>>> Some other things to check are:
> >>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
> >>>>>>> auto.leader.rebalance. You’ve probably known this, just to double
> >>>>>>> confirm.
> >>>>>> Yes
> >>>>>>
> >>>>>>> 2. In zookeeper path, can you verify
> /admin/preferred_replica_election
> >>>>>>> does not exist?
> >>>>>> ls /admin
> >>>>>> [delete_topics]
> >>>>>> ls /admin/preferred_replica_election
> >>>>>> Node does not exist: /admin/preferred_replica_election
> >>>>>>
> >>>>>>
> >>>>>> Thanks
> >>>>>> Zakee
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin
> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader
> rebalance
> >>>>>>> here?
> >>>>>>> Some other things to check are:
> >>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
> >>>>>>> auto.leader.rebalance. You’ve probably known this, just to double
> >>>>>>> confirm.
> >>>>>>> 2. In zookeeper path, can you verify
> /admin/preferred_replica_election
> >>>>>>> does not exist?
> >>>>>>>
> >>>>>>> Jiangjie (Becket) Qin
> >>>>>>>
> >>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzakee1@netzero.net <mailto:
> kzakee1@netzero.net>> wrote:
> >>>>>>>>
> >>>>>>>> I started with  clean cluster and started to push data. It still
> does
> >>>>>>>> the
> >>>>>>>> rebalance at random durations even though the
> auto.leader.relabalance
> >>>>>>>> is
> >>>>>>>> set to false.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> Zakee
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin
> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Yes, the rebalance should not happen in that case. That is a
> little
> >>>>>>>>> bit
> >>>>>>>>> strange. Could you try to launch a clean Kafka cluster with
> >>>>>>>>> auto.leader.election disabled and try push data?
> >>>>>>>>> When leader migration occurs, NotLeaderForPartition exception is
> >>>>>>>>> expected.
> >>>>>>>>>
> >>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzakee1@netzero.net <mailto:
> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Yes, Jiangjie, I do see lots of these errors "Starting preferred
> >>>>>>>>>> replica
> >>>>>>>>>> leader election for partitions” in logs. I also see lot of
> Produce
> >>>>>>>>>> request failure warnings in with the NotLeader Exception.
> >>>>>>>>>>
> >>>>>>>>>> I tried switching off the auto.leader.relabalance to false. I am
> >>>>>>>>>> still
> >>>>>>>>>> noticing the rebalance happening. My understanding was the
> rebalance
> >>>>>>>>>> will
> >>>>>>>>>> not happen when this is set to false.
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>> Zakee
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
> >>>>>>>>>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I don’t think num.replica.fetchers will help in this case.
> >>>>>>>>>>> Increasing
> >>>>>>>>>>> number of fetcher threads will only help in cases where you
> have a
> >>>>>>>>>>> large
> >>>>>>>>>>> amount of data coming into a broker and more replica fetcher
> >>>>>>>>>>> threads
> >>>>>>>>>>> will
> >>>>>>>>>>> help keep up. We usually only use 1-2 for each broker. But in
> your
> >>>>>>>>>>> case,
> >>>>>>>>>>> it looks that leader migration cause issue.
> >>>>>>>>>>> Do you see anything else in the log? Like preferred leader
> >>>>>>>>>>> election?
> >>>>>>>>>>>
> >>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>
> >>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net <mailto:
> kzakee1@netzero.net>
> >>>>>>>>>>> <mailto:kzakee1@netzero.net <ma...@netzero.net>>>
> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks, Jiangjie.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes, I do see under partitions usually shooting every hour.
> >>>>>>>>>>>> Anythings
> >>>>>>>>>>>> that
> >>>>>>>>>>>> I could try to reduce it?
> >>>>>>>>>>>>
> >>>>>>>>>>>> How does "num.replica.fetchers" affect the replica sync?
> Currently
> >>>>>>>>>>>> have
> >>>>>>>>>>>> configured 7 each of 5 brokers.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Zakee
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
> >>>>>>>>>>>> <jqin@linkedin.com.invalid <mailto:jqin@linkedin.com.invalid
> >>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> These messages are usually caused by leader migration. I
> think as
> >>>>>>>>>>>>> long
> >>>>>>>>>>>>> as
> >>>>>>>>>>>>> you don't see this lasting for ever and got a bunch of under
> >>>>>>>>>>>>> replicated
> >>>>>>>>>>>>> partitions, it should be fine.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzakee1@netzero.net <mailto:
> kzakee1@netzero.net>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Need to know if I should I be worried about this or ignore
> them.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker logs,
> not
> >>>>>>>>>>>>>> sure
> >>>>>>>>>>>>> what
> >>>>>>>>>>>>>> causes them and what could be done to fix them.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition
> >>>>>>>>>>>>>> [TestTopic]
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>> broker
> >>>>>>>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
> >>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5],
> >>>>>>>>>>>>>> Error
> >>>>>>>>>>>>>> for
> >>>>>>>>>>>>>> partition [TestTopic] to broker 5:class
> >>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
> >>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker
> 2]:
> >>>>>>>>>>>>>> Fetch
> >>>>>>>>>>>>>> request
> >>>>>>>>>>>>>> with correlation id 950084 from client
> ReplicaFetcherThread-1-2
> >>>>>>>>>>>>>> on
> >>>>>>>>>>>>>> partition [TestTopic,2] failed due to Leader not local for
> >>>>>>>>>>>>>> partition
> >>>>>>>>>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Any ideas?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Zakee
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>> Thanks
> >>> Zakee
> >>>
> >>>
> >>>
> >>
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: Broker Exceptions

Posted by Zakee <kz...@netzero.net>.
log.cleanup.policy is delete not compact. 
log.cleaner.enable=true
log.cleaner.threads=5
log.cleanup.policy=delete
log.flush.scheduler.interval.ms=3000
log.retention.minutes=1440
log.segment.bytes=1073741824  (1gb)

Messages are keyed but not compressed; the producer is async and uses the Kafka default partitioner.
String message = msg.getString();
String uniqKey = ""+rnd.nextInt();// random key
String partKey = getPartitionKey();// partition key
KeyedMessage<String, String> data = new KeyedMessage<String, String>(this.topicName, uniqKey, partKey, message);
producer.send(data);
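
For context, a minimal self-contained sketch of that producer path, assuming the old Kafka 0.8.x producer API (kafka.javaapi.producer.Producer); the broker list, topic name, class name, and placeholder payload/partition-key values below are illustrative, not taken from the cluster above:

import java.util.Properties;
import java.util.Random;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class KeyedAsyncProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker list; replace with the real metadata.broker.list.
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async"); // async producer, as described above

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
        Random rnd = new Random();

        String message = "example payload";        // stands in for msg.getString()
        String uniqKey = "" + rnd.nextInt();        // random key stored with the message
        String partKey = "example-partition-key";  // stands in for getPartitionKey()

        // With the 4-argument KeyedMessage, the default partitioner hashes partKey
        // to choose the partition, while uniqKey is the key shipped with the message.
        KeyedMessage<String, String> data =
            new KeyedMessage<String, String>("Topic22kv", uniqKey, partKey, message);
        producer.send(data);
        producer.close();
    }
}

If this matches the actual setup, routing stays stable per getPartitionKey() while each message still carries a unique key, which is consistent with the keyed-but-not-compacted description above.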

Thanks
Zakee



> On Mar 14, 2015, at 4:23 PM, gharatmayuresh15@gmail.com wrote:
> 
> Is your topic log compacted? Also if it is are the messages keyed? Or are the messages compressed?
> 
> Thanks,
> 
> Mayuresh
> 
> Sent from my iPhone
> 
>> On Mar 14, 2015, at 2:02 PM, Zakee <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>> 
>> Thanks, Jiangjie for helping resolve the kafka controller migration driven partition leader rebalance issue. The logs are much cleaner now. 
>> 
>> There are a few incidences of Out of range offset even though  there is no consumers running, only producers and replica fetchers. I was trying to relate to a cause, looks like compaction (log segment deletion) causing this. Not sure whether this is expected behavior.
>> 
>> Broker-4:
>> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error when processing fetch request for partition [Topic22kv,5] offset 1754769769 from follower with correlation id 1645671. Possible cause: Request for offset 1754769769 but we only have log segments in the range 1400864851 to 1754769732. (kafka.server.ReplicaManager)
>> 
>> Broker-3:
>> [2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5] is aborted and paused (kafka.log.LogCleaner)
>> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for log Topic22kv-5 for deletion. (kafka.log.Log)
>> …
>> [2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5] is resumed (kafka.log.LogCleaner)
>> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current offset 1754769769 for partition [Topic22kv,5] out of range; reset offset to 1400864851 (kafka.server.ReplicaFetcherThread)
>> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to current leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
>> 
>> <topic22kv_746a_314_logs.txt>
>> 
>> 
>> Thanks
>> Zakee
>> 
>>> On Mar 9, 2015, at 12:18 PM, Zakee <kz...@netzero.net> wrote:
>>> 
>>> No broker restarts.
>>> 
>>> Created a kafka issue: https://issues.apache.org/jira/browse/KAFKA-2011 <https://issues.apache.org/jira/browse/KAFKA-2011>
>>> 
>>>>> Logs for rebalance:
>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election: (kafka.controller.KafkaController)
>>>>> …
>>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
>>>>> ...
>>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
>>>>> ...
>>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions (kafka.controller.KafkaController)
>>>>> ...
>>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election:  (kafka.controller.KafkaController)
>>>>> 
>>>>> Also, I still see lots of below errors (~69k) going on in the logs since the restart. Is there any other reason than rebalance for these errors?
>>>>> 
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>> 
>>> 
>>>> Could you paste the related logs in controller.log?
>>> What specifically should I search for in the logs?
>>> 
>>> Thanks,
>>> Zakee
>>> 
>>> 
>>> 
>>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>> wrote:
>>>> 
>>>> Is there anything wrong with brokers around that time? E.g. Broker restart?
>>>> The log you pasted are actually from replica fetchers. Could you paste the
>>>> related logs in controller.log?
>>>> 
>>>> Thanks.
>>>> 
>>>> Jiangjie (Becket) Qin
>>>> 
>>>>> On 3/9/15, 10:32 AM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>> 
>>>>> Correction: Actually  the rebalance happened quite until 24 hours after
>>>>> the start, and thats where below errors were found. Ideally rebalance
>>>>> should not have happened at all.
>>>>> 
>>>>> 
>>>>> Thanks
>>>>> Zakee
>>>>> 
>>>>> 
>>>>> 
>>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>>>> 
>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>>>>>> here?
>>>>>> Thanks for you suggestions.
>>>>>> It looks like the rebalance actually happened only once soon after I
>>>>>> started with clean cluster and data was pushed, it didn’t happen again
>>>>>> so far, and I see the partitions leader counts on brokers did not change
>>>>>> since then. One of the brokers was constantly showing 0 for partition
>>>>>> leader count. Is that normal?
>>>>>> 
>>>>>> Also, I still see lots of below errors (~69k) going on in the logs
>>>>>> since the restart. Is there any other reason than rebalance for these
>>>>>> errors?
>>>>>> 
>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
>>>>>> partition [Topic-11,7] to broker 5:class
>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
>>>>>> partition [Topic-2,25] to broker 5:class
>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
>>>>>> partition [Topic-2,21] to broker 5:class
>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
>>>>>> partition [Topic-22,9] to broker 5:class
>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>> 
>>>>>>> Some other things to check are:
>>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>>>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>>>>> confirm.
>>>>>> Yes 
>>>>>> 
>>>>>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>>>>>> does not exist?
>>>>>> ls /admin
>>>>>> [delete_topics]
>>>>>> ls /admin/preferred_replica_election
>>>>>> Node does not exist: /admin/preferred_replica_election
>>>>>> 
>>>>>> 
>>>>>> Thanks
>>>>>> Zakee
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>>>>>> here?
>>>>>>> Some other things to check are:
>>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>>>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>>>>> confirm.
>>>>>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>>>>>> does not exist?
>>>>>>> 
>>>>>>> Jiangjie (Becket) Qin
>>>>>>> 
>>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>>>>> 
>>>>>>>> I started with  clean cluster and started to push data. It still does
>>>>>>>> the
>>>>>>>> rebalance at random durations even though the auto.leader.relabalance
>>>>>>>> is
>>>>>>>> set to false.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> Zakee
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Yes, the rebalance should not happen in that case. That is a little
>>>>>>>>> bit
>>>>>>>>> strange. Could you try to launch a clean Kafka cluster with
>>>>>>>>> auto.leader.election disabled and try push data?
>>>>>>>>> When leader migration occurs, NotLeaderForPartition exception is
>>>>>>>>> expected.
>>>>>>>>> 
>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Yes, Jiangjie, I do see lots of these errors "Starting preferred
>>>>>>>>>> replica
>>>>>>>>>> leader election for partitions” in logs. I also see lot of Produce
>>>>>>>>>> request failure warnings in with the NotLeader Exception.
>>>>>>>>>> 
>>>>>>>>>> I tried switching off the auto.leader.relabalance to false. I am
>>>>>>>>>> still
>>>>>>>>>> noticing the rebalance happening. My understanding was the rebalance
>>>>>>>>>> will
>>>>>>>>>> not happen when this is set to false.
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> Zakee
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
>>>>>>>>>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I don’t think num.replica.fetchers will help in this case.
>>>>>>>>>>> Increasing
>>>>>>>>>>> number of fetcher threads will only help in cases where you have a
>>>>>>>>>>> large
>>>>>>>>>>> amount of data coming into a broker and more replica fetcher
>>>>>>>>>>> threads
>>>>>>>>>>> will
>>>>>>>>>>> help keep up. We usually only use 1-2 for each broker. But in your
>>>>>>>>>>> case,
>>>>>>>>>>> it looks that leader migration cause issue.
>>>>>>>>>>> Do you see anything else in the log? Like preferred leader
>>>>>>>>>>> election?
>>>>>>>>>>> 
>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>> 
>>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>
>>>>>>>>>>> <mailto:kzakee1@netzero.net <ma...@netzero.net>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks, Jiangjie.
>>>>>>>>>>>> 
>>>>>>>>>>>> Yes, I do see under partitions usually shooting every hour.
>>>>>>>>>>>> Anythings
>>>>>>>>>>>> that
>>>>>>>>>>>> I could try to reduce it?
>>>>>>>>>>>> 
>>>>>>>>>>>> How does "num.replica.fetchers" affect the replica sync? Currently
>>>>>>>>>>>> have
>>>>>>>>>>>> configured 7 each of 5 brokers.
>>>>>>>>>>>> 
>>>>>>>>>>>> -Zakee
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>>>>>>>>>> <jqin@linkedin.com.invalid <ma...@linkedin.com.invalid>>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> These messages are usually caused by leader migration. I think as
>>>>>>>>>>>>> long
>>>>>>>>>>>>> as
>>>>>>>>>>>>> you don't see this lasting for ever and got a bunch of under
>>>>>>>>>>>>> replicated
>>>>>>>>>>>>> partitions, it should be fine.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Need to know if I should I be worried about this or ignore them.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker logs, not
>>>>>>>>>>>>>> sure
>>>>>>>>>>>>> what
>>>>>>>>>>>>>> causes them and what could be done to fix them.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition
>>>>>>>>>>>>>> [TestTopic]
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> broker
>>>>>>>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5],
>>>>>>>>>>>>>> Error
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>> partition [TestTopic] to broker 5:class
>>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]:
>>>>>>>>>>>>>> Fetch
>>>>>>>>>>>>>> request
>>>>>>>>>>>>>> with correlation id 950084 from client ReplicaFetcherThread-1-2
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>> partition [TestTopic,2] failed due to Leader not local for
>>>>>>>>>>>>>> partition
>>>>>>>>>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Zakee
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> Thanks
>>> Zakee
>>> 
>>> 
>>> 
>> 
> ____________________________________________________________
> What's your flood risk?
> Find flood maps, interactive tools, FAQs, and agents in your area.
> http://thirdpartyoffers.netzero.net/TGL3255/5504cccfca43a4ccf0a56mp08duc <http://thirdpartyoffers.netzero.net/TGL3255/5504cccfca43a4ccf0a56mp08duc>

Re: Broker Exceptions

Posted by gh...@gmail.com.
Is your topic log compacted? Also, if it is, are the messages keyed? Or are the messages compressed?
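
(For what it's worth, one quick way to check the first part, assuming a ZooKeeper-based cluster at localhost:2181 -- the address and topic name here are just placeholders:)

bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic Topic22kv

A log-compacted topic should show cleanup.policy=compact in the Configs column; compression is set on the producer side (compression.codec or compression.type, depending on the producer version).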

Thanks,

Mayuresh

Sent from my iPhone

> On Mar 14, 2015, at 2:02 PM, Zakee <kz...@netzero.net> wrote:
> 
> Thanks, Jiangjie for helping resolve the kafka controller migration driven partition leader rebalance issue. The logs are much cleaner now. 
> 
> There are a few incidences of Out of range offset even though  there is no consumers running, only producers and replica fetchers. I was trying to relate to a cause, looks like compaction (log segment deletion) causing this. Not sure whether this is expected behavior.
> 
> Broker-4:
> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error when processing fetch request for partition [Topic22kv,5] offset 1754769769 from follower with correlation id 1645671. Possible cause: Request for offset 1754769769 but we only have log segments in the range 1400864851 to 1754769732. (kafka.server.ReplicaManager)
> 
> Broker-3:
> [2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5] is aborted and paused (kafka.log.LogCleaner)
> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for log Topic22kv-5 for deletion. (kafka.log.Log)
> …
> [2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5] is resumed (kafka.log.LogCleaner)
> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current offset 1754769769 for partition [Topic22kv,5] out of range; reset offset to 1400864851 (kafka.server.ReplicaFetcherThread)
> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to current leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
> 
> <topic22kv_746a_314_logs.txt>
> 
> 
> Thanks
> Zakee
> 
>> On Mar 9, 2015, at 12:18 PM, Zakee <kz...@netzero.net> wrote:
>> 
>> No broker restarts.
>> 
>> Created a kafka issue: https://issues.apache.org/jira/browse/KAFKA-2011 <https://issues.apache.org/jira/browse/KAFKA-2011>
>> 
>>>> Logs for rebalance:
>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election:  (kafka.controller.KafkaController)
>>>> …
>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
>>>> ...
>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
>>>> ...
>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions  (kafka.controller.KafkaController)
>>>> ...
>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election:  (kafka.controller.KafkaController)
>>>> 
>>>> Also, I still see lots of below errors (~69k) going on in the logs since the restart. Is there any other reason than rebalance for these errors?
>>>> 
>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> 
>> 
>>> Could you paste the related logs in controller.log?
>> What specifically should I search for in the logs?
>> 
>> Thanks,
>> Zakee
>> 
>> 
>> 
>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>> wrote:
>>> 
>>> Is there anything wrong with brokers around that time? E.g. Broker restart?
>>> The log you pasted are actually from replica fetchers. Could you paste the
>>> related logs in controller.log?
>>> 
>>> Thanks.
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>>> On 3/9/15, 10:32 AM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>> 
>>>> Correction: Actually  the rebalance happened quite until 24 hours after
>>>> the start, and thats where below errors were found. Ideally rebalance
>>>> should not have happened at all.
>>>> 
>>>> 
>>>> Thanks
>>>> Zakee
>>>> 
>>>> 
>>>> 
>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>>> 
>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>>>>> here?
>>>>> Thanks for you suggestions.
>>>>> It looks like the rebalance actually happened only once soon after I
>>>>> started with clean cluster and data was pushed, it didn’t happen again
>>>>> so far, and I see the partitions leader counts on brokers did not change
>>>>> since then. One of the brokers was constantly showing 0 for partition
>>>>> leader count. Is that normal?
>>>>> 
>>>>> Also, I still see lots of below errors (~69k) going on in the logs
>>>>> since the restart. Is there any other reason than rebalance for these
>>>>> errors?
>>>>> 
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
>>>>> partition [Topic-11,7] to broker 5:class
>>>>> kafka.common.NotLeaderForPartitionException
>>>>> (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
>>>>> partition [Topic-2,25] to broker 5:class
>>>>> kafka.common.NotLeaderForPartitionException
>>>>> (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
>>>>> partition [Topic-2,21] to broker 5:class
>>>>> kafka.common.NotLeaderForPartitionException
>>>>> (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
>>>>> partition [Topic-22,9] to broker 5:class
>>>>> kafka.common.NotLeaderForPartitionException
>>>>> (kafka.server.ReplicaFetcherThread)
>>>>> 
>>>>>> Some other things to check are:
>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>>>> confirm.
>>>>> Yes 
>>>>> 
>>>>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>>>>> does not exist?
>>>>> ls /admin
>>>>> [delete_topics]
>>>>> ls /admin/preferred_replica_election
>>>>> Node does not exist: /admin/preferred_replica_election
>>>>> 
>>>>> 
>>>>> Thanks
>>>>> Zakee
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
>>>>>> wrote:
>>>>>> 
>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>>>>> here?
>>>>>> Some other things to check are:
>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>>>> confirm.
>>>>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>>>>> does not exist?
>>>>>> 
>>>>>> Jiangjie (Becket) Qin
>>>>>> 
>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>>>> 
>>>>>>> I started with  clean cluster and started to push data. It still does
>>>>>>> the
>>>>>>> rebalance at random durations even though the auto.leader.relabalance
>>>>>>> is
>>>>>>> set to false.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Zakee
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Yes, the rebalance should not happen in that case. That is a little
>>>>>>>> bit
>>>>>>>> strange. Could you try to launch a clean Kafka cluster with
>>>>>>>> auto.leader.election disabled and try push data?
>>>>>>>> When leader migration occurs, NotLeaderForPartition exception is
>>>>>>>> expected.
>>>>>>>> 
>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>>>>>> 
>>>>>>>>> Yes, Jiangjie, I do see lots of these errors "Starting preferred
>>>>>>>>> replica
>>>>>>>>> leader election for partitions” in logs. I also see lot of Produce
>>>>>>>>> request failure warnings in with the NotLeader Exception.
>>>>>>>>> 
>>>>>>>>> I tried switching off the auto.leader.relabalance to false. I am
>>>>>>>>> still
>>>>>>>>> noticing the rebalance happening. My understanding was the rebalance
>>>>>>>>> will
>>>>>>>>> not happen when this is set to false.
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> Zakee
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
>>>>>>>>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> I don’t think num.replica.fetchers will help in this case.
>>>>>>>>>> Increasing
>>>>>>>>>> number of fetcher threads will only help in cases where you have a
>>>>>>>>>> large
>>>>>>>>>> amount of data coming into a broker and more replica fetcher
>>>>>>>>>> threads
>>>>>>>>>> will
>>>>>>>>>> help keep up. We usually only use 1-2 for each broker. But in your
>>>>>>>>>> case,
>>>>>>>>>> it looks that leader migration cause issue.
>>>>>>>>>> Do you see anything else in the log? Like preferred leader
>>>>>>>>>> election?
>>>>>>>>>> 
>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>> 
>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>
>>>>>>>>>> <mailto:kzakee1@netzero.net <ma...@netzero.net>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Thanks, Jiangjie.
>>>>>>>>>>> 
>>>>>>>>>>> Yes, I do see under partitions usually shooting every hour.
>>>>>>>>>>> Anythings
>>>>>>>>>>> that
>>>>>>>>>>> I could try to reduce it?
>>>>>>>>>>> 
>>>>>>>>>>> How does "num.replica.fetchers" affect the replica sync? Currently
>>>>>>>>>>> have
>>>>>>>>>>> configured 7 each of 5 brokers.
>>>>>>>>>>> 
>>>>>>>>>>> -Zakee
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>>>>>>>>> <jqin@linkedin.com.invalid <ma...@linkedin.com.invalid>>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> These messages are usually caused by leader migration. I think as
>>>>>>>>>>>> long
>>>>>>>>>>>> as
>>>>>>>>>>>> you don't see this lasting for ever and got a bunch of under
>>>>>>>>>>>> replicated
>>>>>>>>>>>> partitions, it should be fine.
>>>>>>>>>>>> 
>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Need to know if I should I be worried about this or ignore them.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker logs, not
>>>>>>>>>>>>> sure
>>>>>>>>>>>> what
>>>>>>>>>>>>> causes them and what could be done to fix them.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition
>>>>>>>>>>>>> [TestTopic]
>>>>>>>>>>>>> to
>>>>>>>>>>>>> broker
>>>>>>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5],
>>>>>>>>>>>>> Error
>>>>>>>>>>>>> for
>>>>>>>>>>>>> partition [TestTopic] to broker 5:class
>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]:
>>>>>>>>>>>>> Fetch
>>>>>>>>>>>>> request
>>>>>>>>>>>>> with correlation id 950084 from client ReplicaFetcherThread-1-2
>>>>>>>>>>>>> on
>>>>>>>>>>>>> partition [TestTopic,2] failed due to Leader not local for
>>>>>>>>>>>>> partition
>>>>>>>>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Zakee
>> 
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
> 

Re: Broker Exceptions

Posted by Zakee <kz...@netzero.net>.
Thanks, Jiangjie, for helping resolve the Kafka controller-migration-driven partition leader rebalance issue. The logs are much cleaner now.

There are a few incidents of out-of-range offsets even though there are no consumers running, only producers and replica fetchers. Trying to relate them to a cause, it looks like compaction (log segment deletion) is causing this. Not sure whether this is expected behavior.

Broker-4:
[2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error when processing fetch request for partition [Topic22kv,5] offset 1754769769 from follower with correlation id 1645671. Possible cause: Request for offset 1754769769 but we only have log segments in the range 1400864851 to 1754769732. (kafka.server.ReplicaManager)

Broker-3:
[2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5] is aborted and paused (kafka.log.LogCleaner)
[2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for log Topic22kv-5 for deletion. (kafka.log.Log)
…
[2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5] is resumed (kafka.log.LogCleaner)
[2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current offset 1754769769 for partition [Topic22kv,5] out of range; reset offset to 1400864851 (kafka.server.ReplicaFetcherThread)
[2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to current leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
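
(For reference, the leader's current earliest/latest offsets for that partition can be double-checked like this -- the broker host/port below are illustrative:)

bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list broker4:9092 --topic Topic22kv --partitions 5 --time -2
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list broker4:9092 --topic Topic22kv --partitions 5 --time -1

That makes it easier to see whether the follower's fetch offset is ahead of the leader's log end or behind its log start when the reset happens.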


Re: Broker Exceptions

Posted by Zakee <kz...@netzero.net>.
No broker restarts.

Created a kafka issue: https://issues.apache.org/jira/browse/KAFKA-2011

>> Logs for rebalance:
>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election:  (kafka.controller.KafkaController)
>> …
>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
>> ...
>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
>> ...
>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions  (kafka.controller.KafkaController)
>> ...
>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election:  (kafka.controller.KafkaController)
>> 
>> Also, I still see lots of below errors (~69k) going on in the logs since the restart. Is there any other reason than rebalance for these errors?
>> 
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)


>  Could you paste the related logs in controller.log?
What specifically should I search for in the logs?
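
(In the meantime I am grepping around the election messages themselves; the log paths below are just my local layout:)

grep -iE "preferred replica|partition rebalance" /var/log/kafka/controller.log
grep -i "LeaderAndIsr" /var/log/kafka/state-change.log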

Thanks,
Zakee



> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>> wrote:
> 
> Is there anything wrong with brokers around that time? E.g. Broker restart?
> The log you pasted are actually from replica fetchers. Could you paste the
> related logs in controller.log?
> 
> Thanks.
> 
> Jiangjie (Becket) Qin
> 
> On 3/9/15, 10:32 AM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
> 
>> Correction: Actually  the rebalance happened quite until 24 hours after
>> the start, and thats where below errors were found. Ideally rebalance
>> should not have happened at all.
>> 
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>> 
>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>>> here?
>>> Thanks for you suggestions.
>>> It looks like the rebalance actually happened only once soon after I
>>> started with clean cluster and data was pushed, it didn’t happen again
>>> so far, and I see the partitions leader counts on brokers did not change
>>> since then. One of the brokers was constantly showing 0 for partition
>>> leader count. Is that normal?
>>> 
>>> Also, I still see lots of below errors (~69k) going on in the logs
>>> since the restart. Is there any other reason than rebalance for these
>>> errors?
>>> 
>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
>>> partition [Topic-11,7] to broker 5:class
>>> kafka.common.NotLeaderForPartitionException
>>> (kafka.server.ReplicaFetcherThread)
>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
>>> partition [Topic-2,25] to broker 5:class
>>> kafka.common.NotLeaderForPartitionException
>>> (kafka.server.ReplicaFetcherThread)
>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
>>> partition [Topic-2,21] to broker 5:class
>>> kafka.common.NotLeaderForPartitionException
>>> (kafka.server.ReplicaFetcherThread)
>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
>>> partition [Topic-22,9] to broker 5:class
>>> kafka.common.NotLeaderForPartitionException
>>> (kafka.server.ReplicaFetcherThread)
>>> 
>>>> Some other things to check are:
>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>> confirm.
>>> Yes 
>>> 
>>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>>> does not exist?
>>> ls /admin
>>> [delete_topics]
>>> ls /admin/preferred_replica_election
>>> Node does not exist: /admin/preferred_replica_election
>>> 
>>> 
>>> Thanks
>>> Zakee
>>> 
>>> 
>>> 
>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
>>>> wrote:
>>>> 
>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>>> here?
>>>> Some other things to check are:
>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>> confirm.
>>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>>> does not exist?
>>>> 
>>>> Jiangjie (Becket) Qin
>>>> 
>>>> On 3/7/15, 10:24 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>> 
>>>>> I started with  clean cluster and started to push data. It still does
>>>>> the
>>>>> rebalance at random durations even though the auto.leader.relabalance
>>>>> is
>>>>> set to false.
>>>>> 
>>>>> Thanks
>>>>> Zakee
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
>>>>>> wrote:
>>>>>> 
>>>>>> Yes, the rebalance should not happen in that case. That is a little
>>>>>> bit
>>>>>> strange. Could you try to launch a clean Kafka cluster with
>>>>>> auto.leader.election disabled and try push data?
>>>>>> When leader migration occurs, NotLeaderForPartition exception is
>>>>>> expected.
>>>>>> 
>>>>>> Jiangjie (Becket) Qin
>>>>>> 
>>>>>> 
>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>>> 
>>>>>>> Yes, Jiangjie, I do see lots of these errors "Starting preferred
>>>>>>> replica
>>>>>>> leader election for partitions” in logs. I also see lot of Produce
>>>>>>> request failure warnings in with the NotLeader Exception.
>>>>>>> 
>>>>>>> I tried switching off the auto.leader.relabalance to false. I am
>>>>>>> still
>>>>>>> noticing the rebalance happening. My understanding was the rebalance
>>>>>>> will
>>>>>>> not happen when this is set to false.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Zakee
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
>>>>>>>> <jqin@linkedin.com.INVALID <ma...@linkedin.com.INVALID>>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> I don’t think num.replica.fetchers will help in this case.
>>>>>>>> Increasing
>>>>>>>> number of fetcher threads will only help in cases where you have a
>>>>>>>> large
>>>>>>>> amount of data coming into a broker and more replica fetcher
>>>>>>>> threads
>>>>>>>> will
>>>>>>>> help keep up. We usually only use 1-2 for each broker. But in your
>>>>>>>> case,
>>>>>>>> it looks that leader migration cause issue.
>>>>>>>> Do you see anything else in the log? Like preferred leader
>>>>>>>> election?
>>>>>>>> 
>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>> 
>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>
>>>>>>>> <mailto:kzakee1@netzero.net <ma...@netzero.net>>> wrote:
>>>>>>>> 
>>>>>>>>> Thanks, Jiangjie.
>>>>>>>>> 
>>>>>>>>> Yes, I do see under partitions usually shooting every hour.
>>>>>>>>> Anythings
>>>>>>>>> that
>>>>>>>>> I could try to reduce it?
>>>>>>>>> 
>>>>>>>>> How does "num.replica.fetchers" affect the replica sync? Currently
>>>>>>>>> have
>>>>>>>>> configured 7 each of 5 brokers.
>>>>>>>>> 
>>>>>>>>> -Zakee
>>>>>>>>> 
>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>>>>>>> <jqin@linkedin.com.invalid <ma...@linkedin.com.invalid>>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> These messages are usually caused by leader migration. I think as
>>>>>>>>>> long
>>>>>>>>>> as
>>>>>>>>>> you don't see this lasting for ever and got a bunch of under
>>>>>>>>>> replicated
>>>>>>>>>> partitions, it should be fine.
>>>>>>>>>> 
>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>> 
>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzakee1@netzero.net <ma...@netzero.net>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Need to know if I should I be worried about this or ignore them.
>>>>>>>>>>> 
>>>>>>>>>>> I see tons of these exceptions/warnings in the broker logs, not
>>>>>>>>>>> sure
>>>>>>>>>> what
>>>>>>>>>>> causes them and what could be done to fix them.
>>>>>>>>>>> 
>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition
>>>>>>>>>>> [TestTopic]
>>>>>>>>>>> to
>>>>>>>>>>> broker
>>>>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5],
>>>>>>>>>>> Error
>>>>>>>>>>> for
>>>>>>>>>>> partition [TestTopic] to broker 5:class
>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]:
>>>>>>>>>>> Fetch
>>>>>>>>>>> request
>>>>>>>>>>> with correlation id 950084 from client ReplicaFetcherThread-1-2
>>>>>>>>>>> on
>>>>>>>>>>> partition [TestTopic,2] failed due to Leader not local for
>>>>>>>>>>> partition
>>>>>>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Any ideas?
>>>>>>>>>>> 
>>>>>>>>>>> -Zakee
>>> 
>> 
> 
> 


Thanks
Zakee




Re: Broker Exceptions

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
Is there anything wrong with brokers around that time? E.g. Broker restart?
The logs you pasted are actually from replica fetchers. Could you paste the
related logs in controller.log?

Thanks.

Jiangjie (Becket) Qin

On 3/9/15, 10:32 AM, "Zakee" <kz...@netzero.net> wrote:

>Correction: Actually  the rebalance happened quite until 24 hours after
>the start, and thats where below errors were found. Ideally rebalance
>should not have happened at all.
>
>
>Thanks
>Zakee
>
>
>
>> On Mar 9, 2015, at 10:28 AM, Zakee <kz...@netzero.net> wrote:
>> 
>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>> here?
>> Thanks for you suggestions.
>> It looks like the rebalance actually happened only once soon after I
>>started with clean cluster and data was pushed, it didn’t happen again
>>so far, and I see the partitions leader counts on brokers did not change
>>since then. One of the brokers was constantly showing 0 for partition
>>leader count. Is that normal?
>> 
>> Also, I still see lots of below errors (~69k) going on in the logs
>>since the restart. Is there any other reason than rebalance for these
>>errors?
>> 
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
>>partition [Topic-11,7] to broker 5:class
>>kafka.common.NotLeaderForPartitionException
>>(kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
>>partition [Topic-2,25] to broker 5:class
>>kafka.common.NotLeaderForPartitionException
>>(kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
>>partition [Topic-2,21] to broker 5:class
>>kafka.common.NotLeaderForPartitionException
>>(kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
>>partition [Topic-22,9] to broker 5:class
>>kafka.common.NotLeaderForPartitionException
>>(kafka.server.ReplicaFetcherThread)
>> 
>>> Some other things to check are:
>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>confirm.
>> Yes 
>> 
>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>> does not exist?
>> ls /admin
>> [delete_topics]
>> ls /admin/preferred_replica_election
>> Node does not exist: /admin/preferred_replica_election
>> 
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <jq...@linkedin.com.INVALID>
>>>wrote:
>>> 
>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>> here?
>>> Some other things to check are:
>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>confirm.
>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>> does not exist?
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>> On 3/7/15, 10:24 PM, "Zakee" <kz...@netzero.net> wrote:
>>> 
>>>> I started with  clean cluster and started to push data. It still does
>>>>the
>>>> rebalance at random durations even though the auto.leader.relabalance
>>>>is
>>>> set to false.
>>>> 
>>>> Thanks
>>>> Zakee
>>>> 
>>>> 
>>>> 
>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <jq...@linkedin.com.INVALID>
>>>>> wrote:
>>>>> 
>>>>> Yes, the rebalance should not happen in that case. That is a little
>>>>>bit
>>>>> strange. Could you try to launch a clean Kafka cluster with
>>>>> auto.leader.election disabled and try push data?
>>>>> When leader migration occurs, NotLeaderForPartition exception is
>>>>> expected.
>>>>> 
>>>>> Jiangjie (Becket) Qin
>>>>> 
>>>>> 
>>>>> On 3/6/15, 3:14 PM, "Zakee" <kz...@netzero.net> wrote:
>>>>> 
>>>>>> Yes, Jiangjie, I do see lots of these errors "Starting preferred
>>>>>> replica
>>>>>> leader election for partitions” in logs. I also see lot of Produce
>>>>>> request failure warnings in with the NotLeader Exception.
>>>>>> 
>>>>>> I tried switching off the auto.leader.relabalance to false. I am
>>>>>>still
>>>>>> noticing the rebalance happening. My understanding was the rebalance
>>>>>> will
>>>>>> not happen when this is set to false.
>>>>>> 
>>>>>> Thanks
>>>>>> Zakee
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
>>>>>>><jq...@linkedin.com.INVALID>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> I don’t think num.replica.fetchers will help in this case.
>>>>>>>Increasing
>>>>>>> number of fetcher threads will only help in cases where you have a
>>>>>>> large
>>>>>>> amount of data coming into a broker and more replica fetcher
>>>>>>>threads
>>>>>>> will
>>>>>>> help keep up. We usually only use 1-2 for each broker. But in your
>>>>>>> case,
>>>>>>> it looks that leader migration cause issue.
>>>>>>> Do you see anything else in the log? Like preferred leader
>>>>>>>election?
>>>>>>> 
>>>>>>> Jiangjie (Becket) Qin
>>>>>>> 
>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net
>>>>>>> <ma...@netzero.net>> wrote:
>>>>>>> 
>>>>>>>> Thanks, Jiangjie.
>>>>>>>> 
>>>>>>>> Yes, I do see under partitions usually shooting every hour.
>>>>>>>>Anythings
>>>>>>>> that
>>>>>>>> I could try to reduce it?
>>>>>>>> 
>>>>>>>> How does "num.replica.fetchers" affect the replica sync? Currently
>>>>>>>> have
>>>>>>>> configured 7 each of 5 brokers.
>>>>>>>> 
>>>>>>>> -Zakee
>>>>>>>> 
>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>>>>>> <jq...@linkedin.com.invalid>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> These messages are usually caused by leader migration. I think as
>>>>>>>>> long
>>>>>>>>> as
>>>>>>>>> you don't see this lasting for ever and got a bunch of under
>>>>>>>>> replicated
>>>>>>>>> partitions, it should be fine.
>>>>>>>>> 
>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>> 
>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kz...@netzero.net> wrote:
>>>>>>>>> 
>>>>>>>>>> Need to know if I should I be worried about this or ignore them.
>>>>>>>>>> 
>>>>>>>>>> I see tons of these exceptions/warnings in the broker logs, not
>>>>>>>>>> sure
>>>>>>>>> what
>>>>>>>>>> causes them and what could be done to fix them.
>>>>>>>>>> 
>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition
>>>>>>>>>>[TestTopic]
>>>>>>>>>> to
>>>>>>>>>> broker
>>>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5],
>>>>>>>>>>Error
>>>>>>>>>> for
>>>>>>>>>> partition [TestTopic] to broker 5:class
>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]:
>>>>>>>>>>Fetch
>>>>>>>>>> request
>>>>>>>>>> with correlation id 950084 from client ReplicaFetcherThread-1-2
>>>>>>>>>>on
>>>>>>>>>> partition [TestTopic,2] failed due to Leader not local for
>>>>>>>>>> partition
>>>>>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Any ideas?
>>>>>>>>>> 
>>>>>>>>>> -Zakee
>> 
>


Re: Broker Exceptions

Posted by Zakee <kz...@netzero.net>.
Correction: Actually, the rebalance happened quite a bit until 24 hours after the start, and that's where the below errors were found. Ideally the rebalance should not have happened at all.


Thanks
Zakee



> On Mar 9, 2015, at 10:28 AM, Zakee <kz...@netzero.net> wrote:
> 
>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>> here?
> Thanks for you suggestions. 
> It looks like the rebalance actually happened only once soon after I started with clean cluster and data was pushed, it didn’t happen again so far, and I see the partitions leader counts on brokers did not change since then. One of the brokers was constantly showing 0 for partition leader count. Is that normal?
> 
> Also, I still see lots of below errors (~69k) going on in the logs since the restart. Is there any other reason than rebalance for these errors?
> 
> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> 
>> Some other things to check are:
>> 1. The actual property name is auto.leader.rebalance.enable, not
>> auto.leader.rebalance. You’ve probably known this, just to double confirm.
> Yes 
> 
>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>> does not exist?
> ls /admin
> [delete_topics]
> ls /admin/preferred_replica_election
> Node does not exist: /admin/preferred_replica_election
> 
> 
> Thanks
> Zakee
> 
> 
> 
>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <jq...@linkedin.com.INVALID> wrote:
>> 
>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>> here?
>> Some other things to check are:
>> 1. The actual property name is auto.leader.rebalance.enable, not
>> auto.leader.rebalance. You’ve probably known this, just to double confirm.
>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>> does not exist?
>> 
>> Jiangjie (Becket) Qin
>> 
>> On 3/7/15, 10:24 PM, "Zakee" <kz...@netzero.net> wrote:
>> 
>>> I started with  clean cluster and started to push data. It still does the
>>> rebalance at random durations even though the auto.leader.relabalance is
>>> set to false.
>>> 
>>> Thanks
>>> Zakee
>>> 
>>> 
>>> 
>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <jq...@linkedin.com.INVALID>
>>>> wrote:
>>>> 
>>>> Yes, the rebalance should not happen in that case. That is a little bit
>>>> strange. Could you try to launch a clean Kafka cluster with
>>>> auto.leader.election disabled and try push data?
>>>> When leader migration occurs, NotLeaderForPartition exception is
>>>> expected.
>>>> 
>>>> Jiangjie (Becket) Qin
>>>> 
>>>> 
>>>> On 3/6/15, 3:14 PM, "Zakee" <kz...@netzero.net> wrote:
>>>> 
>>>>> Yes, Jiangjie, I do see lots of these errors "Starting preferred
>>>>> replica
>>>>> leader election for partitions” in logs. I also see lot of Produce
>>>>> request failure warnings in with the NotLeader Exception.
>>>>> 
>>>>> I tried switching off the auto.leader.relabalance to false. I am still
>>>>> noticing the rebalance happening. My understanding was the rebalance
>>>>> will
>>>>> not happen when this is set to false.
>>>>> 
>>>>> Thanks
>>>>> Zakee
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <jq...@linkedin.com.INVALID>
>>>>>> wrote:
>>>>>> 
>>>>>> I don’t think num.replica.fetchers will help in this case. Increasing
>>>>>> number of fetcher threads will only help in cases where you have a
>>>>>> large
>>>>>> amount of data coming into a broker and more replica fetcher threads
>>>>>> will
>>>>>> help keep up. We usually only use 1-2 for each broker. But in your
>>>>>> case,
>>>>>> it looks that leader migration cause issue.
>>>>>> Do you see anything else in the log? Like preferred leader election?
>>>>>> 
>>>>>> Jiangjie (Becket) Qin
>>>>>> 
>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net
>>>>>> <ma...@netzero.net>> wrote:
>>>>>> 
>>>>>>> Thanks, Jiangjie.
>>>>>>> 
>>>>>>> Yes, I do see under partitions usually shooting every hour. Anythings
>>>>>>> that
>>>>>>> I could try to reduce it?
>>>>>>> 
>>>>>>> How does "num.replica.fetchers" affect the replica sync? Currently
>>>>>>> have
>>>>>>> configured 7 each of 5 brokers.
>>>>>>> 
>>>>>>> -Zakee
>>>>>>> 
>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>>>>> <jq...@linkedin.com.invalid>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> These messages are usually caused by leader migration. I think as
>>>>>>>> long
>>>>>>>> as
>>>>>>>> you don't see this lasting for ever and got a bunch of under
>>>>>>>> replicated
>>>>>>>> partitions, it should be fine.
>>>>>>>> 
>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>> 
>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kz...@netzero.net> wrote:
>>>>>>>> 
>>>>>>>>> Need to know if I should I be worried about this or ignore them.
>>>>>>>>> 
>>>>>>>>> I see tons of these exceptions/warnings in the broker logs, not
>>>>>>>>> sure
>>>>>>>> what
>>>>>>>>> causes them and what could be done to fix them.
>>>>>>>>> 
>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic]
>>>>>>>>> to
>>>>>>>>> broker
>>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error
>>>>>>>>> for
>>>>>>>>> partition [TestTopic] to broker 5:class
>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
>>>>>>>>> request
>>>>>>>>> with correlation id 950084 from client ReplicaFetcherThread-1-2 on
>>>>>>>>> partition [TestTopic,2] failed due to Leader not local for
>>>>>>>>> partition
>>>>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Any ideas?
>>>>>>>>> 
>>>>>>>>> -Zakee
> 


Re: Broker Exceptions

Posted by Zakee <kz...@netzero.net>.
> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
> here?
Thanks for your suggestions.
It looks like the rebalance actually happened only once, soon after I started with a clean cluster and data was pushed; it didn't happen again so far, and I see the partition leader counts on the brokers did not change since then. One of the brokers was constantly showing 0 for partition leader count. Is that normal?
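
(To tally leaders per broker I have been using the describe output -- the ZooKeeper address is illustrative:)

bin/kafka-topics.sh --zookeeper localhost:2181 --describe | grep -o "Leader: [0-9]*" | sort | uniq -c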

Also, I still see lots of the below errors (~69k) in the logs since the restart. Is there any reason other than rebalance for these errors?

[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
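
(I am also watching under-replicated partitions while these errors occur, per the earlier suggestion -- again the ZooKeeper address is illustrative:)

bin/kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions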

> Some other things to check are:
> 1. The actual property name is auto.leader.rebalance.enable, not
> auto.leader.rebalance. You’ve probably known this, just to double confirm.
Yes 

> 2. In zookeeper path, can you verify /admin/preferred_replica_election
> does not exist?
ls /admin
[delete_topics]
ls /admin/preferred_replica_election
Node does not exist: /admin/preferred_replica_election
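
(Same check run non-interactively, for completeness; the ZooKeeper address is illustrative and exact behavior may vary by version:)

bin/zookeeper-shell.sh localhost:2181 ls /admin/preferred_replica_election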


Thanks
Zakee



> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <jq...@linkedin.com.INVALID> wrote:
> 
> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
> here?
> Some other things to check are:
> 1. The actual property name is auto.leader.rebalance.enable, not
> auto.leader.rebalance. You’ve probably known this, just to double confirm.
> 2. In zookeeper path, can you verify /admin/preferred_replica_election
> does not exist?
> 
> Jiangjie (Becket) Qin
> 
> On 3/7/15, 10:24 PM, "Zakee" <kz...@netzero.net> wrote:
> 
>> I started with  clean cluster and started to push data. It still does the
>> rebalance at random durations even though the auto.leader.relabalance is
>> set to false.
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <jq...@linkedin.com.INVALID>
>>> wrote:
>>> 
>>> Yes, the rebalance should not happen in that case. That is a little bit
>>> strange. Could you try to launch a clean Kafka cluster with
>>> auto.leader.election disabled and try push data?
>>> When leader migration occurs, NotLeaderForPartition exception is
>>> expected.
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>> 
>>> On 3/6/15, 3:14 PM, "Zakee" <kz...@netzero.net> wrote:
>>> 
>>>> Yes, Jiangjie, I do see lots of these errors "Starting preferred
>>>> replica
>>>> leader election for partitions” in logs. I also see lot of Produce
>>>> request failure warnings in with the NotLeader Exception.
>>>> 
>>>> I tried switching off the auto.leader.relabalance to false. I am still
>>>> noticing the rebalance happening. My understanding was the rebalance
>>>> will
>>>> not happen when this is set to false.
>>>> 
>>>> Thanks
>>>> Zakee
>>>> 
>>>> 
>>>> 
>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <jq...@linkedin.com.INVALID>
>>>>> wrote:
>>>>> 
>>>>> I don’t think num.replica.fetchers will help in this case. Increasing
>>>>> number of fetcher threads will only help in cases where you have a
>>>>> large
>>>>> amount of data coming into a broker and more replica fetcher threads
>>>>> will
>>>>> help keep up. We usually only use 1-2 for each broker. But in your
>>>>> case,
>>>>> it looks that leader migration cause issue.
>>>>> Do you see anything else in the log? Like preferred leader election?
>>>>> 
>>>>> Jiangjie (Becket) Qin
>>>>> 
>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net
>>>>> <ma...@netzero.net>> wrote:
>>>>> 
>>>>>> Thanks, Jiangjie.
>>>>>> 
>>>>>> Yes, I do see under partitions usually shooting every hour. Anythings
>>>>>> that
>>>>>> I could try to reduce it?
>>>>>> 
>>>>>> How does "num.replica.fetchers" affect the replica sync? Currently
>>>>>> have
>>>>>> configured 7 each of 5 brokers.
>>>>>> 
>>>>>> -Zakee
>>>>>> 
>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>>>> <jq...@linkedin.com.invalid>
>>>>>> wrote:
>>>>>> 
>>>>>>> These messages are usually caused by leader migration. I think as
>>>>>>> long
>>>>>>> as
>>>>>>> you don't see this lasting for ever and got a bunch of under
>>>>>>> replicated
>>>>>>> partitions, it should be fine.
>>>>>>> 
>>>>>>> Jiangjie (Becket) Qin
>>>>>>> 
>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kz...@netzero.net> wrote:
>>>>>>> 
>>>>>>>> Need to know if I should I be worried about this or ignore them.
>>>>>>>> 
>>>>>>>> I see tons of these exceptions/warnings in the broker logs, not
>>>>>>>> sure
>>>>>>> what
>>>>>>>> causes them and what could be done to fix them.
>>>>>>>> 
>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic]
>>>>>>>> to
>>>>>>>> broker
>>>>>>>> 5:class kafka.common.NotLeaderForPartitionException
>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error
>>>>>>>> for
>>>>>>>> partition [TestTopic] to broker 5:class
>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
>>>>>>>> request
>>>>>>>> with correlation id 950084 from client ReplicaFetcherThread-1-2 on
>>>>>>>> partition [TestTopic,2] failed due to Leader not local for
>>>>>>>> partition
>>>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Any ideas?
>>>>>>>> 
>>>>>>>> -Zakee


Re: Broker Exceptions

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
Hmm, that sounds like a bug. Can you paste the log of leader rebalance
here?
Some other things to check are:
1. The actual property name is auto.leader.rebalance.enable, not
auto.leader.rebalance. You probably already know this; just to double-confirm.
2. In zookeeper path, can you verify /admin/preferred_replica_election
does not exist?
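
For point 1, a minimal server.properties sketch of the settings involved (names per the 0.8.x docs; please verify against your broker version):

# disable automatic preferred-leader rebalancing
auto.leader.rebalance.enable=false
# these two only matter when the flag above is true
#leader.imbalance.per.broker.percentage=10
#leader.imbalance.check.interval.seconds=300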

Jiangjie (Becket) Qin

On 3/7/15, 10:24 PM, "Zakee" <kz...@netzero.net> wrote:

>I started with  clean cluster and started to push data. It still does the
>rebalance at random durations even though the auto.leader.relabalance is
>set to false.
>
>Thanks
>Zakee
>
>
>
>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <jq...@linkedin.com.INVALID>
>>wrote:
>> 
>> Yes, the rebalance should not happen in that case. That is a little bit
>> strange. Could you try to launch a clean Kafka cluster with
>> auto.leader.election disabled and try push data?
>> When leader migration occurs, NotLeaderForPartition exception is
>>expected.
>> 
>> Jiangjie (Becket) Qin
>> 
>> 
>> On 3/6/15, 3:14 PM, "Zakee" <kz...@netzero.net> wrote:
>> 
>>> Yes, Jiangjie, I do see lots of these errors "Starting preferred
>>>replica
>>> leader election for partitions” in logs. I also see lot of Produce
>>> request failure warnings in with the NotLeader Exception.
>>> 
>>> I tried switching off the auto.leader.relabalance to false. I am still
>>> noticing the rebalance happening. My understanding was the rebalance
>>>will
>>> not happen when this is set to false.
>>> 
>>> Thanks
>>> Zakee
>>> 
>>> 
>>> 
>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <jq...@linkedin.com.INVALID>
>>>> wrote:
>>>> 
>>>> I don’t think num.replica.fetchers will help in this case. Increasing
>>>> number of fetcher threads will only help in cases where you have a
>>>>large
>>>> amount of data coming into a broker and more replica fetcher threads
>>>> will
>>>> help keep up. We usually only use 1-2 for each broker. But in your
>>>>case,
>>>> it looks that leader migration cause issue.
>>>> Do you see anything else in the log? Like preferred leader election?
>>>> 
>>>> Jiangjie (Becket) Qin
>>>> 
>>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net
>>>> <ma...@netzero.net>> wrote:
>>>> 
>>>>> Thanks, Jiangjie.
>>>>> 
>>>>> Yes, I do see under partitions usually shooting every hour. Anythings
>>>>> that
>>>>> I could try to reduce it?
>>>>> 
>>>>> How does "num.replica.fetchers" affect the replica sync? Currently
>>>>>have
>>>>> configured 7 each of 5 brokers.
>>>>> 
>>>>> -Zakee
>>>>> 
>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>>> <jq...@linkedin.com.invalid>
>>>>> wrote:
>>>>> 
>>>>>> These messages are usually caused by leader migration. I think as
>>>>>>long
>>>>>> as
>>>>>> you don't see this lasting for ever and got a bunch of under
>>>>>> replicated
>>>>>> partitions, it should be fine.
>>>>>> 
>>>>>> Jiangjie (Becket) Qin
>>>>>> 
>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kz...@netzero.net> wrote:
>>>>>> 
>>>>>>> I need to know whether I should be worried about these or ignore them.
>>>>>>> 
>>>>>>> I see tons of these exceptions/warnings in the broker logs, and I'm not
>>>>>>> sure what causes them or what could be done to fix them.
>>>>>>> 
>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 950084 from client ReplicaFetcherThread-1-2 on partition [TestTopic,2] failed due to Leader not local for partition [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>>> 
>>>>>>> 
>>>>>>> Any ideas?
>>>>>>> 
>>>>>>> -Zakee


Re: Broker Exceptions

Posted by Zakee <kz...@netzero.net>.
I started with a clean cluster and began pushing data. It still does the rebalance at random intervals even though auto.leader.rebalance.enable is set to false.
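
In case it helps narrow this down, a rough way to tell controller-driven
preferred replica elections apart from leader moves caused by replicas
dropping out of the ISR might be something like the following (the log file
names assume the default log4j layout that ships with Kafka and may differ in
my setup):

  # Controller-initiated preferred replica elections:
  grep -c "Starting preferred replica leader election" logs/controller.log
  # Leader changes triggered by replicas falling out of sync:
  grep -c "Shrinking ISR for partition" logs/server.log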

Thanks
Zakee



> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <jq...@linkedin.com.INVALID> wrote:
> 
> Yes, the rebalance should not happen in that case. That is a little bit
> strange. Could you try to launch a clean Kafka cluster with
> auto.leader.rebalance.enable disabled and try pushing data?
> When leader migration occurs, the NotLeaderForPartitionException is expected.
> 
> Jiangjie (Becket) Qin
> 
> 
> On 3/6/15, 3:14 PM, "Zakee" <kz...@netzero.net> wrote:
> 
>> Yes, Jiangjie, I do see lots of these "Starting preferred replica
>> leader election for partitions" messages in the logs. I also see a lot of
>> produce request failure warnings with the NotLeaderForPartitionException.
>> 
>> I tried switching auto.leader.rebalance.enable off (set to false). I am
>> still noticing the rebalance happening. My understanding was that the
>> rebalance would not happen when this is set to false.
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <jq...@linkedin.com.INVALID>
>>> wrote:
>>> 
>>> I don’t think num.replica.fetchers will help in this case. Increasing the
>>> number of fetcher threads only helps when a large amount of data is coming
>>> into a broker and more replica fetcher threads are needed to keep up. We
>>> usually only use 1-2 for each broker. But in your case, it looks like
>>> leader migration is causing the issue.
>>> Do you see anything else in the log, like a preferred leader election?
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net
>>> <ma...@netzero.net>> wrote:
>>> 
>>>> Thanks, Jiangjie.
>>>> 
>>>> Yes, I do see under-replicated partitions usually spiking every hour.
>>>> Anything I could try to reduce that?
>>>> 
>>>> How does "num.replica.fetchers" affect the replica sync? Currently I have
>>>> 7 configured on each of the 5 brokers.
>>>> 
>>>> -Zakee
>>>> 
>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>> <jq...@linkedin.com.invalid>
>>>> wrote:
>>>> 
>>>>> These messages are usually caused by leader migration. I think as long
>>>>> as you don't see this lasting forever along with a bunch of
>>>>> under-replicated partitions, it should be fine.
>>>>> 
>>>>> Jiangjie (Becket) Qin
>>>>> 
>>>>> On 2/25/15, 4:07 PM, "Zakee" <kz...@netzero.net> wrote:
>>>>> 
>>>>>> I need to know whether I should be worried about these or ignore them.
>>>>>> 
>>>>>> I see tons of these exceptions/warnings in the broker logs, and I'm not
>>>>>> sure what causes them or what could be done to fix them.
>>>>>> 
>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 950084 from client ReplicaFetcherThread-1-2 on partition [TestTopic,2] failed due to Leader not local for partition [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>> 
>>>>>> 
>>>>>> Any ideas?
>>>>>> 
>>>>>> -Zakee


Re: Broker Exceptions

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
Yes, the rebalance should not happen in that case. That is a little bit
strange. Could you try to launch a clean Kafka cluster with
auto.leader.rebalance.enable disabled and try pushing data?
When leader migration occurs, the NotLeaderForPartitionException is expected.
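
While pushing data, something like the following should show whether
under-replicated partitions linger rather than clear up quickly (a sketch;
the ZooKeeper connect string is a placeholder):

  bin/kafka-topics.sh --describe --zookeeper localhost:2181 \
    --under-replicated-partitions

If that list stays empty and leaders still move with the rebalance flag off,
the controller log around the time of an election would be the next thing to
look at.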

Jiangjie (Becket) Qin


On 3/6/15, 3:14 PM, "Zakee" <kz...@netzero.net> wrote:

>Yes, Jiangjie, I do see lots of these "Starting preferred replica
>leader election for partitions" messages in the logs. I also see a lot of
>produce request failure warnings with the NotLeaderForPartitionException.
>
>I tried switching auto.leader.rebalance.enable off (set to false). I am still
>noticing the rebalance happening. My understanding was that the rebalance
>would not happen when this is set to false.
>
>Thanks
>Zakee
>
>
>
>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <jq...@linkedin.com.INVALID>
>>wrote:
>> 
>> I don’t think num.replica.fetchers will help in this case. Increasing the
>> number of fetcher threads only helps when a large amount of data is coming
>> into a broker and more replica fetcher threads are needed to keep up. We
>> usually only use 1-2 for each broker. But in your case, it looks like
>> leader migration is causing the issue.
>> Do you see anything else in the log, like a preferred leader election?
>> 
>> Jiangjie (Becket) Qin
>> 
>> On 2/25/15, 5:02 PM, "Zakee" <kzakee1@netzero.net
>><ma...@netzero.net>> wrote:
>> 
>>> Thanks, Jiangjie.
>>> 
>>> Yes, I do see under-replicated partitions usually spiking every hour.
>>> Anything I could try to reduce that?
>>> 
>>> How does "num.replica.fetchers" affect the replica sync? Currently I have
>>> 7 configured on each of the 5 brokers.
>>> 
>>> -Zakee
>>> 
>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>><jq...@linkedin.com.invalid>
>>> wrote:
>>> 
>>>> These messages are usually caused by leader migration. I think as long
>>>> as you don't see this lasting forever along with a bunch of
>>>> under-replicated partitions, it should be fine.
>>>> 
>>>> Jiangjie (Becket) Qin
>>>> 
>>>> On 2/25/15, 4:07 PM, "Zakee" <kz...@netzero.net> wrote:
>>>> 
>>>>> I need to know whether I should be worried about these or ignore them.
>>>>> 
>>>>> I see tons of these exceptions/warnings in the broker logs, and I'm not
>>>>> sure what causes them or what could be done to fix them.
>>>>> 
>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 950084 from client ReplicaFetcherThread-1-2 on partition [TestTopic,2] failed due to Leader not local for partition [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>> 
>>>>> 
>>>>> Any ideas?
>>>>> 
>>>>> -Zakee