Posted to dev@kafka.apache.org by Qi Xu <sh...@gmail.com> on 2016/04/21 02:23:58 UTC

electing leader failed and result in 0 latest offset

Hi folks,
Recently we ran into an odd issue where the latest offset of some partitions
dropped to 0. Here's a snapshot from Kafka Manager. As you can see, the
latest offset of partitions 2 and 3 became zero.

Partition  Latest Offset  Leader  Replicas  In Sync Replicas  Preferred Leader?  Under Replicated?
0          25822061       3       (3,4,5)   (3,5,4)           true               false
1          25822388       4       (4,5,1)   (4,1,5)           true               false
2          0              2       (5,1,2)   (2)               false              true
3          0              2       (1,2,3)   (3,2)             false              true
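The last two columns of the snapshot follow from the Leader, Replicas and In Sync Replicas columns. Below is a minimal Python sketch, assuming the usual definitions: the preferred leader is the first replica in the assignment list, and a partition is under-replicated when its ISR is smaller than its replica list. The class and function names are hypothetical and are not Kafka Manager's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionState:
    partition: int
    leader: int
    replicas: List[int]  # assigned replicas; assumption: the first one is the preferred leader
    isr: List[int]       # in-sync replicas


def is_preferred_leader(p: PartitionState) -> bool:
    # True when the current leader is the first assigned replica.
    return p.leader == p.replicas[0]


def is_under_replicated(p: PartitionState) -> bool:
    # True when fewer replicas are in sync than are assigned.
    return len(p.isr) < len(p.replicas)


# Rows copied from the Kafka Manager snapshot in the message.
SNAPSHOT = [
    PartitionState(0, 3, [3, 4, 5], [3, 5, 4]),
    PartitionState(1, 4, [4, 5, 1], [4, 1, 5]),
    PartitionState(2, 2, [5, 1, 2], [2]),
    PartitionState(3, 2, [1, 2, 3], [3, 2]),
]

if __name__ == "__main__":
    for p in SNAPSHOT:
        print(p.partition, is_preferred_leader(p), is_under_replicated(p))
```

Under these definitions the sketch reproduces the two flag columns: partitions 2 and 3 are led by a non-preferred replica and are missing replicas from their ISR.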

On the Kafka controller node, I saw errors like the one below in the
state-change log. The timing seems to match, but I'm not sure whether it's
related.

[2016-04-14 19:59:21,800] ERROR Controller 3 epoch 74174 initiated state
change for partition [topic,2] from OnlinePartition to OnlinePartition
failed (state.change.logger)
kafka.common.StateChangeFailedException: encountered error while electing
leader for partition [topic,2] due to: Preferred replica 1 for partition
[topic,2] is either not alive or not in the isr. Current leader and ISR:
[{"leader":2,"leader_epoch":169,"isr":[2]}].


When this happens, all of the partitions with a latest offset of zero stop
receiving new data. After we restart the controller, everything goes back
to normal.

Have you seen a similar issue before, and do you have any idea about the
root cause? What other information would you suggest we collect to track
down the root cause?

Thanks,
Qi

Re: electing leader failed and result in 0 latest offset

Posted by Liquan Pei <li...@gmail.com>.

Hi Qi,

Just to confirm: are you seeing the offset reset to 0 in the new consumer?

I am not sure about the root cause of the leader election failure. However,
since the new Kafka consumer stores its committed offsets in Kafka itself, it
is possible for the offsets of some topic partitions to be reset to 0 when the
leader of an offsets topic partition becomes unavailable.
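A consumer position of 0 can arise in the following way. When the consumer finds no committed offset for a partition, it falls back to its `auto.offset.reset` policy. With `earliest` on a log that starts at offset 0, it begins reading at 0. Below is a minimal Python sketch of that decision, assuming the new consumer's policy values (`earliest`, `latest`, `none`, with `latest` as the default). The function and exception names are hypothetical. The sketch is not the client's implementation and does not model coordinator retries.

```python
from typing import Optional


class NoOffsetForPartitionError(Exception):
    """Hypothetical stand-in for the error raised when auto.offset.reset=none."""


def starting_position(committed: Optional[int], log_start: int, log_end: int,
                      auto_offset_reset: str = "latest") -> int:
    """Simplified model of where a consumer starts reading one partition."""
    if committed is not None:
        return committed  # resume from the committed offset
    # No committed offset was found: apply the reset policy.
    if auto_offset_reset == "earliest":
        return log_start  # oldest available offset (0 for a log that was never truncated)
    if auto_offset_reset == "latest":
        return log_end  # skip to the end of the log
    raise NoOffsetForPartitionError("no committed offset and auto.offset.reset=none")
```

Note that this models the consumer's position, not the broker-side "latest offset" (log end offset) shown in Kafka Manager. That distinction is why the question about the new consumer matters.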

Thanks,
Liquan

-- 
Liquan Pei
Software Engineer, Confluent Inc