Posted to users@kafka.apache.org by Alex Demidko <al...@metamarkets.com> on 2014/05/13 20:58:25 UTC

Killing last replica for partition doesn't change ISR/Leadership if replica is running controller

Hi,

Kafka version is 0.8.1.1. We have three machines: A, B, and C. Let’s say there is a topic with replication factor 2, and one of its partitions - partition 1 - is placed on brokers A and B. If broker A is already down, then for partition 1 we have: Leader: B, ISR: [B]. If the current controller is node C, then killing broker B will turn partition 1 into the state: Leader: -1, ISR: []. But if the current controller is node B, then killing it won’t update the leadership/ISR for partition 1, even after the controller is restarted on node C, so partition 1 will forever think its leader is node B, which is dead.

It looks like KafkaController.onBrokerFailure handles the situation where the failed broker is the partition leader - it sets the new leader value to -1. In contrast, KafkaController.onControllerFailover never removes the leader from a partition whose replicas are all offline - allegedly because the partition gets into the ReplicaDeletionIneligible state. Is this intended behavior?
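
To make that asymmetry concrete, here is a rough pseudo-Scala sketch of the two paths as we understand them - a condensed illustration of the behavior described above, not the actual controller code (the class and field names are made up):

    class PartitionState(val replicas: Seq[Int], var leader: Int, var isr: Seq[Int])

    // Running controller sees a broker die: the dead broker is dropped from
    // the ISR, and with no live replica left the leader is recorded as -1.
    def onBrokerFailure(dead: Int, parts: Seq[PartitionState], live: Set[Int]): Unit =
      for (p <- parts if p.leader == dead) {
        p.isr = p.isr.filter(live)                 // remove the dead broker from the ISR
        p.leader = p.isr.headOption.getOrElse(-1)  // nobody left to elect => -1
      }

    // New controller comes up after failover: partitions with all replicas
    // offline are only marked ReplicaDeletionIneligible; the stale leader in
    // the partition state is never rewritten.
    def onControllerFailover(parts: Seq[PartitionState], live: Set[Int]): Unit =
      for (p <- parts if p.replicas.forall(b => !live(b))) {
        // leader/ISR left untouched - clients keep seeing the dead broker
      }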

This behavior affects DefaultEventHandler.getPartition in the null key case - it can’t detect that partition 1 has no leader, and this results in event send failures.
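
For reference, the null-key path boils down to something like this (a simplified sketch of the 0.8.1 logic with condensed types, not a verbatim excerpt):

    import scala.util.Random

    case class PartitionMeta(partitionId: Int, leaderBrokerIdOpt: Option[Int])

    def getPartitionForNullKey(topicPartitionList: Seq[PartitionMeta]): Int = {
      // A partition counts as available as soon as its metadata carries *some*
      // leader id. Partition 1 above still points at dead broker B (not -1),
      // so it passes this filter and can be picked - and the send then fails.
      val available = topicPartitionList.filter(_.leaderBrokerIdOpt.isDefined)
      if (available.isEmpty)
        throw new IllegalStateException("No leader for any partition") // LeaderNotAvailableException in Kafka
      available(Random.nextInt(available.size)).partitionId
    }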


What we are trying to achieve is the ability to write data even if some partitions have lost all replicas, which is a rare yet still possible scenario. Using a null key looked suitable, with minor DefaultEventHandler modifications (like getting rid of DefaultEventHandler.sendPartitionPerTopicCache to avoid caching and uneven event distribution), as we neither use log compaction nor rely on the partitioning of the data. We had such behavior with Kafka 0.7 - if a node is down, simply produce to a different one.
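
Roughly, the per-send choice we have in mind looks like this (illustrative names, not Kafka’s actual API - the liveBrokers set would come from the latest topic metadata):

    import scala.util.Random

    case class PartitionMeta(partitionId: Int, leaderBrokerIdOpt: Option[Int])

    def choosePartition(partitions: Seq[PartitionMeta], liveBrokers: Set[Int]): Option[Int] = {
      // Writable = leader is both set and currently alive; the liveness check
      // is the part DefaultEventHandler does not make today.
      val writable = partitions.filter(_.leaderBrokerIdOpt.exists(liveBrokers))
      if (writable.isEmpty) None
      else Some(writable(Random.nextInt(writable.size)).partitionId)
    }

Picking uniformly at random per send, instead of caching one partition per topic, keeps events spread evenly and naturally routes around leaderless partitions.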


Thanks, 
Alex


Re: Killing last replica for partition doesn't change ISR/Leadership if replica is running controller

Posted by Jun Rao <ju...@gmail.com>.
Yes, that seems like a real issue. Could you file a jira?

Thanks,

Jun

