Posted to users@kafka.apache.org by Kunal Jadhav <ku...@nciportal.com.INVALID> on 2023/04/14 10:30:39 UTC

Facing new leader election issues in zookeeper-less kafka cluster having version 3.3.2

Hello All,

We have set up a 3-broker cluster on a single-node server in a
Kubernetes environment. It is a ZooKeeper-less (KRaft) cluster running Kafka
version 3.3.2. We are facing an issue: when the existing leader broker
goes down, a new leader is not elected. We have hit this issue
several times and always need to restart the cluster. Please help me
solve this problem. Thanks in advance.

---
Thanks & Regards,
Kunal Jadhav

Re: Facing new leader election issues in zookeeper-less kafka cluster having version 3.3.2

Posted by Kunal Jadhav <ku...@nciportal.com.INVALID>.
Thanks Divij, I will check further.

---
Thanks & Regards,
Kunal Jadhav


On Fri, Apr 14, 2023 at 4:25 PM Divij Vaidya <di...@gmail.com>
wrote:

> Hey Kunal
>
> We would need more information to debug your scenario since there are no
> known bugs (AFAIK) in 3.3.2 associated with leader election.
>
> At a very high level, the ideal sequence of events should be as follows:
> 1. When the existing leader shuts down, it will stop sending requests for
> heartbeat/metadata to the controller.
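> 2. Controller will detect that it hasn't received a heartbeat from the broker
> for > broker.session.timeout.ms (defaults to 9s; brokers send heartbeats
> every broker.heartbeat.interval.ms, which defaults to 2s) and will fence it.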
> 3. Controller will elect a new leader from the other brokers in the ISR and
> propagate the change to the brokers. (In KRaft mode this goes out through
> the metadata log rather than LeaderAndIsr requests.)
>
> You should be able to look at the state change logs and verify the sequence
> of events. If your active controller runs on the same machine as the
> leader in step 1, there will be a controller failover first, followed by the
> sequence of events described above.
>
> Could you please tell us the sequence of events by looking at your state
> change logs? I would also look at controller logs to ensure that it is
> actually performing a leader failover.
>
> Also, how are you checking that a leader is not elected? Could it be that
> the partition is under-replicated or its ISR is below min.insync.replicas,
> and that is why you aren't able to produce/consume from it even though it
> still has a leader?
>
> --
> Divij Vaidya
>

Re: Facing new leader election issues in zookeeper-less kafka cluster having version 3.3.2

Posted by Divij Vaidya <di...@gmail.com>.
Hey Kunal

We would need more information to debug your scenario since there are no
known bugs (AFAIK) in 3.3.2 associated with leader election.

At a very high level, the ideal sequence of events should be as follows:
1. When the existing leader shuts down, it will stop sending requests for
heartbeat/metadata to the controller.
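2. Controller will detect that it hasn't received a heartbeat from the broker
for > broker.session.timeout.ms (defaults to 9s; brokers send heartbeats every
broker.heartbeat.interval.ms, which defaults to 2s) and will fence the broker.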
3. Controller will elect a new leader from the other brokers in the ISR and
propagate the change to the brokers. (In KRaft mode this goes out through the
metadata log rather than LeaderAndIsr requests.)
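
To make the three steps concrete, here is a toy model in Python. It is only a
sketch of the idea, not Kafka's implementation: ToyController, Partition and
reelect are made-up names, the heartbeat timeout is a plain parameter, and
"pick the first live ISR member" is a simplification of the real election.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Partition:
    leader: Optional[int]   # broker id of the current leader, None if offline
    isr: List[int]          # broker ids of the in-sync replicas


@dataclass
class ToyController:
    timeout_ms: int  # how long a broker may stay silent (hypothetical knob)
    last_heartbeat_ms: Dict[int, int] = field(default_factory=dict)

    def heartbeat(self, broker_id: int, now_ms: int) -> None:
        # Step 1: a running broker keeps calling this; a dead one stops.
        self.last_heartbeat_ms[broker_id] = now_ms

    def live_brokers(self, now_ms: int) -> Set[int]:
        # Step 2: a broker counts as live only if it was heard from recently.
        return {
            b for b, t in self.last_heartbeat_ms.items()
            if now_ms - t <= self.timeout_ms
        }

    def reelect(self, partition: Partition, now_ms: int) -> Partition:
        # Step 3: if the leader is gone, pick a new one from the live ISR.
        live = self.live_brokers(now_ms)
        if partition.leader in live:
            return partition
        live_isr = [b for b in partition.isr if b in live]
        if not live_isr:
            # No live in-sync replica: the partition has no leader.
            return Partition(leader=None, isr=partition.isr)
        return Partition(leader=live_isr[0], isr=live_isr)
```

If reelect keeps returning a partition whose leader is None in a model like
this, the interesting question is which of the two inputs is wrong: the set of
live brokers or the ISR.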

You should be able to look at the state change logs and verify the sequence
of events. If your active controller runs on the same machine as the
leader in step 1, there will be a controller failover first, followed by the
sequence of events described above.
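
One thing worth checking for the controller failover case: KRaft controllers
form a Raft quorum, so a new active controller can only be elected while a
majority of the configured voters (controller.quorum.voters) is still up. A
minimal sketch of that check follows; the function name and the way the ids
are passed in are my own, for illustration only.

```python
from typing import Iterable


def has_controller_quorum(voter_ids: Iterable[int], alive_ids: Iterable[int]) -> bool:
    """Return True if a strict majority of the quorum voters is alive."""
    voters = set(voter_ids)
    alive_voters = voters & set(alive_ids)
    return len(alive_voters) > len(voters) // 2
```

For example, with voters 1, 2, 3 the quorum survives losing one node but not
two, and a single-voter quorum does not survive losing that one node.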

Could you please tell us the sequence of events by looking at your state
change logs? I would also look at controller logs to ensure that it is
actually performing a leader failover.
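
If the logs are large, a small helper like the one below can pull out the
lines for one partition and order them by time. This is a sketch under
assumptions: it assumes each log line starts with a timestamp such as
"[2023-04-14 10:30:39,123]" (adjust the pattern if your log4j layout differs),
and partition_events is a made-up helper, not part of Kafka.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Assumed line prefix: "[YYYY-MM-DD HH:MM:SS,mmm]". Change this if your
# logging pattern is different.
_TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def partition_events(
    lines: Iterable[str], topic_partition: str
) -> List[Tuple[datetime, str]]:
    """Return (timestamp, line) pairs that mention topic_partition, oldest first.

    Matching is a plain substring test, so "demo-1" also matches "demo-10".
    Lines without a leading timestamp (e.g. stack traces) are skipped.
    """
    events = []
    for line in lines:
        if topic_partition not in line:
            continue
        match = _TIMESTAMP.match(line)
        if match is None:
            continue
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        events.append((ts, line.rstrip("\n")))
    events.sort(key=lambda event: event[0])
    return events
```

Feeding it the lines of both the state change log and the controller log
gives one merged timeline for the partition, which is the sequence of events
asked for above.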

Also, how are you checking that a leader is not elected? Could it be that
the partition is under-replicated or its ISR is below min.insync.replicas,
and that is why you aren't able to produce/consume from it even though it
still has a leader?
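
To tell these cases apart, compare the leader, replica list and ISR reported
for the partition (for example by kafka-topics.sh --describe) against the
topic's min.insync.replicas. The sketch below only encodes that comparison;
partition_status and its labels are made up for illustration, and the values
are passed in by hand.

```python
from typing import Optional, Sequence


def partition_status(
    leader: Optional[int],
    replicas: Sequence[int],
    isr: Sequence[int],
    min_isr: int,
) -> str:
    """Classify a partition from its reported leader, replicas and ISR."""
    if leader is None or leader < 0:
        # Really no leader: nothing was elected.
        return "offline: no leader"
    if len(isr) < min_isr:
        # A leader exists, but there are too few in-sync replicas.
        return "has leader, ISR below min.insync.replicas"
    if len(isr) < len(replicas):
        # A leader exists and the ISR is sufficient, but some replica lags.
        return "has leader, under-replicated"
    return "healthy"
```

For a partition with replicas 1, 2, 3 and min.insync.replicas of 2, an ISR of
just [2] with leader 2 falls into the second case: there is a leader, so
nothing needs to be elected, but the ISR is too small.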

--
Divij Vaidya


