Posted to dev@kafka.apache.org by JD Zheng <jd...@gmail.com> on 2020/11/10 06:32:39 UTC

Fwd: Recover partitions from a failed broker leader which is the only replica in isr.

Hi, everyone,

I'm not sure whether this is the right place to ask this question. I sent it
to the user mailing list and didn't get a response. Where should I ask such
questions?

We configure the producer with acks=all and the broker with
min.insync.replicas=2. The replication factor is 3. In this case, if the
leader is struggling and the two other replicas cannot catch up with it and
are both removed from the ISR, it is still safe to fail over to the most
up-to-date follower: updates that exist only on the failed leader were never
acknowledged to the client as successful, so no client-acknowledged data is
lost. However, I couldn't find any information or tool for failing over to
the most up-to-date follower when the only ISR replica is the failed leader.
How can we recover the partition when the only replica in the ISR has failed
and the two healthy followers cannot catch up? Is there any way to manually
bring the two replicas back into the ISR without them actually catching up
with the down leader? Thanks,

-JD
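
The reasoning above can be checked with a toy model. The sketch below is not
Kafka code: the Replica class, produce() and simulate() are hypothetical
names, replication is reduced to list appends, and leader epochs, log
truncation and producers using other acks settings are ignored. It only
illustrates that, under acks=all with min.insync.replicas=2, every
acknowledged record is present on the follower with the highest log end
offset.

```python
from dataclasses import dataclass, field

MIN_ISR = 2  # stands in for min.insync.replicas=2


@dataclass(eq=False)  # compare replicas by identity, not by field values
class Replica:
    name: str
    log: list = field(default_factory=list)

    @property
    def leo(self):
        # Log end offset: the offset the next appended record would get.
        return len(self.log)


def produce(leader, isr, record, lagging=()):
    """Model one acks=all write; return True only if it would be acked."""
    if len(isr) < MIN_ISR:
        return False  # rejected up front, nothing is appended
    leader.log.append(record)
    caught_up = [r for r in isr if r not in lagging]
    for replica in caught_up:
        if replica is not leader:
            replica.log.append(record)
    isr[:] = caught_up  # lagging followers drop out of the ISR
    return len(isr) >= MIN_ISR


def simulate():
    leader, f1, f2 = Replica("broker-1"), Replica("broker-2"), Replica("broker-3")
    isr = [leader, f1, f2]
    acked = []
    # f2 falls behind on r2, f1 falls behind on r3, then the leader dies.
    writes = [("r0", ()), ("r1", ()), ("r2", (f2,)), ("r3", (f1,)), ("r4", ())]
    for record, lagging in writes:
        if produce(leader, isr, record, lagging):
            acked.append(record)
    best = max([f1, f2], key=lambda r: r.leo)
    return leader, best, acked


if __name__ == "__main__":
    leader, best, acked = simulate()
    print(best.name, best.log, acked, leader.log)
```

In this run r3 exists only on the failed leader and was never acknowledged,
which matches the argument in the message. The model says nothing about which
replica a real election would pick.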

---------- Forwarded message ---------
From: JD Zheng <jd...@gmail.com>
Date: Sun, Nov 8, 2020 at 7:41 PM
Subject: Recover partitions from a failed broker leader which is the only
replica in isr.
To: <us...@kafka.apache.org>


Hi,

We had a case where the data drive of one of our Kafka brokers failed, which
removed all the followers of the partitions led by this broker from the ISR
before the broker went completely down. As a result, all the partitions for
which this broker was the leader went offline. What's the best way to recover
these partitions by electing a new leader from among the non-ISR replicas?
Is there any way to elect the replica with the highest LEO (log end offset)?

Thanks,

-JD

Re: Recover partitions from a failed broker leader which is the only replica in isr.

Posted by Ismael Juma <is...@juma.me.uk>.
A better option is to use the elect leaders command. You have to be careful
still, but the edge is not as sharp as the config.

Ismael
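
As a rough illustration of the suggestion above, and assuming "the elect
leaders command" refers to the kafka-leader-election.sh tool shipped with
recent Kafka releases, the sketch below builds the JSON file and command line
for an unclean election limited to specific partitions. The helper names, the
topic "orders", the file name and the bootstrap address are hypothetical, and
the script path depends on the installation. Nothing is executed; the command
is only assembled as a string.

```python
import json
import shlex


def election_json(partitions):
    """Build the content for --path-to-json-file from (topic, partition) pairs."""
    body = {"partitions": [{"topic": t, "partition": p} for t, p in partitions]}
    return json.dumps(body, indent=2)


def election_command(bootstrap, json_path, election_type="unclean"):
    """Return a shell-quoted kafka-leader-election.sh invocation."""
    if election_type not in ("preferred", "unclean"):
        raise ValueError("election_type must be 'preferred' or 'unclean'")
    argv = [
        "bin/kafka-leader-election.sh",  # path is an assumption
        "--bootstrap-server", bootstrap,
        "--election-type", election_type,
        "--path-to-json-file", json_path,
    ]
    return " ".join(shlex.quote(a) for a in argv)


if __name__ == "__main__":
    # Hypothetical topic and partitions that went offline.
    print(election_json([("orders", 0), ("orders", 3)]))
    print(election_command("localhost:9092", "offline-partitions.json"))
```

Scoping the election to the listed partitions is what makes this less risky
than a cluster-wide or topic-wide config change, but an unclean election can
still discard records that the new leader never received.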

On Sun, May 16, 2021, 4:32 AM Alexandre Dupriez <al...@gmail.com>
wrote:

> Hi JD,
>
> You can enable unclean leader election (disabled by default) to use
> ex-ISR followers as fallbacks for new leaders - but be mindful of data
> loss.
>
> Thanks,
> Alexandre

Re: Recover partitions from a failed broker leader which is the only replica in isr.

Posted by Alexandre Dupriez <al...@gmail.com>.
Hi JD,

You can enable unclean leader election (disabled by default) to use
ex-ISR followers as fallbacks for new leaders - but be mindful of data
loss.

Thanks,
Alexandre
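
For reference, a minimal sketch of what toggling this setting for a single
topic could look like with kafka-configs.sh. It assumes a Kafka version whose
kafka-configs.sh accepts --bootstrap-server for topic entities (older
releases used --zookeeper). The helper names, the topic "orders" and the
bootstrap address are hypothetical, and the commands are only built as
strings, not run.

```python
import shlex

# Topic-level (and broker-level) setting that permits non-ISR replicas
# to become leader.
CONFIG = "unclean.leader.election.enable"


def _configs_cmd(bootstrap, topic, *action):
    argv = [
        "bin/kafka-configs.sh",  # path is an assumption
        "--bootstrap-server", bootstrap,
        "--entity-type", "topics",
        "--entity-name", topic,
        "--alter",
        *action,
    ]
    return " ".join(shlex.quote(a) for a in argv)


def enable_unclean_cmd(bootstrap, topic):
    """Command that allows unclean leader election for one topic."""
    return _configs_cmd(bootstrap, topic, "--add-config", f"{CONFIG}=true")


def revert_unclean_cmd(bootstrap, topic):
    """Command that removes the override once the partitions have a leader."""
    return _configs_cmd(bootstrap, topic, "--delete-config", CONFIG)


if __name__ == "__main__":
    print(enable_unclean_cmd("localhost:9092", "orders"))
    print(revert_unclean_cmd("localhost:9092", "orders"))
```

Limiting the override to the affected topic and removing it afterwards keeps
the window in which data loss is possible as small as practical.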
