Posted to dev@kafka.apache.org by Ming Liu <mi...@gmail.com> on 2019/02/14 23:50:10 UTC

OfflinePartitionLeaderElection improvement?

Hi Kafka community,
   I would like to propose a small change related to
OfflinePartitionLeaderElectionStrategy.
   In our system, we usually have RF = 3, Min_ISR = 2,
unclean.leader.election = false, and clients usually set acks=all when
publishing. We have observed that occasionally, when a disk goes bad, a
partition goes offline and stays in the offline state, which of course
causes an availability issue, and we have to manually set
unclean.leader.election = true to bring the partition back online.
   Partitions going offline due to disk failure have become a huge
operational pain for us.

   Looking into it, the sequence of events is:
   1. First, the ISR for that partition drops to 1 (maybe the bad disk
causes the broker to respond to fetches more slowly. Note that a bad disk
doesn't cause this every time, only occasionally).
   2. Then the disk gives up completely, and the failure takes the leader
replica offline.
   3. Because the ISR is 1, OfflinePartitionLeaderElectionStrategy won't
choose a leader if unclean.leader.election = false.
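
   To illustrate the sequence above, here is a simplified Python model of
how offline partition leader election behaves today. It is only a sketch:
the function name and parameters are made up for illustration and are not
the controller's actual code. It assumes the election walks the replica
assignment in order and only falls back to a non-ISR replica when unclean
election is enabled.

```python
from typing import Optional, Sequence, Set


def offline_partition_leader_election(
    assignment: Sequence[int],
    isr: Sequence[int],
    live_replicas: Set[int],
    unclean_enabled: bool,
) -> Optional[int]:
    """Simplified model (illustrative names, not Kafka's real API)."""
    # Clean election: first assigned replica that is both alive and in the ISR.
    for replica in assignment:
        if replica in live_replicas and replica in isr:
            return replica
    # Unclean election: first assigned replica that is alive, ISR ignored.
    if unclean_enabled:
        for replica in assignment:
            if replica in live_replicas:
                return replica
    # No eligible replica: the partition stays offline.
    return None


# Scenario from the steps above: RF = 3, the ISR has shrunk to the leader
# (broker 1), and then broker 1's disk dies.
assignment = [1, 2, 3]
isr = [1]
live = {2, 3}

clean_choice = offline_partition_leader_election(assignment, isr, live, False)
unclean_choice = offline_partition_leader_election(assignment, isr, live, True)
```

With unclean.leader.election = false, clean_choice is None (the partition
stays offline); with it set to true, unclean_choice is broker 2, simply
because it comes first in the assignment.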

   The observation here is that, in this case, even though the last failed
replica is not in the ISR, it should still have the same HW as the failed
leader replica. So OfflinePartitionLeaderElectionStrategy should select the
last failed replica as the leader, especially if it has the same HW.

   So the proposal is:
   1. Choose a replica as the leader if it has the same HW (even if it is
not in the ISR).
   2. Further, when unclean.leader.election = true, choose the replica with
the highest HW as the leader.
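
   A rough sketch of the two proposed rules, in Python. Everything here is
hypothetical: the function name, the hw map of per-replica high watermarks,
and failed_leader_hw are assumptions for illustration. How the controller
would actually learn these HW values is not worked out in this proposal.

```python
from typing import Dict, Optional, Sequence, Set


def proposed_offline_election(
    assignment: Sequence[int],
    isr: Sequence[int],
    live_replicas: Set[int],
    hw: Dict[int, int],       # hypothetical: last known HW per replica
    failed_leader_hw: int,    # hypothetical: HW of the failed leader
    unclean_enabled: bool,
) -> Optional[int]:
    """Sketch of the proposal (all names are illustrative assumptions)."""
    live = [r for r in assignment if r in live_replicas]
    # Unchanged clean path: prefer a live replica that is in the ISR.
    for replica in live:
        if replica in isr:
            return replica
    # Rule 1: a live replica with the same HW as the failed leader is
    # eligible even though it is not in the ISR.
    for replica in live:
        if hw.get(replica) == failed_leader_hw:
            return replica
    # Rule 2: under unclean election, pick the live replica with the
    # highest known HW (ties go to the earliest in the assignment).
    if unclean_enabled:
        candidates = [r for r in live if r in hw]
        if candidates:
            return max(candidates, key=lambda r: hw[r])
    return None


assignment, isr, live = [1, 2, 3], [1], {2, 3}

# Broker 2 caught up to the failed leader's HW before the leader died.
same_hw = proposed_offline_election(
    assignment, isr, live, {2: 100, 3: 90}, 100, False)

# Nobody matches the leader's HW; only unclean election can proceed.
lagging_clean = proposed_offline_election(
    assignment, isr, live, {2: 80, 3: 90}, 100, False)
lagging_unclean = proposed_offline_election(
    assignment, isr, live, {2: 80, 3: 90}, 100, True)
```

In this sketch same_hw is broker 2 even with unclean election disabled,
lagging_clean is None, and lagging_unclean is broker 3 because it has the
higher HW of the two live replicas.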

   Let me know if this makes sense or if you have any suggestions. If so, I
will create a JIRA and work on it.

   Thanks!
   Ming

Re: OfflinePartitionLeaderElection improvement?

Posted by hacker win7 <ha...@gmail.com>.

This proposal seems to sit between unclean and clean election; maybe a new policy needs to be added for it?


— hackerwin7
— hackerswin7@gmail.com
