Posted to users@kafka.apache.org by Kane Kim <ka...@gmail.com> on 2016/04/28 02:42:46 UTC

leader election bug

Hello,

It looks like we are hitting a leader election bug. I've stopped one broker
(104224873), and on the other brokers I see the following:

WARN  kafka.controller.ControllerChannelManager  - [Channel manager on
controller 104224863]: Not sending request Name: StopReplicaRequest;
Version: 0; CorrelationId: 843100; ClientId: ; DeletePartitions: false;
ControllerId: 104224863; ControllerEpoch: 8; Partitions: [mp-auth,169] to
broker 104224873, since it is offline.

Also, describing the topics returns this:
Topic: mp-unknown Partition: 597 Leader: 104224873 Replicas:
104224874,104224873,104224875 Isr: 104224873,104224875

Broker 104224873 is shut down, but it is still listed as the leader for the
partition (and has been for at least a couple of hours while I've been
monitoring it). The ZooKeeper cluster is healthy.

ls /brokers/ids
[104224874, 104224875, 104224863, 104224864, 104224871, 104224867,
104224868, 104224865, 104224866, 104224876, 104224877, 104224869,
104224878, 104224879]

That broker is not registered in ZK.
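
The mismatch described above (a partition whose leader is missing from
/brokers/ids) can be checked mechanically. Below is a minimal sketch in
Python, not an official Kafka tool: it only parses text in the shape of the
describe output and the broker id list quoted in this message. The function
and variable names (find_stale_leaders, DESCRIBE_OUTPUT, LIVE_BROKER_IDS) are
made up for illustration, and the sample describe line is assumed to be a
single line that the mail client wrapped.

```python
import re

# Sample data copied from the message above. The describe line is assumed
# to be one line in the tool's output (the email wrapped it).
DESCRIBE_OUTPUT = (
    "Topic: mp-unknown Partition: 597 Leader: 104224873 "
    "Replicas: 104224874,104224873,104224875 Isr: 104224873,104224875\n"
)

# Broker ids returned by `ls /brokers/ids` in the message above.
LIVE_BROKER_IDS = {
    104224874, 104224875, 104224863, 104224864, 104224871, 104224867,
    104224868, 104224865, 104224866, 104224876, 104224877, 104224869,
    104224878, 104224879,
}

# Matches one partition line of the describe output shown in this thread.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def parse_ids(csv):
    """Turn '1,2,3' into [1, 2, 3]; an empty string gives an empty list."""
    return [int(x) for x in csv.split(",") if x]


def find_stale_leaders(describe_output, live_ids):
    """Return (topic, partition, leader, live_isr) for every partition whose
    leader id is not in live_ids. A leader of -1 is reported as well."""
    stale = []
    for line in describe_output.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a partition line
        leader = int(m.group("leader"))
        if leader in live_ids:
            continue
        # ISR members that are still registered could, in principle,
        # take over leadership.
        live_isr = [b for b in parse_ids(m.group("isr")) if b in live_ids]
        stale.append(
            (m.group("topic"), int(m.group("partition")), leader, live_isr)
        )
    return stale


if __name__ == "__main__":
    for topic, partition, leader, live_isr in find_stale_leaders(
        DESCRIBE_OUTPUT, LIVE_BROKER_IDS
    ):
        print(
            f"{topic}-{partition}: leader {leader} is not registered; "
            f"live ISR members: {live_isr}"
        )
```

With the data from this message, the sketch flags mp-unknown partition 597:
leader 104224873 is absent from /brokers/ids, while ISR member 104224875 is
still registered.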

Re: leader election bug

Posted by Kane Kim <ka...@gmail.com>.
Also, that broker is not registered in ZK, as we can confirm with zk-shell,
but Kafka still thinks it's the leader for some partitions.

On Mon, May 2, 2016 at 11:04 AM, Kane Kim <ka...@gmail.com> wrote:

> We are running Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09
> GMT, does it have any known problems?

Re: leader election bug

Posted by Kane Kim <ka...@gmail.com>.
So what could have happened then? The broker is not registered in ZooKeeper,
but it's still a leader somehow.

On Mon, May 2, 2016 at 3:27 PM, Gwen Shapira <gw...@confluent.io> wrote:

> That's a good version :)

Re: leader election bug

Posted by Gwen Shapira <gw...@confluent.io>.
That's a good version :)

On Mon, May 2, 2016 at 11:04 AM, Kane Kim <ka...@gmail.com> wrote:
> We are running Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09
> GMT, does it have any known problems?

Re: leader election bug

Posted by Kane Kim <ka...@gmail.com>.
We are running Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09
GMT. Does it have any known problems?

On Fri, Apr 29, 2016 at 2:35 PM, James Brown <jb...@easypost.com> wrote:

> What version of ZooKeeper are you on? There have been a few bugs over
> the years where ZK has lost ephemeral nodes (and spontaneously
> de-registered brokers).

Re: leader election bug

Posted by James Brown <jb...@easypost.com>.
What version of ZooKeeper are you on? There have been a few bugs over
the years where ZK has lost ephemeral nodes (and spontaneously
de-registered brokers).

On Fri, Apr 29, 2016 at 11:30 AM, Kane Kim <ka...@gmail.com> wrote:
> Any idea why it's happening? I'm sure rolling restart would fix it. Is it a
> bug?



-- 
James Brown
Engineer

Re: leader election bug

Posted by Kane Kim <ka...@gmail.com>.
Any idea why it's happening? I'm sure a rolling restart would fix it. Is it a
bug?

On Wed, Apr 27, 2016 at 5:42 PM, Kane Kim <ka...@gmail.com> wrote:

> Hello,
>
> It looks like we are hitting a leader election bug. I've stopped one broker
> (104224873), and on the other brokers I see the following:
>
> WARN  kafka.controller.ControllerChannelManager  - [Channel manager on
> controller 104224863]: Not sending request Name: StopReplicaRequest;
> Version: 0; CorrelationId: 843100; ClientId: ; DeletePartitions: false;
> ControllerId: 104224863; ControllerEpoch: 8; Partitions: [mp-auth,169] to
> broker 104224873, since it is offline.
>
> Also describing topics returns this:
> Topic: mp-unknown Partition: 597 Leader: 104224873 Replicas:
> 104224874,104224873,104224875 Isr: 104224873,104224875
>
> broker 104224873 is shut down, but it's still leader for the partition (at
> least for a couple of hours as I monitor it).
> Zookeeper cluster is healthy.
>
> ls /brokers/ids
> [104224874, 104224875, 104224863, 104224864, 104224871, 104224867,
> 104224868, 104224865, 104224866, 104224876, 104224877, 104224869,
> 104224878, 104224879]
>
> That broker is not registered in ZK.
>