Posted to users@kafka.apache.org by Vadim Keylis <vk...@gmail.com> on 2013/08/19 21:26:38 UTC

Failed to start preferred replica election

I have a cluster of 3 Kafka servers with a replication factor of 3. Two of the
3 servers were shut down, and traffic was sent only to the one server that was
still up. I then brought the second host back up, and according to its logs the
server has started.

I ran ./kafka-list-topic.sh --zookeeper <host>, which still showed that the
leaders are not distributed. I then ran kafka-preferred-replica-election.sh,
which exited with this error:

kafka.common.AdminCommandFailedException: Admin command failed
        at kafka.admin.PreferredReplicaLeaderElectionCommand.moveLeaderToPreferredReplica(PreferredReplicaLeaderElectionCommand.scala:119)
        at kafka.admin.PreferredReplicaLeaderElectionCommand$.main(PreferredReplicaLeaderElectionCommand.scala:60)
        at kafka.admin.PreferredReplicaLeaderElectionCommand.main(PreferredReplicaLeaderElectionCommand.scala)

Could you please suggest what might have caused the exception and how to
recover from it?

Thanks so much in advance,
Vadim

Re: Failed to start preferred replica election

Posted by Jun Rao <ju...@gmail.com>.
Added to the 0.8 documentation.

Thanks,

Jun


On Tue, Aug 20, 2013 at 9:22 AM, Jay Kreps <ja...@gmail.com> wrote:

> Is there any way to channel these many excellent email threads into
> documentation improvements :-)
>
> -Jay

Re: Failed to start preferred replica election

Posted by Jay Kreps <ja...@gmail.com>.
Is there any way to channel these many excellent email threads into
documentation improvements :-)

-Jay


On Mon, Aug 19, 2013 at 8:55 PM, Jun Rao <ju...@gmail.com> wrote:

> We also have a jmx bean that tracks the lag in messages per partition in
> the follower broker.
>
> Thanks,
>
> Jun

Re: Failed to start preferred replica election

Posted by Jun Rao <ju...@gmail.com>.
We also have a JMX bean that tracks the lag, in messages, per partition on
the follower broker.

Thanks,

Jun
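
A rough sketch of how per-partition lag readings could be turned into a
catch-up estimate, which is what the earlier question about predicting when
the tool will succeed is after. This is only an illustration: the (timestamp,
lag) samples are assumed to have been collected already from the lag bean by
some JMX client, the function name is hypothetical, and the estimate assumes
the lag keeps shrinking at the average rate seen so far.

```python
def estimate_catchup_seconds(samples):
    """Estimate seconds until a follower's lag reaches zero.

    samples: list of (timestamp_seconds, lag_in_messages), oldest first.
    Returns 0.0 if the latest lag is already zero, None if the lag is not
    shrinking (no estimate possible), otherwise a linear extrapolation.
    """
    if not samples:
        return None
    (t0, lag0), (t1, lag1) = samples[0], samples[-1]
    if lag1 <= 0:
        return 0.0
    if t1 <= t0 or lag1 >= lag0:
        # Single sample, or the follower is falling further behind.
        return None
    drain_rate = (lag0 - lag1) / (t1 - t0)  # messages caught up per second
    return lag1 / drain_rate


if __name__ == "__main__":
    # Hypothetical readings taken 10 seconds apart.
    print(estimate_catchup_seconds([(0, 1000), (10, 800)]))
```

A None result is a hint that the follower is not catching up at all, in
which case waiting longer before re-running the election tool will not help.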


On Mon, Aug 19, 2013 at 1:07 PM, Vadim Keylis <vk...@gmail.com> wrote:

> It does print partitions. I just did not include them in the bug.
>
> How can I monitor replica resync progress as well as know when resync
> process completed using script? That should allow me to better predict when
> the tool would run successfully.
>
> Thanks so much.

Re: Failed to start preferred replica election

Posted by Neha Narkhede <ne...@gmail.com>.
You can monitor the under-replicated partition count through the
"kafka.server.UnderReplicatedPartitions" JMX bean on every leader. Another
option is to run kafka-list-topics, but that is heavyweight, so I would
recommend it only for diagnostic purposes, not for monitoring.

Thanks,
Neha
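
For scripting the "wait until resync is done" step, a minimal polling sketch
could look like the following. It is not part of any Kafka tooling: the
read_under_replicated callable is a hypothetical hook that returns the
current under-replicated partition count for each broker (for example, by
querying the bean above through whatever JMX client is available), and the
sleep and clock parameters exist only so the loop can be exercised without
real waiting.

```python
import time


def wait_until_in_sync(read_under_replicated, poll_seconds=10.0,
                       timeout_seconds=3600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until every broker reports zero under-replicated partitions.

    read_under_replicated: hypothetical callable returning an iterable of
    counts, one per broker. Returns True once all counts are zero, or False
    if timeout_seconds elapses first. An empty reading counts as "not yet".
    """
    deadline = clock() + timeout_seconds
    while True:
        counts = list(read_under_replicated())
        if counts and all(c == 0 for c in counts):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)


if __name__ == "__main__":
    # Fake readings standing in for three brokers catching up over time.
    readings = iter([[4, 0, 2], [1, 0, 0], [0, 0, 0]])
    print(wait_until_in_sync(lambda: next(readings), poll_seconds=0))
```

Once this returns True, the restarted brokers should be back in the ISR and
the preferred replica election has a chance of succeeding.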


On Mon, Aug 19, 2013 at 1:07 PM, Vadim Keylis <vk...@gmail.com> wrote:

> It does print partitions. I just did not include them in the bug.
>
> How can I monitor replica resync progress as well as know when resync
> process completed using script? That should allow me to better predict when
> the tool would run successfully.
>
> Thanks so much.

Re: Failed to start preferred replica election

Posted by Vadim Keylis <vk...@gmail.com>.
It does print partitions. I just did not include them in the bug.

How can I monitor replica resync progress from a script, and how can I tell
when the resync has completed? That would let me better predict when
the tool will run successfully.

Thanks so much.


On Mon, Aug 19, 2013 at 12:59 PM, Neha Narkhede <ne...@gmail.com> wrote:

> I think the error message can be improved to at least print which
> partitions it couldn't move the leader for. What could be happening is that
> the 2 brokers that were down might not have entered the ISR yet. So the
> tool will not be able to move any leaders to them. You can run
> kafka-list-topics with the --under-replicated-count option to print the
> list of under replicated partitions.
>
> Please can you file a bug to improve the error reporting of this tool?
>
> Thanks,
> Neha

Re: Failed to start preferred replica election

Posted by Neha Narkhede <ne...@gmail.com>.
I think the error message can be improved to at least print which
partitions it couldn't move the leader for. What could be happening is that
the 2 brokers that were down have not re-entered the ISR (in-sync replica
set) yet, so the tool cannot move any leaders to them. You can run
kafka-list-topics with the --under-replicated-count option to print the
list of under-replicated partitions.

Please can you file a bug to improve the error reporting of this tool?

Thanks,
Neha
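
To make the ISR condition above concrete, here is a small sketch that scans
topic-listing output and reports the partitions whose preferred replica (the
first broker in the assigned replica list) is not yet in the ISR. The
tab-separated "key: value" line format used here is an assumption about what
the list tool prints, and the function names and sample data are
hypothetical; adjust the parsing to the real output.

```python
def parse_partition_line(line):
    """Parse one tab-separated 'key: value' line into a dict (assumed format)."""
    fields = {}
    for part in line.strip().split("\t"):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def broker_ids(csv):
    """Turn '1,2,3' into [1, 2, 3]; an empty string gives an empty list."""
    return [int(x) for x in csv.split(",") if x.strip()]


def blocked_partitions(lines):
    """Return (topic, partition, preferred_broker) for each partition whose
    preferred replica is missing from the ISR, so leadership cannot move."""
    blocked = []
    for line in lines:
        fields = parse_partition_line(line)
        if "replicas" not in fields or "isr" not in fields:
            continue  # not a partition line (blank line, log noise, ...)
        replicas = broker_ids(fields["replicas"])
        isr = broker_ids(fields["isr"])
        if replicas and replicas[0] not in isr:
            blocked.append((fields["topic"], int(fields["partition"]), replicas[0]))
    return blocked


if __name__ == "__main__":
    # Hypothetical listing: broker 3 stayed up, broker 2 has partly caught up.
    sample = [
        "topic: events\tpartition: 0\tleader: 3\treplicas: 1,2,3\tisr: 3",
        "topic: events\tpartition: 1\tleader: 3\treplicas: 2,3,1\tisr: 3,2",
        "topic: events\tpartition: 2\tleader: 3\treplicas: 3,1,2\tisr: 3",
    ]
    for topic, partition, broker in blocked_partitions(sample):
        print(topic, partition, "waiting for broker", broker)
```

An empty result suggests every preferred replica is back in the ISR, which
is the point at which re-running the election tool is worth trying.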


On Mon, Aug 19, 2013 at 12:26 PM, Vadim Keylis <vk...@gmail.com> wrote:

> I have a cluster of 3 kafka servers. Replication factor is 3. Two out of 3
> servers were shutdown and traffic was sent to only one server that was up.
> I brought second host up and it says according to logs that server has
> started.
>
> I ran ./kafka-list-topic.sh --zookeeper <host> Still was showing leaders
> are not distributed. Then ran
> kafka-preferred-replica-election.sh which exited with error:
>
> kafka.common.AdminCommandFailedException: Admin command failed
>         at kafka.admin.PreferredReplicaLeaderElectionCommand.moveLeaderToPreferredReplica(PreferredReplicaLeaderElectionCommand.scala:119)
>         at kafka.admin.PreferredReplicaLeaderElectionCommand$.main(PreferredReplicaLeaderElectionCommand.scala:60)
>         at kafka.admin.PreferredReplicaLeaderElectionCommand.main(PreferredReplicaLeaderElectionCommand.scala)
>
> Would you please give suggestion what have caused the exception and how to
> recover from it?
>
> Thanks so much in advance,
> Vadim
>