Posted to users@kafka.apache.org by Tom Dearman <to...@gmail.com> on 2017/06/13 14:20:39 UTC

ZookeeperLeaderElector broker shutdown rebalance time

On our production system I have observed problems with sending messages to Kafka when one of the brokers is taken down (a controlled roll of the OpenShift pod containing the Kafka broker): the messages time out after 5 seconds. We have retries set up, but the 5-second wait on the producer's Future.get() call (5000 milliseconds) expires before the retries are exhausted. I have managed to reproduce this on my local machine with a cluster of three Kafka 0.10.2.0 brokers, one ZooKeeper node, and our app sending messages to the brokers.  It seems to only happen when the broker taken down is the one currently holding the ZookeeperLeaderElector leadership. The election itself happens quickly, but the subsequent rebalance takes 10 seconds or so.  If the broker taken down is not the elector leader, the rebalance happens quickly.  The election log for the 2 nodes that remained up is below; it shows a 10-to-12-second lag after the election before the partition rebalance happens. I have also included the controller log for the node whose znode creation succeeded.

logs2
[2017-06-13 14:14:28,273] INFO Creating /controller (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2017-06-13 14:14:28,275] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2017-06-13 14:14:28,276] INFO 2 successfully elected as leader (kafka.server.ZookeeperLeaderElector)
[2017-06-13 14:14:28,571] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-06-13 14:14:40,761] INFO Partition [__consumer_offsets,30] on broker 2: Shrinking ISR for partition [__consumer_offsets,30] from 2,0,3 to 2,0 (kafka.cluster.Partition)

logs
[2017-06-13 14:14:28,277] INFO Creating /controller (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2017-06-13 14:14:28,279] INFO Result of znode creation is: NODEEXISTS (kafka.utils.ZKCheckedEphemeral)
[2017-06-13 14:14:28,284] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-06-13 14:14:38,518] INFO Partition [__consumer_offsets,17] on broker 0: Shrinking ISR for partition [__consumer_offsets,17] from 2,0,3 to 2,0 (kafka.cluster.Partition)
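
For reference, the send path that hits the timeout looks roughly like this. This is a minimal sketch rather than our actual code; the broker addresses, topic name and serializers are placeholders:

import java.util.Properties;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ProducerTimeoutSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder addresses -- the real cluster is the 3-broker 0.10.2.0 setup described above.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("retries", "2"); // retries are configured (we allow 2)

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            Future<RecordMetadata> future =
                    producer.send(new ProducerRecord<>("some-topic", "key", "value")); // placeholder topic
            try {
                // The 5-second wait described above; during the broker roll this
                // expires before the send has completed.
                RecordMetadata meta = future.get(5000, TimeUnit.MILLISECONDS);
                System.out.printf("sent to partition %d at offset %d%n", meta.partition(), meta.offset());
            } catch (TimeoutException e) {
                System.err.println("send did not complete within 5000 ms");
            }
        }
    }
}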



Are there any parameters we can change to reduce this time, or any other improvements that could bring it down?


Thanks,

Tom

Re: ZookeeperLeaderElector broker shutdown rebalance time

Posted by Tom Dearman <to...@gmail.com>.
It seems this doesn’t just happen when the broker being restarted is the one that holds the elector leadership ('1002 successfully elected as leader’).  I have now turned off automatic leadership rebalancing.  What I don’t understand is this: I can run a manual leadership rebalance while messages are being sent to Kafka and get no producer send timeouts during the rebalance; I can even use the kafka-reassign-partitions.sh tool to reassign partitions away from a broker and then run the manual rebalance without any messages timing out.  But when I take down a broker that has no leadership at all for any partition, I get producer send timeouts (i.e. the Future.get() with a 5000-millisecond timeout expires).  Does anyone have any idea about this?  Surely high availability should mean that we can continue to send messages successfully, given that we allow 2 retries?
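
For completeness, these are the 0.10.2.0 producer settings that bound how long a single send can take. The values shown are the client defaults (apart from the 2 retries we allow), not necessarily our production configuration:

import java.util.Properties;

public class ProducerTimingConfigSketch {
    // Settings that bound how long one send can take in the 0.10.2.0 producer.
    // Values are the client defaults except retries; production values may differ.
    static Properties timingProps() {
        Properties props = new Properties();
        props.put("retries", "2");                // the 2 retries we allow
        props.put("retry.backoff.ms", "100");     // default: pause between retry attempts
        props.put("request.timeout.ms", "30000"); // default: how long each attempt waits for a broker response
        props.put("max.block.ms", "60000");       // default: how long send() itself may block, e.g. waiting for metadata
        return props;
    }
}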

