Posted to users@kafka.apache.org by Hu Xi <hu...@hotmail.com> on 2018/01/25 08:02:39 UTC

Re: kafka controller setting for detecting broker failure and re-electing a new leader for partitions?

Yu Yang,


There is a broker-side config named 'controller.socket.timeout.ms'. Decreasing it to a reasonably smaller value might help, but please use it with caution.
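
For illustration, the setting is a plain key=value entry in the broker's server.properties. The Python sketch below reads a hypothetical excerpt and reports the effective value, falling back to 30000 ms, which is the default. The function name, the excerpt, and the 10000 ms value are assumptions made for this example, not recommendations, and the parser handles only simple key=value lines.

```python
DEFAULT_CONTROLLER_SOCKET_TIMEOUT_MS = 30000  # default for controller.socket.timeout.ms


def controller_socket_timeout_ms(properties_text: str) -> int:
    """Return the configured timeout in ms, or the default if the key is absent.

    Simplified: handles only `key=value` lines and comment lines.
    """
    timeout_ms = DEFAULT_CONTROLLER_SOCKET_TIMEOUT_MS
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "controller.socket.timeout.ms":
            timeout_ms = int(value.strip())  # a later entry overrides an earlier one
    return timeout_ms


# Hypothetical server.properties excerpt; the values are illustrative only.
example = """
# server.properties
broker.id=4
controller.socket.timeout.ms=10000
"""

print(controller_socket_timeout_ms(example))
```

A lower value only shortens how long each connection attempt may block; whether that helps depends on what the controller does after the timeout fires.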

________________________________
From: Yu Yang <yu...@gmail.com>
Sent: January 25, 2018 15:42
To: users@kafka.apache.org
Subject: kafka controller setting for detecting broker failure and re-electing a new leader for partitions?

Hi everyone,

Recently we had a cluster in which the controller failed to connect to
broker A for an extended period of time. I had expected that the
controller would identify the broker as failed and elect
another broker as the leader for the partitions that were hosted on broker A.
However, this did not happen in that cluster. Instead, broker
A was still considered the leader for some partitions, and those
partitions were marked as under-replicated. Is there any
configuration setting in Kafka to speed up broker failure detection?


[2018-01-24 14:13:57,132] WARN [Controller-37-to-broker-4-send-thread], Controller 37's connection to broker testkafka04:9092 (id: 4 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.net.SocketTimeoutException: Failed to connect within 30000 ms
        at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:231)
        at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:182)
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:181)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

Thanks!

Regards,
-Yu

Re: Re: kafka controller setting for detecting broker failure and re-electing a new leader for partitions?

Posted by Yu Yang <yu...@gmail.com>.
Thanks for the reply, Xi! The default value of 'controller.socket.timeout.ms'
is 30000, that is, 30 seconds. What we observed was that the controller
did not assign another replica as the leader, even after it had failed to send
updated topic metadata to the problematic broker for more than 30
minutes. Reducing controller.socket.timeout.ms will not help.

Based on the current Kafka implementation, when such an exception is raised,
ControllerChannelManager catches the exception and keeps retrying.

https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerChannelManager.scala#L222

<https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerChannelManager.scala#L245>
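
The catch-and-retry behavior described above can be modeled with a toy loop. This is a simplified Python sketch, not Kafka's actual code: `ConnectTimeout`, `connect`, and `run_send_loop` are hypothetical names, and the `attempts` bound exists only so the demo terminates.

```python
class ConnectTimeout(Exception):
    """Toy stand-in for the SocketTimeoutException seen in the log."""


def connect(broker_up: bool, timeout_ms: int) -> None:
    # Hypothetical helper: an unreachable broker costs one full timeout, then raises.
    if not broker_up:
        raise ConnectTimeout(f"Failed to connect within {timeout_ms} ms")


def run_send_loop(broker_up: bool, timeout_ms: int, attempts: int):
    """Catch each timeout and retry; nothing here marks the broker as failed.

    `attempts` only bounds the demo. The behavior described in the thread is
    to keep retrying for as long as the broker stays unreachable.
    Returns (connected, failed_attempts, waited_ms).
    """
    failed = 0
    for _ in range(attempts):
        try:
            connect(broker_up, timeout_ms)
            return True, failed, failed * timeout_ms
        except ConnectTimeout:
            failed += 1  # the real thread logs a warning here, then tries again
    return False, failed, failed * timeout_ms


print(run_send_loop(broker_up=False, timeout_ms=30000, attempts=3))  # (False, 3, 90000)
print(run_send_loop(broker_up=False, timeout_ms=5000, attempts=3))   # (False, 3, 15000)
```

In this model a shorter timeout makes each failed attempt cheaper, but the loop still never gives up on the broker or triggers a leader change, which is consistent with what we observed.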