Posted to users@kafka.apache.org by Peter Bukowinski <pm...@gmail.com> on 2020/10/11 19:34:32 UTC

Intended behavior when a broker loses its log volume

Greetings, all.

What is the expected behavior of a broker when it loses its only configured data log directory?

I’m running Kafka 2.2.1 in AWS, and we had an outage caused by the loss of an attached volume on one of the brokers. The broker did not relinquish leadership of its topic partitions when this happened, so the outage was only mitigated after we restarted the broker, forcing leadership changes. I also run Kafka on bare metal with JBOD data dirs, and losing a disk in those clusters does not cause an outage.

I’m curious what I should expect with only one storage location per broker.

—
Peter Bukowinski

Re: Intended behavior when a broker loses its log volume

Posted by Tom Bentley <tb...@redhat.com>.
Hi Peter,

When an unexpected IOException occurs while accessing a file in a log
directory, the broker will take that log directory offline. That means
follower fetchers for partitions on that log dir will be stopped, the
broker will stop serving requests from those logs, and a notification is
sent to ZooKeeper. When the controller receives the notification, it
queries the broker (via a LeaderAndIsrRequest) to find out exactly which
partitions are affected, and those whose response error code indicates a
storage error will have new leaders elected via the usual mechanism. When
a broker has no more log directories online, it will exit.
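The sequence above can be sketched as a toy Python model. This is a minimal illustration under stated assumptions, not Kafka's actual code: the `Broker` class, `controller_handle_notification`, the partition-to-directory mapping and the string error labels are all hypothetical, and leader election is simplified to picking the first other in-sync replica.

```python
from dataclasses import dataclass, field

# Hypothetical labels; the real LeaderAndIsr response carries numeric error codes.
STORAGE_ERROR = "KAFKA_STORAGE_ERROR"
NO_ERROR = "NONE"


@dataclass
class Broker:
    broker_id: int
    placement: dict  # partition -> log dir holding this broker's replica
    online_dirs: set = field(init=False)
    active: set = field(init=False)  # partitions still fetched and served
    notifications: list = field(default_factory=list)
    exited: bool = False

    def __post_init__(self):
        self.online_dirs = set(self.placement.values())
        self.active = set(self.placement)

    def on_io_exception(self, log_dir):
        """Take log_dir offline after an unexpected IOException."""
        if log_dir not in self.online_dirs:
            return
        self.online_dirs.remove(log_dir)
        # Stop follower fetchers and stop serving requests for logs in that dir.
        self.active -= {p for p, d in self.placement.items() if d == log_dir}
        # Stand-in for the notification the broker sends to ZooKeeper.
        self.notifications.append((self.broker_id, log_dir))
        if not self.online_dirs:
            self.exited = True  # no log directories left online: the broker exits

    def leader_and_isr_response(self, partitions):
        """Per-partition error codes returned when the controller queries."""
        return {p: NO_ERROR if p in self.active else STORAGE_ERROR
                for p in partitions}


def controller_handle_notification(broker, leaders, isr):
    """Query the broker, then move leadership away from failed storage.

    Simplified election: the first other ISR member, or None if there is
    none. ISR shrinking for partitions where the broker is only a follower
    is left out of this sketch.
    """
    errors = broker.leader_and_isr_response(list(broker.placement))
    new_leaders = dict(leaders)
    for p, err in errors.items():
        if err == STORAGE_ERROR and leaders.get(p) == broker.broker_id:
            others = [b for b in isr[p] if b != broker.broker_id]
            new_leaders[p] = others[0] if others else None
    return new_leaders


if __name__ == "__main__":
    b = Broker(1, {"t-0": "/data1", "t-1": "/data2"})
    b.on_io_exception("/data1")
    leaders = {"t-0": 1, "t-1": 1}
    isr = {"t-0": [1, 2, 3], "t-1": [1, 3]}
    print(controller_handle_notification(b, leaders, isr))  # {'t-0': 2, 't-1': 1}
    print(b.exited)  # False
```

In this model, failing `/data1` moves leadership of `t-0` only. A second failure on `/data2` would leave the broker with no online directories and set `exited`.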

So in the JBOD setup you describe, I would expect the broker to lose
leadership only of those partitions that were on the affected log dir.
A broker with a single log directory would exit instead. The behaviour
is therefore not the same, but it prioritises availability when the
broker is able to keep functioning with its remaining volumes. As an
administrator you'd have to notice the loss of the volume and restart
the broker manually.
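The difference between the two layouts comes down to how many of a broker's partitions share the failed directory. A minimal sketch, assuming a hypothetical `outcome` helper and made-up partition-to-directory mappings for one broker:

```python
def outcome(placement, failed_dir):
    """Return (partitions whose local replica goes offline, broker keeps running).

    placement maps each partition on one broker to its log dir (hypothetical
    data); the broker's set of log dirs is inferred from this mapping.
    """
    remaining_dirs = set(placement.values()) - {failed_dir}
    offline = sorted(p for p, d in placement.items() if d == failed_dir)
    return offline, bool(remaining_dirs)


# Hypothetical layouts: three JBOD disks versus one attached volume.
jbod = {"t-0": "/disk1", "t-1": "/disk2", "t-2": "/disk3"}
single = {"t-0": "/data", "t-1": "/data", "t-2": "/data"}

print(outcome(jbod, "/disk1"))   # (['t-0'], True): only t-0 is affected
print(outcome(single, "/data"))  # (['t-0', 't-1', 't-2'], False): the broker exits
```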

HTH,

Tom

