Posted to users@kafka.apache.org by Bryan Baugher <bj...@gmail.com> on 2013/08/14 18:52:27 UTC

Broker crashes when no space left for log.dirs

Hi,

This is more of a thought question than a problem that I need support for.
I have been trying out Kafka 0.8.0-beta1 with replication. For our use case we
want to try and guarantee that our consumers will see all messages even if
they have fallen greatly behind the broker/producer. For this reason I
wanted to know how the broker would react when the filesystem it writes its
messages to is full. What I found was that the broker crashes and cannot be
started until the filesystem has space again.

Is there, or would it make sense to provide, configuration allowing the
broker to reject writes in this case rather than crashing, electing a new
leader and attempting the write again? I can clearly understand the use
case that we don't want to 'lose' messages from the producer and I could
also see how lack of filesystem space could be considered a machine
failure, but with replication I would think if you are running out of space
on 1 broker you are likely running out of space on others.

Bryan

Re: Broker crashes when no space left for log.dirs

Posted by Joel Koshy <jj...@gmail.com>.
Interesting - I agree that it is unfortunate we crash in this
scenario; if the brokers didn't crash, at least the published
messages would remain available for consumption. However, I think this
problem would effectively be subsumed by doing capacity planning ahead
of time and then setting up alerts for when usage gets close to your
limits, although that assumes such alerts are reacted to quickly
enough.
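
To make the capacity-planning idea concrete, here is a rough sketch of the
arithmetic involved. The function names, the 25% headroom default, and the
example numbers are assumptions for illustration only; none of this is a
Kafka API, and real sizing also depends on compression, index files, and
how partitions are spread across brokers.

```python
def required_disk_bytes(bytes_per_sec, retention_hours, replication_factor,
                        headroom=0.25):
    """Rough cluster-wide disk needed to retain every message.

    headroom is extra slack (hypothetical default of 25%) so that alerts
    can fire well before the filesystem is actually full.
    """
    retained = bytes_per_sec * retention_hours * 3600 * replication_factor
    return int(retained * (1 + headroom))


def hours_until_full(free_bytes, net_bytes_per_sec):
    """Estimate how long operators have to react once an alert fires."""
    if net_bytes_per_sec <= 0:
        return float("inf")  # disk usage is not growing
    return free_bytes / net_bytes_per_sec / 3600


# Example: 1 MB/s of produced data, 24 hours of retention, 2 replicas.
needed = required_disk_bytes(1_000_000, 24, 2)
print(f"plan for at least {needed / 1e9:.0f} GB across the cluster")
```

Comparing the output of something like hours_until_full against the time it
realistically takes someone to respond is one way to judge whether an alert
threshold leaves enough runway.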

Joel

On Wed, Aug 14, 2013 at 9:52 AM, Bryan Baugher <bj...@gmail.com> wrote:
> Hi,
>
> This is more of a thought question than a problem that I need support for.
> I have been trying out Kafka 0.8.0-beta1 with replication. For our use case we
> want to try and guarantee that our consumers will see all messages even if
> they have fallen greatly behind the broker/producer. For this reason I
> wanted to know how the broker would react when the filesystem it writes its
> messages to is full. What I found was that the broker crashes and cannot be
> started until the filesystem has space again.
>
> Is there, or would it make sense to provide, configuration allowing the
> broker to reject writes in this case rather than crashing, electing a new
> leader and attempting the write again? I can clearly understand the use
> case that we don't want to 'lose' messages from the producer and I could
> also see how lack of filesystem space could be considered a machine
> failure, but with replication I would think if you are running out of space
> on 1 broker you are likely running out of space on others.
>
> Bryan

Re: Broker crashes when no space left for log.dirs

Posted by Jason Rosenberg <jb...@squareup.com>.
Ok,

I didn't realize the write to disk was immediate (is that new in 0.8, with
requested acks enabled?).

I do think the OS will indeed reserve space in advance for data not yet
flushed to disk.  This seems to be true, at least, for xfs, with which I have
more experience lately.

Jason


On Thu, Aug 15, 2013 at 11:30 AM, Jay Kreps <ja...@gmail.com> wrote:

> I am saying we always immediately write to the fs. So the question is: is it
> possible with delayed allocation in ext4 to do a successful write that
> later cannot be flushed to disk due to running out of space? I don't know
> the answer to this, though I would hope it is not possible.
>
> Basically if our write to the fs succeeds and replicas acknowledge then we
> send back the ack.
>
> -Jay

Re: Broker crashes when no space left for log.dirs

Posted by Jay Kreps <ja...@gmail.com>.
I am saying we always immediately write to the fs. So the question is: is it
possible with delayed allocation in ext4 to do a successful write that
later cannot be flushed to disk due to running out of space? I don't know
the answer to this, though I would hope it is not possible.

Basically if our write to the fs succeeds and replicas acknowledge then we
send back the ack.
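
The rule in the previous sentence can be sketched roughly as follows. This
is not Kafka's code: append_and_ack, FullDisk, and the replica_acks list are
hypothetical names used only to illustrate the decision, and the sketch
assumes the filesystem reports a full disk as ENOSPC at write time (which
is exactly the open question about delayed allocation).

```python
import errno
import io


def append_and_ack(log_file, payload, replica_acks):
    """Return True only if the local write succeeded and all replicas acked."""
    try:
        log_file.write(payload)
        # flush() hands the bytes to the OS; it does not force them to disk
        # (that would need os.fsync), mirroring "write to the fs".
        log_file.flush()
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return False  # out of space: the producer gets a failed ack
        raise
    return all(replica_acks)


class FullDisk:
    """Stand-in for a log file on a full filesystem (illustration only)."""

    def write(self, payload):
        raise OSError(errno.ENOSPC, "No space left on device")

    def flush(self):
        pass


print(append_and_ack(io.BytesIO(), b"msg", [True, True]))  # healthy path
print(append_and_ack(FullDisk(), b"msg", [True, True]))    # full-disk path
```

Under these assumptions a message is never acknowledged while it exists
only in the broker's own memory, because the ack is decided after the
write call returns.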

-Jay


On Thu, Aug 15, 2013 at 11:12 AM, Jason Rosenberg <jb...@squareup.com> wrote:

> Hmmm....I guess I was thinking that a broker could receive a message and
> keep it in memory, before having disk space reserved for its eventual
> storage.  Are you saying that memory is not allocated for a message without
> there already being disk space allocated for it?  In which case, there
> should be no problem!
>
> Jason

Re: Broker crashes when no space left for log.dirs

Posted by Jason Rosenberg <jb...@squareup.com>.
Hmmm....I guess I was thinking that a broker could receive a message and
keep it in memory, before having disk space reserved for its eventual
storage.  Are you saying that memory is not allocated for a message without
there already being disk space allocated for it?  In which case, there
should be no problem!

Jason


On Thu, Aug 15, 2013 at 10:44 AM, Jay Kreps <ja...@gmail.com> wrote:

> I don't think the filesystem will overcommit its disk space, but I'm
> actually not sure. I think this would only come into play on a fs like ext4
> which does lazy block allocation in addition to lazy writing. But I think
> even ext4 is probably not allowed to hand out more disk space than it has.

Re: Broker crashes when no space left for log.dirs

Posted by Jay Kreps <ja...@gmail.com>.
I don't think the filesystem will overcommit its disk space, but I'm
actually not sure. I think this would only come into play on a fs like ext4
which does lazy block allocation in addition to lazy writing. But I think
even ext4 is probably not allowed to hand out more disk space than it has.


On Thu, Aug 15, 2013 at 10:18 AM, Jason Rosenberg <jb...@squareup.com> wrote:

> A related question: Will producers sending messages with acknowledgment
> get a failed ack if a broker is out of disk space, or will messages get
> buffered in memory successfully (resulting in a good ack, before failing to
> be written)?
>
> It seems like it might be a good feature to have the broker auto-detect if
> its log dir is nearing full, so that there is some runway to shut down
> gracefully, while still writing any in-memory buffered messages.  It could be
> an optional threshold, like 98% full, or X MB free, etc.
>
> Jason

Re: Broker crashes when no space left for log.dirs

Posted by Jason Rosenberg <jb...@squareup.com>.
A related question: Will producers sending messages with acknowledgment
get a failed ack if a broker is out of disk space, or will messages get
buffered in memory successfully (resulting in a good ack, before failing to
be written)?

It seems like it might be a good feature to have the broker auto-detect if
its log dir is nearing full, so that there is some runway to shut down
gracefully, while still writing any in-memory buffered messages.  It could be
an optional threshold, like 98% full, or X MB free, etc.
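
A minimal sketch of such a check, written as an external watchdog rather
than a broker feature (the broker has no such option in this discussion).
The function names and parameters are hypothetical; shutil.disk_usage is
the standard-library call for reading filesystem totals.

```python
import shutil


def is_nearly_full(total_bytes, free_bytes,
                   max_used_fraction=0.98, min_free_bytes=0):
    """True if either threshold is crossed: percent used or absolute free."""
    used_bytes = total_bytes - free_bytes
    return (used_bytes >= max_used_fraction * total_bytes
            or free_bytes < min_free_bytes)


def log_dir_nearly_full(path, **thresholds):
    """Check the filesystem that holds a log directory."""
    usage = shutil.disk_usage(path)
    return is_nearly_full(usage.total, usage.free, **thresholds)


# Example: the "98% full" and "X MB free" variants from the message above.
print(is_nearly_full(100 * 10**9, 1 * 10**9))                          # 99% used
print(is_nearly_full(100 * 10**9, 50 * 10**9, min_free_bytes=60 * 10**9))
```

A scheduler could call log_dir_nearly_full on each configured log directory
and trigger a controlled shutdown or an alert when it returns True.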

Jason


On Wed, Aug 14, 2013 at 7:58 PM, Jay Kreps <ja...@gmail.com> wrote:

> The crash is actually just a call to shutdown. We think this is the right
> thing to do, though I agree it is unintuitive. Here is why. When you get an
> out of space error it is likely that the operating system did a partial
> write, leaving you with a corrupt log. Furthermore it is possible that
> space will free up at which point more writes on the log could succeed so
> you wouldn't even know there was a problem but all your consumers would hit
> this data and choke.
>
> By "crashing" the node we ensure that recovery is run on the log to bring
> it into a consistent state.
>
> Theoretically we could leave the node up accepting reads but rejecting
> writes while attempting to recover the log, but this is very complex and
> has a bunch of problems. Likely if you are out of space you are just going
> to keep getting writes, running out of space again, running recovery, and
> so on. This kind of crazy loop is much worse than just needing to bring the
> node back up.
>
> Alternatively we could leave the node up but go into some kind of
> write-rejecting mode forever. But this would still require that you restart
> the node, and we would have to implement that write-rejecting mode.
>
> Cheers,
>
> -Jay

Re: Broker crashes when no space left for log.dirs

Posted by Jay Kreps <ja...@gmail.com>.
The crash is actually just a call to shutdown. We think this is the right
thing to do, though I agree it is unintuitive. Here is why. When you get an
out of space error it is likely that the operating system did a partial
write, leaving you with a corrupt log. Furthermore it is possible that
space will free up at which point more writes on the log could succeed so
you wouldn't even know there was a problem but all your consumers would hit
this data and choke.

By "crashing" the node we ensure that recovery is run on the log to bring
it into a consistent state.
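
As an illustration of what recovery after a partial write might look like,
here is a small sketch. The record framing (4-byte length, 4-byte CRC32,
then payload) and all function names are assumptions made for the example;
this is not Kafka's on-disk format or its recovery code.

```python
import struct
import zlib

HEADER = struct.Struct(">II")  # payload length, CRC32 of payload


def encode_record(payload: bytes) -> bytes:
    """Frame a payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def valid_prefix_length(log: bytes) -> int:
    """Scan from the start and return the size of the last intact prefix."""
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        end = pos + HEADER.size + length
        if end > len(log):
            break  # record was cut short, e.g. by an out-of-space error
        if zlib.crc32(log[pos + HEADER.size:end]) != crc:
            break  # bytes are present but do not match their checksum
        pos = end
    return pos


def recover(log: bytes) -> bytes:
    """Drop everything from the first torn or corrupt record onward."""
    return log[:valid_prefix_length(log)]


good = encode_record(b"first") + encode_record(b"second")
torn = good + encode_record(b"third")[:6]  # simulate a partial write
print(recover(torn) == good)
```

The point of the sketch is that the torn tail has to be cut off before any
new record is appended; otherwise later records sit behind bytes that a
reader cannot parse.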

Theoretically we could leave the node up accepting reads but rejecting
writes while attempting to recover the log, but this is very complex and
has a bunch of problems. Likely if you are out of space you are just going
to keep getting writes, running out of space again, running recovery, and
so on. This kind of crazy loop is much worse than just needing to bring the
node back up.

Alternatively we could leave the node up but go into some kind of
write-rejecting mode forever. But this would still require that you restart
the node, and we would have to implement that write-rejecting mode.

Cheers,

-Jay


On Wed, Aug 14, 2013 at 9:52 AM, Bryan Baugher <bj...@gmail.com> wrote:

> Hi,
>
> This is more of a thought question than a problem that I need support for.
> I have been trying out Kafka 0.8.0-beta1 with replication. For our use case we
> want to try and guarantee that our consumers will see all messages even if
> they have fallen greatly behind the broker/producer. For this reason I
> wanted to know how the broker would react when the filesystem it writes its
> messages to is full. What I found was that the broker crashes and cannot be
> started until the filesystem has space again.
>
> Is there, or would it make sense to provide, configuration allowing the
> broker to reject writes in this case rather than crashing, electing a new
> leader and attempting the write again? I can clearly understand the use
> case that we don't want to 'lose' messages from the producer and I could
> also see how lack of filesystem space could be considered a machine
> failure, but with replication I would think if you are running out of space
> on 1 broker you are likely running out of space on others.
>
> Bryan
>