Posted to user@cassandra.apache.org by Erik Forsberg <fo...@opera.com> on 2014/09/20 09:11:05 UTC

Running out of disk at bootstrap in low-disk situation

Hi!

We have unfortunately managed to put ourselves in a situation where we are
really close to full disks on our existing 27 nodes.

We are now trying to add 15 more nodes, but the new nodes are running out
of disk space while joining.

We're using vnodes, on Cassandra 1.2.18 (yes, I know that's old, and I'll
upgrade as soon as I'm out of this problematic situation).

I've added all the 15 nodes, with some time in between - definitely more
than the 2-minute rule. But it seems like compaction is not keeping up with
the incoming data. Or at least that's my theory.

What are the recommended settings to avoid this problem? I have now set
compaction throughput to 0 for unlimited compaction bandwidth, hoping that
will help (will it?).

Will it help to lower the streaming throughput too? I'm unsure about the
latter since from observation it seems that compaction will not start until
it has finished streaming from a node. With 27 nodes sharing the incoming
bandwidth, all of them will take an equally long time to finish and then the
compaction can occur. I guess I could limit streaming bandwidth on some of
the source nodes too. Or am I completely wrong here?
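
The bandwidth-sharing reasoning above can be sketched as a toy model. This
is only an illustration, not a description of how Cassandra actually
schedules streams: it assumes each source sends the same amount of data,
that the joining node's incoming bandwidth is split max-min fairly between
active streams, and that a per-source cap simply limits that source's rate.
The function names and all numbers are made up for the example.

```python
import math


def fair_rates(active, caps, incoming):
    """Max-min fair split of `incoming` among active sources, honouring caps."""
    rates = {}
    left = incoming
    # Serve the most tightly capped sources first so that the bandwidth
    # they cannot use is redistributed to the remaining sources.
    ordered = sorted(active, key=lambda i: caps[i])
    for pos, i in enumerate(ordered):
        share = left / (len(ordered) - pos)
        rates[i] = min(caps[i], share)
        left -= rates[i]
    return rates


def finish_times(data_mb, caps_mb_s, incoming_mb_s):
    """Return the time (seconds) at which each source finishes streaming.

    data_mb:       data each source must send (hypothetical values)
    caps_mb_s:     per-source rate cap in MB/s, or None for uncapped
                   (caps and incoming_mb_s are assumed to be > 0)
    incoming_mb_s: total incoming bandwidth of the joining node
    """
    caps = [math.inf if c is None else c for c in caps_mb_s]
    remaining = list(data_mb)
    done = [0.0] * len(data_mb)
    active = {i for i, d in enumerate(remaining) if d > 0}
    now = 0.0
    while active:
        rates = fair_rates(active, caps, incoming_mb_s)
        time_left = {i: remaining[i] / rates[i] for i in active}
        dt = min(time_left.values())  # time until the next stream completes
        now += dt
        for i in list(active):
            if time_left[i] <= dt * (1 + 1e-12):
                done[i] = now
                active.remove(i)
            else:
                remaining[i] -= rates[i] * dt
    return done


if __name__ == "__main__":
    n = 27  # source nodes, as in the cluster described above
    data = [100.0] * n  # MB per source, invented for illustration
    print(finish_times(data, [None] * n, 270.0))
    # Throttle 9 of the sources to 5 MB/s and leave the rest uncapped.
    print(finish_times(data, [5.0] * 9 + [None] * 18, 270.0))
```

In this toy setup every stream finishes at the same moment when nothing is
capped, while capping 9 sources lets the other 18 finish earlier (at 8 s
instead of 10 s) at the price of the capped ones finishing later (at 20 s).
Whether earlier stream completion actually lets compaction start sooner is
the observation above, not something the model shows.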

Other ideas most welcome.

Regards,
\EF

Re: Running out of disk at bootstrap in low-disk situation

Posted by Robert Coli <rc...@eventbrite.com>.
On Sat, Sep 20, 2014 at 12:11 AM, Erik Forsberg <fo...@opera.com> wrote:

> I've added all the 15 nodes, with some time in between - definitely more
> than the 2-minute rule. But it seems like compaction is not keeping up with
> the incoming data. Or at least that's my theory.
>

I personally would not combine vnodes with adding more than one node at a
time, at least for now. I understand that you have a lot of nodes to add,
but doing so is potentially confounding the situation.

I conjecture that you are using leveled compaction. There is in your version
a pathological behavior during bootstrap where one ends up doing a lot of
compaction. I *think*, but am not sure, that the workaround is to use
size-tiered compaction during bootstrap. I *believe* that is what the patch
upstream effectively does.

Probably unthrottling compaction will help, assuming you are not CPU or i/o
bound there.
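
For reference, the throttles discussed in this thread can usually be changed
at runtime with nodetool. The sketch below only builds the command lines and
does not run them. It assumes the `setcompactionthroughput` and
`setstreamthroughput` subcommands are available in your version, that 0
means "unthrottled" for compaction, and that the unit of the streaming value
is whatever `nodetool help` reports for your release. The function name,
host name and streaming value are hypothetical.

```python
def throttle_commands(host, compaction_throughput, stream_throughput):
    """Build nodetool command lines for one node (nothing is executed).

    host:                  node to contact (hypothetical name in the example)
    compaction_throughput: compaction throttle; 0 is assumed to disable it
    stream_throughput:     streaming throttle; check `nodetool help` for units
    """
    base = ["nodetool", "-h", host]
    return [
        base + ["setcompactionthroughput", str(compaction_throughput)],
        base + ["setstreamthroughput", str(stream_throughput)],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for cmd in throttle_commands("new-node-01", 0, 100):
        print(" ".join(cmd))
```

Printing rather than executing keeps the example safe to run anywhere; each
list could be handed to subprocess.run() on a machine that has nodetool.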

#cassandra on freenode is probably a slightly better forum for interactive
discussion of detailed operational questions about production environments.

=Rob