Posted to user@cassandra.apache.org by Robert Wille <rw...@fold3.com> on 2015/01/08 14:12:21 UTC

Why does C* repeatedly compact the same tables over and over?

After bootstrapping a node, the node repeatedly compacts the same tables over and over, even though my cluster is completely idle. I’ve noticed the same behavior after extended periods of heavy writes. I realize that during bootstrapping (or extended periods of heavy writes) compaction could get seriously behind, but once a table has been compacted, I don’t see the need to recompact it dozens more times.

Possibly related: I often see OpsCenter report that nodes have a large number of pending tasks when the Pending column of the Thread Pool Stats doesn’t reflect that.

Robert

Re: Why does C* repeatedly compact the same tables over and over?

Posted by mck <mi...@apache.org>.

> Are you using Leveled compaction strategy? 


And if you're using Date Tiered compaction strategy on a table that
isn't time-series data, for example one where deletes happen, you'll
find it compacting over and over.

~mck

Re: Why does C* repeatedly compact the same tables over and over?

Posted by Eric Stevens <mi...@gmail.com>.

Are you using Leveled compaction strategy?  If you fall behind on
compaction in leveled (and you will during bootstrap), by default Cassandra
will fall back to size tiered compaction in level 0.  This produces
SSTables larger than sstable_size_in_mb, and those will be recompacted away
into level 1.  When level 1 gets full, its SSTables will be recompacted
into level 2.  If level 2 gets full, into level 3, and so on.
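
As a rough illustration of the cascade described above, the sketch below
estimates how deep a given amount of data has to travel through the levels.
It assumes sstable_size_in_mb is 160 and that each level holds ten times as
much as the one before it; both values, and all function names, are
assumptions made for illustration and are not read from a real cluster.

```python
# Sketch of leveled-compaction level capacities.
# Both constants are assumptions for illustration; check your own table settings.
SSTABLE_SIZE_MB = 160   # assumed value of sstable_size_in_mb
FANOUT = 10             # assumed size ratio between adjacent levels


def level_capacity_mb(level: int) -> int:
    """Approximate capacity in MB of level `level` (1 or higher)."""
    if level < 1:
        raise ValueError("this sketch only models levels 1 and above")
    return SSTABLE_SIZE_MB * FANOUT ** level


def deepest_level(data_mb: float) -> int:
    """Smallest level L such that levels 1..L together can hold data_mb."""
    level = 1
    held = level_capacity_mb(1)
    while held < data_mb:
        level += 1
        held += level_capacity_mb(level)
    return level


if __name__ == "__main__":
    for size_gb in (1, 50, 500):
        level = deepest_level(size_gb * 1024)
        # Data resting in level N was promoted N times (L0 -> L1 -> ... -> LN),
        # so N is a lower bound on how often that data was rewritten.
        print(f"{size_gb} GB: deepest level {level}, at least {level} rewrites")
```

Under these assumptions a node holding a few hundred GB in one table needs
four levels, so the oldest data is rewritten at least four times on its way
down, and usually more, because each promotion also rewrites the overlapping
SSTables already present in the target level.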

Leveled compaction is more I/O intensive than size tiered compaction for
reasons such as this.  The same data can be compacted numerous times before
a node settles down.  The upside is that it puts a practical upper bound on
the number of SSTables which can be involved in a read (not more than one
per level, not counting false positives from bloom filters).
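
The read bound mentioned above can be written down as a small sketch. It
assumes SSTables in level 1 and above do not overlap within a level, while
SSTables still waiting in level 0 may overlap and so all count toward a
read. The function name and the example numbers are hypothetical.

```python
def leveled_read_bound(populated_levels: int, l0_sstables: int = 0) -> int:
    """Upper bound on SSTables consulted for one partition under leveled compaction.

    Each populated level from 1 upward contributes at most one SSTable.
    SSTables in level 0 may overlap, so every one of them is counted.
    """
    if populated_levels < 0 or l0_sstables < 0:
        raise ValueError("counts must be non-negative")
    return populated_levels + l0_sstables


if __name__ == "__main__":
    # A settled node with four populated levels and an empty level 0.
    print(leveled_read_bound(4))
    # The same node with a hypothetical backlog of 30 SSTables in level 0,
    # as might happen right after bootstrap.
    print(leveled_read_bound(4, 30))
```

This is also why the bound only holds once the node has caught up: while
level 0 is backed up, reads can touch many more SSTables than the number of
levels.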

Leveled compaction is essentially trading increased I/O at write time
(including and especially bootstrap or repair) for decreased I/O at read
time.
