Posted to user@cassandra.apache.org by Jiri Horky <ho...@avast.com> on 2013/11/01 20:47:56 UTC

Recompacting all sstables

Hi all

Since we upgraded half of our Cassandra cluster to 2.0.0 and we use
LCS, we hit the CASSANDRA-6284 bug. Basically, all data in sstables
created after the upgrade is distributed wrongly (non-uniformly within
compaction levels). This causes a huge overhead when compacting new
sstables (see the bug for the details).

After applying the patch, the distribution of the data within a level
is supposed to recover over time, but we would prefer not to wait a
month or so until it gets better.

So the question: what is the best way to recompact all the sstables so
that each sstable within a level contains more or less the right
portion of the data, in other words, so that keys are uniformly
distributed across sstables within a level? (E.g.: assuming a total
token range for a node of 1..10000, and given that L2 should contain
100 sstables, each sstable within L2 should cover a range of ~100
tokens.)
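
To make that arithmetic concrete, here is a rough Python sketch of the
expectation; the fanout-of-10 level layout is assumed from the usual
LCS defaults, and the 1..10000 range is just the example above:

TOKEN_RANGE = 10000  # per-node token span from the example (1..10000)

# LCS nominally allows up to 10**N sstables in level N, so with uniformly
# distributed keys each sstable in level N should cover roughly
# TOKEN_RANGE / 10**N tokens.
for level in range(1, 4):
    sstables = 10 ** level
    span = TOKEN_RANGE / float(sstables)
    print("L%d: ~%d sstables, each covering ~%.0f tokens"
          % (level, sstables, span))

# L1: ~10 sstables, each covering ~1000 tokens
# L2: ~100 sstables, each covering ~100 tokens
# L3: ~1000 sstables, each covering ~10 tokens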

Based on the documentation, I can only think of switching to SizeTiered
compaction, doing a major compaction and then switching back to LCS.
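
Concretely, that round trip would look something like the Python sketch
below (driving cqlsh and nodetool; the keyspace/table names are
placeholders, cqlsh is assumed to support -e, and the compaction
options should be double-checked against the docs for your version):

import subprocess

KEYSPACE = "my_keyspace"  # placeholder
TABLE = "my_table"        # placeholder

def cql(statement):
    # assumes a cqlsh new enough to support -e/--execute
    subprocess.check_call(["cqlsh", "-e", statement])

def nodetool(*args):
    subprocess.check_call(["nodetool"] + list(args))

# 1. Temporarily switch the CF to size-tiered compaction (cluster-wide).
cql("ALTER TABLE %s.%s WITH compaction = "
    "{'class': 'SizeTieredCompactionStrategy'};" % (KEYSPACE, TABLE))

# 2. Major compaction, leaving One Big SSTable for this CF. This part is
#    per-node, so it has to be run on every node.
nodetool("compact", KEYSPACE, TABLE)

# 3. Switch back to leveled compaction; 160 MB is a commonly used target
#    sstable size, adjust to whatever the CF used before.
cql("ALTER TABLE %s.%s WITH compaction = "
    "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};"
    % (KEYSPACE, TABLE))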

Thanks in advance
Jiri Horky

Re: Recompacting all sstables

Posted by Jiri Horky <ho...@avast.com>.
Hi,

On 11/01/2013 09:15 PM, Robert Coli wrote:
> On Fri, Nov 1, 2013 at 12:47 PM, Jiri Horky <horky@avast.com
> <ma...@avast.com>> wrote:
>
>     since we upgraded half of our Cassandra cluster to 2.0.0 and we
>     use LCS,
>     we hit CASSANDRA-6284 bug.
>
>
> 1) Why upgrade a cluster to 2.0.0? Hopefully not a production cluster? [1]
I think you already guessed the answer :) It is a production cluster;
we needed some features (particularly compare-and-set) that are only
present in 2.0 because of our applications. Besides, somebody had to
discover the regression, right? :) Thanks for the link.
>
> 3) What do you mean by "upgraded half of our Cassandra cluster"? That
> is Not Supported and also Not Advised... for example, before the
> streaming change in the 2.x line, a cluster in such a state may be unable
> to have nodes added, removed or replaced.
We are in the middle of the migration from 1.2.9 to 2.0, during which
we are also upgrading our application, which can only run against 2.0
due to various technical details. It is rather hard to explain, but we
hoped it would last just a few days, and it is definitely not a state
we want to keep. Since we hit the bug, we got stalled in the middle of
the migration.
>
>     So the question. What is the best way to recompact all the sstables so
>     the data in one sstables within a level would contain more or less the
>     right portion of the data
>
> ... 
>
>     Based on documentation, I can only think of switching to SizeTiered
>     compaction, doing major compaction and then switching back to LCS.
>
>
> That will work, though be aware of the implication of CASSANDRA-6092
> [2]. Briefly, if the CF in question is not receiving write load, you
> will be unable to promote your One Big SSTable from L0 to L1. In that
> case, you might want to consider running sstablesplit (and then
> restarting the node) in order to split your One Big SSTable into two
> or more smaller ones.
Hmm, thinking about it a bit more, I am unsure this will actually
help. If I understand things correctly, assuming a uniform distribution
of newly received keys in L0 (ensured by RandomPartitioner), in order
for LCS to work optimally I need to:

a) get a uniform distribution of keys across sstables in one level,
i.e. in every level each sstable covers more or less the same range of
keys
b) have the sstables in each level cover almost the whole key space the
node is responsible for (a small sketch of what a) and b) mean in
token-span terms is below)
c) promote sstables to higher levels in a uniform fashion, e.g.
round-robin or random (over time, the probability of choosing an
sstable as a candidate should be the same for all sstables in the
level)
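
To spell out what a) and b) mean in terms of token spans, a tiny Python
helper (integer tokens are a simplification here; real tokens come from
the partitioner):

def level_coverage(sstable_spans, node_range=(1, 10000)):
    """sstable_spans: list of (first_token, last_token), one per sstable."""
    widths = [hi - lo for lo, hi in sstable_spans]
    node_width = node_range[1] - node_range[0]
    return {
        "sstables": len(sstable_spans),
        "min_span": min(widths),   # a) min and max span should be close
        "max_span": max(widths),
        "coverage": sum(widths) / float(node_width),  # b) should be ~1.0
    }

# A healthy L2 from the earlier example: 100 sstables of ~100 tokens each.
healthy_l2 = [(i * 100 + 1, (i + 1) * 100) for i in range(100)]
print(level_coverage(healthy_l2))  # min/max span ~100, coverage ~1.0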

By splitting the sorted Big SSTable, I will get a bunch of
non-overlapping sstables, so I will surely achieve a). Point c) is
fixed by the patch. But what about b)? It probably depends on the order
of compaction across levels, i.e. whether the compactions in the
various levels are run in parallel and interleaved or not. If it
compacts all the tables from one level and only then starts to compact
the sstables in the next level, and so on, one will end up in a very
similar situation to the one caused by the referenced bug (because of
the round-robin fashion of choosing candidates), i.e. having the
biggest keys in L1 and the smallest keys in the highest level. So in
that case, it would actually not help at all.

Does it make sense or am I completely wrong? :)

BTW: not a very thought-out idea, but wouldn't it actually be better to
select candidates completely randomly?
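
For what it's worth, here is a toy Python comparison of a round-robin
cursor versus purely random candidate selection within one level. Both
pick every sstable with the same long-run frequency; random just
spreads the picks less evenly over short windows. This only illustrates
the argument above, it is not Cassandra's actual selection code:

import random
from collections import Counter

SSTABLES = 100   # sstables in the level (cf. the L2 example)
ROUNDS = 100000  # candidate selections to simulate

def round_robin_picks(n, rounds):
    cursor, picks = 0, Counter()
    for _ in range(rounds):
        picks[cursor] += 1
        cursor = (cursor + 1) % n
    return picks

def random_picks(n, rounds):
    picks = Counter()
    for _ in range(rounds):
        picks[random.randrange(n)] += 1
    return picks

for name, picks in (("round-robin", round_robin_picks(SSTABLES, ROUNDS)),
                    ("random", random_picks(SSTABLES, ROUNDS))):
    counts = [picks[i] for i in range(SSTABLES)]
    print("%-11s min=%d max=%d (ideal %d per sstable)"
          % (name, min(counts), max(counts), ROUNDS // SSTABLES))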

Cheers
Jiri Horky

>
> =Rob
>
> [1] https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
> [2] https://issues.apache.org/jira/browse/CASSANDRA-6092


Re: Recompacting all sstables

Posted by Robert Coli <rc...@eventbrite.com>.
On Fri, Nov 1, 2013 at 12:47 PM, Jiri Horky <ho...@avast.com> wrote:

> since we upgraded half of our Cassandra cluster to 2.0.0 and we use LCS,
> we hit the CASSANDRA-6284 bug.


1) Why upgrade a cluster to 2.0.0? Hopefully not a production cluster? [1]

2) CASSANDRA-6284 is ouch, thx for filing and patching!

3) What do you mean by "upgraded half of our Cassandra cluster"? That is
Not Supported and also Not Advised... for example, before the streaming
change in the 2.x line, a cluster in such a state may be unable to have nodes
added, removed or replaced.

> So the question. What is the best way to recompact all the sstables so
> the data in one sstables within a level would contain more or less the
> right portion of the data
>
...

> Based on documentation, I can only think of switching to SizeTiered
> compaction, doing major compaction and then switching back to LCS.
>

That will work, though be aware of the implication of CASSANDRA-6092 [2].
Briefly, if the CF in question is not receiving write load, you will be
unable to promote your One Big SSTable from L0 to L1. In that case, you
might want to consider running sstablesplit (and then restarting the node)
in order to split your One Big SSTable into two or more smaller ones.
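
For reference, the split step might look roughly like the Python sketch
below, wrapping the sstablesplit tool that ships with Cassandra. The
data path is a placeholder, the --no-snapshot flag and the 160 MB -s
target are assumptions to verify against your version, and the node has
to be stopped while this runs (and restarted afterwards so it picks up
the new sstables):

import glob
import subprocess

# Placeholder path; point it at the keyspace/CF in question.
DATA_DIR = "/var/lib/cassandra/data/my_keyspace/my_table"

# Split every data file of the (stopped) node into smaller sstables.
for path in glob.glob(DATA_DIR + "/*-Data.db"):
    subprocess.check_call(
        ["sstablesplit", "--no-snapshot", "-s", "160", path])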

=Rob

[1] https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
[2] https://issues.apache.org/jira/browse/CASSANDRA-6092