Posted to commits@cassandra.apache.org by "Marcus Eriksson (JIRA)" <ji...@apache.org> on 2016/08/17 08:26:21 UTC

[jira] [Commented] (CASSANDRA-10540) RangeAwareCompaction

    [ https://issues.apache.org/jira/browse/CASSANDRA-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424092#comment-15424092 ] 

Marcus Eriksson commented on CASSANDRA-10540:
---------------------------------------------

These "benchmarks" have been run using cassandra-stress with [this|https://paste.fscking.us/display/jKc0X89MLFzHE9jhRqQ5xfvRHeU] yaml (only modified per run with the different compaction configurations). cassandra-stress generates 40GB of data and then it compacts those sstables using 8 threads. All tests were run with 256 tokens on my machine (2 ssds, 32GB ram):
{code}
./tools/bin/compaction-stress write -d /var/lib/cassandra -d /home/marcuse/cassandra -g 40 -p blogpost-range.yaml -t 4 -v 256
./tools/bin/compaction-stress compact -d /var/lib/cassandra -d /home/marcuse/cassandra -p blogpost-range.yaml -t 8 -v 256
{code}

First a baseline: it takes about 7 minutes to compact 40GB of data with STCS, and we get a write amplification (compaction bytes written / size before) of about 1.46 (see the quick check after the table).
* 40GB + STCS
||size before||size after||compaction bytes written||time||number of compactions||
|42986704571|31305948786|62268272752|7:44|26|
|43017694284|31717603488|62800073327|7:04|26|
|42863193047|31244649872|64673778727|6:44|26|
|42962733336|31842455113|62985984309|6:14|26|
|43107421526|32526047125|61657717328|6:04|26|
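
For reference, the write amplification figure can be recomputed directly from the table above. This is just a quick Python sanity check of the ratio defined above, not part of the compaction-stress output:
{code}
# Not benchmark output - just the write amplification ratio defined above
# (compaction bytes written / size before), applied to the five STCS runs.
stcs_runs = [
    # (size before, compaction bytes written)
    (42986704571, 62268272752),
    (43017694284, 62800073327),
    (42863193047, 64673778727),
    (42962733336, 62985984309),
    (43107421526, 61657717328),
]

def write_amplification(runs):
    total_before = sum(before for before, _ in runs)
    total_written = sum(written for _, written in runs)
    return total_written / total_before

print(f"STCS baseline write amplification: {write_amplification(stcs_runs):.2f}")
# prints ~1.46
{code}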

With range aware compaction and a small min_range_sstable_size_in_mb we compact slower, taking about 2x the time, but the end result is smaller, with a slightly smaller
write amplification (1.44). The reason for the longer time is that we need to do a lot more tiny compactions for each vnode. The reason for the smaller size after the compactions is that we are much more likely to compact overlapping sstables together since we compact within each vnode.
* 40GB + STCS + range_aware + min_range_sstable_size_in_mb: 1
||size before||size after||compaction bytes written||time||number of compactions||
|42944940703|25352795435|61734295478|13:18|286|
|42896304174|25830662102|62049066195|15:45|287|
|43091495756|24811367911|61448601743|12:25|287|
|42961529234|26275106863|63118850488|13:17|284|
|42902111497|25749453764|61529524300|13:54|280|

As we increase min_range_sstable_size_in_mb, the time spent is reduced, the size after compaction increases and the number of compactions is reduced, since we don't promote sstables to the per-vnode strategies as quickly (see the summary script after the tables below). With a large enough min_range_sstable_size_in_mb the behaviour will be the same as STCS (plus a small overhead for estimating the size of the next vnode range during compaction).
* 40GB + STCS + range_aware + min_range_sstable_size_in_mb: 5
||size before||size after||compaction bytes written||time||number of compactions||
|43071111106|27586259306|62855258024|10:35|172|
* 40GB + STCS + range_aware + min_range_sstable_size_in_mb: 10
||size before||size after||compaction bytes written||time||number of compactions||
|42998501805|28281735688|65469323764|9:45|109|
* 40GB + STCS + range_aware + min_range_sstable_size_in_mb: 20
||size before||size after||compaction bytes written||time||number of compactions||
|42801860659|28934194973|66554340039|10:05|48|
* 40GB + STCS + range_aware + min_range_sstable_size_in_mb: 50
||size before||size after||compaction bytes written||time||number of compactions||
|42881416448|30352758950|61223610818|7:25|27|
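
To make the trend easier to see, here is a small summary over the first run of each setting above (a quick Python aggregation of the table data, not part of the benchmark output):
{code}
# (min_range_sstable_size_in_mb, size before, size after, number of compactions),
# taken from the first run of each setting above.
runs = [
    (1,  42944940703, 25352795435, 286),
    (5,  43071111106, 27586259306, 172),
    (10, 42998501805, 28281735688, 109),
    (20, 42801860659, 28934194973, 48),
    (50, 42881416448, 30352758950, 27),
]

for setting, before, after, compactions in runs:
    print(f"min_range={setting:>2}MB: size after / size before = {after / before:.2f}, "
          f"compactions = {compactions}")
# As the setting grows, both numbers approach the plain STCS baseline above
# (~0.73 of the data left and 26 compactions per run).
{code}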

With LCS and a small sstable_size_in_mb we see a huge difference with range aware compaction, because of the number of compactions needed to establish the leveling without it. With range aware, we get fewer levels in each vnode range, and that is much quicker to compact. Write amplification is about 2.0 with range aware and 3.4 without.
* 40GB + LCS + sstable_size_in_mb: 10 + range_aware + min_range_sstable_size_in_mb: 10
||size before||size after||compaction bytes written||time||number of compactions||
|43170254812|26511935628|87637370434|19:55|903|
|43015904097|26100197485|83125478305|14:45|854|
|43188886684|25651102691|87520409116|19:55|920|

* 40GB + LCS + sstable_size_in_mb: 10
||size before||size after||compaction bytes written||time||number of compactions||
|43099495889|23876144309|139000531662|28:25|3751|
|42811000078|24620085107|147722973544|30:35|3909|
|42879141849|24479485292|146194679395|30:46|3882|
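
The 2.0 vs 3.4 figures can be recomputed from the two tables above using the same ratio as before (again just a Python sanity check, not benchmark output):
{code}
# Write amplification = sum(compaction bytes written) / sum(size before),
# over the three runs in each table above.
def write_amplification(runs):
    return sum(written for _, written in runs) / sum(before for before, _ in runs)

lcs10_range_aware = [
    (43170254812, 87637370434),
    (43015904097, 83125478305),
    (43188886684, 87520409116),
]
lcs10_plain = [
    (43099495889, 139000531662),
    (42811000078, 147722973544),
    (42879141849, 146194679395),
]

print(f"LCS 10MB + range aware: {write_amplification(lcs10_range_aware):.2f}")  # prints ~2.00
print(f"LCS 10MB:               {write_amplification(lcs10_plain):.2f}")        # prints ~3.36, i.e. the ~3.4 quoted above
{code}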

If we bump the LCS sstable_size_in_mb to the default (160) we get more similar results. Write amplification is smaller with range aware compaction, but the size after compaction is also bigger. The reason for the bigger size after compaction has settled is that we run with a bigger min_range_sstable_size_in_mb, which means more data stays out of the per-range compaction strategies and is therefore only size tiered. This probably also explains the reduced write amplification: 2.0 with range aware and 2.3 without.
* 40GB + LCS + sstable_size_in_mb: 160 + range_aware + min_range_sstable_size_in_mb: 20
||size before||size after||compaction bytes written||time||number of compactions||
|42970784099|27044941599|85933586287|12:55|180|
|42953512565|26229232777|82158863291|11:36|155|
|43028281629|26025950993|86704157660|11:25|177|

* 40GB + LCS + sstable_size_in_mb: 160
||size before||size after||compaction bytes written||time||number of compactions||
|43120992697|24487560567|100347633105|12:25|151|
|42854926611|24466503628|102492898148|10:55|155|
|42919253642|24831918330|100902215961|12:15|161|
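
The same check for the default-sized LCS runs, this time also comparing the fraction of data left after compaction, which is what drives the size difference discussed above (a sanity check only, not benchmark output):
{code}
# (size before, size after, compaction bytes written), three runs each,
# taken from the two tables above.
lcs160_range_aware = [
    (42970784099, 27044941599, 85933586287),
    (42953512565, 26229232777, 82158863291),
    (43028281629, 26025950993, 86704157660),
]
lcs160_plain = [
    (43120992697, 24487560567, 100347633105),
    (42854926611, 24466503628, 102492898148),
    (42919253642, 24831918330, 100902215961),
]

def summarize(name, runs):
    before = sum(b for b, _, _ in runs)
    after = sum(a for _, a, _ in runs)
    written = sum(w for _, _, w in runs)
    print(f"{name}: WA = {written / before:.2f}, size after / size before = {after / before:.2f}")

summarize("LCS 160MB + range aware", lcs160_range_aware)  # WA ~1.98, ~0.61 of the data left
summarize("LCS 160MB              ", lcs160_plain)        # WA ~2.36, ~0.57 of the data left
{code}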


> RangeAwareCompaction
> --------------------
>
>                 Key: CASSANDRA-10540
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10540
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>              Labels: compaction, lcs, vnodes
>             Fix For: 3.x
>
>
> Broken out from CASSANDRA-6696, we should split sstables based on ranges during compaction.
> Requirements:
> * don't create tiny sstables - keep them bunched together until a single vnode is big enough (configurable how big that is)
> * make it possible to run existing compaction strategies on the per-range sstables
> We should probably add a global compaction strategy parameter that states whether this should be enabled or not.


