Posted to commits@cassandra.apache.org by "Vassil Lunchev (JIRA)" <ji...@apache.org> on 2016/06/13 15:16:21 UTC

[jira] [Updated] (CASSANDRA-11997) Add a STCS compaction subproperty for DESC order bucketing

     [ https://issues.apache.org/jira/browse/CASSANDRA-11997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vassil Lunchev updated CASSANDRA-11997:
---------------------------------------
    Description: 
Looking at SizeTieredCompactionStrategy.java -> getBuckets().

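This method is the only one that uses three of the ten STCS subproperties: bucket_high, bucket_low and min_sstable_size. It buckets the files by sorting them by size in ascending (ASC) order and then grouping them, which in practice depends only on bucket_high and min_sstable_size.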

getBuckets() practically doesn't use bucket_low at all. As long as it is between 0 and 1, the result doesn't depend on bucket_low. For example:

{code:java}
  public static void main(String[] args) {
    // Each pair is (human-readable label, size in bytes)
    List<Pair<String, Long>> files = new ArrayList<>();
    files.add(new Pair<>("10.1G", 10944793422L));
    files.add(new Pair<>("9.4G", 10056333820L));
    files.add(new Pair<>("8.7G", 9266612562L));
    files.add(new Pair<>("4.0G", 4254518390L));
    files.add(new Pair<>("3.5G", 3729627496L));
    files.add(new Pair<>("2.5G", 2587912419L));
    files.add(new Pair<>("2.2G", 2304124647L));
    files.add(new Pair<>("1.4G", 1485000127L));
    files.add(new Pair<>("1.3G", 1340382610L));
    files.add(new Pair<>("456M", 477906537L));
    files.add(new Pair<>("451M", 472012692L));
    files.add(new Pair<>("53M", 54968524L));
    files.add(new Pair<>("18M", 18447540L));
    // Arguments: files, bucketHigh = 1.5, bucketLow = 0.5, minSSTableSize = 50 MiB
    List<List<String>> buckets = getBuckets(files, 1.5, 0.5, 50L*1024*1024);
    System.out.println(buckets);
  }
{code}

The result is:
{code}
[[451M, 456M], [8.7G, 9.4G, 10.1G], [53M], [1.3G, 1.4G], [18M], [3.5G, 4.0G], [2.2G, 2.5G]]
{code}

You can test it with any value for bucketLow between 0 and 1; the result will be the same, and it contains no bucket that can be compacted.
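To make the bucket_low claim easy to check, here is a minimal, self-contained sketch of the bucketing loop as described above. It is a simplified stand-in, not the Cassandra source: the class name AscBucketingSketch, the method bucketAscending and the helper sampleFiles are hypothetical, and a plain Map of label to size replaces the Pair list. Because files arrive smallest first, each new file is at least as large as every existing bucket average, so the lower-bound check always passes when bucketLow is below 1.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for the ASC bucketing discussed above; not the Cassandra source.
class AscBucketingSketch {

    // Groups file labels into buckets keyed by each bucket's running average size.
    static Set<List<String>> bucketAscending(Map<String, Long> files, double bucketHigh,
                                             double bucketLow, long minSSTableSize) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(files.entrySet());
        sorted.sort(Map.Entry.comparingByValue()); // smallest file first (ASC)

        Map<Long, List<String>> buckets = new LinkedHashMap<>();
        for (Map.Entry<String, Long> file : sorted) {
            long size = file.getValue();
            Long match = null;
            for (Long average : buckets.keySet()) {
                boolean similar = size > average * bucketLow && size < average * bucketHigh;
                boolean bothSmall = size < minSSTableSize && average < minSSTableSize;
                if (similar || bothSmall) {
                    match = average;
                    break;
                }
            }
            List<String> bucket;
            long newAverage;
            if (match == null) {
                bucket = new ArrayList<>(); // no similar bucket: start a new one
                newAverage = size;
            } else {
                bucket = buckets.remove(match); // re-key the bucket under its new average
                newAverage = (match * bucket.size() + size) / (bucket.size() + 1);
            }
            bucket.add(file.getKey());
            buckets.put(newAverage, bucket);
        }
        return new HashSet<>(buckets.values());
    }

    // The thirteen SSTable sizes (in bytes) from the example above.
    static Map<String, Long> sampleFiles() {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("10.1G", 10944793422L);
        files.put("9.4G", 10056333820L);
        files.put("8.7G", 9266612562L);
        files.put("4.0G", 4254518390L);
        files.put("3.5G", 3729627496L);
        files.put("2.5G", 2587912419L);
        files.put("2.2G", 2304124647L);
        files.put("1.4G", 1485000127L);
        files.put("1.3G", 1340382610L);
        files.put("456M", 477906537L);
        files.put("451M", 472012692L);
        files.put("53M", 54968524L);
        files.put("18M", 18447540L);
        return files;
    }

    public static void main(String[] args) {
        long minSize = 50L * 1024 * 1024;
        // The same set of buckets is expected for every bucketLow below 1.
        for (double bucketLow : new double[] {0.1, 0.5, 0.9}) {
            System.out.println(bucketLow + " -> "
                    + bucketAscending(sampleFiles(), 1.5, bucketLow, minSize));
        }
    }
}
```

The result is returned as a set because the order in which buckets are listed is incidental; only the grouping matters for this comparison.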

However, if you reverse the initial sorting order to DESC (look at the files from largest to smallest) you get a completely different bucketing:

{code:java}
  // Compare p2 to p1 (instead of p1 to p2) so that the largest files come first
  return p2.right.compareTo(p1.right);
{code} 

{code:txt}
  [[456M, 451M], [4.0G, 3.5G, 2.5G, 2.2G], [10.1G, 9.4G, 8.7G], [53M], [1.4G, 1.3G], [18M]]
{code}

Now there is a bucket that can be compacted: [4.0G, 3.5G, 2.5G, 2.2G]
After that compaction, there will be one more bucket that can be compacted: [10.1G, 9.4G, 8.7G, <new>GB]
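As a small illustration of what "can be compacted" means here, the sketch below filters the two bucketings shown above by bucket size. It assumes a bucket becomes a candidate once it holds at least min_threshold SSTables, with the STCS default of 4; the real strategy applies further criteria (max_threshold, for one), and the names CompactableBucketsSketch and compactable are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative filter only; not part of Cassandra.
class CompactableBucketsSketch {

    // Bucketing produced with the current ASC ordering (copied from the output above).
    static final List<List<String>> ASC_BUCKETS = Arrays.asList(
            Arrays.asList("451M", "456M"),
            Arrays.asList("8.7G", "9.4G", "10.1G"),
            Arrays.asList("53M"),
            Arrays.asList("1.3G", "1.4G"),
            Arrays.asList("18M"),
            Arrays.asList("3.5G", "4.0G"),
            Arrays.asList("2.2G", "2.5G"));

    // Bucketing produced with the DESC ordering (copied from the output above).
    static final List<List<String>> DESC_BUCKETS = Arrays.asList(
            Arrays.asList("456M", "451M"),
            Arrays.asList("4.0G", "3.5G", "2.5G", "2.2G"),
            Arrays.asList("10.1G", "9.4G", "8.7G"),
            Arrays.asList("53M"),
            Arrays.asList("1.4G", "1.3G"),
            Arrays.asList("18M"));

    // Keeps only the buckets that hold at least minThreshold SSTables.
    static List<List<String>> compactable(List<List<String>> buckets, int minThreshold) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> bucket : buckets) {
            if (bucket.size() >= minThreshold) {
                result.add(bucket);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int minThreshold = 4; // assumed: the STCS default
        System.out.println("ASC:  " + compactable(ASC_BUCKETS, minThreshold));
        // prints: ASC:  []
        System.out.println("DESC: " + compactable(DESC_BUCKETS, minThreshold));
        // prints: DESC: [[4.0G, 3.5G, 2.5G, 2.2G]]
    }
}
```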

The sizes given here are real values from a production Cassandra deployment. We would like an aggressive STCS compaction that compacts as soon as reasonably possible. (I know about LCS; let's keep it out of this ticket.) However, since the ordering in getBuckets is ASC, we cannot do much with configuration parameters. Specifically, using min_threshold = 3 is not helping; it all boils down to the ordering.

Setting bucket_high = 2 is probably an option, but then why does Cassandra offer a property that doesn't change anything? With a fixed ASC ordering, bucket_low is literally useless.

I would like to have the ability to configure DESC ordering. My suggestion is to add a new compaction subproperty for STCS, for example named bucket_iteration_order, which defaults to ASC for backward compatibility but can be switched to DESC when a more aggressive ordering is required.
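One possible shape for such an option, as a rough sketch only: the subproperty is read from the compaction options map, falls back to ASC when absent, and is turned into the comparator used for the initial sort. Everything here is hypothetical, including the class BucketIterationOrderSketch, the Order enum and the parse and sizeComparator helpers; none of it exists in Cassandra.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the proposed subproperty; nothing here exists in Cassandra.
class BucketIterationOrderSketch {

    static final String OPTION = "bucket_iteration_order"; // name suggested in this ticket

    enum Order { ASC, DESC }

    // Reads the option, defaulting to ASC so that existing tables behave as before.
    static Order parse(Map<String, String> options) {
        String raw = options.get(OPTION);
        if (raw == null) {
            return Order.ASC;
        }
        // Throws IllegalArgumentException for any value other than ASC or DESC.
        return Order.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    // Comparator for the initial sort of SSTable sizes.
    static Comparator<Long> sizeComparator(Order order) {
        Comparator<Long> ascending = Comparator.naturalOrder();
        return order == Order.DESC ? ascending.reversed() : ascending;
    }

    public static void main(String[] args) {
        Map<String, String> options = Collections.singletonMap(OPTION, "DESC");
        List<Long> sizes = new ArrayList<>(Arrays.asList(18447540L, 10944793422L, 477906537L));
        sizes.sort(sizeComparator(parse(options)));
        System.out.println(sizes); // prints: [10944793422, 477906537, 18447540]
    }
}
```

Keeping ASC as the fallback means a table that never sets the option would see no change in behaviour.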

  was:
Looking at SizeTieredCompactionStrategy.java -> getBuckets().

This method is the only one using 3 of the 10 subproperties of STCS. It buckets the files by sorting them ASC and then grouping them using bucket_high and min_sstable_size.

getBuckets() practically doesn't use bucket_low at all. As long as it is between 0 and 1, the result doesn't depend on bucket_low. For example:

{code:java}
  public static void main(String[] args) {
    List<Pair<String, Long>> files = new ArrayList<>();
    files.add(new Pair<>("10.1G", 10944793422l));
    files.add(new Pair<>("9.4G", 10056333820l));
    files.add(new Pair<>("8.7G", 9266612562l));
    files.add(new Pair<>("4.0G", 4254518390l));
    files.add(new Pair<>("3.5G", 3729627496l));
    files.add(new Pair<>("2.5G", 2587912419l));
    files.add(new Pair<>("2.2G", 2304124647l));
    files.add(new Pair<>("1.4G", 1485000127l));
    files.add(new Pair<>("1.3G", 1340382610l));
    files.add(new Pair<>("456M", 477906537l));
    files.add(new Pair<>("451M", 472012692l));
    files.add(new Pair<>("53M", 54968524l));
    files.add(new Pair<>("18M", 18447540l));
    List<List<String>> buckets = getBuckets(files, 1.5, 0.5, 50l*1024*1024);
    System.out.println(buckets);
  }
{code}

The result is:
{code}
[[451M, 456M], [8.7G, 9.4G, 10.1G], [53M], [1.3G, 1.4G], [18M], [3.5G, 4.0G], [2.2G, 2.5G]]
{code}

You can test it with any value for bucketLow between 0 and 1, the result will be the same. And it contains no buckets that can be compacted.

However, if you reverse the initial sorting order to DESC (look at the files from largest to smallest) you get a completely different bucketing:

{code:java}
  return p2.right.compareTo(p1.right);
{code} 

{code:txt}
  [[456M, 451M], [4.0G, 3.5G, 2.5G, 2.2G], [10.1G, 9.4G, 8.7G], [53M], [1.4G, 1.3G], [18M]]
{code}

Now there is a bucket that can be compacted: [4.0G, 3.5G, 2.5G, 2.2G]
After that compaction, there will be one more bucket that can be compacted: [10.1G, 9.4G, 8.7G, <new>GB]

The sizes given here are real values, from a production load Cassandra deployment. We would like to have an aggressive STCS compaction that compacts as soon as reasonably possible. (I know about LCS, let's not include it in this ticket). However since the ordering in getBuckets is ASC, we cannot do much with configuration parameters. Specifically, using min_threshold = 3 is not helping - it all boils down to the ordering.

Probably bucket_high = 2 is an option, but then why does Cassandra offer a property that doesn't change anything (with a fixed ASC ordering, bucket_low is literally useless)

I would like to have the ability to configure DESC ordering. My suggestion is to add a new compaction subproperty for STCS, for example named bucket_iteration_order, which has ASC by default for backward compatibility.


> Add a STCS compaction subproperty for DESC order bucketing
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-11997
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11997
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Vassil Lunchev
>


