Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2011/02/12 00:42:58 UTC

[jira] Created: (CASSANDRA-2156) Compaction Throttling

Compaction Throttling
---------------------

                 Key: CASSANDRA-2156
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Stu Hood
             Fix For: 0.8


Compaction is currently relatively bursty: we compact as fast as we can, and then we wait for the next compaction to be possible ("hurry up and wait").

Instead, to properly amortize compaction, you'd like to compact exactly as fast as you need to in order to keep the sstable count under control.

For every new level of compaction, you need to increase the rate at which you compact. A rule of thumb that we're testing on our clusters is to determine the maximum number of buckets a node can support (i.e., if the 15th bucket holds 750 GB, we're not going to have more than 15 buckets), and then multiply the flush throughput by the number of buckets to get the minimum compaction throughput needed to maintain your sstable count.

Full explanation: for a min compaction threshold of {{T}}, the bucket at level {{N}} can contain {{SsubN = T^N}} 'units' (unit == one memtable's worth of data on disk). Every time a new unit is added, it has a {{1/SsubN}} chance of causing the bucket at level N to fill. If the bucket at level N fills, it causes {{SsubN}} units to be compacted. So, for each active level in your system you have {{SsubN * 1 / SsubN}}, or {{1}} amortized unit to compact any time a new unit is added.
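
The amortized argument above can be checked with a small simulation. This is only a sketch in Python, not code from any attached patch: the function name {{simulate_units_compacted}} is hypothetical, and the model is deliberately simplified (no overwrites, every flush produces one unit, and a compaction merges exactly {{T}} equal-sized sstables into one sstable at the next level).

```python
def simulate_units_compacted(threshold, flushes):
    """Return the total number of units read by compaction after `flushes` flushes.

    Assumed model: counts[k] is the number of sstables of size threshold**k.
    When a level collects `threshold` sstables they are merged into a single
    sstable at the next level, which reads threshold**(k + 1) units.
    """
    counts = [0]
    compacted = 0
    for _ in range(flushes):
        counts[0] += 1  # one new memtable-sized unit is flushed
        level = 0
        # A full level compacts upward, which may in turn fill the next level.
        while counts[level] == threshold:
            compacted += threshold ** (level + 1)
            counts[level] = 0
            level += 1
            if level == len(counts):
                counts.append(0)
            counts[level] += 1
    return compacted


flushes = 16
total = simulate_units_compacted(2, flushes)
# 16 flushes with T=2 exercise 4 compacting levels, so this prints 4.0:
# one amortized unit of compaction per active level per flushed unit.
print(total / flushes)
```

Under these assumptions the compaction work per flushed unit equals the number of active levels, which is where the "flush throughput times number of buckets" rule of thumb comes from.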

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment:     (was: for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt)

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt

[jira] [Updated] (CASSANDRA-2156) Compaction Throttling

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2156:
--------------------------------------

    Reviewer: slebresne
    Assignee: Stu Hood

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 0005-Throttle-total-compaction-to-a-configurable-throughput.txt, for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt

[jira] Updated: (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment:     (was: for-0.6-0002-Make-compaction-throttling-configurable.txt)

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt

[jira] [Issue Comment Edited] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013867#comment-13013867 ] 

Stu Hood edited comment on CASSANDRA-2156 at 3/31/11 10:08 AM:
---------------------------------------------------------------

1. Fixed
2. Added and used {{FileUtils.close(Collection<Closeable>)}}
3. targetBytesPerMS only changes when the number of active threads changes: it leads to nice (imo) periodic feedback of running compactions in the log when compactions start or finish
4. Assuming compaction multithreading makes it in, throttling should never be disabled... for someone who really wants to disable it, setting it to a high enough value that it never kicks in should be sufficient?
5. Maybe... but dynamically adjusting the frequency at which we throttle and update {{bytesRead}} would probably be better to do in another ticket?
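
To make points 3 and 5 concrete, here is a rough sketch in Python of how a read-side throttle of this kind could work. It is not the patch's code: the function names, the even split of the total throughput across active compaction threads, and the sleep calculation are all assumptions made for illustration. Only the names {{targetBytesPerMS}}, {{bytesRead}} and {{compaction_throughput_mb_per_sec}} come from this ticket.

```python
def target_bytes_per_ms(throughput_mb_per_sec, active_compactions):
    """Per-thread read budget, assuming the total is split evenly across threads."""
    total_bytes_per_ms = throughput_mb_per_sec * 1024 * 1024 / 1000.0
    return total_bytes_per_ms / max(1, active_compactions)


def throttle_delay_ms(bytes_read, elapsed_ms, target):
    """Milliseconds to sleep so the average read rate drops back to `target`.

    bytes_read: bytes read since the last throttle check
    elapsed_ms: wall-clock time those reads took
    """
    if target <= 0:
        return 0.0
    # Time the reads *should* have taken at the target rate.
    expected_ms = bytes_read / target
    return max(0.0, expected_ms - elapsed_ms)


# Two active compactions sharing a 16 MB/s budget: each gets about 8 MB/s.
per_thread = target_bytes_per_ms(16, 2)
# A thread that read 8 MiB in 500 ms is running about twice too fast,
# so it should pause for roughly another 500 ms.
delay = throttle_delay_ms(8 * 1024 * 1024, 500, per_thread)
```

In this sketch the per-thread target only needs recomputing when the number of active compactions changes, which matches the behaviour described in point 3.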

----
Regarding the approach to setting compaction_throughput_mb_per_sec: each bucket probably contains {{MIN_THRESHOLD}} times more data than the previous bucket, and needs to be compacted {{1 / MIN_THRESHOLD}} times as often (see the math in the description). This means that the number of buckets influences how fast you need to compact, and that each additional bucket adds a linear amount of necessary throughput (+ 1x your flush rate). Therefore, if you have 15 bucket levels, and you are flushing {{1 MB/s}}, you need to compact at {{1 MB/s * 15}}.

As an example: with {{MIN_THRESHOLD=2}}, each bucket is twice as large as the previous. Say that we have 4 levels (buckets of sizes 1, 2, 4, 8) and that we need a compaction in the largest bucket. The amount of data that needs to be compacted in that bucket will be equal to 1 more than the sum of the sizes of all the other buckets (1 + 2 + 4 == 8 - 1). So, ideally we would be able to compact those 8 units in _exactly_ the time it takes for 1 more unit to be flushed, and for the compactions of the other buckets to trickle up and refill the largest bucket. Phew.

CASSANDRA-2171 will allow us to calculate the flush rate, which we can then multiply by the count of buckets (note... one tiny missing piece is determining how many buckets are "empty": an empty bucket is not created in the current approach).
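
The arithmetic in the paragraphs above can be written down as a short Python sketch. The helper names {{bucket_sizes}} and {{min_compaction_throughput}} are hypothetical, and the flush rate is passed in by hand because measuring it is left to CASSANDRA-2171.

```python
def bucket_sizes(levels, min_threshold):
    """Sizes (in memtable-sized units) of each bucket level, smallest first."""
    return [min_threshold ** n for n in range(levels)]


def min_compaction_throughput(flush_mb_per_sec, levels):
    """Rule of thumb from this ticket: one flush rate's worth per bucket level."""
    return flush_mb_per_sec * levels


# MIN_THRESHOLD=2 with 4 levels gives buckets of sizes 1, 2, 4, 8.
sizes = bucket_sizes(4, 2)
# The largest bucket holds 1 more than all smaller buckets combined.
assert sum(sizes[:-1]) == sizes[-1] - 1

# 15 bucket levels while flushing 1 MB/s calls for 15 MB/s of compaction.
needed = min_compaction_throughput(1.0, 15)
```

The resulting figure is a lower bound for {{compaction_throughput_mb_per_sec}} under this model; it counts only non-empty bucket levels, which is the missing piece noted above.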

----
> Final question. Would it be better to have fewer parallel compactions
As a base case, with no parallelism at all, you _will_ fall behind on compaction, because every new bucket is a chance to compact. It's a fundamental question, but I haven't thought about it... sorry.

      was (Author: stuhood):
    1. Fixed
2. Added and used {{FileUtils.close(Collection<Closeable>)}}
3. targetBytesPerMS only changes when the number of active threads changes: it leads to nice (imo) periodic feedback of running compactions in the log when compactions start or finish
4. Assuming compaction multithreading makes it in, throttling should never be disabled... for someone who really wants to disable it, setting it to a high enough value that it never kicks in should be sufficient?
5. Maybe... but dynamically adjusting the frequency at which we throttle and update {{bytesRead}} would probably be better to do in another thread?

----
Regarding the approach to setting compaction_throughput_mb_per_sec: each bucket probably contains {{MIN_THRESHOLD}} times more data than the previous bucket, and needs to be compacted {{1 / MIN_THRESHOLD}} times as often (see the math in the description). This means that the number of buckets influences how fast you need to compact, and that each additional bucket adds a linear amount of necessary throughput (+ 1x your flush rate). Therefore, if you have 15 bucket levels, and you are flushing {{1 MB/s}}, you need to compact at {{1 MB/s * 15}}.

As an example: with {{MIN_THRESHOLD=2}}, each bucket is twice as large as the previous. Say that we have 4 levels (buckets of sizes 1, 2, 4, 8) and that we need a compaction in the largest bucket. The amount of data that needs to be compacted in that bucket will be equal to 1 more than the sum of the sizes of all the other buckets (1 + 2 + 4 == 8 - 1). So, ideally we would be able to compact those 8 units in _exactly_ the time it takes for 1 more unit to be flushed, and for the compactions of the other buckets to trickle up and refill the largest bucket. Phew.

CASSANDRA-2171 will allow us to calculate the flush rate, which we can then multiply by the count of buckets (note... one tiny missing piece is determining how many buckets are "empty": an empty bucket is not created in the current approach).

----
> Final question. Would it be better to have fewer parallel compactions
As a base case, with no parallelism at all, you _will_ fall behind on compaction, because every new bucket is a chance to compact. It's a fundamental question, but I haven't thought about it... sorry.
  
> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 0005-Throttle-total-compaction-to-a-configurable-throughput.txt, for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt

[jira] [Updated] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment:     (was: 0001-Throttle-total-compaction-to-a-configurable-throughput.txt)

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt

[jira] [Commented] (CASSANDRA-2156) Compaction Throttling

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018255#comment-13018255 ] 

Hudson commented on CASSANDRA-2156:
-----------------------------------

Integrated in Cassandra #848 (See [https://hudson.apache.org/hudson/job/Cassandra/848/])
    Compaction throttling
patch by stuhood; reviewed by slebresne for CASSANDRA-2156


> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 0007-Throttle-total-compaction-to-a-configurable-throughput.txt, for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt

[jira] Commented: (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993827#comment-12993827 ] 

Stu Hood commented on CASSANDRA-2156:
-------------------------------------

Actually, this throttling probably needs to occur on the read side to properly account for cases with lots of updates... on the write side, we might have compacted the data down by 32x for example.

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt

[jira] Commented: (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993828#comment-12993828 ] 

Stu Hood commented on CASSANDRA-2156:
-------------------------------------

> I'm pretty uncomfortable committing changes to 0.6 compaction at this point.
Oh yea... I mostly posted this particular version for rcoli's benefit: it should go into trunk, and could probably be slipped into 0.7, depending on what the final patch looks like.

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt

[jira] Updated: (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment: for-0.6-0002-Make-compaction-throttling-configurable.txt
                for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt

Attaching a patch for 0.6 that implements compaction throttling for a fixed value.

Since it is relatively easy to automatically figure out the proper throughput, we might want to make throttling automatic rather than exposing a config option.

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt

[jira] [Updated] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment:     (was: 0001-Throttle-total-compaction-to-a-configurable-throughput.txt)

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.8
>
>         Attachments: for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt

[jira] Updated: (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Comment: was deleted

(was: Actually, this throttling probably needs to occur on the read side to properly account for cases with lots of updates... on the write side, we might have compacted the data down by 32x for example.

EDIT: Oops... it is already read throttled.)

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt

[jira] Updated: (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment: for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt
>
>

[jira] [Updated] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment: 0001-Throttle-total-compaction-to-a-configurable-throughput.txt

* Fixed bytes-per-ms calculation
* Allow disabling compaction throttling
* Add JMX method to adjust throttling
* Added div-by-zero protection for targetBytesPerMS
* excessBytes being negative doesn't matter because we check for a positive value before sleeping

Still applies atop #2191.
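
The bullet points above can be sketched roughly as follows. This is only an illustration and not the code in the attached patch: the names {{targetBytesPerMS}} and {{excessBytes}} come from the list above, while the class, the method signatures, the even split across active compactions, and the convention that a configured value of 0 disables throttling are all assumptions.

```java
// Hypothetical sketch of read-side compaction throttling; not the patch itself.
final class CompactionThrottleSketch {
    /** Per-compaction target in bytes/ms. Assumption: a total of 0 MB/s means "disabled". */
    static long targetBytesPerMs(int totalMbPerSec, int activeCompactions) {
        if (totalMbPerSec <= 0) {
            return 0; // throttling disabled
        }
        long totalBytesPerMs = totalMbPerSec * 1024L * 1024L / 1000L;
        // Div-by-zero protection: treat "no registered compaction" as one.
        return totalBytesPerMs / Math.max(1, activeCompactions);
    }

    /** Milliseconds to sleep so that bytesRead over elapsedMs drops back to the target rate. */
    static long sleepMillis(long bytesRead, long elapsedMs, long targetBytesPerMs) {
        if (targetBytesPerMs <= 0) {
            return 0; // disabled; also avoids dividing by zero below
        }
        long excessBytes = bytesRead - elapsedMs * targetBytesPerMs;
        // A negative excess is harmless: we only sleep for a positive value.
        return excessBytes > 0 ? excessBytes / targetBytesPerMs : 0;
    }

    public static void main(String[] args) {
        long target = targetBytesPerMs(16, 2);
        System.out.println("target bytes/ms per compaction: " + target);
        System.out.println("sleep ms: " + sleepMillis(1_000_000L, 50L, target));
    }
}
```

A caller would invoke something like {{sleepMillis}} periodically from the compaction read loop and pass a positive result to {{Thread.sleep}}.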

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 0001-Throttle-total-compaction-to-a-configurable-throughput.txt, for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt
>
>

[jira] [Updated] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment:     (was: 0001-Throttle-total-compaction-to-a-configurable-throughput.txt)

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 0001-Throttle-total-compaction-to-a-configurable-throughput.txt, for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt
>
>

[jira] [Updated] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment: 0007-Throttle-total-compaction-to-a-configurable-throughput.txt

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 0007-Throttle-total-compaction-to-a-configurable-throughput.txt, for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt
>
>

[jira] [Commented] (CASSANDRA-2156) Compaction Throttling

Posted by "Aaron Morton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011068#comment-13011068 ] 

Aaron Morton commented on CASSANDRA-2156:
-----------------------------------------

A bunch of questions again, as I'm trying to understand more of what's going on.

# If compaction_throughput_kb_per_sec is always going to be in megabytes, should it change to MB?
# Not in your changes, but CompactionIterator.close() will stop closing files after the first one fails.
# I'm guessing most of the time the actual and target throughput will not match. How about moving the INFO message in throttle() to the DEBUG level? Or only logging at INFO if the thread will sleep?
# Should there be a config setting to turn throttling on and off? Could setting compaction_throughput_kb_per_sec to 0 disable it?
# For my understanding: is there a case for making the sampling interval in CompactionIterator.getReduce() configurable? Would we want different settings for fewer big rows vs. many small rows? E.g. two CFs where one is a secondary index for rows in the other: there could be millions of columns in one and a few in the other.


I don't understand the approach to deciding what value compaction_throughput_kb_per_sec should have. Can you add some more info and clarify whether you are talking about the per-CF buckets created during compaction?

Final question. Would it be better to have fewer parallel compactions, where each compaction completes quickly, than more parallel compactions that take longer to complete? This assumes that once a compaction has finished, read performance and disk usage may improve. If so, would limiting compaction by sizing the compaction thread pool be effective? (I guess the downside may be starvation for some CFs.)

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 0001-Throttle-total-compaction-to-a-configurable-throughput.txt, for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt
>
>

[jira] [Commented] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013867#comment-13013867 ] 

Stu Hood commented on CASSANDRA-2156:
-------------------------------------

1. Fixed
2. Added and used {{FileUtils.close(Collection<Closeable>)}}
3. targetBytesPerMS only changes when the number of active threads changes: it leads to nice (imo) periodic feedback of running compactions in the log when compactions start or finish
4. Assuming compaction multithreading makes it in, throttling should never be disabled... for someone who really wants to disable it, setting it to a high enough value that it never kicks in should be sufficient?
5. Maybe... but dynamically adjusting the frequency at which we throttle and update {{bytesRead}} would probably be better to do in another thread?
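
For point 2, a close-everything helper might look something like the sketch below. Only the idea of a {{close(Collection<Closeable>)}} utility comes from the list above; the class name, the method name {{closeAll}}, and the choice to rethrow the first failure with later ones attached as suppressed exceptions (which needs Java 7 or later) are assumptions, not the patch's actual behaviour.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Collection;

// Hypothetical sketch: keep closing the remaining resources even if one close() fails.
final class CloseAllSketch {
    static void closeAll(Collection<? extends Closeable> resources) throws IOException {
        IOException first = null;
        for (Closeable resource : resources) {
            try {
                resource.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;              // remember the first failure
                } else {
                    first.addSuppressed(e); // keep later failures attached to it
                }
            }
        }
        if (first != null) {
            throw first; // report only after every resource was attempted
        }
    }

    public static void main(String[] args) throws IOException {
        Closeable noisy = () -> System.out.println("closed");
        closeAll(java.util.Arrays.asList(noisy, noisy));
    }
}
```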

----
Regarding the approach to setting compaction_throughput_mb_per_sec: each bucket probably contains {{MIN_THRESHOLD}} times more data than the previous bucket, and needs to be compacted {{1 / MIN_THRESHOLD}} times as often (see the math in the description). This means that the number of buckets influences how fast you need to compact, and that each additional bucket adds a linear amount of necessary throughput (+ 1x your flush rate). Therefore, if you have 15 bucket levels, and you are flushing {{1 MB/s}}, you need to compact at {{1 MB/s * 15}}.

As an example: with {{MIN_THRESHOLD=2}}, each bucket is twice as large as the previous. Say that we have 4 levels (buckets of sizes 1, 2, 4, 8) and that we need a compaction in the largest bucket. The amount of data that needs to be compacted in that bucket will be equal to 1 more than the sum of the sizes of all the other buckets (1 + 2 + 4 == 8 - 1). So, ideally we would be able to compact those 8 units in _exactly_ the time it takes for 1 more unit to be flushed, and for the compactions of the other buckets to trickle up and refill the largest bucket. Pheew?

CASSANDRA-2171 will allow us to calculate the flush rate, which we can then multiply by the count of buckets (note... one tiny missing piece is determining how many buckets are "empty": an empty bucket is not created in the current approach).
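
The rule of thumb above is simple enough to write down as arithmetic. The sketch below is an illustration under stated assumptions: the class and method names are made up, levels are numbered from 1 with level N holding {{T^N}} units (as in the issue description), and "capacity" is expressed in the same flush-sized units.

```java
// Hypothetical sketch of the "flush rate x bucket count" rule of thumb.
final class CompactionThroughputRule {
    /** Largest N such that a level-N bucket (minThreshold^N units) still fits in capacityUnits. */
    static int maxBucketLevels(int minThreshold, long capacityUnits) {
        if (minThreshold < 2) {
            throw new IllegalArgumentException("minThreshold must be at least 2");
        }
        int levels = 0;
        long bucketUnits = minThreshold; // level 1 holds T^1 units
        while (bucketUnits <= capacityUnits) {
            levels++;
            if (bucketUnits > Long.MAX_VALUE / minThreshold) {
                break; // next multiplication would overflow
            }
            bucketUnits *= minThreshold;
        }
        return levels;
    }

    /** Each active level adds roughly 1x the flush rate of required compaction throughput. */
    static double minCompactionMbPerSec(double flushMbPerSec, int bucketLevels) {
        return flushMbPerSec * bucketLevels;
    }

    public static void main(String[] args) {
        int levels = maxBucketLevels(4, 64);
        System.out.println(levels + " levels need at least "
                + minCompactionMbPerSec(1.0, levels) + " MB/s");
    }
}
```

With 15 bucket levels and a flush rate of 1 MB/s this gives the 15 MB/s figure quoted above.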

----
> Final question. Would it be better to have fewer parallel compactions
As a base case, with no parallelism at all, you _will_ fall behind on compaction, because every new bucket is a chance to compact. It's a fundamental question, but I haven't thought about it... sorry.

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 0001-Throttle-total-compaction-to-a-configurable-throughput.txt, for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt
>
>

[jira] [Commented] (CASSANDRA-2156) Compaction Throttling

Posted by "Aaron Morton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017794#comment-13017794 ] 

Aaron Morton commented on CASSANDRA-2156:
-----------------------------------------

From a discussion on the user list http://www.mail-archive.com/user@cassandra.apache.org/msg12027.html

CompactionManager.submitSSTableBuild() and submitIndexBuild() are used when receiving streams from other nodes. But they do not use the CompactionIterator() so are not covered by this ticket.

Want to create another ticket just for those tasks or reopen CASSANDRA-1882 and punt it to a future version?

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 0001-Throttle-total-compaction-to-a-configurable-throughput.txt, for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt
>
>

[jira] Issue Comment Edited: (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993827#comment-12993827 ] 

Stu Hood edited comment on CASSANDRA-2156 at 2/13/11 1:05 AM:
--------------------------------------------------------------

Actually, this throttling probably needs to occur on the read side to properly account for cases with lots of updates... on the write side, we might have compacted the data down by 32x for example.

EDIT: Oops... it is already read throttled.

  
> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt
>
>

[jira] [Updated] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment: 0001-Throttle-total-compaction-to-a-configurable-throughput.txt

Rebased for trunk: still applies atop CASSANDRA-2191.

> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 0001-Throttle-total-compaction-to-a-configurable-throughput.txt, for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt
>
>

[jira] [Issue Comment Edited] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013867#comment-13013867 ] 

Stu Hood edited comment on CASSANDRA-2156 at 3/31/11 10:07 AM:
---------------------------------------------------------------

1. Fixed
2. Added and used {{FileUtils.close(Collection<Closeable>)}}
3. targetBytesPerMS only changes when the number of active threads changes: it leads to nice (imo) periodic feedback of running compactions in the log when compactions start or finish
4. Assuming compaction multithreading makes it in, throttling should never be disabled... for someone who really wants to disable it, setting it to a high enough value that it never kicks in should be sufficient?
5. Maybe... but dynamically adjusting the frequency at which we throttle and update {{bytesRead}} would probably be better to do in another thread?

----
Regarding the approach to setting compaction_throughput_mb_per_sec: each bucket probably contains {{MIN_THRESHOLD}} times more data than the previous bucket, and needs to be compacted {{1 / MIN_THRESHOLD}} times as often (see the math in the description). This means that the number of buckets influences how fast you need to compact, and that each additional bucket adds a linear amount of necessary throughput (+ 1x your flush rate). Therefore, if you have 15 bucket levels, and you are flushing {{1 MB/s}}, you need to compact at {{1 MB/s * 15}}.

As an example: with {{MIN_THRESHOLD=2}}, each bucket is twice as large as the previous. Say that we have 4 levels (buckets of sizes 1, 2, 4, 8) and that we need a compaction in the largest bucket. The amount of data that needs to be compacted in that bucket will be equal to 1 more than the sum of the sizes of all the other buckets (1 + 2 + 4 == 8 - 1). So, ideally we would be able to compact those 8 units in _exactly_ the time it takes for 1 more unit to be flushed, and for the compactions of the other buckets to trickle up and refill the largest bucket. Pheew?

CASSANDRA-2171 will allow us to calculate the flush rate, which we can then multiply by the count of buckets (note... one tiny missing piece is determining how many buckets are "empty": an empty bucket is not created in the current approach).
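
The "one amortized unit per level per flush" argument can be checked with a toy simulation. The model below is an idealization and not Cassandra's compaction code: every flush produces an sstable of exactly one unit, a bucket compacts the moment it holds {{threshold}} sstables, and merging never shrinks the data. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of size-tiered buckets; counts units rewritten by compaction.
final class AmortizedCompactionSketch {
    static long unitsCompacted(int threshold, int flushes) {
        List<Integer> sstablesPerLevel = new ArrayList<>();
        long compacted = 0;
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            long sstableUnits = 1; // size of one sstable at the current level
            while (true) {
                if (sstablesPerLevel.size() <= level) {
                    sstablesPerLevel.add(0);
                }
                int count = sstablesPerLevel.get(level) + 1;
                if (count < threshold) {
                    sstablesPerLevel.set(level, count);
                    break; // bucket not full yet, nothing to compact
                }
                // Bucket is full: merge its sstables into one sstable for the next level.
                compacted += sstableUnits * threshold;
                sstablesPerLevel.set(level, 0);
                sstableUnits *= threshold;
                level++;
            }
        }
        return compacted;
    }

    public static void main(String[] args) {
        int flushes = 16;
        long work = unitsCompacted(2, flushes);
        System.out.println("units compacted per flushed unit: " + (double) work / flushes);
    }
}
```

With a threshold of 2 and 16 flushes, compactions happen in four buckets (sstable sizes 1, 2, 4, 8) and the model counts 64 units of compaction work, which is 4 units per flushed unit, one per active level.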

----
> Final question. Would it be better to have fewer parallel compactions
As a base case, with no parallelism at all, you _will_ fall behind on compaction, because every new bucket is a chance to compact. It's a fundamental question, but I haven't thought about it... sorry.

  
> Compaction Throttling
> ---------------------
>
>                 Key: CASSANDRA-2156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 0005-Throttle-total-compaction-to-a-configurable-throughput.txt, for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt
>
>

[jira] [Updated] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment:     (was: 0005-Throttle-total-compaction-to-a-configurable-throughput.txt)


[jira] Commented: (CASSANDRA-2156) Compaction Throttling

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993810#comment-12993810 ] 

Jonathan Ellis commented on CASSANDRA-2156:
-------------------------------------------

Related: CASSANDRA-1882.

I'm pretty uncomfortable committing changes to 0.6 compaction at this point.

0.7 is (*looks furtively over his shoulder*) probably ok, if it defaults to off.


[jira] Updated: (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment: 0001-Throttle-total-compaction-to-a-configurable-throughput.txt

Attaching a patch for trunk that applies atop CASSANDRA-2191, and shares the total compaction throughput between all active compactions.

CASSANDRA-2171 is still valid, and we have a new guy on our team working on it, but it seemed appropriate to not hold back this patch too long. For now, a fixed rate can be specified via a config setting.


[jira] [Issue Comment Edited] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013867#comment-13013867 ] 

Stu Hood edited comment on CASSANDRA-2156 at 3/31/11 10:06 AM:
---------------------------------------------------------------

1. Fixed
2. Added and used {{FileUtils.close(Collection<Closeable>)}}
3. targetBytesPerMS only changes when the number of active threads changes: it leads to nice (imo) periodic feedback of running compactions in the log when compactions start or finish
4. Assuming compaction multithreading makes it in, throttling should never be disabled... for someone who really wants to disable it, setting it to a high enough value that it never kicks in should be sufficient?
5. Maybe... but dynamically adjusting the frequency at which we throttle and update {{bytesRead}} would probably be better to do in another thread?

----
Regarding the approach to setting compaction_throughput_mb_per_sec: each bucket probably contains {{MIN_THRESHOLD}} times more data than the previous bucket, and needs to be compacted {{1 / MIN_THRESHOLD}} times as often (see the math in the description). This means that the number of buckets influences how fast you need to compact, and that each additional bucket adds a linear amount of necessary throughput (+ 1x your flush rate). Therefore, if you have 15 bucket levels, and you are flushing {{1 MB/s}}, you need to compact at {{1 MB/s * 15}}.
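A minimal Java sketch of the arithmetic in the paragraph above, for illustration only. The class and method names are hypothetical and are not part of the attached patch; the sketch simply multiplies a measured flush rate by the number of bucket levels to get a lower bound for compaction_throughput_mb_per_sec.

```java
public final class CompactionThroughputEstimate {
    /**
     * Rule of thumb from the discussion: each bucket level adds roughly
     * one flush rate's worth of required compaction throughput.
     * (Hypothetical helper, not a Cassandra API.)
     */
    static double minCompactionMbPerSec(double flushMbPerSec, int bucketLevels) {
        if (flushMbPerSec < 0 || bucketLevels < 0) {
            throw new IllegalArgumentException("flush rate and bucket count must be non-negative");
        }
        return flushMbPerSec * bucketLevels;
    }

    public static void main(String[] args) {
        // 15 bucket levels while flushing 1 MB/s, as in the example above.
        System.out.println(minCompactionMbPerSec(1.0, 15) + " MB/s");
    }
}
```

With the numbers used above (15 levels, 1 MB/s flush rate) this gives 15 MB/s. The flush rate itself still has to come from measurement, which is what CASSANDRA-2171 is about.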

As an example: with {{MIN_THRESHOLD=2}}, each bucket is twice as large as the previous. Say that we have 4 levels (buckets of sizes 1, 2, 4, 8) and that we need a compaction in the largest bucket. The amount of data that needs to be compacted in that bucket will be equal to 1 more than the sum of the sizes of all the other buckets (1 + 2 + 4 == 8 - 1). So, ideally we would be able to compact those 8 units in _exactly_ the time it takes for 1 more unit to be flushed, and for the compactions of the other buckets to trickle up and refill the largest bucket. Pheew?
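The amortization argument can be checked with a small, idealized simulation. This is only a sketch under stated assumptions: the class and method names are hypothetical, level N is assumed to hold exactly T^N units, and a level is assumed to fill deterministically every T^N flushes (rather than with probability 1/T^N as in the description).

```java
public final class AmortizedCompactionSketch {
    /**
     * Counts the units compacted over a number of flushes, assuming the
     * bucket at level N holds minThreshold^N units and is compacted
     * every minThreshold^N flushes. (Idealized model, not Cassandra code.)
     */
    static long unitsCompacted(int minThreshold, int levels, long flushes) {
        long total = 0;
        for (long flush = 1; flush <= flushes; flush++) {
            long bucketSize = 1; // T^0 for the first level
            for (int level = 0; level < levels; level++) {
                if (flush % bucketSize == 0) {
                    total += bucketSize; // this level filled: compact all of it
                }
                bucketSize *= minThreshold;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long flushes = 8;
        long total = unitsCompacted(2, 4, flushes);
        // Buckets of sizes 1, 2, 4, 8: each level contributes 8 units over 8 flushes.
        System.out.println(total + " units compacted over " + flushes + " flushes");
    }
}
```

For {{MIN_THRESHOLD=2}} and 4 levels, 8 flushes lead to 32 units of compaction in this model, i.e. 4 units per flushed unit, one per level.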

CASSANDRA-2171 will allow us to calculate the flush rate, which we can then multiply by the count of buckets (note... one tiny missing piece is determining how many buckets are "empty": an empty bucket is not created in the current approach).

----
> Final question. Would it be better to have fewer parallel compactions
As a base case, with no parallelism at all, you _will_ fall behind on compaction, because every new bucket is a chance to compact. It's a fundamental question, but I haven't thought about it... sorry.

      was (Author: stuhood):
    1. Fixed
2. Added and used {{FileUtils.close(Collection<Closeable>)}}
3. targetBytesPerMS only changes when the number of active threads changes: it leads to nice (imo) periodic feedback of running compactions in the log when compactions start or finish
4. Assuming compaction multithreading makes it in, throttling should never be disabled... for someone who really wants to disable it, setting it to a high enough value that it never kicks in should be sufficient?
5. Maybe... but dynamically adjusting the frequency at which we throttle and update {{bytesRead}} would probably be better to do in another thread?

----
Regarding the approach to setting compaction_throughput_mb_per_sec: each bucket probably contains {{MIN_THRESHOLD}} times more data than the previous bucket, and needs to be compacted {{1 / MIN_THRESHOLD}} times as often (see the math in the description). This means that the number of buckets influences how fast you need to compact, and that each additional bucket adds a linear amount of necessary throughput (+ 1x your flush rate). Therefore, if you have 15 bucket levels, and you are flushing {{1 MB/s}}, you need to compact at {{1 MB/s * 15}}.

As an example: with {{MIN_THRESHOLD=2}}, each bucket is twice as large as the previous. Say that we have 4 levels (buckets of sizes 1, 2, 4, 8) and that we need a compaction in the largest bucket. The amount of data that needs to be compacted in that bucket will be equal to 1 less than the sum of the sizes of all the other buckets (1 + 2 + 4 == 8 - 1). So, ideally we would be able to compact those 8 units in _exactly_ the time it takes for 1 more unit to be flushed, and for the compactions of the other buckets to trickle up and refill the largest bucket. Pheew?

CASSANDRA-2171 will allow us to calculate the flush rate, which we can then multiply by the count of buckets (note... one tiny missing piece is determining how many buckets are "empty": an empty bucket is not created in the current approach).

----
> Final question. Would it be better to have fewer parallel compactions
As a base case, with no parallelism at all, you _will_ fall behind on compaction, because every new bucket is a chance to compact. It's a fundamental question, but I haven't thought about it... sorry.
  

[jira] Updated: (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment: for-0.6-0002-Make-compaction-throttling-configurable.txt


[jira] [Issue Comment Edited] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017949#comment-13017949 ] 

Stu Hood edited comment on CASSANDRA-2156 at 4/9/11 9:07 PM:
-------------------------------------------------------------

I'd prefer to tackle those in a future ticket... for one thing, I don't think it is clear cut whether we should throttle them.

EDIT: ... because I suspect that the actual file transfer causes more load.

      was (Author: stuhood):
    I'd prefer to tackle those in a future ticket... for one thing, I don't think it is clear cut whether we should throttle them.
  

[jira] [Commented] (CASSANDRA-2156) Compaction Throttling

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015469#comment-13015469 ] 

Sylvain Lebresne commented on CASSANDRA-2156:
---------------------------------------------

I think this will be quite a useful patch.

Dividing the total compaction rate by the number of active compactions to determine the rate of each active compaction may be a bit coarse-grained in some situations, but it's also probably good enough, and I'm fine leaving that as a further improvement if it turns out to be needed.

Also, we may want ultimately to throttle cleanup compaction too and maybe have a specific rate for validation compaction. But I'm fine having it as another ticket.

A few comments:
 * A MB is 1024 * 1024 bytes, and a second is 1000 ms. I think the definition of CompactionIterator.THROTTLE_BYTES_PER_MS takes liberties with standard units :).
 * We should really allow 0 for the compaction rate to deactivate throttling (and that should really bypass throttle() completely), if only because bugs exist.
 * To have compaction rate changeable live would be pretty cool and it's super easy (an AtomicInteger for THROTTLE_BYTES_PER_MS with some jmx call in CompactionManager to change it should be enough), so let's do it now.
 * In theory, there is a risk of division by 0 because targetBytesPerMs can be 0. Granted this is more than unlikely given that the minimum value for THROTTLE is 1024, but nevertheless, let's be on the safe side.
 * Along the same lines, excessBytes can be negative. Pretty sure sleep just assumes that any negative number is 0, but it would be better to actually check for all those limit cases.
 * I'd also be in favor of having the logging of changes to targetByteInMS at debug level. Because there'll be one message each time you start a compaction and n messages each time the number of active compactions changes, and we'll print them even though we don't throttle anything, so it will be noise for most people. Anyway, really no big deal.
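
The points in the list above could be combined roughly as follows. This is a hedged sketch, not the code in the attached patch: the class, field and method names are hypothetical, the rate lives in an AtomicInteger so it could be changed at runtime (for example from a JMX setter), 0 means "do not throttle", and both the division by zero and the negative-excess cases are guarded explicitly. Note that java.lang.Thread.sleep is documented to throw IllegalArgumentException for a negative argument, so the clamp matters if that is the call being used.

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class CompactionThrottleSketch {
    // Total compaction throughput in MB/s; 0 disables throttling.
    // (Hypothetical holder, not the field used by the patch.)
    private static final AtomicInteger throughputMbPerSec = new AtomicInteger(16);

    /** Can be called at any time, e.g. from a management operation. */
    static void setThroughputMbPerSec(int mbPerSec) {
        if (mbPerSec < 0) {
            throw new IllegalArgumentException("throughput must be >= 0");
        }
        throughputMbPerSec.set(mbPerSec);
    }

    /** Per-compaction target in bytes per millisecond; 0 means unthrottled. */
    static long targetBytesPerMs(int activeCompactions) {
        // 1 MB = 1024 * 1024 bytes, 1 second = 1000 ms.
        long totalBytesPerMs = throughputMbPerSec.get() * 1024L * 1024L / 1000L;
        return totalBytesPerMs / Math.max(1, activeCompactions);
    }

    /** Milliseconds to pause so the observed rate drops back to the target; never negative. */
    static long sleepMillis(long bytesRead, long elapsedMs, long targetBytesPerMs) {
        if (targetBytesPerMs <= 0) {
            return 0; // throttling disabled, also avoids division by zero
        }
        long excessBytes = bytesRead - elapsedMs * targetBytesPerMs;
        if (excessBytes <= 0) {
            return 0; // running at or below the target rate
        }
        return excessBytes / targetBytesPerMs;
    }

    public static void main(String[] args) {
        setThroughputMbPerSec(16);
        long target = targetBytesPerMs(2); // two active compactions share the rate
        long pause = sleepMillis(4L * 1024 * 1024, 100, target);
        System.out.println("target=" + target + " bytes/ms, pause=" + pause + " ms");
    }
}
```

A caller would pass the result of sleepMillis to its sleep call only when it is greater than zero, which keeps the disabled case (rate 0) entirely free of pauses.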



[jira] [Updated] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment: 0006-Throttle-total-compaction-to-a-configurable-throughput.txt

Rebased: ready for review.


[jira] [Commented] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017949#comment-13017949 ] 

Stu Hood commented on CASSANDRA-2156:
-------------------------------------

I'd prefer to tackle those in a future ticket... for one thing, I don't think it is clear cut whether we should throttle them.


[jira] [Commented] (CASSANDRA-2156) Compaction Throttling

Posted by "Ryan King (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014298#comment-13014298 ] 

Ryan King commented on CASSANDRA-2156:
--------------------------------------

This has been a big improvement for us in production. It'd be nice to get more eyes on it for 0.8.


[jira] [Updated] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment:     (was: 0006-Throttle-total-compaction-to-a-configurable-throughput.txt)


[jira] [Updated] (CASSANDRA-2156) Compaction Throttling

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2156:
--------------------------------

    Attachment: 0005-Throttle-total-compaction-to-a-configurable-throughput.txt
