Posted to user@cassandra.apache.org by Jonathan Colby <jo...@gmail.com> on 2011/06/22 18:35:57 UTC

simple question about merged SSTable sizes

The way compaction works,  "x" same-sized files are merged into a new SSTable.  This repeats itself and the SSTables get bigger and bigger.

So what is the upper limit??     If you are not deleting stuff fast enough, wouldn't the SSTable sizes grow indefinitely?

I ask because we have some rather large SSTable files (80-100 GB) and I'm starting to worry about future compactions.

Second, compacting such large files is an IO killer.    What can be tuned other than compaction_threshold to help optimize this and prevent the files from getting too big?

Thanks!

Re: simple question about merged SSTable sizes

Posted by Jonathan Colby <jo...@gmail.com>.
Thanks Ryan.  Done that : )  1 TB is the striped size.  We might look into bigger disks for our blades.

On Jun 22, 2011, at 7:09 PM, Ryan King wrote:

> On Wed, Jun 22, 2011 at 10:00 AM, Jonathan Colby
> <jo...@gmail.com> wrote:
>> Thanks for the explanation.  I'm still a bit "skeptical".
>> 
>> So if you really need to control the maximum size of compacted SSTables, you have to delete data at such a rate that the new files created by compaction are no larger than the sum of the segments being merged.
>> 
>> Is anyone else running into really large compacted SSTables that gave you trouble with hard disk capacity?  How did you deal with it?
>> 
>> We have 1 TB disks in our nodes, but keeping in mind that we need to keep at least 50% free for the worst-case compaction scenario, I'm still a bit worried that one day we're going to hit a dead end.
> 
> You should stripe those disks together with RAID-0.
> 
> -ryan


Re: simple question about merged SSTable sizes

Posted by Ryan King <ry...@twitter.com>.
On Wed, Jun 22, 2011 at 10:00 AM, Jonathan Colby
<jo...@gmail.com> wrote:
> Thanks for the explanation.  I'm still a bit "skeptical".
>
> So if you really need to control the maximum size of compacted SSTables, you have to delete data at such a rate that the new files created by compaction are no larger than the sum of the segments being merged.
>
> Is anyone else running into really large compacted SSTables that gave you trouble with hard disk capacity?  How did you deal with it?
>
> We have 1 TB disks in our nodes, but keeping in mind that we need to keep at least 50% free for the worst-case compaction scenario, I'm still a bit worried that one day we're going to hit a dead end.

You should stripe those disks together with RAID-0.

-ryan

Re: simple question about merged SSTable sizes

Posted by Jonathan Colby <jo...@gmail.com>.
Thanks for the explanation.  I'm still a bit "skeptical".

So if you really need to control the maximum size of compacted SSTables, you have to delete data at such a rate that the new files created by compaction are no larger than the sum of the segments being merged.

Is anyone else running into really large compacted SSTables that gave you trouble with hard disk capacity?  How did you deal with it?

We have 1 TB disks in our nodes, but keeping in mind that we need to keep at least 50% free for the worst-case compaction scenario, I'm still a bit worried that one day we're going to hit a dead end.
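
For anyone doing the same back-of-the-envelope math, here is a minimal sketch of the worst-case headroom check (Python; the disk size and live-data figure are illustrative assumptions, not numbers from our cluster):

    # A major compaction rewrites all SSTables of a column family into one
    # new file, so in the worst case you temporarily need free space roughly
    # equal to the data being compacted -- hence the ~50% free-space rule.
    disk_gb = 1000        # 1 TB data volume (assumed)
    live_data_gb = 450    # current on-disk size of the column family (assumed)

    worst_case_gb = live_data_gb * 2   # existing SSTables + the merged output
    print("worst-case compaction fits:", worst_case_gb <= disk_gb)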



On Jun 22, 2011, at 6:50 PM, Eric tamme wrote:

> On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby
> <jo...@gmail.com> wrote:
>> 
>> The way compaction works,  "x" same-sized files are merged into a new SSTable.  This repeats itself and the SSTables get bigger and bigger.
>> 
>> So what is the upper limit??     If you are not deleting stuff fast enough, wouldn't the SSTable sizes grow indefinitely?
>> 
>> I ask because we have some rather large SSTable files (80-100 GB) and I'm starting to worry about future compactions.
>> 
>> Second, compacting such large files is an IO killer.    What can be tuned other than compaction_threshold to help optimize this and prevent the files from getting too big?
>> 
>> Thanks!
> 
> 
> Compaction is an iterative process that first compacts uncompacted
> SSTables and removes tombstones, etc.  This compaction takes multiple
> files and merges them into one SSTable.  The process repeats: once you
> have "compaction_threshold=X" similarly sized SSTables, those get
> re-compacted (merged) together.  The number and size of the SSTables you
> get as a result of a flush are tuned by max size, number of records, or
> time.  Contrary to what you might believe, having fewer, larger SSTables
> reduces IO compared to compacting many small SSTables.  Also, the merge
> operation on previously compacted SSTables is relatively fast.
> 
> As far as I know, Cassandra will continue compacting SSTables into
> indefinitely larger SSTables.  The tunable side of things is adjusting
> when a memtable is flushed to an SSTable, and how many SSTables of
> similar size must be present to execute a compaction.
> 
> -Eric


Re: simple question about merged SSTable sizes

Posted by Eric tamme <et...@gmail.com>.
On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby
<jo...@gmail.com> wrote:
>
> The way compaction works,  "x" same-sized files are merged into a new SSTable.  This repeats itself and the SSTables get bigger and bigger.
>
> So what is the upper limit??     If you are not deleting stuff fast enough, wouldn't the SSTable sizes grow indefinitely?
>
> I ask because we have some rather large SSTable files (80-100 GB) and I'm starting to worry about future compactions.
>
> Second, compacting such large files is an IO killer.    What can be tuned other than compaction_threshold to help optimize this and prevent the files from getting too big?
>
> Thanks!


Compaction is an iterative process that first compacts uncompacted
SSTables and removes tombstones, etc.  This compaction takes multiple
files and merges them into one SSTable.  The process repeats: once you
have "compaction_threshold=X" similarly sized SSTables, those get
re-compacted (merged) together.  The number and size of the SSTables you
get as a result of a flush are tuned by max size, number of records, or
time.  Contrary to what you might believe, having fewer, larger SSTables
reduces IO compared to compacting many small SSTables.  Also, the merge
operation on previously compacted SSTables is relatively fast.

As far as I know, Cassandra will continue compacting SSTables into
indefinitely larger SSTables.  The tunable side of things is adjusting
when a memtable is flushed to an SSTable, and how many SSTables of
similar size must be present to execute a compaction.

-Eric
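
To make the "similarly sized" grouping a bit more concrete, here is a rough sketch of size-tiered selection in Python. This is not Cassandra's actual implementation; the bucketing bounds and threshold are simplified assumptions, but it shows why the files keep getting bigger: each tier is roughly compaction_threshold times the size of the one below it.

    # Simplified sketch of size-tiered compaction selection (not Cassandra's
    # real code): group SSTables of similar size into buckets and compact a
    # bucket once it holds min_threshold (compaction_threshold) files.
    MIN_THRESHOLD = 4

    def pick_bucket(sstable_sizes_mb):
        buckets = []
        for size in sorted(sstable_sizes_mb):
            for bucket in buckets:
                avg = sum(bucket) / len(bucket)
                if 0.5 * avg <= size <= 1.5 * avg:  # "similar size" (assumed bounds)
                    bucket.append(size)
                    break
            else:
                buckets.append([size])
        # first bucket with enough similarly sized files, if any
        return next((b for b in buckets if len(b) >= MIN_THRESHOLD), None)

    # Four ~100 MB flushes merge into one ~400 MB SSTable, four of those
    # into a ~1.6 GB SSTable, and so on.
    print(pick_bucket([100, 110, 95, 105, 400]))   # -> [95, 100, 105, 110]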

Re: simple question about merged SSTable sizes

Posted by Jonathan Colby <jo...@gmail.com>.
Awesome tip on TTL.  We can really use this as a catch-all to make sure all columns are purged based on time.  Fits our use case well.  I forgot this feature existed.


On Jun 22, 2011, at 7:11 PM, Eric tamme wrote:

>>> Second, compacting such large files is an IO killer.    What can be tuned
>>> other than compaction_threshold to help optimize this and prevent the files
>>> from getting too big?
>>> 
>>> Thanks!
>> 
>> 
> 
> Just a personal implementation note - I make heavy use of column TTL,
> so I have very specifically tuned Cassandra to have a pretty constant
> max disk usage based on my data insertion rate, the TTL, the memtable
> flush threshold, and the min compaction threshold.  My data basically
> lives for 7 days and, depending on where it is in the compaction cycle,
> goes from 130 gigs per node up to 160 gigs per node.
> 
> If setting TTL is an option for you, it is one way to auto-purge data
> and keep the overall size in check.
> 
> -Eric


Re: simple question about merged SSTable sizes

Posted by Eric tamme <et...@gmail.com>.
>> Second, compacting such large files is an IO killer.    What can be tuned
>> other than compaction_threshold to help optimize this and prevent the files
>> from getting too big?
>>
>> Thanks!
>
>

Just a personal implementation note - I make heavy use of column TTL,
so I have very specifically tuned Cassandra to have a pretty constant
max disk usage based on my data insertion rate, the TTL, the memtable
flush threshold, and the min compaction threshold.  My data basically
lives for 7 days and, depending on where it is in the compaction cycle,
goes from 130 gigs per node up to 160 gigs per node.

If setting TTL is an option for you, it is one way to auto-purge data
and keep the overall size in check.

-Eric
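
If it helps, the steady-state sizing described here can be sketched like this (the insertion rate and the compaction-slack factor below are made-up assumptions for illustration, not the actual numbers from this cluster):

    # Back-of-the-envelope disk usage when every column carries a TTL:
    # expired data is dropped at compaction time, so on-disk size levels
    # off instead of growing forever.
    insert_rate_gb_per_day = 20.0   # assumed per-node write volume
    ttl_days = 7                    # columns expire after a week
    compaction_slack = 1.25         # expired data lingers until compacted

    steady_state_gb = insert_rate_gb_per_day * ttl_days
    peak_gb = steady_state_gb * compaction_slack
    print("%.0f GB live, up to ~%.0f GB between compactions"
          % (steady_state_gb, peak_gb))
    # -> 140 GB live, up to ~175 GB between compactions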

Re: simple question about merged SSTable sizes

Posted by Edward Capriolo <ed...@gmail.com>.
I would not say avoid major compactions at all costs.

In the old days (< 0.6.5, IIRC) the only way to clear tombstones was a major
compaction. The nice thing about major compaction: say you have a situation
with 4 SSTables at 2GB each (8GB total). Under normal write conditions it
could be more than gc_grace days before a deleted row gets cleared from
disk, and it is hard to say exactly how long before overwritten rows have
their duplicates removed.

Even with bloom filters and indexes, the fact remains that fewer, smaller
tables search faster (truly a less-is-more scenario). If you force a major
compaction at night and bring that table down to maybe 4GB or 6GB, it now
takes less space on disk. This uses disk bandwidth initially, but once done
your page cache is much more effective. Since compacting a small table does
not take very long, it is a win/win. For tables that need to stay small I
might major compact them every other day.

As you pointed out, when your SSTables get larger the situation becomes less
of a win/win, mostly because they take much longer to compact, so major
compactions get harder to schedule. At that point it is sometimes better to
let compaction run its "natural course" instead of forcing a major.
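
For what it's worth, that scheduling trade-off can be sketched roughly like this; the size cutoff, keyspace, and column family names are hypothetical, and driving nodetool from a script is just one way such a nightly job might be wired up:

    # Rough sketch: major compact only the column families small enough that
    # a nightly major compaction finishes quickly; for very large ones, let
    # minor compactions run their natural course instead.
    import subprocess

    MAJOR_COMPACT_CUTOFF_GB = 20   # assumed cutoff, tune for your hardware

    def maybe_major_compact(keyspace, cf, on_disk_gb):
        if on_disk_gb <= MAJOR_COMPACT_CUTOFF_GB:
            # "nodetool compact <keyspace> <cf>" forces a major compaction
            # of that column family on the local node
            subprocess.check_call(["nodetool", "compact", keyspace, cf])
        else:
            print("skipping %s.%s (%.0f GB): too big to major compact nightly"
                  % (keyspace, cf, on_disk_gb))

    maybe_major_compact("MyKeyspace", "SmallCF", 6)    # hypothetical names
    maybe_major_compact("MyKeyspace", "HugeCF", 90)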


On Wed, Jun 22, 2011 at 1:03 PM, Jonathan Colby <jo...@gmail.com> wrote:

> So the take-away is try to avoid major compactions at all costs!   Thanks
> Ed and Eric.
>
> On Jun 22, 2011, at 7:00 PM, Edward Capriolo wrote:
>
> Yes, if you are not deleting fast enough they will grow. This is not
> specifically a Cassandra problem; /var/log/messages has the same issue.
>
> There is a JIRA ticket about having a maximum size for SSTables, so they
> always stay manageable.
>
> You fall into a small trap when you force a major compaction, in that many
> small tables turn into one big one, and from there it is hard to get back
> to many smaller ones again. The other side of the coin is that if you do
> not major compact, you can end up with much more disk usage than live data
> (i.e. a large % of the disk is overwrites and tombstones).
>
> You can tune the compaction rate now so compaction does not kill your IO.
> Generally I think avoiding really large SSTables is the best way to go.
> Scale out and avoid very large SSTables/node if possible.
>
> Edward
>
>
> On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby <jonathan.colby@gmail.com
> > wrote:
>
>>
>> The way compaction works,  "x" same-sized files are merged into a new
>> SSTable.  This repeats itself and the SSTables get bigger and bigger.
>>
>> So what is the upper limit??     If you are not deleting stuff fast
>> enough, wouldn't the SSTable sizes grow indefinitely?
>>
>> I ask because we have some rather large SSTable files (80-100 GB) and I'm
>> starting to worry about future compactions.
>>
>> Second, compacting such large files is an IO killer.    What can be tuned
>> other than compaction_threshold to help optimize this and prevent the files
>> from getting too big?
>>
>> Thanks!
>
>
>
>

Re: simple question about merged SSTable sizes

Posted by Jonathan Colby <jo...@gmail.com>.
So the take-away is try to avoid major compactions at all costs!   Thanks Ed and Eric.

On Jun 22, 2011, at 7:00 PM, Edward Capriolo wrote:

> Yes, if you are not deleting fast enough they will grow. This is not specifically a Cassandra problem; /var/log/messages has the same issue.
> 
> There is a JIRA ticket about having a maximum size for SSTables, so they always stay manageable.
> 
> You fall into a small trap when you force a major compaction, in that many small tables turn into one big one, and from there it is hard to get back to many smaller ones again. The other side of the coin is that if you do not major compact, you can end up with much more disk usage than live data (i.e. a large % of the disk is overwrites and tombstones).
> 
> You can tune the compaction rate now so compaction does not kill your IO. Generally I think avoiding really large SSTables is the best way to go. Scale out and avoid very large SSTables/node if possible.
> 
> Edward
> 
> 
> On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby <jo...@gmail.com> wrote:
> 
> The way compaction works,  "x" same-sized files are merged into a new SSTable.  This repeats itself and the SSTables get bigger and bigger.
> 
> So what is the upper limit??     If you are not deleting stuff fast enough, wouldn't the SSTable sizes grow indefinitely?
> 
> I ask because we have some rather large SSTable files (80-100 GB) and I'm starting to worry about future compactions.
> 
> Second, compacting such large files is an IO killer.    What can be tuned other than compaction_threshold to help optimize this and prevent the files from getting too big?
> 
> Thanks!
> 


Re: simple question about merged SSTable sizes

Posted by Edward Capriolo <ed...@gmail.com>.
Yes, if you are not deleting fast enough they will grow. This is not
specifically a Cassandra problem; /var/log/messages has the same issue.

There is a JIRA ticket about having a maximum size for SSTables, so they
always stay manageable.

You fall into a small trap when you force a major compaction, in that many
small tables turn into one big one, and from there it is hard to get back
to many smaller ones again. The other side of the coin is that if you do
not major compact, you can end up with much more disk usage than live data
(i.e. a large % of the disk is overwrites and tombstones).

You can tune the compaction rate now so compaction does not kill your IO.
Generally I think avoiding really large SSTables is the best way to go.
Scale out and avoid very large SSTables/node if possible.

Edward
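
For reference, a minimal example of the knobs being referred to here, assuming a 0.8-era setup; the column family name and values are placeholders, so check which settings your own version actually supports:

    # cassandra.yaml -- throttle how fast compaction may write, so it does
    # not saturate the disks (0 disables the throttle):
    compaction_throughput_mb_per_sec: 16

    # per column family, via cassandra-cli -- how many similarly sized
    # SSTables must accumulate before a minor compaction is triggered:
    update column family MyCF with min_compaction_threshold = 4
        and max_compaction_threshold = 32;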


On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby
<jo...@gmail.com> wrote:

>
> The way compaction works,  "x" same-sized files are merged into a new
> SSTable.  This repeats itself and the SSTables get bigger and bigger.
>
> So what is the upper limit??     If you are not deleting stuff fast enough,
> wouldn't the SSTable sizes grow indefinitely?
>
> I ask because we have some rather large SSTable files (80-100 GB) and I'm
> starting to worry about future compactions.
>
> Second, compacting such large files is an IO killer.    What can be tuned
> other than compaction_threshold to help optimize this and prevent the files
> from getting too big?
>
> Thanks!