Posted to user@cassandra.apache.org by Vlad <qa...@yahoo.com> on 2016/03/01 13:13:16 UTC

Commit log size vs memtable total size

Hi, there are the following parameters in cassandra.yaml:
memtable_total_space_in_mb (1/4 of heap, e.g. 512MB) - Specifies the total memory used for all memtables on a node.
commitlog_total_space_in_mb (8GB) - Total space used for commit logs. If the used space goes above this value, Cassandra rounds up to the next nearest segment multiple and flushes memtables to disk for the oldest commitlog segments, removing those log segments.
My question is: what is the point of the commit log size being much larger than the memtable size?
From the manual: "Cassandra flushes memtables to disk, creating SSTables when the commit log space threshold has been exceeded." But as far as I understand, memtables are also flushed when too much memtable memory is used, and in any case the unflushed data can't be larger than the memtable size in memory. So the commit log can't keep more than the memtable size; why is there a difference between the commit log and memtable sizes?

Regards, Vlad



Re: Commit log size vs memtable total size

Posted by Vlad <qa...@yahoo.com>.
Tyler, thanks for the explanation!
So a commit log segment can contain both data from flushed table A and non-flushed table B. How is it replayed on startup? Does C* skip the portions belonging to table A that were already written to an SSTable?
Regards, Vlad

Re: Commit log size vs memtable total size

Posted by Jack Krupansky <ja...@gmail.com>.
It would be nice to get this info into the doc or at least a blog post.

-- Jack Krupansky


Re: Commit log size vs memtable total size

Posted by Tyler Hobbs <ty...@datastax.com>.
On Tue, Mar 1, 2016 at 6:13 AM, Vlad <qa...@yahoo.com> wrote:

> So the commit log can't keep more than the memtable size; why is there a
> difference between the commit log and memtable sizes?


In order to purge a commitlog segment, *all* memtables that contain data
from that segment must be flushed to disk.

Suppose you have two tables:
 - table A has extremely high throughput
 - table B has low throughput

Every commitlog segment will have a mixture of writes for table A and table
B.  The memtable for table A will fill up rapidly and will be flushed
frequently.  The memtable for table B will slowly fill up, and will not be
flushed often.  Since table B's memtable isn't flushed, none of the commit
log segments can be purged/recycled.  Once the commitlog hits its size limit,
it will force a flush of table B.

This behavior is good, because it allows table B to be flushed in large
chunks instead of hundreds of tiny sstables.  If the commitlog space were
equal to the memtable space, Cassandra would have to force a flush of table
B's memtable approximately every time table A is flushed, even though table
B's memtable is much smaller.

To summarize: if you use more than one table, it makes sense to have a
larger space for commitlog segments.
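
To make the two-table scenario above concrete, here is a rough Python
sketch of it.  This is a toy model, not Cassandra's actual code: the write
rates (16 MB vs 1 MB per tick), the 32 MB segment size, the 64 MB memtable
budget, and the names simulate and flush are all made up for illustration.
It assumes that the largest memtable is flushed when the memtable budget is
exceeded, and that a segment can be dropped only once no table has
unflushed data in it.

```python
def simulate(commitlog_limit_mb, memtable_total_mb=64, segment_mb=32, ticks=1000):
    """Toy model: returns (flush sizes per table, peak number of segments)."""
    rates = {"A": 16, "B": 1}      # MB written per tick (hypothetical workload)
    memtable = {"A": 0, "B": 0}    # unflushed MB held in each table's memtable
    flushed = {"A": [], "B": []}   # size of every SSTable written, per table
    segments = [set()]             # per segment: tables with unflushed data in it
    active_fill = 0                # MB written to the active (last) segment
    peak_segments = 1

    def flush(table):
        # Write the memtable out, then mark the table clean in every segment.
        flushed[table].append(memtable[table])
        memtable[table] = 0
        for seg in segments:
            seg.discard(table)
        # A closed segment can be dropped only when no table is dirty in it.
        segments[:-1] = [seg for seg in segments[:-1] if seg]

    for _ in range(ticks):
        for table, mb in rates.items():
            memtable[table] += mb
            segments[-1].add(table)
            active_fill += mb
            if active_fill >= segment_mb:
                segments.append(set())   # close the segment, start a new one
                active_fill = 0
            # Memtable pressure: flush the largest memtable (assumption).
            if sum(memtable.values()) >= memtable_total_mb:
                flush(max(memtable, key=memtable.get))
            # Commitlog pressure: flush whatever pins the oldest segment.
            while len(segments) > 1 and len(segments) * segment_mb > commitlog_limit_mb:
                for dirty_table in sorted(segments[0]):
                    flush(dirty_table)
        peak_segments = max(peak_segments, len(segments))
    return flushed, peak_segments


for limit in (64, 1024):
    flushed, peak = simulate(limit)
    b = flushed["B"]
    print(f"commitlog limit {limit} MB: table B flushed {len(b)} times, "
          f"average {sum(b) / len(b):.1f} MB, peak segments {peak}")
```

With these made-up numbers, the 64 MB limit should force table B to flush
in small pieces about as often as table A, while the 1024 MB limit lets
table B's memtable grow much larger between flushes, at the cost of more
commitlog segments being kept around.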

-- 
Tyler Hobbs
DataStax <http://datastax.com/>