Posted to commits@cassandra.apache.org by "Jason Brown (JIRA)" <ji...@apache.org> on 2014/08/20 10:38:29 UTC

[jira] [Comment Edited] (CASSANDRA-6809) Compressed Commit Log

    [ https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103622#comment-14103622 ] 

Jason Brown edited comment on CASSANDRA-6809 at 8/20/14 8:37 AM:
-----------------------------------------------------------------

bq. If we're dropping recycling, ... bottlenecking anything.

I reread this paragraph several times, and now it makes sense. I wasn't thinking about write performance so much as about keeping the file contiguous on disk. However, since commit log files are, more or less, one-time use (meaning we're not doing tons of random or sequential read I/O on them), I guess worrying about a large contiguous block on disk isn't necessary.

bq. Per-disk sync threads

I'm still not sure sync threads, in the manner initially described above, are totally necessary. If you are worried about the time it takes to flush the mmap'ed buffers in the same thread that handles all the CL entry processing plus any compression or encryption, a simple solution might be a sync thread that merely invokes the mmap buffer flush. The main CL thread(s) can then continue processing new entries and writing to the mmap buffer, while the sync thread eats the cost of the msync.
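
To make the suggestion concrete, here is a minimal sketch of a writer that appends to a memory-mapped buffer while a separate thread does nothing but call `MappedByteBuffer.force()` (the JDK call that flushes the mapping to the storage device, the equivalent of msync). This is not Cassandra's commit log code: the class name, `SEGMENT_SIZE`, the 10 ms sync period, and the temp-file setup are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and sizes are illustrative, not Cassandra's.
final class MmapSyncSketch {
    static final int SEGMENT_SIZE = 4096; // assumed tiny segment for the demo

    // Map a file read-write; the mapping stays valid after the channel closes.
    static MappedByteBuffer mapSegment(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, SEGMENT_SIZE);
        }
    }

    // Appends entries on the calling thread while a background thread pays
    // for the flushes. Entries must fit in SEGMENT_SIZE bytes in total.
    // Returns what ended up in the file, for checking.
    static String writeWithBackgroundSync(String... entries) {
        try {
            Path file = Files.createTempFile("commitlog-sketch", ".log");
            file.toFile().deleteOnExit();
            MappedByteBuffer segment = mapSegment(file);

            // The sync thread only invokes the flush; it never writes entries.
            ScheduledExecutorService syncThread =
                    Executors.newSingleThreadScheduledExecutor();
            syncThread.scheduleWithFixedDelay(
                    () -> segment.force(), 10, 10, TimeUnit.MILLISECONDS);

            // The "main CL thread": keeps writing without waiting on the flush.
            int written = 0;
            for (String entry : entries) {
                byte[] bytes = entry.getBytes(StandardCharsets.UTF_8);
                segment.put(bytes);
                written += bytes.length;
            }

            // Stop the periodic sync and do one final flush before reading back.
            syncThread.shutdown();
            syncThread.awaitTermination(1, TimeUnit.SECONDS);
            segment.force();

            byte[] onDisk = Files.readAllBytes(file);
            return new String(onDisk, 0, written, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `MmapSyncSketch.writeWithBackgroundSync("entry-1;", "entry-2;")` writes both entries and returns them as read back from the file. A real implementation would also need to track how far the writer has fully written, so that a sync only acknowledges complete entries; the sketch leaves that coordination out.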


was (Author: jasobrown):
bq. If we're dropping recycling, ... bottlenecking anything.

Reread this paragraph several times, now it makes sense. I wasn't thinking about the write perf, necessarily, but about having the file contiguous on disk. However, since the commit log files are, more or less, one-time use (meaning, we're not doing tons of random nor sequential I/O reads on them), I guess worrying about a large contiguous block on disk isn't necessary.

bq. Per-disk sync threads

I'm still not sure sync threads are totally necessary. If you are worried about the time for the mmap'ed buffers to flush in the same thread that's handling all the CL entry processing + any possible compression or encryption, a simple solution might be to have a sync thread that merely invokes the mmap buffer flush. Thus, the main CL thread(s) can continue processing the new entries and writing to the mmap buffer, but the sync thread eats the cost of the msync.

> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>
> It seems an unnecessary oversight that we don't compress the commit log. Doing so should improve throughput, but some care will need to be taken to ensure we use as much of a segment as possible. I propose decoupling the writing of the records from the segments. Basically write into a (queue of) DirectByteBuffer, and have the sync thread compress, say, ~64K chunks every X MB written to the CL (where X is ordinarily CLS size), and then pack as many of the compressed chunks into a CLS as possible.
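
The chunk-and-pack idea in the description can be sketched roughly as follows: cut the buffered records into chunks of about 64K, compress each one, and keep appending compressed chunks to a segment until the next one no longer fits. Everything here is an assumption for illustration, not the design that was adopted: `java.util.zip.Deflater` stands in for whatever compressor is chosen, the two-int length header is an invented framing, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch; framing and names are illustrative, not Cassandra's.
final class ChunkPackingSketch {
    static final int CHUNK_SIZE = 64 * 1024; // the "~64K" from the description
    static final int HEADER_SIZE = 8;        // assumed: raw length + compressed length

    static byte[] compress(byte[] src, int off, int len) {
        Deflater deflater = new Deflater();
        deflater.setInput(src, off, len);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(tmp);
            out.write(tmp, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Packs as many compressed chunks as fit into the segment.
    // Returns the number of uncompressed bytes consumed from records.
    static int pack(byte[] records, ByteBuffer segment) {
        int consumed = 0;
        while (consumed < records.length) {
            int len = Math.min(CHUNK_SIZE, records.length - consumed);
            byte[] chunk = compress(records, consumed, len);
            if (segment.remaining() < HEADER_SIZE + chunk.length) {
                break; // segment is full; the rest goes to the next segment
            }
            segment.putInt(len).putInt(chunk.length).put(chunk);
            consumed += len;
        }
        return consumed;
    }

    // Reads chunks back from a segment that has been flipped for reading.
    static byte[] unpack(ByteBuffer segment) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            while (segment.remaining() >= HEADER_SIZE) {
                int rawLen = segment.getInt();
                byte[] compressed = new byte[segment.getInt()];
                segment.get(compressed);
                Inflater inflater = new Inflater();
                inflater.setInput(compressed);
                byte[] raw = new byte[rawLen];
                int n = 0;
                while (n < rawLen) {
                    int k = inflater.inflate(raw, n, rawLen - n);
                    if (k == 0) break; // no progress: truncated or corrupt chunk
                    n += k;
                }
                inflater.end();
                if (n != rawLen) throw new IllegalStateException("short chunk");
                out.write(raw, 0, rawLen);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }
}
```

Because `pack` returns how many uncompressed bytes it consumed, a caller can start the next segment from that offset. The sketch compresses a chunk before knowing whether it fits, which wastes work at a segment boundary; keeping the compressed chunk for the next segment would avoid that.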


