Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2013/12/01 00:20:35 UTC

[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog

    [ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835894#comment-13835894 ] 

Benedict commented on CASSANDRA-3578:
-------------------------------------

bq. We don't assume that segments we recycle are the "right" size, but I don't see anywhere that we actually extend a segment past the standard length.
Are you referring to the fact that I always use buffer.capacity() instead of the constant defining the segment size? I just felt it was clearer whilst writing the code, but you're right that we could switch it. In CLS.allocate(int) in particular it might be a good idea to do so, though it is unlikely to have a measurable impact today. It might once we start pushing closer to a million writes/sec/node, so there's no harm in future-proofing.

bq. Looks like discardUnusedTail can be called before sync() in a couple ways. This means that we won't have room to write a sync marker at the end of the allocated space which could confuse replay.
This is actually the intended use case. The sync marker doesn't really occur at the end of a sync(), but at the beginning. When we sync, we are always guaranteed to be directly preceded by a preallocated marker that has been zeroed out for us (the zeroing is not necessary, but allows us to log warnings if it is missing). During the sync we attempt to allocate a new marker for the *next* sync, and we point the previous marker to the new one. If there's no room, we point to the end of the file, which is completely safe. Note that this can occur without calling discardUnusedTail().

It might clarify to call them headers, though I settled on marker because at some point my nomenclature felt like it was getting confused with header. I'm sure we could make it work though.
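
To make the marker chaining concrete, here is a minimal single-threaded sketch. Everything in it is an assumption for illustration only (the class name, the 4-byte marker holding just an offset, the field names); it is not the code in the patch, and it leaves out concurrency, checksums and the actual disk sync.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the marker chain; not Cassandra's actual CommitLogSegment.
final class MarkerChainSketch {
    // Assumed layout: a marker is a single int holding the offset of the next marker.
    static final int MARKER_SIZE = 4;

    private final ByteBuffer buffer;
    private int currentMarker; // offset of the marker preallocated for the pending sync
    private int allocated;     // next free offset in the segment

    MarkerChainSketch(int capacity) {
        buffer = ByteBuffer.allocate(capacity); // zero-filled, so the first marker reads 0
        currentMarker = 0;
        allocated = MARKER_SIZE;
    }

    // Reserve space for a record; returns its offset, or -1 if it does not fit.
    int allocate(int size) {
        if (size > buffer.capacity() - allocated) return -1;
        int offset = allocated;
        allocated += size;
        return offset;
    }

    // Complete the pending marker by pointing it at the marker for the *next* sync,
    // or at the end of the file if there is no room for another marker.
    int sync() {
        if (currentMarker >= buffer.capacity()) return buffer.capacity(); // chain already closed
        int next;
        if (allocated + MARKER_SIZE <= buffer.capacity()) {
            next = allocated;
            buffer.putInt(next, 0); // zero the new marker so a missing one can be detected
            allocated += MARKER_SIZE;
        } else {
            next = buffer.capacity(); // no room: point at end of file
        }
        buffer.putInt(currentMarker, next);
        currentMarker = next;
        return next;
    }

    int markerAt(int offset) {
        return buffer.getInt(offset);
    }
}
```

In this sketch a replayer would start at offset 0 and follow markerAt() from marker to marker, stopping when it reads 0 (a marker that was never completed) or reaches the end of the file.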

bq. Is there a race in removeCleanFromDirty where we compare the dirty position w/ the synced one, while someone dirties it with a higher position?
No, for this very reason (and simplicity's sake) removeCleanFromDirty is a no-op until isFullySynced() holds, which only occurs once the segment is both fully used *and synced* to disk, i.e. is no longer being updated.
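
A rough sketch of that guard follows. The method names removeCleanFromDirty and isFullySynced come from the discussion; the maps, the String table ids and the remaining methods are hypothetical stand-ins for illustration, not the patch's code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; field names and types are assumptions, not Cassandra's actual code.
final class DirtyTrackingSketch {
    private final int capacity;
    private volatile int allocatedUpTo;
    private volatile int syncedUpTo;
    // table id -> highest position written (dirty) or flushed (clean) in this segment
    private final Map<String, Integer> dirty = new ConcurrentHashMap<>();
    private final Map<String, Integer> clean = new ConcurrentHashMap<>();

    DirtyTrackingSketch(int capacity) {
        this.capacity = capacity;
    }

    void markDirty(String table, int position) {
        dirty.merge(table, position, Math::max);
    }

    void markClean(String table, int position) {
        clean.merge(table, position, Math::max);
    }

    void markFull() {
        allocatedUpTo = capacity; // no further writes will land in this segment
    }

    void sync() {
        syncedUpTo = allocatedUpTo;
    }

    boolean isFullySynced() {
        return syncedUpTo == capacity;
    }

    // While the segment can still be written to, do nothing: the dirty positions may
    // still grow, so comparing them against the clean positions would be racy.
    void removeCleanFromDirty() {
        if (!isFullySynced()) return;
        clean.forEach((table, cleanPosition) -> {
            Integer dirtyPosition = dirty.get(table);
            if (dirtyPosition != null && dirtyPosition <= cleanPosition) {
                dirty.remove(table, dirtyPosition);
            }
        });
    }

    boolean isUnused() {
        return dirty.isEmpty();
    }
}
```

Because the comparison only runs once the segment can no longer be dirtied, the positions it reads are final.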


> Multithreaded commitlog
> -----------------------
>
>                 Key: CASSANDRA-3578
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Benedict
>            Priority: Minor
>              Labels: performance
>         Attachments: 0001-CASSANDRA-3578.patch, ComitlogStress.java, Current-CL.png, Multi-Threded-CL.png, TestEA.java, latency.svg, oprate.svg, parallel_commit_log_2.patch
>
>
> Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog simultaneously (reserving space for each with a CAS first, the way we do in the SlabAllocator.Region.allocate) can improve performance, since you're not bottlenecking on a single thread to do all the copying and CRC computation.
> Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable.
> (moved from CASSANDRA-622, which was getting a bit muddled.)
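
The CAS-based reservation described in the issue can be sketched as follows. This is an illustrative assumption of the general technique only: the class and method names are hypothetical, and it is neither SlabAllocator.Region.allocate nor the patch itself.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of reserving space in a shared buffer with compare-and-set.
final class CasAllocatorSketch {
    private final ByteBuffer buffer;
    private final AtomicInteger allocatePosition = new AtomicInteger();

    CasAllocatorSketch(int capacity) {
        buffer = ByteBuffer.allocate(capacity);
    }

    // Returns a private slice of the shared buffer, or null if the request does not fit.
    ByteBuffer allocate(int size) {
        while (true) {
            int prev = allocatePosition.get();
            if (size > buffer.capacity() - prev) return null; // caller would move to a new segment
            int next = prev + size;
            if (allocatePosition.compareAndSet(prev, next)) {
                // We own [prev, next) exclusively; copying and CRC computation
                // can now proceed without holding any lock.
                ByteBuffer view = buffer.duplicate();
                view.position(prev);
                view.limit(next);
                return view.slice();
            }
            // Another thread won the race; retry with the updated position.
        }
    }

    int allocated() {
        return allocatePosition.get();
    }
}
```

Only the position update is contended; each writer then fills its own slice independently, which is what removes the single-threaded bottleneck on copying and checksumming.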



--
This message was sent by Atlassian JIRA
(v6.1#6144)