Posted to commits@cassandra.apache.org by "Michaël Figuière (Commented JIRA)" <ji...@apache.org> on 2012/01/30 02:15:10 UTC

[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog

    [ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195902#comment-13195902 ] 

Michaël Figuière commented on CASSANDRA-3578:
---------------------------------------------

In this patch I propose a different approach from Piotr's. In this implementation, only one thread handles syncs; all the processing, that is serialization, CRC computation and copying the RM into the mmapped segment, is done directly in the writer threads. These threads exchange data with the syncer thread in a non-blocking way, so the ExecutorService abstraction has been replaced by a lighter structure.
Several components of the CL presented challenges to implement in this manner:


*CL Segment switch*

Switching the CL segment when it's full isn't straightforward without locks. Here we use a boolean mark that is atomically CASed by a writer thread, giving it the responsibility for performing the switch. If the mark can't be grabbed, the thread waits on a condition which is later reused, using stamps to avoid any ABA problem.
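The CAS-on-a-mark idea above can be sketched roughly as follows. This is a hypothetical illustration, not the patch's code: the class and method names (`SegmentSwitcher`, `tryBecomeSwitcher`, `performSwitch`) are invented, and the stamped-condition wait of the losers is elided to a comment.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a writer that finds the segment full races to grab
// the "switching" mark; the single winner performs the switch, while losers
// wait (in the real patch, on a stamped condition to avoid ABA on reuse).
class SegmentSwitcher {
    private final AtomicBoolean switching = new AtomicBoolean(false);
    private volatile int currentSegment = 0;

    /** Returns true if this thread won the right to switch segments. */
    boolean tryBecomeSwitcher() {
        return switching.compareAndSet(false, true);
    }

    /** Only the winner of tryBecomeSwitcher() may call this. */
    void performSwitch() {
        currentSegment++;       // activate the next segment
        switching.set(false);   // release the mark; waiters re-check
    }

    int segment() { return currentSegment; }
}
```

The CAS guarantees exactly one writer takes on the switch, with no lock held while the others decide what to do.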


*Batch CL*

The Batch CL strategy is considered the safer mode for Cassandra, as it guarantees the client that the RM is synced to disk before the write is acknowledged. With a multithreaded CL, we must ensure that we never acknowledge an RM that is synced to disk but preceded in the CL segment by an unsynced RM, as that would make replaying the RM impossible. For this reason, we track the state of each RM's processing, and when the sync() call is executed we mark as synced any contiguous run of fully written RMs.

Avoiding any blocking queue, we still need a way to put the writer threads on hold while the sync is being performed. LockSupport.park()/unpark() provides a nice way to do this without relying on any coarse-grained synchronization, while avoiding any condition reuse/renewal issue.
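The park/unpark handshake can be sketched like this. Again a hypothetical illustration, not the patch: the `BatchSync` class, the single watermark `syncedUpTo`, and the explicit waiter list are all simplifying assumptions standing in for the per-RM state tracking described above.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch of the batch-CL handshake: a writer parks until the
// syncer's watermark covers the end position of the RM it just wrote.
class BatchSync {
    private final AtomicLong syncedUpTo = new AtomicLong(0);

    long syncedUpTo() { return syncedUpTo.get(); }

    /** Called by a writer after copying its RM ending at 'end'. */
    void awaitSync(long end) {
        while (syncedUpTo.get() < end) {
            // Spurious wakeups are harmless: the loop re-checks the watermark.
            LockSupport.park(this);
        }
    }

    /** Called by the syncer thread after fsync, to release waiters. */
    void markSynced(long upTo, Iterable<Thread> waiters) {
        syncedUpTo.set(upTo);
        for (Thread t : waiters) LockSupport.unpark(t);
    }
}
```

Because park() tolerates both spurious returns and unpark() arriving first, no condition object ever needs to be reused across sync rounds, which is exactly the reuse/renewal problem being sidestepped.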


*Periodic CL*

The Periodic CL's challenge is mostly the throttling of the writers, as here again we avoid any synchronized queue to reduce contention. In fact we only need "half a blocking queue" here, as nothing is really added or consumed; we just use an atomic counter and an empty/full condition pair. Here again, a pool of conditions and a stamp are used to avoid the ABA problem.
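The "half a blocking queue" can be sketched as an atomic counter against a fixed threshold. This is a hypothetical simplification: the `Throttle` class and its method names are invented, and the stamped-condition wait that replaces the busy "caller waits" path in the real patch is reduced to a return value.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the periodic-CL throttle: writers CAS an atomic
// counter of unsynced bytes upward; once a threshold would be exceeded
// they must wait (in the patch, on a stamped condition from a pool).
// The syncer resets the counter after each fsync.
class Throttle {
    private final AtomicLong unsynced = new AtomicLong(0);
    private final long threshold;

    Throttle(long threshold) { this.threshold = threshold; }

    /** Returns false when the writer must wait for a sync first. */
    boolean tryWrite(long size) {
        long cur;
        do {
            cur = unsynced.get();
            if (cur + size > threshold) return false;  // "full": caller waits
        } while (!unsynced.compareAndSet(cur, cur + size));
        return true;
    }

    /** Syncer calls this after fsync: all accounted bytes are durable. */
    void onSync() { unsynced.set(0); }
}
```

Only the producer side needs flow control, which is why a full blocking queue would be overkill here.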

 
*End of Segment marker*

Another point is that this implementation doesn't use any End of Segment marker. As we now have several concurrent writers, it's no longer possible to write a temporary marker after an entry. That means the recently committed code that fixes CASSANDRA-3615 is obviously not included in this patch.

Nevertheless, a mechanism to avoid unwanted replay of entries from recycled segments is still required. I haven't included it in the patch, as I think it's a design choice that needs to be debated, but it seems straightforward to implement. The options I can see are the following:
- Fill the CL segment file with zeros on recycling. This avoids any problem, but will typically require a multi-second write on recycling, leading to write latency hiccups.
- Include the segment id in every entry. This also avoids any problem, but increases the entry size by 8 bytes; that has a cost, though not a dramatic one, and it can be seen as spreading the cost of the previous option over the entire CL writing.
- Salt the two checksums included in each entry with the segment id. This lowers the probability of an unwanted replay to a level that seems fairly acceptable. The advantage of this solution is that its performance cost is nil.
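The third option can be sketched by seeding the CRC with the segment id before the entry bytes. A hypothetical illustration only: the `SaltedChecksum` class is invented, and which checksum implementation and byte order the CL actually uses is not taken from the patch.

```java
import java.util.zip.CRC32;

// Hypothetical sketch of checksum salting: feed the segment id into the
// CRC before the entry bytes, so an entry left over from the segment's
// previous life fails its checksum and is not replayed.
class SaltedChecksum {
    static long checksum(long segmentId, byte[] entry) {
        CRC32 crc = new CRC32();
        // "Salt": the 8 bytes of the segment id go through the CRC first.
        for (int i = 0; i < 8; i++) {
            crc.update((int) ((segmentId >>> (8 * i)) & 0xFF));
        }
        crc.update(entry, 0, entry.length);
        return crc.getValue();
    }
}
```

Nothing extra is written per entry, which is why this option costs nothing at runtime; the trade-off is that protection is probabilistic rather than absolute.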



Finally, here are some noteworthy observations:
* Here the writer thread WAITS for the processing to complete. Compared to a _push-on-queue-and-forget_ approach, this slightly increases write latency when using the Periodic CL (the Batch CL still being synchronous), especially for large RMs. Nevertheless, on a highly loaded server, the next writes waiting to be executed would have to wait for their thread to be scheduled anyway, so the latency cost would likely be paid regardless. Increasing the number of writer threads should help make small RMs less sensitive to large ones.
* If extensive benchmarks show that the previous point is an issue, there's some room to make this Periodic CL asynchronous with respect to the writer threads.
* To reduce as much as possible the contention on the atomic states that each thread may modify several times, several states are packed together within a single AtomicLong; compared to the more classical AtomicReference approach to non-blocking synchronization, this decreases the likelihood of an extra spin. The downside is code complexity, so I think AtomicReference remains an option to make the code more readable and maintainable.
* For now, to ensure the required throttling of incoming RMs, we use a constant function with a fixed threshold of unsynced mutations. But we now have the tools to easily make this function more complex, for instance making it non-constant or relating it to the size of the mutations.
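The AtomicLong packing mentioned above can be sketched like this. The choice of which two states share the word (`PackedState`, a 32-bit position plus a 32-bit stamp) is my assumption for illustration, not taken from the patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of packing two 32-bit states (here a write position
// and a stamp) into one AtomicLong, so both advance in a single CAS
// instead of two. One CAS over both states halves the window in which an
// interleaving writer can force an extra spin, at the price of bit fiddling.
class PackedState {
    private final AtomicLong state = new AtomicLong(0);

    static long pack(int position, int stamp) {
        return ((long) position << 32) | (stamp & 0xFFFFFFFFL);
    }
    static int position(long s) { return (int) (s >>> 32); }
    static int stamp(long s)    { return (int) s; }

    /** Advance the position by 'size' and bump the stamp atomically. */
    long advance(int size) {
        long cur, next;
        do {
            cur = state.get();
            next = pack(position(cur) + size, stamp(cur) + 1);
        } while (!state.compareAndSet(cur, next));
        return next;
    }
}
```

An AtomicReference to a small immutable state object would express the same thing more readably, at the cost of an allocation per transition, which is the trade-off weighed above.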

                
> Multithreaded commitlog
> -----------------------
>
>                 Key: CASSANDRA-3578
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Priority: Minor
>         Attachments: parallel_commit_log_2.patch
>
>
> Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog simultaneously (reserving space for each with a CAS first, the way we do in the SlabAllocator.Region.allocate) can improve performance, since you're not bottlenecking on a single thread to do all the copying and CRC computation.
> Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable.
> (moved from CASSANDRA-622, which was getting a bit muddled.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira