Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2014/01/03 21:40:52 UTC

[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock

    [ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861886#comment-13861886 ] 

Benedict commented on CASSANDRA-5549:
-------------------------------------

I have a patch available for this [here|https://github.com/belliottsmith/cassandra/tree/only-5549]

I've been a little reluctant to post it, as it's a bit of a monster of a patch, but I think I've now done my best to keep it well commented and to keep unnecessary changes to a minimum. Some changes may appear over-engineered for their current use, but I am using them in a continuation of this patch for off-heap memtables. I'll describe some of these below; unpicking changes that will still be useful seemed wasteful. If they get in the way of review, we can revisit that decision.

The changes fall into four main areas:

1) Removal of switchLock itself: the main work here is actually in the OpOrdering synchronisation class. The class is documented in its own comments, so I won't go into detail here. In short, it provides an easy mechanism for coordinating our updates to Memtables, so that we know up to which CL (commit log) position each memtable contains data, and when a memtable is safe to write to disk. The actual flushing of the memtable has also been refactored slightly to preserve ordering guarantees.
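
To make the idea concrete, here is a minimal sketch of an operation-ordering barrier. It is an assumption-laden illustration, not the OpOrdering class from the patch: the names SimpleOpOrder, Group, start(), finishOne(), issueBarrier() and awaitCompletion() are all hypothetical. Writers register with the current group before touching a memtable and deregister afterwards; issuing a barrier closes that group so later writers join a fresh one, and awaiting the closed group tells us every write registered before the barrier has finished. Unlike a full implementation, this sketch does not chain groups together, so a barrier waits only for its own group, not for groups closed by earlier barriers.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified operation-ordering primitive; not the OpOrdering class from the patch.
final class SimpleOpOrder {

    // All operations started between two consecutive barriers belong to the same Group.
    static final class Group {
        private final AtomicInteger running = new AtomicInteger();
        private volatile boolean closed;

        // Must be called exactly once by every operation that start() registered with this group.
        void finishOne() {
            // Wake a waiting barrier once the last operation of a closed group finishes.
            if (running.decrementAndGet() == 0 && closed) {
                synchronized (this) {
                    notifyAll();
                }
            }
        }

        // Blocks until every operation registered with this group has finished.
        // Only meaningful after issueBarrier() has closed the group.
        synchronized void awaitCompletion() throws InterruptedException {
            while (running.get() != 0) {
                wait();
            }
        }
    }

    private final AtomicReference<Group> current = new AtomicReference<>(new Group());

    // Registers an operation with the current group; the caller calls finishOne() when done.
    Group start() {
        while (true) {
            Group g = current.get();
            g.running.incrementAndGet();
            if (!g.closed) {
                return g;
            }
            // A barrier closed this group concurrently: back out and retry on the newer group.
            g.finishOne();
        }
    }

    // Closes the current group so that later operations join a fresh one; returns the closed group.
    Group issueBarrier() {
        Group old = current.getAndSet(new Group());
        old.closed = true;
        return old;
    }
}
```

Under these assumptions, a flush might swap in a new memtable, issue a barrier, wait on the returned group, and only then write the old memtable out; this relies on flushes being serialised, since the sketch does not wait for older groups.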

2) Allocators and Memory Management: removing the switch lock also removes our ability to bound heap growth caused by row mutations. To fix this, I've introduced a PoolAllocator, with an associated Pool that has a fixed memory limit. Every allocation requires the pool to allot room from its limit to the allocator (this is handled by MemoryTracker and MemoryOwner). This required many minor modifications throughout the codebase to make measuring object sizes at modification time cheap and accurate. Mostly I've achieved this by modifying jamm (a new branch is [here|https://github.com/belliottsmith/jamm/tree/guess]) so that it always gives us a useful answer. Wherever we previously used ObjectSizes ad hoc in a class (generally incorrectly, it turns out, which is unsurprising as the API isn't obvious), I now *always* call measure() on an instance of the object, store the result in a static field, and use simpler methods for any dynamic space use.
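
The pool/allocator relationship can be sketched as follows. This is a hypothetical illustration rather than the patch's PoolAllocator, Pool, MemoryTracker or MemoryOwner: MemoryPool, PoolAllocatorSketch, tryReserve() and releaseAll() are invented names. The point it shows is that every allocation must first be allotted room from a shared, fixed limit, and that a flushed memtable's allocator hands its whole allotment back at once.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: MemoryPool and PoolAllocatorSketch are illustrative names, not the patch's classes.
final class MemoryPool {
    private final long limit;
    private final AtomicLong allocated = new AtomicLong();

    MemoryPool(long limit) {
        this.limit = limit;
    }

    // Atomically reserves `size` bytes if doing so would not exceed the pool's fixed limit.
    boolean tryReserve(long size) {
        while (true) {
            long cur = allocated.get();
            if (cur + size > limit) {
                return false;
            }
            if (allocated.compareAndSet(cur, cur + size)) {
                return true;
            }
        }
    }

    void release(long size) {
        allocated.addAndGet(-size);
    }

    long allocated() {
        return allocated.get();
    }
}

// One allocator per memtable: every allocation must first be allotted room from the shared pool.
final class PoolAllocatorSketch {
    private final MemoryPool pool;
    private final AtomicLong owned = new AtomicLong(); // bytes allotted to this allocator so far

    PoolAllocatorSketch(MemoryPool pool) {
        this.pool = pool;
    }

    // Returns false when the pool is exhausted; a real writer would then have to wait for a flush.
    boolean allocate(long size) {
        if (!pool.tryReserve(size)) {
            return false;
        }
        owned.addAndGet(size);
        return true;
    }

    // Called once the memtable using this allocator has been flushed: hand everything back to the pool.
    void releaseAll() {
        pool.release(owned.getAndSet(0));
    }
}
```

Reporting failure keeps the sketch short; in a write path, a failed reservation would presumably block the writer until a flush releases memory, which is where a wait queue comes in.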

Worth noting: I've renamed IMeasurableMemory.memorySize() to excessHeapSize() and, where applicable, changed it to count only data we wouldn't otherwise be storing. This only makes a difference in a few places, but I think it is an important distinction.

This change also makes any limit on the flush queue size irrelevant, so the metric we use to control flushing is instead the ratio of in-use memory to the memory limit, ignoring any data already being flushed. Once this ratio breaches its threshold, a flush of the largest CFS is triggered.
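
The flush trigger can be sketched as below. This is a hypothetical illustration: FlushPolicySketch, CfsUsage and the threshold value are invented, and "largest" is assumed here to mean the CFS with the most memory in its live (not yet flushing) memtable.

```java
import java.util.List;

// Hypothetical sketch of the flush trigger described above; names and the threshold are illustrative.
final class FlushPolicySketch {

    // Memory accounting for one column family store (CFS).
    static final class CfsUsage {
        final String name;
        final long liveBytes;     // held by the current, still-writable memtable
        final long flushingBytes; // held by memtables already being flushed; excluded from the ratio

        CfsUsage(String name, long liveBytes, long flushingBytes) {
            this.name = name;
            this.liveBytes = liveBytes;
            this.flushingBytes = flushingBytes;
        }
    }

    private final long memoryLimit;
    private final double threshold; // fraction of memoryLimit, e.g. 0.5

    FlushPolicySketch(long memoryLimit, double threshold) {
        this.memoryLimit = memoryLimit;
        this.threshold = threshold;
    }

    // Returns the CFS whose memtable should be flushed, or null if no flush is needed yet.
    CfsUsage chooseFlush(List<CfsUsage> stores) {
        long inUse = 0;
        CfsUsage largest = null;
        for (CfsUsage c : stores) {
            inUse += c.liveBytes; // data that is already flushing is ignored
            if (largest == null || c.liveBytes > largest.liveBytes) {
                largest = c;
            }
        }
        double ratio = (double) inUse / memoryLimit;
        return ratio > threshold ? largest : null;
    }
}
```

Excluding flushing data matters: memory that is already on its way out would otherwise keep the ratio high and trigger further, pointless flushes.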

3) Some concurrency primitives: NonBlockingQueue (and related classes) and WaitQueue. NonBlockingQueue is used more extensively in the off-heap changes, but I've left it in here because it improves WaitQueue a lot, and we rely on WaitQueue much more now that OpOrdering operations have proliferated. It also moves us much closer to completely non-blocking read/write operations, and we use it to get rid of the Thread.yield() in SlabAllocator. I've aimed to keep NBQ as simple as possible.
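
A wait queue of this general kind can be sketched on top of a lock-free queue. This is an assumption-based illustration, not the patch's WaitQueue: WaitQueueSketch, Signal, register(), cancel() and signal() are hypothetical names, and java.util.concurrent.ConcurrentLinkedQueue stands in for NonBlockingQueue. The key discipline is that a waiter registers before re-checking its condition, so a signal issued in between is not lost.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;

// Hypothetical wait-queue sketch; ConcurrentLinkedQueue stands in for the patch's NonBlockingQueue.
final class WaitQueueSketch {

    static final class Signal {
        private final Thread waiter = Thread.currentThread(); // the registering thread
        private volatile boolean signalled;

        // Parks until signalled; loops because LockSupport.park() may return spuriously.
        void awaitUninterruptibly() {
            while (!signalled) {
                LockSupport.park(this);
            }
        }

        boolean isSignalled() {
            return signalled;
        }
    }

    private final ConcurrentLinkedQueue<Signal> waiting = new ConcurrentLinkedQueue<>();

    // A waiter registers *before* re-checking its condition, so a concurrent signal cannot be lost.
    Signal register() {
        Signal s = new Signal();
        waiting.add(s);
        return s;
    }

    // Used when the re-check shows the waiter need not block after all.
    void cancel(Signal s) {
        if (!waiting.remove(s)) {
            // A signaller already dequeued s; pass the wake-up on so it is not wasted.
            signal();
        }
    }

    // Wakes one registered waiter; returns false if nobody was waiting.
    boolean signal() {
        Signal s = waiting.poll();
        if (s == null) {
            return false;
        }
        s.signalled = true;
        LockSupport.unpark(s.waiter);
        return true;
    }

    void signalAll() {
        while (signal()) {
            // keep waking until the queue is empty
        }
    }
}
```

The usage pattern is: register, re-check the condition, then either cancel (condition already satisfied) or await. No lock is taken on either the waiting or the signalling side.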

4) CommitLog has been updated to use OpOrdering, and the changes also include a bug fix. I considered splitting this into a separate ticket, but it's such a tiny proportion of the overall changes that I'm not sure it warrants one. We may want to split out the bug fix, though, if this patch takes a while to go through.


> Remove Table.switchLock
> -----------------------
>
>                 Key: CASSANDRA-5549
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5549
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1
>
>         Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png
>
>
> As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write path.  ReentrantReadWriteLock is not lightweight, even if there is no contention per se between readers and writers of the lock (in Cassandra, memtable updates and switches).
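
For context, the locking pattern the ticket describes looks roughly like the following. This is an illustrative reconstruction, not Cassandra's actual code: SwitchLockSketch, apply() and switchMemtable() are hypothetical, and a ConcurrentLinkedQueue stands in for a memtable. Memtable updates act as readers of the lock and a memtable switch acts as the writer, so updates never block one another; even so, every update must acquire and release the read lock, which modifies the lock's shared state on every write.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the switchLock pattern; class and method names are hypothetical.
final class SwitchLockSketch {
    private final ReentrantReadWriteLock switchLock = new ReentrantReadWriteLock();
    private Queue<String> memtable = new ConcurrentLinkedQueue<>(); // guarded by switchLock

    // Memtable update: a "reader" of the lock. Updates don't block each other,
    // but each one still acquires and releases the shared read lock.
    void apply(String mutation) {
        switchLock.readLock().lock();
        try {
            memtable.add(mutation);
        } finally {
            switchLock.readLock().unlock();
        }
    }

    // Memtable switch: the "writer" of the lock; waits for in-flight updates and blocks new ones.
    Queue<String> switchMemtable() {
        switchLock.writeLock().lock();
        try {
            Queue<String> old = memtable;
            memtable = new ConcurrentLinkedQueue<>();
            return old; // no update can still be writing to `old`, so it is safe to flush
        } finally {
            switchLock.writeLock().unlock();
        }
    }
}
```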



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)