Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2014/02/18 23:25:21 UTC
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables

    [ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904708#comment-13904708 ] 

Benedict commented on CASSANDRA-6694:
-------------------------------------

Initial patch for this is available [here|https://github.com/belliottsmith/cassandra/tree/offheap2]

The basic idea is that Cell and DecoratedKey are now both interfaces, each with a "buffer" and a "native" implementation. The native implementations squash the implementation of CellName into the same object, which avoids the allocation overhead of a separate object and means we don't need to allocate a new object every time we read the name. As a result we have had to go a little anti-OOP: the shared logic for DecoratedKey and *Cell lives in static implementation "modules", whose methods each implementation invokes with itself as the first parameter. This isn't super pretty, but it isn't super ugly either. The ugliest thing here is that I flatten the logic from db.composites into NativeCell, but this turns out not to be very hard; they behave _mostly_ the same.
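
To make the "static module" pattern concrete, here is a minimal sketch. It is not the code from the patch: every name in it (SketchCell, SketchCellImpl, BufferSketchCell, NativeSketchCell) is hypothetical, the interface is cut down to a name and a timestamp, and a direct ByteBuffer stands in for raw native memory with an assumed layout of [timestamp:8][nameLength:4][name bytes].

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical, cut-down stand-in for the Cell interface (not the real Cassandra API).
interface SketchCell {
    int nameLength();
    byte nameByte(int index);
    long timestamp();
    int compareNameTo(SketchCell other);
}

// Static implementation "module": the shared logic lives here, and each
// implementation calls it with itself as the first parameter.
final class SketchCellImpl {
    private SketchCellImpl() {}

    static int compareNames(SketchCell self, SketchCell other) {
        int common = Math.min(self.nameLength(), other.nameLength());
        for (int i = 0; i < common; i++) {
            // Compare bytes as unsigned values.
            int cmp = Integer.compare(self.nameByte(i) & 0xff, other.nameByte(i) & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(self.nameLength(), other.nameLength());
    }
}

// "Buffer" flavour: the name is held in an on-heap ByteBuffer.
final class BufferSketchCell implements SketchCell {
    private final ByteBuffer name;
    private final long timestamp;

    BufferSketchCell(ByteBuffer name, long timestamp) {
        this.name = name;
        this.timestamp = timestamp;
    }

    public int nameLength() { return name.remaining(); }
    public byte nameByte(int index) { return name.get(name.position() + index); }
    public long timestamp() { return timestamp; }
    public int compareNameTo(SketchCell other) { return SketchCellImpl.compareNames(this, other); }
}

// "Native" flavour: name and timestamp are read in place from one memory region,
// so reading the name never allocates a separate name object.
// Assumption: a direct ByteBuffer stands in for real off-heap memory.
final class NativeSketchCell implements SketchCell {
    private final ByteBuffer memory;
    private final int offset;

    private NativeSketchCell(ByteBuffer memory, int offset) {
        this.memory = memory;
        this.offset = offset;
    }

    // Assumed layout: [timestamp:8][nameLength:4][name bytes]
    static NativeSketchCell write(ByteBuffer memory, int offset, byte[] name, long timestamp) {
        memory.putLong(offset, timestamp);
        memory.putInt(offset + 8, name.length);
        for (int i = 0; i < name.length; i++) memory.put(offset + 12 + i, name[i]);
        return new NativeSketchCell(memory, offset);
    }

    public int nameLength() { return memory.getInt(offset + 8); }
    public byte nameByte(int index) { return memory.get(offset + 12 + index); }
    public long timestamp() { return memory.getLong(offset); }
    public int compareNameTo(SketchCell other) { return SketchCellImpl.compareNames(this, other); }
}

final class SketchCellDemo {
    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocateDirect(64);
        SketchCell onHeap = new BufferSketchCell(
            ByteBuffer.wrap("abc".getBytes(StandardCharsets.US_ASCII)), 1L);
        SketchCell offHeap = NativeSketchCell.write(
            memory, 0, "abd".getBytes(StandardCharsets.US_ASCII), 2L);
        // Both flavours share one comparison routine, whichever way round they are compared.
        System.out.println(onHeap.compareNameTo(offHeap) < 0);
    }
}
```

The point of the sketch is only that the two flavours can be mixed freely, because the comparison is written once against the interface and never needs to know where the bytes live.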

I've also quite widely refactored the stuff introduced in CASSANDRA-5549 and CASSANDRA-6689: the PoolAllocator in utils.memory now only defines methods for managing the memory use of the pool; what it *means* to "allocate" is now left to its descendants to define. We now split them up into two camps: ByteBufferPool and NativePool (renamed from OffHeapPool). The former's allocators implement ByteBufferAllocator (formerly AbstractAllocator), whereas the NativeAllocator allocates NativeAllocations. With me still? These NativeAllocations form the basis for any objects stored off-heap.

Anyway, these PoolAllocators are now utilised by *Data*Allocators in the db package tree; these are comparatively simple, and I wanted to keep the guts of the memory management in utils.memory. These DataAllocator instances simply know how to clone DecoratedKey and Cell instances, and also how to tidy up any unused references.
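
The layering described in the last two paragraphs can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: all class names carry a Sketch prefix because they are hypothetical, the accounting is reduced to a single counter, a direct ByteBuffer stands in for a NativeAllocation, and the data allocator clones a plain byte payload rather than a real DecoratedKey or Cell.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Base class: only tracks memory use; what "allocate" means is left to subclasses.
abstract class SketchPoolAllocator {
    private final AtomicLong owned = new AtomicLong();

    protected void acquire(int size) { owned.addAndGet(size); }

    long owned() { return owned.get(); }
}

// ByteBuffer camp: allocation hands back an on-heap ByteBuffer.
final class SketchByteBufferAllocator extends SketchPoolAllocator {
    ByteBuffer allocate(int size) {
        acquire(size);
        return ByteBuffer.allocate(size);
    }
}

// Hypothetical stand-in for a NativeAllocation: a region of off-heap memory.
final class SketchNativeAllocation {
    final ByteBuffer region;

    SketchNativeAllocation(ByteBuffer region) { this.region = region; }
}

// Native camp: allocation hands back a NativeAllocation instead of a ByteBuffer.
final class SketchNativeAllocator extends SketchPoolAllocator {
    SketchNativeAllocation allocate(int size) {
        acquire(size);
        // Assumption: a direct buffer stands in for raw native memory.
        return new SketchNativeAllocation(ByteBuffer.allocateDirect(size));
    }
}

// db-level "data allocator": knows how to clone data into memory owned by the pool,
// and leaves the memory management itself to the allocator it wraps.
final class SketchBufferDataAllocator {
    private final SketchByteBufferAllocator allocator;

    SketchBufferDataAllocator(SketchByteBufferAllocator allocator) { this.allocator = allocator; }

    ByteBuffer cloneValue(ByteBuffer source) {
        ByteBuffer copy = allocator.allocate(source.remaining());
        copy.put(source.duplicate()); // duplicate() leaves the caller's position untouched
        copy.flip();
        return copy;
    }
}

final class SketchAllocatorDemo {
    public static void main(String[] args) {
        SketchByteBufferAllocator pool = new SketchByteBufferAllocator();
        SketchBufferDataAllocator data = new SketchBufferDataAllocator(pool);
        ByteBuffer copy = data.cloneValue(ByteBuffer.wrap(new byte[] {1, 2, 3, 4}));
        System.out.println(copy.remaining() + " bytes cloned, pool owns " + pool.owned());
    }
}
```

Keeping the accounting in the base class and the cloning in a thin wrapper mirrors the split described above: the memory management stays in one place, and the db-level classes stay comparatively simple.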

Some notes:
- This (and CASSANDRA-6689) has negative implications for Thrift at the moment, as I have to copy any data on-heap in order to return it to Thrift. Unfortunately this can only easily be rectified by modifying Thrift so that we have method calls we can override in the worker tasks, which are invoked when starting and finishing the servicing of a request.
- I've settled for a 24-byte object, as I really needed to keep some extra information on-heap. We can definitely tighten this in the future, but I think it probably isn't worth doing at this stage.
- As things stand, without CASSANDRA-6697 we allocate a lot of temporary ByteBuffers, i.e. whenever we read the constituents of the name or the contents of the cell.

It would be good to get some testing resources allocated to the first and last points to see if we should be trying to fix them. We should decide whether we want CASSANDRA-6697, preferably before we go live with a final 2.1 release. What we really need to do is run a number of tests against schemas with a lot of composite columns, and see what effect there is on garbage collection and latency metrics.

> Slightly More Off-Heap Memtables
> --------------------------------
>
>                 Key: CASSANDRA-6694
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1
>
>
> The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap overhead is still very large. It should not be tremendously difficult to extend these changes so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their associated overhead).
> The goal (if possible) is to reach an overhead of 16 bytes per Cell (plus 4-6 bytes per cell on average for the btree overhead, for a total overhead of around 20-22 bytes). This translates to an 8-byte object overhead, a 4-byte address, and a 4-byte object reference for maintaining our internal list of allocations. For the address we will do alignment tricks like the VM to allow us to address a reasonably large memory space, although this trick is unlikely to last us forever, at which point we will have to bite the bullet and accept a 24-byte per-cell overhead. The object reference is unfortunately necessary, since otherwise we cannot safely (and cheaply) walk the object graph we allocate, which is necessary for (allocation-) compaction and pointer rewriting.
> The ugliest thing here is going to be implementing the various CellName instances so that they may be backed by native memory OR heap memory.
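
For reference, the 16-byte target in the description above is 8 + 4 + 4 bytes, and the "alignment trick" for fitting an address into 4 bytes can be sketched as below. The 8-byte alignment (a shift of 3, as with the JVM's default compressed references) is an assumption made for illustration, as are the class and method names; the description does not say which alignment would be used.

```java
// Hypothetical sketch of storing an aligned 64-bit address in a 32-bit field.
final class CompressedAddressSketch {
    // Assumption: allocations are 8-byte aligned, so the low 3 bits are always zero.
    static final int ALIGNMENT_SHIFT = 3;

    private CompressedAddressSketch() {}

    // Drop the always-zero low bits so the address fits in an unsigned 32-bit value.
    static int compress(long address) {
        if ((address & ((1L << ALIGNMENT_SHIFT) - 1)) != 0)
            throw new IllegalArgumentException("address is not aligned: " + address);
        long shifted = address >>> ALIGNMENT_SHIFT;
        if (shifted > 0xFFFFFFFFL)
            throw new IllegalArgumentException("address out of range: " + address);
        return (int) shifted;
    }

    // Treat the int as unsigned and restore the dropped bits.
    static long decompress(int compressed) {
        return (compressed & 0xFFFFFFFFL) << ALIGNMENT_SHIFT;
    }

    // 2^32 distinct values, each naming one aligned slot.
    static long addressableBytes() {
        return (1L << 32) << ALIGNMENT_SHIFT;
    }

    // Object overhead + compressed address + reference into the allocation list.
    static int targetOverheadBytes() {
        return 8 + 4 + 4;
    }

    public static void main(String[] args) {
        long address = 24L << 30; // an aligned address 24 GiB into the space
        int packed = compress(address);
        System.out.println(decompress(packed) == address);
        System.out.println(addressableBytes() >> 30); // addressable space in GiB
    }
}
```

Under that assumed alignment a 4-byte field covers 32 GiB, which illustrates why the trick gives a reasonably large memory space but cannot last forever.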



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)