You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2014/08/19 07:08:18 UTC
[jira] [Commented] (CASSANDRA-6726) Recycle CompressedRandomAccessReader/RandomAccessReader buffers independently of their owners, and move them off-heap when possible

    [ https://issues.apache.org/jira/browse/CASSANDRA-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101856#comment-14101856 ] 

Benedict commented on CASSANDRA-6726:
-------------------------------------

Thanks. Some comments:

* It looks like we're still recycling the bytebuffer we've allocated _along with the RAR_ into FileCacheService which is what we're trying to avoid - i.e. in close() we should recycle the bytebuffer whether or not we deallocate, and rewire it up next time we're returned
* I appreciate what you're doing with the logarithmic buckets and special pool classes, but we already have a morass of pooling classes and I'd prefer not to complicate that further unless necessary. Here I think we can do much less and achieve the same end: currently the block size is fixed (and this storage format is has an expiry date attached to it, so this is unlikely to change), so a ThreadLocal ArrayDeque containing buffers all of DEFAULT_BUFFER_SIZE should be sufficient, perhaps with a low ceiling on size and a shared fallback queue for when that one is exausted, so that we still function well on small numbers of huge workloads. Let's keep it all within FileCacheService since that's what currently manages the cache, although it's conflating concepts a bit
* To clarify this, we could remove the bufferSize parameter so it's always clearly DEFAULT_BUFFER_SIZE
* We should probably rename file_cache_size_in_mb to something more accurate like file_buffer_pool_size_in_mb
* We should probably introduce a new max_file_cache_handles property, and FileCacheService should pool files based on this property, and no longer pay attention to getTotalBufferSize()
* There are a few places you allocate non-tiny temporary arrays inside of methods. It would be preferable to use a ThreadLocal<byte[]> so they can be reused; we do this in a few places, so it might be nice to coalesce them all into a ByteBufferUtil method so we don't litter the whole codebase with this pattern (currently happens in PureJavaCrc32 and AbstractNativeCell). They can all use arrays of the same size - these other two use 256 vs. your 4096; IMO 256 or 512 is plenty for all of these scenarios (although feel free to benchmark the difference; we do have particularly large chunks of memory to checksum in the reader)
* Possibly ByteArrayCompressor.uncompress should also have its own ThreadLocal temporary array of arbitrary (max encountered) size limit instead of delegating to getArray, but this is not a big deal since we don't expect to exercise this codepath 

bq. so please let me know of any style, testing, etc. issues, or anything I may be missing about the Cassandra review process.

Patch looks good style-wise. Standard practice is to name a diff as #.txt, in this case 6726.txt. Personally I much prefer to review branches posted to github, for all but the most trivial patches.

> Recycle CompressedRandomAccessReader/RandomAccessReader buffers independently of their owners, and move them off-heap when possible
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6726
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: cassandra-6726.patch
>
>
> Whilst CRAR and RAR are pooled, we could and probably should pool the buffers independently, so that they are not tied to a specific sstable. It may be possible to move the RAR buffer off-heap, and the CRAR sometimes (e.g. Snappy may possibly support off-heap buffers)



--
This message was sent by Atlassian JIRA
(v6.2#6252)