Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2015/01/01 00:18:16 UTC

[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)

    [ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262487#comment-14262487 ] 

Ariel Weisberg commented on CASSANDRA-7438:
-------------------------------------------

bq. Whether to migrate whole OHC code into org.apache.cassandra codebase (with the option to either turn it on or off).
I am open to either. I asked Benedict and he prefers having it inside C* so we can patch it. The advantage of having it outside is that it might see use elsewhere and get additional eyes/contributions. You could start with it outside and publish to Maven Central, and if there is an issue getting patches applied quickly we can always fork it into C*.

bq. Whether to implement a “pluggable row cache“ (to allow multiple implementations)
I think that we aren't going to need multiple cache implementations in the long run. Seems like we should be able to have one that can be configured to have the desired behavior. Benedict doesn't feel strongly about it either. If Vijay wants to continue working on another implementation then we would want to keep it pluggable the way it currently is.

It looks like the KeyCache and CounterCache both use a different implementation and not SerializingCache. I am not clear on why they don’t use serializing cache. It's worth evaluating why that is before converging on a single implementation.

bq. New per-table knob to enable whether to populate entries to the row cache on reads+writes or just on reads (to target different workloads)
Sounds like it would be useful, but first we have to come up with someone somewhere who says "I want this", or a workload where this is the right call. There may also be correctness issues to think about; see the next item.

bq. Rethink about whether to keep the current RowCacheSentinel implementation as is - if I understand it correctly, it just reduces the number of cache-put operations (cache hit on a sentinel performs a disk read). A compromise regarding additional serialization cost?
I think it is for correctness? https://issues.apache.org/jira/browse/CASSANDRA-3862
I'm still reading up on this.
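
For discussion, here is a rough sketch of how I currently read the sentinel idea. It is not the actual C* code: the class and method names (SentinelCache, read, invalidate, isCached) are hypothetical, and a ConcurrentHashMap stands in for the real cache. The reader parks a unique sentinel before going to disk, a concurrent write invalidates the key, and the reader only installs its row if its own sentinel is still there. A hit on someone else's sentinel just falls through to a disk read.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch; names are illustrative and not Cassandra's classes.
final class SentinelCache {
    // Unique marker stored in the cache while a read is in flight.
    // Default Object.equals is identity, so each sentinel only matches itself.
    private static final class Sentinel {}

    private final ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();

    /** Read path: returns the cached row, or loads it and tries to cache it. */
    String read(String key, Supplier<String> diskRead) {
        Object cached = map.get(key);
        if (cached instanceof String) {
            return (String) cached;      // real cache hit
        }
        if (cached != null) {
            return diskRead.get();       // another reader's sentinel: read disk, do not cache
        }
        Sentinel sentinel = new Sentinel();
        if (map.putIfAbsent(key, sentinel) != null) {
            return diskRead.get();       // lost the race to install a sentinel
        }
        String row = diskRead.get();
        // Only succeeds if no write removed our sentinel during the disk read,
        // so a row that raced with a write is never cached.
        map.replace(key, sentinel, row);
        return row;
    }

    /** Write path: drop whatever is cached for the key, sentinel or row. */
    void invalidate(String key) {
        map.remove(key);
    }

    boolean isCached(String key) {
        return map.get(key) instanceof String;
    }
}
```

If that reading is right, the sentinel is what keeps a possibly stale row out of the cache when a write lands in the middle of a read, which would make it a correctness mechanism rather than just a way to reduce cache puts.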

bq. Improvement of key (de)serialization (saving the row cache to disk) - use direct I/O
There is some trickiness here because the AutoSavingCache breaks apart the keys to determine where the data goes.
bq. Optimizations of value deserialization effort - let C* directly access a cached row in off-heap memory instead of the deserialization (and on-heap object construction) overhead.
I think these two together would make a good follow-up ticket. Another good follow-up ticket would be addressing the allocator for performance and for fragmentation.
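
To make the second item concrete, a minimal sketch of reading a field straight out of off-heap memory instead of deserializing the whole row. Everything here is an assumption for illustration: the OffHeapRow class and its layout ([long timestamp][int valueLength][value bytes]) are made up, and a direct ByteBuffer stands in for whatever memory the cache actually manages.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical layout, for illustration only:
// [long timestamp][int valueLength][value bytes]
final class OffHeapRow {
    // Direct buffer: the backing memory lives outside the Java heap.
    private final ByteBuffer buf;

    OffHeapRow(long timestamp, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        buf = ByteBuffer.allocateDirect(Long.BYTES + Integer.BYTES + bytes.length);
        buf.putLong(timestamp).putInt(bytes.length).put(bytes);
        buf.flip();
    }

    // Reads one field in place with an absolute get; no on-heap row object is built.
    long timestamp() {
        return buf.getLong(0);
    }

    // Full deserialization, for comparison: copies the value bytes onto the heap.
    String value() {
        int len = buf.getInt(Long.BYTES);
        byte[] out = new byte[len];
        ByteBuffer view = buf.duplicate();   // independent position, same memory
        view.position(Long.BYTES + Integer.BYTES);
        view.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }
}
```

The timestamp() path is the kind of access the optimization is after; value() shows the copy and allocation it would avoid.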

> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is only partially off heap; keys are still stored in the JVM heap as BB.
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better results, but this requires careful tuning.
> * The memory overhead of the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low memory overhead, and as few memcpys as possible.
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)