Posted to commits@cassandra.apache.org by "Rick Branson (JIRA)" <ji...@apache.org> on 2013/11/26 20:14:38 UTC

[jira] [Commented] (CASSANDRA-5357) Query cache

    [ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832901#comment-13832901 ] 

Rick Branson commented on CASSANDRA-5357:
-----------------------------------------

Perhaps an anecdote from a production system might help find a simple yet useful improvement to the row cache. Facebook's TAO distributed storage system supports a data model called "assocs", which are basically just graph edges, and nodes assigned to a given assoc ID hold a write-through cache of the state. The assoc storage can be roughly considered a more use-case-specific CF. For large assocs with many thousands of edges, TAO only maintains the tail of the assoc in memory, as that tends to be the most "interesting" portion of the data. More of the details are discussed in the linked paper [1].

Perhaps instead of a total overhaul, what's really needed is to evolve the row cache so that it caches only the head of the row and its bounds. In contrast to the complexity of trying to match queries and mutations against a set of serialized query filter objects, the cache would only need to maintain at most one interval per row. This would provide a very simple write-through story. Our production wide-row use cases, on review, seem to fall into two camps. The first, and most read-performance-sensitive, is vastly skewed towards reads on the head of the row (>90% of the time) with a fixed limit. The second is randomly distributed slice queries, which would not seem to provide a very good cache hit rate either way.
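
To make the "one interval per row" idea concrete, here is a rough sketch in Python rather than anything resembling Cassandra's actual row cache code. The class name HeadRowCache, its methods, and the capacity parameter are all hypothetical, and it assumes cells are ordered by name and that the caller applies each mutation to storage as well as to the cache.

```python
import bisect


class HeadRowCache:
    """Hypothetical sketch: caches only the head of each row plus its bound.

    Invariant for a cached row: the cache holds every live cell whose name
    is <= bound. A bound of None means the whole row is cached.
    """

    def __init__(self, capacity):
        self.capacity = capacity  # max cells kept per row (assumed >= 1)
        self.rows = {}            # row key -> {"names", "values", "bound"}

    def populate(self, key, cells, row_exhausted):
        """Fill from a storage read of the row head, sorted by cell name."""
        cells = cells[: self.capacity]
        names = [name for name, _ in cells]
        bound = None if row_exhausted and len(names) == len(cells) else names[-1]
        self.rows[key] = {"names": names, "values": dict(cells), "bound": bound}

    def read_head(self, key, limit):
        """Return the first `limit` cells, or None on a cache miss."""
        entry = self.rows.get(key)
        if entry is None:
            return None
        # A hit needs enough cached cells, unless the whole row is cached.
        if len(entry["names"]) < limit and entry["bound"] is not None:
            return None
        return [(n, entry["values"][n]) for n in entry["names"][:limit]]

    def write(self, key, name, value):
        """Write-through: apply a mutation only if it falls in the interval."""
        entry = self.rows.get(key)
        if entry is None:
            return
        if entry["bound"] is not None and name > entry["bound"]:
            return  # outside the cached interval, nothing to maintain
        if name not in entry["values"]:
            bisect.insort(entry["names"], name)
        entry["values"][name] = value
        if len(entry["names"]) > self.capacity:
            # Evict the last cell and shrink the interval to match.
            evicted = entry["names"].pop()
            del entry["values"][evicted]
            entry["bound"] = entry["names"][-1]

    def delete(self, key, name):
        """Write-through delete; the bound stays where it is."""
        entry = self.rows.get(key)
        if entry is not None and name in entry["values"]:
            entry["names"].remove(name)
            del entry["values"][name]
```

The point of the sketch is that a mutation only ever needs one comparison against the row's bound to decide whether the cache is affected, with no matching against stored query filters. A head read with a limit larger than what is cached simply falls through to storage.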

[1] https://www.usenix.org/conference/atc13/technical-sessions/papers/bronson

> Query cache
> -----------
>
>                 Key: CASSANDRA-5357
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>
> I think that most people expect the row cache to act like a query cache, because that's a reasonable model.  Caching the entire partition is, in retrospect, not really reasonable, so it's not surprising that it catches people off guard, especially given the confusion we've inflicted on ourselves as to what a "row" constitutes.
> I propose replacing it with a true query cache.



--
This message was sent by Atlassian JIRA
(v6.1#6144)