You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2011/01/09 08:48:45 UTC

[jira] Created: (CASSANDRA-1956) Convert row cache to row+filter cache

Convert row cache to row+filter cache
-------------------------------------

                 Key: CASSANDRA-1956
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Stu Hood


Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.

Possible implementations:
* (copout) Cache a single filter per row, and leave the cache key as is
* Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
* Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
* others?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Daniel Doubleday (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180152#comment-13180152 ] 

Daniel Doubleday commented on CASSANDRA-1956:
---------------------------------------------

I think that would be great ...

bq. Although if you never query the excluded column... close enough, right?

Well maybe you could incorporate hinting as mysql or oracle does in thrift and cql as in

{noformat}
SELECT * FROM WhatEver IGNORE CACHE (*) WHERE KEY = 'ComesMyWay'
{noformat}

or

{noformat}
SELECT * FROM WhatEver /*+ nocache(*) */ WHERE KEY = 'ComesMyWay'
{noformat}

That would also prevent cache pollution when you need to run jobs
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179905#comment-13179905 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

You could be right.  I'd still prefer to do have everything go through a single code path first, and then we can evaluate optimization afterwards.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256983#comment-13256983 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

Also, it sounds like this always invalidates on update.  Would it be possible to preserve the current row cache behavior?  I.e., update-in-place if a non-copying cache implementation.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-1956:
-----------------------------

    Attachment: 0001-commiting-block-cache.patch

This patch is not complete yet i just wanted to show and see what you guys think about this... This patch is something like a block cache where it will cache blocks of columns where the user can choose the block size and if the query is within the block we are good by just pulling the block into memory else we will scan through the blocks and get the required blocks. Updates can also scan through the blocks and update them... The good part here is this should have lower memory foot print than Query cache but it should also solve the problems which we are discussing in this ticket. Let me know thanks! Again there is more logic/cases to be handled, Just a prototype for now.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256975#comment-13256975 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

Does this support caching head/tail queries?  Or do X and Y have to be existing column values?
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181200#comment-13181200 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------

bq. That, and "I want to cache a specific set of known-ahead-of-time columns [maybe the entire row]," which is what today's row cache is mostly used for.

That is trivially handled by the filter-per-cf approach I'm advocating, contrarily to the query cache solution. 

bq. I think it's a huge, huge win for a design to be able to handle both of these, without requiring it to be specified in the schema.

Again, I really don't think specifying it in the schema is such a big deal in that case (I insist on the "in that case", I'm *not* pretending hand-tuning is never a big deal), nor does it feel a hard one to get right.

Now don't get me wrong, I agree that self-tuning is great, but only if we know how to do it correctly. Typically, and to refer to some ideas above, I think that if users have to think about what query they should do to have good caching (like using select * when really they want select x, y but want to keep the full row in cache, or to be careful that if they use too many different queries for a given row it won't play well with the cache), then 1) it's still hand-tuning and 2) one that is imo far less convenient/intuitive.

Basically what I'm saying is that with a query cache, I see a number of unknowns, of added difficulties (what about the space taken by all those filter per query? how do we make sure to cache the full row when it's the right thing to do without any user intervention? etc...) and of cases where it will be less efficient that the filter-per-cf alternative unless the user is super careful (will that be a problem in real life ? maybe not, but maybe). On the other side, adding a simple per-cf filter is a nice simple increment over what we have and we stay in known territory while solving the problem we want to solve.

Besides, if specifying a filter with the schema is that much of a problem, maybe we can do that choice automatically. We have stats on the rows avg and max size, and we can easily start gathering some simple stats on queries, at least enough to be able to say if it's the head or tail that we need to keep in cache for wide rows. Though honestly, even if we do that, my preference would largely go to still allow the user to override whatever automatic choice we came up with if they wish so.


                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179787#comment-13179787 ] 

Pavel Yaskevich commented on CASSANDRA-1956:
--------------------------------------------

That is why I propose to combine current technique and filter-data and use first for small rows and latter for wide ones without on-update invalidation. And I agree with Daniel and Sylvain that serializing query cache that invalidate on update won't be very useful in most cases.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-1956:
-----------------------------

    Comment: was deleted

(was: So for this ticket:

1) Expose rowCache API's so we can extend easier.
2) Reduce the Query cache memory foot print.
3) reject rows > x (Configurable)
4) Writes should not invalidates the cache (Configurable but if not invalidate take some hit on the write performance).

Reasonable? Anything missing?)
    
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179146#comment-13179146 ] 

Pavel Yaskevich commented on CASSANDRA-1956:
--------------------------------------------

What I propose is - let's make a row cache more configurable according to the way we handle big rows, we can allow cache to take a per-cf "filter" class which would have settings for max row size to cache (and probably some other options in the future) so we can feed it a ColumnFamily from getTopColumns and let it decide if we should just keep that ColumnFamily in cache as is, because it's a thin row, or (if we are querying different parts of the big row) keep QueryFilter and ColumnFamily.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-1956:
----------------------------------------

    Fix Version/s:     (was: 0.7.2)
                   0.7.3

> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Stu Hood
>            Assignee: Daniel Doubleday
>             Fix For: 0.7.3
>
>         Attachments: 0001-row-cache-filter.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256850#comment-13256850 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

how do you decide which blocks to cache?
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256949#comment-13256949 ] 

Vijay commented on CASSANDRA-1956:
----------------------------------

Hi Jonathan, When user requests X1,Y1 we cache the block from (Start->X1-Position ->Y1-Position) + N.... with block size being configurable. (it should similar to the page cache and if there is a write on the block the whole block is marked dirty and next fetch will go to the FS). there is a configurable block size when set high enough will cache the whole row (like the existing cache). The logic around it is kind of what the patch has....

We can also use column indexes (if needed) and caching data within it, but the simple may be is to start from the column requested and cache blocks (else updating them cache without invalidating the whole row is going to be hard).
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205503#comment-13205503 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------

bq. What about secondary indexes? I guess you'd like to be able to cache "hot" columns in secondary indexes CF

It's a good question. Secondary indexes CF are accessed from beginning to end, using paging potentially. Which means it's unclear to me if we can do much better than caching the whole secondary index rows. But again, I'm good discussing that, but what I'm mainly saying is that we can't choose the best solution unless we clearly identify the problems we are trying to solve.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179689#comment-13179689 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------

bq. ... to answer my own question, "I want to keep an entire CF in memory" is a fairly common request. So maybe the answer is, we support that more directly, as well as the query cache.

Thinking out loud, but with secondary indexes, we use the trick of expanding the filter to a full row filter if the maxRowSize for the cfs is smaller that some value (the columnIndexSize more specially). Given that keeping entire CF in memory really make sense only for static (narrow) CFs, we could just expand filters for those CFs automatically and just have a query cache as far as the cache implementation is concerned.

As a side note, I share Daniel's opinion (at least I believe that's what he meant earlier) that a serializing query cache that invalidate on update will be very useful for wide rows. At least I don't see many use cases where it would. However I see a cache that would not invalidate on update but keep the cached data matching the filter be much more useful, even if we start by an on-heap cache. And once we've agreed that we'll evacuate the delete problem by invalidating, I don't think it's too hard to do. 
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Daniel Doubleday (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142264#comment-13142264 ] 

Daniel Doubleday commented on CASSANDRA-1956:
---------------------------------------------

As I wrote earlier I'm a little sceptical that a query cache like this will be useful in many cases but since there is something going on here and Jonathan asked for a wish list:

Would you guys consider making the row cache a little more pluggable? This would allow us to maintain custom implementation more easily. Also I think that the core code could benefit as well moving some ifs out of CFS.

Instead of implementing the control flow in CFS and using the cache as a map you could introduce an RowCache instance that would act more like a service layer like:

{noformat}
interface RowCache {

    // returns filtered rows - ready to serve. reads the row from cfs if necessary.
    ColumnFamily getRow(CFS store, QueryFilter filter, int gcBefore);
    
    // notify the cache of a mutation. it can update or invalidate 
    void apply(CFS store, DK key, CF cf);
}    
{noformat}

This way CFS would need no knowledge wether a cache is able to update or only invalidate. And when it invalidates wether it has to invalidate the row or just portions of it. Also there would be no expectation about the internal caching format. The row cache could do whatever it likes. 

In CFS there would be only the cache reference. No distinction between old row cache, query cache, off-heap-cache, my-awesome-very-specialized-cache would be necessary.
 
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Daniel Doubleday (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180093#comment-13180093 ] 

Daniel Doubleday commented on CASSANDRA-1956:
---------------------------------------------

Nice try :-)

I was Mr I dont believe in magic from the very beginning. I thought that there are some common cases like tail and head caching or exclusion of columns that could be implemented. But never mind - maybe the proposed cache is all greatness. All I'm trying to say is that it's pretty easy to end up in propagation failure hell here or change something else that blows things up for use cases that are not foreseen.

So to prevent a rude awakening for some users it might be cool to provide some means (config or whatever) that works similar as the current version. Or at least schedule this for a release which allows for a downgrade.

Just saying ... 
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Daniel Doubleday (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142264#comment-13142264 ] 

Daniel Doubleday edited comment on CASSANDRA-1956 at 11/2/11 4:40 PM:
----------------------------------------------------------------------

As I wrote earlier I'm a little sceptical that a query cache like this will be useful in many cases but since there is something going on here and Jonathan asked for a wish list:

Would you guys consider making the row cache a little more pluggable? This would allow us to maintain custom implementation more easily. Also I think that the core code could benefit as well moving some ifs out of CFS.

Instead of implementing the control flow in CFS and using the cache as a map you could introduce an RowCache instance that would act more like a service layer like:

{noformat}
interface RowCache {

    // returns filtered rows - ready to serve. reads the row from cfs if necessary.
    ColumnFamily getRow(CFS store, QueryFilter filter, int gcBefore);
    
    // notify the cache of a mutation. it can update or invalidate 
    // most impl will not need the store param but it might come handy in special cases
    void apply(CFS store, DK key, CF cf);
}    
{noformat}

This way CFS would need no knowledge wether a cache is able to update or only invalidate. And when it invalidates wether it has to invalidate the row or just portions of it. Also there would be no expectation about the internal caching format. The row cache could do whatever it likes. 

In CFS there would be only the cache reference. No distinction between old row cache, query cache, off-heap-cache, my-awesome-very-specialized-cache would be necessary.
 
                
      was (Author: doubleday):
    As I wrote earlier I'm a little sceptical that a query cache like this will be useful in many cases but since there is something going on here and Jonathan asked for a wish list:

Would you guys consider making the row cache a little more pluggable? This would allow us to maintain custom implementation more easily. Also I think that the core code could benefit as well moving some ifs out of CFS.

Instead of implementing the control flow in CFS and using the cache as a map you could introduce an RowCache instance that would act more like a service layer like:

{noformat}
interface RowCache {

    // returns filtered rows - ready to serve. reads the row from cfs if necessary.
    ColumnFamily getRow(CFS store, QueryFilter filter, int gcBefore);
    
    // notify the cache of a mutation. it can update or invalidate 
    void apply(CFS store, DK key, CF cf);
}    
{noformat}

This way CFS would need no knowledge wether a cache is able to update or only invalidate. And when it invalidates wether it has to invalidate the row or just portions of it. Also there would be no expectation about the internal caching format. The row cache could do whatever it likes. 

In CFS there would be only the cache reference. No distinction between old row cache, query cache, off-heap-cache, my-awesome-very-specialized-cache would be necessary.
 
                  
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144902#comment-13144902 ] 

Vijay commented on CASSANDRA-1956:
----------------------------------

Sure, If no one else has a concern i can do the refractor as a part of this patch...
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136055#comment-13136055 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

Do we store the full key twice, or just two references to the same key object?  The latter would be negligible IMO.  (And asking "How many total columns are in this row?" isn't free, either.)
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179149#comment-13179149 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

What if we turned it into a real query cache?  That way we don't have to predefine filters in the schema, we just use the IFilter objects from the queries involved.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182148#comment-13182148 ] 

Vijay commented on CASSANDRA-1956:
----------------------------------

How about a query cache which will cache all query's by default
1) Users can set an upper bound on the size per row (cache rejection handle)
2) Users can also say cache everything startWith="a" to endWith="x" if the query falls within. (The first time we see see for a row) We will populate the cache with the predefined query and subsequent queries which will fetch results within these limits will served from the cache and the rest will go to the disk.
3) the existing row cache is nothing but a configuration with startWith="" and endWith="" (everything in a row).

Makes sense?

                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179883#comment-13179883 ] 

Pavel Yaskevich commented on CASSANDRA-1956:
--------------------------------------------

I don't say that we should keep both implementations, I suggest we combine them :) I'm not a big fan of keeping filter for small rows because it creates superfluous overhead.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1956:
--------------------------------------

         Reviewer: jbellis
    Fix Version/s: 0.7.2
         Assignee: Daniel Doubleday

> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Stu Hood
>            Assignee: Daniel Doubleday
>             Fix For: 0.7.2
>
>         Attachments: 0001-row-cache-filter.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179876#comment-13179876 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

bq. That is why I propose to combine current technique and filter-data and use first for small rows and latter for wide ones

I'd rather avoid the complexity of keeping both implementations around.  If the rows are small enough that keeping the whole thing in memory is the right tradeoff, then users can optimize that themselves by using "select *" instead of "select x" and "select y" (i.e., the former would result in just one cache entry for the row).  I suspect it won't matter a great deal in real work scenarios anyway.

How about this?

- Query cache replaces row cache, with on/off heap implementations based on existing SC/CLHC.  Use CLHM weight feature to rank by query result size.
- Cache key becomes (row key, query filter)
- When applying an update to row X, check query cache for filters on X.  Update cached CF with the new data for on-heap, invalidate for off-.
- New ticket for "pin CF in memory" feature
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205673#comment-13205673 ] 

Vijay commented on CASSANDRA-1956:
----------------------------------

Alright, What i was trying to do here is to get the feedback from everyone on all the use cases and try to fit it into the one cache,

I did some fair amount of research to see if there is any better option and there wasn't one, the closest concept which i got to was something like a block cache or Linux Page cache.... When there is updates to those blocks we can find those and update those.

1) The problem shows up only when you have a wide row, which means most probably the user is doing a range queries
2) If the user has a wide row then most probably he has a large number of writes into the row, but if we invalidate the row cache for every updates then it might not be useful and also the first read will have to read multiple SST's.
3) Lets say user has a 100 columns to query and he queries in this case (specially with composite type columns where the column names can be larger than the value), then we can possibly run into memory pressure.
4) Having whole row in memory is absolutely required case and we are supporting it (setting min and max number of columns in a block will help it).
5) the above solution can work seamlessly well for narrow rows when the block size is reasonably big.

Head and Tail is basically a optimization for the Reverse/Forward queries which is supported if you have 1 M rows and your block size is 500 and your count is 100 and you are reading from reverse.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Daniel Doubleday (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162540#comment-13162540 ] 

Daniel Doubleday commented on CASSANDRA-1956:
---------------------------------------------

Looking at CASSANDRA-3143 I understand that the consensus seems to be that users should be protected from missconfigurations and that a sane 'works most of the time' solution is the way to go.

Still I want to note why I do believe that some solution along the lines of your patch is a good idea.
Obviously I can only speak for us and our usecases but our app can only work with the relevant data set in mem.

Being able to to optimize caching using the knowledge about data access patterns could significantly improve effeciency.

Right now we still need first level caching for a lot of use cases which makes life not necessarily easier.

With the patch custom cache providers could (on a per cf basis) cache partial rows, optimize memory format or even integrate other caching solutions. We could also skip caching entirely for certain queries. This would also be great for maintenance jobs which would otherwise lead to cache thrashing.

I understand that nobody wants to scare off adopters but all of this would not really change anything for people who want to go with the sane standard. And of course it is also clear that such extension points cannot be guaranteed to be stable. 

Yada yada yada ... 

Thanks for the effort btw!
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136106#comment-13136106 ] 

Vijay commented on CASSANDRA-1956:
----------------------------------

Hi Chris, i will add evict in the v1....
Jonathan, yes in short it is just the reference.

There will also be duplicate data (Columns in the cache) if the query's over lap (which can be further optimized later?)
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Daniel Doubleday (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988023#action_12988023 ] 

Daniel Doubleday commented on CASSANDRA-1956:
---------------------------------------------

Now it's my ahh  :-) I think I understood your idea.

Instead of configuring cache rules you want to cache every filtered request like mysql query cache?

I dropped the idea because I thought it would be either very restricted to certain query patterns or very complicated to keep in sync with memtables and/or decide whether a query can be served by the cache. Also it might be hard to avoid that the cache is being polluted (analogous to the page cache eviction problem during compaction). It might force the developer to spread the data over multiple CFs according to access pattern which increases memory needs (more memtables, more rows).

But yes - if you can come up with an automagical cache management that just works that would be obviously nicer!

PS: If you wan to have a look at the patch: apply to 0.7 r1064192


> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Stu Hood
>            Assignee: Daniel Doubleday
>             Fix For: 0.7.2
>
>         Attachments: 0001-row-cache-filter.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1956:
--------------------------------------

             Priority: Minor  (was: Major)
    Affects Version/s:     (was: 0.7.0)
        Fix Version/s:     (was: 0.7.4)
                       0.8

> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Daniel Doubleday
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: 0001-row-cache-filter.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1956:
--------------------------------------

    Reviewer: slebresne  (was: jbellis)
    Assignee: Vijay  (was: Daniel Doubleday)
    
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179619#comment-13179619 ] 

Vijay commented on CASSANDRA-1956:
----------------------------------

If i understand it correctly, we need a configuration which will tell cache to cache filters only if the returned row is more than x size?
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194981#comment-13194981 ] 

Vijay commented on CASSANDRA-1956:
----------------------------------

So for this ticket:

1) Expose rowCache API's so we can extend easier.
2) Reduce the Query cache memory foot print.
3) reject rows > x (Configurable)
4) Writes should not invalidates the cache (Configurable but if not invalidate take some hit on the write performance).

Reasonable? Anything missing?
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205579#comment-13205579 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

bq. What is so clumsy?

First, that you need to explicitly configure it as part of the schema.  Second, because it inherently only allows one type of query to be cached.  "One CF per query" is the wrong rule of thumb -- "One CF per type of resultset" is my preferred one.  So for instance, you couldn't cache both oldest entries and newest, from the same row in a CF.

bq. while with a true query cache we could do write-through updates on 2I queries as well

What I mean by this is that {{select * from users where birth_date = 1980}} is a query that people could reasonably want to cache, that we can't fit into your 3 categories of "full row, head/tail, handful of named columns."  

At a more sophisticated stage from that, a "true" query cache could update that cached resultsets whenever someone updates the birth_date value to or from 1980, so the query stays fast without having to be recalculated.  (We already have the perfect place in the code for this where index maintenance happens in Table.apply.)
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256954#comment-13256954 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

How is that different from the "query cache" I waved my hands about?

                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204329#comment-13204329 ] 

Vijay edited comment on CASSANDRA-1956 at 2/9/12 7:30 AM:
----------------------------------------------------------

This patch is not complete yet i just wanted to show and see what you guys think about this... This patch is something like a block cache where it will cache blocks of columns where the user can choose the block size and if the query is within the block we are good by just pulling the block into memory else we will scan through the blocks and get the required blocks. Updates can also scan through the blocks and update them... The good part here is this should have lower memory foot print than Query cache but it should also solve the problems which we are discussing in this ticket and it doesnt support Super columns and I dont plan to do so. Let me know, Thanks! Again there is more logic/cases to be handled, Just a prototype for now.
                
      was (Author: vijay2win@yahoo.com):
    This patch is not complete yet i just wanted to show and see what you guys think about this... This patch is something like a block cache where it will cache blocks of columns where the user can choose the block size and if the query is within the block we are good by just pulling the block into memory else we will scan through the blocks and get the required blocks. Updates can also scan through the blocks and update them... The good part here is this should have lower memory foot print than Query cache but it should also solve the problems which we are discussing in this ticket. Let me know thanks! Again there is more logic/cases to be handled, Just a prototype for now.
                  
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205519#comment-13205519 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

Right now we have a cache that is only useful for accelerating queries against rows that fit easily into memory, and even then it is inefficient if you only care about part of the row (either name-based or slice-based).

The filter approach allows us to make slice-based queries more efficient (somewhat clumsily) but doesn't really address the inefficiency for name-based queries.  As Lior points out, it also doesn't help with 2I queries, while with a true query cache we could do write-through updates on 2I queries as well ("select * from users where birth_date = 1980").  (This is a fairly straightforward jump to make for queries within a single composite-PK row, and admittedly more complex when spanning multiple physical rows gets involved.)
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179003#comment-13179003 ] 

Pavel Yaskevich commented on CASSANDRA-1956:
--------------------------------------------

How about we make row cache value more generic so we can distinguish between fat/thin rows: cache would access a "filter" object which would decide how to be with a row according to it's size (and some other characteristics probably) - should the row be stored directly as ColumnFamily or by slice as hash filter => ColumnFamily or should it start slicing from particular size?... Row cache options could be customized the same way as compaction strategy and compression parameters...
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Daniel Doubleday (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179909#comment-13179909 ] 

Daniel Doubleday commented on CASSANDRA-1956:
---------------------------------------------

bq. users can optimize that themselves by using "select *" instead of "select x" and "select y"

I guess it's not totally atypical right now to model data in a way that it fits the current caching scheme. I.e. we have UserData rows with around 10 columns and around 50 - 100k row size. All of them are read during one session at a different time. With the proposed new caching scheme we have the choice to either create 10x cache misses and a lot more objects to gc. Or load the entire row everywhere.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194980#comment-13194980 ] 

Vijay commented on CASSANDRA-1956:
----------------------------------

So for this ticket:

1) Expose rowCache API's so we can extend easier.
2) Reduce the Query cache memory foot print.
3) reject rows > x (Configurable)
4) Writes should not invalidates the cache (Configurable but if not invalidate take some hit on the write performance).

Reasonable? Anything missing?
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211491#comment-13211491 ] 

Vijay commented on CASSANDRA-1956:
----------------------------------

Just wanted to make sure to clarify: The proposal is not about head or the tail row's cache, but to have a block cache where blocks of the wide rows are cached (with cname>x to cname<y)... here head and tail is just pure optimization on where to start the block cache from (where to start the block from from the tail or from the head).
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Daniel Doubleday (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Doubleday updated CASSANDRA-1956:
----------------------------------------

    Attachment:     (was: 0001-row-cache-filter.patch)

> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Stu Hood
>            Assignee: Daniel Doubleday
>             Fix For: 0.7.2
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256965#comment-13256965 ] 

Vijay commented on CASSANDRA-1956:
----------------------------------

:) Quite similar but a different version with less memory foot print, and efficient updates.

1) For example if there are 10 columns which are queried the key will have those names and in the CF return object;
2) If we have 2 kinds of queries with over laps (slice and column names) then we will be caching twice or sometimes more in a pure query cache;
3) If we have an update to a column out of 10 columns.... we have to search to see if they are available in them or invalidate the whole row. in this way we can update the block and be done with it.

This also allows us to incrementally deserialize some parts of the row when the whole row is cached. *

                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Lior Golan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205272#comment-13205272 ] 

Lior Golan commented on CASSANDRA-1956:
---------------------------------------

{quote}
# Caching a row entirely; that's what we do and I think we agree we should keep that feature because sometimes that's what you want.
# Caching the head or the tail of a row for wide rows.
# I could also imagine cases where you want to only pin a few columns (by name) into the cache without keeping the row entirely.
{quote}

What about secondary indexes? I guess you'd like to be able to cache "hot" columns in secondary indexes CF
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182364#comment-13182364 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

bq. I really don't think specifying it in the schema is such a big deal

Maybe, but I see a true query cache as being better than the row cache in 90% of situations.  It's only an accident of implementation that we did the row cache first.  So it feels weird to me to argue to keep it instead of moving to a more flexible model.  (One that could accommodate 2ary index queries as well, for instance.)
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179155#comment-13179155 ] 

Pavel Yaskevich commented on CASSANDRA-1956:
--------------------------------------------

That would work, actually I think that we can slightly modify IFilter to return number of columns involved or use resulting ColumnFamily.serializedSize() to check for threshold, I just want to make sure that cache has minimal memory overhead associated with keeping filters for thin rows.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Daniel Doubleday (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180355#comment-13180355 ] 

Daniel Doubleday commented on CASSANDRA-1956:
---------------------------------------------

Well that is pretty much exactly what the initial patch does.

The only inefficiency as noted above are deletions which are hard to handle without invalidation and now that we have them ttl columns.

That is when 'CACHING FIRST 100' implies a promise that the user will receive 100 valid cols.
The easy way out would be to state that 100 includes tombstones.


                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Daniel Doubleday (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Doubleday updated CASSANDRA-1956:
----------------------------------------

    Attachment: 0001-row-cache-filter.patch

> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Stu Hood
>         Attachments: 0001-row-cache-filter.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Daniel Doubleday (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179765#comment-13179765 ] 

Daniel Doubleday commented on CASSANDRA-1956:
---------------------------------------------

bq. As a side note, I share Daniel's opinion (at least I believe that's what he meant earlier) that a serializing query cache that invalidate on update will be very useful for wide rows.

If there's a 'not' missing here than yes :-)

bq. However I see a cache that would not invalidate on update but keep the cached data matching the filter be much more useful

Just as an implementation idea that could make this easier: If the cache data would be merged with memtables upon read you could merge/write back cache data upon memtable flush which avoids synchronization headaches and might be more efficient (we did something similar in CASSANDRA-2864)
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180297#comment-13180297 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------

Thinking about this a bit more, I'm not sure I'm convinced by a query cache. Or rather, I think that a cache + filter defined with the schema could be much simpler and imo likely good enough.

More precisely, with a query cache, when you update a (cached) row, you have to check every queries for that row to see if it should be updated. I'm afraid there will be cases where this will be inefficient and this will put the burden on user to make sure they don't make query that hit those inefficiencies. I'm also really not fan of having of putting the burden on user to query full row if they want it cached fully for equivalent reasons.

It seems to me that what we want to handle here is exactly the 'cache head or tail of row' problem. If so, it seems to me that simply adding a per-cf (optional) filter to the cache has the following advantages:
- It handles the head/tail use case, as well as the current cache all row case.
- You don't have to care about the problems I mention above
- There is no in-memory overhead of filters problem. We just keep one filter per cf. We can allow more than one filter per-cf in the future *if* that proves useful, which I'm not even too sure it will.
- There is no upgrade headache at all, no new cache that use will have to switch to, nothing we'd have to deprecate. No new mental model for the user of how things are cached, just a imho very natural new option of being able to select what part of the row is cached.
- No question of having two solutions. The current cache will just be the case were there is no filter configured (or the filter is the identity filter, whether "optimize" the no filter case or use the identity filter is really a implementation detail).

Now the only downside I could see to that compared to a query cache is the fact that you have to define the filter with the schema. I really see this as almost anecdotal. Doesn't seem very complicated (and certainly more simple than having to change your query to make sure what you want is cached) to write something along the line of:
{noformat}
CREATE TABLE timeline (
    userid uuid,
    timestamp time,
    action text,
    PRIMARY KEY (userid, timestamp)
) WITH COMPACT STORAGE AND CACHING FIRST 100;
{noformat}
(note the use of our tentative syntax of CASSANDRA-2474 for wide rows) or even
{noformat}
CREATE TABLE users (
    userid uuid PRIMARY KEY,
    firstname text,
    lastname text,
    age int,
    email text,
    picture binary,
) WITH CACHING (firstname, lastname, email);
{noformat}
if one is so inclined to do that (because he don't want to cache profile pictures for instance).

                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-1956:
-----------------------------

    Attachment: 0002-add-query-cache.patch
                0001-re-factor-row-cache.patch

First take on the refactor, i am not really happy with this but thought of sharing it to see if there is any concerns on the same... 

TODO: AutoSaving for query cache.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "B. Todd Burruss (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207370#comment-13207370 ] 

B. Todd Burruss commented on CASSANDRA-1956:
--------------------------------------------

i have very wide rows (140k columns) that i randomly query on, usually about 100 columns at a time.  wide rows do not work well with the SerializingCacheProvider because of the constant copying of data. ConcurrentLinkedHashMap performs very well, but eats memory because of all the ByteBuffers.

I'm trying to understand if this will help my case.  head and tail caching will help folks with time series data, but not me.  possibly the "handful of named columns" caching will help, but there will be overlap in my queries so columns will exist in multiple cache entries, balooning the cache.

what i was hoping for was a scheme to segment the wider row into smaller "segments" so not as much copying is performed in the SerializingCacheProvider.

                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205619#comment-13205619 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------

bq. So for instance, you couldn't cache both oldest entries and newest, from the same row in a CF.

Well, as I said earlier, that would just require to have a handful of query per-CF rather than just one. Granted, this may blur a little bit the actual complexity difference with respect to a query cache, but it's still likely simpler and with less overhead.

I think that for a good part what bothers me with a pure query cache is that I think picking what to cache is difficult to do automatically, and looking each query in isolation (which is what a query cache does) is not necessarily the right thing. The typical example is when the right thing to do is to cache the whole row while you'll never query the full row (but maybe on part of the code query the firstname and lastname, another query the email, another the phone). We've mentioned the idea of having the 'cache the full row' as a special case but that doesn't sound very convenient. And it makes me wonder if we won't have the same problem for other situation, where the query cache actually play against you because it just don't see the big picture. While for the user usually know that big picture. In any case, what I meant by my previous comment is that pining a full row into the cache is imho something we should keep (without forcing the user to always query the full row to get that), and that's not handle by a true query cache.

{quote}
What I mean by this is that select * from users where birth_date = 1980 is a query that people could reasonably want to cache, that we can't fit into your 3 categories of "full row, head/tail, handful of named columns."

At a more sophisticated stage from that, a "true" query cache could update that cached resultsets whenever someone updates the birth_date value to or from 1980, so the query stays fast without having to be recalculated. (We already have the perfect place in the code for this where index maintenance happens in Table.apply.)
{quote}

That's a good point. I agree that in that case a query cache is what we want, because the query "spans" multiple CF (or in other words the query doesn't map directly to what's on disk but compute the result). But I'm still not sold on the query cache on direct queries of rows, because of the reasons above. 
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256992#comment-13256992 ] 

Vijay commented on CASSANDRA-1956:
----------------------------------

>>> Does this support caching head/tail queries? Or do X and Y have to be existing column values?
No X and Y doesn't need to existing, they are just markers in the RowCacheKey (for example if the query has x* -> y* we will have that in the RCK instead of xeon -> yum)... It does support head and tail queries.

>>>  it sounds like this always invalidates on update. Would it be possible to preserve the current row cache behavior?
Yeah the prototype does the update on write, but the problem is that when there are a lot of updates block size will increase then initially cached, at some point we need to split/re-partition it...
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180134#comment-13180134 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

Maybe we should just leave the existing row cache in for 1.2 and deprecate it in favor of query cache, and remove it in 1.3.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987360#action_12987360 ] 

Stu Hood commented on CASSANDRA-1956:
-------------------------------------

Thanks for the patch Daniel! We actually have existing 'filter' implementations (in {{org.apache.cassandra.db.filter}}) that I think would make the most sense for use aside cache entries.

> What about just invalidating (removing from the cache) the row on delete and letting it get rebuild on the next read?
Also, regarding the "tombstones in cache" problem: I believe it came up in IRC the other day. The solution that seemed closest to our existing methods was to keep the tombstones in cache, but to add a thread that periodically walked the cache to perform GC (with our existing GC timeout) like we would during compaction.

> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Stu Hood
>            Assignee: Daniel Doubleday
>             Fix For: 0.7.2
>
>         Attachments: 0001-row-cache-filter.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Daniel Doubleday (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987674#action_12987674 ] 

Daniel Doubleday commented on CASSANDRA-1956:
---------------------------------------------

@Jonathan 

Yes I like the invalidation idea. 

@Stu

bq. We actually have existing 'filter' implementations (in org.apache.cassandra.db.filter) that I think would make the most sense for use aside cache entries.

In my implementation the RowCacheFilter has more responsibilities than just filtering the columns. It's more like a plugin interface and filter is probably not the right name for it. It actually uses the db.filter classes.

I guess I don't understand what you proposed as general mechanism. As I said I had this implementation on 0.6 before and just ported it. My main point was that I think it would be really helpful if the plugin mechanism would allow for custom cache handlers (maybe that would be a better name) with a generic configuration mechanism (filter params in yaml). Because every use case / data model can be very specific.

bq. Also, regarding the "tombstones in cache" problem: I believe it came up in IRC the other day. The solution that seemed closest to our existing methods was to keep the tombstones in cache, but to add a thread that periodically walked the cache to perform GC (with our existing GC timeout) like we would during compaction.

I don't understand how that would help me: the TailRowCacheFilter contract is that it can return a specific amount of live columns. These are cached. If one of them gets deleted it would not be able to return a valid response. I tried to cope with that with the expected deletion ratio. But I agree with Jonathon that this is not good enough. 

Again you might have an entirely different solution in mind ...

> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Stu Hood
>            Assignee: Daniel Doubleday
>             Fix For: 0.7.2
>
>         Attachments: 0001-row-cache-filter.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180375#comment-13180375 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------

bq. The only inefficiency as noted above are deletions which are hard to handle without invalidation and now that we have them ttl columns.

Yes, though I'll note that a query cache have the exact same problem so it's more a problem we have to deal with than anything else. And I'm good by starting with just invalidating for deletion and TTL at first. On a second iteration, I think we could improve the TTL case by keeping for each cached row (at least those associated to a non-identity slice filter) a firstColumnToExpireTimestamp. We would keep that to the smallest localExpirationTime in the cached row. Gets and puts would just invalidate the cache if now >= firstColumnToExpireTimestamp. But again, we don't have to concern ourselves with that initially.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180912#comment-13180912 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

bq. It seems to me that what we want to handle here is exactly the 'cache head or tail of row' problem.

That, and "I want to cache a specific set of known-ahead-of-time columns [maybe the entire row]," which is what today's row cache is mostly used for.

I think it's a huge, huge win for a design to be able to handle both of these, *without* requiring it to be specified in the schema.  We've been moving away from hand-tuning, towards self-tuning, for a very good reason: when you require humans to do the right thing to be efficient, you're going to be inefficient an awful lot of the time. :)
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204371#comment-13204371 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------

bq.  it should also solve the problems which we are discussing in this ticket

What are those?

I'd like us to be a little scientific on that issue. What is it we are trying to do in the first place? My take on that (and please feel free to correct me if I'm missing something) is that the kind of caching that I can really see useful in practice are:
# Caching a row entirely; that's what we do and I think we agree we should keep that feature because sometimes that's what you want.
# Caching the head or the tail of a row for wide rows.
# I could also imagine cases where you want to only pin a few columns (by name) into the cache without keeping the row entirely.

And well, that's it. I try to think of other type of (not far fetched hypothetical) workload where caching could be a notable win but are not handled by the 3 cases above and I don't really find one. Now I apparently am stupid and miss 90% of situations since:

bq. but I see a true query cache as being better than the row cache in 90% of situations

because the 3 cases above are perfectly handled by the idea of just adding a filter per-cf to our current row cache (which btw could easily be extended to 2-3 filters per-cf if that proves necessary). So please let's share those cases that are not above and that we want to handle as part of this ticket.

But if what's above does sum up the problem we want to solve, then I continue to think that simply adding a per-cf filter alongside our current row cache is the best solution:
* there is *no* memory overhead.
* all 3 caching use case above are handled without any drawback that I can think of.
* it's an incremental change of the existing, not a completely new thing, thus lowering then risk of introducing new bugs. Typically, I can easily see how CASSANDRA-3862 will translate to that solution; but I suspect thing may get more complicated for say a query cache.

The only criticism that I've seen so far on that solution is the question of the user configuration of the cache, while for the query cache there wouldn't be a configuration (which remains to be proven btw if we want to support the 'stick a row entirely in cache always' case). If someone consider that auto-configuration should be an absolute priority then let's discuss that, because I disagree with that (to sum up, I think any auto-configuration of caches will have drawbacks so I think users should be able to override the default and so I think it's more sane to start with a cache that user can make do what they want and then evaluate how to make that configuration mostly automatic, which I think can be done).

So before considering other solutions, I'd like to understand first more clearly why we're discarding that per-cf filter idea. Because currently it seems to strike a pretty nice balance of fixing what seems to be the problem versus the added complexity.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1956:
--------------------------------

    Comment: was deleted

(was: Thanks for the patch Daniel! We actually have existing 'filter' implementations (in {{org.apache.cassandra.db.filter}}) that I think would make the most sense for use aside cache entries.

> What about just invalidating (removing from the cache) the row on delete and letting it get rebuild on the next read?
Also, regarding the "tombstones in cache" problem: I believe it came up in IRC the other day. The solution that seemed closest to our existing methods was to keep the tombstones in cache, but to add a thread that periodically walked the cache to perform GC (with our existing GC timeout) like we would during compaction.)

> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Stu Hood
>            Assignee: Daniel Doubleday
>             Fix For: 0.7.2
>
>         Attachments: 0001-row-cache-filter.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179800#comment-13179800 ] 

Vijay commented on CASSANDRA-1956:
----------------------------------

Cool i will do the following: 
* Make a configurable cache type serializing or inheap.
* Make x size to be configurable either full row or partial.
* Add a ranking of Queries 
Example: IdentifyFilter will have the highest rank and any query can go fetch the data from them, in other words compare the query and see if the returned data will be a subset of the cache query and if yes then return those....
Use the same ranking if it is not an invalidating the cache to update the rows (which can be tricky but i can give it a shot).
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179981#comment-13179981 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

So... this proposal would be *worse* than the status quo for you?  I thought this was your idea! :)
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180133#comment-13180133 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

bq. I thought that there are some common cases like tail and head caching or exclusion of columns that could be implemented.

Right.  That's what we want to get out of this -- well, the tail/head part, since query cache can only cache what you do ask for, but it can't exclude what you don't.  (Although if you never query the excluded column... close enough, right?)

bq. So to prevent a rude awakening for some users it might be cool to provide some means (config or whatever) that works similar as the current version.

Good point.  Damn it. :)

                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Daniel Doubleday (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Doubleday updated CASSANDRA-1956:
----------------------------------------

    Attachment: 0001-row-cache-filter.patch

Allow filter to invalidate row

TailRowCacheFilter invalidates cache if a column within its range is deleted

> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Stu Hood
>            Assignee: Daniel Doubleday
>             Fix For: 0.7.2
>
>         Attachments: 0001-row-cache-filter.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179627#comment-13179627 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

I don't know, is it really worth keeping the old "cache entire rows" behavior around if we have something more sophisticated?
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-1956:
-----------------------------

    Attachment: 0001-1956-cache-updates-v0.patch
                0002-1956-updates-to-thrift-and-avro-v0.patch

an alternative approach: 

This is still v0/proto-type once you guys agree on this i will make a fresh version.

Ledger is mainly a look up map to invalidate the filters when there is new changes.

Things to do:
1) Make ledger an object and attach it to the cache instead of extending ASC
2) More fine grain locks on the ledger
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Daniel Doubleday
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179629#comment-13179629 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

... to answer my own question, "I want to keep an entire CF in memory" is a fairly common request.  So maybe the answer is, we support that more directly, as well as the query cache.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Chris Burroughs (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136053#comment-13136053 ] 

Chris Burroughs commented on CASSANDRA-1956:
--------------------------------------------

Doesn't the QueryCache's ledger also need to have entries removed when entries are evicted from the primary cache?

Would it be reasonable to have a tunning knob so only rows with more than n columns are filter cached to avoid paying the penalty of storing the row key twice?
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180156#comment-13180156 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

I suppose, but best practice is still going to be to run that kind of job on separate replicas, so that feels pretty low priority to me.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205528#comment-13205528 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------

bq. The filter approach allows us to make slice-based queries more efficient (somewhat clumsily)

What is so clumsy?

bq. but doesn't really address the inefficiency for name-based queries

Depends on what we're talking. The filter approach would allow to set a name-based filter. But ok, that is less convenient. But the query cache is not perfect either. If you do different name-based query, we will end up caching the same data multiple times. We may be able to optimize this, but then it becomes fairly complicated.

bq. while with a true query cache we could do write-through updates on 2I queries as well

I'm not sure I understand, could you clarify your idea?

Don't get me wrong, I'm not totally closed to the idea of query cache or something alike, but I do want to make sure we don't jump on it without a good reasoning behind, because I do fear a query cache will come with a bunch of complication (and while you may have good reasoning, I personally don't yet see clearly that it's the best choice, so I'll need some convincing). The query cache also has the risk of caching multiple time the same thing. Take a CF on which you do some paging: provided the row receives a few update, we'll end up re-caching the same things multiple times (unless we're really smart about it but I'm pretty sure it's not a simple problem). I'm not sure how much of a problem that'll be in practice but ...

Then there is also the fact that the way you model in C* is usually with one CF per kind of query. So it does feel like keeping each query separately shouldn't be necessary. But that's not a technical argument.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987739#action_12987739 ] 

Stu Hood commented on CASSANDRA-1956:
-------------------------------------

> These are cached. If one of them gets deleted it would not be able to return a valid response.
Ahh, sorry: quite right. Invalidation sounds like the best option there.

I'll try and review this more closely in the next week, but I'm not sure I like the filter as a configuration option, as opposed to any of the ideas in the summary.

> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Stu Hood
>            Assignee: Daniel Doubleday
>             Fix For: 0.7.2
>
>         Attachments: 0001-row-cache-filter.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (CASSANDRA-1956) Convert row cache to row+filter cache

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987072#action_12987072 ] 

Jonathan Ellis commented on CASSANDRA-1956:
-------------------------------------------

bq. Rather than keeping track of tombstones in the cache (could not figure out how to do it without locking) deletions are not handled but 'expected' and accounted by caching columns * expected_ts_ratio which is configurable)

This seems pretty error prone.  What about just invalidating (removing from the cache) the row on delete and letting it get rebuild on the next read?  Either way, it means TailRowCacheFilter is not a great fit for workloads w/ lots of deletes (which I am okay with) but at least you don't risk outright invalid answers this way.

> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Stu Hood
>         Attachments: 0001-row-cache-filter.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this usecase would be perfect supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overheard
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a secondary index to lookup cache entries by rowkey so that you can keep them in sync with the memtable
> * others?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.