Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2010/08/21 14:35:19 UTC

[jira] Created: (CASSANDRA-1417) add cache save/load

add cache save/load
-------------------

                 Key: CASSANDRA-1417
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Jonathan Ellis
             Fix For: 0.6.6, 0.7 beta 2


Since mixing 0.7 nodes with 0.6 is looking increasingly unlikely to be supported because of the deep changes to Thrift, we should allow saving out the 0.6 cache and loading it on startup so that we don't inflict the pain of an entire cluster of cold cache on upgraders.

The cache format should just be a list of row keys.  Loading it is as simple as calling getColumnFamily (with a zero-column predicate) on each row, for row cache.

Key cache is more complicated, but only a little.  First is that you have to de-duplicate the row keys from multiple sstables.  (Saving which sstable version it's associated with is less useful, since that will be obsoleted by compaction.)  Second is that we don't need to actually read any row data, we just need to go through the index locator part of the read path (getPosition).
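A minimal sketch of the save/load idea described above, assuming a simple length-prefixed binary file. The class name SavedKeyList, the file layout, and the write-to-temp-then-rename step are illustrative assumptions and are not taken from any attached patch.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/** Hypothetical helper: persists cache keys as a flat list of row keys. */
final class SavedKeyList {
    /** Writes each key as a 4-byte length followed by the raw key bytes. */
    static void save(Path file, Collection<byte[]> keys) {
        // Write to a sibling temp file first so a crash never leaves a half-written cache file.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try {
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(tmp)))) {
                for (byte[] key : keys) {
                    out.writeInt(key.length);
                    out.write(key);
                }
            }
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the keys back in file order; a missing file yields an empty list. */
    static List<byte[]> load(Path file) {
        List<byte[]> keys = new ArrayList<>();
        if (!Files.exists(file)) {
            return keys;
        }
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfFile) {
                    break; // no more length prefixes: end of the key list
                }
                if (length < 0) {
                    throw new IOException("corrupt saved cache: negative key length");
                }
                byte[] key = new byte[length];
                in.readFully(key);
                keys.add(key);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return keys;
    }
}
```

On startup, each loaded key would then be fed through the normal read path (getColumnFamily for the row cache, getPosition for the key cache) so the cache repopulates itself.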

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1417:
-----------------------------------------

    Attachment: 1417-cassandra-0.6-v4.txt

v4 has LoggingOnlyWrappedRunnable

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6-v3.txt, 1417-cassandra-0.6-v4.txt, 1417-cassandra-0.6.txt, 1417-v2.txt

[jira] Commented: (CASSANDRA-1417) add cache save/load

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906540#action_12906540 ] 

Jonathan Ellis commented on CASSANDRA-1417:
-------------------------------------------

comments on patch:

 - if you're going to organize imports, follow the order on http://wiki.apache.org/cassandra/CodeStyle.  also, looks like unused classes like crypto.Data are being added
 - we don't want to add a setting for loading cache on startup; if a cache is present, it should be loaded
 - we DO want a setting for how often to save cache (default should be: not at all)
 - cache saving should be done at compaction priority.  may mean you need to use an executor instead of a timer to get that level of control
 - tmp cleanup should be in CFS constructor (0.6) / scrubDataDirectories (0.7), not CassandraDaemon.  similarly, loadRowCache should probably be called by CFS constructor
 - we only care about cache keys during save, so getEntrySet should be getKeys
 - avoid the temptation to mix unrelated refactoring like the CFMetaData and DD changes here into patches like this (this is where git comes in handy, you can easily save them off for separate review later)
 - SavedCacheReader is a class with no fields, should probably just be a static method somewhere (SSTable?).  also, should return a new set rather than taking one as parameter
 - instead of making CacheWriter abstract, it would be cleaner to make the key extractor an interface and have the CacheWriter take an instance of that (i believe guava Converter interface would work here)
 - should respect configured cache size and if saved cache is larger, should log at info and load as much as there is room for
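
A rough sketch of the "key extractor as an interface" suggestion in the list above. It uses java.util.function.Function as a stand-in for the extractor interface; that substitution, the class name CacheKeyWriter, and the use of Map.Entry as a key-cache key are all assumptions made for illustration, not the patch's actual types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/**
 * Hypothetical writer that is concrete rather than abstract: the only thing
 * that differs between row cache and key cache is how a row key is pulled
 * out of a cache key, so that part is passed in.
 */
final class CacheKeyWriter<K> {
    private final Function<K, byte[]> keyExtractor;

    CacheKeyWriter(Function<K, byte[]> keyExtractor) {
        this.keyExtractor = keyExtractor;
    }

    /** Only the cache keys matter when saving, so values are never touched. */
    List<byte[]> extract(Set<K> cacheKeys) {
        List<byte[]> rowKeys = new ArrayList<>(cacheKeys.size());
        for (K cacheKey : cacheKeys) {
            rowKeys.add(keyExtractor.apply(cacheKey));
        }
        return rowKeys;
    }
}
```

With this shape, a row cache keyed directly by row key could pass an identity function, while a key cache keyed by an (sstable, row key) pair could pass a function that returns the second element.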
 

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6.txt

[jira] Commented: (CASSANDRA-1417) add cache save/load

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901123#action_12901123 ] 

Stu Hood commented on CASSANDRA-1417:
-------------------------------------

Mm, cache summaries could be stored as bloom filters, since we have to read through the entire index at startup time anyway.

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 0.6.6, 0.7 beta 2

[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1417:
-----------------------------------------

    Attachment: 1417-trunk-v9.txt
                1417-cassandra-0.6-v9.txt

v9 attached

* adds JMX interface to save caches in trunk
* orders reads when loading row cache in token order
* adds tracker back into open calls so the key cache can be loaded
* sets KeyCacheSavePeriod to 1 hour in trunk
* migrates cache save periods in config-converter
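
To illustrate the "orders reads in token order" item above, here is a small sketch that sorts saved row keys by token before they are read back. It assumes a token derived from the MD5 hash of the key (in the spirit of a hash-based partitioner); the class TokenOrder and both method names are hypothetical.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: orders row keys by token so warm-up reads follow on-disk order. */
final class TokenOrder {
    /** Assumed token function: non-negative integer built from the key's MD5 digest. */
    static BigInteger token(byte[] key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(key);
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required by the Java platform", e);
        }
    }

    /** Returns a new list sorted by ascending token; the input is left untouched. */
    static List<byte[]> sortByToken(Collection<byte[]> keys) {
        List<byte[]> sorted = new ArrayList<>(keys);
        // Recomputes the digest on each comparison; fine for a sketch,
        // a real loader would likely compute each token once.
        sorted.sort(Comparator.comparing(TokenOrder::token));
        return sorted;
    }
}
```

Reading keys in this order means the warm-up pass walks each data file roughly front to back instead of seeking randomly.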

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6, 0.7.0
>
>         Attachments: 1417-cassandra-0.6-v3.txt, 1417-cassandra-0.6-v4.txt, 1417-cassandra-0.6-v5.txt, 1417-cassandra-0.6-v7.txt, 1417-cassandra-0.6-v9.txt, 1417-cassandra-0.6.txt, 1417-trunk-v7.txt, 1417-trunk-v9.txt, 1417-v2.txt, 1417-v6.txt, 1417-v8.txt

[jira] Issue Comment Edited: (CASSANDRA-1417) add cache save/load

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916709#action_12916709 ] 

Jonathan Ellis edited comment on CASSANDRA-1417 at 9/30/10 6:10 PM:
--------------------------------------------------------------------

v8 attached.

 - sets default key cache save period to 0.  please stop changing this, we shouldn't be inflicting extra i/o on people by default.
 - fixes loading of row cache in on-disk (token) order
 - adds jmx interface to manually save cache
 - misc cleanup

      was (Author: jbellis):
    v8 attached.

 - sets default key cache save period to 0.  please stop changing this, we shouldn't be inflicting extra i/o on people by default.
 - fixes loading of key cache in on-disk (token) order
 - adds jmx interface to manually save cache
 - misc cleanup
  
> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6, 0.7 beta 2
>
>         Attachments: 1417-cassandra-0.6-v3.txt, 1417-cassandra-0.6-v4.txt, 1417-cassandra-0.6-v5.txt, 1417-cassandra-0.6-v7.txt, 1417-cassandra-0.6.txt, 1417-trunk-v7.txt, 1417-v2.txt, 1417-v6.txt, 1417-v8.txt

[jira] Commented: (CASSANDRA-1417) add cache save/load

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901122#action_12901122 ] 

Stu Hood commented on CASSANDRA-1417:
-------------------------------------

Could this be stored in the statistics column family that was added in 0.7?

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 0.6.6, 0.7 beta 2

[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1417:
--------------------------------------

    Description: 
Since mixing 0.7 nodes with 0.6 is looking increasingly unlikely to be supported because of the deep changes to the Thrift API, we should allow saving out the 0.6 cache and loading it on startup so that we don't inflict the pain of an entire cluster of cold cache on upgraders.

The cache format should just be a list of row keys.  Loading it is as simple as calling getColumnFamily (with a zero-column predicate) on each row, for row cache.

Key cache is more complicated, but only a little.  First is that you have to de-duplicate the row keys from multiple sstables.  (Saving which sstable version it's associated with is less useful, since that will be obsoleted by compaction.)  Second is that we don't need to actually read any row data, we just need to go through the index locator part of the read path (getPosition).

  was:
Since mixing 0.7 nodes with 0.6 is looking increasingly unlikely to be supported because of the deep changes to Thrift, we should allow saving out the 0.6 cache and loading it on startup so that we don't inflict the pain of an entire cluster of cold cache on upgraders.

The cache format should just be a list of row keys.  Loading it is as simple as calling getColumnFamily (with a zero-column predicate) on each row, for row cache.

Key cache is more complicated, but only a little.  First is that you have to de-duplicate the row keys from multiple sstables.  (Saving which sstable version it's associated with is less useful, since that will be obsoleted by compaction.)  Second is that we don't need to actually read any row data, we just need to go through the index locator part of the read path (getPosition).


> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 0.6.6, 0.7 beta 2

[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1417:
--------------------------------------

    Fix Version/s:     (was: 0.7 beta 2)
                   0.7.0

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6, 0.7.0
>
>         Attachments: 1417-cassandra-0.6-v3.txt, 1417-cassandra-0.6-v4.txt, 1417-cassandra-0.6-v5.txt, 1417-cassandra-0.6-v7.txt, 1417-cassandra-0.6.txt, 1417-trunk-v7.txt, 1417-v2.txt, 1417-v6.txt, 1417-v8.txt

[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1417:
-----------------------------------------

    Attachment: 1417-cassandra-0.6-v3.txt

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6-v3.txt, 1417-cassandra-0.6.txt, 1417-v2.txt

[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1417:
-----------------------------------------

    Attachment: 1417-cassandra-0.6.txt

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6.txt

[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1417:
-----------------------------------------

    Attachment: 1417-cassandra-0.6.txt

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6.txt

[jira] Commented: (CASSANDRA-1417) add cache save/load

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906861#action_12906861 ] 

Matthew F. Dennis commented on CASSANDRA-1417:
----------------------------------------------

new patch attached

{quote}if you're going to organize imports, follow the order on http://wiki.apache.org/cassandra/CodeStyle. also, looks like unused classes like crypto.Data are being added{quote}

I've updated intellij again to match the order on the wiki

{quote}loadRowCache should probably be called by CFS constructor{quote}

loadRowCache depends on the table being fully constructed first so it's done after the initial Table.open()

{quote}SavedCacheReader is a class with no fields, should probably just be a static method somewhere (SSTable?). also, should return a new set rather than taking one as parameter{quote}

It takes a parameter because the two places it's called pass it different set implementations.  In 0.7 the method creates a new TreeSet<byte[]>(BytesType) and returns that.
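
For context on why a comparator-backed TreeSet is the natural return type here: Java arrays compare by identity, so a plain HashSet<byte[]> would not de-duplicate equal row keys coming from different sstables. The sketch below uses a hand-written unsigned lexicographic comparator as a stand-in for BytesType; the class ByteKeySets and its members are hypothetical names.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper showing de-duplication of byte[] row keys. */
final class ByteKeySets {
    /** Stand-in comparator: unsigned byte-by-byte order, shorter array first on a tie. */
    static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (a, b) -> {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    /** Returns a new sorted set; keys with equal contents collapse into one entry. */
    static SortedSet<byte[]> dedupe(Iterable<byte[]> keys) {
        SortedSet<byte[]> unique = new TreeSet<>(UNSIGNED_LEXICOGRAPHIC);
        for (byte[] key : keys) {
            unique.add(key);
        }
        return unique;
    }
}
```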

{quote}should respect configured cache size and if saved cache is larger, should log at info and load as much as there is room for{quote}

The configured cache size depends on the results from reading all the indexes when SSTableTrackers are created (which is the same place the cache is populated, so we don't iterate over all the index entries twice).  For percentages, we can't really get around this: we need a row count before we can figure out the percentage.  For cache settings that are fixed in size, we could do that, but cache sizes hardly ever change, and even when they do the cache will be set to the correct size once we have a row count, so the extra code to handle this isn't worth it.
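
The dependency described above can be sketched as follows: a percentage setting cannot be turned into a capacity until the row count is known, while a fixed setting can. The setting syntax ("50%" versus "1000"), the class CacheCapacity, and both method names are assumptions for illustration only.

```java
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical helper: resolves a cache size setting and trims a saved key list to fit. */
final class CacheCapacity {
    private static final Logger LOG = Logger.getLogger(CacheCapacity.class.getName());

    /** A setting ending in '%' is a fraction of rowCount; anything else is an absolute count. */
    static int resolve(String setting, long rowCount) {
        String s = setting.trim();
        if (s.endsWith("%")) {
            double fraction = Double.parseDouble(s.substring(0, s.length() - 1)) / 100.0;
            return (int) Math.min(Integer.MAX_VALUE, Math.round(rowCount * fraction));
        }
        return Integer.parseInt(s);
    }

    /** Keeps the first 'capacity' keys and logs at info when the saved list is larger. */
    static <T> List<T> trimToCapacity(List<T> savedKeys, int capacity) {
        int limit = Math.max(0, capacity);
        if (savedKeys.size() <= limit) {
            return savedKeys;
        }
        LOG.info("saved cache has " + savedKeys.size() + " keys, loading only " + limit);
        return savedKeys.subList(0, limit);
    }
}
```

Note that resolve needs rowCount for the percentage case, which is exactly why the trim cannot happen before the indexes have been read.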


> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6.txt

[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1417:
-----------------------------------------

    Attachment: 1417-cassandra-0.6.txt

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6.txt

[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1417:
-----------------------------------------

    Attachment:     (was: 1417-cassandra-0.6.txt)

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6.txt

[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1417:
--------------------------------------

    Attachment: 1417-v2.txt

started cleaning it up (v2 attached): spaces after //, justification after newline according to style guide, operator spacing, etc.

major fix still needed is converting the timer to an executor (creating a new thread and joining it is a clever workaround but clearly worse than just using an executor -- DTPE has a constructor that does what you want).
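
For readers unfamiliar with the timer-to-executor change being asked for, here is a minimal sketch of a periodic save task driven by a scheduled executor. It uses only java.util.concurrent rather than the project's own executor classes, and the class name PeriodicCacheSaver and its methods are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the code from the patch under review.
class PeriodicCacheSaver {
    // One worker thread is enough: saves run one after another, never concurrently.
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor();

    // Runs saveTask repeatedly, waiting periodMillis between the end of one
    // run and the start of the next.
    ScheduledFuture<?> schedule(Runnable saveTask, long periodMillis) {
        return executor.scheduleWithFixedDelay(
            saveTask, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        PeriodicCacheSaver saver = new PeriodicCacheSaver();
        CountDownLatch twoSaves = new CountDownLatch(2);
        saver.schedule(twoSaves::countDown, 50);
        twoSaves.await(5, TimeUnit.SECONDS);
        saver.shutdown();
    }
}
```

Compared with a timer that spawns and joins a thread per run, the executor owns the thread and the scheduling, so there is no manual thread lifecycle to manage.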

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6.txt, 1417-v2.txt
>


[jira] Commented: (CASSANDRA-1417) add cache save/load

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916618#action_12916618 ] 

Matthew F. Dennis commented on CASSANDRA-1417:
----------------------------------------------

v7 patches:
uses RetryingScheduledThreadPoolExecutor in both trunk and 0.6
moves saved caches to their own directory and changed names of saved cache files (to prevent conflicts with index/db files)
adds config option for location of saved caches directory and creates it on startup
changes 0.6 to write byte[] instead of string so 0.7 can read them
defaults keycache save period to 3600 seconds, turns rowcache save period off
changes default (when not specified) compaction and save cache priority to Thread.MIN_PRIORITY
more useful logging, less redundant logging
switched to using com.google.common.base.Function in 0.7 for savedCacheWriter
modified config-converter to convert SavedCacheDirectory (and noticed that I missed the save_[key|row]_cache period)
modified config-converter to take location of old and new config as arguments
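
As a rough illustration of writing keys as raw bytes rather than strings, the sketch below stores each key as a 4-byte length followed by the key bytes. The length-prefixed layout, the class name SavedCacheFile, and the file name used in main are assumptions for the example; the actual on-disk format of the v7 patches is not shown in this thread.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the file layout is an assumption, not the patch's format.
class SavedCacheFile {
    // Writes each key as <int length><key bytes>, creating the directory if needed.
    static void write(File file, List<byte[]> keys) throws IOException {
        File parent = file.getParentFile();
        if (parent != null)
            parent.mkdirs();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (byte[] key : keys) {
                out.writeInt(key.length);
                out.write(key);
            }
        }
    }

    // Reads keys back until the end of the file is reached.
    static List<byte[]> read(File file) throws IOException {
        List<byte[]> keys = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfFile) {
                    break; // no further length prefix: all keys have been read
                }
                byte[] key = new byte[length];
                in.readFully(key);
                keys.add(key);
            }
        }
        return keys;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("saved_caches").toFile();
        File file = new File(dir, "Example-KeyCache");
        write(file, Arrays.asList("row1".getBytes(), "row2".getBytes()));
        System.out.println(read(file).size() + " keys loaded");
    }
}
```

Because the keys never pass through a character encoding, a reader that treats keys as bytes can consume the same file unchanged.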



> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6, 0.7 beta 2
>
>         Attachments: 1417-cassandra-0.6-v3.txt, 1417-cassandra-0.6-v4.txt, 1417-cassandra-0.6-v5.txt, 1417-cassandra-0.6-v7.txt, 1417-cassandra-0.6.txt, 1417-trunk-v7.txt, 1417-v2.txt, 1417-v6.txt
>


[jira] Assigned: (CASSANDRA-1417) add cache save/load

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-1417:
-----------------------------------------

    Assignee: Matthew F. Dennis

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6, 0.7 beta 2
>


[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1417:
-----------------------------------------

    Attachment: 1417-cassandra-0.6-v5.txt

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6-v3.txt, 1417-cassandra-0.6-v4.txt, 1417-cassandra-0.6-v5.txt, 1417-cassandra-0.6.txt, 1417-v2.txt
>


[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1417:
--------------------------------------

    Attachment: 1417-v6.txt

v6 has RetryingScheduledThreadPoolExecutor.

unfortunate that you did not start with v2 as a base for subsequent revisions; I had to re-apply those edits.

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6-v3.txt, 1417-cassandra-0.6-v4.txt, 1417-cassandra-0.6-v5.txt, 1417-cassandra-0.6.txt, 1417-v2.txt, 1417-v6.txt
>


[jira] Commented: (CASSANDRA-1417) add cache save/load

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906989#action_12906989 ] 

Jonathan Ellis commented on CASSANDRA-1417:
-------------------------------------------

LoggingOnlyWrappedRunnable isn't included in the patch, but I can guess.  I'd rather have a DSTPE than rely on the runnable to catch its own exceptions, which is error-prone.
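
The alternative to a self-catching runnable can be sketched as an executor that reports failures itself, so no task has to remember its own try/catch. This is an illustration built on java.util.concurrent only; the class name LoggingScheduledExecutor and the onFailure callback are hypothetical and stand in for whatever logging the project's own executor does.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch; not the project's executor implementation.
class LoggingScheduledExecutor extends ScheduledThreadPoolExecutor {
    private final Consumer<Throwable> onFailure;

    LoggingScheduledExecutor(int threads, Consumer<Throwable> onFailure) {
        super(threads);
        this.onFailure = onFailure;
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // Scheduled tasks are wrapped in a Future that captures the exception,
        // so t is normally null here and the failure has to be pulled out.
        // isDone() is checked first because a healthy periodic task is never
        // "done" and get() would block on it.
        if (t == null && r instanceof Future<?> && ((Future<?>) r).isDone()) {
            try {
                ((Future<?>) r).get();
            } catch (CancellationException e) {
                // a cancelled task is not a failure
            } catch (ExecutionException e) {
                t = e.getCause();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        if (t != null)
            onFailure.accept(t);
    }

    public static void main(String[] args) throws InterruptedException {
        LoggingScheduledExecutor executor =
            new LoggingScheduledExecutor(1, t -> System.err.println("task failed: " + t));
        Runnable failing = () -> { throw new IllegalStateException("disk full"); };
        executor.schedule(failing, 0, TimeUnit.MILLISECONDS);
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With this in place every task submitted to the executor gets the same failure reporting, which is the property a per-runnable wrapper cannot guarantee.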

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6-v3.txt, 1417-cassandra-0.6.txt, 1417-v2.txt
>


[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1417:
-----------------------------------------

    Attachment:     (was: 1417-cassandra-0.6.txt)

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6.txt
>


[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1417:
--------------------------------------

    Attachment: 1417-v8.txt

v8 attached.

 - sets default key cache save period to 0.  please stop changing this, we shouldn't be inflicting extra i/o on people by default.
 - fixes loading of key cache in on-disk (token) order
 - adds jmx interface to manually save cache
 - misc cleanup
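
A manual save hook over JMX could look roughly like the sketch below, which uses the standard MBean pattern from javax.management. The names CacheJmxSketch, CacheSaverMBean, saveCaches, and the ObjectName are hypothetical; the operation names actually exposed by v8 are not shown in this thread.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch; interface and operation names are assumptions.
class CacheJmxSketch {
    // Standard MBean convention: the management interface is public and is
    // named after the implementing class plus the suffix "MBean".
    public interface CacheSaverMBean {
        void saveCaches();
        int getSaveCount();
    }

    public static class CacheSaver implements CacheSaverMBean {
        private final AtomicInteger saves = new AtomicInteger();

        @Override
        public void saveCaches() {
            // A real implementation would write the key and row caches to
            // disk here; the sketch only counts invocations.
            saves.incrementAndGet();
        }

        @Override
        public int getSaveCount() {
            return saves.get();
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example.cache:type=CacheSaver");
        server.registerMBean(new CacheSaver(), name);
        // This is what a JMX client such as jconsole does when the operation is triggered.
        server.invoke(name, "saveCaches", new Object[0], new String[0]);
        System.out.println("saves: " + server.getAttribute(name, "SaveCount"));
        server.unregisterMBean(name);
    }
}
```

An operator can then trigger a save on demand, which fits a default save period of 0 where no periodic save is scheduled.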

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6, 0.7 beta 2
>
>         Attachments: 1417-cassandra-0.6-v3.txt, 1417-cassandra-0.6-v4.txt, 1417-cassandra-0.6-v5.txt, 1417-cassandra-0.6-v7.txt, 1417-cassandra-0.6.txt, 1417-trunk-v7.txt, 1417-v2.txt, 1417-v6.txt, 1417-v8.txt
>


[jira] Updated: (CASSANDRA-1417) add cache save/load

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1417:
-----------------------------------------

    Attachment: 1417-cassandra-0.6-v7.txt
                1417-trunk-v7.txt

it's important that both the trunk and 0.6 patches are applied together (otherwise the upgrades might not work), so if changes are required in one, it's probably best to hold off on committing the other.

> add cache save/load
> -------------------
>
>                 Key: CASSANDRA-1417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1417
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.6.6
>
>         Attachments: 1417-cassandra-0.6-v3.txt, 1417-cassandra-0.6-v4.txt, 1417-cassandra-0.6-v5.txt, 1417-cassandra-0.6-v7.txt, 1417-cassandra-0.6.txt, 1417-trunk-v7.txt, 1417-v2.txt, 1417-v6.txt
>
