Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2009/06/27 00:23:47 UTC

[jira] Created: (CASSANDRA-259) LRU cache for key positions

LRU cache for key positions
---------------------------

                 Key: CASSANDRA-259
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-259
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Jonathan Ellis
            Assignee: Jonathan Ellis


add cache like the old touch cache, but working :)

this will mitigate the performance hit from CASSANDRA-223

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-259) LRU cache for key positions

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726249#action_12726249 ] 

Jun Rao commented on CASSANDRA-259:
-----------------------------------

Some minor comments:
1. Need to make it clear that the new config parameter keyCacheSize is a percentage of the number of keys. Change it to keyCacheSizeInPCT?
2. Make SSTable an abstract class.

The bigger question: do we plan to cache the column values themselves at the CFS level? That seems to be the more effective caching mechanism. Will caching at the CFS level obviate the need for caching key positions, or do we want to support caching at multiple levels?

> LRU cache for key positions
> ---------------------------
>
>                 Key: CASSANDRA-259
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-259
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>         Attachments: 0001-CASSANDRA-259-encapsulate-bloom-filter-access-into-sst.txt, 0002-add-concurrentlinkedhashmap-cache-and-config-option.txt, 0003-refactor-sstable-into-SSTable-SSTableReader-and-SSTa.txt, 0004-per-table-cache-size.txt
>
>
> add cache like the old touch cache, but working :)
> this will mitigate the performance hit from CASSANDRA-223



[jira] Commented: (CASSANDRA-259) LRU cache for key positions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726772#action_12726772 ] 

Jonathan Ellis commented on CASSANDRA-259:
------------------------------------------

Not efficiently, no.

The default KCF of 0.01 will use roughly the same amount of memory as the existing 1/128 key "index" used for binary search.  I can add a comment to that effect.
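As a rough illustration of that comparison, the sketch below counts entries only (not bytes) for a cache sized as a fraction of keys and for an index that samples one key in every 128. The class and method names are hypothetical and are not taken from the patches; per-entry memory overhead is ignored.

```java
// Hypothetical helper for illustration; not code from the Cassandra tree.
final class KeyCacheSizing {
    private KeyCacheSizing() {}

    /** Number of key positions cached when a fraction (0-1) of keys is kept. */
    static long cachedKeys(long keysInSSTable, double keyCachedFraction) {
        return Math.round(keysInSSTable * keyCachedFraction);
    }

    /** Number of entries in an index that samples one key per interval. */
    static long sampledIndexKeys(long keysInSSTable, int interval) {
        return keysInSSTable / interval;
    }
}
```

For an sstable of 1,000,000 keys, a fraction of 0.01 gives 10,000 cached positions, while sampling every 128th key gives 7,812 index entries, so the two are the same order of magnitude.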



[jira] Commented: (CASSANDRA-259) LRU cache for key positions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725438#action_12725438 ] 

Jonathan Ellis commented on CASSANDRA-259:
------------------------------------------

LinkedHashMap is a nonstarter, though.  It was used in the old code, but it's not thread-safe, and if you wrap it in the naive Collections.synchronizedMap, performance will suffer, since every read (from potentially lots of threads) has to go through that lock.

Going to use the one from http://code.google.com/p/concurrentlinkedhashmap/.  The race condition mentioned on the front page is fixed in trunk.
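For reference, the naive approach ruled out above might look like the following minimal sketch. It uses only the standard java.util classes; the class name NaiveLruCache and the factory method are hypothetical and do not come from the old Cassandra code or from the attached patches.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not code from the Cassandra tree.
final class NaiveLruCache {
    private NaiveLruCache() {}

    /** Returns a thread-safe LRU map holding at most maxEntries entries. */
    static <K, V> Map<K, V> create(final int maxEntries) {
        // accessOrder=true makes get() move the entry to the tail of the
        // internal linked list, so every read mutates shared state.
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once over capacity.
                return size() > maxEntries;
            }
        };
        // A single lock guards the whole map, so concurrent readers
        // serialize on it; this is the bottleneck described above.
        return Collections.synchronizedMap(lru);
    }
}
```

Because an access-ordered LinkedHashMap reorders its list on every get(), reads cannot safely run unsynchronized, which is why the wrapper has to lock reads as well as writes.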



[jira] Resolved: (CASSANDRA-259) LRU cache for key positions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-259.
--------------------------------------

    Resolution: Fixed

committed with changes noted above.



[jira] Commented: (CASSANDRA-259) LRU cache for key positions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725445#action_12725445 ] 

Jonathan Ellis commented on CASSANDRA-259:
------------------------------------------

Thanks for the pointer.  I didn't actually see much about CLHM there, just the comments "Their 'old algorithm' is much like this one. The new algorithm looks very cool but is not production ready yet... Okay the algorithms i'm using in our version are not actually the same as their old algorithm. In any case, should stay tuned to their new algorithm."

Do you think we should borrow HBase's implementation instead?  I'm reasonably confident that CLHM is production ready given that ehcache (http://ehcache.sourceforge.net/) uses it heavily enough to have run into at least that one concurrency bug that's fixed now. :)




[jira] Commented: (CASSANDRA-259) LRU cache for key positions

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728668#action_12728668 ] 

Hudson commented on CASSANDRA-259:
----------------------------------

Integrated in Cassandra #131 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/131/])
    per-table key cache size.
patch by jbellis; reviewed by Jun Rao for 
refactor sstable into SSTable, SSTableReader, and SSTableWriter.
patch by jbellis; reviewed by Jun Rao for 
add concurrentlinkedhashmap cache and config option.  config option is NOT yet wired up to control cache sizes.
patch by jbellis; reviewed by Jun Rao for 
encapsulate bloom filter access into sstable.getPosition
patch by jbellis; reviewed by Jun Rao for 




[jira] Commented: (CASSANDRA-259) LRU cache for key positions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726253#action_12726253 ] 

Jonathan Ellis commented on CASSANDRA-259:
------------------------------------------

1. I thought the comment
            <!-- Key cache size is the fraction of keys per sstable whose locations
                 we keep in memory in "mostly LRU" order. -->

made it clear what was going on, but we can change it to KeyCachedFraction, if you like that better.  (It's not a percent, since that is 0-100 not 0-1 :)

2. done

3. Column sizes vary a lot more than key sizes.  Also, it is clear that if the client is doing a slice op on the key, we want to cache the key location, but do we want to cache the column values?  That is much less clear.  Hence I think this is best managed explicitly by the client with memcached, ehcache, etc.



[jira] Commented: (CASSANDRA-259) LRU cache for key positions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726106#action_12726106 ] 

Jonathan Ellis commented on CASSANDRA-259:
------------------------------------------

04
    wire up per-table cache size

03
    refactor sstable into SSTable, SSTableReader, and SSTableWriter.

02
    add concurrentlinkedhashmap cache and config option.  config option is NOT yet wired up to control cache sizes.

01
    encapsulate bloom filter access into sstable.getPosition




[jira] Commented: (CASSANDRA-259) LRU cache for key positions

Posted by "Michael Greene (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725439#action_12725439 ] 

Michael Greene commented on CASSANDRA-259:
------------------------------------------

There's good discussion on that project and LRU cache in general in HBASE-1460 if you missed it.



[jira] Commented: (CASSANDRA-259) LRU cache for key positions

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726517#action_12726517 ] 

Jun Rao commented on CASSANDRA-259:
-----------------------------------

Fine. Another comment. KeyCachedFraction doesn't give an upper bound on memory consumption. Is it possible to change it to an absolute number (maybe per CF) in terms of MB?



[jira] Commented: (CASSANDRA-259) LRU cache for key positions

Posted by "Michael Greene (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725457#action_12725457 ] 

Michael Greene commented on CASSANDRA-259:
------------------------------------------

ehcache picked up that new implementation fairly recently (released two weeks ago, from the looks of it), but it does look like they received massive benefits from it.  http://gregluck.com/blog/archives/2009/02/i_have_been_wai.html -- wow.  I didn't realize that the new design had hit major software yet; I just remembered seeing jgray's comment. +1 CLHM.



[jira] Updated: (CASSANDRA-259) LRU cache for key positions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-259:
-------------------------------------

    Attachment: 0004-per-table-cache-size.txt
                0003-refactor-sstable-into-SSTable-SSTableReader-and-SSTa.txt
                0002-add-concurrentlinkedhashmap-cache-and-config-option.txt
                0001-CASSANDRA-259-encapsulate-bloom-filter-access-into-sst.txt



[jira] Updated: (CASSANDRA-259) LRU cache for key positions

Posted by "Michael Greene (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Greene updated CASSANDRA-259:
-------------------------------------

    Component/s: Core

> LRU cache for key positions
> ---------------------------
>
>                 Key: CASSANDRA-259
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-259
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-259-encapsulate-bloom-filter-access-into-sst.txt, 0002-add-concurrentlinkedhashmap-cache-and-config-option.txt, 0003-refactor-sstable-into-SSTable-SSTableReader-and-SSTa.txt, 0004-per-table-cache-size.txt
>
>
> add cache like the old touch cache, but working :)
> this will mitigate the performance hit from CASSANDRA-223
