Posted to commits@cassandra.apache.org by "Pavel Yaskevich (JIRA)" <ji...@apache.org> on 2011/06/01 23:44:48 UTC

[jira] [Created] (CASSANDRA-2731) Implement in-house file caching.

Implement in-house file caching.
--------------------------------

                 Key: CASSANDRA-2731
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2731
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
    Affects Versions: 1.0
            Reporter: Pavel Yaskevich
            Assignee: Pavel Yaskevich


Implement the FileCache, CachedRandomAccessFile (to replace BufferedRandomAccessFile), and RadixTree (to serve as the backend cache storage) classes.

The FileCache class will be responsible for storing and retrieving data from the radix tree, for flushing dirty pages to disk, and for page management, such as adding new pages and reclaiming old or unused ones.
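
A minimal Java sketch of the page-management duties listed for FileCache, assuming a fixed 4 KiB page size and a hypothetical `BackingStore` interface standing in for the file on disk. A `LinkedHashMap` in access order replaces the radix tree here only to keep the example short, and least-recently-used eviction is an assumption; the issue does not say how old pages are chosen for reclaiming.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: cached pages, dirty tracking, flushing, and LRU reclaiming. */
class FileCacheSketch {
    static final int PAGE_SIZE = 4096; // assumed page size

    /** Hypothetical stand-in for the underlying file. */
    interface BackingStore {
        void readPage(long pageIndex, byte[] dst);
        void writePage(long pageIndex, byte[] src);
    }

    private static final class Page {
        final byte[] data = new byte[PAGE_SIZE];
        boolean dirty;
    }

    private final BackingStore store;
    private final int capacity;
    // accessOrder = true: iteration runs from least to most recently accessed page.
    private final LinkedHashMap<Long, Page> pages = new LinkedHashMap<>(16, 0.75f, true);

    FileCacheSketch(BackingStore store, int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be positive");
        this.store = store;
        this.capacity = capacity;
    }

    /** Returns the cached page, loading it from the store on a miss. */
    private Page load(long pageIndex) {
        Page page = pages.get(pageIndex);
        if (page == null) {
            evictIfFull();
            page = new Page();
            store.readPage(pageIndex, page.data);
            pages.put(pageIndex, page);
        }
        return page;
    }

    byte read(long position) {
        return load(position / PAGE_SIZE).data[(int) (position % PAGE_SIZE)];
    }

    void write(long position, byte value) {
        Page page = load(position / PAGE_SIZE);
        page.data[(int) (position % PAGE_SIZE)] = value;
        page.dirty = true; // written back on flush() or eviction
    }

    /** Writes every dirty page back to the store. */
    void flush() {
        for (Map.Entry<Long, Page> e : pages.entrySet()) {
            if (e.getValue().dirty) {
                store.writePage(e.getKey(), e.getValue().data);
                e.getValue().dirty = false;
            }
        }
    }

    /** Reclaims the least recently used page, flushing it first if it is dirty. */
    private void evictIfFull() {
        if (pages.size() < capacity) return;
        Iterator<Map.Entry<Long, Page>> it = pages.entrySet().iterator();
        Map.Entry<Long, Page> eldest = it.next();
        if (eldest.getValue().dirty) store.writePage(eldest.getKey(), eldest.getValue().data);
        it.remove();
    }

    int cachedPages() { return pages.size(); }
}
```

Dirty pages are written back either on an explicit `flush()` or when they are chosen for eviction, so no modification is lost when a page is reclaimed.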

Linux-only CRAF features (via JNI):

1) O_DIRECT for both read and write operations.
2) Batching of write operations with AIO's lio_listio.
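
O_DIRECT bypasses the kernel page cache, which is why an in-house cache would have to do its own buffering; it also requires the file offset, the transfer length, and the buffer address to be aligned to the device's logical block size. A minimal sketch of the offset/length arithmetic, assuming a power-of-two block size; `DirectIoAlignment` is a hypothetical helper, and the JNI calls themselves are not shown.

```java
/**
 * Hypothetical helper: widens an arbitrary byte range to the enclosing
 * block-aligned window, as an O_DIRECT read would require.
 * The block size is an assumption (commonly 512 or 4096 bytes, device-dependent).
 */
final class DirectIoAlignment {
    private DirectIoAlignment() {}

    /** Rounds value down to a multiple of blockSize (a power of two). */
    static long alignDown(long value, int blockSize) {
        return value & -(long) blockSize;
    }

    /** Rounds value up to a multiple of blockSize (a power of two). */
    static long alignUp(long value, int blockSize) {
        return (value + blockSize - 1) & -(long) blockSize;
    }

    /** Returns {alignedOffset, alignedLength} covering [position, position + length). */
    static long[] window(long position, int length, int blockSize) {
        if (Integer.bitCount(blockSize) != 1) {
            throw new IllegalArgumentException("blockSize must be a power of two");
        }
        long start = alignDown(position, blockSize);
        long end = alignUp(position + length, blockSize);
        return new long[] { start, end - start };
    }
}
```

For example, `window(5000, 100, 4096)` yields a 4096-byte read starting at offset 4096; the caller would then copy out the 100 bytes it asked for. On newer JDKs (10 and later), `com.sun.nio.file.ExtendedOpenOption.DIRECT` together with `ByteBuffer.alignedSlice` covers similar ground without JNI, though neither existed when this issue was filed.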

Provide the ability to migrate hot data directly from the Memtable into the CRAF cache, so that data serving live reads is always kept hot in memory. To minimise the effects of compaction, CRAF should provide a way to bypass caching for data that is not already cached.
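
A minimal sketch of the compaction bypass, assuming a simple per-read mode flag; `BypassingReader` and `Mode` are hypothetical names, and a `HashMap` stands in for the real cache. A page that is already cached is used in either mode, but only normal reads insert on a miss, so a large compaction scan does not push hot pages out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Hypothetical read path that can skip cache population (e.g. for compaction). */
class BypassingReader {
    enum Mode { POPULATE, BYPASS_IF_ABSENT }

    private final Map<Long, byte[]> cache = new HashMap<>();
    private final LongFunction<byte[]> disk; // hypothetical page loader
    int diskReads;                           // counts cache misses, for illustration

    BypassingReader(LongFunction<byte[]> disk) { this.disk = disk; }

    byte[] readPage(long pageIndex, Mode mode) {
        byte[] page = cache.get(pageIndex);
        if (page != null) return page;        // hit: served from memory in both modes
        diskReads++;
        page = disk.apply(pageIndex);
        if (mode == Mode.POPULATE) {
            cache.put(pageIndex, page);       // only normal reads insert on a miss
        }
        return page;
    }

    boolean isCached(long pageIndex) { return cache.containsKey(pageIndex); }
}
```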

Provide a way to create pointers in the cache, which will help minimize the performance impact when a single column is spread across multiple SSTable files (except for counter columns).

Use jemalloc (http://www.canonware.com/jemalloc/) for cache memory management.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2731) Implement in-house file caching.

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048613#comment-13048613 ] 

Jonathan Ellis commented on CASSANDRA-2731:
-------------------------------------------

So the idea is we store disk blocks in the radix tree?  Why a radix tree?  What do we do for Windows? Would it make sense to store sstable blocks directly instead if we move to a block-based format like CASSANDRA-674?

[jira] [Updated] (CASSANDRA-2731) Implement in-house file caching.

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2731:
--------------------------------------

             Priority: Minor  (was: Major)
    Affects Version/s:     (was: 1.0)

[jira] [Commented] (CASSANDRA-2731) Implement in-house file caching.

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104673#comment-13104673 ] 

Matthew F. Dennis commented on CASSANDRA-2731:
----------------------------------------------

As much as I dislike winblows, the reality is that its async I/O facilities are far superior to anything you'll find on Linux. I'm not sure how one would go about using them from the JVM, but my guess is that the new NIO APIs use them natively, without the user-space thread pool they rely on under Linux.
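
Java 7's NIO.2 `AsynchronousFileChannel` is presumably the "new NIO stuff" meant here; the sketch below issues a positional read that returns a `Future` immediately. The API hides whether the JDK services the request with native overlapped I/O or with a background thread pool, which is exactly the platform detail the comment speculates about. `AsyncReadExample` is a hypothetical wrapper, not Cassandra code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical wrapper around a single asynchronous positional read. */
class AsyncReadExample {
    static String readAsync(Path file, long position, int length)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel channel =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            // read() returns at once; the caller could do other work before get().
            Future<Integer> pending = channel.read(buffer, position);
            int bytesRead = pending.get(); // blocks until the read completes
            if (bytesRead < 0) return "";  // position was at or past end of file
            // A single read may return fewer bytes than requested; a real caller would loop.
            buffer.flip();
            return StandardCharsets.UTF_8.decode(buffer).toString();
        }
    }
}
```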

[jira] [Issue Comment Edited] (CASSANDRA-2731) Implement in-house file caching.

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048632#comment-13048632 ] 

Pavel Yaskevich edited comment on CASSANDRA-2731 at 6/13/11 4:48 PM:
---------------------------------------------------------------------

Sorry, I misunderstood that. I had previously read that Windows also supports async I/O, but I still need to investigate that more deeply. In any case, I was planning to check the platform using #ifdefs and act accordingly.

      was (Author: xedin):
    Sorry, I misunderstand that. I previously read that Windows also has a possibility to run async I/O but I still need to investigate that deeper, anyway I was planing to check a platform using #ifdef's and act properly.
  
[jira] [Commented] (CASSANDRA-2731) Implement in-house file caching.

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048626#comment-13048626 ] 

Jonathan Ellis commented on CASSANDRA-2731:
-------------------------------------------

By "What do we do for Windows" I was referring to the native parts of this, not the radix tree.

[jira] [Commented] (CASSANDRA-2731) Implement in-house file caching.

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048632#comment-13048632 ] 

Pavel Yaskevich commented on CASSANDRA-2731:
--------------------------------------------

Sorry, I misunderstood that. I had previously read that Windows also supports async I/O, but I still need to investigate that more deeply. In any case, I was planning to check the platform using #ifdefs and act accordingly.
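
The `#ifdef` checks would live in the native (C) layer, but the Java side would still have to decide at runtime which native path, if any, to load. A minimal sketch of that decision based on the standard `os.name` system property; `IoStrategySelector` and its `Strategy` values are hypothetical, and a real implementation would also fall back to the portable path if the native library fails to load.

```java
import java.util.Locale;

/** Hypothetical runtime counterpart to compile-time #ifdef platform checks. */
class IoStrategySelector {
    enum Strategy { LINUX_AIO, WINDOWS_OVERLAPPED, PORTABLE_JAVA }

    static Strategy forOs(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("linux")) return Strategy.LINUX_AIO;
        if (os.startsWith("windows")) return Strategy.WINDOWS_OVERLAPPED;
        return Strategy.PORTABLE_JAVA; // e.g. Mac OS X, Solaris: no native path assumed
    }

    /** Chooses a strategy for the JVM this code is running on. */
    static Strategy current() {
        return forOs(System.getProperty("os.name", ""));
    }
}
```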

[jira] [Commented] (CASSANDRA-2731) Implement in-house file caching.

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048624#comment-13048624 ] 

Pavel Yaskevich commented on CASSANDRA-2731:
--------------------------------------------

RadixTree is the best data structure for this: a low-depth tree can store, and give fast access to, a huge amount of data (with 4KB pages and 64 slots in each node, a depth of 6 allows storing/indexing up to 15TB of data). It will be implemented in Java, so there is no problem with portability. See http://lwn.net/Articles/175432/ for an overview of the structure.
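
A minimal sketch of a fixed-height radix tree with 64 slots per node (6 bits of the page index consumed per level), in the spirit of the Linux page-cache tree the LWN article describes. `RadixTreeSketch` is a hypothetical class, not Cassandra code; interior nodes are allocated lazily so sparse files stay cheap. By this arithmetic a six-level tree addresses 64^6 = 2^36 pages, i.e. 2^36 × 2^12 = 2^48 bytes (256 TiB) with 4 KiB pages, which is more than the 15TB quoted above; the comment does not say what further limit produces the smaller figure.

```java
/** Hypothetical fixed-height radix tree mapping page indexes to values. */
class RadixTreeSketch<V> {
    static final int BITS_PER_LEVEL = 6;
    static final int SLOTS = 1 << BITS_PER_LEVEL; // 64 slots per node
    static final int MASK = SLOTS - 1;

    private final int height;
    private final Object[] root = new Object[SLOTS];

    RadixTreeSketch(int height) {
        if (height < 1 || height > 10) throw new IllegalArgumentException("height must be 1..10");
        this.height = height;
    }

    /** Number of addressable indexes: 64^height. */
    long maxIndex() { return 1L << (BITS_PER_LEVEL * height); }

    void put(long index, V value) {
        checkIndex(index);
        Object[] node = root;
        // Walk from the top level down, taking 6 bits of the index per level.
        for (int level = height - 1; level > 0; level--) {
            int slot = (int) ((index >>> (level * BITS_PER_LEVEL)) & MASK);
            if (node[slot] == null) node[slot] = new Object[SLOTS]; // allocate path lazily
            node = (Object[]) node[slot];
        }
        node[(int) (index & MASK)] = value;
    }

    @SuppressWarnings("unchecked")
    V get(long index) {
        checkIndex(index);
        Object[] node = root;
        for (int level = height - 1; level > 0; level--) {
            node = (Object[]) node[(int) ((index >>> (level * BITS_PER_LEVEL)) & MASK)];
            if (node == null) return null; // nothing cached under this subtree
        }
        return (V) node[(int) (index & MASK)];
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= maxIndex()) throw new IndexOutOfBoundsException("index " + index);
    }
}
```

Every lookup touches exactly `height` nodes regardless of how many pages are cached, which is the "low depth, fast access" property the comment relies on.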
