Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2009/07/01 17:41:47 UTC

[jira] Commented: (CASSANDRA-16) Memory efficient compactions

    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726111#action_12726111 ] 

Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

You're right: since not every column name is indexed, I think we can get by with the column index in memory.  This will allow hundreds of millions of columns (maybe only tens of millions, to make sure you can hold multiple large indexes in memory at once), but that is still adequate for any use case I can think of.  So I don't think we need to worry about writing indexes to a separate file for that reason.
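
(For a rough sense of scale, assuming on the order of 20 bytes per index entry, which is a guess rather than a measured figure: 100 million entries would take roughly 2 GB, while 10 million entries take roughly 200 MB. That is why tens of millions of entries per index is the safer bound when several large indexes need to be resident at once.)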

There are two other downsides, though, to index-at-the-end: one is having to do an extra seek (we seek first to the end of the row to read the index size, then have to seek back from there to read the actual index), and the other is that index-at-the-end code is inherently more complex than index-in-separate-file.
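
To make the extra seek concrete, here is a minimal sketch of the read path. The trailing-int layout and all names here are hypothetical, not the actual SSTable format:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class IndexAtEndReader {
        // Assumed layout: [row data][index bytes][4-byte index size],
        // with rowEnd pointing just past the final int.
        static byte[] readIndex(RandomAccessFile sstable, long rowEnd)
                throws IOException {
            sstable.seek(rowEnd - 4);             // seek #1: read the index size
            int indexSize = sstable.readInt();

            sstable.seek(rowEnd - 4 - indexSize); // seek #2: back to the index itself
            byte[] index = new byte[indexSize];
            sstable.readFully(index);
            return index;
        }
    }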

But index-in-separate-file has its own problems: an extra fopen on the performance side, and, since we'd want to keep small indexes inline, the complexity of handling both inline indexes and separate-file ones.
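
A sketch of the branching that scheme would force on every read, again with hypothetical names and layout:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    class HybridIndexReader {
        // Hypothetical per-row header describing where the index lives.
        static class IndexLocation {
            boolean inline;   // small index stored with the row?
            long offset;      // offset in the data or index file
            int size;         // index length in bytes
            String indexPath; // separate index file, if not inline
        }

        static byte[] readIndex(IndexLocation loc, RandomAccessFile data)
                throws IOException {
            byte[] idx = new byte[loc.size];
            if (loc.inline) {
                data.seek(loc.offset);            // cheap path: same file handle
                data.readFully(idx);
                return idx;
            }
            // Large index: pays the extra fopen on every cold read.
            try (RandomAccessFile f = new RandomAccessFile(loc.indexPath, "r")) {
                f.seek(loc.offset);
                f.readFully(idx);
                return idx;
            }
        }
    }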

On balance I think I lean towards index-at-the-end, and hope we have enough RAM that the OS cache can make the extra seek go away. :)

> Memory efficient compactions 
> -----------------------------
>
>                 Key: CASSANDRA-16
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16
>             Project: Cassandra
>          Issue Type: Improvement
>         Environment: All
>            Reporter: Sandeep Tata
>            Priority: Critical
>             Fix For: 0.4
>
>
> The basic idea is to allow rows to get large enough that they don't have to fit in memory entirely, but can still easily fit on disk. The compaction algorithm today de-serializes the entire row in memory before writing out the compacted SSTable (see ColumnFamilyStore.doCompaction() and associated methods).
> The requirement is to have a compaction method with a lower memory requirement so we can support rows larger than available main memory. To re-use the old FB example, if we stored a user's inbox in a row, we'd want the inbox to grow bigger than memory as long as it fits on disk.
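
(To illustrate the low-memory approach the issue asks for: merge each row's columns as sorted streams rather than materializing the whole row. Everything below is a hypothetical sketch, not Cassandra's actual compaction code; Column is simplified to name and timestamp only.)

    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;

    class StreamingCompaction {
        // Hypothetical column: streams are sorted by name; newest timestamp wins.
        static class Column {
            final String name;
            final long timestamp;
            Column(String name, long timestamp) {
                this.name = name;
                this.timestamp = timestamp;
            }
        }

        // Wraps one input SSTable's sorted column stream for the heap.
        static class Source implements Comparable<Source> {
            final Iterator<Column> it;
            Column head;
            Source(Iterator<Column> it) { this.it = it; advance(); }
            void advance() { head = it.hasNext() ? it.next() : null; }
            public int compareTo(Source o) { return head.name.compareTo(o.head.name); }
        }

        // k-way merge of one row's columns; memory is O(k), not O(row size).
        static void merge(List<Iterator<Column>> inputs, List<Column> out) {
            PriorityQueue<Source> heap = new PriorityQueue<Source>();
            for (Iterator<Column> it : inputs) {
                Source s = new Source(it);
                if (s.head != null) heap.add(s);
            }
            while (!heap.isEmpty()) {
                Source s = heap.poll();
                Column winner = s.head;
                // Resolve duplicates of the same column name by timestamp.
                while (!heap.isEmpty() && heap.peek().head.name.equals(winner.name)) {
                    Source dup = heap.poll();
                    if (dup.head.timestamp > winner.timestamp) winner = dup.head;
                    dup.advance();
                    if (dup.head != null) heap.add(dup);
                }
                out.add(winner);  // real code would write straight to disk
                s.advance();
                if (s.head != null) heap.add(s);
            }
        }
    }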

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.