Posted to commits@cassandra.apache.org by "Sandeep Tata (JIRA)" <ji...@apache.org> on 2009/03/27 02:12:50 UTC

[jira] Created: (CASSANDRA-16) Memory efficient compactions

Memory efficient compactions 
-----------------------------

                 Key: CASSANDRA-16
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16
             Project: Cassandra
          Issue Type: Improvement
         Environment: All
            Reporter: Sandeep Tata


The basic idea is to allow rows to get large enough that they don't have to fit entirely in memory, but can easily fit on disk. The compaction algorithm today deserializes the entire row into memory before writing out the compacted SSTable (see ColumnFamilyStore.doCompaction() and associated methods).

The requirement is a compaction method with a lower memory footprint, so we can support rows larger than available main memory. To reuse the old FB example, if we stored a user's inbox in a row, we'd want the inbox to be able to grow bigger than memory so long as it fits on disk.



[jira] Updated: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-16:
------------------------------------

    Fix Version/s:     (was: 0.5)



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689793#action_12689793 ] 

Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

At a high level, you want to make a CF deserializer that implements Iterable<IColumn> (with buffering, of course), then have merge operate on those iterables instead of full CFs; a sketch follows after the code excerpt.

It should be fairly self-contained, really.  I think you only need to worry about the code in this small part of doCompaction:

{code}
if (columnFamilies.size() > 1)
{
    merge(columnFamilies);
}
// deserialize into column families
columnFamilies.add(ColumnFamily.serializer().deserialize(filestruct.getBufIn()));
{code}

and then the sub-methods of merge of course.
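
For illustration, here is a minimal sketch of what such an iterable deserializer could look like. This is not actual Cassandra code: it assumes a hypothetical Column.serializer() that reads one column at a time from a buffered stream, and a row header that carries the column count.

{code}
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Iterator;

// Hypothetical sketch, not the real API: streams columns one at a time
// instead of materializing the whole CF in memory.
class StreamingColumnIterator implements Iterable<IColumn>, Iterator<IColumn>
{
    private final DataInputStream in; // buffered stream positioned at the row's columns
    private int remaining;            // column count from the row header

    StreamingColumnIterator(DataInputStream in, int columnCount)
    {
        this.in = in;
        this.remaining = columnCount;
    }

    public Iterator<IColumn> iterator() { return this; }

    public boolean hasNext() { return remaining > 0; }

    public IColumn next()
    {
        try
        {
            remaining--;
            // only one column is ever deserialized at a time
            return Column.serializer().deserialize(in);
        }
        catch (IOException e)
        {
            throw new RuntimeException(e);
        }
    }

    public void remove() { throw new UnsupportedOperationException(); }
}
{code}

merge() would then become a k-way merge over these iterators, holding only one column per input sstable in memory at a time.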



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737582#action_12737582 ] 

Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

Sure, if you want to add two seeks per row (the first back to the hole, the second to reposition for the next row).

I'd rather maintain our No Seeking For Writes design than have huge rows.



[jira] Assigned: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-16:
---------------------------------------

    Assignee: Eric Evans



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725726#action_12725726 ] 

Jun Rao commented on CASSANDRA-16:
----------------------------------

A couple of comments.

1. While the row index has one index entry per row, the column index has one index entry per group of columns, so the chance of the column index not fitting in memory is low. Plus, one can always increase the column group size to reduce the index footprint.

2. As a general solution, maybe we can put the column index after the column data in the same file. During compaction, we try to keep the column index in memory; if that's not possible, we append the index entries to a temp file as we go. After we have written all the columns, we copy the column index from the temp file to the end of the data file. So in the worst case we make two passes over the column index, but never over the column data (see the sketch below).
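
A rough sketch of option 2; writeColumn/writeIndexEntry and the column iterable are assumed names standing in for the real serialization code, not actual Cassandra methods:

{code}
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch only: stream columns out, spill index entries to a temp file,
// then append the index to the end of the data file.
void writeRowWithTrailingIndex(DataOutputStream dataOut,
                               Iterable<IColumn> mergedColumns) throws IOException
{
    File tmp = File.createTempFile("column-index", null);
    DataOutputStream indexOut = new DataOutputStream(
            new BufferedOutputStream(new FileOutputStream(tmp)));

    long offset = 0;
    for (IColumn column : mergedColumns)
    {
        offset += writeColumn(dataOut, column);     // stream column data out
        writeIndexEntry(indexOut, column, offset);  // index entry to temp file
    }
    indexOut.close();

    // worst case, a second pass over the index only: copy it to the
    // end of the data file; the column data is never read back
    InputStream indexIn = new BufferedInputStream(new FileInputStream(tmp));
    byte[] buf = new byte[4096];
    for (int n; (n = indexIn.read(buf)) != -1; )
        dataOut.write(buf, 0, n);
    indexIn.close();
    tmp.delete();
}
{code}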



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737409#action_12737409 ] 

Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

I suppose we could get the size information from the index, though.  But that introduces a fair amount of complexity to what used to be simple operations.



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689947#action_12689947 ] 

Jun Rao commented on CASSANDRA-16:
----------------------------------

A CF can be defined to be indexed either by name or by timestamp. When columns are stored in sstables, they are sorted according to that index attribute, i.e., either name or timestamp.




[jira] Assigned: (CASSANDRA-16) Memory efficient compactions

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata reassigned CASSANDRA-16:
-------------------------------------

    Assignee:     (was: Sandeep Tata)

Prashant, Avinash -- are you guys working on this already?



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726111#action_12726111 ] 

Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

You're right: since not every column name is indexed, I think we can get by with the column index in memory. This would allow 100s of millions of columns (maybe only 10s of millions, to be sure you can hold multiple large indexes in memory at once), but that is still adequate for any use case I can think of. So I don't think we need to worry about writing indexes to a separate file for that reason.

There are two other downsides to index-at-the-end, though. One is the extra seek: we seek first to the end of the row to read the index size, then have to seek back from there to read the actual index (see the sketch below). The other is that index-at-the-end code is inherently more complex than index-in-separate-file.

But index-in-separate-file has its own problems: an extra fopen on the performance side, and, since we'd want to keep small indexes inline, the complexity of handling both inline and separate-file indexes.

On balance I lean towards index-at-the-end and hope we have enough RAM that the OS cache can make the extra seek go away. :)
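
For concreteness, the extra seek on the read path would look roughly like this in plain java.io; the trailing-size layout is an assumption for illustration, not the actual on-disk format:

{code}
import java.io.IOException;
import java.io.RandomAccessFile;

// Assumed layout: [columns][index][index size as trailing int].
// Two seeks: one to the end of the row for the size, one back to the index.
byte[] readTrailingIndex(RandomAccessFile file, long rowStart, long rowLength)
        throws IOException
{
    long rowEnd = rowStart + rowLength;

    file.seek(rowEnd - 4);                // seek 1: read the index size
    int indexSize = file.readInt();

    file.seek(rowEnd - 4 - indexSize);    // seek 2: back to the index itself
    byte[] index = new byte[indexSize];
    file.readFully(index);
    return index;
}
{code}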



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725449#action_12725449 ] 

Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

Indexes are the real problem we're going to have to deal with here.

We can't write the indexes first if we can't merge the columns we're indexing in memory. (Not without making two passes: one to scan all the column names while writing the indexes, and another to do the full merge. Two passes is too high a cost to pay.)

But we can't merge the columns in a streaming fashion while keeping the index data in memory to spit out at the end, either. We just fixed a bug from taking exactly this approach in CASSANDRA-208: it would limit the number of columns we support to a relatively small number, probably low millions, depending on your column name size and how much memory you can throw at the JVM.

I think a hybrid approach is called for, as sketched below. If there are fewer than some threshold of columns (1000? 100000?), we merge in memory and put the index first, as we do now. Otherwise, we do a streaming merge and write the index to a separate file, similar to how we write the key index now. (In fact we could probably encapsulate this code as SSTableIndexWriter and use it in both places.)

We don't want to _always_ index in a separate file because (a) filesystems have limits too (we don't want one index file per row per columnfamily), and (b) we want to do streaming writes wherever possible, which means staying in the same file.

This approach will result in a little more seeking (between column index and sstable) than the two-pass inline approach, but merging in a single pass is worth the trade. (Remember that for large rows, reading the multiple input sstables will not be seek-free either once buffers max out. So we want to keep to a single pass for performance as well as simplicity.)
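
A sketch of the hybrid in pseudo-Java; all the names here (COLUMN_THRESHOLD, the merge methods, SSTableIndexWriter) are assumptions describing the shape of the idea, not existing code:

{code}
// Sketch of the hybrid compaction path described above.
if (totalColumnCount(rows) < COLUMN_THRESHOLD)
{
    // few columns: merge in memory and write the index first, as today
    mergeInMemory(rows, dataWriter);
}
else
{
    // many columns: stream-merge one column at a time and spill the
    // index to a separate file, like the key index (SSTableIndexWriter)
    streamingMerge(rows, dataWriter, new SSTableIndexWriter(indexPath));
}
{code}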



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719581#action_12719581 ] 

Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

See CASSANDRA-226 for an idea on how to address time-sorted CFs here.



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744279#action_12744279 ] 

Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

(This would be appropriate for workloads where a few outlier rows incur the two-pass penalty but most rows do not, so it is less painful to do a few slower merges than to redo the data model into something that maps less well to the domain.)



[jira] Updated: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-16:
------------------------------------

    Priority: Major  (was: Critical)



[jira] Updated: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-16:
------------------------------------

        Fix Version/s: 0.3
    Affects Version/s: trunk



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744277#action_12744277 ] 

Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

Or, how about this compromise:

We know each row's size at the start. If the sum of these sizes (which will always be equal to or greater than the actual merged size) is greater than some user-defined number of MB, we do a two-pass merge: a first pass to compute the bloom filter, column index, and total row size, and a second to actually write out the merged columns (see the sketch below).

Otherwise we do an in-memory merge the way we do now, so that narrow rows are not penalized.

This has the added benefit of not requiring a disk format change.
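
A sketch of the decision logic; SSTableScanner.currentRowSize(), the pass methods, and thresholdMB are assumed names illustrating the compromise, not real code:

{code}
import java.io.DataOutput;
import java.io.IOException;
import java.util.List;

void compactRow(List<SSTableScanner> inputs, DataOutput dataOut, long thresholdMB)
        throws IOException
{
    // upper bound on the merged size: sum of the input row sizes,
    // which each sstable records in its row header
    long upperBound = 0;
    for (SSTableScanner input : inputs)
        upperBound += input.currentRowSize();

    if (upperBound > thresholdMB * 1024L * 1024L)
    {
        // pass 1: stream the columns once to compute bloom filter,
        // column index, and exact merged row size
        RowHeader header = computeHeaderPass(inputs);
        // pass 2: write the header, then the merged columns
        writeColumnsPass(inputs, header, dataOut);
    }
    else
    {
        mergeInMemory(inputs, dataOut); // narrow rows keep the fast path
    }
}
{code}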




[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689948#action_12689948 ] 

Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

Oh, right: when columns are stored by timestamp, you're not guaranteed that the columns you need to merge will come out in anything near the right order.

We'd probably need to make two passes, or fall back to the old in-memory merge for time-sorted CFs. (Does anyone use those?)



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737408#action_12737408 ] 

Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

We have a bigger problem.

We rely on knowing the total size of the serialized columns to be able to seek around the sstable. But we can't write that size at the start without making two passes (the first to compute it). Obviously writing it at the end is a nonstarter, since we'd have no way to know where the end is absent the size information.

Bigtable doesn't seem to have found a way out of this either, limiting the data associated with a key to 64KB (see section 4).

I'd rather limit the size (2GB is the current limit, which is more reasonable than 64KB I think) than make two passes in compaction. Huge rows seem almost like a misfeature given the key-oriented partitioner design.



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737577#action_12737577 ] 

Jun Rao commented on CASSANDRA-16:
----------------------------------

Can we leave a placeholder for the total size at the start, then go back and fill the hole at the end of compaction?
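
As a plain-Java sketch (writeMergedColumns() is an assumed name), the placeholder costs two seeks per row: one back to the hole, and one to reposition for the next row.

{code}
import java.io.IOException;
import java.io.RandomAccessFile;

void writeRowWithPlaceholder(RandomAccessFile out) throws IOException
{
    long sizePos = out.getFilePointer();
    out.writeLong(0L);                    // placeholder for the row size

    long dataStart = out.getFilePointer();
    writeMergedColumns(out);              // streaming merge, single pass
    long dataEnd = out.getFilePointer();

    out.seek(sizePos);                    // seek 1: back to the hole
    out.writeLong(dataEnd - dataStart);   // fill in the real size
    out.seek(dataEnd);                    // seek 2: reposition for next row
}
{code}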



[jira] Updated: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-16:
------------------------------------

    Fix Version/s: 0.9

Re-scheduling for 0.9.  Maybe this time for sure. :)



[jira] Updated: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-16:
------------------------------------

    Fix Version/s:     (was: 0.6)
                   0.7



[jira] Updated: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-16:
------------------------------------

    Fix Version/s:     (was: 0.3)
                   0.4



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698425#action_12698425 ] 

Jun Rao commented on CASSANDRA-16:
----------------------------------

Prashant, Avinash,

Are you guys working on this issue already? Thanks,




[jira] Assigned: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-16:
---------------------------------------

    Assignee: Sandeep Tata  (was: Eric Evans)

Sandeep, are you or Jun going to be able to work on this?



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689940#action_12689940 ] 

Jun Rao commented on CASSANDRA-16:
----------------------------------

If a CF is indexed by name, an IColumn iterator is not that hard to add on top of an sstable. It becomes harder if a CF is not indexed by name.



[jira] Updated: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-16:
------------------------------------

    Fix Version/s:     (was: 0.4)



[jira] Updated: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-16:
------------------------------------

    Priority: Critical  (was: Major)



[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689942#action_12689942 ] 

Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

I think I remember seeing something about by-time sorting that made it more complicated, but I don't remember what. Can you refresh my memory?
