Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2009/09/10 16:36:57 UTC

[jira] Created: (CASSANDRA-436) OOM during major compaction on many (hundreds) of sstables

OOM during major compaction on many (hundreds) of sstables
----------------------------------------------------------

                 Key: CASSANDRA-436
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-436
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jonathan Ellis
            Assignee: Jonathan Ellis
             Fix For: 0.5


Compaction deserializes rows before they are needed, one per sstable.  If we only deserialized on demand, the current algorithm would be fine on nearly arbitrarily large numbers of sstables.  (This is only important because it is useful to disable compactions during bulk load.)
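
To illustrate what "deserialize on demand" could look like, here is a minimal Java sketch. It is not Cassandra code: LazyRow, its loader argument, and the loads counter are hypothetical names made up for this example. The row keeps only its key plus a way to fetch the serialized bytes, and decodes them the first time the columns are requested.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical illustration only; LazyRow is not a Cassandra class.
final class LazyRow {
    private final String key;
    private final Supplier<byte[]> loader; // fetches the serialized bytes when asked
    private String columns;                // decoded form, built on first use
    private int loads = 0;                 // how many times we actually decoded

    LazyRow(String key, Supplier<byte[]> loader) {
        this.key = key;
        this.loader = loader;
    }

    // Reading the key never touches the serialized payload.
    String key() {
        return key;
    }

    // Decode on first access, then reuse the cached result.
    String columns() {
        if (columns == null) {
            columns = new String(loader.get(), StandardCharsets.UTF_8);
            loads++;
        }
        return columns;
    }

    int loads() {
        return loads;
    }
}
```

With rows held this way, comparing keys across hundreds of sstables should not require any row payload to be decoded; only rows that are actually merged pay that cost.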

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-436) OOM during major compaction on many (hundreds) of sstables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755104#action_12755104 ] 

Jonathan Ellis commented on CASSANDRA-436:
------------------------------------------

04
    Replace PriorityQueue mess with a CompactionIterator that efficiently yields compacted Rows from a set of
    sstables by feeding CollationIterator into a ReducingIterator transform.  ("Efficiently" means we
    never deserialize data until it is needed, so the number of sstables that can be compacted at once is
    virtually unlimited, and if only one sstable contains a given key that row data will be copied over
    without an intermediate de/serialize step.)  This is a very natural fit for the compaction algorithm
    and almost entirely gets rid of duplicated code between doFileCompaction and doAntiCompaction.

03
    allow ReducingIterator to reduce from one type to a different one

02
    copy FileStruct to SSTableScanner and remove cruft.  Migrate getKeyRange to new scanner class.

01
    minor cleanup
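
The collate-then-reduce idea described in patch 04 can be sketched roughly as follows. This is a simplified Java illustration under stated assumptions, not the code in the attached patches: CompactionSketch, RawRow, compact, and the decodes counter are hypothetical names, each input is assumed to be sorted by key, and a plain String stands in for the serialized row bytes. The sketch keeps one head row per source, groups rows that share the smallest key, passes a row through untouched when only one source has that key, and "decodes" rows only when a merge is required.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; these are not Cassandra's actual classes.
final class CompactionSketch {
    /** A row as read from one sstable: key plus undecoded payload. */
    static final class RawRow {
        final String key;
        final String payload; // stands in for serialized bytes
        RawRow(String key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    static int decodes = 0; // counts simulated deserializations

    /** Merge key-sorted sources; rows sharing a key are reduced to one output row. */
    static List<RawRow> compact(List<Iterator<RawRow>> sources) {
        int n = sources.size();
        RawRow[] heads = new RawRow[n]; // only one pending row per source is held
        for (int i = 0; i < n; i++) {
            heads[i] = sources.get(i).hasNext() ? sources.get(i).next() : null;
        }
        List<RawRow> out = new ArrayList<>();
        while (true) {
            // Collation step: find the smallest key among the pending heads.
            String minKey = null;
            for (RawRow h : heads) {
                if (h != null && (minKey == null || h.key.compareTo(minKey) < 0)) {
                    minKey = h.key;
                }
            }
            if (minKey == null) {
                break; // every source is exhausted
            }
            // Gather all heads with that key and advance those sources.
            List<RawRow> sameKey = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if (heads[i] != null && heads[i].key.equals(minKey)) {
                    sameKey.add(heads[i]);
                    heads[i] = sources.get(i).hasNext() ? sources.get(i).next() : null;
                }
            }
            // Reduction step.
            if (sameKey.size() == 1) {
                out.add(sameKey.get(0)); // single source: copy through, no decode
            } else {
                StringBuilder merged = new StringBuilder();
                for (RawRow r : sameKey) {
                    decodes++; // decode only rows that must be merged
                    if (merged.length() > 0) {
                        merged.append(',');
                    }
                    merged.append(r.payload);
                }
                out.add(new RawRow(minKey, merged.toString()));
            }
        }
        return out;
    }
}
```

The working set here is one pending row per source plus the rows for the key currently being reduced. The string concatenation is only a placeholder for whatever real merge logic applies to rows with the same key.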


> OOM during major compaction on many (hundreds) of sstables
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-436
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-436
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.5
>
>         Attachments: 0001-CASSANDRA-436-minor-fixes.txt, 0002-copy-FileStruct-to-SSTableScanner-and-remove-cruft.-M.txt, 0003-allow-ReducingIterator-to-reduce-from-one-type-to-a-di.txt, 0004-Replace-PriorityQueue-mess-with-a-CompactionIterator-t.txt
>
>
> compaction deserializes rows during compaction before they are needed, one per sstable.  if we only deserialized on-demand the current algorithm would be fine on nearly arbitrarily large numbers of sstables.  (this is only important b/c it is useful to disable compactions during bulk load.)



[jira] Commented: (CASSANDRA-436) OOM during major compaction on many (hundreds) of sstables

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755863#action_12755863 ] 

Chris Goffinet commented on CASSANDRA-436:
------------------------------------------

I will be testing this tonight on our cluster. I'll need roughly 15-20 hours but should have some results tomorrow.



[jira] Updated: (CASSANDRA-436) OOM during major compaction on many (hundreds) of sstables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-436:
-------------------------------------

    Attachment: 0004-Replace-PriorityQueue-mess-with-a-CompactionIterator-t.txt
                0003-allow-ReducingIterator-to-reduce-from-one-type-to-a-di.txt
                0002-copy-FileStruct-to-SSTableScanner-and-remove-cruft.-M.txt
                0001-CASSANDRA-436-minor-fixes.txt



[jira] Updated: (CASSANDRA-436) OOM during major compaction on many (hundreds) of sstables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-436:
-------------------------------------

    Attachment: 0004-Replace-PriorityQueue-mess-with-a-CompactionIterator-t.txt
                0003-allow-ReducingIterator-to-reduce-from-one-type-to-a-di.txt
                0002-copy-FileStruct-to-SSTableScanner-and-remove-cruft.-M.txt
                0001-CASSANDRA-436-minor-fixes.txt



[jira] Commented: (CASSANDRA-436) OOM during major compaction on many (hundreds) of sstables

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757147#action_12757147 ] 

Hudson commented on CASSANDRA-436:
----------------------------------

Integrated in Cassandra #201 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/201/])
    Replace PriorityQueue mess with a CompactionIterator that efficiently yields compacted Rows from a set of sstables by feeding CollationIterator into a ReducingIterator transform.  ("Efficiently" means we never deserialize data until it is needed, so the number of sstables that can be compacted at once is virtually unlimited, and if only one sstable contains a given key that row data will be copied over without an intermediate de/serialize step.)  This is a very natural fit for the compaction algorithm and almost entirely gets rid of duplicated code between doFileCompaction and doAntiCompaction.
patch by jbellis; reviewed by goffinet for 
allow ReducingIterator to reduce from one type to a different one
patch by jbellis; reviewed by goffinet for 
copy FileStruct to SSTableScanner and remove cruft.  Migrate getKeyRange to new scanner class.
patch by jbellis; reviewed by goffinet for 
minor fixes
patch by jbellis; reviewed by goffinet for 




[jira] Commented: (CASSANDRA-436) OOM during major compaction on many (hundreds) of sstables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756066#action_12756066 ] 

Jonathan Ellis commented on CASSANDRA-436:
------------------------------------------

rebased



[jira] Commented: (CASSANDRA-436) OOM during major compaction on many (hundreds) of sstables

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755918#action_12755918 ] 

Chris Goffinet commented on CASSANDRA-436:
------------------------------------------

This needs to be rebased; I can't apply the last patch.



[jira] Commented: (CASSANDRA-436) OOM during major compaction on many (hundreds) of sstables

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756111#action_12756111 ] 

Chris Goffinet commented on CASSANDRA-436:
------------------------------------------

Tested on our cluster. A big improvement! Before, we were seeing 2-7GB of heap usage; now it's under 700MB.

+1
