Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2009/08/30 01:15:33 UTC

[jira] Created: (CASSANDRA-401) Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"

Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"
-------------------------------------------------------------------------------------------------

                 Key: CASSANDRA-401
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-401
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Jonathan Ellis
             Fix For: 0.5


Suggestion was made that http://java.sun.com/j2se/1.5.0/docs/api/java/lang/management/MemoryPoolMXBean.html#setCollectionUsageThreshold(long) is relevant.  Correlation eludes me, but I Am Not A Java Expert. :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (CASSANDRA-401) Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756122#action_12756122 ] 

Jonathan Ellis edited comment on CASSANDRA-401 at 9/16/09 10:38 AM:
--------------------------------------------------------------------

02
    clean out unused code from MessagingService. Inline sink processing into sendOneWay instead of having another executor.
    this sets the stage for backpressuring the client, should we choose to do that

01
    support multiple flush threads safely.  automatically use up to available core count threads for
    flushing.  pause updates when too many unflushed memtables are generated.

Note that I'm not actually adding backpressure here: on further thought, it seems like the worst of both worlds.  No matter what we do on the sending side, we can't tell ahead of time if the receiving node is going to start blocking while waiting to apply the mutation.  So backpressure would mean having to deal with both UnavailableException, and potentially unbounded wait times.  Without it we just have to deal w/ the former case.

      was (Author: jbellis):
    02
    clean out unused code from MessagingService. Inline sink processing into sendOneWay instead of having another exe
    this sets the stage for backpressuring the client, should we choose to do that

01
    support multiple flush threads safely.  automatically use up to available core count threads for
    flushing.  pause updates when too many unflushed memtables are generated.

Note that I'm not actually adding backpressure here: on further thought, it seems like the worst of both worlds.  No matter what we do on the sending side, we can't tell ahead of time if the receiving node is going to start blocking while waiting to apply the mutation.  So backpressure would mean having to deal with both UnavailableException, and potentially unbounded wait times.  Without it we just have to deal w/ the former case.
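
As a rough illustration of the "pause updates when too many unflushed memtables are generated" idea in patch 01, the sketch below blocks the writing thread once a fixed number of flushes are pending. It is not the patch itself: BoundedFlusher, maxUnflushed, and flushThreads are hypothetical names, and a Semaphore is just one simple way to get this behavior.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: bounds the number of memtables waiting to be flushed.
class BoundedFlusher {
    private final Semaphore pending;         // one permit per allowed unflushed memtable
    private final ExecutorService flushPool; // threads that perform the flushes

    BoundedFlusher(int maxUnflushed, int flushThreads) {
        this.pending = new Semaphore(maxUnflushed);
        this.flushPool = Executors.newFixedThreadPool(flushThreads);
    }

    // Blocks the calling (write) thread while maxUnflushed flushes are already pending.
    void submitFlush(Runnable flush) throws InterruptedException {
        pending.acquire();
        try {
            flushPool.execute(() -> {
                try {
                    flush.run();
                } finally {
                    pending.release(); // free the slot even if the flush throws
                }
            });
        } catch (RuntimeException e) {
            pending.release(); // task was rejected, so it will never release the permit itself
            throw e;
        }
    }

    int availableSlots() {
        return pending.availablePermits();
    }

    void shutdown() throws InterruptedException {
        flushPool.shutdown();
        flushPool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

The pause here is local to the node doing the flushing; it says nothing about whether or how a client should be told to slow down, which is the separate backpressure question discussed above.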
  
> Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-401
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.5
>
>         Attachments: 0001-CASSANDRA-401.txt, 0002-clean-out-unused-code-from-MessagingService.-Inline-si.txt, screenshot-1.jpg
>
>
> Suggestion was made that http://java.sun.com/j2se/1.5.0/docs/api/java/lang/management/MemoryPoolMXBean.html#setCollectionUsageThreshold(long) is relevant.  Correlation eludes me, but I Am Not A Java Expert. :)



[jira] Commented: (CASSANDRA-401) Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758524#action_12758524 ] 

Chris Goffinet commented on CASSANDRA-401:
------------------------------------------

+1



[jira] Commented: (CASSANDRA-401) Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757386#action_12757386 ] 

Chris Goffinet commented on CASSANDRA-401:
------------------------------------------

This is looking good. As discussed in IRC, the better solution would be to have two thread pools: one for sorting (sized by availableCores()) and another for writing to disk (sized by numberOfDisks).
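
A minimal sketch of the two-pool idea, assuming availableCores() corresponds to Runtime.availableProcessors() and that the disk count is passed in by the caller. TwoStageFlush, diskCount, and the string keys are illustrative only and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch: CPU-bound sorting and disk-bound writing run on separate pools.
class TwoStageFlush {
    private final ExecutorService sorter; // sized by core count
    private final ExecutorService writer; // sized by disk count (supplied by the caller)

    TwoStageFlush(int diskCount) {
        this.sorter = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        this.writer = Executors.newFixedThreadPool(diskCount);
    }

    // Sorts a copy of the keys on the sorter pool, then hands the result to the writer pool.
    CompletableFuture<Void> flush(List<String> keys, Consumer<List<String>> diskWrite) {
        return CompletableFuture
                .supplyAsync(() -> {
                    List<String> sorted = new ArrayList<>(keys);
                    Collections.sort(sorted);
                    return sorted;
                }, sorter)
                .thenAcceptAsync(diskWrite, writer);
    }

    void shutdown() {
        sorter.shutdown();
        writer.shutdown();
    }
}
```

With separate pools, a slow disk write does not occupy a thread that could be sorting the next memtable, and vice versa.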



[jira] Commented: (CASSANDRA-401) Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755700#action_12755700 ] 

Jonathan Ellis commented on CASSANDRA-401:
------------------------------------------

The MemoryPoolMXBean approach is going to be JVM and even GC implementation specific, right?  For instance right now I have CMS old gen, CMS perm gen, code cache, par eden space, and par survivor space memory pools with the JDK6 CMS GC.  But with the throughput GC, or JDK7 G1, those would be different.  So picking a single pool and setting a threshold seems fragile, and so does taking the average or max of all pools.
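
To see which pools a given JVM and collector expose, something like the following can be run. It uses only the standard java.lang.management API; the class name ListMemoryPools is made up for illustration, and the names it reports will differ between collectors, which is the fragility described above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: describes every memory pool the running JVM reports.
class ListMemoryPools {
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            lines.add(pool.getName()
                    + " [" + pool.getType() + "]"
                    + " collectionThresholdSupported=" + pool.isCollectionUsageThresholdSupported());
        }
        return lines;
    }
}
```

Calling ListMemoryPools.describePools() under different -XX collector flags is a quick way to compare the pool names and types each one exposes.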



[jira] Updated: (CASSANDRA-401) Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"

Posted by "Sam Pullara (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Pullara updated CASSANDRA-401:
----------------------------------

    Attachment: screenshot-1.jpg

Cassandra continues to accept writes from a client even though it is getting behind in flushing and compacting, eventually reaching a GC thrash state and stopping forward progress.  You have to be very careful not to reach this state when loading a lot of data.  My suggestion was to use the MemoryPoolMXBean to monitor the amount of heap available after a GC to ensure that you never get into this state.  You should likely stop accepting writes from clients at that point until the condition clears.
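
One way this suggestion could look, as a sketch only: set a collection usage threshold on every heap pool that supports one, then refuse writes while any of them reports that the threshold was exceeded after its most recent collection. HeapPressureGate and the fraction parameter are hypothetical, and which pools qualify depends on the JVM and collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: flags heap pressure based on usage measured right after GC.
class HeapPressureGate {
    private final List<MemoryPoolMXBean> watched = new ArrayList<>();

    // Sets a post-GC threshold at `fraction` (0.0 to 1.0) of each eligible heap pool's max.
    // Returns the number of pools being watched.
    int install(double fraction) {
        watched.clear();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isCollectionUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null || usage.getMax() <= 0) {
                continue; // pool invalid or max undefined: no sensible threshold to set
            }
            pool.setCollectionUsageThreshold((long) (fraction * usage.getMax()));
            watched.add(pool);
        }
        return watched.size();
    }

    // True if any watched pool was at or above its threshold after its most recent collection.
    boolean shouldRejectWrites() {
        for (MemoryPoolMXBean pool : watched) {
            if (pool.isCollectionUsageThresholdExceeded()) {
                return true;
            }
        }
        return false;
    }
}
```

A caller would run install(0.9) once at startup and consult shouldRejectWrites() before accepting a mutation; the 0.9 figure is an arbitrary example, not a recommendation from this ticket.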



[jira] Commented: (CASSANDRA-401) Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757898#action_12757898 ] 

Jonathan Ellis commented on CASSANDRA-401:
------------------------------------------

05
    split flusher executor into flushSorter and flushWriter.
    This is because sorting is CPU-bound, and writing is disk-bound; we want to be able to do both at once.

04
    clean up ThreadFactoryImpl and rename to NamedThreadFactory

03
    r/m unused commitlogcontext arg to flush. synchronize getTempSSTablePath




[jira] Commented: (CASSANDRA-401) Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756122#action_12756122 ] 

Jonathan Ellis commented on CASSANDRA-401:
------------------------------------------

02
    clean out unused code from MessagingService. Inline sink processing into sendOneWay instead of having another exe
    this sets the stage for backpressuring the client, should we choose to do that

01
    support multiple flush threads safely.  automatically use up to available core count threads for
    flushing.  pause updates when too many unflushed memtables are generated.

Note that I'm not actually adding backpressure here: on further thought, it seems like the worst of both worlds.  No matter what we do on the sending side, we can't tell ahead of time if the receiving node is going to start blocking while waiting to apply the mutation.  So backpressure would mean having to deal with both UnavailableException, and potentially unbounded wait times.  Without it we just have to deal w/ the former case.



[jira] Updated: (CASSANDRA-401) Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-401:
-------------------------------------

    Attachment: 0005-split-flusher-executor-into-flushSorter-and-flushWrite.txt
                0004-clean-up-ThreadFactoryImpl-and-rename-to-NamedThreadFa.txt
                0003-r-m-unused-commitlogcontext-arg-to-flush.-synchronize.txt
                0002-clean-out-unused-code-from-MessagingService.-Inline-si.txt
                0001-CASSANDRA-401.txt



[jira] Commented: (CASSANDRA-401) Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749217#action_12749217 ] 

Jeff Hammerbacher commented on CASSANDRA-401:
---------------------------------------------

Hey Sam,

If you are performing a bulk load, you may want to check out the Binary Memtable path that was recently excavated by Chris Goffinet: http://github.com/lenn0x/Cassandra-Hadoop-BMT/tree/master. Essentially it's a MapReduce job that writes data in the format it will take on disk to make compactions cheap, quite similar to the work from the PNUTS team at SIGMOD 2008 (http://portal.acm.org/citation.cfm?id=1376693).

The problem described in this ticket certainly remains when one performs many insertions via the standard API, of course.

Later,
Jeff



[jira] Updated: (CASSANDRA-401) Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-401:
-------------------------------------

    Attachment: 0002-clean-out-unused-code-from-MessagingService.-Inline-si.txt
                0001-CASSANDRA-401.txt



[jira] Commented: (CASSANDRA-401) Less crappy failure mode when swamped with inserts than "run out of memory and gc-storm to death"

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758692#action_12758692 ] 

Hudson commented on CASSANDRA-401:
----------------------------------

Integrated in Cassandra #206 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/206/])
    split flusher executor into flushSorter and flushWriter.  This is because sorting is CPU-bound, and writing is disk-bound; we want to be able to do both at once.
patch by jbellis; reviewed by goffinet for 
clean up ThreadFactoryImpl and rename to NamedThreadFactory
patch by jbellis; reviewed by goffinet for 
r/m unused commitlogcontext arg to flush. synchronize getTempSSTablePath
patch by jbellis; reviewed by goffinet for 
clean out unused code from MessagingService. Inline sink processing into sendOneWay instead of having another executor for that.
this sets the stage for backpressuring the client, should we choose to do that
patch by jbellis; reviewed by goffinet for 
support multiple flush threads safely.  automatically use up to available core count threads for flushing.  pause updates when too many unflushed memtables are generated.
patch by jbellis; reviewed by goffinet for 

