Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2010/06/10 17:13:14 UTC

[jira] Resolved: (CASSANDRA-1177) OutOfMemory on heavy inserts

     [ https://issues.apache.org/jira/browse/CASSANDRA-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-1177.
---------------------------------------

    Resolution: Not A Problem

If you're balancing by "disk size" then you're basically creating hot spots on the ring deliberately.  That's not a good idea unless you are disk-space bound and you're sure your disk-heavy machines can handle the extra load, which doesn't look like the case here. :)
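
For reference, the usual way to keep a RandomPartitioner ring balanced (rather than weighting nodes by disk size) is to space the initial tokens evenly, i.e. token(i) = i * 2^127 / N for nodes 0..N-1. A minimal sketch of that calculation; the class and method names here are just for illustration:

    import java.math.BigInteger;

    public class TokenCalculator
    {
        // Evenly spaced RandomPartitioner tokens: token(i) = i * 2^127 / N.
        // Assigning these as each node's InitialToken avoids deliberate hot spots.
        public static BigInteger token(int i, int nodeCount)
        {
            return BigInteger.valueOf(2).pow(127)
                             .multiply(BigInteger.valueOf(i))
                             .divide(BigInteger.valueOf(nodeCount));
        }

        public static void main(String[] args)
        {
            int n = 6; // e.g. the 6-node cluster described below
            for (int i = 0; i < n; i++)
                System.out.println("node " + i + ": " + token(i, n));
        }
    }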

Cassandra doesn't do backpressure yet (see CASSANDRA-685), so if you are OOMing it under load you can mitigate it by (a) giving the JVM more heap (or adding machines) and (b) having the client sleep 100ms before retrying whenever it gets a TimedOutException.
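
As a rough illustration of (b), a client-side retry wrapper might look like the sketch below; the helper class and the retry limit are made up for the example, and the only Cassandra type assumed is the Thrift TimedOutException. Each thrift insert would then be wrapped in a Callable and passed to RetryOnTimeout.call(..., maxAttempts).

    import java.util.concurrent.Callable;

    import org.apache.cassandra.thrift.TimedOutException;

    public final class RetryOnTimeout
    {
        // Runs a Thrift call, sleeping 100ms and retrying whenever the
        // coordinator reports a TimedOutException, up to maxAttempts tries.
        public static <T> T call(Callable<T> thriftCall, int maxAttempts) throws Exception
        {
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    return thriftCall.call();
                }
                catch (TimedOutException e)
                {
                    if (attempt >= maxAttempts)
                        throw e;
                    Thread.sleep(100); // back off before retrying
                }
            }
        }
    }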

You may well also be consuming a lot of heap in (a) compaction of Activities or (b) compaction or scanning of hinted handoff rows (once one node starts going down, say, the 12GB one to start with, it will start generating hints on the other nodes, which adds to the memory pressure they see).

We can continue troubleshooting here or on the list / IRC, but I'm resolving this as Not A Problem because it's almost certainly not a bug per se.

> OutOfMemory on heavy inserts
> ----------------------------
>
>                 Key: CASSANDRA-1177
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1177
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6.2
>         Environment: SunOS 5.10, x86 32bit, Java HotSpot Server VM 11.2-b01 mixed mode
> Sun SDK 1.6.0_12-b04
>            Reporter: Torsten Curdt
>            Priority: Critical
>         Attachments: bug report.zip
>
>
> We have a cluster of 6 Cassandra 0.6.2 nodes running under SunOS (see environment).
> On initial import (using the thrift API) we see some weird behavior from half the cluster. While cas04-06 look fine, as you can see from the attached munin graphs, the other 3 nodes kept on GCing (see log file) until they became unreachable and went OOM. (This is also why the stats are so spotty: munin could no longer reach the boxes.) We have seen the same behavior on 0.6.2 and 0.6.1. This started after around 100 million inserts.
> Looking at the hprof (which is of course too big to attach) we see lots of ConcurrentSkipListMap$Node's and quite a few Column objects. Please see the stats attached.
> This looks similar to https://issues.apache.org/jira/browse/CASSANDRA-1014 but we are not sure it really is the same.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.