Posted to user@cassandra.apache.org by 王一锋 <wa...@aspire-tech.com> on 2010/07/14 11:38:39 UTC

Frequent crashes

Hi, 

Has anybody done any memory usage analysis for Cassandra?

How much memory does Cassandra need to manage 300 GB of data? How much extra memory is needed during compaction?

Regarding mmap, memory usage is determined by the OS, so it has nothing to do with the JVM heap size. Am I right?

I have a Cassandra cluster of 13 nodes, each with 200 to 300 GB of data.
JVM settings:
JVM_OPTS=" \
        -ea \
        -Xms6G \
        -Xmx6G \
        -XX:TargetSurvivorRatio=90 \
        -XX:+AggressiveOpts \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:+HeapDumpOnOutOfMemoryError \
        -XX:SurvivorRatio=128 \
        -XX:MaxTenuringThreshold=0 \
        -XX:+PrintGC -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
        -Dcom.sun.management.jmxremote.port=4993 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"
KeysCache settings for the 3 column families are 5,000,000, 1,000,000, and 1,000,000.

Some nodes run for 1 to 2 days, then get very slow due to bad GC performance, and then crash. This happens quite often, almost every day.
Here is a fragment of the gc.log:

 (concurrent mode failure): 6014591K->6014591K(6014592K), 25.4846400 secs] 6289343K->6282274K(6289344K), [CMS Perm : 17290K->17287K(28988K)], 25.4848970 secs] [Times: user=37.76 sys=0.12, real=25.49 secs] 
69695.771: [Full GC 69695.771: [CMS: 6014591K->6014591K(6014592K), 21.0911470 secs] 6289343K->6282177K(6289344K), [CMS Perm : 17287K->17287K(28988K)], 21.0913910 secs] [Times: user=21.01 sys=0.12, real=21.09 secs] 
69716.902: [GC [1 CMS-initial-mark: 6014591K(6014592K)] 6287620K(6289344K), 0.2759980 secs] [Times: user=0.28 sys=0.00, real=0.28 secs] 
69717.178: [CMS-concurrent-mark-start]
69717.203: [Full GC 69717.203: [CMS69721.345: [CMS-concurrent-mark: 4.152/4.167 secs] [Times: user=16.64 sys=0.01, real=4.17 secs] 
 (concurrent mode failure): 6014592K->6014591K(6014592K), 25.3649330 secs] 6289343K->6282200K(6289344K), [CMS Perm : 17287K->17287K(28988K)], 25.3651670 secs] [Times: user=37.67 sys=0.13, real=25.37 secs] 
69742.598: [Full GC 69742.598: [CMS: 6014591K->6014592K(6014592K), 21.0942430 secs] 6289343K->6282398K(6289344K), [CMS Perm : 17290K->17287K(28988K)], 21.0944950 secs] [Times: user=21.00 sys=0.12, real=21.10 secs] 
69763.721: [Full GC 69763.721: [CMS: 6014592K->6014591K(6014592K), 21.0978230 secs] 6289343K->6282553K(6289344K), [CMS Perm : 17290K->17287K(28988K)], 21.0980600 secs] [Times: user=20.99 sys=0.12, real=21.09 secs] 
69784.830: [GC [1 CMS-initial-mark: 6014591K(6014592K)] 6287995K(6289344K), 0.2765360 secs] [Times: user=0.28 sys=0.00, real=0.28 secs] 
69785.107: [CMS-concurrent-mark-start]
69785.123: [Full GC 69785.123: [CMS69789.244: [CMS-concurrent-mark: 4.132/4.136 secs] [Times: user=16.49 sys=0.03, real=4.13 secs] 
 (concurrent mode failure): 6014591K->6014591K(6014592K), 26.0883660 secs] 6289343K->6282549K(6289344K), [CMS Perm : 17290K->17287K(28988K)], 26.0886060 secs] [Times: user=38.28 sys=0.15, real=26.09 secs] 

Anybody got an idea?

Re: Frequent crashes

Posted by Peter Schuller <pe...@infidyne.com>.
> How much memory does Cassandra need to manage 300 GB of data? How much
> extra memory is needed during compaction?

For one thing, it depends on the data. Bloom filter memory scales
linearly (but with a low constant) with the number of entries rather
than with the raw data volume. If those 300 GB correspond to 1 billion
small values, more memory will be used for the sstable bloom filters
than if they correspond to 1 million large values.
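
To put rough numbers on that, here is a minimal sketch using the textbook
sizing formula for an optimal Bloom filter, m = -n * ln(p) / (ln 2)^2 bits.
The false-positive rate and the function name are assumptions made for
illustration, and the sketch assumes one key per value; Cassandra's actual
bits-per-key setting may differ, so treat the output as an order-of-magnitude
estimate only.

```python
import math

# Hypothetical target false-positive rate; not taken from Cassandra's code.
TARGET_FP_RATE = 0.01


def bloom_filter_bytes(num_keys, false_positive_rate):
    """Size in bytes of an optimally sized Bloom filter for num_keys entries."""
    bits = -num_keys * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


if __name__ == "__main__":
    cases = (("1 billion small values", 10**9),
             ("1 million large values", 10**6))
    for label, keys in cases:
        mib = bloom_filter_bytes(keys, TARGET_FP_RATE) / 2**20
        print(f"{label}: ~{mib:,.1f} MiB of bloom filter")
```

The same 300 GB therefore costs about a thousand times more filter memory
when it is split into a thousand times more entries.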

> Regarding mmap, memory usage is determined by the OS, so it has nothing
> to do with the JVM heap size. Am I right?

Yes, though heap size can affect whether the OS starts swapping the JVM out.

> Some nodes run for 1 to 2 days, then get very slow due to bad GC
> performance, and then crash. This happens quite often, almost every day.
> Here is a fragment of the gc.log:
>
>  (concurrent mode failure): 6014591K->6014591K(6014592K), 25.4846400 secs] 6289343K->6282274K(6289344K), [CMS Perm : 17290K->17287K(28988K)], 25.4848970 secs] [Times: user=37.76 sys=0.12, real=25.49 secs]
> 69695.771: [Full GC 69695.771: [CMS: 6014591K->6014591K(6014592K), 21.0911470 secs] 6289343K->6282177K(6289344K), [CMS Perm : 17287K->17287K(28988K)], 21.0913910 secs] [Times: user=21.01 sys=0.12, real=21.09 secs]

You're running out of heap space. A concurrent mode failure means the
heap became full before concurrent marking could complete; the
subsequent full GCs then show that almost no data was freed,
indicating that you simply have far too much live data for your heap
size.
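
One way to check this across a whole gc.log is to pull the whole-heap
before/after figures out of each Full GC entry. The following is a minimal
sketch, assuming the log format shown in the fragment above; the regular
expression and the function name are illustrative, not part of any JVM or
Cassandra tooling.

```python
import re

# Matches the whole-heap figures of a Full GC entry, i.e. the
# "<before>K-><after>K(<capacity>K)" group that precedes "[CMS Perm".
HEAP_RE = re.compile(r"\] (\d+)K->(\d+)K\((\d+)K\), \[CMS Perm")


def full_gc_summary(line):
    """Return freed KB and post-GC heap occupancy, or None if no match."""
    m = HEAP_RE.search(line)
    if m is None:
        return None
    before, after, capacity = (int(g) for g in m.groups())
    return {"freed_kb": before - after, "occupancy_after": after / capacity}


# One entry copied from the gc.log fragment in this thread.
sample = (
    "69695.771: [Full GC 69695.771: [CMS: 6014591K->6014591K(6014592K), "
    "21.0911470 secs] 6289343K->6282177K(6289344K), "
    "[CMS Perm : 17287K->17287K(28988K)], 21.0913910 secs] "
    "[Times: user=21.01 sys=0.12, real=21.09 secs]"
)

if __name__ == "__main__":
    s = full_gc_summary(sample)
    print(f"freed {s['freed_kb']} KB, "
          f"heap {s['occupancy_after']:.1%} full after GC")
```

For the sample entry, a 21-second full GC freed 7166 KB and left the heap
more than 99% full, which is the pattern described above.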

Either increase the JVM heap size or adjust Cassandra settings to use
less memory (e.g. smaller memtable sizes, less caching).

-- 
/ Peter Schuller