Posted to commits@cassandra.apache.org by "Brent Haines (JIRA)" <ji...@apache.org> on 2015/05/06 22:26:01 UTC

[jira] [Comment Edited] (CASSANDRA-8723) Cassandra 2.1.2 Memory issue - java process memory usage continuously increases until process is killed by OOM killer

    [ https://issues.apache.org/jira/browse/CASSANDRA-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531336#comment-14531336 ] 

Brent Haines edited comment on CASSANDRA-8723 at 5/6/15 8:25 PM:
-----------------------------------------------------------------

Generating the dump now. While that is happening, I figure I should give some more details on the CF that is causing trouble: 

{code}
CREATE TABLE apps.objects (
    id timeuuid PRIMARY KEY,
    data map<text, text>,
    tags set<text>,
    type text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = 'Stores a single object instance. All values of all fields are encoded into text.'
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.1
    AND speculative_retry = '99.0PERCENTILE';
{code}
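
For context on how this table gets used, here is a minimal sketch of a client writing and reading one of these object rows with the DataStax Python driver. The contact point, the keyspace connection, and the sample field values are hypothetical illustrations, not taken from our application; only the schema above is real:

{code}
# Hypothetical usage sketch for apps.objects; the contact point and the
# sample values below are made up for illustration.
import uuid
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])   # hypothetical contact point
session = cluster.connect('apps')

object_id = uuid.uuid1()           # version-1 UUID for the timeuuid partition key

# Per the table comment, all field values are encoded into text inside the map.
session.execute(
    "INSERT INTO objects (id, type, data, tags) VALUES (%s, %s, %s, %s)",
    (object_id, 'order', {'total': '19.99', 'currency': 'USD'}, {'retail', 'web'}))

# Read the row back by its partition key.
for row in session.execute(
        "SELECT type, data, tags FROM objects WHERE id = %s", (object_id,)):
    print(row.type, row.data, row.tags)

cluster.shutdown()
{code}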

And the cfstats are ...

{code}
                Table: objects
                SSTable count: 67
                SSTables in each level: [2, 20/10, 45, 0, 0, 0, 0, 0, 0]
                Space used (live): 11041125371
                Space used (total): 11041125371
                Space used by snapshots (total): 21276723306
                Off heap memory used (total): 168200214
                SSTable Compression Ratio: 0.3461410251588382
                Number of keys (estimate): 47986989
                Memtable cell count: 1967437
                Memtable data size: 21238508
                Memtable off heap memory used: 89217498
                Memtable switch count: 1
                Local read count: 163797
                Local read latency: 30.316 ms
                Local write count: 164188
                Local write latency: 0.072 ms
                Pending flushes: 0
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 65108800
                Bloom filter off heap memory used: 65108264
                Index summary off heap memory used: 10498180
                Compression metadata off heap memory used: 3376272
                Compacted partition minimum bytes: 30
                Compacted partition maximum bytes: 268650950
                Compacted partition mean bytes: 629
                Average live cells per slice (last five minutes): 0.9964102125789082
                Maximum live cells per slice (last five minutes): 1.0
                Average tombstones per slice (last five minutes): 0.002612974517393375
                Maximum tombstones per slice (last five minutes): 13.0
{code}
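
As a quick sanity check on where the off-heap memory is going, the off-heap components reported above sum exactly to the reported total, roughly 160 MB for this table alone. This is just re-adding the figures from the cfstats output, with no assumptions beyond the numbers above:

{code}
# Off-heap components from the cfstats output above, in bytes.
memtable      = 89217498   # Memtable off heap memory used
bloom_filter  = 65108264   # Bloom filter off heap memory used
index_summary = 10498180   # Index summary off heap memory used
compression   =  3376272   # Compression metadata off heap memory used

total = memtable + bloom_filter + index_summary + compression
print(total)                # 168200214 -> matches "Off heap memory used (total)"
print(total / 1024.0 ** 2)  # ~160.4 MB of off-heap memory for this one table
{code}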

The table isn't huge, about 10 GB. The partition key should have pretty reasonable entropy, and I don't expect a lot of really long rows on this one... none, really. It is very, very busy with writes and reads during peak loads, though.

In the past, if I deleted the CF and let it sync back to the node, this issue would go away, which seems to make sense if compaction is the problem. I can include our .yaml if that helps.



> Cassandra 2.1.2 Memory issue - java process memory usage continuously increases until process is killed by OOM killer
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8723
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8723
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeff Liu
>             Fix For: 2.1.x
>
>         Attachments: cassandra.yaml
>
>
> Issue:
> We have an ongoing issue with cassandra nodes whose memory usage continuously increases until the process is killed by the OOM killer.
> {noformat}
> Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783481] Out of memory: Kill process 13919 (java) score 911 or sacrifice child
> Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783557] Killed process 13919 (java) total-vm:18366340kB, anon-rss:6461472kB, file-rss:6684kB
> {noformat}
> System Profile:
> cassandra version 2.1.2
> system: aws c1.xlarge instance with 8 cores, 7.1G memory.
> cassandra jvm:
> -Xms1792M -Xmx1792M -Xmn400M -Xss256k
> {noformat}
> java -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.8.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1792M -Xmx1792M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+CMSClassUnloadingEnabled -XX:+UseCondCardMark -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc-1421511249.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=48M -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -javaagent:/usr/share/java/graphite-reporter-agent-1.0-SNAPSHOT.jar=graphiteServer=metrics-a.hq.nest.com;graphitePort=2003;graphitePollInt=60 -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp /etc/cassandra:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-16.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.8.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna-4.0.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/logback-classic-1.1.2.jar:/usr/share/cassandra/lib/logback-core-1.1.2.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/metrics-graphite-2.2.0.jar:/usr/share/cassandra/lib/mx4j-tools.jar:/usr/share/cassandra/lib/netty-all-4.0.23.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.2.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/stringtemplate-4.0.2.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-2.1.2.jar:/usr/share/cassandra/apache-cassandra-thrift-2.1.2.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/cassandra-driver-core-2.0.5.jar:/usr/share/cassandra/netty-3.9.0.Final.jar:/usr/share/cassandra/stress.jar: -XX:HeapDumpPath=/var/lib/cassandra/java_1421511248.hprof -XX:ErrorFile=/var/lib/cassandra/hs_err_1421511248.log org.apache.cassandra.service.CassandraDaemon
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)