Posted to commits@cassandra.apache.org by "Thom Bartold (JIRA)" <ji...@apache.org> on 2015/12/05 15:45:10 UTC
[jira] [Created] (CASSANDRA-10821) OOM Killer terminates Cassandra when Compactions use too much memory then won't restart
Thom Bartold created CASSANDRA-10821:
----------------------------------------
Summary: OOM Killer terminates Cassandra when Compactions use too much memory then won't restart
Key: CASSANDRA-10821
URL: https://issues.apache.org/jira/browse/CASSANDRA-10821
Project: Cassandra
Issue Type: Bug
Components: Compaction
Environment: EC2 32 x i2.xlarge split between us-east-1a,c and us-west-2a,b
Linux 4.1.10-17.31.amzn1.x86_64 #1 SMP Sat Oct 24 01:31:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
Cassandra version: 2.2.3
Reporter: Thom Bartold
We were writing to the DB from EC2 instances in us-east-1 at a rate of about 3000 writes per second, with replication us-east:2 us-west:2, LeveledCompaction and DeflateCompressor.
After about 48 hours, some nodes had over 800 pending compactions, and a few of them started getting killed by the Linux OOM killer. Priam attempts to restart the nodes, but they fail to start because of corrupted saved_caches files.
Loading has finished and the cluster is mostly idle, but 6 of the nodes were killed again last night by the OOM killer.
This is the log message from a node that won't restart:
ERROR [main] 2015-12-05 13:59:13,754 CassandraDaemon.java:635 - Detected unreadable sstables /media/ephemeral0/cassandra/saved_caches/KeyCache-ca.db, please check NEWS.txt and ensure that you have upgraded through all required intermediate versions, running upgradesstables
This is the dmesg output from when the node was terminated:
[360803.234422] Out of memory: Kill process 10809 (java) score 949 or sacrifice child
[360803.237544] Killed process 10809 (java) total-vm:438484092kB, anon-rss:29228012kB, file-rss:107576kB
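To put the dmesg figures in perspective, here is a minimal Python sketch that parses the "Killed process" line and converts the kB values to GiB. The regular expression is written only against the line format shown above, and the function name parse_oom_kill is hypothetical; other kernel versions may print additional fields. Assuming an i2.xlarge provides roughly 30.5 GiB of RAM, the anon-rss value suggests the Java process had grown to occupy most of the instance's memory.

```python
import re

# Matches the "Killed process" line in the format shown in the dmesg
# excerpt above. Other kernels may emit extra fields (assumption: this
# pattern only needs to cover the three fields seen in this report).
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)

KIB_PER_GIB = 1024 * 1024


def parse_oom_kill(line):
    """Return a dict describing the killed process, or None if no match.

    Memory sizes are converted from kB (as printed by the kernel) to GiB.
    """
    m = KILLED_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "total_vm_gib": int(m.group("total_vm")) / KIB_PER_GIB,
        "anon_rss_gib": int(m.group("anon_rss")) / KIB_PER_GIB,
        "file_rss_gib": int(m.group("file_rss")) / KIB_PER_GIB,
    }


if __name__ == "__main__":
    sample = (
        "[360803.237544] Killed process 10809 (java) "
        "total-vm:438484092kB, anon-rss:29228012kB, file-rss:107576kB"
    )
    info = parse_oom_kill(sample)
    print("pid %d (%s): anon-rss %.2f GiB, total-vm %.2f GiB" % (
        info["pid"], info["name"],
        info["anon_rss_gib"], info["total_vm_gib"],
    ))
```

The large total-vm figure is virtual address space, not resident memory; anon-rss is the number that matters when comparing against physical RAM.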
This is what Compaction Stats look like currently:
pending tasks: 1096
id                                     compaction type   keyspace          table      completed    total          unit    progress
93eb3200-9b58-11e5-b9f1-ffef1041ec45   Compaction        overlordpreprod   document   8670748796   839129219651   bytes   1.03%
                                       Compaction        system            hints      30           1921326518     bytes   0.00%
Active compaction remaining time : 27h33m47s
Only 6 of the 32 nodes have pending compactions, each on the order of 1000.
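As a sanity check on the compaction stats above, the following Python sketch recomputes the progress and the remaining bytes from the completed and total columns. It assumes each data row ends with the four fields completed, total, unit, and progress, as in the output shown; the helper names are hypothetical and this is not part of nodetool.

```python
def parse_compaction_row(row):
    """Extract (completed, total, unit) from one compaction stats row.

    Assumes the last four whitespace-separated fields are
    completed, total, unit, progress, as in the output above.
    The id column may be absent, so fields are read from the end.
    """
    fields = row.split()
    if len(fields) < 4:
        raise ValueError("row has too few fields: %r" % row)
    completed = int(fields[-4])
    total = int(fields[-3])
    unit = fields[-2]
    return completed, total, unit


def progress_percent(completed, total):
    """Progress as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * completed / total


if __name__ == "__main__":
    rows = [
        "93eb3200-9b58-11e5-b9f1-ffef1041ec45 Compaction overlordpreprod "
        "document 8670748796 839129219651 bytes 1.03%",
        "Compaction system hints 30 1921326518 bytes 0.00%",
    ]
    for row in rows:
        completed, total, unit = parse_compaction_row(row)
        print("%.2f%% done, %d %s remaining" % (
            progress_percent(completed, total), total - completed, unit,
        ))
```

Reading fields from the end of the row keeps the parser working for the hints row, which has no id column.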