Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2016/07/22 13:03:20 UTC
[jira] [Resolved] (CASSANDRA-10821) OOM Killer terminates Cassandra when Compactions use too much memory then won't restart
[ https://issues.apache.org/jira/browse/CASSANDRA-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis resolved CASSANDRA-10821.
----------------------------------------
Resolution: Won't Fix
Compaction is often described as, "worst case you will need 2x disk space while it writes out new data before it can clean up the old," but you can also need 2x RAM for the off-heap compression metadata and bloom filters.
Your best bet is probably to disable bloom filters until this compaction finishes. Switching to more aggressive compression may also help.
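To give a rough sense of why bloom filters matter for memory here, the sketch below applies the textbook sizing formula for an optimally sized bloom filter, m = -n ln(p) / (ln 2)^2 bits for n keys at false-positive chance p. The key count and the false-positive values are hypothetical, the function name is made up for illustration, and Cassandra's actual off-heap allocation may differ from the textbook figure.

```python
import math


def bloom_filter_bytes(num_keys: int, fp_chance: float) -> float:
    """Textbook size in bytes of an optimally sized bloom filter.

    This is the generic formula, not Cassandra's own accounting.
    """
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    bits = -num_keys * math.log(fp_chance) / (math.log(2) ** 2)
    return bits / 8


if __name__ == "__main__":
    keys = 1_000_000_000  # hypothetical number of partition keys on one node
    for p in (0.01, 0.1):  # hypothetical false-positive chances
        size_gib = bloom_filter_bytes(keys, p) / 2**30
        # During compaction the filters for old and new sstables can coexist,
        # hence the "up to 2x" figure.
        print(f"fp_chance={p}: ~{size_gib:.2f} GiB, up to ~{2 * size_gib:.2f} GiB mid-compaction")
```

A higher false-positive chance gives a smaller filter, which is the trade-off exposed per table through the `bloom_filter_fp_chance` option.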
> OOM Killer terminates Cassandra when Compactions use too much memory then won't restart
> ---------------------------------------------------------------------------------------
>
> Key: CASSANDRA-10821
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10821
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Environment: EC2 32 x i2.xlarge split between us-east-1a,c and us-west-2a,b
> Linux 4.1.10-17.31.amzn1.x86_64 #1 SMP Sat Oct 24 01:31:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
> Cassandra version: 2.2.3
> Reporter: Thom Bartold
>
> We were writing to the DB from EC2 instances in us-east-1 at a rate of about 3000 per second, replication us-east:2 us-west:2, LeveledCompaction and DeflateCompressor.
> After about 48 hours some nodes had over 800 pending compactions and a few of them started getting killed by the Linux OOM killer. Priam attempts to restart the nodes, but they fail because of corrupted saved_caches files.
> Loading has finished, and the cluster is mostly idle, but 6 of the nodes were killed again last night by OOM.
> This is the log message where the node won't restart:
> ERROR [main] 2015-12-05 13:59:13,754 CassandraDaemon.java:635 - Detected unreadable sstables /media/ephemeral0/cassandra/saved_caches/KeyCache-ca.db, please check NEWS.txt and ensure that you have upgraded through all required intermediate versions, running upgradesstables
> This is the dmesg where the node is terminated:
> [360803.234422] Out of memory: Kill process 10809 (java) score 949 or sacrifice child
> [360803.237544] Killed process 10809 (java) total-vm:438484092kB, anon-rss:29228012kB, file-rss:107576kB
> This is what Compaction Stats look like currently:
> pending tasks: 1096
> id                                     compaction type   keyspace          table      completed    total          unit    progress
> 93eb3200-9b58-11e5-b9f1-ffef1041ec45   Compaction        overlordpreprod   document   8670748796   839129219651   bytes   1.03%
>                                        Compaction        system            hints      30           1921326518     bytes   0.00%
> Active compaction remaining time : 27h33m47s
> Only 6 of the 32 nodes have compactions pending, and all on the order of 1000.
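For reading the dmesg output quoted above, a minimal sketch that pulls the memory figures out of a "Killed process" line and converts them to GiB. The regular expression only targets the line format shown in this report, the function name is hypothetical, and the conversion assumes the kernel's "kB" means 1024 bytes.

```python
import re

# Matches the "Killed process" line format quoted in this report; other
# kernel versions may print additional or different fields.
OOM_LINE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)

KIB_PER_GIB = 1024 * 1024  # assumes "kB" in dmesg means 1024 bytes


def parse_oom_kill(line: str):
    """Return pid, process name and memory sizes in GiB, or None if no match."""
    match = OOM_LINE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match["pid"]),
        "name": match["name"],
        "total_vm_gib": int(match["total_vm"]) / KIB_PER_GIB,
        "anon_rss_gib": int(match["anon_rss"]) / KIB_PER_GIB,
        "file_rss_gib": int(match["file_rss"]) / KIB_PER_GIB,
    }


if __name__ == "__main__":
    sample = (
        "[360803.237544] Killed process 10809 (java) "
        "total-vm:438484092kB, anon-rss:29228012kB, file-rss:107576kB"
    )
    print(parse_oom_kill(sample))
```

The anonymous resident set size is the figure to compare against the instance's physical RAM, since it covers both the JVM heap and off-heap allocations.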
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)