Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2016/07/22 13:03:20 UTC
[jira] [Resolved] (CASSANDRA-10821) OOM Killer terminates Cassandra when Compactions use too much memory then won't restart

     [ https://issues.apache.org/jira/browse/CASSANDRA-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-10821.
----------------------------------------
    Resolution: Won't Fix

Compaction is often described as "worst case, you will need 2x disk space while it writes out new data before it can clean up the old," but you can also need 2x RAM for the off-heap compression metadata and bloom filters.

Your best bet is probably to disable bloom filters until this compaction finishes.  Switching to more aggressive compression may also help.
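
As a rough illustration of why the bloom filter setting matters for memory, the sketch below applies the textbook Bloom filter sizing formula, m = -n * ln(p) / (ln 2)^2 bits for n keys at false-positive chance p. This is not Cassandra's own sizing code, and the key count and false-positive values are assumptions chosen only to show the trend. A false-positive chance of 1.0 is treated as "no filter", which is the effect of disabling bloom filters.

```python
import math


def bloom_filter_bits(num_keys: int, fp_chance: float) -> float:
    """Textbook estimate of the bits needed for a Bloom filter.

    Uses m = -n * ln(p) / (ln 2)^2. This is the classic formula, not
    Cassandra's implementation, so treat the result as an estimate only.
    """
    if num_keys < 0:
        raise ValueError("num_keys must be non-negative")
    if not 0.0 < fp_chance <= 1.0:
        raise ValueError("fp_chance must be in (0, 1]")
    if fp_chance == 1.0:
        # Every lookup may be a false positive, so no filter is needed.
        return 0.0
    return -num_keys * math.log(fp_chance) / (math.log(2) ** 2)


def bloom_filter_mib(num_keys: int, fp_chance: float) -> float:
    """Same estimate, converted from bits to MiB."""
    return bloom_filter_bits(num_keys, fp_chance) / 8 / (1024 ** 2)


if __name__ == "__main__":
    assumed_keys = 1_000_000_000  # hypothetical key count, not from the ticket
    for p in (0.01, 0.1, 1.0):
        print(f"fp_chance={p}: ~{bloom_filter_mib(assumed_keys, p):.0f} MiB")
```

Under this estimate the cost is roughly 9.6 bits per key at a 1% false-positive chance and about half that at 10%, so during a compaction that holds filters for both old and new sstables, raising or disabling the setting directly reduces off-heap memory.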

> OOM Killer terminates Cassandra when Compactions use too much memory then won't restart
> ---------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10821
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10821
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>         Environment: EC2 32 x i2.xlarge split between us-east-1a,c and us-west-2a,b
> Linux  4.1.10-17.31.amzn1.x86_64 #1 SMP Sat Oct 24 01:31:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
> Cassandra version: 2.2.3
>            Reporter: Thom Bartold
>
> We were writing to the DB from EC2 instances in us-east-1 at a rate of about 3000 per second, replication us-east:2 us-west:2, LeveledCompaction and DeflateCompressor.
> After about 48 hours some nodes had over 800 pending compactions and a few of them started getting killed by the Linux OOM killer. Priam attempts to restart the nodes, but they fail because of corrupted saved_caches files.
> Loading has finished, and the cluster is mostly idle, but 6 of the nodes were killed again last night by OOM.
> This is the log message where the node won't restart:
> ERROR [main] 2015-12-05 13:59:13,754 CassandraDaemon.java:635 - Detected unreadable sstables /media/ephemeral0/cassandra/saved_caches/KeyCache-ca.db, please check NEWS.txt and ensure that you have upgraded through all required intermediate versions, running upgradesstables
> This is the dmesg where the node is terminated:
> [360803.234422] Out of memory: Kill process 10809 (java) score 949 or sacrifice child
> [360803.237544] Killed process 10809 (java) total-vm:438484092kB, anon-rss:29228012kB, file-rss:107576kB
> This is what Compaction Stats look like currently:
> pending tasks: 1096
>                                      id   compaction type          keyspace      table    completed          total    unit   progress
>    93eb3200-9b58-11e5-b9f1-ffef1041ec45        Compaction   overlordpreprod   document   8670748796   839129219651   bytes      1.03%
>                                                Compaction            system      hints           30     1921326518   bytes      0.00%
> Active compaction remaining time :  27h33m47s
> Only 6 of the 32 nodes have compactions pending, each with on the order of 1000 pending tasks.
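
To put the quoted numbers in perspective, here is a minimal sketch that converts the kB fields of the dmesg "Killed process" line to GiB and recomputes the compaction progress from the completed and total byte counts. The input strings and byte counts are copied from the report above; the function names are hypothetical helpers, not part of any Cassandra or Linux tool.

```python
import re

# Copied from the dmesg output quoted in the report.
DMESG_LINE = (
    "Killed process 10809 (java) total-vm:438484092kB, "
    "anon-rss:29228012kB, file-rss:107576kB"
)

KB_PER_GIB = 1024 ** 2


def parse_oom_kill(line: str) -> dict:
    """Return each '<name>:<n>kB' field of an OOM kill line, in GiB."""
    fields = re.findall(r"([a-z-]+):(\d+)kB", line)
    return {name: int(kb) / KB_PER_GIB for name, kb in fields}


def compaction_progress(completed_bytes: int, total_bytes: int) -> float:
    """Percentage of a compaction task that has been written so far."""
    return 100.0 * completed_bytes / total_bytes


if __name__ == "__main__":
    for name, gib in parse_oom_kill(DMESG_LINE).items():
        print(f"{name}: {gib:.2f} GiB")
    # Byte counts from the 'document' table row of the compaction stats.
    pct = compaction_progress(8670748796, 839129219651)
    print(f"progress: {pct:.2f}%")
```

The anon-rss figure comes out just under 28 GiB of resident memory for the Java process at the time it was killed, and the recomputed progress matches the 1.03% shown in the stats.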



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)