Posted to dev@cassandra.apache.org by su...@bluebox.net on 2015/04/12 02:47:23 UTC

[BBG-137068] New Ticket: [jira] [Updated] (CASSANDRA-9120) OutOfMemoryError when read auto-saved cache (probably broken)

 (dev@cassandra.apache.org) said:


     [ https://issues.apache.org/jira/browse/CASSANDRA-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-9120:
----------------------------------
    Assignee:     (was: Jeff Jirsa)

> OutOfMemoryError when read auto-saved cache (probably broken)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-9120
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9120
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Linux
>            Reporter: Vladimir
>             Fix For: 3.0, 2.0.15, 2.1.5
>
>
> Found during tests on a 100-node cluster. After a restart, one node crashed constantly with an OutOfMemoryError. I guess that the auto-saved cache was corrupted and Cassandra can't recognize that. I see that similar issues were already fixed (when a negative size of some structure was read). Does the auto-saved cache have a checksum? It would help to reject a corrupted cache at the very beginning.
> As far as I can see, the current code still has that problem. The stack trace is:
> {code}
> INFO [main] 2015-03-28 01:04:13,503 AutoSavingCache.java (line 114) reading saved cache /storage/core/loginsight/cidata/cassandra/saved_caches/system-sstable_activity-KeyCache-b.db
> ERROR [main] 2015-03-28 01:04:14,718 CassandraDaemon.java (line 513) Exception encountered during startup
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.ArrayList.<init>(Unknown Source)
>         at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:120)
>         at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
>         at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
>         at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:262)
>         at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:421)
>         at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:392)
>         at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:315)
>         at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:272)
>         at org.apache.cassandra.db.Keyspace.open(Keyspace.java:114)
>         at org.apache.cassandra.db.Keyspace.open(Keyspace.java:92)
>         at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:536)
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:261)
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
> {code}
> I looked at the Cassandra source code and see:
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.cassandra/cassandra-all/2.0.10/org/apache/cassandra/db/RowIndexEntry.java
> 119 int entries = in.readInt();
> 120 List columnsIndex = new ArrayList(entries);
> It seems that the value of entries is invalid (negative), so it tries to allocate an array with a huge initial capacity and hits OOM. I deleted the saved_caches directory and was able to start the node correctly. We should expect that this may happen in the real world. Cassandra should be able to skip incorrect cached data and run.
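
The kind of guard the report asks for could look roughly like the sketch below. It is not the Cassandra code or the eventual fix: the class SafeIndexReader, the constant MIN_ENTRY_BYTES, and both method names are hypothetical, and each entry is simplified to a single long. One detail worth noting: new ArrayList(n) with a negative n throws IllegalArgumentException rather than OutOfMemoryError, so the corrupt count in this trace was presumably a very large positive number. The sketch rejects both cases by checking the count against the bytes actually available before allocating anything.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cassandra.
final class SafeIndexReader {
    // Assumed minimum serialized size of one entry (one long in this sketch).
    static final int MIN_ENTRY_BYTES = 8;

    // Reads a count-prefixed list, validating the count before allocating.
    static List<Long> readEntries(DataInputStream in, long bytesAvailable) throws IOException {
        int entries = in.readInt();
        long bytesAfterCount = bytesAvailable - 4; // the count itself took 4 bytes
        if (entries < 0 || (long) entries * MIN_ENTRY_BYTES > bytesAfterCount) {
            throw new IOException("Corrupt saved cache: implausible entry count " + entries);
        }
        List<Long> result = new ArrayList<>(entries);
        for (int i = 0; i < entries; i++) {
            result.add(in.readLong());
        }
        return result;
    }

    // Treats a corrupt or truncated saved cache as empty instead of failing startup.
    static List<Long> loadOrSkip(byte[] saved) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(saved))) {
            return readEntries(in, saved.length);
        } catch (IOException e) {
            return new ArrayList<>();
        }
    }
}
```

With a check like this, a damaged cache file costs only a cold cache after restart, which matches the manual workaround of deleting the saved_caches directory.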



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)