You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by ca...@ajowa.de on 2010/11/29 10:28:29 UTC

Cassandra 0.7 beta 3 outOfMemory (OOM)

Hi community,

during my tests i had several OOM crashes.
Getting some hints to find the problem would be nice.

First cassandra crashes after about 45 min insert test script.
During the following tests time to OOM was shorter until it started to crash
even in "idle" mode.

Here the facts:
- cassandra 0.7 beta 3
- using lucandra to index about 3 million files ~1kb data
- inserting with one client to one cassandra node with about 200 files/s
- cassandra data files for this keyspace grow up to about 20 GB
- the keyspace only contains the two lucandra specific CFs

Cluster:
- cassandra single node on windows 32bit, Xeon 2,5 Ghz, 4GB RAM
- java jre 1.6.0_22
- heap space first 1GB, later increased to 1,3 GB

Cassandra.yaml:
default + reduced "binary_memtable_throughput_in_mb" to 128

CFs:
default + reduced
min_compaction_threshold: 4
max_compaction_threshold: 8


I think the problem appears always during compaction,
and perhaps it is a result of large rows (some about 170mb).

Are there more optionions we could use to work with few memory?

Is it a problem of compaction?
And how to avoid?
Slower inserts? More memory?
Even fewer memtable_throuput or in_memory_compaction_limit?
Continuous manual major comapction?

I've read  
http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors
- row_size should be fixed since 0.7 and 200mb is still far away from 2gb
- only key cache is used a little bit 3600/20000
- after a lot of writes cassandra crashes even in idle mode
- memtablesize was reduced and there are only 2 CFs

Several heapdumps in MAT show 60-99% heapusage of compaction thread.

Here some log extract:

  INFO [CompactionExecutor:1] 2010-11-26 14:18:18,593  
CompactionIterator.java (line 134) Compacting large row  
6650325572717566efbfbf44545241434b53efbfbf31 (172967291 bytes)  
incrementally
  INFO [ScheduledTasks:1] 2010-11-26 14:18:41,421 GCInspector.java  
(line 133) GC for ParNew: 365 ms, 54551328 reclaimed leaving 459496840  
used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:42,437 GCInspector.java  
(line 133) GC for ParNew: 226 ms, 12469104 reclaimed leaving 554506776  
used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:43,453 GCInspector.java  
(line 133) GC for ParNew: 224 ms, 12777840 reclaimed leaving 649207976  
used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:44,468 GCInspector.java  
(line 133) GC for ParNew: 225 ms, 12564144 reclaimed leaving 744122872  
used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:45,468 GCInspector.java  
(line 133) GC for ParNew: 222 ms, 16020328 reclaimed leaving 835581584  
used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:46,468 GCInspector.java  
(line 133) GC for ParNew: 226 ms, 12697912 reclaimed leaving 930362712  
used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:47,468 GCInspector.java  
(line 133) GC for ParNew: 227 ms, 15816872 reclaimed leaving  
1022026288 used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:48,484 GCInspector.java  
(line 133) GC for ParNew: 258 ms, 12746584 reclaimed leaving  
1116758744 used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:49,484 GCInspector.java  
(line 133) GC for ParNew: 257 ms, 12802608 reclaimed leaving  
1211435176 used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 133) GC for ConcurrentMarkSweep: 4188 ms, 271308512 reclaimed  
leaving 1047605704 used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 153) Pool Name                    Active   Pending
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) ResponseStage                     0         0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) ReadStage                         0         0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) ReadRepair                        0         0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) MutationStage                     0         0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) GossipStage                       0         0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) AntientropyStage                  0         0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 160) MigrationStage                    0         0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 160) StreamStage                       0         0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 160) MemtablePostFlusher               0         0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 160) FlushWriter                       0         0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 160) MiscStage                         0         0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 160) FlushSorter                       0         0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 164) CompactionManager               n/a         6
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 176) MessagingService                n/a       0,0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 180) ColumnFamily                Memtable ops,data  Row cache  
size/cap  Key cache size/cap
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 183) system.LocationInfo                      1,17                
   0/0                 3/3
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 183) system.HintsColumnFamily                  0,0                
   0/0                 0/1
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java  
(line 183) system.Migrations                         0,0                
   0/0                 0/2
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java  
(line 183) system.Schema                             0,0                
   0/0                 3/3
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java  
(line 183) system.IndexInfo                          0,0                
   0/0                 0/1
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java  
(line 183) Lucandra.Documents                        0,0                
   0/0            0/200000
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java  
(line 183) Lucandra.TermInfo                         0,0                
   0/0            0/200000
  INFO [ScheduledTasks:1] 2010-11-26 14:19:05,328 GCInspector.java  
(line 133) GC for ConcurrentMarkSweep: 3807 ms, 31040 reclaimed  
leaving 1153889144 used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:20:09,093 GCInspector.java  
(line 153) Pool Name                    Active   Pending
  INFO [ScheduledTasks:1] 2010-11-26 14:20:09,093 GCInspector.java  
(line 160) ResponseStage                     0         0
ERROR [CompactionExecutor:1] 2010-11-26 14:20:09,093  
AbstractCassandraDaemon.java (line 89) Fatal exception in thread  
Thread[CompactionExecutor:1,1,main]
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2786)
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
	at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
	at  
org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:312)
	at  
org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:291)
	at  
org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:82)
	at  
org.apache.cassandra.db.ColumnFamilySerializer.serializeWithIndexes(ColumnFamilySerializer.java:101)
	at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:96)
	at  
org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:138)
	at  
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
	at  
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42)
	at  
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
	at  
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
	at  
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
	at  
org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
	at  
org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
	at  
org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:273)
	at  
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:109)
	at  
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:87)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at  
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at  
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

Re: Cassandra 0.7 beta 3 outOfMemory (OOM)

Posted by Aaron Morton <aa...@thelastpickle.com>.
Sounds like you need to increase the Heap size and/or reduce the memtable_throughput_in_mb and/or turn off the internal caches. Normally the binary memtable thresholds only apply to bulk load operations and it's the per CF memtable_* settings you want to change. I'm not familiar with lucandra though. 

See the section on JVM Heap Size here 
http://wiki.apache.org/cassandra/MemtableThresholds

Bottom line is you will need more JVM heap memory.

Hope that helps.
Aaron

On 29 Nov, 2010,at 10:28 PM, cassandra@ajowa.de wrote:

Hi community,

during my tests i had several OOM crashes.
Getting some hints to find the problem would be nice.

First cassandra crashes after about 45 min insert test script.
During the following tests time to OOM was shorter until it started to crash
even in "idle" mode.

Here the facts:
- cassandra 0.7 beta 3
- using lucandra to index about 3 million files ~1kb data
- inserting with one client to one cassandra node with about 200 files/s
- cassandra data files for this keyspace grow up to about 20 GB
- the keyspace only contains the two lucandra specific CFs

Cluster:
- cassandra single node on windows 32bit, Xeon 2,5 Ghz, 4GB RAM
- java jre 1.6.0_22
- heap space first 1GB, later increased to 1,3 GB

Cassandra.yaml:
default + reduced "binary_memtable_throughput_in_mb" to 128

CFs:
default + reduced
min_compaction_threshold: 4
max_compaction_threshold: 8


I think the problem appears always during compaction,
and perhaps it is a result of large rows (some about 170mb).

Are there more optionions we could use to work with few memory?

Is it a problem of compaction?
And how to avoid?
Slower inserts? More memory?
Even fewer memtable_throuput or in_memory_compaction_limit?
Continuous manual major comapction?

I've read 
http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors
- row_size should be fixed since 0.7 and 200mb is still far away from 2gb
- only key cache is used a little bit 3600/20000
- after a lot of writes cassandra crashes even in idle mode
- memtablesize was reduced and there are only 2 CFs

Several heapdumps in MAT show 60-99% heapusage of compaction thread.

Here some log extract:

INFO [CompactionExecutor:1] 2010-11-26 14:18:18,593 
CompactionIterator.java (line 134) Compacting large row 
6650325572717566efbfbf44545241434b53efbfbf31 (172967291 bytes) 
incrementally
INFO [ScheduledTasks:1] 2010-11-26 14:18:41,421 GCInspector.java 
(line 133) GC for ParNew: 365 ms, 54551328 reclaimed leaving 459496840 
used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:42,437 GCInspector.java 
(line 133) GC for ParNew: 226 ms, 12469104 reclaimed leaving 554506776 
used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:43,453 GCInspector.java 
(line 133) GC for ParNew: 224 ms, 12777840 reclaimed leaving 649207976 
used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:44,468 GCInspector.java 
(line 133) GC for ParNew: 225 ms, 12564144 reclaimed leaving 744122872 
used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:45,468 GCInspector.java 
(line 133) GC for ParNew: 222 ms, 16020328 reclaimed leaving 835581584 
used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:46,468 GCInspector.java 
(line 133) GC for ParNew: 226 ms, 12697912 reclaimed leaving 930362712 
used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:47,468 GCInspector.java 
(line 133) GC for ParNew: 227 ms, 15816872 reclaimed leaving 
1022026288 used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:48,484 GCInspector.java 
(line 133) GC for ParNew: 258 ms, 12746584 reclaimed leaving 
1116758744 used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:49,484 GCInspector.java 
(line 133) GC for ParNew: 257 ms, 12802608 reclaimed leaving 
1211435176 used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java 
(line 133) GC for ConcurrentMarkSweep: 4188 ms, 271308512 reclaimed 
leaving 1047605704 used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java 
(line 153) Pool Name Active Pending
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java 
(line 160) ResponseStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java 
(line 160) ReadStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java 
(line 160) ReadRepair 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspectorjava 
(line 160) MutationStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java 
(line 160) GossipStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java 
(line 160) AntientropyStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java 
(line 160) MigrationStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java 
(line 160) StreamStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java 
(line 160) MemtablePostFlusher 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java 
(line 160) FlushWriter 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java 
(line 160) MiscStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java 
(line 160) FlushSorter 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java 
(line 164) CompactionManager n/a 6
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java 
(line 176) MessagingService n/a 0,0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java 
(line 180) ColumnFamily Memtable ops,data Row cache 
size/cap Key cache size/cap
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java 
(line 183) system.LocationInfo 1,17 
0/0 3/3
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java 
(line 183) system.HintsColumnFamily 0,0 
0/0 0/1
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java 
(line 183) system.Migrations 0,0 
0/0 0/2
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java 
(line 183) system.Schema 0,0 
0/0 3/3
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java 
(line 183) system.IndexInfo 0,0 
0/0 0/1
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java 
(line 183) Lucandra.Documents 0,0 
0/0 0/200000
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java 
(line 183) Lucandra.TermInfo 0,0 
0/0 0/200000
INFO [ScheduledTasks:1] 2010-11-26 14:19:05,328 GCInspector.java 
(line 133) GC for ConcurrentMarkSweep: 3807 ms, 31040 reclaimed 
leaving 1153889144 used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:20:09,093 GCInspector.java 
(line 153) Pool Name Active Pending
INFO [ScheduledTasks:1] 2010-11-26 14:20:09,093 GCInspector.java 
(line 160) ResponseStage 0 0
ERROR [CompactionExecutor:1] 2010-11-26 14:20:09,093 
AbstractCassandraDaemon.java (line 89) Fatal exception in thread 
Thread[CompactionExecutor:1,1,main]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
at 
org.apache.cassandradb.SuperColumnSerializer.serialize(SuperColumn.java:312)
at 
org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:291)
at 
org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:82)
at 
org.apache.cassandra.db.ColumnFamilySerializer.serializeWithIndexes(ColumnFamilySerializer.java:101)
at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:96)
at 
org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:138)
at 
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
at 
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42)
at 
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
at 
comgoogle.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at 
org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
at 
org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
at 
org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:273)
at 
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:109)
at 
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:87)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)