Posted to user@cassandra.apache.org by Ken Hancock <ke...@schange.com> on 2016/01/11 16:32:10 UTC

Cassandra 1.2 & Compressed Data

We were running a contrived system test last week trying to measure the
effect that compaction was having on our I/O and read performance.  As a
test, we set compaction throughput to 1MB/sec.

As expected, we fell greatly behind and the number of SSTables grew.
Unexpectedly, we went OOM.

One of my CFs had 1,127 SSTables, and those SSTables had a retained heap of
almost 1GB.  This was after stopping compaction and all reads and writes,
as well as executing a full GC.

Here's the heap dump summary for a single CF:

Class Name                                                    | Objects | Shallow Heap | Retained Heap
------------------------------------------------------------------------------------------------------
org.apache.cassandra.io.sstable.SSTableReader                 |   1,127 |      117,208 | >= 985,675,936
|- org.apache.cassandra.io.sstable.SSTableMetadata            |   1,127 |       63,112 |   >= 7,284,600
|- java.util.concurrent.atomic.AtomicLong                     |   2,254 |       54,096 |      >= 54,096
|- org.apache.cassandra.db.DecoratedKey                       |   2,254 |       54,096 |     >= 378,672
|- org.apache.cassandra.io.sstable.SSTableDeletingTask        |   1,127 |       45,080 |      >= 45,080
|- org.apache.cassandra.io.util.CompressedPoolingSegmentedFile|   1,127 |       45,080 | >= 969,094,776
|- org.apache.cassandra.io.sstable.Descriptor                 |   1,127 |       45,080 |     >= 483,696
|- org.apache.cassandra.io.sstable.BloomFilterTracker         |   1,127 |       45,080 |      >= 99,176
|- org.apache.cassandra.io.util.MmappedSegmentedFile          |   1,127 |       45,080 |     >= 360,640
|- java.util.concurrent.atomic.AtomicBoolean                  |   2,254 |       36,064 |      >= 36,064
|- org.apache.cassandra.utils.Murmur3BloomFilter              |   1,127 |       27,048 |      >= 81,144
|- org.apache.cassandra.io.sstable.IndexSummary               |   1,127 |       27,048 |   >= 7,896,104
|- java.util.concurrent.CopyOnWriteArraySet                   |   1,127 |       18,032 |     >= 153,272
|- java.util.concurrent.atomic.AtomicInteger                  |   1,127 |       18,032 |      >= 18,032
|- org.apache.cassandra.config.CFMetaData                     |       1 |          120 |          1,608
|- org.apache.cassandra.cache.AutoSavingCache                 |       1 |           40 |             56
|- java.lang.Class                                            |       1 |           16 |             16
|- org.apache.cassandra.dht.Murmur3Partitioner                |       1 |           16 |             32
------------------------------------------------------------------------------------------------------

The retained heap is almost entirely in io.util.CompressedPoolingSegmentedFile.
Specifically, it is all consumed by the "compressed" ByteBuffer held by each
io.compress.CompressedRandomAccessReader.

I'm not familiar with the Cassandra source code, but here's how I'm reading
it.  An SSTable is segmented, and a ConcurrentLinkedQueue (which appears to
be unbounded) is created to hold a Reader for each segment.  Since this
table is compressed, each segment gets an
io.compress.CompressedRandomAccessReader, which allocates an on-heap
ByteBuffer ("buffer") to receive decompressed data.
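If my reading is right, the structure looks roughly like this.  This is a
minimal sketch of the pooling behavior only; the class and method names are
made up for illustration and are not the actual Cassandra 1.2 source:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// One reader per open segment; each holds an on-heap buffer sized to the
// compression chunk (64KB by default -- an assumption about the config).
class PooledCompressedReader {
    final ByteBuffer compressed = ByteBuffer.allocate(65536);
}

class SegmentedFilePool {
    // Unbounded queue: recycled readers accumulate here and are only
    // released when the SSTable itself is closed.
    private final ConcurrentLinkedQueue<PooledCompressedReader> pool =
            new ConcurrentLinkedQueue<>();

    PooledCompressedReader borrow() {
        PooledCompressedReader r = pool.poll();
        return (r != null) ? r : new PooledCompressedReader(); // no cap on creation
    }

    void recycle(PooledCompressedReader r) {
        // Never evicted, so heap use grows to the high-water mark of
        // simultaneously open readers -- and stays there.
        pool.add(r);
    }

    int pooledCount() { return pool.size(); }
}
```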

It appears this buffer is only released when the SSTable is closed, i.e.
when it's compacted away or Cassandra shuts down.

In our case, we had a contrived test where compaction was essentially
disabled.  However, if I have a huge table which will not get compacted
for weeks (STCS), it seems that Cassandra will allocate a
CompressedRandomAccessReader, each with a 65K decompression buffer, for
every segment that is read; those buffers are never freed and their number
is unbounded.  My reading is that in Cassandra 1.2.18 the memory
requirements for compressed data are unbounded and can consume as much
heap space as there is compressed data being read.
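For what it's worth, the dump numbers above seem consistent with that
reading.  A quick back-of-envelope check (assuming the default 64KB
compression chunk size, which I have not verified for this table):

```java
// Back-of-envelope check of the heap-dump figures above, assuming each
// pooled reader retains a 64KB decompression buffer.
public class HeapMath {
    static final long RETAINED = 969_094_776L; // CompressedPoolingSegmentedFile retained heap
    static final long SSTABLES = 1_127L;       // SSTableReader instance count
    static final long BUFFER   = 65_536L;      // 64KB buffer per reader (assumed default)

    // Bytes retained per SSTable: ~860KB each.
    static long perSSTable() { return RETAINED / SSTABLES; }

    // Roughly how many pooled readers that implies per SSTable: ~13.
    static long buffersPerSSTable() { return perSSTable() / BUFFER; }

    public static void main(String[] args) {
        System.out.println(perSSTable() + " bytes/SSTable, ~"
                + buffersPerSSTable() + " pooled buffers each");
    }
}
```

About 13 pooled 64KB readers per SSTable, times 1,127 SSTables, accounts
for nearly all of the ~1GB of retained heap.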

Searching Jira, I found https://issues.apache.org/jira/browse/CASSANDRA-5661
which sounds like the fix effectively orphaned Cassandra 1.2:

"Reader pooling was introduced in CASSANDRA-4942
<https://issues.apache.org/jira/browse/CASSANDRA-4942> but pooled
RandomAccessReaders are never cleaned up until the SSTableReader is closed.
So memory use is "the worst case simultaneous RAR we had open for this
file, forever."

We should introduce a global limit on how much memory to use for RAR, and
evict old ones."
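As I understand the proposal, a bounded pool would simply drop recycled
readers once a cap is hit, instead of retaining them forever.  A sketch of
that idea (the names, the cap, and the drop-on-recycle policy are my own
illustration, not the actual patch):

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Illustrative bounded reader pool: past the cap, recycled buffers are
// dropped so the GC can reclaim them, capping heap use per file.
class BoundedReaderPool {
    private final ArrayDeque<ByteBuffer> pool = new ArrayDeque<>();
    private final int maxPooled;

    BoundedReaderPool(int maxPooled) { this.maxPooled = maxPooled; }

    synchronized ByteBuffer borrow() {
        ByteBuffer b = pool.poll();
        return (b != null) ? b : ByteBuffer.allocate(65536);
    }

    synchronized void recycle(ByteBuffer b) {
        if (pool.size() < maxPooled) {
            pool.add(b);  // keep for reuse
        }                 // else: drop it; the 64KB buffer becomes garbage
    }

    synchronized int pooledCount() { return pool.size(); }
}
```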

I'm not clear how the "simultaneous" comment above applies.  If I'm reading
this correctly, STCS with compressed data is a ticking time bomb for
Cassandra 1.2.
Hopefully someone with more knowledge of the source code can let me know if
my analysis is correct.