Posted to user@cassandra.apache.org by "Plotnik, Alexey" <ap...@rhonda.ru> on 2014/02/18 01:35:15 UTC

Turn off compression (1.2.11)

Each compressed SSTable uses an additional transfer buffer in its CompressedRandomAccessReader instance.
After analyzing the heap I saw this buffer is about 70 KB per SSTable, and I have more than 30K SSTables per node.
I want to turn off compression for this column family to save some heap. How can I do it safely? I mean, after changing the schema, should a SCRUB/UPGRADESSTABLES process be started?
The next question is how these processes work in terms of disk space. CLEANUP, for instance, is safe: it takes one SSTable, analyzes its keys, builds a new one, replaces the existing one, and moves on. Do SCRUB/UPGRADESSTABLES work in the same manner? My concern is the disk space overhead.
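For what it's worth, on 1.2.x the change itself can be sketched like this; `my_ks` and `my_cf` are placeholder names, and the exact behaviour should be verified on a test node first:

```shell
# Disable compression for the column family (CQL 3, Cassandra 1.2.x);
# an empty sstable_compression turns compression off for newly written SSTables.
cqlsh -e "ALTER TABLE my_ks.my_cf WITH compression = {'sstable_compression': ''};"

# Existing SSTables stay compressed until rewritten. upgradesstables rewrites
# them under the current schema one SSTable at a time, so the transient disk
# overhead should be roughly the size of the SSTable currently being rewritten.
nodetool upgradesstables my_ks my_cf
```

In that respect it should behave like CLEANUP rather than like a major compaction, but verify the disk headroom on a test node before running it fleet-wide.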

Re: Turn off compression (1.2.11)

Posted by Edward Capriolo <ed...@gmail.com>.
Personally I think having compression on by default is the wrong choice.
Depending on your access patterns and row sizes, the overhead of compression
can create more garbage collection and become your bottleneck before you
would bottleneck your disk (especially an SSD).



Re: Turn off compression (1.2.11)

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Feb 18, 2014 at 2:51 PM, Plotnik, Alexey <ap...@rhonda.ru> wrote:

> My SSTable size is 100 MB. The last time I removed the leveled manifest,
> compaction ran for 3 months.
>

At 3 TB per node, you are at, and probably exceeding, the maximum size
anyone suggests for Cassandra 1.2.x.

Add more nodes?

=Rob

RE: Turn off compression (1.2.11)

Posted by "Plotnik, Alexey" <ap...@rhonda.ru>.
My SSTable size is 100 MB. The last time I removed the leveled manifest, compaction ran for 3 months.


RE: Turn off compression (1.2.11)

Posted by "Plotnik, Alexey" <ap...@rhonda.ru>.
It's not wrong economically: we have many TB of data, and it's very expensive to have even 3 TB per machine (we would need 10 TB minimum, I think).
The main thing to understand is that terabytes are not the problem; the problem is how many rows you have per node.

From: Yogi Nerella [mailto:ynerella999@gmail.com]
Sent: 19 February 2014 10:21
To: user@cassandra.apache.org
Subject: Re: Turn off compression (1.2.11)


Re: Turn off compression (1.2.11)

Posted by Yogi Nerella <yn...@gmail.com>.
I am new and trying to learn Cassandra.

Based on my understanding of the problem, almost 2 GB of heap is taken up
just by the compression buffers.

And at 100 MB per SSTable, about 30,000 files gives about 3 TB of data?

What is the hardware and memory configuration you are using to serve this
much data?

Should this be reduced to smaller data sets, partitioned across multiple
nodes?

If my understanding is totally wrong, please forgive me and, if possible,
explain.
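The arithmetic checks out; as a quick sketch (the sizes are the approximations quoted in this thread, not measured values):

```java
public class HeapOverheadEstimate {
    public static void main(String[] args) {
        long sstables = 30_000;                  // SSTables per node (from the thread)
        long bufferBytes = 70L * 1024;           // ~70 KB transfer buffer per SSTable
        long sstableBytes = 100L * 1024 * 1024;  // ~100 MB per SSTable

        long heapBytes = sstables * bufferBytes;   // buffers held on heap
        long dataBytes = sstables * sstableBytes;  // data on disk

        // ~2.0 GB of heap in buffers, ~2.9 TB of on-disk data
        System.out.printf("heap: %.1f GB%n", heapBytes / Math.pow(1024, 3));
        System.out.printf("data: %.1f TB%n", dataBytes / Math.pow(1024, 4));
    }
}
```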








RE: Turn off compression (1.2.11)

Posted by "Plotnik, Alexey" <ap...@rhonda.ru>.
Compression buffers are located in the heap; I saw them in a heap dump. That is:

======================
public class CompressedRandomAccessReader extends RandomAccessReader {
…..
   private ByteBuffer compressed; // <-- THAT IS
======================
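As a hypothetical illustration (not Cassandra's actual code) of how one chunk-sized heap buffer per open reader accumulates:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class BufferAccumulation {
    // Hypothetical stand-in for a reader that, like the `compressed` field
    // above, pins one chunk-sized buffer on the heap for its whole lifetime.
    static class SketchReader {
        final ByteBuffer compressed;
        SketchReader(int chunkSize) {
            this.compressed = ByteBuffer.allocate(chunkSize); // heap-allocated
        }
    }

    public static void main(String[] args) {
        int chunk = 70 * 1024;  // ~70 KB, as seen in the heap dump
        List<SketchReader> readers = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {  // 1,000 readers for the demo; the node had ~30,000
            readers.add(new SketchReader(chunk));
        }
        long held = (long) readers.size() * chunk;
        System.out.println("bytes held by buffers: " + held); // ~68 MB for 1,000 readers
    }
}
```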

Re: Turn off compression (1.2.11)

Posted by Robert Coli <rc...@eventbrite.com>.
On Mon, Feb 17, 2014 at 4:35 PM, Plotnik, Alexey <ap...@rhonda.ru> wrote:

> After analyzing Heap I saw this buffer has a size about 70KB per SSTable.
> I have more than 30K SSTables per node.
>

I'm thinking your problem is not compression; it's using the old 5 MB
default for Leveled Compaction and having 30,000 SSTables as a result.

Before turning off compression, I suggest:

1) change the leveled SSTable size to the new default, 160 MB
2) force all SSTables to L0 (in 1.2, this means removing their .json files
with the node down, IIRC)
3) watch leveled compaction run for a long time, reducing the number of
SSTables you have

As an aside, 1.2.0 beta moved a bunch of data related to compression off
the heap. If you were to try to run the same cluster under 1.1, you'd
probably OOM your heap immediately.

https://issues.apache.org/jira/browse/CASSANDRA-4941

=Rob
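A sketch of those three steps on a 1.2 node; the keyspace/table names and the data path are placeholders, and step 2 relies on 1.2-era behaviour that should be double-checked for your exact version:

```shell
# 1) Raise the LCS SSTable size to the newer 160 MB default.
cqlsh -e "ALTER TABLE my_ks.my_cf WITH compaction =
  {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};"

# 2) With the node cleanly stopped, remove the leveled manifest so all
#    SSTables are treated as L0 on restart.
nodetool drain
# ... stop the Cassandra process ...
rm /var/lib/cassandra/data/my_ks/my_cf/my_cf.json

# 3) Restart and watch compaction work through the backlog.
# ... start the Cassandra process ...
nodetool compactionstats
```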