Posted to user@cassandra.apache.org by Adria Arcarons <Ad...@greenpowermonitor.com> on 2014/10/28 17:02:29 UTC

OldGen saturation

Hi,

I work for a company that gathers time series data from different sensors. I've been trying to set up C* in a single-node test environment to get an idea of the performance Cassandra will give us in our use case. To do so, I have implemented a test that simulates our real insertion pattern.

We have about 50,000 CFs of varying size, each grouping the sensors that are in the same physical location. Our partition key is made up of the id of the sensor and the type of the value being measured, so there is a single row (partition) for each combination of (sensorId, parameterId). Our primary key is made up of the partition key plus the timestamp and the measured value, so rows are clustered by timestamp, which makes slice reads fast.
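As an illustration of the layout described above (not the actual schema, which is not shown in this message), here is a toy in-memory model in Python of a single CF: partitions are keyed by (sensor_id, parameter_id) and each partition is kept ordered by timestamp, so a time slice is a binary search plus a contiguous read. The class and method names are hypothetical, and values are assumed to be numeric.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class SensorStore:
    """Toy stand-in for one CF, partitioned by (sensor_id, parameter_id)."""

    def __init__(self):
        # partition key -> list of (timestamp, value), kept sorted by timestamp
        self._partitions = defaultdict(list)

    def insert(self, sensor_id, parameter_id, timestamp, value):
        rows = self._partitions[(sensor_id, parameter_id)]
        # Place the row after any existing rows with the same timestamp,
        # so the partition stays ordered by the clustering column.
        idx = bisect_right(rows, (timestamp, float("inf")))
        rows.insert(idx, (timestamp, value))

    def read_slice(self, sensor_id, parameter_id, start, end):
        """Return rows with start <= timestamp < end from one partition."""
        rows = self._partitions.get((sensor_id, parameter_id), [])
        lo = bisect_left(rows, (start, float("-inf")))
        hi = bisect_left(rows, (end, float("-inf")))
        return rows[lo:hi]
```

A slice read touches a single partition and never scans rows outside the requested time range, which is the property the clustering by timestamp is meant to provide.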

The write test consists of a continuous flow of inserts. To make them faster, the inserts are sent in BATCH statements in groups of 1,000, to a single CF at a time. The client runs on a separate machine.
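The batching pattern described above can be sketched roughly as follows. This is a Python illustration rather than the PHP client used in the test; the table name, column names and helper functions are hypothetical, and values are inlined into the CQL text only to keep the example self-contained (a real client would normally bind parameters instead).

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_batch(table, rows):
    """Render one CQL BATCH of INSERTs targeting a single table."""
    lines = ["BEGIN BATCH"]
    for sensor_id, parameter_id, ts, value in rows:
        lines.append(
            f"  INSERT INTO {table} (sensor_id, parameter_id, ts, value) "
            f"VALUES ({sensor_id}, {parameter_id}, {ts}, {value});"
        )
    lines.append("APPLY BATCH;")
    return "\n".join(lines)


# Example: 2,500 measurements become batches of 1,000, 1,000 and 500.
measurements = [(1, 7, t, 0.5) for t in range(2500)]
batches = [build_batch("plant_42", c) for c in chunked(measurements, 1000)]
```

The batch size is a single parameter here, so trying smaller groups only means changing the second argument of chunked().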

The problem I'm experiencing is that, once the script has been running for almost 40 minutes, the heap gets saturated. OldGen fills up, and then there is intensive GC activity trying to free OldGen objects, but each pass frees very little space. GC then saturates the CPU. Here are the graphs obtained with VisualVM that show this behavior:

CPU: https://www.dropbox.com/s/oqqqg0ygbd72n0n/CPU%202014-10-28%2014_24_06-VisualVM%201.3.8.jpg?dl=0
HEAP usage: https://www.dropbox.com/s/qp7iyc5o0fpr1xa/Estancament%20MEM%202014-10-28%2014_21_53-VisualVM%201.3.8.jpg?dl=0
OLDGEN full (via VisualGC): https://www.dropbox.com/s/5udvqq95qkjuppq/HEAP%202014-10-28%2014_22_27-VisualVM%201.3.8.jpg?dl=0

Moreover, when the heap is saturated, IO activity drops from an average of 90% disk utilization to roughly 15%. So I end up in a situation where very little data is flushed, very little memory is freed, and the insert rate gets very slow. If the insert process is stopped, C* completes all its pending flushes and after a while GC activity stops, but OldGen occupancy remains almost full.

Why is the GC not able to free more memory?
Isn't Cassandra supposed to stop accepting writes until a certain amount of memory is freed?
I'm sceptical about increasing the size of the memtables. If the IO subsystem can't cope with the flush activity, the problem would only be delayed.
Can this problem be related in any way to our CF indexing settings?
Why is OldGen still almost full after all pending flushes and compactions have completed, even with mct (memtable_cleanup_threshold) set to 0.15?
Is a BATCH statement the appropriate way to insert multiple values into the same CF?

Any thoughts on this would be appreciated. I can provide full logs or config files to anyone interested.

Regards,
Adrià.

P.S. Details on the setup:
I'm working with the default values except for:
- offheap_objects enabled.
- on-heap memtable size set to 128MB. The problem also reproduces with larger on-heap memtable sizes.
- off-heap memtable size set to 2.5GB.
- number of memtable flusher threads set to 3.
- memtable_cleanup_threshold set to 0.15 to perform regular flushes to disk.

My total heap size is 1GB, with a NewGen region of 256MB. The C* node has 4GB RAM, an Intel Xeon CPU E5520 @ 2.27GHz (3 cores) and a 500GB SATA HD, running Debian 7 + Cassandra 2.1.0 + Oracle Java JRE (build 1.7.0_71-b14). The writing client is implemented in PHP with the YACassandraPDO CQL library, which is based on Thrift. It runs on a separate machine.
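For reference, 1GB of heap with 256MB of NewGen is what the stock cassandra-env.sh would be expected to pick on its own for a 4GB, 3-core machine, assuming the sizing rule documented in that script's comments for Cassandra 2.x: max heap = max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB)) and NewGen = min(100MB per core, 1/4 of the heap). A minimal Python sketch of that rule follows; the function name is made up for illustration, and the rule should be checked against the script shipped with the version in use.

```python
def default_heap_sizes_mb(system_memory_mb, cpu_cores):
    """Return (max_heap_mb, new_gen_mb) under the assumed default rule."""
    # Half of RAM, capped at 1024MB.
    half = min(system_memory_mb // 2, 1024)
    # A quarter of RAM, capped at 8192MB.
    quarter = min(system_memory_mb // 4, 8192)
    max_heap = max(half, quarter)
    # Young generation: 100MB per core, but no more than a quarter of the heap.
    new_gen = min(100 * cpu_cores, max_heap // 4)
    return max_heap, new_gen


# The node described above: 4GB RAM (taken as 4096MB), 3 cores.
heap_mb, new_gen_mb = default_heap_sizes_mb(4096, 3)
```

Under this rule a 4GB machine never gets more than 1GB of heap automatically. A larger heap has to be requested explicitly, which in cassandra-env.sh means setting MAX_HEAP_SIZE and HEAP_NEWSIZE together.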

RE: OldGen saturation

Posted by Adria Arcarons <Ad...@greenpowermonitor.com>.
Thank you, Bryan and Mark. I have redesigned my schema so that I only have 50 CFs, and I’ve given the heap 2GB. Now it’s working fine.

From: Mark Reddy [mailto:mark.l.reddy@gmail.com]
Sent: Tuesday, 28 October 2014 18:31
To: user@cassandra.apache.org
Subject: Re: OldGen saturation

Hi Adrià,

> We have about 50,000 CFs of varying size

Before I read any further, having 50,000 CFs is something that I would highly discourage. Each column family is allocated 1MB of available memory (CASSANDRA-2252 <https://issues.apache.org/jira/browse/CASSANDRA-2252>), so having anything over a few hundred on a 1GB heap would be the first thing I would reconsider. Also, 1GB isn't something I'd run a production or load-test Cassandra on. If your test machine has only 4GB, give it half the total memory (2GB). For a production system you would want something more than a 4GB machine.

Here are some JIRAs and mailing list topics on the subject of large quantities of CFs:

https://issues.apache.org/jira/browse/CASSANDRA-7643
https://issues.apache.org/jira/browse/CASSANDRA-6794
https://issues.apache.org/jira/browse/CASSANDRA-7444
http://mail-archives.apache.org/mod_mbox/cassandra-user/201407.mbox/%3C10D771CCF4F243149C928D0CB32BCD78@JackKrupansky14%3E
http://mail-archives.apache.org/mod_mbox/cassandra-user/201408.mbox/%3CCAAZU44m87C1yUFfz08nzVtkQNwW95YAw9bOsY_Ugu0fsWL7VsQ@mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201408.mbox/%3CCALRai9Ao=mdkRKLOwrbYaJJP+fc4h5Tpx-EjGDqXTaYqj5ubGw@mail.gmail.com%3E


Regards,
Mark

On 28 October 2014 17:19, Bryan Talbot <br...@playnext.com> wrote:
On Tue, Oct 28, 2014 at 9:02 AM, Adria Arcarons <Ad...@greenpowermonitor.com> wrote:
> Hi,

Hi

> We have about 50,000 CFs of varying size

> The writing test consists of a continuous flow of inserts. The inserts are done inside BATCH statements in groups of 1,000 to a single CF at a time to make them faster.

> The problem I’m experiencing is that, eventually, when the script has been running for almost 40mins, the heap gets saturated. OldGen gets full and then there is an intensive GC activity trying to free OldGen objects, but it can only free very little space in each pass. Then GC saturates the CPU. Here are the graphs obtained with VisualVM that show this behavior:

> My total heap size is 1GB and the NewGen region of 256MB. The C* node has 4GB RAM. Intel Xeon CPU E5520 @


Without looking at your VM graphs, I'm going to go out on a limb here and say that your host is woefully underpowered to host fifty-thousand column families and batch writes of one-thousand statements.

A 1 GB Java heap is sometimes acceptable for a unit test or for playing around, but you can't actually expect it to be adequate for a load test, can you?

Every CF consumes some permanent heap space for its metadata. Too many CFs are a bad thing. You probably have ten times more CFs than would be recommended as an upper limit.

-Bryan



Re: OldGen saturation

Posted by Mark Reddy <ma...@gmail.com>.
Hi Adrià,

> We have about 50,000 CFs of varying size


Before I read any further, having 50,000 CFs is something that I would
highly discourage. Each column family is allocated 1MB of available memory
(CASSANDRA-2252 <https://issues.apache.org/jira/browse/CASSANDRA-2252>), so
having anything over a few hundred on a 1GB heap would be the first thing I
would reconsider. Also, 1GB isn't something I'd run a production or
load-test Cassandra on. If your test machine has only 4GB, give it half the
total memory (2GB). For a production system you would want something more
than a 4GB machine.
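
The arithmetic behind that advice can be sketched in a few lines of Python. The 1MB-per-CF figure is the one quoted in this message and is treated here as an assumption about the per-CF memtable allocation; the function names are hypothetical, and the result is a rough upper bound for the case where every CF is being written to, not a measurement.

```python
MB_PER_CF = 1  # assumed per-CF allocation, taken from the figure quoted above


def cf_overhead_mb(num_cfs, mb_per_cf=MB_PER_CF):
    """Memory the CFs could claim if each one holds its allocation."""
    return num_cfs * mb_per_cf


def fits_in_heap(num_cfs, heap_mb, mb_per_cf=MB_PER_CF):
    """True if the per-CF allocations alone stay within the heap."""
    return cf_overhead_mb(num_cfs, mb_per_cf) <= heap_mb


# 50,000 CFs against a 1GB heap, then a few hundred CFs against 2GB.
original_ok = fits_in_heap(50_000, 1024)
smaller_ok = fits_in_heap(300, 2048)
```

With 50,000 CFs the per-CF allocations alone would come to roughly 50,000MB, far beyond a 1GB heap, while a few hundred CFs leave most of a 2GB heap free for everything else.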

Here are some JIRAs and mailing list topics on the subject of large
quantities of CFs:

https://issues.apache.org/jira/browse/CASSANDRA-7643
https://issues.apache.org/jira/browse/CASSANDRA-6794
https://issues.apache.org/jira/browse/CASSANDRA-7444
http://mail-archives.apache.org/mod_mbox/cassandra-user/201407.mbox/%3C10D771CCF4F243149C928D0CB32BCD78@JackKrupansky14%3E
http://mail-archives.apache.org/mod_mbox/cassandra-user/201408.mbox/%3CCAAZU44m87C1yUFfz08nzVtkQNwW95YAw9bOsY_Ugu0fsWL7VsQ@mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201408.mbox/%3CCALRai9Ao=mdkRKLOwrbYaJJP+fc4h5Tpx-EjGDqXTaYqj5ubGw@mail.gmail.com%3E


Regards,
Mark


Re: OldGen saturation

Posted by Bryan Talbot <br...@playnext.com>.
On Tue, Oct 28, 2014 at 9:02 AM, Adria Arcarons <Adria.Arcarons@greenpowermonitor.com> wrote:

> Hi,

Hi

> We have about 50,000 CFs of varying size

> The writing test consists of a continuous flow of inserts. The inserts are
> done inside BATCH statements in groups of 1,000 to a single CF at a time to
> make them faster.

> The problem I’m experiencing is that, eventually, when the script has been
> running for almost 40mins, the heap gets saturated. OldGen gets full and
> then there is an intensive GC activity trying to free OldGen objects, but
> it can only free very little space in each pass. Then GC saturates the CPU.
> Here are the graphs obtained with VisualVM that show this behavior:

> My total heap size is 1GB and the NewGen region of 256MB. The C* node
> has 4GB RAM. Intel Xeon CPU E5520 @


Without looking at your VM graphs, I'm going to go out on a limb here and
say that your host is woefully underpowered to host fifty-thousand column
families and batch writes of one-thousand statements.

A 1 GB Java heap is sometimes acceptable for a unit test or for playing
around, but you can't actually expect it to be adequate for a load test,
can you?

Every CF consumes some permanent heap space for its metadata. Too many CFs
are a bad thing. You probably have ten times more CFs than would be
recommended as an upper limit.

-Bryan