Posted to user@cassandra.apache.org by Steppacher Ralf <ra...@derivativepartners.com> on 2013/04/30 18:47:36 UTC

How does a healthy node look like?

Hi,

I am having trouble finding quantitative information on what a healthy Cassandra node should look like (CPU usage, number of flushes, SSTables, compactions, GC) given a certain hardware spec and read/write load. I have trouble gauging whether our first and only Cassandra node needs tuning or is simply overloaded.
If anyone could point me to some data, that would be very helpful.

(So far I have run the node with the default settings in cassandra.yaml and cassandra-env. The log claims that the server is occasionally under memory pressure, and I get frequent timeouts for writes. I see what I think are many flushes, compactions, and GCs in the log. Some toying with the heap and new gen sizes, the key cache, and the compaction throughput settings did not improve the overall situation much.)


Thanks!
Ralf

RE: How does a healthy node look like?

Posted by Steppacher Ralf <ra...@derivativepartners.com>.
Re timeouts:
I receive the following exception from Hector: me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException(acknowledged_by:0) 
I assumed that this is a server-side timeout, also because increasing the xxx_request_timeout_in_ms parameter values made the exception go away.
I went from 20 to 60 seconds for the timeouts, and now I am no longer getting any HTimedOutExceptions.
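
For reference, a write on our side looks roughly like the sketch below. It is only an illustration, not our production code: the host, keyspace, and column family names are placeholders, and the socket timeout line is just there to mark where Hector's client-side knob lives, as opposed to the server-side request timeouts in cassandra.yaml:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.exceptions.HTimedOutException;
import me.prettyprint.hector.api.exceptions.HectorTransportException;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class WriteWithTimeoutHandling {
    public static void main(String[] args) {
        CassandraHostConfigurator hosts = new CassandraHostConfigurator("127.0.0.1:9160");
        hosts.setCassandraThriftSocketTimeout(60000);   // client-side socket timeout (ms)

        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", hosts);
        Keyspace ks = HFactory.createKeyspace("events", cluster);   // placeholder keyspace

        Mutator<String> mutator = HFactory.createMutator(ks, StringSerializer.get());
        mutator.addInsertion("row-1", "EventA",                     // placeholder column family
                HFactory.createStringColumn("price", "42.0"));
        try {
            mutator.execute();
        } catch (HTimedOutException e) {
            // Server-side: the coordinator gave up waiting for the write to be
            // acknowledged within its configured request timeout.
            System.err.println("server-side timeout: " + e.getMessage());
        } catch (HectorTransportException e) {
            // Transport-level failure, e.g. the client-side socket timeout above.
            System.err.println("client-side/transport error: " + e.getMessage());
        }
    }
}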

Re cores:
Yes, we have one node on a server with 6 cores.

Re tombstones:
Deletion is a new trick for us. Up until two weeks ago we always truncated all column families in the early morning, and the write timeouts already occurred back then.
No, we do not do range slices over deleted rows. We also set the gc_grace parameter to 0 for all column families, as we are running a single node at the moment. So even if we were to do range slices over deleted rows, the tombstones should be very short-lived?

Re cfstats/cfhistograms:
They are attached. I created the histograms for the column families that store the event type that occurs most often.

Re GC logging:
I went all in and activated all output. I ran it through GCViewer, but it complained a lot about non-parsable lines, so I am not sure how reliable the output is. It claims that on average about 500MB are collected, but also that the average heap usage after GC is only about 100MB.

Side note: We added more RAM to the machine, so now Cassandra starts with a bit more than 8GB of heap by default.


Thanks!
Ralf

________________________________________
From: aaron morton [aaron@thelastpickle.com]
Sent: Monday, May 06, 2013 10:43
To: user@cassandra.apache.org
Subject: Re: How does a healthy node look like?

Confirm whether your write timeouts are client-side socket timeouts or the TimedOutException from the server.

Typically, write latency is related to GC problems like the ones you are seeing.

I'm unsure how much CPU each Cassandra instance has. Is there one node on a machine with 6 cores?
How many rows are on the node and how wide are they? cfstats or cfhistograms will help.
Enable GC logging, or use something like DataStax OpsCenter, to see how low the heap gets after a CMS GC.

> The write-timeouts correlate with the hours of high (ca. >450/h) "GC for ParNew". I never saw any read-timeouts. I set all timeouts to 20 seconds in cassandra.yaml.
That'll do it.

> To do so we iterate over all rows in the three time-line column families and load the value of the column that is most recent given a cut-off timestamp.
…
> Every night we delete all events that are older than 2 days. Again in batches of 100 rows.
Are you deleting rows from the CFs that you then do a range slice on?
The tombstones may be hurting you on the range scans; can you remove them?

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 3/05/2013, at 9:25 PM, Steppacher Ralf <ra...@derivativepartners.com> wrote:

> Sure, I can do that.
>
> My main concern is write latency and the write timeouts we are experiencing. Read latency is secondary, as long as we do not introduce timeouts on read and do not exceed our sampling intervals (see below).
>
> We are running Cassandra 1.2.1 on Ubuntu 12.04 with JDK 1.7.0_17 (64bit).
> The hardware is virtual but so far we are the only tenant on the physical host.
>
> Hardware:
> - 1x6 cores with 2.3GHz
> - 30GB RAM
> - 1 physical disk for both the tx log and the data files
> - 2 x 1GB Ethernet combined into one virtual interface
>
> Cassandra Config:
> Cassandra runs with
> - 7.5GB of heap and
> - 600MB of new gen space
> as calculated by the cassandra-env script.
> I have adjusted all cassandra.yaml settings where clear guidance is given, e.g. <factor> x <num_cores>.
> I have tried increasing and decreasing the heap (between 6 and 8GB) and the new gen size (between 300MB and 1.1GB).
> I have tried compaction_throughput_mb_per_sec values between 16 and 48.
> I have disabled key caches.
>
> Unfortunately Cassandra has to share the host with other Java processes, the most resource demanding being ActiveMQ 5.8.
>
> Log Output:
> Over the course of a day (08:00 to 22:00) I see in the logs:
> - between 280 and 760 "GC for ParNew" per hour (mostly around 300/h)
> - between 60 and 180 "Completed flushing" per hour (mostly around 100/h)
> - between 17 and 46 "Compacted N sstables to" per hour (mostly around 35/h)
>
> Data Model:
> The data model is made up of 6 column families. 3 are dynamic and capture the time-lines of 3 event types; each event creates a new column whose value is the row key of the event. The other 3 have a static schema and store the events themselves.
> The largest event message has 16 attributes. All are short text identifiers, floating-point numbers, and timestamps. For storage in Cassandra every attribute is converted to a string and stored with the UTF8 validator.
>
> Timeouts and Memory pressure:
> The write-timeouts correlate with the hours of high (ca. >450/h) "GC for ParNew". I never saw any read-timeouts. I set all timeouts to 20 seconds in cassandra.yaml.
> Cassandra comes under memory pressure ("Flushing CFS X to relieve memory pressure") between 3 and 5 times a day. The tendency is for it to happen in the afternoon and evening. But also sometimes right after 08:00 in the morning. In about 75% of the cases it flushes one of the event column families, in 25% a time-line column family.
>
> Write Load:
> We collect events for a theoretical universe of 2.2 million items, so there is a maximum of 2.2 million rows in each of the time-line column families, but I have never seen an estimated row count of more than 1 million in cfstats.
> Roughly 1/3 of the entities receive a maximum of 3 events, one of each event type, per 15-minute interval from 08:00 to 22:00. The other 2/3 receive 3 events 3 times a day. About 16'000 entities receive only one event type, but roughly once every 3 minutes.
> On a typical day the load adds up to about 70 to 80 million messages.
> Not all messages are original, though. The sources re-send an event in every interval if there are no new events. I do not know the noise ratio; I guesstimate it to be at least 50%. In the case of a repeat, the existing time-line column and event row are updated with their previous values.
>
> Read Load:
> At one-hour intervals we sample a time-coherent snapshot of the events. To do so we iterate over all rows in the three time-line column families and load the value of the column that is most recent given a cut-off timestamp. The value is the row key of the actual event, which we then load as well. We do that in batches of 100 rows at a time.
>
> Deletes:
> Every night we delete all events that are older than 2 days. Again in batches of 100 rows.
>
>
> Thanks for helping!
> Ralf
>
>
> From: Alain RODRIGUEZ [arodrime@gmail.com]
> Sent: Thursday, May 02, 2013 09:12
> To: user@cassandra.apache.org
> Subject: Re: How does a healthy node look like?
>
> Well, maybe you should describe your hardware and the C* release you are using. Also give us some metrics.
> On 30 Apr 2013 18:48, "Steppacher Ralf" <ra...@derivativepartners.com> wrote:
> Hi,
>
> I am having trouble finding quantitative information on what a healthy Cassandra node should look like (CPU usage, number of flushes, SSTables, compactions, GC) given a certain hardware spec and read/write load. I have trouble gauging whether our first and only Cassandra node needs tuning or is simply overloaded.
> If anyone could point me to some data, that would be very helpful.
>
> (So far I have run the node with the default settings in cassandra.yaml and cassandra-env. The log claims that the server is occasionally under memory pressure, and I get frequent timeouts for writes. I see what I think are many flushes, compactions, and GCs in the log. Some toying with the heap and new gen sizes, the key cache, and the compaction throughput settings did not improve the overall situation much.)
>
>
> Thanks!
> Ralf


Re: How does a healthy node look like?

Posted by aaron morton <aa...@thelastpickle.com>.
Confirm whether your write timeouts are client-side socket timeouts or the TimedOutException from the server.

Typically, write latency is related to GC problems like the ones you are seeing.

I'm unsure how much CPU each Cassandra instance has. Is there one node on a machine with 6 cores?
How many rows are on the node and how wide are they? cfstats or cfhistograms will help.
Enable GC logging, or use something like DataStax OpsCenter, to see how low the heap gets after a CMS GC.

> The write-timeouts correlate with the hours of high (ca. >450/h) "GC for ParNew". I never saw any read-timeouts. I set all timeouts to 20 seconds in cassandra.yaml.
That'll do it. 

> To do so we iterate over all rows in the three time-line column families and load the value of the column that is most recent given a cut-off timestamp.
…
> Every night we delete all events that are older than 2 days. Again in batches of 100 rows.
Are you deleting rows from the CFs that you then do a range slice on?
The tombstones may be hurting you on the range scans; can you remove them?

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 3/05/2013, at 9:25 PM, Steppacher Ralf <ra...@derivativepartners.com> wrote:

> Sure, I can do that.  
> 
> My main concern is write latency and the write timeouts we are experiencing. Read latency is secondary, as long as we do not introduce timeouts on read and do not exceed our sampling intervals (see below).
> 
> We are running Cassandra 1.2.1 on Ubuntu 12.04 with JDK 1.7.0_17 (64bit).
> The hardware is virtual but so far we are the only tenant on the physical host. 
> 
> Hardware:
> - 1x6 cores with 2.3GHz 
> - 30GB RAM 
> - 1 physical disk for both the tx log and the data files
> - 2 x 1GB Ethernet combined into one virtual interface
> 
> Cassandra Config:
> Cassandra runs with 
> - 7.5GB of heap and 
> - 600MB of new gen space
> as calculated by the cassandra-env script.
> I have adjusted all cassandra.yaml settings where clear guidance is given, e.g. <factor> x <num_cores>.
> I have tried increasing and decreasing the heap (between 6 and 8GB) and the new gen size (between 300MB and 1.1GB).
> I have tried compaction_throughput_mb_per_sec values between 16 and 48.
> I have disabled key caches.
> 
> Unfortunately Cassandra has to share the host with other Java processes, the most resource demanding being ActiveMQ 5.8.
> 
> Log Output:
> Over the course of a day (08:00 to 22:00) I see in the logs:
> - between 280 and 760 "GC for ParNew" per hour (mostly around 300/h)
> - between 60 and 180 "Completed flushing" per hour (mostly around 100/h)
> - between 17 and 46 "Compacted N sstables to" per hour (mostly around 35/h)
> 
> Data Model:
> The data model is made up of 6 column families. 3 are dynamic and capture the time-lines of 3 event types; each event creates a new column whose value is the row key of the event. The other 3 have a static schema and store the events themselves.
> The largest event message has 16 attributes. All are short text identifiers, floating-point numbers, and timestamps. For storage in Cassandra every attribute is converted to a string and stored with the UTF8 validator.
> 
> Timeouts and Memory pressure:
> The write-timeouts correlate with the hours of high (ca. >450/h) "GC for ParNew". I never saw any read-timeouts. I set all timeouts to 20 seconds in cassandra.yaml.
> Cassandra comes under memory pressure ("Flushing CFS X to relieve memory pressure") between 3 and 5 times a day. The tendency is for it to happen in the afternoon and evening. But also sometimes right after 08:00 in the morning. In about 75% of the cases it flushes one of the event column families, in 25% a time-line column family.
> 
> Write Load:
> We collect events for a theoretical universe of 2.2 million items, so there is a maximum of 2.2 million rows in each of the time-line column families, but I have never seen an estimated row count of more than 1 million in cfstats.
> Roughly 1/3 of the entities receive a maximum of 3 events, one of each event type, per 15-minute interval from 08:00 to 22:00. The other 2/3 receive 3 events 3 times a day. About 16'000 entities receive only one event type, but roughly once every 3 minutes.
> On a typical day the load adds up to about 70 to 80 million messages.
> Not all messages are original, though. The sources re-send an event in every interval if there are no new events. I do not know the noise ratio; I guesstimate it to be at least 50%. In the case of a repeat, the existing time-line column and event row are updated with their previous values.
> 
> Read Load:
> At one-hour intervals we sample a time-coherent snapshot of the events. To do so we iterate over all rows in the three time-line column families and load the value of the column that is most recent given a cut-off timestamp. The value is the row key of the actual event, which we then load as well. We do that in batches of 100 rows at a time.
> 
> Deletes:
> Every night we delete all events that are older than 2 days. Again in batches of 100 rows.
> 
> 
> Thanks for helping!
> Ralf
> 
> 
> From: Alain RODRIGUEZ [arodrime@gmail.com]
> Sent: Thursday, May 02, 2013 09:12
> To: user@cassandra.apache.org
> Subject: Re: How does a healthy node look like?
> 
> Well, maybe you should describe your hardware and the C* release you are using. Also give us some metrics.
> On 30 Apr 2013 18:48, "Steppacher Ralf" <ra...@derivativepartners.com> wrote:
> Hi,
> 
> I am having trouble finding quantitative information on what a healthy Cassandra node should look like (CPU usage, number of flushes, SSTables, compactions, GC) given a certain hardware spec and read/write load. I have trouble gauging whether our first and only Cassandra node needs tuning or is simply overloaded.
> If anyone could point me to some data, that would be very helpful.
>
> (So far I have run the node with the default settings in cassandra.yaml and cassandra-env. The log claims that the server is occasionally under memory pressure, and I get frequent timeouts for writes. I see what I think are many flushes, compactions, and GCs in the log. Some toying with the heap and new gen sizes, the key cache, and the compaction throughput settings did not improve the overall situation much.)
> 
> 
> Thanks!
> Ralf


RE: How does a healthy node look like?

Posted by Steppacher Ralf <ra...@derivativepartners.com>.
Sure, I can do that.

My main concern is write latency and the write timeouts we are experiencing. Read latency is secondary, as long as we do not introduce timeouts on read and do not exceed our sampling intervals (see below).

We are running Cassandra 1.2.1 on Ubuntu 12.04 with JDK 1.7.0_17 (64bit).
The hardware is virtual but so far we are the only tenant on the physical host.

Hardware:
- 1x6 cores with 2.3GHz
- 30GB RAM
- 1 physical disk for both the tx log and the data files
- 2 x 1GB Ethernet combined into one virtual interface

Cassandra Config:
Cassandra runs with
- 7.5GB of heap and
- 600MB of new gen space
as calculated by the cassandra-env script.
I have adjusted all cassandra.yaml settings where clear guidance is given, e.g. <factor> x <num_cores>.
I have tried increasing and decreasing the heap (between 6 and 8GB) and the new gen size (between 300MB and 1.1GB).
I have tried compaction_throughput_mb_per_sec values between 16 and 48.
I have disabled key caches.

Unfortunately Cassandra has to share the host with other Java processes, the most resource demanding being ActiveMQ 5.8.

Log Output:
Over the course of a day (08:00 to 22:00) I see in the logs:
- between 280 and 760 "GC for ParNew" per hour (mostly around 300/h)
- between 60 and 180 "Completed flushing" per hour (mostly around 100/h)
- between 17 and 46 "Compacted N sstables to" per hour (mostly around 35/h)

Data Model:
The data model is made up of 6 column families. 3 are dynamic and capture the time-lines of 3 event types; each event creates a new column whose value is the row key of the event. The other 3 have a static schema and store the events themselves.
The largest event message has 16 attributes. All are short text identifiers, floating-point numbers, and timestamps. For storage in Cassandra every attribute is converted to a string and stored with the UTF8 validator.
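
Schematically, a single event write looks like the sketch below. This is only an illustration, not our actual code: the column family names ("TimelineA", "EventA"), the row key scheme, and the attribute names are made up.

import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class EventWriter {
    // One event = one column on the item's time-line row (name = event timestamp,
    // value = the event's row key) plus the stringified attributes in the static CF.
    static void writeEvent(Keyspace ks, String itemId, long eventTs) {
        String eventRowKey = itemId + ":" + eventTs;        // made-up row key scheme
        Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
        m.addInsertion(itemId, "TimelineA",                 // made-up dynamic time-line CF
                HFactory.createColumn(eventTs, eventRowKey,
                        LongSerializer.get(), StringSerializer.get()));
        m.addInsertion(eventRowKey, "EventA",               // made-up static event CF
                HFactory.createStringColumn("price", "42.0"));
        m.addInsertion(eventRowKey, "EventA",
                HFactory.createStringColumn("venue", "XETRA"));
        m.execute();                                        // single batch mutation per event
    }
}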

Timeouts and Memory pressure:
The write-timeouts correlate with the hours of high (ca. >450/h) "GC for ParNew". I never saw any read-timeouts. I set all timeouts to 20 seconds in cassandra.yaml.
Cassandra comes under memory pressure ("Flushing CFS X to relieve memory pressure") between 3 and 5 times a day. The tendency is for it to happen in the afternoon and evening. But also sometimes right after 08:00 in the morning. In about 75% of the cases it flushes one of the event column families, in 25% a time-line column family.

Write Load:
We collect events for a theoretical universe of 2.2 million items, so there is a maximum of 2.2 million rows in each of the time-line column families, but I have never seen an estimated row count of more than 1 million in cfstats.
Roughly 1/3 of the entities receive a maximum of 3 events, one of each event type, per 15-minute interval from 08:00 to 22:00. The other 2/3 receive 3 events 3 times a day. About 16'000 entities receive only one event type, but roughly once every 3 minutes.
On a typical day the load adds up to about 70 to 80 million messages.
Not all messages are original, though. The sources re-send an event in every interval if there are no new events. I do not know the noise ratio; I guesstimate it to be at least 50%. In the case of a repeat, the existing time-line column and event row are updated with their previous values.

Read Load:
At one-hour intervals we sample a time-coherent snapshot of the events. To do so we iterate over all rows in the three time-line column families and load the value of the column that is most recent given a cut-off timestamp. The value is the row key of the actual event, which we then load as well. We do that in batches of 100 rows at a time.
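
The hourly sampling is essentially a paged range scan with a reversed one-column slice per row. A simplified sketch follows; the column family name is made up, and the repeated boundary key between pages is not handled here.

import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

public class TimelineSampler {
    // Page through a time-line CF 100 rows at a time and pick, for each row, the
    // newest column at or below the cut-off; its value is the event's row key.
    static void sample(Keyspace ks, long cutoffTs) {
        String startKey = "";
        while (true) {
            RangeSlicesQuery<String, Long, String> query =
                    HFactory.createRangeSlicesQuery(ks, StringSerializer.get(),
                            LongSerializer.get(), StringSerializer.get());
            query.setColumnFamily("TimelineA");       // made-up CF name
            query.setKeys(startKey, "");
            query.setRowCount(100);                   // batches of 100 rows
            query.setRange(cutoffTs, null, true, 1);  // reversed slice: newest column <= cut-off

            OrderedRows<String, Long, String> rows = query.execute().get();
            for (Row<String, Long, String> row : rows) {
                if (row.getColumnSlice().getColumns().isEmpty()) {
                    continue;                         // empty or fully tombstoned row
                }
                String eventRowKey = row.getColumnSlice().getColumns().get(0).getValue();
                // ... load the event row from the static CF using eventRowKey ...
            }
            if (rows.getCount() < 100) {
                break;                                // last page
            }
            startKey = rows.peekLast().getKey();      // next page starts at the last key seen
        }
    }
}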

Deletes:
Every night we delete all events that are older than 2 days. Again in batches of 100 rows.
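
The nightly cleanup itself is just a batch of row deletions; schematically (made-up column family name, and assuming the expired row keys have already been collected by the caller):

import java.util.List;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class NightlyCleanup {
    // Delete one batch of expired event rows in a single mutation; the caller
    // passes at most 100 keys per call.
    static void deleteBatch(Keyspace ks, List<String> expiredEventRowKeys) {
        Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
        for (String rowKey : expiredEventRowKeys) {
            m.addDeletion(rowKey, "EventA");   // made-up static event CF; row-level tombstone
        }
        m.execute();
        // With gc_grace set to 0 the tombstones become purgeable at the next
        // compaction, but reads still have to step over them until then.
    }
}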


Thanks for helping!
Ralf


________________________________
From: Alain RODRIGUEZ [arodrime@gmail.com]
Sent: Thursday, May 02, 2013 09:12
To: user@cassandra.apache.org
Subject: Re: How does a healthy node look like?


Well, maybe you should describe your hardware and the C* release you are using. Also give us some metrics.

On 30 Apr 2013 18:48, "Steppacher Ralf" <ra...@derivativepartners.com> wrote:
Hi,

I am having trouble finding quantitative information on what a healthy Cassandra node should look like (CPU usage, number of flushes, SSTables, compactions, GC) given a certain hardware spec and read/write load. I have trouble gauging whether our first and only Cassandra node needs tuning or is simply overloaded.
If anyone could point me to some data, that would be very helpful.

(So far I have run the node with the default settings in cassandra.yaml and cassandra-env. The log claims that the server is occasionally under memory pressure, and I get frequent timeouts for writes. I see what I think are many flushes, compactions, and GCs in the log. Some toying with the heap and new gen sizes, the key cache, and the compaction throughput settings did not improve the overall situation much.)


Thanks!
Ralf

Re: How does a healthy node look like?

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Well, maybe you should describe your hardware and the C* release you are
using. Also give us some metrics.
On 30 Apr 2013 18:48, "Steppacher Ralf" <ralf.steppacher@derivativepartners.com> wrote:

>  Hi,
>
> I am having trouble finding quantitative information on what a healthy
> Cassandra node should look like (CPU usage, number of flushes, SSTables,
> compactions, GC) given a certain hardware spec and read/write load. I have
> trouble gauging whether our first and only Cassandra node needs tuning or
> is simply overloaded.
> If anyone could point me to some data, that would be very helpful.
>
> (So far I have run the node with the default settings in cassandra.yaml
> and cassandra-env. The log claims that the server is occasionally under
> memory pressure, and I get frequent timeouts for writes. I see what I
> think are many flushes, compactions, and GCs in the log. Some toying with
> the heap and new gen sizes, the key cache, and the compaction throughput
> settings did not improve the overall situation much.)
>
>
> Thanks!
> Ralf
>