
Posted to user@cassandra.apache.org by Alain RODRIGUEZ <ar...@gmail.com> on 2011/11/15 12:34:02 UTC

Compaction -> CPU load 100% -> time out

Hi, I'm running a 3-node Cassandra 1.0.2 cluster on 3 Amazon EC2 t1.micro
instances.

I managed to fix some OOM errors I had, but I still get some CPU load spikes.

I know that t1.micro instances have limited resources, but I think they could
be enough if they were well managed.

My application works well, except when Cassandra needs to run a compaction
on a node. To do it, Cassandra uses 100% of the CPU, generating a lot of
timeouts. My timeout is configured to 250 ms with 2 attempts max. I'm
running in production: our current system uses MySQL and we are trying to
replace MySQL with Cassandra. Cassandra mustn't slow down the production
environment while we run both databases in parallel, which is why I can't
increase the timeout.
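
For a rough sense of the budget involved: with a 250 ms timeout and at most 2 attempts, a request is abandoned after about 500 ms of waiting, so any node stall longer than that becomes an error. A minimal Python sketch, assuming attempts are sent back to back with no back-off and that the node answers immediately once the stall ends; the function names are hypothetical:

```python
# Hypothetical model of the client-side retry policy described above.
TIMEOUT_MS = 250   # per-attempt timeout
MAX_ATTEMPTS = 2   # total attempts, including the first one


def worst_case_wait_ms(timeout_ms: int = TIMEOUT_MS, attempts: int = MAX_ATTEMPTS) -> int:
    """Longest a client waits before giving up, assuming no back-off between attempts."""
    return timeout_ms * attempts


def request_fails(stall_ms: int, timeout_ms: int = TIMEOUT_MS, attempts: int = MAX_ATTEMPTS) -> bool:
    """True if a node that stops responding for stall_ms makes the request fail.

    Assumes the stall starts when the first attempt is sent and the node replies
    instantly once it recovers, so the request survives only if the node
    recovers before the last attempt's deadline.
    """
    return stall_ms >= worst_case_wait_ms(timeout_ms, attempts)


if __name__ == "__main__":
    print(worst_case_wait_ms())  # 500
    for stall in (100, 400, 600):
        print(stall, request_fails(stall))
```

Under these assumptions, a compaction that starves the node of CPU for more than half a second is enough to surface as a client error, however briefly it lasts beyond that.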

Running this compaction in the background somehow could be a good idea.
After searching on this subject, I tried adding
JVM_OPTS="$JVM_OPTS -Dcassandra.compaction.priority=1" to cassandra-env.sh.

This option was added in Cassandra 0.6.3; is it still useful? It doesn't
solve my problem.

Anyway, it doesn't help while performing a nodetool repair either; the CPU
load is still 100%.

Is there a way to turn these exceptional tasks into background tasks that
only use available CPU?

Is there a way to get Cassandra working properly on EC2 t1.micros?

Thanks,

Alain

Re: Compaction -> CPU load 100% -> time out

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

This is already a lot better. While compacting, the CPU load remains quite
low. However, I still get some overload spikes that generate timeouts. Are
there any other tuning changes I can make to keep compaction more stable?

2011/11/22 Jonathan Ellis <jb...@gmail.com>

> m1.small is still... small. Start by turning
> compaction_throughput_mb_per_sec all the way down to 1MB/s.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>

Re: Compaction -> CPU load 100% -> time out

Posted by Jonathan Ellis <jb...@gmail.com>.

m1.small is still... small. Start by turning
compaction_throughput_mb_per_sec all the way down to 1MB/s.
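
Throttling trades CPU and I/O pressure for duration: if compaction is capped at a given rate, the same amount of data simply takes longer to process. A back-of-the-envelope Python sketch, assuming the throttle is the only bottleneck; the 2048 MB input size and the rates compared are hypothetical, not figures from this thread:

```python
def compaction_seconds(input_mb: float, throughput_mb_per_sec: float) -> float:
    """Lower bound on compaction wall time under compaction_throughput_mb_per_sec.

    Assumes the throttle is the only limit (CPU and disk keep up), so the time
    is just the amount of data processed divided by the allowed rate.
    """
    if throughput_mb_per_sec <= 0:
        # In cassandra.yaml a value of 0 disables throttling, so there is no
        # throttle-imposed bound to compute.
        raise ValueError("throughput must be positive")
    return input_mb / throughput_mb_per_sec


if __name__ == "__main__":
    input_mb = 2048  # hypothetical amount of SSTable data being compacted
    for rate in (16, 4, 1):
        print(f"{rate:>2} MB/s -> {compaction_seconds(input_mb, rate):.0f} s")
```

At 1 MB/s this hypothetical 2 GB compaction takes about 34 minutes instead of about 2 minutes at 16 MB/s; the point of the lower setting is to leave more CPU for reads and writes while it runs.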

On Tue, Nov 22, 2011 at 9:58 AM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
> I followed your advice and installed a 3-instance m1.small cluster. The
> problem is still there. I get fewer timeouts because compactions happen less
> often (more memory is available before flushing), but when a compaction
> starts, CPU usage can reach 95%, which produces timeouts. Compaction runs
> faster, so I get fewer timeouts, but there are still some.
> Is there really no way to turn compaction into a low-CPU background task?
> What information can I give you to help you understand what is going on
> with these timeouts?



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Compaction -> CPU load 100% -> time out

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

I followed your advice and installed a 3-instance m1.small cluster. The
problem is still there. I get fewer timeouts because compactions happen less
often (more memory is available before flushing), but when a compaction
starts, CPU usage can reach 95%, which produces timeouts. Compaction runs
faster, so I get fewer timeouts, but there are still some.

Is there really no way to turn compaction into a low-CPU background task?

What information can I give you to help you understand what is going on
with these timeouts?

2011/11/15 Dan Hendry <da...@gmail.com>

> I really don’t recommend using t1.micros. The problem with them is that
> they have CPU bursting, basically meaning you get lots of CPU resources for
> a short time, but if you use more than you have been allocated you get
> basically nothing for 10+ seconds afterwards. By ‘basically nothing’ I
> really mean that – the machine is effectively dead. The biggest problem
> with this (which we found out the hard way, in a test environment,
> thankfully) is that it makes capacity planning extremely difficult – the
> line between having a cluster with sufficient capacity and being overloaded
> is extremely abrupt and very difficult to see coming. Moreover, once you are
> over capacity, the ‘dead periods’ caused by CPU bursting make things
> spiral out of control rapidly, due to overly aggressive client retries and
> hinted handoff (HH) increasing overall load (although the HH problem might
> have improved with 1.0.x). I would recommend m1.smalls at the very least.
>
> If you are set on micros, make sure you only ever trigger compaction on
> one node at a time (or better, consider whether you even need to trigger
> major compactions at all), set compaction_throughput_mb_per_sec
> (cassandra.yaml) as low as you possibly can (1 is the minimum, I believe),
> try disabling hinted handoff (on all nodes), and use lower read/write
> consistency levels if you can.
>
> Dan

RE: Compaction -> CPU load 100% -> time out

Posted by Dan Hendry <da...@gmail.com>.

I really don't recommend using t1.micros. The problem with them is that they
have CPU bursting, basically meaning you get lots of CPU resources for a
short time, but if you use more than you have been allocated you get
basically nothing for 10+ seconds afterwards. By 'basically nothing' I
really mean that - the machine is effectively dead. The biggest problem with
this (which we found out the hard way, in a test environment, thankfully)
is that it makes capacity planning extremely difficult - the line between
having a cluster with sufficient capacity and being overloaded is extremely
abrupt and very difficult to see coming. Moreover, once you are over
capacity, the 'dead periods' caused by CPU bursting make things spiral out
of control rapidly, due to overly aggressive client retries and hinted
handoff (HH) increasing overall load (although the HH problem might have
improved with 1.0.x). I would recommend m1.smalls at the very least.
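
The abrupt cliff described above can be illustrated with a toy token-bucket model of a burstable CPU. This is only a sketch under stated assumptions, not Amazon's actual scheduler: the function name, baseline, bucket size and demand figures are hypothetical, chosen to show how a node runs at full speed until its credits run out and then falls sharply to a small baseline.

```python
from typing import List


def simulate_burstable_cpu(demand: List[int], baseline: int,
                           bucket_size: int, credits: int) -> List[int]:
    """Toy token-bucket model of a burstable CPU (not Amazon's real algorithm).

    Units are "percent of one core for one tick". Each tick the instance earns
    `baseline` credits (capped at `bucket_size`) and can spend at most the
    credits it holds, so sustained demand above the baseline eventually drops
    to the baseline rate.
    """
    granted = []
    for wanted in demand:
        credits = min(bucket_size, credits + baseline)  # refill, capped at bucket size
        used = min(wanted, credits)                      # cannot spend credits it lacks
        credits -= used
        granted.append(used)
    return granted


if __name__ == "__main__":
    # Hypothetical numbers: a compaction wants a full core for six ticks.
    print(simulate_burstable_cpu([100] * 6, baseline=10, bucket_size=300, credits=300))
    # [100, 100, 100, 30, 10, 10]
```

With a baseline of only 10% of a core, the node goes from full speed to a tenth of it within a single tick, which is the kind of transition that is hard to see coming when planning capacity.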

If you are set on micros, make sure you only ever trigger compaction on one
node at a time (or better, consider whether you even need to trigger major
compactions at all), set compaction_throughput_mb_per_sec (cassandra.yaml)
as low as you possibly can (1 is the minimum, I believe), try disabling
hinted handoff (on all nodes), and use lower read/write consistency levels
if you can.
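
On the consistency-level suggestion: lower levels make each request wait for fewer replicas, at the cost of guarantees. A read is only guaranteed to see the latest write when the replicas read plus the replicas written exceed the replication factor (R + W > RF). A small Python sketch of that arithmetic; the replication factor of 3 is an assumption, since the thread does not state it, and the helper names are hypothetical:

```python
def replicas_required(level: str, rf: int) -> int:
    """Number of replicas a request at `level` waits for (subset of Cassandra's levels)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"level not modelled in this sketch: {level}")


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


if __name__ == "__main__":
    RF = 3  # assumed replication factor
    for read_cl, write_cl in (("QUORUM", "QUORUM"), ("ONE", "QUORUM"), ("ONE", "ONE")):
        print(read_cl, write_cl, reads_see_latest_write(read_cl, write_cl, RF))
```

With RF = 3, QUORUM waits for 2 replicas and ONE for 1, so dropping to ONE reduces the number of responses each request needs, but reads may then return stale data.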

Dan

 
