Posted to user@cassandra.apache.org by Tamar Fraenkel <ta...@tok-media.com> on 2013/02/10 13:01:30 UTC

High CPU usage during repair

Hi!
I run repair weekly, using a scheduled cron job.
During repair I see high CPU consumption, and messages in the log file
"INFO [ScheduledTasks:1] 2013-02-10 11:48:06,396 GCInspector.java (line
122) GC for ParNew: 208 ms for 1 collections, 1704786200 used; max is
3894411264"
From time to time, there are also messages of the form
"INFO [ScheduledTasks:1] 2012-12-04 13:34:52,406 MessagingService.java
(line 607) 1 READ messages dropped in last 5000ms"

Using OpsCenter, JMX, and nodetool compactionstats, I can see that while
the CPU consumption is high there are compactions waiting.
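For reference, this is roughly the check I run (the host is a placeholder
for one of my nodes):

# Show running compactions and the pending task count
nodetool -h 127.0.0.1 compactionstats
# Thread pool stats; dropped READ counts appear at the bottom
nodetool -h 127.0.0.1 tpstats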

I run Cassandra version 1.0.11 on a 3-node setup on EC2 instances.
I have the default settings:
compaction_throughput_mb_per_sec: 16
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_preheat_key_cache: true

I am considering the following solution, and wanted to ask if I am on the
right track.
I thought of having my repair script run, before repair starts:
nodetool setcompactionthroughput 0
and then, when repair finishes:
nodetool setcompactionthroughput 16
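Concretely, the wrapper would be something like this sketch (the script
name, host, and cron schedule are made up for illustration):

#!/bin/bash
# weekly-repair.sh - hypothetical wrapper, run from cron with e.g.:
#   0 3 * * 0 /usr/local/bin/weekly-repair.sh
set -u
HOST=127.0.0.1

# Restore the default throttle even if repair fails
trap 'nodetool -h "$HOST" setcompactionthroughput 16' EXIT

# Remove the throttle for the duration of the repair
nodetool -h "$HOST" setcompactionthroughput 0
nodetool -h "$HOST" repair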

Is this the right solution?
Thanks,
Tamar

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


tamar@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

Re: High CPU usage during repair

Posted by Tamar Fraenkel <ta...@tok-media.com>.
Thank you very much! Due to budget constraints I will keep the m1.large
for now, but I will try the throughput change.
Tamar


Re: High CPU usage during repair

Posted by aaron morton <aa...@thelastpickle.com>.
> What machine size?
> m1.large
If you are seeing high CPU, move to an m1.xlarge; that's the sweet spot.

> That's normally ok. How many are waiting?
> 
> I have seen 4 this morning
That's not really abnormal.
The pending task count goes up when a file *may* be eligible for compaction, not when there is a compaction task waiting.

If you suddenly create a number of new SSTables for a CF, the pending count will rise; however, one of the tasks may compact all the SSTables waiting for compaction, so the count can suddenly drop as well.
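One way to see this in practice is to sample the count while a repair is
running, e.g. (host is a placeholder):

# Print the pending compaction count once a minute
while true; do
  nodetool -h 127.0.0.1 compactionstats | grep "pending tasks"
  sleep 60
done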

> Just to make sure I understand you correctly: you suggest that I change the throughput to 12 regardless of whether a repair is ongoing. I will do it using nodetool, and also change the yaml file so the setting survives a future restart?
Yes. 
If you are seeing performance degrade during compaction or repair, try reducing the throughput.
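Something along these lines (host is a placeholder):

# Apply the lower throttle immediately via JMX
nodetool -h 127.0.0.1 setcompactionthroughput 12
# and persist it in cassandra.yaml so it survives a restart:
# compaction_throughput_mb_per_sec: 12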

I would attribute most of the problems you have described to using m1.large. 

Cheers
 

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com



Re: High CPU usage during repair

Posted by Tamar Fraenkel <ta...@tok-media.com>.
Hi!
Thanks for the response.
See my answers and questions below.
Thanks!
Tamar

On Sun, Feb 10, 2013 at 10:04 PM, aaron morton <aa...@thelastpickle.com> wrote:

> During repair I see high CPU consumption,
>
> Repair reads the data and computes a hash; this is a CPU-intensive
> operation.
> Is the CPU overloaded or is it just under load?
>
Usually it is just under load, but in the past two weeks I have seen CPU of over 90%!

> I run Cassandra version 1.0.11 on a 3-node setup on EC2 instances.
>
> What machine size?
>
m1.large

>
> there are compactions waiting.
>
> That's normally ok. How many are waiting?
>
I have seen 4 this morning

> I thought of having my repair script run, before repair starts:
> nodetool setcompactionthroughput 0
> and then, when repair finishes:
> nodetool setcompactionthroughput 16
>
> That will remove throttling on compaction and on the validation compaction
> used for the repair, which may in turn add additional IO load, CPU load, and
> GC pressure. You probably do not want to do this.
>
> Try reducing the compaction throughput to, say, 12 as the normal setting and
> see the effect.
>
Just to make sure I understand you correctly: you suggest that I change the
throughput to 12 regardless of whether a repair is ongoing. I will do it
using nodetool, and also change the yaml file so the setting survives a
future restart?


Re: High CPU usage during repair

Posted by aaron morton <aa...@thelastpickle.com>.
> During repair I see high CPU consumption, 
Repair reads the data and computes a hash; this is a CPU-intensive operation.
Is the CPU overloaded or is it just under load?

> I run Cassandra version 1.0.11 on a 3-node setup on EC2 instances.
What machine size?

> there are compactions waiting.
That's normally ok. How many are waiting?

> I thought of having my repair script run, before repair starts:
> nodetool setcompactionthroughput 0
> and then, when repair finishes:
> nodetool setcompactionthroughput 16
That will remove throttling on compaction and on the validation compaction used for the repair, which may in turn add additional IO load, CPU load, and GC pressure. You probably do not want to do this.

Try reducing the compaction throughput to, say, 12 as the normal setting and see the effect.

Cheers


-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com
