Posted to user@cassandra.apache.org by Matthew Trinneer <ma...@coveritlive.com> on 2012/01/25 15:26:08 UTC

1.0.6 - High CPU troubleshooting

Hello Community,

Am troubleshooting an issue with sudden and sustained high CPU on nodes in a 3 node cluster.  This takes place when there is minimal load on the servers, and continues indefinitely until I stop and restart a node.  All nodes (3) seem to be affected by the same issue, however it doesn't occur simultaneously on them (although all potentially could be affected at the same time).

Here's what I've been doing to troubleshoot.

* vmstat - have run on all servers and there is *no* swap going on 

* iostat - have run on all servers and there is no indication that the disk is in any way a bottleneck

* nodetool - nothing is backed up.  Not using all available heap (about 2/3rds).

* Have tracked it down to a specific thread (top -H).  When I find that thread's hex equivalent in a jstack dump I see the following 

	"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x0000000001d48800 nid=0x4534 runnable


Which makes me think that perhaps it's something to do with GC configuration. Unfortunately I'm not able to determine why this might be happening. Any suggestions on how to continue troubleshooting?

btw - here's my jvm (just in case)

java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)


Thanks!

Matt


Re: 1.0.6 - High CPU troubleshooting

Posted by Matthew Trinneer <ma...@coveritlive.com>.
Aaron,

Have reduced cache sizes and been monitoring for the past week.  It appears this was the culprit: since the change, the issue has not resurfaced.

For those keeping score at home:

* Had sudden, persistent spikes in CPU from the Cassandra java process
* Occurred every 24-48 hours and required a restart to resolve
* Reducing row cache sizes on some of the more active column families (which have wide rows) appears to have eliminated the issue
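For reference, in the 1.0.x line the row cache is configured per column family, so the reduction above can be done from cassandra-cli. A sketch only (keyspace and column family names are made up, and the exact attribute syntax may differ by version):

```
use MyKeyspace;
update column family MyWideRowCF with rows_cached = 1000;
```

The change takes effect without a restart, though the cache refills gradually.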


On 2012-01-25, at 7:49 PM, aaron morton wrote:

> You are running into GC issues. 
> 
>>> WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java (line 146) Heap is 0.7767292149986439 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
> 
> Can you reduce the size of the caches ? 
> 
> As you are under low load, does it correlate with compaction or repair processes ? Check nodetool compactionstats
> 
> Do you have wide rows ? Check the max row size with nodetool cfstats. 
> 
> Also, if you have made any changes to the default memory and gc settings try reverting them. 
> 
> 
> Hope that helps. 
> 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 


Re: 1.0.6 - High CPU troubleshooting

Posted by Matthew Trinneer <ma...@coveritlive.com>.
Caches might be it.  Will try reducing them and see how it goes.

Didn't mention this yesterday because I wasn't seeing the same errors, but last night I thought it might be worth mentioning.

Had changed the location of some data last week, and a few days after that I received some IOException errors for permissions issues.  When that occurred, the high CPU usage appeared almost simultaneously.  Fixed the permissions issues and restarted.  After that I would run into the high CPU situation once every 6-24 hours, but not see the IOExceptions any more.  A restart of the node would fix it temporarily.

Yesterday, I did a scrub on each of the nodes before restarting.  So far the issue hasn't returned.  Probably a coincidence, but thought I'd mention it anyway.


On 2012-01-25, at 7:49 PM, aaron morton wrote:

> You are running into GC issues. 
> 
>>> WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java (line 146) Heap is 0.7767292149986439 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
> 
> Can you reduce the size of the caches ? 
> 
> As you are under low load, does it correlate with compaction or repair processes ? Check nodetool compactionstats
> 
> Do you have wide rows ? Check the max row size with nodetool cfstats. 
> 
> Also, if you have made any changes to the default memory and gc settings try reverting them. 
> 
> 
> Hope that helps. 
> 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 


Re: 1.0.6 - High CPU troubleshooting

Posted by Vitalii Tymchyshyn <ti...@gmail.com>.
That's once in a few days, so I don't think it's too important. Especially 
since 0.77 is much better than the 0.99 I've seen sometimes :)

26.01.12 02:49, aaron morton wrote:
> You are running into GC issues.
>
>>> WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java 
>>> (line 146) Heap is 0.7767292149986439 full.  You may need to reduce 
>>> memtable and/or cache sizes.  Cassandra will now flush up to the two 
>>> largest memtables to free up memory.  Adjust 
>>> flush_largest_memtables_at threshold in cassandra.yaml if you don't 
>>> want Cassandra to do this automatically
>
> Can you reduce the size of the caches ?
>
> As you are under low load, does it correlate with compaction or repair 
> processes ? Check nodetool compactionstats
>
> Do you have wide rows ? Check the max row size with nodetool cfstats.
>
> Also, if you have made any changes to the default memory and gc 
> settings try reverting them.
>
>


Re: 1.0.6 - High CPU troubleshooting

Posted by aaron morton <aa...@thelastpickle.com>.
You are running into GC issues. 

>> WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java (line 146) Heap is 0.7767292149986439 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically

Can you reduce the size of the caches ? 

As you are under low load, does it correlate with compaction or repair processes ? Check nodetool compactionstats

Do you have wide rows ? Check the max row size with nodetool cfstats. 

Also, if you have made any changes to the default memory and gc settings try reverting them. 


Hope that helps. 


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com



Re: 1.0.6 - High CPU troubleshooting

Posted by Vitalii Tymchyshyn <ti...@gmail.com>.
According to the log, I don't see much time spent in GC. You can still 
check it with jstat, or uncomment GC logging in cassandra-env.sh. Are you 
sure you've identified the thread correctly?
It's still possible that you have a memory spike where GCInspector simply 
has no chance to run between Full GC rounds. Checking with jstat or 
adding GC logging may help to diagnose it.
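For context, the GC logging mentioned here lives in cassandra-env.sh, where the stock file ships options along these lines commented out (the exact set varies by version, and the log path is whatever you choose):

```shell
# GC logging options -- uncomment to enable
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
# JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
# JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
# JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
```

Alternatively, `jstat -gcutil <pid> 1000` samples generation occupancy and accumulated GC time once a second, with no restart needed.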


Re: 1.0.6 - High CPU troubleshooting

Posted by Matthew Trinneer <ma...@coveritlive.com>.
Here is a snippet of what I'm getting out of system.log for GC.  Does anything in there provide a clue?

 WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java (line 146) Heap is 0.7767292149986439 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
 INFO [ScheduledTasks:1] 2012-01-22 12:54:57,685 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 240 ms for 1 collections, 111478936 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-22 15:12:21,710 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 1141 ms for 1 collections, 167667688 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-23 14:20:32,862 GCInspector.java (line 123) GC for ParNew: 205 ms for 1 collections, 2894546328 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-23 20:25:06,541 GCInspector.java (line 123) GC for ParNew: 240 ms for 1 collections, 4602331064 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-24 13:24:57,473 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 27869 ms for 1 collections, 6376733632 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-24 13:25:24,879 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 26306 ms for 1 collections, 6392079368 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-24 13:27:12,991 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 238 ms for 1 collections, 131710776 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-24 13:55:48,326 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 609 ms for 1 collections, 50380160 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-24 14:34:41,392 GCInspector.java (line 123) GC for ParNew: 325 ms for 1 collections, 1340375240 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-24 20:55:19,636 GCInspector.java (line 123) GC for ParNew: 233 ms for 1 collections, 6387236992 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-25 14:43:28,921 GCInspector.java (line 123) GC for ParNew: 337 ms for 1 collections, 7031219304 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-25 14:43:51,043 GCInspector.java (line 123) GC for ParNew: 211 ms for 1 collections, 7025723712 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-25 14:50:00,012 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 51534 ms for 2 collections, 6844998736 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-25 14:51:22,249 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 250 ms for 1 collections, 154848440 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-25 14:57:46,519 GCInspector.java (line 123) GC for ParNew: 244 ms for 1 collections, 190838344 used; max is 8547991552
 INFO [ScheduledTasks:1] 2012-01-25 15:00:21,693 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 389 ms for 1 collections, 28748448 used; max is 8547991552



On 2012-01-25, at 10:09 AM, Vitalii Tymchyshyn wrote:

> Hello.
> 
> What's in the logs? It should output something like "Hey, you've got most of your memory used. I am going to flush some of the memtables". Sorry, I don't remember the exact wording, but it's coming from GC, so it should be greppable by "GC".
> 


Re: 1.0.6 - High CPU troubleshooting

Posted by Vitalii Tymchyshyn <ti...@gmail.com>.
Hello.

What's in the logs? It should output something like "Hey, you've got 
most of your memory used. I am going to flush some of the memtables". Sorry, 
I don't remember the exact wording, but it's coming from GC, so it should be 
greppable by "GC".
