Posted to user@cassandra.apache.org by "Hiller, Dean" <De...@nrel.gov> on 2013/02/20 15:10:22 UTC
how to debug slowdowns from these log snippets (we know the keys being accessed as well)
Cassandra version 1.1.4
I captured all the logs of the node causing timeouts (in a 6-node cluster). We get these slowdowns every once in a while, and they make our whole website 10 times slower. Since PlayOrm logs the rows being accessed, we know exactly which row the timeout occurred on, and Astyanax logged the node, which was nice.
Each time the website gets 10 times slower, this is in the logs:
WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 GCInspector.java (line 145) Heap is 0.8532751700072093 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 StorageService.java (line 2855) Flushing CFS(Keyspace='databus5', ColumnFamily='IntegerIndice') to relieve memory pressure
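For context, the check behind this warning is simple: after a full collection, GCInspector compares used/max heap against the flush_largest_memtables_at threshold (0.75 by default in this era of Cassandra) and triggers emergency flushes when it is exceeded. A minimal sketch (not Cassandra's actual code; the used/max figures are taken from the ConcurrentMarkSweep log line in the follow-up message):

```python
# Simplified sketch of the GCInspector check that produces the
# "Heap is ... full" warning. Not the real implementation.
def should_flush_memtables(used_bytes, max_bytes, threshold=0.75):
    """Return True when post-GC heap usage exceeds flush_largest_memtables_at."""
    return used_bytes / max_bytes > threshold

# Figures from "GC for ConcurrentMarkSweep: ... 7038159096 used; max is 8243904512"
used, maximum = 7038159096, 8243904512
print(used / maximum)                         # ~0.8537, the fraction the WARN reports
print(should_flush_memtables(used, maximum))  # True: well above the 0.75 default
```

At 0.85 the node is already well past the point where emergency flushes kick in, which matches the WARN lines above.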
I seem to remember reading that we should turn off swap, which I have not gotten around to at this point :(…will that help, or is it that the node just gets taken out of the cluster? As you can see from top below, this node has supposedly been running for 5,000,000 hours (what's up with that….maybe the h does not actually mean hours), though I know it has only been running for months.
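On the swap question: the common recommendation for Cassandra boxes of this vintage is to disable swap entirely (or lock the JVM's memory via JNA/mlockall), because a partially swapped JVM can stall long enough to be marked down by the rest of the cluster. A sketch of the usual settings (run as root; values illustrative):

```shell
swapoff -a                # turn off all swap immediately
# and comment out the swap entries in /etc/fstab so it stays off after reboot

# If swap must stay enabled, at least discourage it:
# in /etc/sysctl.conf:  vm.swappiness = 0
sysctl -w vm.swappiness=0
```

The 52180k of used swap in the top output below is small, but even a little swapped heap can turn a minor GC into a multi-second pause.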
Here is the top output….
top - 07:02:37 up 148 days, 21:11, 1 user, load average: 1.10, 1.09, 1.09
Tasks: 379 total, 2 running, 377 sleeping, 0 stopped, 0 zombie
Cpu(s): 15.9%us, 1.0%sy, 2.9%ni, 79.9%id, 0.1%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 32854680k total, 32582688k used, 271992k free, 44k buffers
Swap: 33554424k total, 52180k used, 33502244k free, 22653528k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
701 cassandr 20 0 63.3g 10g 1.9g S 838.4 32.2 5125794h java
What can I dump to see why we need to "relieve memory pressure"?
I ran jmap like so but it can't attach for some reason
[cassandra@a4 ~]$ jmap 701 -heap:format=b
Attaching to core -heap:format=b from executable 701, please wait...
Error attaching to core file: Can't attach to the core file
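The attach error above is most likely an argument-ordering issue rather than a broken process: jmap expects its option before the pid, so `jmap 701 -heap:format=b` is parsed as executable `701` plus core file `-heap:format=b` (which is exactly what the "Attaching to core" message says). Assuming a JDK 6-era jmap, consistent with the 20.7-b02 HotSpot reported by jstack below, the invocations would look roughly like:

```shell
jmap -heap 701                                      # heap summary for pid 701
jmap -dump:format=b,file=/tmp/cassandra.hprof 701   # binary heap dump (briefly pauses the JVM)
```

The dump can then be opened in jhat, Eclipse MAT, or similar to see what is actually holding the tenured heap.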
Jstack is not helping much either ….
[cassandra@a4 ~]$ jstack -l 701 > threads.txt
701: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
[cassandra@a4 ~]$ jstack -F -l 701 > threads.txt
Attaching to process ID 701, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.7-b02
java.lang.NullPointerException
at sun.jvm.hotspot.oops.InstanceKlass.computeSubtypeOf(InstanceKlass.java:426)
at sun.jvm.hotspot.oops.Klass.isSubtypeOf(Klass.java:137)
at sun.jvm.hotspot.oops.Oop.isA(Oop.java:100)
at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:93)
at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.tools.jstack.JStack.runJStackTool(JStack.java:118)
at sun.tools.jstack.JStack.main(JStack.java:84)
Thanks,
Dean
Re: how to debug slowdowns from these log snippets-more info 2
Posted by aaron morton <aa...@thelastpickle.com>.
Some things to consider:
Check for contention around the switch lock. This can happen if you get a lot of tables flushing at the same time, or if you have a lot of secondary indexes. It shows up as a pattern in the logs: as soon as the writer starts flushing a memtable, another is queued. Probably not happening here, but it can be a pain when a lot of memtables are flushed.
I would turn on GC logging in cassandra-env.sh and watch that. After a full CMS collection, how full/empty is the tenured heap? If it still has a lot in it, then you are running with too much cache / bloom filter / index sampling.
You can also experiment with the Max Tenuring Threshold; try turning it up to 4 to start with. The GC logs will show you how much data is at each tenuring level, so you can see how much data is being tenured and whether premature tenuring is an issue. I've seen premature tenuring cause issues with wide rows / long reads.
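A sketch of what the GC logging and tenuring experiment above might look like in cassandra-env.sh (standard HotSpot flags for JVMs of this era; the log path is illustrative):

```shell
# cassandra-env.sh additions (illustrative)
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"   # bytes surviving at each age
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"       # the suggested starting point
```

With PrintTenuringDistribution on, each young GC logs a per-age histogram, which shows directly whether objects are being promoted before they have had a chance to die young.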
Hope that helps.
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
Re: how to debug slowdowns from these log snippets-more info 2
Posted by "Hiller, Dean" <De...@nrel.gov>.
Oh, and the startup command that Cassandra logged was:
a2.bigde.nrel.gov: xss = -ea -javaagent:/opt/cassandra/lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8021M -Xmx8021M
-Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
And I remember from the docs that you don't want to go above 8G or Java GC doesn't
work out so well. I am not sure why this is not working out, though.
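For reference, those -Xms/-Xmx/-Xmn values are normally driven by cassandra-env.sh; pinning them explicitly (mirroring the startup line above) looks like:

```shell
# cassandra-env.sh (values as logged above)
MAX_HEAP_SIZE="8021M"   # becomes -Xms and -Xmx
HEAP_NEWSIZE="1600M"    # becomes -Xmn
```

Keeping CMS heaps at or below roughly 8G, as the docs suggest, limits the cost of full collections; the 3.6s of ConcurrentMarkSweep work across 2 collections logged in the follow-up message suggests this heap is already under real pressure.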
Dean
Re: how to debug slowdowns from these log snippets-more info
Posted by "Hiller, Dean" <De...@nrel.gov>.
Here is the printout before that log, which is probably important as well…
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 GCInspector.java (line
122) GC for ConcurrentMarkSweep: 3618 ms for 2 collections, 7038159096
used; max is 8243904512
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 StatusLogger.java (line
57) Pool Name Active Pending Blocked
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 StatusLogger.java (line
72) ReadStage 11 264 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
72) RequestResponseStage 0 0 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
72) ReadRepairStage 0 0 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
72) MutationStage 12 88 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
72) ReplicateOnWriteStage 0 0 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
72) GossipStage 1 7 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
72) AntiEntropyStage 0 0 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
72) MigrationStage 0 0 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
72) StreamStage 0 0 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
72) MemtablePostFlusher 0 0 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
72) FlushWriter 0 0 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
72) MiscStage 0 0 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
72) commitlog_archiver 0 0 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
72) InternalResponseStage 0 0 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
72) HintedHandoff 0 0 0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
77) CompactionManager 4 5
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
89) MessagingService n/a 10,127
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
99) Cache Type Size Capacity
KeysToSave
Provider
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
100) KeyCache 1310719 1310719
all
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
106) RowCache 0 0
all
org.apache.cassandra.cache.SerializingCacheProvider
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
113) ColumnFamily Memtable ops,data
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
116) databus5.ServersThatLog 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
116) databus5.DecimalIndice 971,885311
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
116) databus5.EntityGroupXref 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
116) databus5.IntegerIndice 3256,620947
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
116) databus5.KeyToTableName 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
116) databus5.SdiTable 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
116) databus5.DboTableMeta 339,41537
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
116) databus5.nreldata 4528,3261763
INFO [GossipTasks:1] 2013-02-20 07:14:00,375 Gossiper.java (line 830)
InetAddress /10.10.43.5 is now dead.
INFO [GossipTasks:1] 2013-02-20 07:14:00,380 Gossiper.java (line 830)
InetAddress /10.10.43.8 is now dead.
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,380 StatusLogger.java (line
116) databus5.DboColumnMeta 809,112672
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,380 StatusLogger.java (line
116) databus5.DboDatabaseMeta 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
116) databus5.WebNodeDbo 2310,4470
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
116) databus5.User 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
116) databus5.RoleMapping 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
116) databus5.LogEvent 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
116) databus5.SecureResourceGroupXref 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
116) databus5.SecurityGroup 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
116) databus5.StreamAggregation 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
116) databus5.AppProperty 0,0
INFO [GossipTasks:1] 2013-02-20 07:14:00,382 Gossiper.java (line 830)
InetAddress /10.10.43.3 is now dead.
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
116) databus5.StringIndice 290,27231
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
116) databus5.MonitorDbo 75,2983
INFO [GossipTasks:1] 2013-02-20 07:14:00,382 Gossiper.java (line 830)
InetAddress /10.10.43.7 is now dead.
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
116) databus5.SdiColumn 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
116) system.NodeIdInfo 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
116) system.IndexInfo 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
116) system.LocationInfo 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
116) system.Versions 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
116) system.schema_keyspaces 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
116) system.Migrations 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
116) system.schema_columnfamilies 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
116) system.schema_columns 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,384 StatusLogger.java (line
116) system.HintsColumnFamily 0,0
INFO [ScheduledTasks:1] 2013-02-20 07:14:00,384 StatusLogger.java (line
116) system.Schema 0,0
WARN [ScheduledTasks:1] 2013-02-20 07:14:00,384 GCInspector.java (line
145) Heap is 0.8537409774403752 full. You may need to reduce memtable
and/or cache sizes. Cassandra will now flush up to the two largest
memtables to free up memory. Adjust flush_largest_memtables_at threshold
in cassandra.yaml if you don't want Cassandra to do this automatically
WARN [ScheduledTasks:1] 2013-02-20 07:14:00,384 StorageService.java (line
2855) Flushing CFS(Keyspace='databus5', ColumnFamily='nreldata') to
relieve memory pressure
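The warning above is just a ratio check: used heap over max heap, compared against flush_largest_memtables_at. A quick sketch of that arithmetic (the byte figures are made up for illustration; 0.75 is the 1.1.x default threshold, so check your cassandra.yaml):

```shell
# Illustrative heap figures -- NOT read from this node
used=28050000000   # bytes of used heap after GC
max=32850000000    # bytes of max heap
awk -v u="$used" -v m="$max" 'BEGIN {
  frac = u / m
  printf "Heap is %.4f full\n", frac          # cf. the 0.8537 in the log
  if (frac > 0.75) print "flush the largest memtables"
}'
```

Watching `jstat -gcutil <pid> 5000` alongside would show whether the old generation climbs right back to this level after each emergency flush, which would point at memtable/cache sizing rather than a transient spike.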
On 2/20/13 7:10 AM, "Hiller, Dean" <De...@nrel.gov> wrote:
>Cassandra version 1.1.4
>
>I captured all the logs of node causing timeouts (in a 6 node cluster).
>We seem to get these slowdowns every once in a while and it causes our
>whole website to be 10 times slower. Since PlayOrm actually logs the
>rows being accessed we know exactly which row the timeout occurred on and
>Astyanax logged the node, which was nice.
>
>Each time we get this website being 10 times slower, this is in the logs
>
> WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 GCInspector.java (line
>145) Heap is 0.8532751700072093 full. You may need to reduce memtable
>and/or cache sizes. Cassandra will now flush up to the two largest
>memtables to free up memory. Adjust flush_largest_memtables_at threshold
>in cassandra.yaml if you don't want Cassandra to do this automatically
> WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 StorageService.java
>(line 2855) Flushing CFS(Keyspace='databus5',
>ColumnFamily='IntegerIndice') to relieve memory pressure
>
>I seem to remember reading that we should turn off swap, which I have not
>gotten around to yet :( … will that help, in that the node is taken out of
>the cluster, or? As you can see from top, this node has been running for
>5,000,000 hours below (what's up with that… maybe the h does not actually
>mean hours), though I know it has been running for months.
>
>Here is a top command…
>
>top - 07:02:37 up 148 days, 21:11, 1 user, load average: 1.10, 1.09,
>1.09
>Tasks: 379 total, 2 running, 377 sleeping, 0 stopped, 0 zombie
>Cpu(s): 15.9%us, 1.0%sy, 2.9%ni, 79.9%id, 0.1%wa, 0.0%hi, 0.2%si,
>0.0%st
>Mem: 32854680k total, 32582688k used, 271992k free, 44k buffers
>Swap: 33554424k total, 52180k used, 33502244k free, 22653528k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 701 cassandr 20 0 63.3g 10g 1.9g S 838.4 32.2 5125794h java
>
>What can I dump to see why we need to "relieve memory pressure"?
>
>I ran jmap like so but it can't attach for some reason
>
>[cassandra@a4 ~]$ jmap 701 -heap:format=b
>Attaching to core -heap:format=b from executable 701, please wait...
>Error attaching to core file: Can't attach to the core file
>
>Jstack is not helping much either…
>
>[cassandra@a4 ~]$ jstack -l 701 > threads.txt
>701: Unable to open socket file: target process not responding or HotSpot
>VM not loaded
>The -F option can be used when the target process is not responding
>[cassandra@a4 ~]$ jstack -F -l 701 > threads.txt
>Attaching to process ID 701, please wait...
>Debugger attached successfully.
>Server compiler detected.
>JVM version is 20.7-b02
>java.lang.NullPointerException
>at sun.jvm.hotspot.oops.InstanceKlass.computeSubtypeOf(InstanceKlass.java:426)
>at sun.jvm.hotspot.oops.Klass.isSubtypeOf(Klass.java:137)
>at sun.jvm.hotspot.oops.Oop.isA(Oop.java:100)
>at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:93)
>at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
>at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52)
>at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
>at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
>at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
>at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:597)
>at sun.tools.jstack.JStack.runJStackTool(JStack.java:118)
>at sun.tools.jstack.JStack.main(JStack.java:84)
>
>Thanks,
>Dean
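On the swap question in the quoted message: swapping does not take the node out of the ring, but once heap pages are swapped out, every full GC has to fault them back in, so a sub-second pause can stretch to many seconds, during which gossip likely marks peers dead exactly as in the log above. A sketch of checking and disabling swap on a Linux box (the sed line assumes a conventional /etc/fstab layout and is commented out along with the other root-only steps):

```shell
# How eagerly the kernel swaps (0-100); the top output above already
# shows ~52 MB sitting in swap on this node.
cat /proc/sys/vm/swappiness 2>/dev/null || echo "no swappiness knob (not Linux?)"
# The actual fixes need root, so they are left commented here:
# sudo swapoff -a                              # pull swapped pages back into RAM now
# sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab   # keep swap off across reboots
# sudo sysctl vm.swappiness=0                  # or at least make swapping a last resort
```

Running Cassandra with JNA on the classpath also lets it mlockall() its memory so the heap cannot be swapped out in the first place.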
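On the jmap failure quoted above: it looks like an argument-order problem rather than a broken JVM. jmap parses its option before the PID, so `jmap 701 -heap:format=b` is read as a PID plus a core-file path, which matches the "Error attaching to core file" message. A sketch of invocations that should work on a JDK 6 VM like this one (PID 701 is just the value from the top output; the live-JVM commands are commented since they need the target process):

```shell
pid=701   # Cassandra PID from the top output above; adjust for your node
# Option first, PID last:
# jmap -heap "$pid"                          # heap summary and GC configuration
# jmap -dump:format=b,file=heap.bin "$pid"   # full binary heap dump for a profiler
# When jstack/jmap cannot attach at all, SIGQUIT asks the JVM itself to
# print every thread's stack to its own stdout log -- no debugger attach:
# kill -3 "$pid"
echo "target pid: $pid"
```

The kill -3 route is often the most reliable way to get stacks out of a JVM that is too busy in GC for the attach mechanism to respond.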