Posted to user@cassandra.apache.org by Shuo Chen <ch...@gmail.com> on 2015/11/13 07:35:35 UTC
too many full gc in one node of the cluster
Hi,
We have a small Cassandra cluster with 4 nodes in production. All the
nodes have similar hardware configurations and similar data load. The C*
version is 1.0.7 (pretty old).
One of the nodes has much higher CPU usage than the others and a high full
GC frequency, but its I/O is not high and its data load is even lower. So
I have several questions:
1. Is it normal for one node to have much more frequent full GC with the
same JVM configuration?
2. Does this node need special GC tuning, and how?
3. How can I find the cause of the full GC?
Thank you guys!
The heap size is 8G and the max heap size is 16G. The GC settings in
cassandra-env.sh are the defaults:
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
---------------------
I printed a class-instance histogram from the GC log:
num #instances #bytes class name
----------------------------------------------
1: 2982796 238731200 [B
2: 3889672 186704256 java.nio.HeapByteBuffer
3: 1749589 55986848 org.apache.cassandra.db.Column
4: 1803900 43293600 java.util.concurrent.ConcurrentSkipListMap$Node
5: 859496 20627904 java.util.concurrent.ConcurrentSkipListMap$Index
6: 5568 18827912 [J
7: 162630 6505200 java.math.BigInteger
8: 167572 5716976 [I
9: 141698 4534336 java.util.concurrent.ConcurrentHashMap$HashEntry
10: 141505 4528160 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
11: 31491 4376976 <constMethodKlass>
12: 31491 4291992 <methodKlass>
13: 171695 4120680 org.apache.cassandra.db.DecoratedKey
14: 3157 3436120 <constantPoolKlass>
15: 141784 3402816 java.lang.Long
16: 141624 3398976 org.apache.cassandra.utils.Pair
17: 141505 3396120 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
18: 49604 2675352 <symbolKlass>
19: 162254 2596064 org.apache.cassandra.dht.BigIntegerToken
......
Total 13337798 641834360
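A class histogram like the one above is typically collected with jmap -histo against the Cassandra process. As a quick sanity check on the figures, the byte counts of the two largest entries ([B and java.nio.HeapByteBuffer) can be re-added with a one-liner; the sample lines below are copied from the histogram:

```shell
# Two histogram lines copied from the output above; awk sums the third
# column (bytes) to show how much the top two classes account for.
histogram='   1:   2982796  238731200  [B
   2:   3889672  186704256  java.nio.HeapByteBuffer'
echo "$histogram" | awk '{sum += $3} END {print sum}'
```

That is roughly 406 MB of the ~612 MB total, which would fit the usual pattern of byte-buffer-backed memtable and cache contents, as the Column and ConcurrentSkipListMap entries also suggest.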
----------------------
The GC entries and thread-pool status from the system log:
INFO [ScheduledTasks:1] 2015-11-13 14:22:08,681 GCInspector.java (line 123) GC for ParNew: 1015 ms for 2 collections, 3886753520 used; max is 8231321600
INFO [ScheduledTasks:1] 2015-11-13 14:22:09,683 GCInspector.java (line 123) GC for ParNew: 500 ms for 1 collections, 4956287408 used; max is 8231321600
INFO [ScheduledTasks:1] 2015-11-13 14:22:10,685 GCInspector.java (line 123) GC for ParNew: 627 ms for 1 collections, 5615882296 used; max is 8231321600
INFO [ScheduledTasks:1] 2015-11-13 14:22:12,015 GCInspector.java (line 123) GC for ParNew: 988 ms for 2 collections, 4943363480 used; max is 8231321600
INFO [ScheduledTasks:1] 2015-11-13 14:22:13,016 GCInspector.java (line 123) GC for ParNew: 373 ms for 1 collections, 5978572832 used; max is 8231321600
INFO [ScheduledTasks:1] 2015-11-13 14:22:14,020 GCInspector.java (line 123) GC for ParNew: 486 ms for 1 collections, 6209638280 used; max is 8231321600
INFO [ScheduledTasks:1] 2015-11-13 14:22:15,412 GCInspector.java (line 123) GC for ParNew: 898 ms for 2 collections, 6045603728 used; max is 8231321600
INFO [ScheduledTasks:1] 2015-11-13 14:22:16,413 GCInspector.java (line 123) GC for ParNew: 503 ms for 1 collections, 6991263984 used; max is 8231321600
INFO [ScheduledTasks:1] 2015-11-13 14:22:17,416 GCInspector.java (line 123) GC for ParNew: 746 ms for 1 collections, 7073467384 used; max is 8231321600
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,363 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 843 ms for 2 collections, 1130423160 used; max is 8231321600
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,364 MessagingService.java (line 603) 4198 READ messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,364 StatusLogger.java (line 50) Pool Name Active Pending Blocked
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,368 StatusLogger.java (line 65) ReadStage 32 450 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,370 StatusLogger.java (line 65) RequestResponseStage 0 18 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,371 StatusLogger.java (line 65) ReadRepairStage 0 3 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,372 StatusLogger.java (line 65) MutationStage 2 343 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,373 StatusLogger.java (line 65) ReplicateOnWriteStage 0 0 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,374 StatusLogger.java (line 65) GossipStage 0 3 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,375 StatusLogger.java (line 65) AntiEntropyStage 0 0 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,396 StatusLogger.java (line 65) MigrationStage 0 0 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line 65) StreamStage 0 0 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line 65) MemtablePostFlusher 0 0 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line 65) FlushWriter 0 0 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,398 StatusLogger.java (line 65) MiscStage 0 0 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,398 StatusLogger.java (line 65) InternalResponseStage 0 0 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,434 StatusLogger.java (line 65) HintedHandoff 0 0 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,435 StatusLogger.java (line 69) CompactionManager n/a 0
INFO [ScheduledTasks:1] 2015-11-13 14:22:33,436 StatusLogger.java (line 81) MessagingService n/a 0,67
--
*陈硕* *Shuo Chen*
chenatu2006@gmail.com
chenshuo@whaty.com
Re: too many full gc in one node of the cluster
Posted by Jeff Jirsa <je...@crowdstrike.com>.
What Kai Wang is hinting at: one of the common tuning problems people face is that they follow the advice in cassandra-env.sh, which sizes young gen (Xmn) at 100M per core. Many people find that insufficient. Raising it to 30, 40, or 50% of the heap size (Xmx) MAY help keep short-lived objects (especially from the read path) in young gen rather than promoting them to old gen, so they are dropped quickly instead of being cleaned up by the stop-the-world phases of CMS.
It's often the case that increasing Xmn and setting -XX:MaxTenuringThreshold=6 or 8 will reduce the number of long CMS pauses.
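As a sketch, those two changes could look like this in cassandra-env.sh. The concrete numbers here (an Xmn at roughly 30% of an 8G heap, tenuring threshold 6) are illustrative assumptions, not values taken from this thread's cluster:

```shell
# Illustrative cassandra-env.sh overrides: a larger young gen and a higher
# tenuring threshold so read-path garbage dies young instead of being
# promoted into the CMS old gen.
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2400M"   # ~30% of MAX_HEAP_SIZE instead of 100M per core
JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE} -Xmx${MAX_HEAP_SIZE} -Xmn${HEAP_NEWSIZE}"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=6"
echo "$JVM_OPTS"
```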
- Jeff
From: Kai Wang
Reply-To: "user@cassandra.apache.org"
Date: Friday, November 13, 2015 at 5:35 AM
To: "user@cassandra.apache.org"
Subject: Re: too many full gc in one node of the cluster
What's the size of young generation (-Xmn) ?
Re: too many full gc in one node of the cluster
Posted by Kai Wang <de...@gmail.com>.
What's the size of young generation (-Xmn) ?
Re: too many full gc in one node of the cluster
Posted by Jason Wee <pe...@gmail.com>.
I used to manage/develop on Cassandra 1.0.8 for quite some time. Although
1.0 was rock stable, we encountered various problems as the load per node
grew beyond 500gb. Upgrading may not be the solution for you, but I
strongly recommend you upgrade to 1.1 or 1.2; we upgraded Java on the
Cassandra nodes and Cassandra to 1.1, and a lot of problems went away.
As for your use case, a quick solution would probably be to just add nodes,
or to study the clients' read pattern so they are not hitting a hot row on
one node (via the hash of the key), or to review the client configuration
in your application and/or the keyspace replication.
hth,
jason
Re: too many full gc in one node of the cluster
Posted by Shuo Chen <ch...@gmail.com>.
The 4th node holds even less data than the others. And I checked the
counts of connections to port 9160 across the nodes; this node also has
fewer connections. So it is really strange...
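For reference, a per-node connection count on the Thrift port can be scripted along these lines. The netstat-style lines here are hypothetical stand-ins, since real output format varies by platform:

```shell
# Hypothetical "netstat -tn"-shaped lines; the awk filter is the part that
# matters. It counts established client connections to port 9160 on this node.
sample='tcp 0 0 10.0.0.4:9160 10.0.1.7:51234 ESTABLISHED
tcp 0 0 10.0.0.4:9160 10.0.1.8:51301 ESTABLISHED
tcp 0 0 10.0.0.4:22 10.0.1.9:50022 ESTABLISHED'
echo "$sample" | awk '$4 ~ /:9160$/ && $6 == "ESTABLISHED" {n++} END {print n+0}'
```

Running the same filter on each node makes it easy to spot whether clients are piling onto one coordinator.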
--
*陈硕* *Shuo Chen*
chenatu2006@gmail.com
chenshuo@whaty.com
Re: too many full gc in one node of the cluster
Posted by Robert Coli <rc...@eventbrite.com>.
The simplest explanation is that your 4th node is the only node your
clients ever talk to.
Other simple explanations include "the 4th node owns more of the ring than
other nodes do."
=Rob