You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Shuo Chen <ch...@gmail.com> on 2015/11/13 07:35:35 UTC

too many full gc in one node of the cluster

Hi,

We have a small cassandra cluster with 4 nodes for production. All the
nodes have similar hardware configuration and similar data load. The C*
version is 1.0.7 (prretty old)

One of the node has much higher cpu usage than others and high full gc
frequency, but the io of this node is not high and data load of this node
is even lower. So I have several questions:

1. Is that normal that one of the node having much higher full gc with same
jvm configuration?
2. Does this node need special gc tuning and how?
3. How to find the cause of the full gc?

Thank you guys!


The heap size is 8G and max heap size is 16G. The gc config of
cassandra-env.sh is default:

JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

---------------------
I print instance in the gc log:

num     #instances         #bytes  class name
----------------------------------------------
   1:       2982796      238731200  [B
   2:       3889672      186704256  java.nio.HeapByteBuffer
   3:       1749589       55986848  org.apache.cassandra.db.Column
   4:       1803900       43293600
 java.util.concurrent.ConcurrentSkipListMap$Node
   5:        859496       20627904
 java.util.concurrent.ConcurrentSkipListMap$Index
   6:          5568       18827912  [J
   7:        162630        6505200  java.math.BigInteger
   8:        167572        5716976  [I
   9:        141698        4534336
 java.util.concurrent.ConcurrentHashMap$HashEntry
  10:        141505        4528160
 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
  11:         31491        4376976  <constMethodKlass>
  12:         31491        4291992  <methodKlass>
  13:        171695        4120680  org.apache.cassandra.db.DecoratedKey
  14:          3157        3436120  <constantPoolKlass>
  15:        141784        3402816  java.lang.Long
  16:        141624        3398976  org.apache.cassandra.utils.Pair
  17:        141505        3396120
 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
  18:         49604        2675352  <symbolKlass>
  19:        162254        2596064  org.apache.cassandra.dht.BigIntegerToken
......
Total      13337798      641834360


----------------------
The gc part and thread status of system log:

INFO [ScheduledTasks:1] 2015-11-13 14:22:08,681 GCInspector.java (line 123)
GC for ParNew: 1015 ms for 2 collections, 3886753520 used; max is 8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:09,683 GCInspector.java (line
123) GC for ParNew: 500 ms for 1 collections, 4956287408 used; max is
8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:10,685 GCInspector.java (line
123) GC for ParNew: 627 ms for 1 collections, 5615882296used; max is
8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:12,015 GCInspector.java (line
123) GC for ParNew: 988 ms for 2 collections, 4943363480 used; max is
8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:13,016 GCInspector.java (line
123) GC for ParNew: 373 ms for 1 collections, 5978572832 used; max is
8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:14,020 GCInspector.java (line
123) GC for ParNew: 486 ms for 1 collections, 6209638280used; max is
8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:15,412 GCInspector.java (line
123) GC for ParNew: 898 ms for 2 collections, 6045603728used; max is
8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:16,413 GCInspector.java (line
123) GC for ParNew: 503 ms for 1 collections, 6991263984 used; max is
8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:17,416 GCInspector.java (line
123) GC for ParNew: 746 ms for 1 collections, 7073467384used; max is
8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,363 GCInspector.java (line
123) GC for ConcurrentMarkSweep: 843 ms for 2 collections, 1130423160 used;
max is 8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,364 MessagingService.java
(line 603) 4198 READ messages dropped in last 5000ms
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,364 StatusLogger.java (line
50) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,368 StatusLogger.java (line
65) ReadStage                        32       450         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,370 StatusLogger.java (line
65) RequestResponseStage              0        18         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,371 StatusLogger.java (line
65) ReadRepairStage                   0         3         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,372 StatusLogger.java (line
65) MutationStage                     2       343         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,373 StatusLogger.java (line
65) ReplicateOnWriteStage             0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,374 StatusLogger.java (line
65) GossipStage                       0         3         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,375 StatusLogger.java (line
65) AntiEntropyStage                  0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,396 StatusLogger.java (line
65) MigrationStage                    0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line
65) StreamStage                       0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line
65) MemtablePostFlusher               0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line
65) FlushWriter                       0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,398 StatusLogger.java (line
65) MiscStage                         0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,398 StatusLogger.java (line
65) InternalResponseStage             0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,434 StatusLogger.java (line
65) HintedHandoff                     0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,435 StatusLogger.java (line
69) CompactionManager               n/a         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,436 StatusLogger.java (line
81) MessagingService                n/a      0,67

-- 
*陈硕* *Shuo Chen*
chenatu2006@gmail.com
chenshuo@whaty.com

Re: too many full gc in one node of the cluster

Posted by Jeff Jirsa <je...@crowdstrike.com>.

What Kai Wang is hinting at: one of the common tuning problems people face is they follow the advice in cassandra-env.sh, which says 100M of young gen space (Xmn) per core. Many people find that insufficient – raising that to be 30,40, or 50% of heap size (Xmx) MAY help keep short-lived objects (especially from the read path) in young gen rather than promoted to old gen, so they’ll be dropped quickly rather than promoted/cleaned with stop-the-world phase of CMS.

It’s often the case that increasing Xmn and setting XX:MaxTenuringThreshold=6 or 8 will reduce the number of long CMS pauses. 

- Jeff

From:  Kai Wang
Reply-To:  "user@cassandra.apache.org"
Date:  Friday, November 13, 2015 at 5:35 AM
To:  "user@cassandra.apache.org"
Subject:  Re: too many full gc in one node of the cluster

What's the size of young generation (-Xmn) ?

On Fri, Nov 13, 2015 at 6:38 AM, Jason Wee <pe...@gmail.com> wrote:
Used to manage/develop for cassandra 1.0.8 for quite sometime. Although 1.0 was rocking stable but we encountered various problems as load per node grow beyond 500gb. upgrading is one of the solution but may not be the solution for you but I strongly recommend you upgrade to 1.1 or 1.2. we upgraded the java on the cassandra node and cassandra to 1.1 and a lot of problems went away.

As for your use cases, a quick solution would probably to just add nodes, or study client reading pattern so not on a node hot row (the has on the key), or the client configuration on your application and/or the keyspace replication.

hth,

jason

On Fri, Nov 13, 2015 at 2:35 PM, Shuo Chen <ch...@gmail.com> wrote:
Hi,

We have a small cassandra cluster with 4 nodes for production. All the nodes have similar hardware configuration and similar data load. The C* version is 1.0.7 (prretty old)

One of the node has much higher cpu usage than others and high full gc frequency, but the io of this node is not high and data load of this node is even lower. So I have several questions:

1. Is that normal that one of the node having much higher full gc with same jvm configuration?
2. Does this node need special gc tuning and how?
3. How to find the cause of the full gc?

Thank you guys!


The heap size is 8G and max heap size is 16G. The gc config of cassandra-env.sh is default:

JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

---------------------
I print instance in the gc log:

num     #instances         #bytes  class name
----------------------------------------------
   1:       2982796      238731200  [B
   2:       3889672      186704256  java.nio.HeapByteBuffer
   3:       1749589       55986848  org.apache.cassandra.db.Column
   4:       1803900       43293600  java.util.concurrent.ConcurrentSkipListMap$Node
   5:        859496       20627904  java.util.concurrent.ConcurrentSkipListMap$Index
   6:          5568       18827912  [J
   7:        162630        6505200  java.math.BigInteger
   8:        167572        5716976  [I
   9:        141698        4534336  java.util.concurrent.ConcurrentHashMap$HashEntry
  10:        141505        4528160  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
  11:         31491        4376976  <constMethodKlass>
  12:         31491        4291992  <methodKlass>
  13:        171695        4120680  org.apache.cassandra.db.DecoratedKey
  14:          3157        3436120  <constantPoolKlass>
  15:        141784        3402816  java.lang.Long
  16:        141624        3398976  org.apache.cassandra.utils.Pair
  17:        141505        3396120  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
  18:         49604        2675352  <symbolKlass>
  19:        162254        2596064  org.apache.cassandra.dht.BigIntegerToken
......
Total      13337798      641834360


----------------------
The gc part and thread status of system log:

INFO [ScheduledTasks:1] 2015-11-13 14:22:08,681 GCInspector.java (line 123) GC for ParNew: 1015 ms for 2 collections, 3886753520 used; max is 8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:09,683 GCInspector.java (line 123) GC for ParNew: 500 ms for 1 collections, 4956287408 used; max is 8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:10,685 GCInspector.java (line 123) GC for ParNew: 627 ms for 1 collections, 5615882296used; max is 8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:12,015 GCInspector.java (line 123) GC for ParNew: 988 ms for 2 collections, 4943363480 used; max is 8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:13,016 GCInspector.java (line 123) GC for ParNew: 373 ms for 1 collections, 5978572832 used; max is 8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:14,020 GCInspector.java (line 123) GC for ParNew: 486 ms for 1 collections, 6209638280used; max is 8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:15,412 GCInspector.java (line 123) GC for ParNew: 898 ms for 2 collections, 6045603728used; max is 8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:16,413 GCInspector.java (line 123) GC for ParNew: 503 ms for 1 collections, 6991263984 used; max is 8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:17,416 GCInspector.java (line 123) GC for ParNew: 746 ms for 1 collections, 7073467384used; max is 8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,363 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 843 ms for 2 collections, 1130423160 used; max is 8231321600
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,364 MessagingService.java (line 603) 4198 READ messages dropped in last 5000ms
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,364 StatusLogger.java (line 50) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,368 StatusLogger.java (line 65) ReadStage                        32       450         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,370 StatusLogger.java (line 65) RequestResponseStage              0        18         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,371 StatusLogger.java (line 65) ReadRepairStage                   0         3         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,372 StatusLogger.java (line 65) MutationStage                     2       343         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,373 StatusLogger.java (line 65) ReplicateOnWriteStage             0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,374 StatusLogger.java (line 65) GossipStage                       0         3         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,375 StatusLogger.java (line 65) AntiEntropyStage                  0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,396 StatusLogger.java (line 65) MigrationStage                    0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line 65) StreamStage                       0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line 65) MemtablePostFlusher               0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line 65) FlushWriter                       0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,398 StatusLogger.java (line 65) MiscStage                         0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,398 StatusLogger.java (line 65) InternalResponseStage             0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,434 StatusLogger.java (line 65) HintedHandoff                     0         0         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,435 StatusLogger.java (line 69) CompactionManager               n/a         0
 INFO [ScheduledTasks:1] 2015-11-13 14:22:33,436 StatusLogger.java (line 81) MessagingService                n/a      0,67

-- 
陈硕 Shuo Chen
chenatu2006@gmail.com
chenshuo@whaty.com

Re: too many full gc in one node of the cluster

Posted by Kai Wang <de...@gmail.com>.

What's the size of young generation (-Xmn) ?

On Fri, Nov 13, 2015 at 6:38 AM, Jason Wee <pe...@gmail.com> wrote:

> Used to manage/develop for cassandra 1.0.8 for quite sometime. Although
> 1.0 was rocking stable but we encountered various problems as load per node
> grow beyond 500gb. upgrading is one of the solution but may not be the
> solution for you but I strongly recommend you upgrade to 1.1 or 1.2. we
> upgraded the java on the cassandra node and cassandra to 1.1 and a lot of
> problems went away.
>
> As for your use cases, a quick solution would probably to just add nodes,
> or study client reading pattern so not on a node hot row (the has on the
> key), or the client configuration on your application and/or the keyspace
> replication.
>
> hth,
>
> jason
>
> On Fri, Nov 13, 2015 at 2:35 PM, Shuo Chen <ch...@gmail.com> wrote:
>
>> Hi,
>>
>> We have a small cassandra cluster with 4 nodes for production. All the
>> nodes have similar hardware configuration and similar data load. The C*
>> version is 1.0.7 (prretty old)
>>
>> One of the node has much higher cpu usage than others and high full gc
>> frequency, but the io of this node is not high and data load of this node
>> is even lower. So I have several questions:
>>
>> 1. Is that normal that one of the node having much higher full gc with
>> same jvm configuration?
>> 2. Does this node need special gc tuning and how?
>> 3. How to find the cause of the full gc?
>>
>> Thank you guys!
>>
>>
>> The heap size is 8G and max heap size is 16G. The gc config of
>> cassandra-env.sh is default:
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>>
>> ---------------------
>> I print instance in the gc log:
>>
>> num     #instances         #bytes  class name
>> ----------------------------------------------
>>    1:       2982796      238731200  [B
>>    2:       3889672      186704256  java.nio.HeapByteBuffer
>>    3:       1749589       55986848  org.apache.cassandra.db.Column
>>    4:       1803900       43293600
>>  java.util.concurrent.ConcurrentSkipListMap$Node
>>    5:        859496       20627904
>>  java.util.concurrent.ConcurrentSkipListMap$Index
>>    6:          5568       18827912  [J
>>    7:        162630        6505200  java.math.BigInteger
>>    8:        167572        5716976  [I
>>    9:        141698        4534336
>>  java.util.concurrent.ConcurrentHashMap$HashEntry
>>   10:        141505        4528160
>>  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
>>   11:         31491        4376976  <constMethodKlass>
>>   12:         31491        4291992  <methodKlass>
>>   13:        171695        4120680  org.apache.cassandra.db.DecoratedKey
>>   14:          3157        3436120  <constantPoolKlass>
>>   15:        141784        3402816  java.lang.Long
>>   16:        141624        3398976  org.apache.cassandra.utils.Pair
>>   17:        141505        3396120
>>  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
>>   18:         49604        2675352  <symbolKlass>
>>   19:        162254        2596064
>>  org.apache.cassandra.dht.BigIntegerToken
>> ......
>> Total      13337798      641834360
>>
>>
>> ----------------------
>> The gc part and thread status of system log:
>>
>> INFO [ScheduledTasks:1] 2015-11-13 14:22:08,681 GCInspector.java (line
>> 123) GC for ParNew: 1015 ms for 2 collections, 3886753520 used; max is
>> 8231321600
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:09,683 GCInspector.java (line
>> 123) GC for ParNew: 500 ms for 1 collections, 4956287408 used; max is
>> 8231321600
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:10,685 GCInspector.java (line
>> 123) GC for ParNew: 627 ms for 1 collections, 5615882296used; max is
>> 8231321600
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:12,015 GCInspector.java (line
>> 123) GC for ParNew: 988 ms for 2 collections, 4943363480 used; max is
>> 8231321600
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:13,016 GCInspector.java (line
>> 123) GC for ParNew: 373 ms for 1 collections, 5978572832 used; max is
>> 8231321600
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:14,020 GCInspector.java (line
>> 123) GC for ParNew: 486 ms for 1 collections, 6209638280used; max is
>> 8231321600
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:15,412 GCInspector.java (line
>> 123) GC for ParNew: 898 ms for 2 collections, 6045603728used; max is
>> 8231321600
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:16,413 GCInspector.java (line
>> 123) GC for ParNew: 503 ms for 1 collections, 6991263984 used; max is
>> 8231321600
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:17,416 GCInspector.java (line
>> 123) GC for ParNew: 746 ms for 1 collections, 7073467384used; max is
>> 8231321600
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,363 GCInspector.java (line
>> 123) GC for ConcurrentMarkSweep: 843 ms for 2 collections, 1130423160 used;
>> max is 8231321600
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,364 MessagingService.java
>> (line 603) 4198 READ messages dropped in last 5000ms
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,364 StatusLogger.java (line
>> 50) Pool Name                    Active   Pending   Blocked
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,368 StatusLogger.java (line
>> 65) ReadStage                        32       450         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,370 StatusLogger.java (line
>> 65) RequestResponseStage              0        18         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,371 StatusLogger.java (line
>> 65) ReadRepairStage                   0         3         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,372 StatusLogger.java (line
>> 65) MutationStage                     2       343         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,373 StatusLogger.java (line
>> 65) ReplicateOnWriteStage             0         0         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,374 StatusLogger.java (line
>> 65) GossipStage                       0         3         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,375 StatusLogger.java (line
>> 65) AntiEntropyStage                  0         0         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,396 StatusLogger.java (line
>> 65) MigrationStage                    0         0         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line
>> 65) StreamStage                       0         0         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line
>> 65) MemtablePostFlusher               0         0         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line
>> 65) FlushWriter                       0         0         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,398 StatusLogger.java (line
>> 65) MiscStage                         0         0         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,398 StatusLogger.java (line
>> 65) InternalResponseStage             0         0         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,434 StatusLogger.java (line
>> 65) HintedHandoff                     0         0         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,435 StatusLogger.java (line
>> 69) CompactionManager               n/a         0
>>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,436 StatusLogger.java (line
>> 81) MessagingService                n/a      0,67
>>
>> --
>> *陈硕* *Shuo Chen*
>> chenatu2006@gmail.com
>> chenshuo@whaty.com
>>
>
>

Re: too many full gc in one node of the cluster

Posted by Jason Wee <pe...@gmail.com>.

Used to manage/develop for cassandra 1.0.8 for quite sometime. Although 1.0
was rocking stable but we encountered various problems as load per node
grow beyond 500gb. upgrading is one of the solution but may not be the
solution for you but I strongly recommend you upgrade to 1.1 or 1.2. we
upgraded the java on the cassandra node and cassandra to 1.1 and a lot of
problems went away.

As for your use cases, a quick solution would probably to just add nodes,
or study client reading pattern so not on a node hot row (the has on the
key), or the client configuration on your application and/or the keyspace
replication.

hth,

jason

On Fri, Nov 13, 2015 at 2:35 PM, Shuo Chen <ch...@gmail.com> wrote:

> Hi,
>
> We have a small cassandra cluster with 4 nodes for production. All the
> nodes have similar hardware configuration and similar data load. The C*
> version is 1.0.7 (prretty old)
>
> One of the node has much higher cpu usage than others and high full gc
> frequency, but the io of this node is not high and data load of this node
> is even lower. So I have several questions:
>
> 1. Is that normal that one of the node having much higher full gc with
> same jvm configuration?
> 2. Does this node need special gc tuning and how?
> 3. How to find the cause of the full gc?
>
> Thank you guys!
>
>
> The heap size is 8G and max heap size is 16G. The gc config of
> cassandra-env.sh is default:
>
> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>
> ---------------------
> I print instance in the gc log:
>
> num     #instances         #bytes  class name
> ----------------------------------------------
>    1:       2982796      238731200  [B
>    2:       3889672      186704256  java.nio.HeapByteBuffer
>    3:       1749589       55986848  org.apache.cassandra.db.Column
>    4:       1803900       43293600
>  java.util.concurrent.ConcurrentSkipListMap$Node
>    5:        859496       20627904
>  java.util.concurrent.ConcurrentSkipListMap$Index
>    6:          5568       18827912  [J
>    7:        162630        6505200  java.math.BigInteger
>    8:        167572        5716976  [I
>    9:        141698        4534336
>  java.util.concurrent.ConcurrentHashMap$HashEntry
>   10:        141505        4528160
>  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
>   11:         31491        4376976  <constMethodKlass>
>   12:         31491        4291992  <methodKlass>
>   13:        171695        4120680  org.apache.cassandra.db.DecoratedKey
>   14:          3157        3436120  <constantPoolKlass>
>   15:        141784        3402816  java.lang.Long
>   16:        141624        3398976  org.apache.cassandra.utils.Pair
>   17:        141505        3396120
>  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
>   18:         49604        2675352  <symbolKlass>
>   19:        162254        2596064
>  org.apache.cassandra.dht.BigIntegerToken
> ......
> Total      13337798      641834360
>
>
> ----------------------
> The gc part and thread status of system log:
>
> INFO [ScheduledTasks:1] 2015-11-13 14:22:08,681 GCInspector.java (line
> 123) GC for ParNew: 1015 ms for 2 collections, 3886753520 used; max is
> 8231321600
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:09,683 GCInspector.java (line
> 123) GC for ParNew: 500 ms for 1 collections, 4956287408 used; max is
> 8231321600
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:10,685 GCInspector.java (line
> 123) GC for ParNew: 627 ms for 1 collections, 5615882296used; max is
> 8231321600
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:12,015 GCInspector.java (line
> 123) GC for ParNew: 988 ms for 2 collections, 4943363480 used; max is
> 8231321600
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:13,016 GCInspector.java (line
> 123) GC for ParNew: 373 ms for 1 collections, 5978572832 used; max is
> 8231321600
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:14,020 GCInspector.java (line
> 123) GC for ParNew: 486 ms for 1 collections, 6209638280used; max is
> 8231321600
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:15,412 GCInspector.java (line
> 123) GC for ParNew: 898 ms for 2 collections, 6045603728used; max is
> 8231321600
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:16,413 GCInspector.java (line
> 123) GC for ParNew: 503 ms for 1 collections, 6991263984 used; max is
> 8231321600
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:17,416 GCInspector.java (line
> 123) GC for ParNew: 746 ms for 1 collections, 7073467384used; max is
> 8231321600
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,363 GCInspector.java (line
> 123) GC for ConcurrentMarkSweep: 843 ms for 2 collections, 1130423160 used;
> max is 8231321600
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,364 MessagingService.java
> (line 603) 4198 READ messages dropped in last 5000ms
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,364 StatusLogger.java (line
> 50) Pool Name                    Active   Pending   Blocked
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,368 StatusLogger.java (line
> 65) ReadStage                        32       450         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,370 StatusLogger.java (line
> 65) RequestResponseStage              0        18         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,371 StatusLogger.java (line
> 65) ReadRepairStage                   0         3         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,372 StatusLogger.java (line
> 65) MutationStage                     2       343         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,373 StatusLogger.java (line
> 65) ReplicateOnWriteStage             0         0         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,374 StatusLogger.java (line
> 65) GossipStage                       0         3         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,375 StatusLogger.java (line
> 65) AntiEntropyStage                  0         0         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,396 StatusLogger.java (line
> 65) MigrationStage                    0         0         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line
> 65) StreamStage                       0         0         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line
> 65) MemtablePostFlusher               0         0         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,397 StatusLogger.java (line
> 65) FlushWriter                       0         0         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,398 StatusLogger.java (line
> 65) MiscStage                         0         0         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,398 StatusLogger.java (line
> 65) InternalResponseStage             0         0         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,434 StatusLogger.java (line
> 65) HintedHandoff                     0         0         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,435 StatusLogger.java (line
> 69) CompactionManager               n/a         0
>  INFO [ScheduledTasks:1] 2015-11-13 14:22:33,436 StatusLogger.java (line
> 81) MessagingService                n/a      0,67
>
> --
> *陈硕* *Shuo Chen*
> chenatu2006@gmail.com
> chenshuo@whaty.com
>

Re: too many full gc in one node of the cluster

Posted by Shuo Chen <ch...@gmail.com>.

The 4th node has even less data node than the others. And I check the
connections to the port 9160 counts among the nodes. This node also has
less connections. So it is really strange...

On Sat, Nov 14, 2015 at 5:50 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Thu, Nov 12, 2015 at 10:35 PM, Shuo Chen <ch...@gmail.com> wrote:
>
>> We have a small cassandra cluster with 4 nodes for production. All the
>> nodes have similar hardware configuration and similar data load. The C*
>> version is 1.0.7 (prretty old)
>>
>> One of the node has much higher cpu usage than others and high full gc
>> frequency, but the io of this node is not high and data load of this node
>> is even lower. So I have several questions:
>>
>
> The simplest explanation is that your 4th node is the only node your
> clients ever talk to.
>
> Other simple explanations include "the 4th node owns more of the ring than
> other nodes do."
>
> =Rob
>
>



-- 
*陈硕* *Shuo Chen*
chenatu2006@gmail.com
chenshuo@whaty.com

Re: too many full gc in one node of the cluster

Posted by Robert Coli <rc...@eventbrite.com>.

On Thu, Nov 12, 2015 at 10:35 PM, Shuo Chen <ch...@gmail.com> wrote:

> We have a small cassandra cluster with 4 nodes for production. All the
> nodes have similar hardware configuration and similar data load. The C*
> version is 1.0.7 (prretty old)
>
> One of the node has much higher cpu usage than others and high full gc
> frequency, but the io of this node is not high and data load of this node
> is even lower. So I have several questions:
>

The simplest explanation is that your 4th node is the only node your
clients ever talk to.

Other simple explanations include "the 4th node owns more of the ring than
other nodes do."

=Rob