
Posted to user@cassandra.apache.org by Peng Xiao <25...@qq.com> on 2017/10/26 14:24:08 UTC

how to identify the root cause of cassandra hang

Hi,


We have a 48-node cluster configured with racks. Sometimes it hangs for as long as 2 minutes, and the response time jumps from 300ms to 15s.
Could anyone please advise how to identify the root cause?


The following is from the system log


INFO  [Service Thread] 2017-10-26 21:45:46,796 GCInspector.java:258 - G1 Young Generation GC in 222ms.  G1 Eden Space: 939524096 -> 0; G1 Old Gen: 6652738584 -> 6662878232; G1 Survivor Space: 134217728 -> 109051904;
INFO  [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:51 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
INFO  [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - MutationStage                     0         3     3612475121         0                 0
INFO  [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - RequestResponseStage              0         0     6333593550         0                 0
INFO  [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - ReadRepairStage                   0         0        2773154         0                 0
INFO  [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - CounterMutationStage              0         0              0         0                 0
INFO  [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - ReadStage                         0         4      417419357         0                 0



Thanks.

RE: how to identify the root cause of cassandra hang

Posted by Meg Mara <mm...@digitalriver.com>.

Hello,

This can happen if your GC pauses are too long and/or too frequent, for example if your heap is not large enough. During a long GC pause, a Cassandra node effectively behaves like a dead (unresponsive) node, and other nodes start collecting hints for it, etc. You could check your logs to see whether GC pauses are happening too often: grep for GCInspector in system.log. It is one possibility.
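As a rough illustration of the GCInspector check, here is a minimal Python sketch that pulls the pause durations out of system.log lines and lists the long ones. It assumes the log line format shown in this thread. The function names and the 1000 ms threshold are made up for the example and are not Cassandra settings, so pick a threshold that fits your own latency targets.

```python
import re
from datetime import datetime

# Matches GCInspector lines in the format quoted in this thread, e.g.
# "... 2017-10-26 21:45:46,796 GCInspector.java:258 - G1 Young Generation GC in 222ms. ..."
GC_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} "
    r"GCInspector\.java:\d+ - (?P<collector>.+?) GC in (?P<ms>\d+)ms"
)


def parse_gc_pauses(lines):
    """Return (timestamp, collector name, pause in ms) for each GCInspector line."""
    pauses = []
    for line in lines:
        match = GC_LINE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            pauses.append((ts, match.group("collector"), int(match.group("ms"))))
    return pauses


def long_pauses(pauses, threshold_ms=1000):
    """Keep only pauses at or above threshold_ms (1000 is an arbitrary example value)."""
    return [p for p in pauses if p[2] >= threshold_ms]


# Example usage (the path is an assumption; use wherever your system.log lives):
#
#     with open("/var/log/cassandra/system.log") as f:
#         for ts, collector, ms in long_pauses(parse_gc_pauses(f)):
#             print(ts, collector, ms, "ms")
```

With the threshold used here, the single 222ms pause in the quoted log excerpt would not be flagged, so it is worth scanning the log around the times when the hangs were actually observed.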

Meg Mara

