Posted to commits@cassandra.apache.org by "Chris Kistner (Jira)" <ji...@apache.org> on 2019/12/10 10:18:00 UTC
[jira] [Comment Edited] (CASSANDRA-14355) Memory leak
[ https://issues.apache.org/jira/browse/CASSANDRA-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992394#comment-16992394 ]
Chris Kistner edited comment on CASSANDRA-14355 at 12/10/19 10:17 AM:
----------------------------------------------------------------------
We have now experienced an issue that might be related to this; however, our Cassandra has not crashed yet - it just had frequent (every ~2 minutes) "ConcurrentMarkSweep GC" events of 16+ seconds, e.g.:
{noformat}
WARN [Service Thread] 2019-12-10 08:03:19,969 GCInspector.java:282 - ConcurrentMarkSweep GC in 19129ms. CMS Old Gen: 7547650016 -> 7547650048; Par Eden Space: 671088640 -> 251798544; Par Survivor Space: 83886048 -> 0
WARN [Service Thread] 2019-12-10 08:03:37,565 GCInspector.java:282 - ConcurrentMarkSweep GC in 16379ms. Par Eden Space: 671088640 -> 254509608; Par Survivor Space: 83886032 -> 0{noformat}
Sometimes it went back down to 200ms again, and after we did a "nodetool drain" and then removed the node from the cluster, the GC time remained below 250ms.
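For anyone triaging similar logs: the GCInspector warnings quoted above can be summarized with a small script. This is just a minimal sketch; the regex is inferred from the two sample lines, not taken from the Cassandra source:

```python
import re

# Matches GCInspector warnings of the form shown above, e.g.
# "... - ConcurrentMarkSweep GC in 19129ms. ...".
# The pattern is an assumption based on the quoted lines.
GC_RE = re.compile(r"(\w+) GC in (\d+)ms")

def pause_ms(line):
    """Return (collector, pause in ms) for a GCInspector line, else None."""
    m = GC_RE.search(line)
    return (m.group(1), int(m.group(2))) if m else None

log = [
    "WARN [Service Thread] 2019-12-10 08:03:19,969 GCInspector.java:282 - "
    "ConcurrentMarkSweep GC in 19129ms. CMS Old Gen: 7547650016 -> 7547650048",
    "WARN [Service Thread] 2019-12-10 08:03:37,565 GCInspector.java:282 - "
    "ConcurrentMarkSweep GC in 16379ms. Par Eden Space: 671088640 -> 254509608",
]

pauses = [p for p in (pause_ms(line) for line in log) if p]
worst = max(ms for _, ms in pauses)
print(worst)  # longest pause seen, in ms
```

Feeding it the full system.log would show how often the 16+ second pauses recur.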
Our setup is:
* 5 nodes in dc1, 5 nodes in dc2.
* RF: dc1=5, dc2=5
* CL = Local Quorum
* Host with 32GB of RAM -> Cassandra allocates 8GB to heap
* Java version: java-1.8.0-openjdk-1.8.0.151-5.b12
* Using Cassandra Reaper 4.6.1 where we scheduled a repair with 32 segments/node (364 segments in total)
I have attached some screenshots from our ~11GB heap dump, where io.netty.util.concurrent.FastThreadLocalThread contributed 6.4GB of the heap size:
* Problem Suspect 1: LongGC_Problem-Suspect-1_FastThreadLocalThread.png
* Dominator Tree: LongGC_Dominator-Tree.png
* Histogram: LongGC_Histogram.png
I have also attached the output of "nodetool status": LongGC_nodetool_info.txt
We have not yet tried Cassandra 3.11.5, which apparently fixed the repair OOME issue: CASSANDRA-14096
> Memory leak
> -----------
>
> Key: CASSANDRA-14355
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14355
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Core
> Environment: Debian Jessie, OpenJDK 1.8.0_151
> Reporter: Eric Evans
> Priority: Normal
> Fix For: 3.11.x
>
> Attachments: 01_Screenshot from 2018-04-04 14-24-00.png, 02_Screenshot from 2018-04-04 14-28-33.png, 03_Screenshot from 2018-04-04 14-24-50.png, LongGC_Dominator-Tree.png, LongGC_Histogram.png, LongGC_Problem-Suspect-1_FastThreadLocalThread.png, LongGC_nodetool_info.txt
>
>
> We're seeing regular, frequent {{OutOfMemoryError}} exceptions. Similar to CASSANDRA-13754, an analysis of the heap dumps shows the heap consumed by the {{threadLocals}} member of the instances of {{io.netty.util.concurrent.FastThreadLocalThread}}.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)