Posted to commits@cassandra.apache.org by "Chris Kistner (Jira)" <ji...@apache.org> on 2019/12/10 10:18:00 UTC
[jira] [Comment Edited] (CASSANDRA-14355) Memory leak

    [ https://issues.apache.org/jira/browse/CASSANDRA-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992394#comment-16992394 ] 

Chris Kistner edited comment on CASSANDRA-14355 at 12/10/19 10:17 AM:
----------------------------------------------------------------------

We have now experienced an issue that might be related to this. Our Cassandra node has not crashed yet, but it had frequent (roughly every 2 minutes) "ConcurrentMarkSweep GC" events of 16+ seconds!
E.g.:
{noformat}
WARN  [Service Thread] 2019-12-10 08:03:19,969 GCInspector.java:282 - ConcurrentMarkSweep GC in 19129ms.  CMS Old Gen: 7547650016 -> 7547650048; Par Eden Space: 671088640 -> 251798544; Par Survivor Space: 83886048 -> 0
WARN  [Service Thread] 2019-12-10 08:03:37,565 GCInspector.java:282 - ConcurrentMarkSweep GC in 16379ms.  Par Eden Space: 671088640 -> 254509608; Par Survivor Space: 83886032 -> 0{noformat}
Sometimes the GC time dropped back to about 200ms. After we ran "nodetool drain" and then removed the node from the cluster, the GC time stayed below 250ms.
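
A minimal Python sketch, for illustration only, of pulling the pause time and memory-pool sizes out of GCInspector lines like the two quoted above. The regular expressions are assumptions inferred from those two samples, and parse_gc_line is a hypothetical helper name, not part of Cassandra:

```python
import re

# Field layout inferred from the two sample WARN lines only (assumption).
GC_LINE = re.compile(r"(?P<collector>\w+) GC in (?P<ms>\d+)ms\.\s+(?P<pools>.*)")
POOL = re.compile(r"(?P<name>[A-Za-z ]+): (?P<before>\d+) -> (?P<after>\d+)")


def parse_gc_line(line):
    """Return (collector, pause_ms, {pool: (before, after)}), or None if no match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    pools = {
        p.group("name").strip(): (int(p.group("before")), int(p.group("after")))
        for p in POOL.finditer(m.group("pools"))
    }
    return m.group("collector"), int(m.group("ms")), pools


SAMPLE = (
    "WARN  [Service Thread] 2019-12-10 08:03:19,969 GCInspector.java:282 - "
    "ConcurrentMarkSweep GC in 19129ms.  CMS Old Gen: 7547650016 -> 7547650048; "
    "Par Eden Space: 671088640 -> 251798544; Par Survivor Space: 83886048 -> 0"
)

collector, pause_ms, pools = parse_gc_line(SAMPLE)
before, after = pools["CMS Old Gen"]
# A negative "freed" value means the pool grew during the collection.
print(f"{collector}: {pause_ms} ms, old gen freed {before - after} bytes")
```

On the first sample line, CMS Old Gen goes from 7547650016 to 7547650048 bytes (about 7.03 GiB before and after), so the 19129ms cycle reclaimed nothing from the old generation.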

Our setup is:
 * 5 nodes in dc1, 5 nodes in dc2.
 * RF: dc1=5, dc2=5
 * CL = Local Quorum
 * Host with 32GB of RAM -> Cassandra allocates 8GB to heap
 * Java version: java-1.8.0-openjdk-1.8.0.151-5.b12
 * Using Cassandra Reaper 4.6.1 where we scheduled a repair with 32 segments/node (364 segments in total)
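
As a rough check of two numbers in the list above, here is a small Python sketch. It assumes the heap was sized by the stock cassandra-env.sh rule, max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB)), rather than by an explicit MAX_HEAP_SIZE, and that a quorum is floor(RF / 2) + 1 replicas. Both function names are hypothetical:

```python
def default_max_heap_mb(system_memory_mb):
    """Default heap size in MB under the assumed cassandra-env.sh rule."""
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    return max(half, quarter)


def local_quorum(replication_factor):
    """Replicas in the local DC that must respond at LOCAL_QUORUM."""
    return replication_factor // 2 + 1


print(default_max_heap_mb(32 * 1024))  # heap in MB for a 32GB host
print(local_quorum(5))                 # replicas needed per DC with RF=5
```

Under those assumptions a 32GB host gets an 8192MB heap, matching the 8GB above, and LOCAL_QUORUM with RF=5 needs 3 replicas in the local DC to respond.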

I have attached some screenshots from our ~11GB heap dump, where io.netty.util.concurrent.FastThreadLocalThread accounted for 6.4GB of the heap:
* Problem Suspect 1: LongGC_Problem-Suspect-1_FastThreadLocalThread.png
* Dominator Tree: LongGC_Dominator-Tree.png
* Histogram: LongGC_Histogram.png
I have also attached the output of "nodetool status": LongGC_nodetool_info.txt

We have not tried Cassandra 3.11.5, which apparently fixes the repair OOME issue (CASSANDRA-14096).


was (Author: padakwaak):
We have now experienced an issue that might be related to this. Our Cassandra node has not crashed yet, but it had frequent (roughly every 2 minutes) "ConcurrentMarkSweep GC" events of 16+ seconds!
E.g.:
{noformat}
Line 78776: WARN  [Service Thread] 2019-12-10 08:03:19,969 GCInspector.java:282 - ConcurrentMarkSweep GC in 19129ms.  CMS Old Gen: 7547650016 -> 7547650048; Par Eden Space: 671088640 -> 251798544; Par Survivor Space: 83886048 -> 0
	Line 79080: WARN  [Service Thread] 2019-12-10 08:03:37,565 GCInspector.java:282 - ConcurrentMarkSweep GC in 16379ms.  Par Eden Space: 671088640 -> 254509608; Par Survivor Space: 83886032 -> 0{noformat}
Sometimes it went back down to 200ms again, and after we did a "nodetool drain" and then removed the node from the cluster the GC time remained sub 250ms.

Our setup is:
 * 5 nodes in dc1, 5 nodes in dc2.
 * RF: dc1=5, dc2=5
 * CL = Local Quorum
 * Host with 32GB of RAM -> Cassandra allocates 8GB to heap
 * Java version: java-1.8.0-openjdk-1.8.0.151-5.b12
 * Using Cassandra Reaper 4.6.1 where we scheduled a repair with 32 segments/node (364 segments in total)

I have attached some screenshots from our ~11GB heap dump, where io.netty.util.concurrent.FastThreadLocalThread accounted for 6.4GB of the heap:
* Problem Suspect 1: LongGC_Problem-Suspect-1_FastThreadLocalThread.png
* Dominator Tree: LongGC_Dominator-Tree.png
* Histogram: LongGC_Histogram.png
I have also attached the output of "nodetool status": LongGC_nodetool_info.txt

We have not tried out Cassandra 3.11.5, which apparently solved the Repair OOME issue: CASSANDRA-14096

> Memory leak
> -----------
>
>                 Key: CASSANDRA-14355
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14355
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>         Environment: Debian Jessie, OpenJDK 1.8.0_151
>            Reporter: Eric Evans
>            Priority: Normal
>             Fix For: 3.11.x
>
>         Attachments: 01_Screenshot from 2018-04-04 14-24-00.png, 02_Screenshot from 2018-04-04 14-28-33.png, 03_Screenshot from 2018-04-04 14-24-50.png, LongGC_Dominator-Tree.png, LongGC_Histogram.png, LongGC_Problem-Suspect-1_FastThreadLocalThread.png, LongGC_nodetool_info.txt
>
>
> We're seeing regular, frequent {{OutOfMemoryError}} exceptions.  Similar to CASSANDRA-13754, an analysis of the heap dumps shows the heap consumed by the {{threadLocals}} member of the instances of {{io.netty.util.concurrent.FastThreadLocalThread}}.


