Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2015/06/29 20:25:05 UTC

[jira] [Comment Edited] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered

    [ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606066#comment-14606066 ] 

Benedict edited comment on CASSANDRA-9681 at 6/29/15 6:24 PM:
--------------------------------------------------------------

Thanks. It is likely the log files will be insufficient to diagnose, though, just to let you know. Assuming that's the case, the best next step is to obtain a heap dump during one of the spikes (doesn't need to be at the peak, just so long as it's well above where it was settled prior to upgrade). In the meantime I'll see if I can find a candidate by looking through recent changes.
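
For anyone following along, a minimal sketch of one way to take such a heap dump over JMX, using the JDK's HotSpotDiagnosticMXBean. The class name HeapDumpSketch, the helper wellAboveBaseline, the 4x factor and the output file name are illustrative assumptions, not anything from Cassandra; the ~500 MB and ~7 GB figures are the ones quoted in the issue description. The sketch dumps the local JVM only so that it runs standalone; against a real node the MBeanServerConnection would come from a JMX connector to that node, and the path is then interpreted on the node's host. The JDK's jmap tool is the more usual route.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;

public class HeapDumpSketch {
    // Standard ObjectName of the HotSpot diagnostic MBean.
    static final String HOTSPOT_DIAGNOSTIC = "com.sun.management:type=HotSpotDiagnostic";

    /** Hypothetical helper: true when current is at least factor times the baseline. */
    static boolean wellAboveBaseline(long currentBytes, long baselineBytes, double factor) {
        return currentBytes >= baselineBytes * factor;
    }

    /** Asks the JVM behind conn to write a heap dump to path (a path on that JVM's host). */
    static void dumpHeap(MBeanServerConnection conn, String path, boolean liveOnly)
            throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                conn, HOTSPOT_DIAGNOSTIC, HotSpotDiagnosticMXBean.class);
        // Throws IOException if the file already exists or cannot be written.
        bean.dumpHeap(path, liveOnly);
    }

    public static void main(String[] args) throws IOException {
        long baseline = 500L * 1024 * 1024;      // ~500 MB, level reported before the upgrade
        long current = 7L * 1024 * 1024 * 1024;  // ~7 GB, level reported during a spike
        String path = args.length > 0 ? args[0] : "heap-sketch.hprof"; // assumed file name

        // Assumed threshold: treat 4x the old baseline as "well above" it.
        if (wellAboveBaseline(current, baseline, 4.0)) {
            // Local JVM used here only so the sketch is runnable on its own.
            dumpHeap(ManagementFactory.getPlatformMBeanServer(), path, true);
            System.out.println("Heap dump written to " + path);
        }
    }
}
```

Passing liveOnly=true restricts the dump to reachable objects, which keeps the file smaller; whether that is the right choice for this investigation is a judgment call.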


was (Author: benedict):
Thanks. It is likely the log files will be insufficient to diagnose, though, just to let you know. Assuming that's the case, theybest next step is to obtain a heap dump during one of the spikes (doesn't need to be at the peak, just so long as it's well above where it was settled prior to upgrade). In the meantime I'll see if I can find a candidate by looking through recent changes.

> Memtable heap size grows and many long GC pauses are triggered
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-9681
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9681
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.7, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.x
>
>         Attachments: cassandra.yaml
>
>
> C* 2.1.7 cluster is behaving really badly after 1-2 days. {{gauges.cassandra.jmx.org.apache.cassandra.metrics.ColumnFamily.AllMemtablesHeapSize.Value}} jumps to 7 GB (https://www.dropbox.com/s/vraggy292erkzd2/Screenshot%202015-06-29%2019.12.53.png?dl=0) on 3/6 nodes in each data center and then there are many long GC pauses. Cluster is using default heap size values ({{-Xms8192M -Xmx8192M -Xmn2048M}})
> Before C* 2.1.5 memtables heap size was basically constant ~500MB (https://www.dropbox.com/s/fjdywik5lojstvn/Screenshot%202015-06-29%2019.30.00.png?dl=0)
> After restarting all nodes it behaves stably for 1-2 days. Today I've done that and long GC pauses are gone (~18:00 https://www.dropbox.com/s/7vo3ynz505rsfq3/Screenshot%202015-06-29%2019.28.37.png?dl=0). The only pattern we've found so far is that long GC pauses are happening basically at the same time on all nodes in the same data center - even on the ones where memtables heap size is not growing.
> Cliffs on the graphs are node restarts.
> Used memory on boxes where {{AllMemtablesHeapSize}} grows stays at the same level - https://www.dropbox.com/s/tes9abykixs86rf/Screenshot%202015-06-29%2019.37.52.png?dl=0.
> Replication factor is set to 3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)