Posted to commits@cassandra.apache.org by "Adam Lindley (JIRA)" <ji...@apache.org> on 2018/09/21 14:16:00 UTC

[jira] [Commented] (CASSANDRA-9805) nodetool status causes garbage to be accrued

    [ https://issues.apache.org/jira/browse/CASSANDRA-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623674#comment-16623674 ] 

Adam Lindley commented on CASSANDRA-9805:
-----------------------------------------

I've been doing some investigation into this, following on from Andy's work, to see whether it is resolved in the latest version of Cassandra that we're running.

We're running ReleaseVersion 3.11.2. My test setup was just running `nodetool status` in a loop while tracking memory usage with `jstat -gc` polling every 5 seconds, with Cassandra running on an Ubuntu 18.04 node with 2 vCPUs and 4 GB RAM (the loop itself is sketched below). I'm attaching the data I pulled from that:

[^jstat-gc.xlsx]
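
For completeness, the test loop was roughly the following (the `pgrep` pattern is just how I find the Cassandra PID on my test box, so adjust as needed):

{code:bash}
# Find the Cassandra JVM's PID (adjust the pattern for your install)
CASSANDRA_PID=$(pgrep -f CassandraDaemon)

# Sample heap/GC counters every 5 seconds in the background
jstat -gc "$CASSANDRA_PID" 5000 > jstat-gc.log &

# Repeatedly hit the node with nodetool status
while true; do
    nodetool status > /dev/null
done
{code}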

The pattern I'm seeing looks better than what Andy saw on previous versions: old-space heap use still climbs with each Eden collection, but the GC that then runs on the old space brings usage back down to the same level each time, rather than gradually climbing as before.

From the data it looks like each of these GC events is actually a pair of full GC events, which together push FGCT up by ~0.3-0.4 seconds each time. Is anyone able to explain why the events come in pairs?
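
For what it's worth, my plan for digging into that is to re-run the sampling with `jstat -gccause`, which adds the reported cause of the current and last collection (e.g. "Allocation Failure" vs. "System.gc()") alongside the usual GC counts and times; something like:

{code:bash}
# Same 5-second sampling (same PID variable as the loop above), but the
# GCC/LGCC columns show the reported cause of the current/last
# collection, which should say what triggers each half of the pair
jstat -gccause "$CASSANDRA_PID" 5000 > jstat-gccause.log
{code}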

I'm now trying to work out what the performance degradation during these GC events is likely to be. If someone's able to point me at a reasonable way to measure that, it would be much appreciated.
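
One option I'm considering (very much a sketch; `cassandra-stress` ships with Cassandra, but the workload and thread counts below are just placeholders) is to keep a stress workload running against the node and compare its latency percentiles with and without the `nodetool status` loop going on in parallel:

{code:bash}
# Baseline: populate the stress keyspace, then run reads against an
# otherwise idle node and keep the latency summary it prints
cassandra-stress write n=100000 -rate threads=4
cassandra-stress read duration=10m -rate threads=4 | tee baseline.log

# Repeat the read run with the nodetool status loop (and hence the
# old-space GC events) running in parallel, then compare the latency
# percentiles reported at the end of each run
cassandra-stress read duration=10m -rate threads=4 | tee with-status-loop.log
{code}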

It feels like the issue is certainly better in more recent Cassandra releases, but we do still see old-space use climb with repeated calls to `nodetool status`.

I'm not particularly familiar with Java memory management, though, so if anyone could confirm my thinking here, that would be great.

> nodetool status causes garbage to be accrued
> --------------------------------------------
>
>                 Key: CASSANDRA-9805
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9805
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>         Environment: Ubuntu 14.04 64-bit
> Cassandra 2.0.14
> Java 1.7.0 OpenJDK
>            Reporter: Andy Caldwell
>            Priority: Major
>         Attachments: JVM-heap-usage.png, jstat-gc.xlsx
>
>
> As part of monitoring our Cassandra clusters (generally 2-6 nodes) we were running `nodetool status` regularly (~ every 5 minutes).  On Cassandra 1.2.12 this worked fine and had negligible effect on the Cassandra database service.
> Having upgraded to Cassandra 2.0.14, we've found that, over time, the tenured memory space slowly fills with `RMIConnectionImpl` objects (and some other associated objects) until we start running into memory pressure and triggering proactive and then STW GC (which obviously impact performance of the cluster).  It seems that these objects are kept around long enough to get promoted to tenured from Eden and then don't get considered for collection (due to internal reference cycles?).
> Very easy to reproduce, just call `nodetool status` in a loop and watch the memory usage climb to capacity then drop to empty after STW.  No need to be accessing the DB keys at all.


