Posted to commits@cassandra.apache.org by "Alex Petrov (JIRA)" <ji...@apache.org> on 2019/02/04 13:32:00 UTC

[jira] [Commented] (CASSANDRA-14922) In JVM dtests need to clean up after instance shutdown

    [ https://issues.apache.org/jira/browse/CASSANDRA-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16759846#comment-16759846 ] 

Alex Petrov commented on CASSANDRA-14922:
-----------------------------------------

I've investigated this one a bit further but, to be honest, could not find any proof of leaks. I've also tried to make the shutdown process more synchronous to reduce the amount of in-flight memory, but this hasn't helped. I also checked for a possible native memory leak. You are right that there are native byte buffer instances hanging around after instance shutdown, but as far as I can tell all of them were pending finalization (see !Screen Shot 2019-01-30 at 15.47.13!).
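
For anyone who wants to repeat the native memory check without a profiler, here is a minimal sketch that reads the JVM's direct buffer pool through the standard {{BufferPoolMXBean}}. The class and method names ({{DirectBufferUsage}}, {{directMemoryUsed}}) are made up for illustration and are not part of Cassandra or of the patch. Note that buffers which are unreachable but not yet cleaned up by the GC are still counted, so a high number right after shutdown is not proof of a leak on its own.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical helper, for illustration only.
final class DirectBufferUsage {

    /** Returns the bytes used by the JVM's "direct" buffer pool, or -1 if no such pool is reported. */
    static long directMemoryUsed() {
        List<BufferPoolMXBean> pools = ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        // Allocate 1 MiB off-heap; it is counted for as long as the buffer has not been cleaned up.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
        long after = directMemoryUsed();
        System.out.println("direct pool before: " + before + " bytes");
        System.out.println("direct pool after allocating " + buffer.capacity() + " bytes: " + after + " bytes");
    }
}
```

Sampling this value before cluster startup, after shutdown, and again after a forced GC would show whether the remaining buffers are actually retained or merely waiting to be collected.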

I've tried running the tests in a loop in a non-constrained environment [here|https://circleci.com/gh/ifesdjeen/cassandra/1197], and they seem to pass fine after 10x100 runs.

I made a small follow-up patch [here|https://github.com/apache/cassandra/compare/trunk...ifesdjeen:oom-improvements], although its improvements are minor and do not have any impact on the end result. In this patch I also disable in-JVM dtests for resource-constrained environments.

If this looks OK, I suggest we do that, merge [~benedict]'s multi-version patch, and continue writing tests.

> In JVM dtests need to clean up after instance shutdown
> ------------------------------------------------------
>
>                 Key: CASSANDRA-14922
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14922
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest
>            Reporter: Joseph Lynch
>            Assignee: Joseph Lynch
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: AllThreadsStopped.png, ClassLoadersRetaining.png, LeakedNativeMemory.png, Leaking_Metrics_On_Shutdown.png, MainClassRetaining.png, MemoryReclaimedFix.png, Metaspace_Actually_Collected.png, OnlyThreeRootsLeft.png, Screen Shot 2019-01-30 at 15.46.35.png, Screen Shot 2019-01-30 at 15.47.13.png, no_more_references.png
>
>
> Currently the unit tests are failing on circleci ([example one|https://circleci.com/gh/jolynch/cassandra/300#tests/containers/1], [example two|https://circleci.com/gh/rustyrazorblade/cassandra/44#tests/containers/1]) because we use a small container (medium) for unit tests by default and the in JVM dtests are leaking a few hundred megabytes of memory per test right now. This is not a big deal: dtest runs with the larger containers and local testing continue to function fine, because the number of in JVM dtests is not yet high enough to cause a problem with more than 2GB of available heap. However, we should fix the memory leak so that going forward we can add more in JVM dtests without worry.
> I've been working with [~ifesdjeen] to debug, and the issue appears to be unreleased Table/Keyspace metrics (screenshot showing the leak attached). I believe that we have a few potential issues that are leading to the leaks:
> 1. The [{{Instance::shutdown}}|https://github.com/apache/cassandra/blob/f22fec927de7ac291266660c2f34de5b8cc1c695/test/distributed/org/apache/cassandra/distributed/Instance.java#L328-L354] method is not successfully cleaning up all the metrics created by the {{CassandraMetricsRegistry}}
> 2. The [{{TestCluster::close}}|https://github.com/apache/cassandra/blob/f22fec927de7ac291266660c2f34de5b8cc1c695/test/distributed/org/apache/cassandra/distributed/TestCluster.java#L283] method is not waiting for all the instances to finish shutting down and cleaning up before continuing on
> 3. I'm not sure if this is an issue assuming we clear all metrics, but [{{TableMetrics::release}}|https://github.com/apache/cassandra/blob/4ae229f5cd270c2b43475b3f752a7b228de260ea/src/java/org/apache/cassandra/metrics/TableMetrics.java#L951] does not release all the metric references (which could leak them)
> I am working on a patch which shuts down everything and ensures that we do not leak memory.
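
The second point in the description above (a close method that returns before every instance has finished shutting down) can be illustrated with a small, self-contained sketch. It is not the code from the patch: {{Node}}, {{closeAll}} and {{AwaitShutdownSketch}} are hypothetical names standing in for the real {{Instance}} and {{TestCluster}} classes, and the timeout is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from Cassandra.
final class AwaitShutdownSketch {

    /** Stand-in for a test node that can be shut down. */
    interface Node {
        void shutdown() throws Exception;
    }

    /** Shuts all nodes down in parallel and blocks until every one has finished or the timeout expires. */
    static void closeAll(List<? extends Node> nodes, long timeout, TimeUnit unit) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (Node node : nodes) {
                pending.add(CompletableFuture.runAsync(() -> {
                    try {
                        node.shutdown();
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }, executor));
            }
            // get() blocks until all shutdowns complete and rethrows a failure or a timeout.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).get(timeout, unit);
        } finally {
            // Release the helper threads so the sketch itself does not retain anything.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger stopped = new AtomicInteger();
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            nodes.add(() -> { Thread.sleep(50); stopped.incrementAndGet(); });
        }
        closeAll(nodes, 10, TimeUnit.SECONDS);
        System.out.println("nodes stopped: " + stopped.get());
    }
}
```

The point of blocking on the combined future is that the next test only starts once the previous cluster's threads have stopped, so memory held by in-flight shutdown work cannot pile up across tests.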



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
