Posted to issues@ratis.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2022/11/13 07:49:00 UTC

[jira] [Created] (RATIS-1743) Memory leak in SegmentedRaftLogWorker due to metrics

Attila Doroszlai created RATIS-1743:
---------------------------------------

             Summary: Memory leak in SegmentedRaftLogWorker due to metrics
                 Key: RATIS-1743
                 URL: https://issues.apache.org/jira/browse/RATIS-1743
             Project: Ratis
          Issue Type: Bug
          Components: server
    Affects Versions: 3.0.0, 2.4.1
            Reporter: Attila Doroszlai
         Attachments: Screenshot from 2022-11-12 22-17-11.png

An {{OutOfMemoryError}} (OOME) occurs in Ozone integration tests.  The heap is currently limited to Xmx=2g, but increasing it does not help.

{code:title=https://github.com/adoroszlai/hadoop-ozone/actions/runs/3450185096/jobs/5761108630#step:5:3155}
[INFO] Running org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
Error:  java.lang.OutOfMemoryError: Java heap space
Error:  Tests run: 8, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 426.774 s <<< FAILURE! - in org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
{code}

{code}
java.lang.OutOfMemoryError: Java heap space
	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.lambda$new$4(SegmentedRaftLogWorker.java:223)
	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$603/1771708635.get(Unknown Source)
	at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.write(SegmentedRaftLogOutputStream.java:101)
	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$WriteLog.execute(SegmentedRaftLogWorker.java:568)
	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:320)
	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$595/1626598428.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:750)
{code}

Ozone registers a JMX reporter (this is not new):

{code:title=https://github.com/apache/ozone/blob/a13c62b60556cd003ee2149179f72029d9e35756/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/http/RatisDropwizardExports.java#L51-L53}
    MetricRegistries.global()
        .addReporterRegistration(MetricsReporting.jmxReporter(),
            MetricsReporting.stopJmxReporter());
{code}

Based on the heap dump and test log, {{SegmentedRaftLogWorker}} instances are still retained by {{JmxMBeanServer}} after {{close()}} has been called.
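
The retention mechanism can be illustrated with a minimal, JDK-only sketch. It is not Ratis or Ozone code: {{Worker}}, {{BufferStatsMBean}} and the {{sketch:}} object names are hypothetical stand-ins. The sketch assumes only that an object exposed through the platform MBean server stays strongly reachable until it is unregistered, so a {{close()}} that skips unregistration leaves the object (and its buffer) pinned.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class JmxRetentionSketch {

  /** Management interface exposing one metric; JMX requires it to be public. */
  public interface BufferStatsMBean {
    int getBufferSize();
  }

  /** Hypothetical stand-in for a log worker that owns a large buffer. */
  public static class Worker implements BufferStatsMBean, AutoCloseable {
    private final MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    private final ObjectName name;
    private final boolean unregisterOnClose;
    private final byte[] buffer;

    public Worker(String id, int bufferSize, boolean unregisterOnClose) throws Exception {
      this.name = new ObjectName("sketch:type=Worker,id=" + id);
      this.unregisterOnClose = unregisterOnClose;
      this.buffer = new byte[bufferSize];
      // From here on the MBean server holds a strong reference to this worker.
      server.registerMBean(new StandardMBean(this, BufferStatsMBean.class), name);
    }

    @Override
    public int getBufferSize() {
      return buffer.length;
    }

    public boolean isRegistered() {
      return server.isRegistered(name);
    }

    @Override
    public void close() throws Exception {
      // Without this step the worker remains reachable from the MBean server.
      if (unregisterOnClose && server.isRegistered(name)) {
        server.unregisterMBean(name);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Worker leaky = new Worker("leaky", 1024, false);
    leaky.close();
    System.out.println("leaky still registered: " + leaky.isRegistered());

    Worker clean = new Worker("clean", 1024, true);
    clean.close();
    System.out.println("clean still registered: " + clean.isRegistered());
  }
}
```

In this sketch the "leaky" worker is still registered after {{close()}}, while the "clean" one is not. Whether the actual fix belongs in the worker, the metrics registry, or the reporter is not decided here.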

The problem is probably not new, but its effect is much worse now, because {{SegmentedRaftLogWorker}} recently got a shared buffer (RATIS-1717).

{code:title=config in Ozone}
raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
{code}
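
To give a rough sense of scale, the following back-of-the-envelope sketch relates the configured byte limit to the 2g heap. It assumes (this is not confirmed above) that each retained worker pins one heap buffer of exactly the configured size, and it ignores all other heap usage; the class and method names are made up for illustration.

```java
public class RetainedBufferEstimate {

  /** Number of whole buffers of the given size that fit into the given heap. */
  public static long buffersThatFit(long heapBytes, long bufferBytes) {
    return heapBytes / bufferBytes;
  }

  public static void main(String[] args) {
    long heapBytes = 2L * 1024 * 1024 * 1024; // Xmx=2g
    long bufferBytes = 33_554_432L;           // raft.server.log.appender.buffer.byte-limit

    // Prints "32 MiB per buffer"
    System.out.println(bufferBytes / (1024 * 1024) + " MiB per buffer");
    // Prints "64 buffers fill the heap"
    System.out.println(buffersThatFit(heapBytes, bufferBytes) + " buffers fill the heap");
  }
}
```

Under these assumptions a few dozen leaked workers are enough to exhaust the heap, and a larger Xmx only raises the number of leaked workers needed, which would be consistent with increasing it not helping.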

See the attached screenshot for the GC root.


