Posted to issues@ratis.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2022/11/13 07:49:00 UTC
[jira] [Created] (RATIS-1743) Memory leak in SegmentedRaftLogWorker due to metrics
Attila Doroszlai created RATIS-1743:
---------------------------------------
Summary: Memory leak in SegmentedRaftLogWorker due to metrics
Key: RATIS-1743
URL: https://issues.apache.org/jira/browse/RATIS-1743
Project: Ratis
Issue Type: Bug
Components: server
Affects Versions: 3.0.0, 2.4.1
Reporter: Attila Doroszlai
Attachments: Screenshot from 2022-11-12 22-17-11.png
An OutOfMemoryError (OOME) occurs in Ozone integration tests. The heap is currently limited with Xmx=2g, but increasing it does not help.
{code:title=https://github.com/adoroszlai/hadoop-ozone/actions/runs/3450185096/jobs/5761108630#step:5:3155}
[INFO] Running org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
Error: java.lang.OutOfMemoryError: Java heap space
Error: Tests run: 8, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 426.774 s <<< FAILURE! - in org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
{code}
{code}
java.lang.OutOfMemoryError: Java heap space
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.lambda$new$4(SegmentedRaftLogWorker.java:223)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$603/1771708635.get(Unknown Source)
    at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.write(SegmentedRaftLogOutputStream.java:101)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$WriteLog.execute(SegmentedRaftLogWorker.java:568)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:320)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$595/1626598428.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:750)
{code}
Ozone registers a JMX reporter (this is not new):
{code:title=https://github.com/apache/ozone/blob/a13c62b60556cd003ee2149179f72029d9e35756/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/http/RatisDropwizardExports.java#L51-L53}
MetricRegistries.global()
    .addReporterRegistration(MetricsReporting.jmxReporter(),
        MetricsReporting.stopJmxReporter());
{code}
Based on the heap dump and test log, {{SegmentedRaftLogWorker}} instances are still retained by {{JmxMBeanServer}} after {{close()}}.
The problem is probably not new, but its effect is much worse now because {{SegmentedRaftLogWorker}} recently got a shared buffer (RATIS-1717), so every retained instance also pins its buffer.
{code:title=config in Ozone}
raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
{code}
See screenshot for GC root.
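To illustrate the retention pattern described above, here is a minimal, self-contained sketch using only the standard {{javax.management}} API. It is not Ratis code and not a proposed patch: the class {{JmxRetentionSketch}}, the nested {{Worker}} / {{WorkerMBean}} types, the {{sketch:type=Worker}} object name and the helper {{registeredAfterClose}} are all hypothetical names chosen for the example. It only shows that an object registered with the MBean server stays registered (and therefore strongly reachable, together with its buffer) after {{close()}} unless {{close()}} explicitly unregisters it.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxRetentionSketch {
  /** Management interface; a standard MBean interface is named <class name> + "MBean". */
  public interface WorkerMBean {
    int getBufferSize();
  }

  /** Hypothetical stand-in for a worker that owns a large buffer and exposes a metric via JMX. */
  public static class Worker implements WorkerMBean, AutoCloseable {
    private final byte[] buffer;
    private final ObjectName objectName;
    private final boolean unregisterOnClose;

    public Worker(String id, int bufferSize, boolean unregisterOnClose) throws Exception {
      this.buffer = new byte[bufferSize];
      this.objectName = new ObjectName("sketch:type=Worker,id=" + id);
      this.unregisterOnClose = unregisterOnClose;
      // From here on the MBean server holds a strong reference to this worker.
      ManagementFactory.getPlatformMBeanServer().registerMBean(this, objectName);
    }

    @Override
    public int getBufferSize() {
      return buffer.length;
    }

    public ObjectName objectName() {
      return objectName;
    }

    @Override
    public void close() throws Exception {
      // Without this step the worker (and its buffer) stays reachable from the MBean server.
      if (unregisterOnClose) {
        ManagementFactory.getPlatformMBeanServer().unregisterMBean(objectName);
      }
    }
  }

  /** Creates a worker, closes it, and reports whether it is still registered afterwards. */
  public static boolean registeredAfterClose(String id, boolean unregisterOnClose) throws Exception {
    Worker worker = new Worker(id, 1024, unregisterOnClose);
    worker.close();
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    boolean registered = server.isRegistered(worker.objectName());
    if (registered) {
      server.unregisterMBean(worker.objectName()); // clean up so the sketch itself does not leak
    }
    return registered;
  }

  public static void main(String[] args) throws Exception {
    System.out.println("still registered without unregister: " + registeredAfterClose("leaky", false));
    System.out.println("still registered with unregister:    " + registeredAfterClose("fixed", true));
  }
}
```

In this sketch the worker registers itself directly. In the setup described above the registration goes through the metrics registry and its JMX reporter, so the equivalent cleanup would presumably be removing the worker's metrics from the registry on {{close()}}. That is an assumption about the fix, not something confirmed in this report.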
--
This message was sent by Atlassian Jira
(v8.20.10#820010)