Posted to issues@ratis.apache.org by "Tsz-wo Sze (Jira)" <ji...@apache.org> on 2022/11/14 01:39:00 UTC

[jira] [Assigned] (RATIS-1743) Memory leak in SegmentedRaftLogWorker due to metrics

     [ https://issues.apache.org/jira/browse/RATIS-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz-wo Sze reassigned RATIS-1743:
---------------------------------

    Assignee: Tsz-wo Sze

> Memory leak in SegmentedRaftLogWorker due to metrics
> ----------------------------------------------------
>
>                 Key: RATIS-1743
>                 URL: https://issues.apache.org/jira/browse/RATIS-1743
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.0.0, 2.4.1
>            Reporter: Attila Doroszlai
>            Assignee: Tsz-wo Sze
>            Priority: Blocker
>         Attachments: Screenshot from 2022-11-12 22-17-11.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> An OutOfMemoryError (OOME) occurs in Ozone integration tests. The heap is currently limited to -Xmx2g, but increasing it does not help.
> {code:title=https://github.com/adoroszlai/hadoop-ozone/actions/runs/3450185096/jobs/5761108630#step:5:3155}
> [INFO] Running org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
> Error:  java.lang.OutOfMemoryError: Java heap space
> Error:  Tests run: 8, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 426.774 s <<< FAILURE! - in org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
> {code}
> {code}
> java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.lambda$new$4(SegmentedRaftLogWorker.java:223)
> 	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$603/1771708635.get(Unknown Source)
> 	at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
> 	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.write(SegmentedRaftLogOutputStream.java:101)
> 	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$WriteLog.execute(SegmentedRaftLogWorker.java:568)
> 	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:320)
> 	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$595/1626598428.run(Unknown Source)
> 	at java.lang.Thread.run(Thread.java:750)
> {code}
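In this trace the OOME is thrown inside a lambda that is invoked through {{MemoizedSupplier.get}} from {{SegmentedRaftLogOutputStream.write}}. That suggests the shared buffer is allocated lazily on the first write, not when the worker is constructed. The sketch below shows that pattern. It uses a hypothetical {{Memoized}} class rather than the actual Ratis {{MemoizedSupplier}}, and a small hypothetical buffer size.

{code:java}
import java.nio.ByteBuffer;
import java.util.function.Supplier;

/** Sketch of lazy, one-time buffer allocation; Memoized is hypothetical, not Ratis's MemoizedSupplier. */
public class LazySharedBuffer {
    static final class Memoized<T> implements Supplier<T> {
        private final Supplier<T> initializer;
        private volatile T value;

        Memoized(Supplier<T> initializer) {
            this.initializer = initializer;
        }

        @Override
        public T get() {
            T v = value;
            if (v == null) {
                synchronized (this) {
                    v = value;
                    if (v == null) {
                        // The allocation (and any OutOfMemoryError) happens here, on first use.
                        v = initializer.get();
                        value = v;
                    }
                }
            }
            return v;
        }
    }

    static int allocations = 0;

    public static void main(String[] args) {
        int bufferSize = 1 << 20; // hypothetical 1 MiB, kept small for the demo
        Supplier<ByteBuffer> sharedBuffer = new Memoized<>(() -> {
            allocations++;
            return ByteBuffer.allocate(bufferSize); // heap buffer, matching "Java heap space"
        });
        System.out.println("allocations after construction: " + allocations);
        ByteBuffer first = sharedBuffer.get();  // first write triggers the allocation
        ByteBuffer second = sharedBuffer.get(); // later writes reuse the same buffer
        System.out.println("allocations after two writes: " + allocations);
        System.out.println("same buffer: " + (first == second));
    }
}
{code}

Because the allocation is deferred, the OOME is likely reported by whichever worker writes first after the heap fills up. That worker is not necessarily the one whose memory was leaked.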
> Ozone registers a JMX reporter (this is not new):
> {code:title=https://github.com/apache/ozone/blob/a13c62b60556cd003ee2149179f72029d9e35756/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/http/RatisDropwizardExports.java#L51-L53}
>     MetricRegistries.global()
>         .addReporterRegistration(MetricsReporting.jmxReporter(),
>             MetricsReporting.stopJmxReporter());
> {code}
> Based on the heap dump and test log, {{SegmentedRaftLogWorker}} instances are still retained by {{JmxMBeanServer}} after {{close()}}.
> The problem is probably not new, but its effect is much worse now because {{SegmentedRaftLogWorker}} recently got a shared buffer (RATIS-1717), which stays reachable along with each retained worker.
> {code:title=config in Ozone}
> raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
> {code}
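The following is a back-of-the-envelope estimate. It assumes, for illustration only, that each retained worker holds one buffer of roughly the configured byte limit; the real Ratis buffer size may differ slightly. {{LeakBudget}} and {{workersToExhaustHeap}} are hypothetical names.

{code:java}
/** Rough heap estimate; assumes one buffer of byte-limit size per retained worker. */
public class LeakBudget {
    static final long BYTE_LIMIT = 33_554_432L;           // raft.server.log.appender.buffer.byte-limit in Ozone (32 MiB)
    static final long MAX_HEAP = 2L * 1024 * 1024 * 1024; // -Xmx2g

    /** Number of leaked workers whose buffers alone would fill the given heap. */
    static long workersToExhaustHeap(long maxHeap, long bufferBytes) {
        return maxHeap / bufferBytes;
    }

    public static void main(String[] args) {
        System.out.println("buffer size (MiB): " + BYTE_LIMIT / (1024 * 1024));
        System.out.println("workers to fill -Xmx2g: " + workersToExhaustHeap(MAX_HEAP, BYTE_LIMIT));
        System.out.println("workers to fill -Xmx4g: " + workersToExhaustHeap(2 * MAX_HEAP, BYTE_LIMIT));
    }
}
{code}

Under this assumption, 64 leaked workers are enough to fill a 2 GiB heap. If workers keep leaking as tests start and stop clusters, doubling the heap only postpones the failure (to about 128 workers). That would be consistent with the observation that increasing Xmx does not help.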
> See the attached screenshot for the GC root.
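Here is a simplified model of the suspected retention path. It assumes the metrics are registered as gauge lambdas that capture the worker, and that they are released only when explicitly removed. {{GLOBAL_GAUGES}}, {{Worker}} and the gauge name are hypothetical. They stand in for the global metric registry exported by the JMX reporter and are not Ratis or Ozone APIs.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical model of a metrics-driven leak; none of these names are Ratis or Ozone APIs. */
public class GaugeLeakDemo {
    /** Stand-in for a process-wide registry that outlives individual workers. */
    static final Map<String, Supplier<Long>> GLOBAL_GAUGES = new ConcurrentHashMap<>();

    static final class Worker implements AutoCloseable {
        private final String name;
        private final byte[] sharedBuffer = new byte[1024]; // small stand-in for the shared write buffer
        private final boolean unregisterOnClose;
        private long flushCount;

        Worker(String name, boolean unregisterOnClose) {
            this.name = name;
            this.unregisterOnClose = unregisterOnClose;
            // The lambda reads an instance field, so it captures 'this': while the gauge is
            // registered, the whole Worker (including sharedBuffer) stays strongly reachable.
            GLOBAL_GAUGES.put(name + ".flushCount", () -> flushCount);
        }

        @Override
        public void close() {
            if (unregisterOnClose) {
                GLOBAL_GAUGES.remove(name + ".flushCount"); // drops the registry's reference
            }
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            new Worker("leaky-" + i, false).close();
        }
        for (int i = 0; i < 3; i++) {
            new Worker("clean-" + i, true).close();
        }
        // After close(), only the workers that never unregistered remain reachable.
        System.out.println("gauges still registered: " + GLOBAL_GAUGES.size());
    }
}
{code}

The sketch illustrates the design point: {{close()}} has to undo every registration made in the constructor. Otherwise everything the registry can reach, including a large buffer, outlives the object.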
> CC [~szetszwo], [~William Song]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)