Posted to issues@ratis.apache.org by "Song Ziyang (Jira)" <ji...@apache.org> on 2022/11/13 11:30:00 UTC

[jira] [Comment Edited] (RATIS-1743) Memory leak in SegmentedRaftLogWorker due to metrics

    [ https://issues.apache.org/jira/browse/RATIS-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633276#comment-17633276 ] 

Song Ziyang edited comment on RATIS-1743 at 11/13/22 11:29 AM:
---------------------------------------------------------------

It seems that there is a reference leak of SegmentedRaftLogWorker. As a workaround, we can manually set the sharedBuffer to null in the close() hook. What do you think? [~adoroszlai] [~szetszwo] 
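The workaround above could look roughly like the minimal Java sketch below. It assumes a simplified worker class: {{LogWorkerSketch}}, {{getSharedBuffer()}} and {{hasBuffer()}} are hypothetical names, not the Ratis API. In the real worker the buffer sits behind a {{MemoizedSupplier}} (see the stack trace in the issue description), so that field would have to become resettable for this approach to work.

```java
/**
 * Hypothetical, simplified stand-in for SegmentedRaftLogWorker (not the Ratis
 * source): it lazily allocates a large buffer and drops the reference in close().
 */
class LogWorkerSketch implements AutoCloseable {
  private final int bufferSize;
  // Non-final so that close() can release it; assumed to replace the
  // memoized supplier used by the real worker.
  private byte[] sharedBuffer;
  private boolean closed;

  LogWorkerSketch(int bufferSize) {
    this.bufferSize = bufferSize;
  }

  /** Allocates the buffer on first use and reuses it afterwards. */
  synchronized byte[] getSharedBuffer() {
    if (closed) {
      throw new IllegalStateException("worker is closed");
    }
    if (sharedBuffer == null) {
      sharedBuffer = new byte[bufferSize];
    }
    return sharedBuffer;
  }

  /** Reports whether this worker still references a buffer. */
  synchronized boolean hasBuffer() {
    return sharedBuffer != null;
  }

  @Override
  public synchronized void close() {
    closed = true;
    // If a metrics/JMX registry still references this worker after close(),
    // only the small worker object stays reachable; the buffer becomes
    // eligible for garbage collection.
    sharedBuffer = null;
  }
}
```

This only limits the damage: the worker object itself is still leaked, so whatever keeps referencing it after close() should still be found and released.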


was (Author: JIRAUSER281912):
It seems that there is a reference leak of SegmentedRaftLogWorker. As a workaround, I can manually set the sharedBuffer to null in the close() hook. What do you think? [~adoroszlai] [~szetszwo] 

> Memory leak in SegmentedRaftLogWorker due to metrics
> ----------------------------------------------------
>
>                 Key: RATIS-1743
>                 URL: https://issues.apache.org/jira/browse/RATIS-1743
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.0.0, 2.4.1
>            Reporter: Attila Doroszlai
>            Priority: Blocker
>         Attachments: Screenshot from 2022-11-12 22-17-11.png
>
>
> An OutOfMemoryError (OOME) occurs in Ozone integration tests. The maximum heap is currently set to -Xmx2g, but increasing it does not help.
> {code:title=https://github.com/adoroszlai/hadoop-ozone/actions/runs/3450185096/jobs/5761108630#step:5:3155}
> [INFO] Running org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
> Error:  java.lang.OutOfMemoryError: Java heap space
> Error:  Tests run: 8, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 426.774 s <<< FAILURE! - in org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
> {code}
> {code}
> java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.lambda$new$4(SegmentedRaftLogWorker.java:223)
> 	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$603/1771708635.get(Unknown Source)
> 	at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
> 	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.write(SegmentedRaftLogOutputStream.java:101)
> 	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$WriteLog.execute(SegmentedRaftLogWorker.java:568)
> 	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:320)
> 	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$595/1626598428.run(Unknown Source)
> 	at java.lang.Thread.run(Thread.java:750)
> {code}
> Ozone registers a JMX reporter (this is not new):
> {code:title=https://github.com/apache/ozone/blob/a13c62b60556cd003ee2149179f72029d9e35756/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/http/RatisDropwizardExports.java#L51-L53}
>     MetricRegistries.global()
>         .addReporterRegistration(MetricsReporting.jmxReporter(),
>             MetricsReporting.stopJmxReporter());
> {code}
> Based on the heap dump and test log, {{SegmentedRaftLogWorker}} instances are retained by JmxMBeanServer after {{close()}}.
> The problem is probably not new, but its effect is much worse now because {{SegmentedRaftLogWorker}} recently gained a shared buffer (RATIS-1717), which every retained instance keeps alive.
> {code:title=config in Ozone}
> raft.server.log.appender.buffer.byte-limit = 33554432 (32 MiB, custom)
> {code}
> See the attached screenshot for the GC root.
> CC [~szetszwo], [~William Song]
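The retention chain described in the report ({{SegmentedRaftLogWorker}} instances retained by JmxMBeanServer after {{close()}}) can be illustrated with a minimal Java sketch. It assumes, for illustration only, that the worker publishes a gauge whose lambda captures the worker and that the gauge lives in a static, process-wide registry. {{GlobalRegistrySketch}} and {{MetricsWorkerSketch}} are hypothetical and do not model the actual Ratis, Dropwizard or JMX classes; the real GC root is the one shown in the attached screenshot.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical stand-in for a process-wide metrics registry (such as one
 * exported to JMX). It is static, so anything it references stays reachable
 * for the lifetime of the JVM.
 */
final class GlobalRegistrySketch {
  static final Map<String, Supplier<Object>> GAUGES = new ConcurrentHashMap<>();

  private GlobalRegistrySketch() {}
}

/** Hypothetical worker that publishes a gauge whose lambda captures "this". */
class MetricsWorkerSketch implements AutoCloseable {
  private final String name;
  private final boolean unregisterOnClose;
  // Stands in for the large shared buffer.
  private final byte[] buffer;
  private int pendingWrites;

  MetricsWorkerSketch(String name, int bufferSize, boolean unregisterOnClose) {
    this.name = name;
    this.unregisterOnClose = unregisterOnClose;
    this.buffer = new byte[bufferSize];
    // The lambda reads an instance field, so it holds a reference to this
    // worker; while the gauge is registered, the worker and its buffer stay
    // reachable from the static registry.
    GlobalRegistrySketch.GAUGES.put(name + ".pendingWrites", () -> pendingWrites);
  }

  int bufferSize() {
    return buffer.length;
  }

  @Override
  public void close() {
    if (unregisterOnClose) {
      // Removing the gauge breaks the registry -> lambda -> worker chain.
      GlobalRegistrySketch.GAUGES.remove(name + ".pendingWrites");
    }
  }
}
```

As rough arithmetic, assuming each retained worker pins a buffer of about {{raft.server.log.appender.buffer.byte-limit}} bytes (the report implies this but does not state it), a 2 GiB heap is exhausted by about 2048 MiB / 32 MiB = 64 leaked workers, and raising -Xmx only postpones the OOME while workers keep leaking.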


