Posted to issues@ratis.apache.org by "Shashikant Banerjee (Jira)" <ji...@apache.org> on 2020/05/15 09:06:00 UTC

[jira] [Resolved] (RATIS-845) Memory leak of RaftServerImpl for no unregister from reporter

     [ https://issues.apache.org/jira/browse/RATIS-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee resolved RATIS-845.
---------------------------------------
    Fix Version/s: 0.6.0
       Resolution: Fixed

Thanks [~yjxxtd] for working on this. I have committed this.

> Memory leak of RaftServerImpl for no unregister from reporter
> -------------------------------------------------------------
>
>                 Key: RATIS-845
>                 URL: https://issues.apache.org/jira/browse/RATIS-845
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>             Fix For: 0.6.0
>
>         Attachments: screenshot-10.png, screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png, screenshot-6.png, screenshot-7.png, screenshot-8.png, screenshot-9.png
>
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> *What's the problem?*
> As the screenshots show, there are 1885 instances of RaftServerImpl. Most of them are closed and should have been garbage collected, but they were not. The screenshots also show that
>  1513 RaftServerImpl instances are held by ManagementFactory->JmxMBeanServer->HashMap and 372 are held by Datanode ReportManager Thread -> prometheus -> HashMap, so 1513 instances leak in Ratis and 372 leak in Ozone. As long as a RaftServerImpl cannot be garbage collected, many of the resources it references cannot be collected either, such as the [DirectByteBuffer|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogWorker.java#L150] in SegmentedRaftLogWorker, which results in a 1 GB off-heap memory leak.
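Why a leaked RaftServerImpl also costs off-heap memory can be illustrated with plain JDK APIs. The sketch below is hypothetical: `DirectBufferRetentionSketch`, the static `RETAINED` list (standing in for a long-lived HashMap like the ones in the heap dump) and the 4 MiB buffer size are illustrative assumptions, not Ratis code. It reads the JVM's "direct" buffer pool through `BufferPoolMXBean` to show that memory from `ByteBuffer.allocateDirect` stays reserved while the buffer is reachable; in the JDK that native memory is normally released only after the buffer object itself is garbage collected.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DirectBufferRetentionSketch {

  // Hypothetical stand-in for a long-lived map (like the JMX or Prometheus HashMap
  // in the report) that keeps strong references to objects owning direct buffers.
  static final List<ByteBuffer> RETAINED = new ArrayList<>();

  /** Finds the JVM's "direct" buffer pool, which tracks ByteBuffer.allocateDirect memory. */
  static BufferPoolMXBean directPool() {
    for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
      if ("direct".equals(pool.getName())) {
        return pool;
      }
    }
    throw new IllegalStateException("direct buffer pool not found");
  }

  /** Allocates one off-heap buffer, keeps it reachable, and returns the growth in direct capacity. */
  static long allocateAndRetain(int bytes) {
    BufferPoolMXBean pool = directPool();
    long before = pool.getTotalCapacity();
    RETAINED.add(ByteBuffer.allocateDirect(bytes));
    return pool.getTotalCapacity() - before;
  }

  public static void main(String[] args) {
    // Ten hypothetical "closed servers" whose 4 MiB buffers stay reachable through RETAINED.
    long total = 0;
    for (int i = 0; i < 10; i++) {
      total += allocateAndRetain(4 << 20);
    }
    System.out.println("direct capacity still held: " + (total >> 20) + " MiB");
  }
}
```

Removing an entry from such a map does not free the memory immediately either; it only makes the buffer eligible for collection, after which the JVM can release the native memory.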
> h3. *{color:#DE350B}1. 1885 instances of RaftServerImpl{color}*
>  !screenshot-4.png! 
> h3. *{color:#DE350B}2. 1513 RaftServerImpl instances are held by ManagementFactory->JmxMBeanServer->HashMap; 372 are held by Datanode ReportManager Thread -> prometheus -> HashMap{color}*
>  !screenshot-5.png! 
> h3. *{color:#DE350B}3. 1513 RaftServerImpl instances are held by ManagementFactory->JmxMBeanServer->HashMap{color}*
>  !screenshot-6.png! 
> h3. *{color:#DE350B}4. 372 RaftServerImpl instances are held by Datanode ReportManager Thread -> prometheus -> HashMap{color}*
>  !screenshot-7.png! 
> h3. *{color:#DE350B}5. 2038 DirectByteBuffer instances, 1885 of them held by RaftServerImpl{color}*
>  !screenshot-8.png! 
>  !screenshot-9.png! 
> h3. *{color:#DE350B}6. 1033 DirectByteBuffer instances are held by ManagementFactory and 802 by Datanode ReportManager Thread, total 1885{color}*
>  !screenshot-10.png! 
> h3. *{color:#DE350B}7. RaftServerImpl is held by ManagementFactory->JmxMBeanServer->HashMap because Ratis starts a [JmxReporter|https://github.com/apache/incubator-ratis/blob/master/ratis-metrics/src/main/java/org/apache/ratis/metrics/MetricsReporting.java#L47] but never stops it.{color}*
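The retention path in item 7 can be reproduced with the platform MBeanServer from `java.lang.management` alone. This is a minimal sketch under stated assumptions: it does not use the reporter that Ratis starts, and `JmxLifecycleSketch`, `ServerMetrics`, `ServerMetricsMBean` and the `sketch.ratis` ObjectName domain are hypothetical names. It shows the pairing the fix calls for: whatever registers an object with the JVM-wide MBeanServer also unregisters it when the owning server closes, because until then the MBeanServer keeps a strong reference to it.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxLifecycleSketch {

  /** Standard MBean interface: JMX requires it to be public and named after the class plus "MBean". */
  public interface ServerMetricsMBean {
    long getAppendCount();
  }

  /** Hypothetical per-server metrics object exposed over JMX while the server is open. */
  public static class ServerMetrics implements ServerMetricsMBean, AutoCloseable {
    private final MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    private final AtomicLong appendCount = new AtomicLong();
    private final ObjectName name;

    public ServerMetrics(String serverId) throws JMException {
      this.name = new ObjectName("sketch.ratis:type=ServerMetrics,id=" + serverId);
      // From here on the platform MBeanServer holds a strong reference to this object.
      mbs.registerMBean(this, name);
    }

    public void recordAppend() {
      appendCount.incrementAndGet();
    }

    @Override
    public long getAppendCount() {
      return appendCount.get();
    }

    public ObjectName getObjectName() {
      return name;
    }

    @Override
    public void close() throws JMException {
      // Without this call the MBeanServer keeps the object, and everything it references, reachable.
      if (mbs.isRegistered(name)) {
        mbs.unregisterMBean(name);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    ServerMetrics metrics = new ServerMetrics("s1");
    ObjectName name = metrics.getObjectName();
    try {
      metrics.recordAppend();
      System.out.println("registered while open: " + mbs.isRegistered(name));
    } finally {
      metrics.close();
    }
    System.out.println("registered after close: " + mbs.isRegistered(name));
  }
}
```

Running `main` should report the MBean as registered while the server is open and as no longer registered after `close()`.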
> h3. *{color:#DE350B}8. RaftServerImpl is held by Datanode ReportManager Thread -> prometheus -> HashMap because Ozone calls a Ratis function to [register|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java#L189] metrics with Prometheus but never unregisters them.{color}*
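Item 8 has the same shape on the Ozone side: the code that registers a server's metrics is not the code that closes the server, so the matching unregistration is easy to forget. One hedged way to make the pairing hard to miss, sketched here with hypothetical names (`MetricSourceRegistry`, `Registration`, `Server`) rather than the real Ozone, Ratis or Prometheus APIs, is to have `register` return a handle that the owner closes together with itself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RegistrationHandleSketch {

  /** Hypothetical handle: close() undoes one registration and throws no checked exception. */
  interface Registration extends AutoCloseable {
    @Override
    void close();
  }

  /** Hypothetical process-wide registry, standing in for the reporter's HashMap. */
  static final class MetricSourceRegistry {
    private final Map<String, Object> sources = new ConcurrentHashMap<>();

    Registration register(String id, Object source) {
      if (sources.putIfAbsent(id, source) != null) {
        throw new IllegalStateException("already registered: " + id);
      }
      // Remove only this exact mapping when the handle is closed.
      return () -> sources.remove(id, source);
    }

    int size() {
      return sources.size();
    }
  }

  /** Hypothetical server that owns its metrics registration. */
  static final class Server implements AutoCloseable {
    private final Registration metrics;

    Server(MetricSourceRegistry registry, String id) {
      this.metrics = registry.register(id, this);
    }

    @Override
    public void close() {
      // Drops the registry's strong reference, so a closed server can be garbage collected.
      metrics.close();
    }
  }

  public static void main(String[] args) {
    MetricSourceRegistry registry = new MetricSourceRegistry();
    for (int i = 0; i < 1000; i++) {
      try (Server server = new Server(registry, "group-" + i)) {
        // ... the server would handle requests here ...
      }
    }
    // prints "sources still registered: 0"
    System.out.println("sources still registered: " + registry.size());
  }
}
```

Using `remove(id, source)` rather than `remove(id)` is a deliberate choice: a stale handle cannot drop a newer source that happens to reuse the same id.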



--
This message was sent by Atlassian Jira
(v8.3.4#803005)