Posted to issues@geode.apache.org by "Darrel Schneider (Jira)" <ji...@apache.org> on 2021/12/09 18:43:00 UTC

[jira] [Resolved] (GEODE-9854) Orphaned .drf files causing memory leak

     [ https://issues.apache.org/jira/browse/GEODE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Darrel Schneider resolved GEODE-9854.
-------------------------------------
    Fix Version/s: 1.15.0
         Assignee: Jakov Varenina  (was: Darrel Schneider)
       Resolution: Fixed

> Orphaned .drf files causing memory leak
> ---------------------------------------
>
>                 Key: GEODE-9854
>                 URL: https://issues.apache.org/jira/browse/GEODE-9854
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Jakov Varenina
>            Assignee: Jakov Varenina
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>         Attachments: screenshot-1.png, screenshot-2.png, server1.log
>
>
> Issue:
> An Oplog file is compacted, but its .drf file is left behind because it contains deletes of entries in previous .crfs. The .crf file is deleted, but the orphaned .drf is not deleted until all
> previous .crf files (.crfs with a smaller id) are deleted.
> The problem is that the compacted Oplog object representing the orphaned .drf file holds a structure in memory (Oplog.regionMap) that contains information that is no longer useful
> after the compaction, and it takes up a certain amount of memory. In addition, there is a race condition in the code that creates .krf files which, depending on the execution order,
> could make the problem more severe (it could leave the pendingKrfTags structure on the regionMap, and this could take up a significant amount of memory). This
> pendingKrfTags HashMap is actually empty, but it consumes memory because it was used previously and the capacity of the HashMap is not reduced when it is cleared.
> This race condition usually happens when a new Oplog is rolled out and the previous Oplog is immediately marked as eligible for compaction. Compaction and .krf creation start at
> about the same time, and the compactor cancels creation of the .krf if it executes first. The pendingKrfTags structure is usually cleared when the .krf file is created, but since compaction canceled creation of the .krf, the pendingKrfTags structure remains in memory until the Oplog representing the orphaned .drf file is deleted.
> The log below shows that the .krf is never created for the orphaned .drf Oplog object that has memory allocated in pendingKrfTags:
> {code:java}
> server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1 <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#129 drf for disk store store1.
> server1.log:1958:[info 2021/11/25 21:52:26.867 CET server1 <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#129 crf for disk store store1.
> server1.log:1974:[info 2021/11/25 21:52:39.490 CET server1 <OplogCompactor store1 for oplog oplog#129> tid=0x5c] OplogCompactor for store1 compaction oplog id(s): oplog#129
> server1.log:1980:[info 2021/11/25 21:52:39.532 CET server1 <OplogCompactor store1 for oplog oplog#129> tid=0x5c] compaction did 3685 creates and updates in 41 ms
> server1.log:1982:[info 2021/11/25 21:52:39.532 CET server1 <Oplog Delete Task4> tid=0x5d] Deleted oplog#129 crf for disk store store1.
> {code}
> !screenshot-1.png|width=1123,height=268!
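The observation that an empty pendingKrfTags HashMap still consumes memory follows from how java.util.HashMap behaves in OpenJDK: clear() removes the entries but keeps the bucket array at the size it had grown to. The sketch below illustrates the difference between clearing a map in place and dropping the reference to it. The class PendingTagsHolder and its methods are hypothetical names used for illustration only; this is not Geode's code and not necessarily how the issue was fixed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an object that owns a pendingKrfTags-like map.
// Names are illustrative assumptions, not Geode classes or fields.
class PendingTagsHolder {
    private Map<Long, String> pendingKrfTags = new HashMap<>();

    void add(long key, String tag) {
        pendingKrfTags.put(key, tag);
    }

    int size() {
        return pendingKrfTags.size();
    }

    // Empties the map in place. In OpenJDK the internal bucket array
    // keeps the capacity it grew to, so the memory stays allocated.
    Map<Long, String> clearInPlace() {
        pendingKrfTags.clear();
        return pendingKrfTags;
    }

    // Replaces the map with a fresh, small one. The old map, including
    // its large bucket array, becomes unreachable and can be collected.
    Map<Long, String> release() {
        pendingKrfTags = new HashMap<>();
        return pendingKrfTags;
    }

    public static void main(String[] args) {
        PendingTagsHolder holder = new PendingTagsHolder();
        for (long i = 0; i < 100_000; i++) {
            holder.add(i, "tag-" + i);
        }
        Map<Long, String> cleared = holder.clearInPlace();
        Map<Long, String> fresh = holder.release();
        System.out.println("cleared is empty: " + cleared.isEmpty());
        System.out.println("same instance after release: " + (cleared == fresh));
    }
}
```

Either variant leaves the holder with an empty map; only the second lets the previously grown bucket array be reclaimed while the owning object stays alive.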
> Below you can see the log and heap dump of an orphaned .drf Oplog that does not have pendingKrfTags allocated in memory. This is because pendingKrfTags is cleared when the .krf is created, as can be seen in the logs below.
> {code:java}
> server1.log:1976:[info 2021/11/25 21:52:39.491 CET server1 <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#130 drf for disk store store1.
> server1.log:1978:[info 2021/11/25 21:52:39.493 CET server1 <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#130 crf for disk store store1.
> server1.log:1998:[info 2021/11/25 21:52:41.131 CET server1 <Idle OplogCompactor> tid=0x5c] Created oplog#130 krf for disk store store1.
> server1.log:2000:[info 2021/11/25 21:52:41.893 CET server1 <OplogCompactor store1 for oplog oplog#130> tid=0x5c] OplogCompactor for store1 compaction oplog id(s): oplog#130
> server1.log:2002:[info 2021/11/25 21:52:41.958 CET server1 <OplogCompactor store1 for oplog oplog#130> tid=0x5c] compaction did 9918 creates and updates in 64 ms
> server1.log:2004:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete Task4> tid=0x5d] Deleted oplog#130 crf for disk store store1.
> server1.log:2006:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete Task4> tid=0x5d] Deleted oplog#130 krf for disk store store1.
> {code}
> !screenshot-2.png|width=1123,height=268!
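
The two log excerpts above differ in one visible way: oplog#130 has a "Created ... krf" line before its .crf is deleted, while oplog#129 does not. A rough way to look for that pattern in a server log is sketched below. It assumes the message wording shown in the excerpts ("Created oplog#N krf for disk store", "Deleted oplog#N crf for disk store"); the class and method names are hypothetical, and a match is only a hint that the race may have occurred, not proof that pendingKrfTags is still allocated.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for scanning log lines; not part of Geode.
class OplogLogScan {
    // Matches the "Created"/"Deleted" oplog messages seen in the excerpts.
    private static final Pattern EVENT =
        Pattern.compile("(Created|Deleted) oplog#(\\d+) (crf|drf|krf) for disk store");

    // Returns the ids of oplogs whose .crf was deleted without any
    // earlier "Created ... krf" line for the same oplog id.
    static List<Long> crfDeletedWithoutKrf(List<String> lines) {
        Set<Long> krfCreated = new HashSet<>();
        List<Long> suspects = new ArrayList<>();
        for (String line : lines) {
            Matcher m = EVENT.matcher(line);
            if (!m.find()) {
                continue; // not an oplog create/delete message
            }
            String action = m.group(1);
            long id = Long.parseLong(m.group(2));
            String kind = m.group(3);
            if (action.equals("Created") && kind.equals("krf")) {
                krfCreated.add(id);
            } else if (action.equals("Deleted") && kind.equals("crf")
                    && !krfCreated.contains(id)) {
                suspects.add(id);
            }
        }
        return suspects;
    }
}
```

Fed the lines from both excerpts, crfDeletedWithoutKrf would report oplog 129 and skip oplog 130.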



--
This message was sent by Atlassian Jira
(v8.20.1#820001)