Posted to oak-issues@jackrabbit.apache.org by "Vikas Saurabh (JIRA)" <ji...@apache.org> on 2017/02/27 13:11:45 UTC

[jira] [Commented] (OAK-3287) DocumentMK revision GC

    [ https://issues.apache.org/jira/browse/OAK-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885761#comment-15885761 ] 

Vikas Saurabh commented on OAK-3287:
------------------------------------

_Adding a comment here as it's related to 2 issues under this epic - OAK-3070 and OAK-5704_
The issue that both of them deal with is that version GC might get false positives due to resurrection of documents during the collection phase. That requires an unnecessary scan of documents which aren't meant to be deleted.
OAK-3070 proposes an approach to have a lower bound on the documents that are scanned for collection. This lower bound relies on the time that the last successful revGc run would have committed to the {{settings}} collection.
OAK-5704 proposes to reset the {{_deletedOnce}} flag of false positives as the next step of the collection phase, so that the next run of revGc won't collect them.
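To make the difference concrete, here is a rough, in-memory sketch of the two ideas. It is not Oak code: {{Doc}}, {{candidatesWithLowerBound}}, {{resetFalsePositives}} and {{countFlagged}} are hypothetical names, and applying the lower bound to a last-modified timestamp is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these types or methods exist in Oak.
final class RevGcSketch {

    /** Hypothetical stand-in for a node document. */
    static final class Doc {
        final String id;
        boolean deletedOnce; // models the _deletedOnce flag
        boolean deletedNow;  // whether the node is really deleted when GC looks at it
        long modified;       // last-modified time in arbitrary units (assumption)

        Doc(String id, boolean deletedOnce, boolean deletedNow, long modified) {
            this.id = id;
            this.deletedOnce = deletedOnce;
            this.deletedNow = deletedNow;
            this.modified = modified;
        }
    }

    /**
     * OAK-3070 style: keep the flags untouched, but only scan flagged
     * documents modified at or after the time recorded by the last
     * successful GC run. Older false positives are simply skipped.
     */
    static List<Doc> candidatesWithLowerBound(List<Doc> docs, long lastSuccessfulGc) {
        List<Doc> out = new ArrayList<>();
        for (Doc d : docs) {
            if (d.deletedOnce && d.modified >= lastSuccessfulGc) {
                out.add(d);
            }
        }
        return out;
    }

    /**
     * OAK-5704 style: scan every flagged document and write the flag back
     * as false for those that turn out not to be deleted (false positives).
     * Returns the number of flags that were reset.
     */
    static int resetFalsePositives(List<Doc> docs) {
        int reset = 0;
        for (Doc d : docs) {
            if (d.deletedOnce && !d.deletedNow) {
                d.deletedOnce = false;
                reset++;
            }
        }
        return reset;
    }

    /** Number of documents still carrying the flag (a proxy for index size). */
    static long countFlagged(List<Doc> docs) {
        long n = 0;
        for (Doc d : docs) {
            if (d.deletedOnce) {
                n++;
            }
        }
        return n;
    }
}
```

In this toy model the lower bound leaves a resurrected document flagged and merely avoids rescanning it, while the reset removes the flag itself, so the number of flagged documents shrinks.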

In terms of logic, OAK-5704 is clearly the better approach, as it writes back data that represents a more accurate reality (not in the English meaning of "Once", of course, but in how we intended to use it). As a secondary (yet pretty big) benefit, it reduces the index size consumed by the flag. That being said, it breaks the implicit assumption that the flag is set once and then never touched again.
Otoh, OAK-3070 takes a simpler approach to evade false positives, given the data as it is being committed today.
Also, a difference between OAK-5704 and OAK-3070 is that OAK-3070 requires the previous revGc run to complete in order to evade the false positives detected in that run. Otoh, OAK-5704 would clean up the flag after the first scan, so even a partial previous revGc run would help in reducing/evading false positives.

My only concern with OAK-5704 is that it's a bit too intrusive, and I'm not comfortable with it getting backported until it gets some bake time. So, what I'm proposing is that we do both OAK-5704 and OAK-3070 in trunk (although OAK-3070 is kind of useless in the presence of OAK-5704) and backport only OAK-3070.
The rationale is that the biggest drawbacks of OAK-3070 compared to OAK-5704, namely (1) requiring a complete revGc run before it gives benefits and (2) not reducing the {{_deletedOnce}} index size, might not be the biggest problems to solve atm.

/cc [~mreutegg], [~reschke], [~stefan.eissing], [~chetanm]

> DocumentMK revision GC
> ----------------------
>
>                 Key: OAK-3287
>                 URL: https://issues.apache.org/jira/browse/OAK-3287
>             Project: Jackrabbit Oak
>          Issue Type: Epic
>          Components: documentmk, mongomk, rdbmk
>            Reporter: Michael Marth
>            Assignee: Marcel Reutegger
>             Fix For: 1.8
>
>
> Collector for various tasks on implementing DocMK revision GC



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)