Posted to oak-issues@jackrabbit.apache.org by "Vikas Saurabh (JIRA)" <ji...@apache.org> on 2015/12/02 18:37:11 UTC

[jira] [Comment Edited] (OAK-3710) Continuous revision GC

    [ https://issues.apache.org/jira/browse/OAK-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036140#comment-15036140 ] 

Vikas Saurabh edited comment on OAK-3710 at 12/2/15 5:36 PM:
-------------------------------------------------------------

[~mreutegg], I discussed this further with [~chetanm], and it seems we might be able to reduce the number of document writes during 'rewrite commit entries (step 3.2)' if we introduce some sort of early document rewrite attached to lastRev updates. Chetan had concerns about slowing down the background write, so we might want to do this in a separate thread with a queue of docs, similar to pending-last-revs.
The idea is that a document whose lastRev is about to be updated would also be scanned for revisions from the same cluster node that are older than the lastRev being written, i.e. for a lastRev update of r-0-2=r-X-2, we can clean up properties with revisions r-Y-2 where Y<X. We could rewrite _commitRoot and _revision at the same time.

This won't remove the need to scan all documents (under the correct _modified condition), but it should reduce the number of documents that need to be updated. I'll open a new issue if this makes sense. (This might be similar to OAK-3716, by the way, except that waiting for candidates to split would have fewer documents that get _commitRoot moved to _revision.)
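
To make the selection rule concrete, here is a minimal sketch in Java. It is not Oak code: the Rev class, the r-<timestamp>-<clusterId> shape and the cleanupCandidates method are hypothetical simplifications for illustration only. It just picks out the revisions in a document that come from the same cluster node as the lastRev being written and are older than it (r-Y-2 with Y<X). Whether a candidate can really be removed or rewritten depends on commit state and visibility, which is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; not Oak's actual Revision or NodeDocument API.
public class LastRevCleanupSketch {

    /** Simplified revision of the form r-<timestamp>-<clusterId>, e.g. r-5-2. */
    static final class Rev {
        final long timestamp;
        final int clusterId;

        Rev(long timestamp, int clusterId) {
            this.timestamp = timestamp;
            this.clusterId = clusterId;
        }

        @Override
        public String toString() {
            return "r-" + timestamp + "-" + clusterId;
        }
    }

    /**
     * Returns the revisions found in a document that were created by the same
     * cluster node as newLastRev and are older than it (r-Y-c with Y < X).
     * These are only candidates for cleanup; no commit check is done here.
     */
    static List<Rev> cleanupCandidates(List<Rev> revisionsInDoc, Rev newLastRev) {
        List<Rev> candidates = new ArrayList<>();
        for (Rev r : revisionsInDoc) {
            boolean sameClusterNode = r.clusterId == newLastRev.clusterId;
            boolean older = r.timestamp < newLastRev.timestamp;
            if (sameClusterNode && older) {
                candidates.add(r);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<Rev> inDoc = new ArrayList<>();
        inDoc.add(new Rev(3, 2));
        inDoc.add(new Rev(7, 1)); // other cluster node, never a candidate
        inDoc.add(new Rev(5, 2));
        inDoc.add(new Rev(9, 2)); // equal to the new lastRev, not older
        // lastRev for cluster node 2 is being updated to r-9-2
        System.out.println(cleanupCandidates(inDoc, new Rev(9, 2)));
        // prints [r-3-2, r-5-2]
    }
}
```

In the proposal this check would run off the background-write path, for example from a queue of documents drained by a separate thread, so that lastRev updates themselves are not slowed down.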


was (Author: catholicon):
[~mreutegg], I discussed this further with [~chetanm], and it seems we might be able to reduce the number of document writes during 'rewrite commit entries (step 3.2)' if we introduce some sort of early document rewrite attached to lastRev updates. Chetan had concerns about slowing down the background write, so we might want to do this in a separate thread with a queue of docs, similar to pending-last-revs.
The idea is that a document whose lastRev is about to be updated would also be scanned for revisions from the same cluster node that are older than the lastRev being written, i.e. for a lastRev update of r-0-2=r-X-2, we can clean up properties with revisions r-Y-2 where Y<X. We could rewrite _commitRoot and _revision at the same time.

This won't remove the need to scan all documents (under the correct _modified condition), but it should reduce the number of documents that need to be updated. I'll open a new issue if this makes sense.

> Continuous revision GC
> ----------------------
>
>                 Key: OAK-3710
>                 URL: https://issues.apache.org/jira/browse/OAK-3710
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: core, documentmk
>            Reporter: Marcel Reutegger
>
> Implement continuous revision GC cleaning up documents older than a given threshold (e.g. one day). This issue is related to OAK-3070 where each GC run starts where the last one finished.
> This will avoid peak load on the system as we see it right now, when GC is triggered once a day.


