Posted to oak-issues@jackrabbit.apache.org by "Stefan Eissing (JIRA)" <ji...@apache.org> on 2017/03/03 13:51:45 UTC

[jira] [Comment Edited] (OAK-4780) VersionGarbageCollector should be able to run incrementally

    [ https://issues.apache.org/jira/browse/OAK-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894387#comment-15894387 ] 

Stefan Eissing edited comment on OAK-4780 at 3/3/17 1:51 PM:
-------------------------------------------------------------

Updated my GitHub clone with the following:
 * configure {{maxIterations}}, the maximum number of iterations a gc run is allowed to make (default 0 == no limit)
 * configure {{maxDuration}}, the maximum time a gc run may take (default 0 == no limit)
 * configure {{batchDelay}}, the time the gc shall sleep between modification batches (default 0 == no delay)

Added a test case for cleanup in iterations.

The idea for how to use these configuration parameters is:
 * use {{maxIterations}} only in test setups where one wants to check the immediate results
 * use {{maxDuration}} when the gc runs in a daily (weekly?) maintenance window, e.g. during the night, and shall stop iterating when working hours resume
 * use {{batchDelay}} when the gc shall run during busy times or all the time, e.g. on 24/7 systems. A small delay should prevent the gc from monopolizing the write locks (on db/table/index), depending on the database used.
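
To illustrate how the three limits could interact, here is a minimal sketch in Java. It is not the code from the patch: the class {{IncrementalGcSketch}}, its {{run}} method, and the choice to count one batch as one iteration are assumptions made for this example only.

```java
import java.util.List;
import java.util.Queue;

// Hypothetical sketch; not the Oak VersionGarbageCollector API.
final class IncrementalGcSketch {
    private final long maxIterations;  // 0 == no limit
    private final long maxDurationMs;  // 0 == no limit
    private final long batchDelayMs;   // 0 == no delay

    IncrementalGcSketch(long maxIterations, long maxDurationMs, long batchDelayMs) {
        this.maxIterations = maxIterations;
        this.maxDurationMs = maxDurationMs;
        this.batchDelayMs = batchDelayMs;
    }

    /**
     * Moves candidates to 'deleted' in batches until a limit is hit or the
     * queue is empty. Returns the number of batches (iterations) processed.
     */
    int run(Queue<String> candidates, int batchSize, List<String> deleted) {
        long start = System.nanoTime();
        int iterations = 0;
        while (!candidates.isEmpty()) {
            // Stop when the iteration budget is used up.
            if (maxIterations > 0 && iterations >= maxIterations) {
                break;
            }
            // Stop when the time budget is used up.
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            if (maxDurationMs > 0 && elapsedMs >= maxDurationMs) {
                break;
            }
            // One modification batch (stands in for the real delete calls).
            for (int i = 0; i < batchSize && !candidates.isEmpty(); i++) {
                deleted.add(candidates.poll());
            }
            iterations++;
            // Yield to other writers before the next batch.
            if (batchDelayMs > 0 && !candidates.isEmpty()) {
                try {
                    Thread.sleep(batchDelayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return iterations;
    }
}
```

In this sketch, work left over when a limit is reached stays in the queue, so a later run can pick it up where the previous one stopped.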




> VersionGarbageCollector should be able to run incrementally
> -----------------------------------------------------------
>
>                 Key: OAK-4780
>                 URL: https://issues.apache.org/jira/browse/OAK-4780
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: core, documentmk
>            Reporter: Julian Reschke
>         Attachments: leafnodes.diff, leafnodes-v2.diff, leafnodes-v3.diff
>
>
> Right now, the documentmk's version garbage collection runs in several phases.
> It first collects the paths of candidate nodes and starts deleting nodes only once that collection has finished successfully.
> This can be a problem when the regularly scheduled garbage collection is interrupted during the path collection phase, maybe due to other maintenance tasks. On the next run, the number of paths to be collected will be even bigger, making that run even more likely to fail.
> We should think about a change in the logic that would allow the GC to run in chunks; maybe by partitioning the path space by top-level directory.
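
The chunking idea in the description above, partitioning the path space by top-level directory, could look roughly like the following Java sketch. The class {{PathPartitionSketch}} and its methods are hypothetical names for this example; nothing here is taken from Oak or from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not part of Oak.
final class PathPartitionSketch {

    /** Returns the first path segment, e.g. "/content/a/b" gives "/content". */
    static String topLevel(String path) {
        int next = path.indexOf('/', 1);
        return next < 0 ? path : path.substring(0, next);
    }

    /** Groups candidate paths into chunks keyed by top-level directory. */
    static Map<String, List<String>> partition(List<String> paths) {
        Map<String, List<String>> chunks = new TreeMap<>();
        for (String p : paths) {
            chunks.computeIfAbsent(topLevel(p), k -> new ArrayList<>()).add(p);
        }
        return chunks;
    }
}
```

Each chunk could then be collected and deleted on its own, so an interrupted run would lose at most the chunk in progress rather than the whole path collection.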


