Posted to oak-issues@jackrabbit.apache.org by "Alex Parvulescu (JIRA)" <ji...@apache.org> on 2016/02/03 16:55:40 UTC
[jira] [Commented] (OAK-3362) Estimate compaction based on diff to previous compacted head state
[ https://issues.apache.org/jira/browse/OAK-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130592#comment-15130592 ]
Alex Parvulescu commented on OAK-3362:
--------------------------------------
Started a prototype here [0]. Estimation seems to be more robust when using the previous state as a baseline. The interesting part begins when we use this reference for the compactor as well, reducing the scope of compaction to the content delta only. FYI [~mduerig]
[0] https://github.com/apache/jackrabbit-oak/compare/trunk...stillalex:partial-compaction
> Estimate compaction based on diff to previous compacted head state
> ------------------------------------------------------------------
>
> Key: OAK-3362
> URL: https://issues.apache.org/jira/browse/OAK-3362
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: segmentmk
> Reporter: Alex Parvulescu
> Assignee: Alex Parvulescu
> Priority: Minor
> Labels: compaction, gc
> Fix For: 1.6
>
>
> Food for thought: try to base the compaction estimation on a diff between the latest compacted state and the current state.
> Pros
> * estimation duration would be proportional to the number of changes on the current head state
> * using the size on disk as a reference, we could actually stop the estimation early when we go over the gc threshold.
> * data collected during this diff could in theory be passed as input to the compactor so it could focus on compacting a specific subtree
> Cons
> * need to keep a reference to a previous compacted state. After startup and before a compaction has run this might prove difficult (except maybe if we only persist the revision, similar to what the async indexer currently does)
> * coming up with a threshold for running compaction might prove difficult
> * diff might be costly, but still cheaper than the current full diff
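The first two pros in the quoted description (cost proportional to the number of changes, and stopping early once the gc threshold is passed) can be sketched roughly as below. This is only an illustration under stated assumptions, not the prototype from [0] and not Oak's API: `SketchNode`, `DeltaEstimator`, `ownBytes` and `thresholdBytes` are hypothetical names, unchanged subtrees are assumed to be shared by reference between the two states, and counting the bytes of added, changed and removed nodes is just one possible measure of the delta.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical tree node used only for this sketch (not Oak's NodeState). */
final class SketchNode {
    final long ownBytes; // bytes attributed to this node alone
    final Map<String, SketchNode> children = new LinkedHashMap<>();

    SketchNode(long ownBytes) { this.ownBytes = ownBytes; }

    SketchNode with(String name, SketchNode child) {
        children.put(name, child);
        return this;
    }
}

/** Walks only the differences between a base state and the head state. */
final class DeltaEstimator {
    private final long thresholdBytes; // assumed to be derived from size on disk
    private long deltaBytes;

    DeltaEstimator(long thresholdBytes) { this.thresholdBytes = thresholdBytes; }

    /** Returns true as soon as the accumulated delta goes over the threshold. */
    boolean exceedsThreshold(SketchNode base, SketchNode head) {
        deltaBytes = 0;
        return !walk(base, head);
    }

    long deltaBytes() { return deltaBytes; }

    /** Returns false to abort the walk early. */
    private boolean walk(SketchNode base, SketchNode head) {
        if (base == head) {
            return true; // shared (unchanged) subtree, or both absent: skip
        }
        // Count the head node if present (added/changed), else the removed base node.
        SketchNode counted = (head != null) ? head : base;
        deltaBytes += counted.ownBytes;
        if (deltaBytes > thresholdBytes) {
            return false; // over the threshold: no need to look further
        }
        Set<String> names = new HashSet<>();
        if (base != null) names.addAll(base.children.keySet());
        if (head != null) names.addAll(head.children.keySet());
        for (String name : names) {
            SketchNode b = (base == null) ? null : base.children.get(name);
            SketchNode h = (head == null) ? null : head.children.get(name);
            if (!walk(b, h)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, with a base root (10 bytes) holding children a (100) and b (200), and a head root (10 bytes) that shares a, replaces b with a 250-byte node and adds a 50-byte c, the walk never descends into a and reports a delta of 10 + 250 + 50 = 310 bytes. With a threshold of 100 bytes the walk stops before visiting every changed node.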