Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2015/10/01 10:49:26 UTC

[jira] [Commented] (OAK-2392) [DocumentMK] Garbage Collect older revisions of binary properties in main document

    [ https://issues.apache.org/jira/browse/OAK-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939522#comment-14939522 ] 

Chetan Mehrotra commented on OAK-2392:
--------------------------------------

One option to implement this would be via the split support:
# Currently, while modifying a document, we check whether it is a split candidate. That logic only checks the document size. In addition, it could check whether the document has a binary property and, if so, whether older revisions of that binary property are present. If they are, the document is considered a split candidate.
# Then make SplitOperations more aggressive for binary properties: keep only the latest revision in the main document and move the older ones to split documents.

The Revision GC logic for collecting older split documents would then be able to reclaim them.
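
A rough, self-contained sketch of the two steps, to make the proposal concrete. This is not the Oak code: the class BinarySplitSketch, the Doc type, and the methods isBinary, isSplitCandidate, and splitOldBinaryRevisions are all hypothetical names. It also assumes that a binary value can be recognized by a ":blobId:" prefix and that revisions can be ordered by a plain long; real revisions also carry a cluster id, and uncommitted changes would need to be skipped.

```java
import java.util.Map;
import java.util.TreeMap;

public class BinarySplitSketch {

    /** Hypothetical stand-in for a document: property name -> (revision -> value). */
    static final class Doc {
        final Map<String, TreeMap<Long, String>> props = new TreeMap<>();

        void set(String name, long revision, String value) {
            props.computeIfAbsent(name, k -> new TreeMap<>()).put(revision, value);
        }
    }

    /** Assumption: binary values are serialized with a ":blobId:" prefix. */
    static boolean isBinary(String value) {
        return value.startsWith(":blobId:");
    }

    /** True if any revision of the property holds a binary and older revisions exist. */
    static boolean hasOldBinaryRevisions(TreeMap<Long, String> revisions) {
        return revisions.size() > 1
                && revisions.values().stream().anyMatch(BinarySplitSketch::isBinary);
    }

    /** Step 1: the size check, extended with the proposed binary check. */
    static boolean isSplitCandidate(Doc doc, int sizeInBytes, int sizeThreshold) {
        if (sizeInBytes > sizeThreshold) {
            return true;
        }
        return doc.props.values().stream()
                .anyMatch(BinarySplitSketch::hasOldBinaryRevisions);
    }

    /**
     * Step 2: keep only the latest revision of each binary property in the
     * main document and move the older revisions into a new split document.
     */
    static Doc splitOldBinaryRevisions(Doc main) {
        Doc split = new Doc();
        for (Map.Entry<String, TreeMap<Long, String>> e : main.props.entrySet()) {
            TreeMap<Long, String> revisions = e.getValue();
            if (!hasOldBinaryRevisions(revisions)) {
                continue;
            }
            long latest = revisions.lastKey();
            // Copy everything strictly older than the latest revision ...
            split.props.put(e.getKey(), new TreeMap<>(revisions.headMap(latest, false)));
            // ... and drop it from the main document.
            revisions.headMap(latest, false).clear();
        }
        return split;
    }
}
```

The split document returned here is what Revision GC would later be able to remove, since the main document no longer references the older binary revisions.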

[~mreutegg] [~catholicon] Thoughts?

> [DocumentMK] Garbage Collect older revisions of binary properties in main document
> ----------------------------------------------------------------------------------
>
>                 Key: OAK-2392
>                 URL: https://issues.apache.org/jira/browse/OAK-2392
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>            Priority: Minor
>             Fix For: 1.3.8
>
>
> The current GC logic for DocumentMK only collects certain types of garbage (see OAK-1981), and currently only split documents are removed. While a complete, full-blown GC would take time and is not yet fully implemented, we should handle documents whose binary properties get updated a few times (but not very frequently).
> For example, reindexing a Lucene index leads to removal of the index file nodes and re-creation of nodes with the same names. In such a case the older revisions of the binary property remain in the main document and are not eligible for GC under the current implementation.
> As a fix, the GC logic should look for documents which might have binaries and then remove the older revisions of binary properties. Currently we already scan all such documents for Blob GC.
> So this can be done as part of either Revision GC or Blob GC.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)