Posted to oak-issues@jackrabbit.apache.org by "Alex Parvulescu (JIRA)" <ji...@apache.org> on 2014/11/03 11:32:34 UTC

[jira] [Commented] (OAK-2140) Segment Compactor will not compact binaries > 16k

    [ https://issues.apache.org/jira/browse/OAK-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194416#comment-14194416 ] 

Alex Parvulescu commented on OAK-2140:
--------------------------------------

I believe the outcome of this is quite hard to assess, so I'm proposing to expose this via a flag that controls whether a #clone call should be a shallow clone or a deep clone. This is only called from the compaction bits anyway, and it will allow the user to choose between the two options, but we'll probably have to document the impact of each choice (a toy sketch of the two modes follows the list):
 - shallow clone: less repository growth during compaction itself, but a less effective cleanup afterwards
    ex: with a 1G repo: after the first compaction the repo size is at 1.1G, after the second 1.2G, after the third 1.3G, and after the fourth it is back to 1G.
the example is highly subjective and cannot be used as a reference for bigger repositories: because of the interwoven references between segments, the size growth has been known to be a lot bigger for larger repos, and the stabilization back to the original size can take a larger number of runs.

 - deep clone: the repo size effectively doubles during compaction, but returns to the original size after cleanup
    ex: with a 1G initial repo, during the first compaction the repo size goes up to 2G; once cleanup is done it is back to 1G, and the process repeats itself for the following compaction runs.
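
To make the proposal concrete, here is a minimal, self-contained toy sketch of the two clone modes; all names (ToyStore, ToyBlob, cloneFor, copyChunk) are illustrative assumptions and not the actual SegmentBlob internals:

    import java.util.ArrayList;
    import java.util.List;

    // Toy model of the proposed shallow/deep clone flag; illustrative only.
    class ToyStore {
        private int nextRecordId = 1000;

        // pretend to copy a bulk chunk into a freshly written segment
        int copyChunk(int oldRecordId) {
            return nextRecordId++;
        }
    }

    class ToyBlob {
        final List<Integer> chunkRecordIds; // ids of the backing bulk chunks

        ToyBlob(List<Integer> chunkRecordIds) {
            this.chunkRecordIds = chunkRecordIds;
        }

        ToyBlob cloneFor(ToyStore compacted, boolean deepClone) {
            if (!deepClone) {
                // shallow: same record ids, so the old bulk segments stay
                // referenced and cleanup cannot reclaim them right away
                return this;
            }
            // deep: re-write every chunk, doubling the footprint during
            // compaction but letting cleanup reclaim the old segments
            List<Integer> fresh = new ArrayList<>();
            for (int id : chunkRecordIds) {
                fresh.add(compacted.copyChunk(id));
            }
            return new ToyBlob(fresh);
        }
    }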


> Segment Compactor will not compact binaries > 16k
> -------------------------------------------------
>
>                 Key: OAK-2140
>                 URL: https://issues.apache.org/jira/browse/OAK-2140
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: core, segmentmk
>            Reporter: Alex Parvulescu
>            Assignee: Alex Parvulescu
>             Fix For: 1.1.3
>
>         Attachments: OAK-2140.patch
>
>
> The compaction bits rely on the SegmentBlob#clone method when a binary is being processed, but it looks like the #clone contract is not fully enforced for streams that qualify as 'long values' (>16k if I read the code correctly). 
> What happens is that the stream is initially persisted as chunks in a ListRecord. When compaction calls #clone it gets back the original list of record ids, which then gets referenced from the compacted node state [0]. This makes compaction on large binaries ineffective, as the bulk segments will never move from the original location where they were created unless the referencing node gets deleted.
> I think the original design was set up to prevent large binaries from being copied over, but looking at the size problem we have now it might be a good time to reconsider this approach.
> [0] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/SegmentBlob.java#L75
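
For illustration, the contract gap described above boils down to something like the following; the names below are assumptions made for the sketch, the actual code is linked at [0]:

    import java.io.IOException;

    // Minimal sketch of the #clone contract gap; illustrative names only.
    abstract class BlobSketch {
        abstract boolean isLongValue();  // >16k, persisted as a ListRecord
        abstract BlobSketch rewrite() throws IOException; // copy the bytes

        BlobSketch cloneForCompaction() throws IOException {
            if (!isLongValue()) {
                return rewrite(); // small values: the bytes really move
            }
            // long values: the "clone" hands back the same record ids, so
            // the compacted node state keeps referencing the old bulk
            // segments and cleanup cannot reclaim them
            return this;
        }
    }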



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)