Posted to oak-issues@jackrabbit.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2015/05/11 18:43:59 UTC

[jira] [Commented] (OAK-2862) CompactionMap#compress() inefficient for large compaction maps

    [ https://issues.apache.org/jira/browse/OAK-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538151#comment-14538151 ] 

Michael Dürig commented on OAK-2862:
------------------------------------

Turns out that the main problem is [copying|https://github.com/apache/jackrabbit-oak/blob/017f0764fae6ece3e352dfb13c54a0e4e8f8b496/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/CompactionMap.java#L273] all uuids into a {{TreeMap}} on each compression cycle. As both the current list of uuids and the recent maps are already sorted, a better approach would be to "merge them on the fly", i.e. iterate through both in parallel, always taking the lesser element of the two.
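
For illustration, a minimal sketch of what merging two already sorted sequences on the fly could look like. This is not the actual patch: the class {{SortedMerge}} and its {{merge}} method are hypothetical names, plain integers stand in for the uuids, and equal elements are simply emitted from both inputs, which may differ from what the compaction map needs. The point is only that no intermediate {{TreeMap}} is built; each step compares the two current heads and hands out the lesser one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper, not part of Oak: lazily merges two sorted iterators.
final class SortedMerge {
    private SortedMerge() {}

    // Both inputs are assumed to be sorted according to 'order' already.
    static <T> Iterator<T> merge(final Iterator<? extends T> left,
                                 final Iterator<? extends T> right,
                                 final Comparator<? super T> order) {
        return new Iterator<T>() {
            // Buffered head of each input and whether that head is valid.
            private T nextLeft;
            private T nextRight;
            private boolean hasLeft = left.hasNext();
            private boolean hasRight = right.hasNext();
            {
                if (hasLeft) nextLeft = left.next();
                if (hasRight) nextRight = right.next();
            }

            @Override
            public boolean hasNext() {
                return hasLeft || hasRight;
            }

            @Override
            public T next() {
                if (!hasLeft && !hasRight) {
                    throw new NoSuchElementException();
                }
                // Take the lesser head; on ties the left input goes first.
                boolean takeLeft = hasLeft
                        && (!hasRight || order.compare(nextLeft, nextRight) <= 0);
                T result;
                if (takeLeft) {
                    result = nextLeft;
                    hasLeft = left.hasNext();
                    nextLeft = hasLeft ? left.next() : null;
                } else {
                    result = nextRight;
                    hasRight = right.hasNext();
                    nextRight = hasRight ? right.next() : null;
                }
                return result;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(String[] args) {
        // Stand-ins for the sorted current uuids and the sorted recent entries.
        List<Integer> current = Arrays.asList(1, 4, 7);
        List<Integer> recent = Arrays.asList(2, 4, 9);
        Iterator<Integer> merged = merge(current.iterator(), recent.iterator(),
                Comparator.<Integer>naturalOrder());
        List<Integer> out = new ArrayList<>();
        while (merged.hasNext()) {
            out.add(merged.next());
        }
        System.out.println(out); // prints [1, 2, 4, 4, 7, 9]
    }
}
```

Only the two buffered heads are held in memory beyond the inputs themselves, so the extra space is constant and each element is visited once.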

In a quick test with 1M segments of 10 records each, memory consumption went down to 20MB (from 103MB) and execution time went down to 21s (from 115s).


> CompactionMap#compress() inefficient for large compaction maps
> --------------------------------------------------------------
>
>                 Key: OAK-2862
>                 URL: https://issues.apache.org/jira/browse/OAK-2862
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: segmentmk
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>              Labels: compaction, gc
>             Fix For: 1.3.0
>
>
> I've seen {{CompactionMap#compress()}} take up most of the time spent in compaction. With 40M record ids in the compaction map, compressing runs for hours.
> I will back this with numbers as soon as I have a better grip on the issue.  


