Posted to oak-issues@jackrabbit.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2015/05/11 19:17:59 UTC

[jira] [Comment Edited] (OAK-2862) CompactionMap#compress() inefficient for large compaction maps

    [ https://issues.apache.org/jira/browse/OAK-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538151#comment-14538151 ] 

Michael Dürig edited comment on OAK-2862 at 5/11/15 5:17 PM:
-------------------------------------------------------------

Turns out that the main problem is [copying|https://github.com/apache/jackrabbit-oak/blob/017f0764fae6ece3e352dfb13c54a0e4e8f8b496/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/CompactionMap.java#L273] all uuids into a {{TreeMap}} on each compression cycle. As both the current list of uuids and the recent maps are already sorted, a better approach would be to "merge them on the fly", i.e. iterate through both in parallel, always taking the lesser element of the two.

In a quick test with 1M segments of 10 records each, memory consumption went down to 20MB (from 103MB) and execution time went down to 21s (from 115s).

With 1M segments of 40 records each, memory consumption went down to 20MB (from 309MB) and execution time went down to 71s (from 19min 45s).
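
For illustration, merging two sorted sequences on the fly could look roughly like the sketch below. This is not the actual Oak change: the class name {{MergeSketch}} and its {{merge}} method are made up for this example, both inputs are assumed to be in ascending order and free of nulls, and plain integers stand in for the uuids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper, not part of Oak: lazily merges two ascending iterators.
final class MergeSketch {

    static <T extends Comparable<? super T>> Iterator<T> merge(
            final Iterator<T> left, final Iterator<T> right) {
        return new Iterator<T>() {
            // One element of look-ahead per input; null means "input exhausted"
            // (assumption: the inputs themselves never contain null).
            private T nextLeft = advance(left);
            private T nextRight = advance(right);

            private T advance(Iterator<T> it) {
                return it.hasNext() ? it.next() : null;
            }

            @Override
            public boolean hasNext() {
                return nextLeft != null || nextRight != null;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Take the lesser head element; on a tie the left input wins.
                boolean takeLeft = nextRight == null
                        || (nextLeft != null && nextLeft.compareTo(nextRight) <= 0);
                T result;
                if (takeLeft) {
                    result = nextLeft;
                    nextLeft = advance(left);
                } else {
                    result = nextRight;
                    nextRight = advance(right);
                }
                return result;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> merged = merge(
                Arrays.asList(1, 4, 7).iterator(),
                Arrays.asList(2, 4, 9).iterator());
        List<Integer> out = new ArrayList<Integer>();
        while (merged.hasNext()) {
            out.add(merged.next());
        }
        System.out.println(out); // prints [1, 2, 4, 4, 7, 9]
    }
}
```

The point of such a merge is that it only ever holds one look-ahead element per input, so no intermediate sorted copy of all keys is needed.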




> CompactionMap#compress() inefficient for large compaction maps
> --------------------------------------------------------------
>
>                 Key: OAK-2862
>                 URL: https://issues.apache.org/jira/browse/OAK-2862
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: segmentmk
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>              Labels: compaction, gc
>             Fix For: 1.3.0
>
>
> I've seen {{CompactionMap#compress()}} take up most of the time spent in compaction. With 40M record ids in the compaction map, compressing runs for hours.
> I will back this with numbers as soon as I have a better grip on the issue.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)