Posted to oak-issues@jackrabbit.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2014/06/12 16:35:02 UTC

[jira] [Reopened] (OAK-1804) TarMK compaction

     [ https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting reopened OAK-1804:
--------------------------------


There are two more problems:

* On a very large repository with hundreds of millions of nodes, the uncompressed compaction map inside the Compactor class can grow to a few gigabytes. It would be better to use the far more memory-efficient CompactionMap data structure instead, and perhaps also to limit the number of entries we store in the map in the first place.
* The compaction checks in fastEquals() add performance overhead because they are executed for all sorts of record comparisons, not just for nodes and blobs. It would be better to do the compaction checks only for those higher-level comparisons.
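
To illustrate the first point, here is a minimal sketch of why a packed representation takes so much less memory than a general-purpose map. It is not the actual CompactionMap code: the class name PackedIdMap is hypothetical, and it assumes for simplicity that a record identifier can be reduced to a single long. Entries are kept in two parallel primitive arrays sorted by key, so each mapping costs 16 bytes with no per-entry objects, and lookups use binary search.

```java
import java.util.Arrays;
import java.util.Map;

/**
 * Hypothetical, simplified illustration of a packed "before -> after" map.
 * Assumption: identifiers fit in a long (real record ids are richer).
 */
final class PackedIdMap {
    private final long[] keys;   // "before" ids, sorted ascending
    private final long[] values; // values[i] is the "after" id for keys[i]

    /** Copies the entries of an ordinary map into the packed form. */
    PackedIdMap(Map<Long, Long> source) {
        keys = new long[source.size()];
        int i = 0;
        for (long key : source.keySet()) {
            keys[i++] = key;
        }
        Arrays.sort(keys);
        values = new long[keys.length];
        for (int j = 0; j < keys.length; j++) {
            values[j] = source.get(keys[j]);
        }
    }

    /** Returns true if "before" is recorded as having been compacted to "after". */
    boolean wasCompactedTo(long before, long after) {
        int index = Arrays.binarySearch(keys, before);
        return index >= 0 && values[index] == after;
    }

    int size() {
        return keys.length;
    }
}
```

A lookup such as new PackedIdMap(map).wasCompactedTo(before, after) then costs O(log n) instead of a hash probe, which is the price paid for the smaller footprint.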

I'll take a look at fixing the above issues.
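
For the second point, the idea could look roughly like the following sketch. The names RecordEquality, Id and nodeFastEquals are hypothetical and do not reflect the actual segment store code. The assumption is that the generic check compares identifiers only, while a separate node/blob-level check also consults the compaction map.

```java
import java.util.Map;

/** Hypothetical sketch: keep the generic check cheap, consult the map only for nodes and blobs. */
final class RecordEquality {

    /** Simplified stand-in for a record identifier (assumption, not the real class). */
    static final class Id {
        final long segment;
        final int offset;

        Id(long segment, int offset) {
            this.segment = segment;
            this.offset = offset;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Id)) {
                return false; // also covers null
            }
            Id that = (Id) other;
            return segment == that.segment && offset == that.offset;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(segment) + offset;
        }
    }

    /** Generic comparison used for all record types: identifiers only, no map lookup. */
    static boolean fastEquals(Id a, Id b) {
        return a.equals(b);
    }

    /** Higher-level comparison for nodes and blobs: falls back to the compaction map. */
    static boolean nodeFastEquals(Id a, Id b, Map<Id, Id> compacted) {
        return fastEquals(a, b)
                || b.equals(compacted.get(a))  // a was compacted to b
                || a.equals(compacted.get(b)); // b was compacted to a
    }
}
```

With this split, comparisons of low-level records never pay for a map lookup, and only node and blob comparisons do.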

> TarMK compaction
> ----------------
>
>                 Key: OAK-1804
>                 URL: https://issues.apache.org/jira/browse/OAK-1804
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segmentmk
>            Reporter: Jukka Zitting
>            Assignee: Alex Parvulescu
>              Labels: production, tools
>             Fix For: 1.0.1, 1.1
>
>         Attachments: SegmentNodeStore.java.patch, compact-on-flush.patch, compaction-map-as-bytebuffer.patch, compaction.patch, fast-equals.patch
>
>
> The TarMK would benefit from periodic "compact" operations that would traverse and recreate (parts of) the content tree in order to optimize the storage layout. More specifically, such compaction would:
> * Optimize performance by increasing locality and reducing duplication, both of which improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing references to segments where only a subset of content is reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)