Posted to oak-issues@jackrabbit.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2016/11/30 11:42:58 UTC

[jira] [Commented] (OAK-5192) Reduce Lucene related growth of repository size

    [ https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708339#comment-15708339 ] 

Michael Dürig commented on OAK-5192:
------------------------------------

See http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html for some interesting insight into optimising Lucene's merges.

Having read the above article and discussed it with [~teofili], I conclude that we need a better merge strategy for the way we use Lucene in Oak. The default merge strategies have been optimised for the file system, where the segment files replaced by a merge are simply deleted. On an MVCC store like Oak, where replaced data remains in the repository until garbage collection, they result in far too much churn.
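
To make the churn argument concrete, here is a rough back-of-the-envelope sketch. It is not Lucene's actual merge policy: it assumes a simplified log-structured model in which every flush writes one unit-sized segment and any `merge_factor` segments on the same level are rewritten into a single segment on the next level. The names `simulate_merges` and `garbage_fraction` are hypothetical and exist only for this illustration.

```python
def simulate_merges(num_flushes, merge_factor, flush_size=1):
    """Simplified log-merge model (an assumption, not Lucene's TieredMergePolicy).

    Returns (bytes_written, live_bytes): everything ever written to an
    append-only store versus the size of the segments still referenced.
    """
    levels = [[]]          # levels[i] holds the sizes of segments on level i
    bytes_written = 0
    for _ in range(num_flushes):
        # A flush writes one new small segment.
        bytes_written += flush_size
        levels[0].append(flush_size)
        level = 0
        # Cascade merges upwards while a level is full.
        while len(levels[level]) >= merge_factor:
            merged = sum(levels[level])
            levels[level] = []
            # The merged segment is written anew; its inputs become garbage.
            bytes_written += merged
            if level + 1 == len(levels):
                levels.append([])
            levels[level + 1].append(merged)
            level += 1
    live_bytes = sum(sum(sizes) for sizes in levels)
    return bytes_written, live_bytes


def garbage_fraction(num_flushes, merge_factor):
    """Share of all written bytes that is no longer referenced."""
    written, live = simulate_merges(num_flushes, merge_factor)
    return (written - live) / written


if __name__ == "__main__":
    for factor in (2, 10):
        print(factor, simulate_merges(1000, factor), garbage_fraction(1000, factor))
```

In this toy model, 1000 flushes with a merge factor of 10 write four times the final index size, so three quarters of the written data is garbage that only a later garbage collection can reclaim. A smaller merge factor makes the ratio worse, which is the kind of behaviour a merge strategy tailored to Oak would need to avoid.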

> Reduce Lucene related growth of repository size
> -----------------------------------------------
>
>                 Key: OAK-5192
>                 URL: https://issues.apache.org/jira/browse/OAK-5192
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene, segment-tar
>            Reporter: Michael Dürig
>              Labels: perfomance
>         Attachments: added-bytes-zoom.png
>
>
> I observed Lucene indexing contributing up to 99% of repository growth. While the size of the index itself is well within reasonable bounds, the overall turnover of data being written and removed again can be as much as 99%.
> In the case of the TarMK this negatively impacts overall system performance due to a fast-growing number of tar files / segments, bad locality of reference, cache misses/thrashing when looking up segments, and vastly prolonged garbage collection cycles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)