Posted to oak-issues@jackrabbit.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2016/11/30 11:40:58 UTC
[jira] [Comment Edited] (OAK-5192) Reduce Lucene related growth of repository size
[ https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708334#comment-15708334 ]
Michael Dürig edited comment on OAK-5192 at 11/30/16 11:40 AM:
---------------------------------------------------------------
The following plots show bytes added over time to content (upper plot) and to the index (lower plot). The index is three orders of magnitude above regular content in terms of the number of bytes added.
!added-bytes-zoom.png|width=500!
The pattern of a spike every 40s in the writes to the index is caused by Lucene's merging. Switching from {{SerialMergeScheduler}} to {{NoMergeScheduler}} flattens the curve and also reduces the total amount of data written by a factor of 13.
{code}
--- oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/writer/IndexWriterUtils.java (date 1480408502000)
+++ oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/writer/IndexWriterUtils.java (revision )
@@ -61,7 +60,8 @@
 Analyzer analyzer = new PerFieldAnalyzerWrapper(definitionAnalyzer, analyzers);
 IndexWriterConfig config = new IndexWriterConfig(VERSION, analyzer);
 if (remoteDir) {
-    config.setMergeScheduler(new SerialMergeScheduler());
+    config.setMergeScheduler(NoMergeScheduler.INSTANCE);
+    config.setMergePolicy(NoMergePolicy.COMPOUND_FILES);
 }
 if (definition.getCodec() != null) {
     config.setCodec(definition.getCodec());
{code}
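To illustrate why merging multiplies the bytes written, here is a minimal toy model. It is only a sketch under stated assumptions: every flush produces a segment of the same size, and whenever {{mergeFactor}} segments of equal size exist they are rewritten into one larger segment. This is not Lucene's actual merge policy, the class and method names ({{MergeWriteModel}}, {{bytesWithMerging}}, {{bytesWithoutMerging}}) are hypothetical, and the model is not expected to reproduce the factor of 13 measured above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of write volume with and without segment merging (not Lucene code). */
final class MergeWriteModel {

    /** Bytes written when every flushed segment is kept as-is. */
    public static long bytesWithoutMerging(int flushes, long segmentBytes) {
        return (long) flushes * segmentBytes;
    }

    /**
     * Bytes written when, after each flush, every group of mergeFactor
     * equally sized segments is rewritten into one segment of the next level.
     */
    public static long bytesWithMerging(int flushes, long segmentBytes, int mergeFactor) {
        long written = 0;
        // counts.get(level) = live segments of size segmentBytes * mergeFactor^level
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            written += segmentBytes;          // the flush itself
            increment(counts, 0);
            long levelBytes = segmentBytes;
            for (int level = 0;
                 level < counts.size() && counts.get(level) == mergeFactor;
                 level++) {
                long mergedBytes = levelBytes * mergeFactor;
                written += mergedBytes;       // a merge rewrites all of its input bytes
                counts.set(level, 0);
                increment(counts, level + 1);
                levelBytes = mergedBytes;
            }
        }
        return written;
    }

    private static void increment(List<Integer> counts, int level) {
        while (counts.size() <= level) {
            counts.add(0);
        }
        counts.set(level, counts.get(level) + 1);
    }

    public static void main(String[] args) {
        int flushes = 100;
        long segmentBytes = 1;
        int mergeFactor = 10;
        long plain = bytesWithoutMerging(flushes, segmentBytes);
        long merged = bytesWithMerging(flushes, segmentBytes, mergeFactor);
        System.out.println("without merging: " + plain);
        System.out.println("with merging:    " + merged);
    }
}
```

In this model each flushed byte is rewritten once per merge level it passes through, so the write volume grows with the number of levels while the final index size stays the same. Real merges also drop deleted documents and run on a different schedule, so the numbers are illustrative only.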
was (Author: mduerig):
The following plots show bytes added over time to content (upper plot) and to the index (lower plot). The index is three orders of magnitude above regular content in terms of the number of bytes added.
!added-bytes-zoom.png|width=500!
The pattern of a spike every 40s in the writes to the index is caused by Lucene's merging. Switching from {{SerialMergeScheduler}} to {{NoMergeScheduler}} flattens the curve and also reduces the total amount of data written by a factor of 13.
> Reduce Lucene related growth of repository size
> -----------------------------------------------
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene, segment-tar
> Reporter: Michael Dürig
> Labels: perfomance
> Attachments: added-bytes-zoom.png
>
>
> I observed Lucene indexing contributing up to 99% of repository growth. While the size of the index itself is well within reasonable bounds, the overall turnover of data being written and removed again can be as much as 99%.
> In the case of the TarMK this negatively impacts overall system performance due to a fast-growing number of tar files / segments, poor locality of reference, cache misses and thrashing when looking up segments, and vastly prolonged garbage collection cycles.