Posted to dev@jackrabbit.apache.org by "Christoph Kiehl (JIRA)" <ji...@apache.org> on 2007/06/19 18:14:26 UTC

[jira] Issue Comment Edited: (JCR-974) Manage Lucene FieldCaches per index segment

    [ https://issues.apache.org/jira/browse/JCR-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506232 ] 

Christoph Kiehl edited comment on JCR-974 at 6/19/07 9:13 AM:
--------------------------------------------------------------

This is a first patch which uses a FieldCache per index segment. To make this work we had to use our own implementation of FieldCache.StringIndex, which does not keep an array of sort indexes for the documents but instead keeps an array of the terms associated with each document. This of course uses more memory, and some performance/scaling tests still need to be done.
We had to modify SearchIndex.CombinedIndexReader and CachingMultiReader to allow access to the underlying IndexReaders, because those IndexReaders are used as cache keys in SharedFieldCache.
I'm not entirely satisfied with this solution, because SharedFieldSortComparator has to know that there is a CombinedIndexReader and currently even assumes it.
Performance-wise we achieved a speedup by a factor of 5-15 for queries sorting by some field in our current application. In our scenario we have a lot of write operations and more than 1,000,000 nodes. For read-only repositories this patch slightly degrades performance by a factor of about 2.
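
The switch from sort indexes to terms can be illustrated with a minimal sketch. It assumes (the comment does not spell this out) that sort indexes computed for one segment are only meaningful within that segment, so documents from different segments have to be compared by term value. SegmentTerms and SegmentTermsDemo are hypothetical stand-ins, not Lucene or Jackrabbit classes, and every document is assumed to have a value for the sort field.

```java
import java.util.Arrays;

/** Hypothetical stand-in for one segment's sort data; not a Lucene or Jackrabbit class. */
final class SegmentTerms {
    /** Term of the sort field for each document in this segment (assumed non-null). */
    final String[] termByDoc;

    SegmentTerms(String... termByDoc) {
        this.termByDoc = termByDoc;
    }

    /** Per-segment sort indexes: rank of each document's term among this segment's distinct terms. */
    int[] ordinals() {
        String[] sorted = Arrays.stream(termByDoc).distinct().sorted().toArray(String[]::new);
        int[] order = new int[termByDoc.length];
        for (int doc = 0; doc < termByDoc.length; doc++) {
            order[doc] = Arrays.binarySearch(sorted, termByDoc[doc]);
        }
        return order;
    }

    /** Compares documents from two (possibly different) segments by term value. */
    static int compare(SegmentTerms a, int docA, SegmentTerms b, int docB) {
        return a.termByDoc[docA].compareTo(b.termByDoc[docB]);
    }
}

final class SegmentTermsDemo {
    public static void main(String[] args) {
        SegmentTerms s1 = new SegmentTerms("apple", "pear");
        SegmentTerms s2 = new SegmentTerms("banana", "zebra");
        // Both segments assign sort indexes 0 and 1, so the indexes alone
        // cannot order a document of s1 against a document of s2.
        System.out.println(Arrays.toString(s1.ordinals()) + " " + Arrays.toString(s2.ordinals()));
        // Term values still compare correctly: "pear" sorts after "banana".
        System.out.println(SegmentTerms.compare(s1, 1, s2, 0) > 0);
    }
}
```

Keeping one term reference per document instead of one int is where the additional memory use mentioned above would come from.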


 was:
This is a first patch which uses a FieldCache per index segment. To make this work we had to use our own implementation of FieldCache.StringIndex, which does not keep an array of sort indexes for the documents but instead keeps an array of the terms associated with each document. This of course uses more memory, and some performance/scaling tests still need to be done.
We had to modify SearchIndex.CombinedIndexReader and CachingMultiReader to allow access to the underlying IndexReaders, because those IndexReaders are used as cache keys in SharedFieldCache.
I'm not entirely satisfied with this solution, because SharedFieldSortComparator has to know that there is a CombinedIndexReader and currently even assumes it.
Performance-wise we achieved a speedup by a factor of 5-15 in our current application, where we have a lot of write operations and more than 1,000,000 nodes. For read-only repositories this patch slightly degrades performance by a factor of about 2.

> Manage Lucene FieldCaches per index segment
> -------------------------------------------
>
>                 Key: JCR-974
>                 URL: https://issues.apache.org/jira/browse/JCR-974
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.3
>            Reporter: Christoph Kiehl
>         Attachments: patch.txt
>
>
> Jackrabbit uses an IndexSearcher which searches on a single IndexReader, which is most likely an instance of CachingMultiReader. On every search that does sorting or range queries, a FieldCache is populated and associated with this instance of CachingMultiReader. Successive queries which operate on this CachingMultiReader get a tremendous speedup if they can reuse those associated FieldCache instances.
> The problem is that Jackrabbit creates a new CachingMultiReader _every time_ one of the underlying indexes is modified. This means that if you change just _one_ item in the repository, all those FieldCaches need to be rebuilt, because the existing FieldCaches are associated with the old instance of CachingMultiReader.
> This not only leads to slow search response times for queries which contain range queries or are sorted by a field, but also leads to massive memory consumption (depending on the size of your indexes), because multiple instances of CachingMultiReader might be in use if a lot of queries and item modifications are executed concurrently.
> The goal is to keep those FieldCaches as long as possible.
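
The goal stated in the issue can be sketched roughly as follows: if cache entries are keyed by the individual segment rather than by the combined reader, creating a new combined reader after a modification only requires loading the segments that are actually new. Segment, PerSegmentCache and PerSegmentCacheDemo are hypothetical names for illustration; they are not Lucene or Jackrabbit classes, and the list of segments merely stands in for a CachingMultiReader.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a per-segment index reader; not a Lucene class. */
final class Segment {
    final String[] terms;

    Segment(String... terms) {
        this.terms = terms;
    }
}

/** Caches per-segment sort data, keyed by the identity of the segment object. */
final class PerSegmentCache {
    private final Map<Segment, String[]> cache = new IdentityHashMap<>();
    int loads = 0; // how often a segment's data had to be read from scratch

    String[] get(Segment segment) {
        return cache.computeIfAbsent(segment, s -> {
            loads++;
            return s.terms.clone(); // stands in for the expensive walk over the index
        });
    }
}

final class PerSegmentCacheDemo {
    public static void main(String[] args) {
        Segment a = new Segment("x");
        Segment b = new Segment("y");
        PerSegmentCache cache = new PerSegmentCache();

        // First "combined reader": both segments are loaded once.
        List<Segment> reader1 = Arrays.asList(a, b);
        reader1.forEach(cache::get);

        // One item changes: a new combined reader is built, but it reuses
        // segments a and b and only adds the new segment c.
        Segment c = new Segment("z");
        List<Segment> reader2 = Arrays.asList(a, b, c);
        reader2.forEach(cache::get);

        System.out.println(cache.loads); // prints 3, not 5
    }
}
```

The sketch holds strong references to its keys; a real cache would also have to release entries for segments that are no longer in use, for example through weak references.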
