Posted to dev@lucene.apache.org by "Paul Elschot (JIRA)" <ji...@apache.org> on 2008/06/02 00:51:44 UTC

[jira] Commented: (LUCENE-1296) Allow use of compact DocIdSet in CachingWrapperFilter

    [ https://issues.apache.org/jira/browse/LUCENE-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601504#action_12601504 ] 

Paul Elschot commented on LUCENE-1296:
--------------------------------------

I tried to come up with a sensible performance test to determine a good criterion for choosing between OpenBitSet and SortedVIntList as the data structure backing the DocIdSet to be cached.
There is a criterion for this in the patch, in the docIdSetToCache() method of CachingWrapperFilter, but it is based only on byte size, and it favours SortedVIntList when it is definitely more compact than OpenBitSet.

The current criterion is to prefer SortedVIntList over OpenBitSet for caching when cardinality (= number of bits set in the OpenBitSet) < maxDocs/9. The constant 9 might be replaced by a configuration parameter to allow easy performance experiments there. It could be that a value larger than 9 turns out to be "optimal" in runtime.
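
To make the size-based test concrete, here is a minimal sketch in plain Java. It is not the code from the patch: the class name, the method names and the divisor parameter are hypothetical, java.util.BitSet stands in for OpenBitSet, and the VInt size estimate assumes 7 data bits per byte on the deltas between consecutive doc ids.

```java
import java.util.BitSet;

// Hypothetical helper, not part of the patch; it only illustrates the size-based test.
class DocIdSetChoice {

    /** True when the sparse list would be preferred; divisor = 9 mirrors the constant above. */
    static boolean preferSortedVIntList(long cardinality, int maxDoc, int divisor) {
        return cardinality < maxDoc / divisor;
    }

    /** Bytes needed for a plain bit set covering maxDoc documents. */
    static long bitSetBytes(int maxDoc) {
        return (maxDoc + 7) / 8;
    }

    /** Bytes needed to store the set bits as VInt-encoded deltas (assumed 7 data bits per byte). */
    static long vIntDeltaBytes(BitSet bits) {
        long bytes = 0;
        int prev = 0;
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            int delta = doc - prev;
            do {
                bytes++;        // one byte per started group of 7 bits
                delta >>>= 7;
            } while (delta != 0);
            prev = doc;
        }
        return bytes;
    }
}
```

Making the divisor a parameter is what would allow experiments with values other than 9 without touching the rest of the code.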

In some cases OpenBitSet can be faster on skipTo(int docNum) than SortedVIntList, even when SortedVIntList is more compact. As Filters can be expected to use skipTo() heavily, this could be important for performance.
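
A rough illustration of why this can happen, as a sketch only: it assumes that a delta-encoded sorted list has to be decoded sequentially from the start, while a bit set can begin its search at the target position. java.util.BitSet stands in for OpenBitSet, an int[] of deltas stands in for the VInt bytes of SortedVIntList, and the class and method names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch; neither method is the Lucene implementation.
class SkipToSketch {

    /** Bit set: start looking at the target itself; returns -1 when no doc >= target exists. */
    static int skipToBitSet(BitSet bits, int target) {
        return bits.nextSetBit(target);
    }

    /** Delta list: every delta before the target has to be summed first; -1 when exhausted. */
    static int skipToDeltas(int[] deltas, int target) {
        int doc = 0;
        for (int delta : deltas) {
            doc += delta;
            if (doc >= target) {
                return doc;
            }
        }
        return -1;
    }
}
```

The cost of the second method grows with the number of entries before the target, which is the part a performance test would need to weigh against the smaller memory footprint.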

Even though it might be possible to measure the skipTo() performance directly, the effect that the more compact cached data structure of SortedVIntList has on garbage collection is (pretty close to) impossible to measure in a simple test case.

Eks Dev had some interesting results there in the very early stages of LUCENE-584 (September 2006), so I wonder whether these results could be confirmed somehow using the patch here and the current trunk.

Comments?




> Allow use of compact DocIdSet in CachingWrapperFilter
> -----------------------------------------------------
>
>                 Key: LUCENE-1296
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1296
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Paul Elschot
>            Assignee: Michael Busch
>            Priority: Minor
>         Attachments: cachedFilter20080529.patch
>
>
> Extends CachingWrapperFilter with a protected method to determine the DocIdSet to be cached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org