Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2013/08/09 20:30:48 UTC
[jira] [Commented] (LUCENE-5101) make it easier to plugin different
bitset implementations to CachingWrapperFilter
[ https://issues.apache.org/jira/browse/LUCENE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735113#comment-13735113 ]
Adrien Grand commented on LUCENE-5101:
--------------------------------------
I just updated http://people.apache.org/~jpountz/doc_id_sets.html based on the latest updates to WAH8DocIdSet and PFORDeltaDocIdSet, and added build times to the charts. PFORDeltaDocIdSet now compresses doc id sets of medium load factors (~ 0.7) better by switching to unary coding when it saves space compared to PFor. This way, it never grows much larger than a FixedBitSet. Similarly, WAH8DocIdSet now compresses dense sets better (LUCENE-5150).
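To make the space trade-off concrete, here is a rough sketch of how a per-block choice between unary coding and bit packing could be made. It is not the actual PFORDeltaDocIdSet code: the class and method names are hypothetical, deltas are assumed to be gaps (>= 1) between consecutive doc ids, and PFor exceptions are ignored. Since the unary cost of a block is the sum of its deltas, i.e. the span of doc ids it covers, a unary block costs about as many bits as a bitset over the same range.

```java
// Illustrative only: hypothetical names, NOT the PFORDeltaDocIdSet implementation.
final class BlockEncodingChooser {

    enum Encoding { UNARY, PACKED }

    /** Number of bits needed to represent v, at least 1. */
    static int bitsRequired(int v) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(v));
    }

    /** Unary cost: a delta d is written as d-1 zero bits plus a one bit, so d bits. */
    static long unaryBits(int[] deltas) {
        long total = 0;
        for (int d : deltas) {
            total += d;
        }
        return total;
    }

    /** Packed cost: every delta uses as many bits as the largest one (no exceptions modelled). */
    static long packedBits(int[] deltas) {
        int max = 0;
        for (int d : deltas) {
            max = Math.max(max, d);
        }
        return (long) deltas.length * bitsRequired(max);
    }

    /** Picks unary coding only when it is strictly smaller than bit packing. */
    static Encoding choose(int[] deltas) {
        return unaryBits(deltas) < packedBits(deltas) ? Encoding.UNARY : Encoding.PACKED;
    }
}
```

With small gaps such as {1, 2, 1, 1, 2, 1, 1, 1}, unary coding needs 10 bits against 16 for packing, while sparse gaps such as {100, 200, 50, 300} clearly favor packing.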
I wish I had more time to work on the EliasFanoDocIdSet to add an index to it and see how it behaves. It has interesting characteristics!
Regarding this issue, I think we could update the patch to always use WAH8DocIdSet? I think the most interesting benchmark is advance(31), since it maps to a Scorer that leapfrogs between a query and a filter that both contain lots of documents (hence the query execution is slow), and WAH8DocIdSet is the one with the best worst case here?
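For readers unfamiliar with the leapfrog pattern, here is a minimal sketch of intersecting two sorted doc id iterators by repeatedly advancing the one that is behind. SortedArrayIterator is a hypothetical stand-in for DocIdSetIterator backed by a plain int array, and its advance() is a linear scan; the cost of advance() is exactly what differs between the doc id set implementations being compared.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a stand-in for leapfrogging between a query and a filter.
final class LeapFrogSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal iterator over a sorted int array (hypothetical, not a Lucene class). */
    static final class SortedArrayIterator {
        private final int[] docs;
        private int idx = -1;

        SortedArrayIterator(int[] docs) {
            this.docs = docs;
        }

        int docID() {
            if (idx < 0) {
                return -1;
            }
            return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
        }

        int nextDoc() {
            idx++;
            return docID();
        }

        /** Moves past the current doc to the first doc >= target. */
        int advance(int target) {
            do {
                idx++;
            } while (idx < docs.length && docs[idx] < target);
            return docID();
        }
    }

    /** Collects the doc ids present in both iterators. */
    static List<Integer> intersect(SortedArrayIterator a, SortedArrayIterator b) {
        List<Integer> hits = new ArrayList<>();
        int docA = a.nextDoc();
        int docB = b.nextDoc();
        while (docA != NO_MORE_DOCS && docB != NO_MORE_DOCS) {
            if (docA == docB) {
                hits.add(docA);
                docA = a.nextDoc();
                docB = b.nextDoc();
            } else if (docA < docB) {
                docA = a.advance(docB); // a is behind: jump to b's position
            } else {
                docB = b.advance(docA); // b is behind: jump to a's position
            }
        }
        return hits;
    }
}
```

When both sides contain many documents, each advance() call only skips a short distance, so the per-call overhead of the underlying set dominates.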
> make it easier to plugin different bitset implementations to CachingWrapperFilter
> ---------------------------------------------------------------------------------
>
> Key: LUCENE-5101
> URL: https://issues.apache.org/jira/browse/LUCENE-5101
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Attachments: DocIdSetBenchmark.java, LUCENE-5101.patch
>
>
> Currently this is possible, but it's not so friendly:
> {code}
> protected DocIdSet docIdSetToCache(DocIdSet docIdSet, AtomicReader reader) throws IOException {
>   if (docIdSet == null) {
>     // this is better than returning null, as the nonnull result can be cached
>     return EMPTY_DOCIDSET;
>   } else if (docIdSet.isCacheable()) {
>     return docIdSet;
>   } else {
>     final DocIdSetIterator it = docIdSet.iterator();
>     // null is allowed to be returned by iterator(),
>     // in this case we wrap with the sentinel set,
>     // which is cacheable.
>     if (it == null) {
>       return EMPTY_DOCIDSET;
>     } else {
>       /* INTERESTING PART */
>       final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
>       bits.or(it);
>       return bits;
>       /* END INTERESTING PART */
>     }
>   }
> }
> {code}
> Is there any value to having all this other logic in the protected API? It seems like something that's not useful for a subclass... Maybe this stuff can become final, and "INTERESTING PART" calls a simpler method, something like:
> {code}
> protected DocIdSet cacheImpl(DocIdSetIterator iterator, AtomicReader reader) {
>   final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
>   bits.or(iterator);
>   return bits;
> }
> {code}
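The split proposed in the description could look roughly like the sketch below: the null and cacheability handling becomes final, and only a small hook is left for subclasses to override. This is a sketch under assumptions, not the attached patch. SimpleDocIdSet, BitSetDocIdSet and CachingFilterSketch are hypothetical stand-ins built on java.util.BitSet rather than Lucene types, and maxDoc is passed directly instead of an AtomicReader.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

// Simplified stand-ins; none of these are Lucene classes.
interface SimpleDocIdSet {
    PrimitiveIterator.OfInt iterator(); // may return null, like DocIdSet.iterator()
    boolean isCacheable();
}

final class BitSetDocIdSet implements SimpleDocIdSet {
    private final BitSet bits;

    BitSetDocIdSet(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public PrimitiveIterator.OfInt iterator() {
        return bits.stream().iterator();
    }

    @Override
    public boolean isCacheable() {
        return true;
    }
}

class CachingFilterSketch {
    static final SimpleDocIdSet EMPTY = new BitSetDocIdSet(new BitSet());

    // Final: the sentinel and cacheability logic is not meant to be overridden.
    final SimpleDocIdSet docIdSetToCache(SimpleDocIdSet set, int maxDoc) {
        if (set == null) {
            return EMPTY;
        }
        if (set.isCacheable()) {
            return set;
        }
        PrimitiveIterator.OfInt it = set.iterator();
        if (it == null) {
            return EMPTY;
        }
        return cacheImpl(it, maxDoc);
    }

    // The only extension point: subclasses choose the cached representation.
    protected SimpleDocIdSet cacheImpl(PrimitiveIterator.OfInt it, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        while (it.hasNext()) {
            bits.set(it.nextInt());
        }
        return new BitSetDocIdSet(bits);
    }
}
```

A subclass that wants a different set implementation would then override only cacheImpl and leave the surrounding checks untouched.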