Posted to issues@lucene.apache.org by "Feng Guo (Jira)" <ji...@apache.org> on 2021/11/15 19:52:00 UTC

[jira] [Comment Edited] (LUCENE-10233) Store docIds as bitset when leafCardinality = 1 to speed up addAll

    [ https://issues.apache.org/jira/browse/LUCENE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444070#comment-17444070 ] 

Feng Guo edited comment on LUCENE-10233 at 11/15/21, 7:51 PM:
--------------------------------------------------------------

[~jpountz] Thanks! +1 to removing the OffsetBitSet class, but I find that there is not yet a faster implementation of FixedBitSet.or(SparseFixedBitSet). Do you mean we should implement it?

Another way to replace the OffsetBitSet class that I can think of is to support a docBase in BitSetIterator. I implemented this in the newest commit: https://github.com/apache/lucene/pull/438. I wonder if this approach makes sense to you.
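
To make the docBase idea concrete, here is a rough, self-contained sketch of an iterator that walks a bitset storing ids relative to a base and reports them shifted back by that base. It is not the code from the pull request: the class name OffsetBitSetIteratorSketch is hypothetical, java.util.BitSet stands in for Lucene's bit sets, and NO_MORE_DOCS only mirrors the usual Integer.MAX_VALUE sentinel convention.

```java
import java.util.BitSet;

/** Hypothetical sketch (not Lucene's BitSetIterator): iterates set bits shifted by a docBase. */
final class OffsetBitSetIteratorSketch {
  /** Sentinel returned once the iterator is exhausted (assumed convention). */
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final BitSet bits;   // bit i set means doc (docBase + i) is present
  private final int docBase;   // offset added to every bit index
  private int doc = -1;        // current doc id, -1 before the first call

  OffsetBitSetIteratorSketch(BitSet bits, int docBase) {
    this.bits = bits;
    this.docBase = docBase;
  }

  /** Advances to the next doc id, or returns NO_MORE_DOCS when none is left. */
  int nextDoc() {
    if (doc == NO_MORE_DOCS) {
      return doc; // already exhausted; avoid computing an index past the end
    }
    int from = (doc == -1) ? 0 : doc - docBase + 1;
    int next = bits.nextSetBit(from);
    doc = (next < 0) ? NO_MORE_DOCS : next + docBase;
    return doc;
  }
}
```

With this shape, the stored bitset only has to cover the range between the smallest and largest id of a block, and the consumer still sees absolute doc ids.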


was (Author: gf2121):
[~jpountz]  Thanks! +1 to remove the OffsetBitSet class, but i find that there has not been a faster implementation for FixedBitSet.or(SparseFixedBitSet), do you mean we should implement it?

Another way to replace the OffsetBitSet class i can think of is to support a docBase in BitSetIterator, i implement this in the newest commit: https://github.com/apache/lucene/pull/438, i want to know if this approach makes sense to you.

> Store docIds as bitset when leafCardinality = 1 to speed up addAll
> ------------------------------------------------------------------
>
>                 Key: LUCENE-10233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10233
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In low-cardinality point cases, id blocks will usually store doc ids that share the same point value, and {{intersect}} will take the {{addAll}} path. If we store the ids as a bitset and give the IntersectVisitor a bulk-visiting ability, we can speed up addAll, because we can simply 'or' the block ids into the result.
> The optimization is triggered when all of the following conditions are met:
>  # leafCardinality = 1
>  # max(docId) - min(docId) <= 16 * pointCount (to avoid expanding storage too much)
>  # no duplicate doc ids
> I mocked a field that has 10,000,000 docs per value and searched it with a 1-term PointInSetQuery; the build scorer time decreased from 71ms to 8ms.
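
The conditions quoted above, together with the bulk 'or', can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: the class and method names are hypothetical, plain long[] words stand in for Lucene's bit sets, the doc ids of a block are assumed to arrive in ascending order, and the block's base is rounded down to a multiple of 64 so the 'or' can run word by word.

```java
/** Hypothetical sketch, not Lucene code: all names and the layout are assumptions. */
final class LeafBitsetSketch {

  /** Checks the three conditions from the issue; assumes ids are sorted ascending. */
  static boolean qualifies(int leafCardinality, int[] sortedDocIds) {
    if (leafCardinality != 1 || sortedDocIds.length == 0) {
      return false;
    }
    for (int i = 1; i < sortedDocIds.length; i++) {
      if (sortedDocIds[i] <= sortedDocIds[i - 1]) {
        return false; // duplicate (or unsorted) doc id
      }
    }
    long span = (long) sortedDocIds[sortedDocIds.length - 1] - sortedDocIds[0];
    return span <= 16L * sortedDocIds.length; // bound the storage blow-up
  }

  /** Base of the block, rounded down to a word boundary (an assumption of this sketch). */
  static int alignedBase(int[] sortedDocIds) {
    return sortedDocIds[0] & ~63;
  }

  /** Encodes the block as words where bit (doc - alignedBase) is set for each doc. */
  static long[] toBits(int[] sortedDocIds) {
    int base = alignedBase(sortedDocIds);
    int max = sortedDocIds[sortedDocIds.length - 1];
    long[] words = new long[((max - base) >>> 6) + 1];
    for (int doc : sortedDocIds) {
      int rel = doc - base;
      words[rel >>> 6] |= 1L << rel; // long shifts use only the low 6 bits of rel
    }
    return words;
  }

  /** Bulk visit: 'or' the block words into the result, which must cover all doc ids. */
  static void orInto(long[] result, long[] blockWords, int alignedBase) {
    int offset = alignedBase >>> 6;
    for (int i = 0; i < blockWords.length; i++) {
      result[offset + i] |= blockWords[i];
    }
  }
}
```

The point of the bulk path is that one word-level 'or' handles up to 64 doc ids at once, instead of one visit call per doc id.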



