Posted to issues@lucene.apache.org by "Greg Miller (Jira)" <ji...@apache.org> on 2021/10/04 13:57:00 UTC

[jira] [Commented] (LUCENE-10080) Use a bit set to count long-tail of singleton FacetLabels?

    [ https://issues.apache.org/jira/browse/LUCENE-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423959#comment-17423959 ] 

Greg Miller commented on LUCENE-10080:
--------------------------------------

Sounds good to me. Thanks [~mdmarshmallow]!

> Use a bit set to count long-tail of singleton FacetLabels?
> ----------------------------------------------------------
>
>                 Key: LUCENE-10080
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10080
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Priority: Major
>
> I was talking with [~rcmuir] about LUCENE-9969, and he had a neat idea for more efficient facet counting.
> Today we accumulate counts directly in an HPPC native int/int map, or a non-sparse {{int[]}} (if enough hits match the query).
> But it is likely that many of these facet counts are singletons (occur only once in each query). To be more space efficient, we could wrap a bit set around the map or {{int[]}}.  The first time we see an ordinal, we set its bit.  The second and subsequent times, we increment the count as we do today.
> If we use a non-sparse bitset (e.g. {{FixedBitSet}}) that will add some non-sparse heap cost O(maxDoc) for each segment, but if there are enough ordinals to count, that can be a win over just the HPPC native int map for some cases?
> Maybe this could be an intermediate implementation, since we already cover the "very low hit count" (use HPPC int/int map) and "very high hit count" (using {{int[]}}) today?
> Also, this bit set would be able to quickly iterate over the sorted ordinals, which might be helpful if we move the three big {{int[]}} into numeric doc values?
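
The counting scheme described in the issue can be sketched roughly as below. This is only an illustration under stated assumptions, not the Lucene implementation: the class name SingletonBitSetCounter and its methods are hypothetical, java.util.BitSet stands in for FixedBitSet, and a HashMap stands in for the HPPC int/int map.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not Lucene code): singletons cost one bit, repeats spill into a map. */
final class SingletonBitSetCounter {
    // A set bit means the ordinal has been seen at least once.
    private final BitSet seen;
    // Occurrences beyond the first; a stand-in for the HPPC int/int map.
    private final Map<Integer, Integer> extra = new HashMap<>();

    SingletonBitSetCounter(int maxOrdinal) {
        this.seen = new BitSet(maxOrdinal);
    }

    /** First sighting only sets the bit; later sightings increment the map entry. */
    void increment(int ordinal) {
        if (!seen.get(ordinal)) {
            seen.set(ordinal);
        } else {
            extra.merge(ordinal, 1, Integer::sum);
        }
    }

    /** Total count: 0 if never seen, otherwise 1 plus any spilled increments. */
    int count(int ordinal) {
        if (!seen.get(ordinal)) {
            return 0;
        }
        return 1 + extra.getOrDefault(ordinal, 0);
    }

    /** Ordinals seen so far, in ascending order, read straight off the bit set. */
    int[] sortedOrdinals() {
        return seen.stream().toArray();
    }
}
```

With this sketch, calling increment(7) three times and increment(3) once gives count(7) == 3 and count(3) == 1, and sortedOrdinals() returns the ordinals as 3, 7. Only ordinal 7 occupies a map entry, which is the space saving the issue is after when most ordinals are singletons.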


