Posted to issues@lucene.apache.org by "Alexander L (Jira)" <ji...@apache.org> on 2021/06/01 01:56:00 UTC

[jira] [Commented] (LUCENE-9950) Support both single- and multi-value string fields in facet counting (non-taxonomy based approaches)

    [ https://issues.apache.org/jira/browse/LUCENE-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354735#comment-17354735 ] 

Alexander L commented on LUCENE-9950:
-------------------------------------

Thank you for adding the new facet implementation, [~gsmiller]!

??It seems like the only advantage it might offer over a taxonomy-based approach is not requiring the side-car index??

We found a couple of SSDVFF advantages. First, index merges are fast, since it is a regular index and does not require [global ordinals translation logic|https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/TaxonomyMergeUtils.java] (in our tests, a regular index merge with HardlinkCopyDirectoryWrapper takes 3 minutes, while merging main+taxonomy pairs takes about 85 minutes for a ~200Gb index). Second, SSDVFF indexing performance is better and, unlike the Taxonomy approach, scales with added threads. These advantages tipped the scales in favor of SSDVFF in our case, although Taxonomy provides slightly better query performance and allows hierarchical faceting.

??there may still be some use-cases for "packing" multiple "dimensions" into one field??

I wonder what use cases you have in mind for that. Do you perhaps have a performance comparison with the SortedSetDocValuesFacetField implementation available? I remember reading somewhere that facet dimensions stored in a single field can provide better performance (e.g., due to CPU reference locality), but I am not sure how big the difference can be.
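
To make the "packing" idea concrete, here is a minimal toy sketch in plain Java. It is not Lucene code: the class name, the {{term}} and {{count}} helpers, and the separator character are all assumptions made for illustration. It only shows why packed dimensions can share one term dictionary and one counts array: terms with the same dimension prefix sort next to each other, so each dimension occupies one contiguous ordinal range.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model only: plain collections stand in for a doc-values field.
// None of these names are Lucene APIs.
class PackedDimsSketch {
    // Assumed separator between dimension and label inside one packed term.
    static final char SEP = '\u001F';

    // Hypothetical helper: pack a dimension and a label into a single term.
    static String term(String dim, String label) {
        return dim + SEP + label;
    }

    // Count the labels of one dimension when all dimensions share one field.
    // Each inner list holds the distinct packed terms of one document.
    static Map<String, Integer> count(List<List<String>> docs, String dim) {
        // Shared, sorted term dictionary; a term's ordinal is its rank.
        TreeSet<String> dict = new TreeSet<>();
        for (List<String> doc : docs) {
            dict.addAll(doc);
        }
        List<String> terms = new ArrayList<>(dict);
        Map<String, Integer> ordOf = new HashMap<>();
        for (int ord = 0; ord < terms.size(); ord++) {
            ordOf.put(terms.get(ord), ord);
        }

        // A single counts array covers every dimension in the field.
        int[] counts = new int[terms.size()];
        for (List<String> doc : docs) {
            for (String t : doc) {
                counts[ordOf.get(t)]++;
            }
        }

        // Terms sharing the prefix are adjacent, so the dimension is one ordinal range.
        String prefix = dim + SEP;
        int start = 0;
        while (start < terms.size() && !terms.get(start).startsWith(prefix)) {
            start++;
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int ord = start; ord < terms.size() && terms.get(ord).startsWith(prefix); ord++) {
            result.put(terms.get(ord).substring(prefix.length()), counts[ord]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of(term("author", "Bob"), term("year", "2020")),
            List.of(term("author", "Lisa"), term("year", "2020")),
            List.of(term("author", "Bob")));
        System.out.println(count(docs, "author"));
        System.out.println(count(docs, "year"));
    }
}
```

The sketch says nothing about how large the locality benefit is in practice; that would still need a benchmark.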

> Support both single- and multi-value string fields in facet counting (non-taxonomy based approaches)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9950
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9950
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: main (9.0)
>            Reporter: Greg Miller
>            Priority: Minor
>             Fix For: main (9.0), 8.9
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> Users wanting to facet count string-based fields using a non-taxonomy-based approach can use {{SortedSetDocValuesFacetCounts}}, which accumulates facet counts based on a {{SortedSetDocValues}} field. This requires the stored doc values to be multi-valued (i.e., {{SORTED_SET}}), and doesn't work on single-valued fields (i.e., {{SORTED}}). In contrast, if a user wants to facet count on a stored numeric field, they can use {{LongValueFacetCounts}}, which supports both single- and multi-valued fields (and in LUCENE-9948, we now auto-detect instead of asking the user to specify).
> Let's update {{SortedSetDocValuesFacetCounts}} to also support, and automatically detect, single- and multi-value fields. Note that this is a spin-off issue from LUCENE-9946, where [~rcmuir] points out that this can essentially be a one-line change, but we may want to do some class renaming at the same time. Also note that we should do this in {{ConcurrentSortedSetDocValuesFacetCounts}} while we're at it.
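
The auto-detection described above can be illustrated with a small toy model in plain Java. This is a sketch under stated assumptions, not the change made in Lucene: the {{MultiValued}} interface, the array-based columns, and the {{count}} method are hypothetical stand-ins for doc values. The point it shows is that a single-valued column can be treated as a degenerate multi-valued one, so the counting loop stays the same and the caller never has to say which kind of field it has.

```java
// Toy model only: arrays stand in for per-document doc values.
// These types are hypothetical and are not Lucene classes.
class AutoDetectSketch {
    // Multi-valued view: each document exposes zero or more ordinals.
    interface MultiValued {
        int docCount();
        int[] ords(int doc);
    }

    // Wrap a single-valued column (-1 means "no value") as a multi-valued view.
    static MultiValued fromSingle(int[] ordPerDoc) {
        return new MultiValued() {
            public int docCount() {
                return ordPerDoc.length;
            }
            public int[] ords(int doc) {
                return ordPerDoc[doc] < 0 ? new int[0] : new int[] {ordPerDoc[doc]};
            }
        };
    }

    // A multi-valued column already has the required shape.
    static MultiValued fromMulti(int[][] ordsPerDoc) {
        return new MultiValued() {
            public int docCount() {
                return ordsPerDoc.length;
            }
            public int[] ords(int doc) {
                return ordsPerDoc[doc];
            }
        };
    }

    // Detect the column shape at runtime, then run one shared counting loop.
    static int[] count(Object column, int cardinality) {
        MultiValued view;
        if (column instanceof int[]) {
            view = fromSingle((int[]) column);
        } else if (column instanceof int[][]) {
            view = fromMulti((int[][]) column);
        } else {
            throw new IllegalArgumentException("unsupported column type");
        }
        int[] counts = new int[cardinality];
        for (int doc = 0; doc < view.docCount(); doc++) {
            for (int ord : view.ords(doc)) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same call for a single-valued and a multi-valued column.
        System.out.println(java.util.Arrays.toString(count(new int[] {0, 1, -1, 1}, 2)));
        System.out.println(java.util.Arrays.toString(count(new int[][] {{0, 1}, {}, {1}}, 2)));
    }
}
```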



