Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2014/02/01 05:24:13 UTC

[jira] [Commented] (LUCENE-5428) Make Faceting counting array overridable

    [ https://issues.apache.org/jira/browse/LUCENE-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888449#comment-13888449 ] 

Shai Erera commented on LUCENE-5428:
------------------------------------

Mike and I discussed this in the past, but I cannot find the discussion now; perhaps it was on chat. The idea was the same as in your patch: add an abstraction layer over how facets are counted (and BTW, not just for SortedSet, but for the taxonomy path too). For example, I'm working with a team that seems to have exactly the same problem as yours: they have a few million categories, yet sometimes they need to count only one (or very few) of them, and they still have to incur the cost of allocating the big FacetArrays.

The discussion happened in parallel with our attempts to abstract the taxonomy arrays API in LUCENE-5316. We had to back off from that idea, though, because, to our disappointment, faceted search kept getting slower.

For now, I advised the other team to write their own FacetsAggregator (Facets in the new API). I'm all for exploring a FacetsCounter API abstraction here. I'm just noting that you already have an option, which is to implement your own Facets (yes, and maybe duplicate some code...).
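A minimal sketch of that kind of counting abstraction may make the trade-off concrete. It is written in plain Java with no Lucene dependency.

The names FacetCounter, DenseFacetCounter and HashFacetCounter are hypothetical and chosen for illustration. This is not the attached patch and not a Lucene API. The IntPredicate that decides which ordinals belong to the requested fields is also an assumption. It stands in for whatever the real per-field ordinal ranges would be.

The dense version mirrors today's int[] and pays for every possible ordinal. The hash version only pays for the ordinals it is asked to count.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

/** Hypothetical abstraction over where facet counts are stored; not a Lucene API. */
interface FacetCounter {
  /** Adds one hit for the given global ordinal. */
  void increment(int ord);

  /** Returns the count for the given ordinal (0 if it was never counted). */
  int get(int ord);
}

/** Dense strategy: one int slot per possible ordinal, like the existing int[] counts. */
final class DenseFacetCounter implements FacetCounter {
  private final int[] counts;

  DenseFacetCounter(int maxOrd) {
    this.counts = new int[maxOrd]; // allocation is proportional to ALL ordinals
  }

  @Override public void increment(int ord) { counts[ord]++; }

  @Override public int get(int ord) { return counts[ord]; }
}

/** Sparse strategy: only ordinals accepted by the filter are counted, in a hash map. */
final class HashFacetCounter implements FacetCounter {
  private final Map<Integer, Integer> counts = new HashMap<>();
  private final IntPredicate wanted; // assumption: "is this ordinal in a requested field?"

  HashFacetCounter(IntPredicate wanted) { this.wanted = wanted; }

  @Override public void increment(int ord) {
    if (wanted.test(ord)) {
      // memory grows with the number of distinct counted ordinals, not with maxOrd
      counts.merge(ord, 1, Integer::sum);
    }
  }

  @Override public int get(int ord) { return counts.getOrDefault(ord, 0); }

  /** Number of distinct ordinals actually stored. */
  int size() { return counts.size(); }
}

final class FacetCounterDemo {
  public static void main(String[] args) {
    int[] hitOrds = {3, 7, 3, 1_000_000, 7, 3}; // ordinals seen while collecting docs
    // Suppose the single requested field owns ordinals [0, 10).
    FacetCounter sparse = new HashFacetCounter(ord -> ord < 10);
    FacetCounter dense = new DenseFacetCounter(2_000_000);
    for (int ord : hitOrds) {
      sparse.increment(ord);
      dense.increment(ord);
    }
    System.out.println(sparse.get(3) + " " + sparse.get(7) + " " + sparse.get(1_000_000)); // prints "3 2 0"
    System.out.println(dense.get(3) + " " + dense.get(1_000_000)); // prints "3 1"
  }
}
```

A real implementation would likely use a primitive int-to-int hash map rather than a boxed HashMap<Integer, Integer>. The hash version also only wins when few ordinals are hit. For broad queries, the dense array is usually cheaper per increment.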

> Make Faceting counting array overridable
> ----------------------------------------
>
>                 Key: LUCENE-5428
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5428
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 4.6.1
>            Reporter: John Wang
>         Attachments: facetcounter.patch
>
>
> In SortedSetDocValuesFacetCounts, the count array is allocated as an int[] sized to the total number of values across all facets, and it is allocated per query.
> When the number of values is large, a large amount of garbage may be created. Furthermore, the size of the array depends on the number of possible values, not on the number of values actually needed for the facet fields being accumulated. E.g. if FacetSearchParam indicates counting only 1 field with 2 values, we still create the array for all values across all fields.
> This patch makes the count array abstract to allow for
> 1) caching
> 2) hash counting, which can choose to count only the needed fields.
> This patch can be further enhanced to create a FacetCounter per segment, per field, by passing in the ordinal map.
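Option 1) above, caching, could look roughly like the following. Instead of allocating a fresh int[] per query, the code recycles zeroed arrays from a small pool. This is plain Java, and CountArrayPool and its methods are hypothetical, taken from neither the patch nor Lucene.

For scale, consider an int[] over 5,000,000 ordinals. That figure is illustrative, not a measurement from this issue. Such an array takes about 20 MB (5,000,000 × 4 bytes), and that is the per-query garbage this kind of pool avoids.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical pool that recycles per-query count arrays instead of allocating new ones. */
final class CountArrayPool {
  private final int maxOrd;     // assumption: fixed for one reader / ordinal map
  private final int maxPooled;  // upper bound on idle arrays kept around
  private final ArrayDeque<int[]> free = new ArrayDeque<>();
  private int allocations;      // how many arrays were actually allocated

  CountArrayPool(int maxOrd, int maxPooled) {
    this.maxOrd = maxOrd;
    this.maxPooled = maxPooled;
  }

  /** Returns a zeroed array with one slot per ordinal, reusing a pooled one if available. */
  synchronized int[] acquire() {
    int[] counts = free.pollFirst();
    if (counts == null) {
      allocations++;
      counts = new int[maxOrd];
    }
    return counts;
  }

  /** Zeroes the array and keeps it for the next query; drops it if the pool is full. */
  void release(int[] counts) {
    if (counts.length != maxOrd) {
      throw new IllegalArgumentException("array was not created by this pool");
    }
    Arrays.fill(counts, 0); // still O(maxOrd) work, but creates no new garbage; done outside the lock
    synchronized (this) {
      if (free.size() < maxPooled) {
        free.addFirst(counts);
      }
    }
  }

  synchronized int allocations() { return allocations; }
}

final class CountArrayPoolDemo {
  public static void main(String[] args) {
    CountArrayPool pool = new CountArrayPool(5_000_000, 4); // ~20 MB per array
    for (int query = 0; query < 3; query++) {
      int[] counts = pool.acquire();
      try {
        counts[42]++; // stand-in for accumulating this query's counts
      } finally {
        pool.release(counts);
      }
    }
    System.out.println(pool.allocations()); // prints 1: the same array served all three queries
  }
}
```

Two caveats apply to this sketch:

- The pool must be tied to a single reader/ordinal map, because the number of ordinals can change when the index is reopened.
- Clearing is still proportional to the number of ordinals, so this trades GC pressure for an Arrays.fill.

Hash counting is the option that avoids both costs when only a few ordinals are requested.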


