Posted to issues@lucene.apache.org by "Michael Gibney (Jira)" <ji...@apache.org> on 2019/10/01 16:25:00 UTC
[jira] [Commented] (SOLR-13807) Caching for term facet counts

    [ https://issues.apache.org/jira/browse/SOLR-13807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942117#comment-16942117 ] 

Michael Gibney commented on SOLR-13807:
---------------------------------------

I initially proposed this idea, along with an implementation over {{SimpleFacets}}, in a comment (very) tangentially related to [SOLR-8096|https://issues.apache.org/jira/browse/SOLR-8096?focusedCommentId=15960982#comment-15960982]. While working to address some performance issues with full-domain SKG/relatedness (SOLR-13132), I updated the initial facet cache implementation to be compatible with JSON facets (while maintaining cross-compatibility with {{SimpleFacets}}).

[PR #751|https://github.com/apache/lucene-solr/pull/751] (associated with SOLR-13132) incorporates a facet cache that I believe realizes all of the potential mentioned in the issue description, including being NRT-friendly/segment-aware, with the exception of point 5: the PR does not leverage the facet cache for distributed refinement. The facet cache itself was a prerequisite for the SKG/relatedness work, but distributed refinement would definitely have been out of scope.

In retrospect I would have preferred to submit a separate PR for the facet cache alone; I did not go that route only because the facet cache implementation grew organically out of (and was a prerequisite to) the work on SKG/relatedness. Would people be comfortable (at least initially) evaluating the facet cache implementation in the context of SOLR-13132? Whether or not I end up having to extract the facet cache work into a separate PR, I thought it worth opening this separate Jira issue for the facet cache, since its use could potentially be much more general (beyond SKG/relatedness).

> Caching for term facet counts
> -----------------------------
>
>                 Key: SOLR-13807
>                 URL: https://issues.apache.org/jira/browse/SOLR-13807
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Facet Module
>    Affects Versions: master (9.0), 8.2
>            Reporter: Michael Gibney
>            Priority: Minor
>
> Solr does not have a facet count cache; so for _every_ request, term facets are recalculated for _every_ (facet) field, by iterating over _every_ field value for _every_ doc in the result domain, and incrementing the associated count.
> This redoes a lot of work, including all associated object allocation, GC, etc., and could benefit greatly from integrated caching.
> Because of the domain-based, serial/iterative nature of term facet calculation, latency is proportional to the size of the result domain. Consequently, one common/clear manifestation of this issue is high latency for faceting over an unrestricted domain (e.g., {{*:*}}), as might be observed on a top-level landing page that exposes facets. This type of "static" case is often mitigated by external (to Solr) caching, either with a caching layer between Solr and a front-end application, or within a front-end application, or even with a caching layer between the end user and a front-end application.
> But in addition to the overhead of handling this caching elsewhere in the stack (or, for a new user, even being aware of this as a potential issue to mitigate), any external caching mitigation is really only appropriate for relatively static cases like the "landing page" example described above. A Solr-internal facet count cache (analogous to the {{filterCache}}) would provide the following additional benefits:
>  # ease of use/out-of-the-box configuration to address a common performance concern
>  # compact (specifically caching count arrays, without the extra baggage that accompanies a naive external caching approach)
>  # NRT-friendly (could be implemented to be segment-aware)
>  # modular, capable of reusing the same cached values in conjunction with variant requests over the same result domain (this would support common use cases like paging, but also potentially more interesting direct uses of facets)
>  # could be used for distributed refinement (i.e., if facet counts over a given domain are cached, a refinement request could simply look up the ordinal value for each enumerated term and directly grab the count out of the count array that was cached during the first phase of facet calculation)
>  # composable (e.g., in aggregate functions that calculate values based on facet counts across different domains, like SKG/relatedness – see SOLR-13132)
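
The counting loop and the cache lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in PR #751 or any Solr API: the class {{FacetCountCacheSketch}}, its methods, the string {{domainId}} used to identify a result domain, and the in-memory {{docTermOrds}} array standing in for per-document term ordinals are all hypothetical. Invalidation, eviction, and segment awareness are deliberately left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; none of these names come from Solr or PR #751. */
class FacetCountCacheSketch {

    // Hypothetical cache: (domain id, field name) -> count array indexed by term ordinal.
    private final Map<List<String>, int[]> cache = new HashMap<>();

    // Number of times counts were actually recomputed (for demonstration only).
    int computations = 0;

    /** Uncached path: visit every term ordinal of every doc in the result domain. */
    static int[] countTerms(int[] domainDocs, int[][] docTermOrds, int numTerms) {
        int[] counts = new int[numTerms];
        for (int doc : domainDocs) {
            for (int ord : docTermOrds[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    /** Cached path: compute the count array once per (domain, field), then reuse it. */
    int[] getCounts(String domainId, String field,
                    int[] domainDocs, int[][] docTermOrds, int numTerms) {
        return cache.computeIfAbsent(Arrays.asList(domainId, field), key -> {
            computations++;
            return countTerms(domainDocs, docTermOrds, numTerms);
        });
    }

    /**
     * Refinement-style lookup (point 5): resolve a term to its ordinal in the
     * sorted term dictionary and read its count straight from the cached array.
     */
    static int countForTerm(String[] sortedTerms, int[] counts, String term) {
        int ord = Arrays.binarySearch(sortedTerms, term);
        return ord < 0 ? 0 : counts[ord];
    }

    public static void main(String[] args) {
        String[] terms = {"blue", "green", "red"};      // sorted term dictionary
        int[][] docTermOrds = {{0, 1}, {1}, {1, 2}, {0}}; // term ordinals per doc
        int[] allDocs = {0, 1, 2, 3};                    // unrestricted domain

        FacetCountCacheSketch sketch = new FacetCountCacheSketch();
        int[] counts = sketch.getCounts("*:*", "color", allDocs, docTermOrds, terms.length);
        System.out.println(Arrays.toString(counts));     // prints [2, 3, 1]

        // A second request over the same domain and field reuses the cached array.
        sketch.getCounts("*:*", "color", allDocs, docTermOrds, terms.length);
        System.out.println(sketch.computations);         // prints 1
    }
}
```

Because the cached value is just the count array, variant requests over the same domain (paging, a different sort or limit, a refinement lookup for a specific term) can all be answered from one entry without revisiting any documents.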


