Posted to issues@solr.apache.org by "Michael Gibney (Jira)" <ji...@apache.org> on 2022/01/05 15:42:00 UTC

[jira] [Assigned] (SOLR-15836) Address counterintuitive behavior of JSON "terms" subfacet refinement

     [ https://issues.apache.org/jira/browse/SOLR-15836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gibney reassigned SOLR-15836:
-------------------------------------

    Assignee: Michael Gibney

> Address counterintuitive behavior of JSON "terms" subfacet refinement
> ---------------------------------------------------------------------
>
>                 Key: SOLR-15836
>                 URL: https://issues.apache.org/jira/browse/SOLR-15836
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: main (9.0), 8.11
>            Reporter: Michael Gibney
>            Assignee: Michael Gibney
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In distributed faceting, uneven distribution of terms across different shards can artificially include or exclude terms (this discussion will focus on JSON Facet "terms" faceting).
> This is inevitable, and can be mitigated via {{overrequest}} and {{overrefine}} parameters -- respectively casting a "wider net" for "phase#1" (determining the set of "terms of interest") and "phase#2" (cross-checking "terms of interest" against shards that did not initially report them).
> It is possible to devise artificial situations that push the limit of what {{overrefine}} is capable of mitigating, resulting in counterintuitive behavior. But despite such edge cases, in general it is relatively straightforward to reason about how the {{simple}} JSON Facet refinement method works for "flat" (i.e., non-hierarchical) terms facets.
> This issue discusses some ways in which subfacets (hierarchical or nested facets) can more readily behave counterintuitively in practical usage, and possible ways to address/mitigate such behavior.
> ---------------------
> AFAICT, the {{simple}} (default, currently the only) refinement method has two defining requirements:
> # there is at most _one_ refinement request issued to each shard, and
> # any buckets returned are guaranteed to have accurate counts (or perhaps more generally, stats?) reflecting contributions from all shards. (This makes [no guarantees|https://issues.apache.org/jira/browse/SOLR-11159?focusedCommentId=16103386#comment-16103386] about buckets _not_ returned that would in principle be eligible to be returned.)
>  
> The simplest counterintuitive case is when refinement of higher-level facets uncovers more subfacets on shards that have no opportunity to influence results/refinement of the child facet. I'm pretty sure it's this situation that's described in [this comment|https://github.com/apache/solr/blob/0287458f836e3b7ea4b2401538b29f3d2e9b6cf4/solr/core/src/test/org/apache/solr/search/facet/TestJsonFacetRefinement.java#L992-L994] (by [~hossman]?):
> {code:java}
>     //   - or at the very least, if the purpose of "_l" is to give other buckets a chance to "bubble up"
>     //     in phase#2, then shouldn't a "_l" refinement requests still include the buckets choosen in
>     //     phase#1, and request that the shard fill them in in addition to returning its own top buckets?
> {code}
> The proposal in the above linked comment would work iff the "own top buckets" returned in phase#2 did not introduce any new/unseen values (and note, the only case in which returning "own top buckets" would be significant _would_ be the case in which it would introduce new/unseen values). If new values _were_ returned in phase#2, the only way to ensure that requirement 2 is respected would be to violate requirement 1 (i.e., by issuing _another_ refinement request to determine whether any other shards have anything to contribute to the previously unseen value).
> This counterintuitive behavior can't exactly be called a "bug", because IIUC the intuitive behavior is fundamentally incompatible with the current default/only {{simple}} refinement method.
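
The two-phase flow described above for "flat" terms facets can be sketched as a small in-memory simulation. This is an illustrative model only, not Solr's implementation: the class SimpleRefinementSketch, its methods, and the sample shard data are all hypothetical, and the parameters limit, overrequest, and overrefine are modeled simply as "extra buckets requested per shard" and "extra buckets considered for refinement". Ties are broken by term, which is also an assumption made here for determinism.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of "simple" refinement for a flat terms facet.
public class SimpleRefinementSketch {

    // Sample data (assumed): "y" is the true global winner (4 + 4 = 8),
    // but it is never the local top term on either shard.
    static final List<Map<String, Integer>> SHARDS = List.of(
        Map.of("x", 5, "y", 4),
        Map.of("z", 5, "y", 4));

    // Sort by count descending, then by term ascending.
    static final Comparator<Map.Entry<String, Integer>> BY_COUNT_DESC_THEN_TERM = (a, b) -> {
        int c = Integer.compare(b.getValue(), a.getValue());
        return c != 0 ? c : a.getKey().compareTo(b.getKey());
    };

    // Returns the n highest-count entries, in sorted order.
    static LinkedHashMap<String, Integer> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(BY_COUNT_DESC_THEN_TERM);
        LinkedHashMap<String, Integer> top = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(n, entries.size()))) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    static Map<String, Integer> facet(List<Map<String, Integer>> shards,
                                      int limit, int overrequest, int overrefine) {
        // phase#1: each shard reports its local top (limit + overrequest) terms.
        Map<String, Integer> merged = new HashMap<>();
        List<Set<String>> reported = new ArrayList<>();
        for (Map<String, Integer> shard : shards) {
            Map<String, Integer> top = topN(shard, limit + overrequest);
            reported.add(top.keySet());
            top.forEach((term, count) -> merged.merge(term, count, Integer::sum));
        }

        // "Terms of interest": the merged top (limit + overrefine) terms.
        LinkedHashMap<String, Integer> refined = topN(merged, limit + overrefine);

        // phase#2: at most one refinement pass per shard, asking only for
        // terms of interest that the shard did not report in phase#1.
        for (int i = 0; i < shards.size(); i++) {
            for (String term : new ArrayList<>(refined.keySet())) {
                if (!reported.get(i).contains(term)) {
                    refined.merge(term, shards.get(i).getOrDefault(term, 0), Integer::sum);
                }
            }
        }

        // Every returned bucket now reflects contributions from all shards.
        return topN(refined, limit);
    }

    public static void main(String[] args) {
        System.out.println(facet(SHARDS, 1, 0, 0)); // {x=5}
        System.out.println(facet(SHARDS, 1, 0, 1)); // {x=5}
        System.out.println(facet(SHARDS, 1, 1, 0)); // {y=8}
    }
}
```

In this sample, the returned count for "x" is accurate in every case, yet "y" is excluded unless overrequest is raised; overrefine alone does not help, because "y" was never reported in phase#1. This matches the "no guarantees about buckets not returned" caveat in the description.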



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
