You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2015/09/18 01:28:04 UTC

[jira] [Commented] (SOLR-6168) enhance collapse QParser so that "group head" documents can be selected by more complex sort options

    [ https://issues.apache.org/jira/browse/SOLR-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804701#comment-14804701 ] 

Hoss Man commented on SOLR-6168:
--------------------------------

bq. The intention of the FieldComparator.value API is that you would use it to look up a value after all collection is complete, e.g. to fill the fields to create the FieldDoc for this hit, and so the comparator would naturally not be changing it anymore.

ok, but that's not clear/enforced anywhere in the API itself -- if you are only suppose to fetch back all the slots after the search is complete, why support fetching the individual slots (and compareValues()) at all?

bq. But I don't really understand what SOLR-6168 is trying to do ... maybe there is another way to do it w/o abusing this API?

In a nutshell the CollapseQParser is another style of "group collapsing" where it picks a single best representative doc for each group on the fly as docs are collected -- currently it does this only based on a single field via DocValues, SOLR-6168 is about making this work with any arbitrary "Sort" object.  The code in the patch (in SOLR-6168) is analogous to the type of logic that FieldValueHitQueue does, but instead of tracking the "N best docs" (according to the sort) in a single priority queue (which re-orders on the fly), it tracks the *1* best doc for every "group" as it makes a single pass collection uses value() + compareValues to know when a "better" doc has been found for that group.

(The most trivial implementation would be to use a FieldValueHitQueue of size '1' for each group, but this approach unwinds the logic a track the values directly.  When i was first looking at this approach, i initially considered using a FieldComparator "slot" for each group, but the number of groups isn't neccessarly known in advance and there's no way to "grow" a FieldComparator)

It doesn't feel like an abuse of the FieldComparator API to do this ... every other Comparator works fine as is.  As i mentioned I could always put hacks in the caller code (ColapseQParser) to do instanceof/casting/deepCopy when the FieldComparator returns BytesRefs -- but fixing Term(Ord)ValComparator.value to return "safe" objects seemed saner.

---

Forgetting SOLR-6168, even if you asssume that all callers should know that  FieldComparator.value is only intended to be usable after the end of a search, the BytesRefs returned by Term(Ord)ValComparator.value don't really seem to be safe for a caller to hang on to (for later use searchAfter, etc...) in any situation where the FieldComparator itself might get re-used -- ie: re-using a FieldValueHitQueue (or Collector that uses FieldValueHitQueue) in a subsequent search.

what protects the user from those values being changed out from under them in that situation?


> enhance collapse QParser so that "group head" documents can be selected by more complex sort options
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6168
>                 URL: https://issues.apache.org/jira/browse/SOLR-6168
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.7.1, 4.8.1
>            Reporter: Umesh Prasad
>            Assignee: Joel Bernstein
>         Attachments: CollapsingQParserPlugin-6168.patch.1-1stcut, SOLR-6168-group-head-inconsistent-with-sort.patch, SOLR-6168.patch, SOLR-6168.patch
>
>
> The fundemental goal of this issue is add additional support to the CollapseQParser so that as an alternative to the existing min/max localparam options, more robust sort syntax can be used to sort on multiple criteria when selecting the "group head" documents used to represent each collapsed group.
> Since support for arbitrary, multi-clause, sorting is almost certainly going to require more RAM then the existing min/max functionaly, this new functionality should be in addition to the existing min/max localparam implementation, not a replacement of it.
> (NOTE: early comments made in this jira may be confusing in historical context due to the way this issue was originally filed as a bug report)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org