Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2015/09/05 01:14:47 UTC

[jira] [Updated] (SOLR-6168) enhance collapse QParser so that "group head" documents can be selected by more complex sort options

     [ https://issues.apache.org/jira/browse/SOLR-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-6168:
---------------------------
    Description: 




The fundamental goal of this issue is to add support to the CollapseQParser so that, as an alternative to the existing min/max localparam options, more robust sort syntax can be used to sort on multiple criteria when selecting the "group head" documents used to represent each collapsed group.

Since support for arbitrary, multi-clause sorting is almost certainly going to require more RAM than the existing min/max functionality, this new functionality should be in addition to the existing min/max localparam implementation, not a replacement of it.

(NOTE: early comments made in this jira may be confusing in historical context due to the way this issue was originally filed as a bug report)

  was:
CollapsingQParserPlugin ranks documents incorrectly when more than 2 sort fields are used.
   I have attached a test case that demonstrates the broken behavior when 3 sort fields are used.

The failing test case patch is against Lucene/Solr 4.8.1, revision number 1603061.

PS: SOLR-5408 fixed the sorting issue only for two sort fields, by allowing one to specify max/min=<field-name>. However, that requires the 2nd sort field to be a numeric field. It will not work with a string field or a function query sort.




     Issue Type: Improvement  (was: Bug)
        Summary: enhance collapse QParser so that "group head" documents can be selected by more complex sort options  (was: CollapsingQParserPlugin ranks incorrectly when 3 or more sort params are used)


I was recently asked about this issue, and when I initially started digging into it I got more and more confused.

It seems that, fundamentally, what happened here is that Umesh initially filed a _bug_ regarding the way the collapse QParser selects the "group head" -- but this bug report was based on a misunderstanding about what the default behavior of the CollapseQParser is when dealing with a sort param (as compared to the older GroupingComponent).

There was some key discussion about this issue on the solr-user mailing list, which did *not* result in updating the summary/description of this issue, followed by Umesh attaching a patch attempting to implement some changes in behavior.

I have some thoughts on Umesh's approach, and my own suggestions, but before I get into that I want to make sure the situation is accurately represented in this Jira.

----

First off, some key discussion from the solr-user mailing list circa June 2014 that should really be captured directly in this issue.

* http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201406.mbox/%3CCAJc64EXgnPn-RiqgUYn=S_Wn5wPZsvtirEHP_nctZ-AFa=AxEw@mail.gmail.com%3E
* http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201406.mbox/%3CCAE4tqLP-jqBjrWB0Yr2vNs8J15qW8BwVK61hZOG=__EjFpJJgQ@mail.gmail.com%3E
* http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201406.mbox/%3CCAJc64EVQP=aSa6OfDSvPUdcOEA+-mO1USmLNfAFJgP4OeVbdSQ@mail.gmail.com%3E

In particular these comments from Joel...

{quote}
So, the question is what is the cost (performance and memory) of having the
CollapsingQParserPlugin choose the group head by using the Solr sort
criteria?

Keep in mind that the CollapsingQParserPlugin's main design goal is to
provide fast performance when collapsing on a high cardinality field. How
you choose the group head can have a big impact here, both on memory
consumption and performance.

The function query collapse criteria was added to allow you to come up with
custom formulas for selecting the group head, with little or no impact on
performance and memory. Using Solr's recip() function query it seems like
you could come up with some nice scenarios where two variables could be
used to select the group head. For example:

...
{quote}

And this response from Umesh...

{quote}
...

I agree 200 MB per request just for collapsing the search results is huge
but at least it increases linearly with number of sort fields.. For my use
case, I am willing to pay the linear cost specially when I can't combine
the sort fields intelligently into a sort function. Plus it allows me to
sort by String/Text fields also which is a big win.

...
{quote}

----

Based on all of the comments regarding this issue, including the email discussion, I've revised the summary & description to make it clear that:

* this is a feature request
* the goal is to expand the options available to users of the collapse QParser by allowing "group head" documents to be selected by more complex sort options
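
To make the distinction concrete, here is a minimal Python sketch of what "selecting the group head by a multi-clause sort" means, compared to a single min-style criterion. This is only an illustration under stated assumptions, not the CollapsingQParserPlugin implementation: the documents are plain dicts, and the field names ("group", "price", "name") and both helper functions are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]
SortClause = Tuple[str, str]  # (field name, "asc" or "desc"); hypothetical representation


def compare(a: Doc, b: Doc, clauses: List[SortClause]) -> int:
    """Negative if a sorts before b, positive if after, 0 if tied on every clause."""
    for field, direction in clauses:
        if a[field] == b[field]:
            continue  # tie on this clause, fall through to the next one
        result = -1 if a[field] < b[field] else 1
        return result if direction == "asc" else -result
    return 0


def select_group_heads(docs: List[Doc], group_field: str,
                       clauses: List[SortClause]) -> Dict[Any, Doc]:
    """Keep, per group, the one document that sorts first under the clauses."""
    heads: Dict[Any, Doc] = {}
    for doc in docs:
        key = doc[group_field]
        current = heads.get(key)
        # The first document seen in a group wins ties.
        if current is None or compare(doc, current, clauses) < 0:
            heads[key] = doc
    return heads


# Hypothetical sample data: two documents in group "a" tie on price.
docs = [
    {"id": "1", "group": "a", "price": 10, "name": "pear"},
    {"id": "2", "group": "a", "price": 10, "name": "apple"},
    {"id": "3", "group": "a", "price": 12, "name": "fig"},
    {"id": "4", "group": "b", "price": 7, "name": "kiwi"},
]

# One numeric criterion, analogous in spirit to a min-style selection.
single = select_group_heads(docs, "group", [("price", "asc")])
# Two clauses, the second on a string field, to break the tie on price.
multi = select_group_heads(docs, "group", [("price", "asc"), ("name", "asc")])
```

With a single clause the tie on price is resolved by encounter order, while the second clause lets a string field decide. Note that comparing against the current head requires keeping a value for every clause for every group, which is consistent with the expectation in the description that multi-clause sorting will need more RAM than min/max.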


> enhance collapse QParser so that "group head" documents can be selected by more complex sort options
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6168
>                 URL: https://issues.apache.org/jira/browse/SOLR-6168
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.7.1, 4.8.1
>            Reporter: Umesh Prasad
>            Assignee: Joel Bernstein
>         Attachments: CollapsingQParserPlugin-6168.patch.1-1stcut, SOLR-6168-group-head-inconsistent-with-sort.patch
>
>
> The fundamental goal of this issue is to add support to the CollapseQParser so that, as an alternative to the existing min/max localparam options, more robust sort syntax can be used to sort on multiple criteria when selecting the "group head" documents used to represent each collapsed group.
> Since support for arbitrary, multi-clause sorting is almost certainly going to require more RAM than the existing min/max functionality, this new functionality should be in addition to the existing min/max localparam implementation, not a replacement of it.
> (NOTE: early comments made in this jira may be confusing in historical context due to the way this issue was originally filed as a bug report)


