Posted to solr-user@lucene.apache.org by Tony Paloma <to...@valvesoftware.com> on 2013/08/03 00:00:18 UTC

Unexpected behavior when sorting groups

I'm using field collapsing to group documents by a single field and have run into something unexpected in how the groups are sorted. Right now each group returns one document. The documents within each group are sorted by a field ("price") in ascending order using group.sort, so the document returned for each group in the search results is the cheapest document of that group. If I sort the groups amongst themselves using sort=price asc, I get what I expect: groups whose lowest price is low show first and groups whose lowest price is high show last.

If I change this to sort=price desc, what happens is not what I would expect. I would like the groups to be returned in the reverse of the order I got when sorting by price asc. Instead, the groups are sorted in descending order according to the highest priced document in each group. I want groups to be sorted in descending order according to the lowest priced document in each group, but it appears this is not possible. In other words, it appears that sorting when groups are involved is limited to either MAX([field]) DESC or MIN([field]) ASC, and the other two combinations (MIN([field]) DESC and MAX([field]) ASC) are not possible. Does anyone know whether or not this is in fact impossible, and if it is, how I might put in a feature request?
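
To make the difference concrete, here is a minimal sketch in plain Java (no Solr involved) that orders three made-up groups both ways: by each group's highest price descending, which is the behavior described above for sort=price desc, and by each group's lowest price descending, which is the order being asked for. The class name, method names, and data are hypothetical and only illustrate the two orderings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupOrderingDemo {

    // Order group ids by the highest price in each group, descending
    // (the ordering observed in the post for sort=price desc).
    static List<String> byMaxPriceDesc(Map<String, List<Double>> groups) {
        List<String> ids = new ArrayList<>(groups.keySet());
        ids.sort(Comparator.comparingDouble(
                (String id) -> Collections.max(groups.get(id))).reversed());
        return ids;
    }

    // Order group ids by the lowest price in each group, descending
    // (the ordering the post is asking for).
    static List<String> byMinPriceDesc(Map<String, List<Double>> groups) {
        List<String> ids = new ArrayList<>(groups.keySet());
        ids.sort(Comparator.comparingDouble(
                (String id) -> Collections.min(groups.get(id))).reversed());
        return ids;
    }

    public static void main(String[] args) {
        Map<String, List<Double>> groups = new LinkedHashMap<>();
        groups.put("A", Arrays.asList(1.0, 100.0));
        groups.put("B", Arrays.asList(10.0, 20.0));
        groups.put("C", Arrays.asList(5.0, 200.0));

        System.out.println(byMaxPriceDesc(groups)); // [C, A, B]
        System.out.println(byMinPriceDesc(groups)); // [B, C, A]
    }
}
```

With this data the cheapest documents are A=1, C=5, B=10, so the wanted descending order is B, C, A, while ranking by the most expensive document gives C, A, B.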

Re: Unexpected behavior when sorting groups

Posted by Paul Masurel <pa...@gmail.com>.
On Mon, Aug 5, 2013 at 2:42 AM, Tony Paloma <to...@valvesoftware.com> wrote:

> Thanks Paul. That's helpful. I'm not familiar with the concept of custom
> caches. Would this be custom Java code or something defined in the
> config/schema? Can you point me to some documentation?
>
>
My solution requires both writing custom Java code and declaring things in your
solrconfig.xml.
I'm waiting for approval to release my plugin, but I'm afraid I don't have any
visibility on how long the process will take.

There is only the bare minimum in the documentation.
http://wiki.apache.org/solr/SolrCaching

Write a class extending SolrCacheBase and implementing SolrCache:

        public class YourCache extends SolrCacheBase
                implements SolrCache<BytesRef, Double>

You just add some XML in your solrconfig.xml to instantiate your custom cache.
At each commit, Solr will call the cache's warm method. You can inline the code
that recomputes all your min prices there, or delegate it to a CacheRegenerator.

You then need to declare a ValueSource that reads from this cache.
You can access your cache in the parse function of the corresponding
ValueSourceParser, via the FunctionQParser it receives (fp below):


        SolrIndexSearcher searcher = fp.getReq().getSearcher();
        YourCache cache = (YourCache) searcher.getCache(cacheName);
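
The warm-then-look-up idea can be sketched without any Solr classes. The code below is only an illustration under stated assumptions: MinPriceCacheSketch, Doc, warm and minPrice are hypothetical stand-ins, not the SolrCache or ValueSource API, and a real plugin would read group keys and prices from the index rather than from a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MinPriceCacheSketch {

    // Hypothetical document: just a group key and a price.
    static final class Doc {
        final String group;
        final double price;

        Doc(String group, double price) {
            this.group = group;
            this.price = price;
        }
    }

    private Map<String, Double> minByGroup = new HashMap<>();

    // Stand-in for the warm step: rebuild the whole table from the
    // documents visible after a commit, then swap it in.
    void warm(List<Doc> docs) {
        Map<String, Double> fresh = new HashMap<>();
        for (Doc d : docs) {
            fresh.merge(d.group, d.price, Double::min);
        }
        minByGroup = fresh;
    }

    // Stand-in for the custom function: the value a sort would use
    // for any document of the given group (NaN if the group is unknown).
    double minPrice(String group) {
        return minByGroup.getOrDefault(group, Double.NaN);
    }

    // Groups ordered by their cached minimum price, descending.
    List<String> groupsByMinPriceDesc() {
        List<String> ids = new ArrayList<>(minByGroup.keySet());
        ids.sort(Comparator.comparingDouble(
                (String g) -> minByGroup.get(g)).reversed());
        return ids;
    }

    public static void main(String[] args) {
        MinPriceCacheSketch cache = new MinPriceCacheSketch();
        cache.warm(Arrays.asList(
                new Doc("A", 100.0), new Doc("A", 1.0),
                new Doc("B", 20.0), new Doc("B", 10.0)));
        System.out.println(cache.minPrice("A"));          // 1.0
        System.out.println(cache.groupsByMinPriceDesc()); // [B, A]
    }
}
```

Because every document of a group maps to the same cached value, sorting on that value in either direction ranks groups by their cheapest document.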





> Another workaround I was thinking of using was making two Solr queries when
> wanting to sort groups by price desc. One to get the number of total groups
> and then another that gets groups sorted by price asc starting from ngroups
> - (start+rows) and then just flip the ordering to fake sorting by
> min(price) desc, but I was worried about the performance implications of
> that.
>

That should work indeed... But keep in mind it will be extremely expensive
once you start distributing your queries:
to return hits 100 to 110, each shard will be asked to send its hits
from 0 to 110.
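
The offset arithmetic of that two-query workaround can be sketched as follows. This is plain Java over an in-memory list that stands in for the groups sorted by price asc; ReversePagingSketch and its methods are hypothetical names, and start and rows are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ReversePagingSketch {

    // Returns {ascStart, ascRows}: the window of the ascending result
    // that holds the descending page [start, start + rows).
    static int[] ascendingWindow(int ngroups, int start, int rows) {
        int begin = Math.min(start, ngroups);       // clamp past-the-end pages
        int end = Math.min(start + rows, ngroups);  // exclusive, clamped
        return new int[] {ngroups - end, end - begin};
    }

    // Fetch the window from the ascending list and flip it.
    static <T> List<T> descendingPage(List<T> ascending, int start, int rows) {
        int[] w = ascendingWindow(ascending.size(), start, rows);
        List<T> page = new ArrayList<>(ascending.subList(w[0], w[0] + w[1]));
        Collections.reverse(page);
        return page;
    }

    public static void main(String[] args) {
        List<Integer> minPricesAsc = Arrays.asList(1, 2, 3, 5, 8);
        System.out.println(descendingPage(minPricesAsc, 0, 2)); // [8, 5]
    }
}
```

Note that the first descending page maps to the last ascending window, so the ascending offset is largest for exactly the pages that are requested most often, which is where the distributed cost described above is worst.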



> SOLR-2072 has a similar request.
> https://issues.apache.org/jira/browse/SOLR-2072
>
> Bryan's comment is exactly what I'm looking for:
> > I would like to able to use sort and group.sort together such that the
> group.sort is applied with in the group first and the first document of
> each group is then used as the representative document to perform the
> overall sorting of groups.
>
> The latest comment there suggests that it's a bug in distributed mode, but
> I don't think that's the case since I'm only using one instance of Solr
> with no sharding or anything.
>

This is not a bug. If I get some time, I'll try to write a post about how
collapsing works in Solr.
Even though it is counterintuitive, what you are asking for is actually a
difficult problem.

Regards,

Paul





-- 
______________________________________________

 Masurel Paul
 e-mail: paul.masurel@gmail.com

Re: Unexpected behavior when sorting groups

Posted by Paul Masurel <pa...@gmail.com>.
Here is some detail about how grouping is implemented in Solr.
http://fulmicoton.com/posts/grouping-in-solr/







RE: Unexpected behavior when sorting groups

Posted by Tony Paloma <to...@valvesoftware.com>.
Thanks Paul. That's helpful. I'm not familiar with the concept of custom caches. Would this be custom Java code or something defined in the config/schema? Can you point me to some documentation?

Another workaround I was thinking of using was making two Solr queries when wanting to sort groups by price desc. One to get the number of total groups and then another that gets groups sorted by price asc starting from ngroups - (start+rows) and then just flip the ordering to fake sorting by min(price) desc, but I was worried about the performance implications of that.

SOLR-2072 has a similar request.
https://issues.apache.org/jira/browse/SOLR-2072

Bryan's comment is exactly what I'm looking for:
> I would like to able to use sort and group.sort together such that the group.sort is applied with in the group first and the first document of each group is then used as the representative document to perform the overall sorting of groups.

The latest comment there suggests that it's a bug in distributed mode, but I don't think that's the case since I'm only using one instance of Solr with no sharding or anything. 


Re: Unexpected behavior when sorting groups

Posted by Paul Masurel <pa...@gmail.com>.
Dear Tony,

The behavior you described is correct, and what you are asking for
is impossible with Solr as it is.

I wouldn't, however, say it is a limitation of Solr: your problem is actually
difficult and requires some preprocessing.

One solution, if it is feasible for you, is to precompute the lowest price
of each group beforehand and add it as a field to all of the documents of that
group.
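
A minimal sketch of that precomputation, done before the documents are sent to Solr, might look like the following. Documents are modeled as plain maps, and the field names group_id, price and group_min_price are assumptions made for the example, not names from this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MinPriceStamper {

    // Returns copies of the documents, each carrying the lowest price
    // found in its group under the (hypothetical) field group_min_price.
    static List<Map<String, Object>> stamp(List<Map<String, Object>> docs) {
        // First pass: lowest price per group.
        Map<Object, Double> minByGroup = new HashMap<>();
        for (Map<String, Object> doc : docs) {
            double price = ((Number) doc.get("price")).doubleValue();
            minByGroup.merge(doc.get("group_id"), price, Double::min);
        }
        // Second pass: copy each document and add the group's minimum.
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Map<String, Object> copy = new HashMap<>(doc);
            copy.put("group_min_price", minByGroup.get(doc.get("group_id")));
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        Map<String, Object> cheap = new HashMap<>();
        cheap.put("group_id", "A");
        cheap.put("price", 1.0);
        Map<String, Object> pricey = new HashMap<>();
        pricey.put("group_id", "A");
        pricey.put("price", 100.0);
        docs.add(cheap);
        docs.add(pricey);
        // Both documents of group A end up with group_min_price = 1.0.
        System.out.println(stamp(docs));
    }
}
```

Since all documents of a group then share one value, sorting the groups on that field in either direction ranks them by their cheapest document. The cost is that a price change in one document means re-indexing every document of its group.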

The other way to address your problem is to do that within Solr.
This can be done by adding a custom cache holding these values.
You can implement the computation of the min price in the warm method.

You can then add a custom function to return the result stored in this
cache. Function values can be used for sorting.

If it does not exist yet, you may open a ticket. I will try to get
authorization to open-source a solution for this.

Regards,

Paul






