Posted to solr-user@lucene.apache.org by Michael Owen <mi...@hotmail.com> on 2011/04/19 19:02:02 UTC

Custom Sorting

Hi,
I want to be able to have a custom sorting algorithm such that, for each comparison of two document results (A vs. B), I can decide their order, i.e., write a comparator as I normally would in Java ("Compares its two arguments for order. Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.").
In the comparator I want to be able to take into account the score of the results, as well as other fields in the documents.
I've looked at using things such as the score/boost/bf parameters, etc.; however, I want the flexibility of being able to code the comparator, so that I can use if conditions and such.
Is this possible? And if so, what's the best way of doing it? I've upgraded to the latest version, Solr 3.1, and for this use case I would of course expect to have to build from source in order to add custom code.
Alternatively (or additionally), when using the score/boost/bf parameters etc., is it possible to use the score in functions, say to scale it between 0 and 1?
Thanks
Mike
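For reference, here is a minimal plain-Java sketch of the kind of comparator Mike describes. It sorts an in-memory list and is not a Solr or Lucene API; the Hit class and its inStock and price fields are hypothetical stand-ins for "the score plus other fields". Note that every call to compare() needs both documents' values at hand, which is the per-pair work Jan's reply says Lucene avoids.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for a search hit: the relevance score
// plus two field values. Not a Solr or Lucene class.
final class Hit {
    final String id;
    final float score;
    final boolean inStock; // hypothetical field
    final int price;       // hypothetical field

    Hit(String id, float score, boolean inStock, int price) {
        this.id = id;
        this.score = score;
        this.inStock = inStock;
        this.price = price;
    }
}

final class HitComparator implements Comparator<Hit> {
    // Negative if a should come before b, positive if after, zero if tied.
    @Override
    public int compare(Hit a, Hit b) {
        // Rule 1 (an arbitrary "if condition"): in-stock hits always come first.
        if (a.inStock != b.inStock) {
            return a.inStock ? -1 : 1;
        }
        // Rule 2: higher score first.
        int byScore = Float.compare(b.score, a.score);
        if (byScore != 0) {
            return byScore;
        }
        // Rule 3: cheaper first as a tie-breaker.
        return Integer.compare(a.price, b.price);
    }
}

class CustomComparatorDemo {
    // Sorts a copy of the hits and returns their IDs in ranked order.
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(new HitComparator());
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("a", 2.0f, false, 10),
                new Hit("b", 1.0f, true, 30),
                new Hit("c", 1.0f, true, 20));
        System.out.println(rank(hits)); // [c, b, a]
    }
}
```

Here c and b are both in stock with equal scores, so the cheaper one wins, and a sorts last despite its higher score because it is out of stock.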




RE: Custom Sorting

Posted by Michael Owen <mi...@hotmail.com>.
OK, thank you for the discussion. As I thought, it's not possible within performance limits.
I think the way to go is to record some more stats in the documents at index time and use them in boost queries. :)
Thanks
Mike
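A minimal sketch of what "more stats at index time" might look like, assuming a hypothetical numeric boost field: the if/else logic runs once per document when it is indexed, and query time only reads the stored number and combines it with the relevance score. The field, thresholds, and the multiplicative combination are illustrative assumptions, not a description of how Solr's bq/bf parameters combine scores.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: run arbitrary if/else logic once per document at
// index time, store the result as a number, and only combine numbers at
// query time. Names, thresholds and weights are illustrative assumptions.
final class IndexTimeBoost {
    // Index time: computed once per document and stored in a
    // hypothetical numeric field.
    static float computeBoost(int salesLastMonth, boolean editorsPick) {
        float boost = 1.0f;
        if (editorsPick) {
            boost *= 2.0f;
        }
        if (salesLastMonth > 100) {
            boost *= 1.5f;
        }
        return boost;
    }

    // Query time: a cheap read plus one multiplication per matching doc.
    static float finalScore(float relevance, float storedBoost) {
        return relevance * storedBoost;
    }

    public static void main(String[] args) {
        // Stand-in for the stored field: doc ID -> precomputed boost.
        Map<String, Float> boostField = new HashMap<>();
        boostField.put("doc1", computeBoost(250, true)); // 2.0 * 1.5 = 3.0
        boostField.put("doc2", computeBoost(10, false)); // stays 1.0
        for (String doc : new String[] {"doc1", "doc2"}) {
            System.out.println(doc + " -> " + finalScore(0.5f, boostField.get(doc)));
        }
    }
}
```

Running main prints "doc1 -> 1.5" and "doc2 -> 0.5". The expensive branching is paid once per document at index time rather than on every query.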

> Date: Tue, 19 Apr 2011 15:12:00 -0400
> Subject: Re: Custom Sorting
> From: erickerickson@gmail.com
> To: solr-user@lucene.apache.org

Re: Custom Sorting

Posted by Erick Erickson <er...@gmail.com>.
As I understand it, sorting by field is what caches are all
about. You have a big list in memory of all of the terms for
a field, indexed by Lucene doc ID, so fetching the term to
compare for a given doc ID is fast. That's also why the caches
need to be warmed, and why sort fields should be single-valued.
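A toy model of the cache Erick describes, in plain Java: an array with one slot per doc ID, filled once (the warming step) and then read in constant time by every comparison. Lucene 3.x exposes real caches of this kind through FieldCache (for example FieldCache.DEFAULT.getInts), but the class below is a hypothetical illustration, not that API. The single-valued check shows why a multi-valued field has no natural slot to fill.

```java
import java.util.Arrays;

// Toy model of a sort-field cache: one slot per doc ID, filled once
// ("warming"), then read in O(1) during every comparison.
final class FieldCacheSketch {
    private final int[] valueByDoc;

    // Warming: walk the (hypothetical) stored values once. A single-valued
    // field maps each doc to exactly one slot; a multi-valued doc has no
    // single value to put there, so it is rejected.
    FieldCacheSketch(int[][] storedValuesPerDoc) {
        valueByDoc = new int[storedValuesPerDoc.length];
        for (int doc = 0; doc < storedValuesPerDoc.length; doc++) {
            int[] vals = storedValuesPerDoc[doc];
            if (vals.length != 1) {
                throw new IllegalArgumentException("doc " + doc + " is not single-valued");
            }
            valueByDoc[doc] = vals[0];
        }
    }

    // Comparing two docs is two array reads, no disk seeks.
    int compareDocs(int docA, int docB) {
        return Integer.compare(valueByDoc[docA], valueByDoc[docB]);
    }

    // Sorts the given doc IDs by their cached field values, ascending.
    Integer[] sortDocs(int[] docIds) {
        Integer[] boxed = Arrays.stream(docIds).boxed().toArray(Integer[]::new);
        Arrays.sort(boxed, this::compareDocs);
        return boxed;
    }
}
```

For values {30}, {10}, {20} on docs 0, 1, 2, sortDocs returns [1, 2, 0] without touching any per-document storage after warming.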

If you try to do this yourself and fetch data from each document,
you can incur a huge performance hit, since you'll be seeking
all over your disk...

Score is special, though, since it's transient. Internally, all Lucene
has to do is keep track of the top N scores encountered, where
N is something like "start + queryResultWindowSize" (the
latter from solrconfig.xml), with no seeks to disk at all...
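A plain-Java sketch of the top-N idea: a bounded min-heap whose head is the weakest hit kept so far, so each new score costs at most O(log N), and only the N survivors are sorted at the end. This also bears on Jonathan's point that some sort is still needed: it is, but only over N hits. Lucene's own collectors (TopScoreDocCollector, for instance) follow this bounded-priority-queue pattern; the class below is a toy, not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Toy version of "keep only the top N scores": a bounded min-heap.
final class TopNCollector {
    static final class ScoredDoc {
        final int doc;
        final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int n;
    // Head of the queue is the weakest hit currently kept.
    private final PriorityQueue<ScoredDoc> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    TopNCollector(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    // Called once per matching doc; the full result set is never sorted.
    void collect(int doc, float score) {
        if (heap.size() < n) {
            heap.add(new ScoredDoc(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();
            heap.add(new ScoredDoc(doc, score));
        }
    }

    // Only the N survivors are sorted, best first.
    List<Integer> topDocs() {
        List<ScoredDoc> kept = new ArrayList<>(heap);
        kept.sort((a, b) -> Float.compare(b.score, a.score));
        List<Integer> docs = new ArrayList<>();
        for (ScoredDoc sd : kept) {
            docs.add(sd.doc);
        }
        return docs;
    }
}
```

Collecting scores 0.5, 2.0, 1.0, 3.0, 0.1 for docs 0 to 4 with N = 2 leaves docs [3, 1].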

Best
Erick

On Tue, Apr 19, 2011 at 2:50 PM, Jonathan Rochkind <ro...@jhu.edu> wrote:

Re: Custom Sorting

Posted by Jonathan Rochkind <ro...@jhu.edu>.
On 4/19/2011 1:43 PM, Jan Høydahl wrote:
> Hi,
>
> Not possible :)
> Lucene compares each matching document against the query and produces a score for each.
> Documents are not compared to each other like a normal sort; that would be way too costly.

That might be true for sorting by 'score' (although even if you have all
the scores, it still seems like some kind of sort must be necessary to
see which comes first), but when you sort by a field value, which is
also possible, Lucene must be doing some kind of 'normal sort'
algorithm, no? Ah, I guess it could just be using each term's position
in the index, which is available in constant time and always kept track
of in an index? Maybe, I don't know?
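Jonathan's guess can be illustrated with a toy ordinal sort in plain Java: each doc's term is replaced by its position in the sorted term dictionary, so comparing two docs becomes an int comparison. Lucene 3.x's FieldCache.StringIndex is built along these lines, with an order array (doc to ord) and a lookup array (ord to term); the class below is only a hypothetical sketch of the idea and assumes exactly one non-null term per doc.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Toy model of ordinal-based string sorting: each doc's term is replaced
// by its position in the sorted term dictionary, so comparing two docs is
// an int comparison instead of a string comparison.
final class OrdinalSort {
    final String[] lookup; // ord -> term, in sorted order
    final int[] ordByDoc;  // doc ID -> ord

    // Assumes exactly one non-null term per doc (hypothetical input).
    OrdinalSort(String[] termByDoc) {
        // Sorted, de-duplicated term dictionary.
        lookup = new TreeSet<>(Arrays.asList(termByDoc)).toArray(new String[0]);
        ordByDoc = new int[termByDoc.length];
        for (int doc = 0; doc < termByDoc.length; doc++) {
            // Position of this doc's term in the dictionary.
            ordByDoc[doc] = Arrays.binarySearch(lookup, termByDoc[doc]);
        }
    }

    // Constant-time comparison of two docs by their term order.
    int compareDocs(int a, int b) {
        return Integer.compare(ordByDoc[a], ordByDoc[b]);
    }
}
```

For the terms "pear", "apple", "fig", "apple" the dictionary is [apple, fig, pear] and the per-doc ords are [2, 0, 1, 0]; the string work happens once while building the arrays, not during the sort.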



Re: Custom Sorting

Posted by lboutros <bo...@gmail.com>.
You could create a new Similarity class plugin that takes into account every
parameter you need:

http://wiki.apache.org/solr/SolrPlugins?highlight=%28similarity%29#Similarity

But, as Jan said, be careful with the cost of the similarity function.

Ludovic.
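For a sense of where a custom Similarity's cost lands, here is a plain-Java restatement of the tf and idf formulas used by Lucene 3.x's DefaultSimilarity, plus a hypothetical customised tf. A real plugin would subclass DefaultSimilarity (or Similarity) and be registered as the wiki page describes; these static methods are only a sketch of the arithmetic. Methods like these run for every query term of every matching document, which is why expensive logic here gets costly quickly.

```java
// Plain-Java restatement of DefaultSimilarity-style tf and idf, plus a
// hypothetical customised tf. Not a Lucene class; a real plugin would
// extend org.apache.lucene.search.DefaultSimilarity instead.
final class SimilaritySketch {
    private SimilaritySketch() {}

    // DefaultSimilarity-style tf: square root of the term frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1)).
    static float defaultIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Hypothetical customisation: ignore repeated occurrences of a term.
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}
```

With these formulas a term appearing 4 times contributes a tf of 2.0, and rarer terms (smaller docFreq) get a larger idf.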

2011/4/19 Jan Høydahl / Cominvent [via Lucene] <
ml-node+2839526-2100261518-383657@n3.nabble.com>



-----
Jouve
France.

Re: Custom Sorting

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

Not possible :)
Lucene compares each matching document against the query and produces a score for each.
Documents are not compared to each other like a normal sort; that would be way too costly.

But if you explain your use case, I'm sure we can find other ways to express your needs.

Perhaps it is possible for you to use Sort by Function? http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
Then you can decide exactly what goes into your sort score.
If you want to do conditional stuff, you may need to pre-process your documents a bit and create new fields which can be used in a FunctionQuery.
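A plain-Java toy of the sort-by-function idea: the conditional logic is collapsed into one numeric key per document, and the sort only compares those keys. It also shows one way to approach Mike's "scale the score between 0 and 1" question, min-max scaling, which needs the minimum and maximum over all matches and so costs an extra pass. The 0.7/0.3 weights and the 0/1 flag field (imagined as precomputed at index time) are hypothetical. The FunctionQuery wiki page also lists a scale() function that may be worth checking for the same purpose.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy "sort by function": each doc gets one numeric key from a formula,
// then the sort only compares those keys.
final class SortByFunctionSketch {
    // Min-max scaling of raw scores into [0, 1]; equal scores map to 0.
    static float[] scaleToUnit(float[] scores) {
        float min = Float.POSITIVE_INFINITY;
        float max = Float.NEGATIVE_INFINITY;
        for (float s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        float range = max - min;
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = range == 0 ? 0f : (scores[i] - min) / range;
        }
        return out;
    }

    // Hypothetical formula: weighted mix of the scaled score and a 0/1 flag
    // field assumed to be precomputed at index time.
    static float sortKey(float scaledScore, int flag) {
        return 0.7f * scaledScore + 0.3f * flag;
    }

    // Returns doc indexes ordered by descending sort key.
    static List<Integer> rank(float[] scores, int[] flags) {
        float[] scaled = scaleToUnit(scores);
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            docs.add(i);
        }
        docs.sort(Comparator.comparingDouble((Integer d) -> sortKey(scaled[d], flags[d])).reversed());
        return docs;
    }
}
```

For scores 2, 4, 6 the scaled values are 0, 0.5, 1; with flags 1, 0, 0 the keys are 0.3, 0.35, 0.7, so the ranking is [2, 1, 0].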

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
