Posted to solr-user@lucene.apache.org by Julien Piquot <ju...@arisem.com> on 2011/01/11 12:12:41 UTC

pruning search result with search score gradient

Hi everyone,

I would like to be able to prune my search results by removing the less
relevant documents. I'm thinking about using the search score: I take
the search scores of the document set (I assume they are sorted in
descending order), normalise them (0 would be the lowest value and 1
the greatest value) and then calculate the gradient of the normalised
scores. The documents with a gradient below a threshold value would be
rejected.
If the scores are linearly decreasing, then no document is rejected.
However, if there is a sharp score drop, then the documents below the
drop are rejected.
The threshold value would still have to be tuned, but I believe it would
make a much stronger metric than an absolute search score.
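
A minimal sketch of the idea described above, in Python. The function name, the min-max normalisation and the default threshold of 0.3 are assumptions made for illustration; nothing here is a Solr feature, and the pruning would run client-side on the scores of one result page or result list.

```python
def prune_by_score_drop(scores, max_drop=0.3):
    """Return how many leading results to keep.

    scores:   raw search scores, already sorted in descending order.
    max_drop: hypothetical threshold on the drop between two
              neighbouring scores, measured after normalising to [0, 1].
    """
    if len(scores) < 2:
        return len(scores)
    highest, lowest = scores[0], scores[-1]
    span = highest - lowest
    if span == 0:
        return len(scores)  # all scores equal: nothing to cut
    # Min-max normalisation: lowest score -> 0, highest score -> 1.
    norm = [(s - lowest) / span for s in scores]
    # Cut at the first drop that is steeper than the threshold.
    for i in range(1, len(norm)):
        if norm[i - 1] - norm[i] > max_drop:
            return i
    return len(scores)


# Evenly spaced scores: no drop exceeds the threshold, everything is kept.
print(prune_by_score_drop([12.0, 11.0, 10.0, 9.0, 8.0]))
# A sharp drop after the second result: only the first two are kept.
print(prune_by_score_drop([12.0, 11.5, 4.0, 3.5, 3.0]))
```

One thing this sketch makes visible: with min-max normalisation, n evenly spaced scores drop by 1/(n-1) per step, so a fixed threshold interacts with the length of the list and would need tuning with that in mind.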

What do you think about this approach? Do you see any problem with it?
Are there any Solr tools that could help me deal with that?

Thanks for your answer.

Julien

Re: pruning search result with search score gradient

Posted by "Grijesh.singh" <pi...@gmail.com>.
Look at Solr Function Queries; they might help you.

-----
Grijesh

Re: pruning search result with search score gradient

Posted by Dennis Gearon <ge...@sbcglobal.net>.
That's a pretty good idea, using 'delta score'.

 Dennis Gearon



----- Original Message ----
From: Toke Eskildsen <te...@statsbiblioteket.dk>
To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
Sent: Thu, January 20, 2011 11:31:48 PM
Subject: Re: pruning search result with search score gradient

On Tue, 2011-01-11 at 12:12 +0100, Julien Piquot wrote:
> I would like to be able to prune my search results by removing the less
> relevant documents. I'm thinking about using the search score: I take
> the search scores of the document set (I assume they are sorted in
> descending order), normalise them (0 would be the lowest value and 1
> the greatest value) and then calculate the gradient of the normalised
> scores. The documents with a gradient below a threshold value would be
> rejected.

As part of experimenting with federated search, this is one approach
we'll be trying out to determine which results to discard when merging.

> If the scores are linearly decreasing, then no document is rejected.
> However, if there is a sharp score drop, then the documents below the
> drop are rejected.

So if we have the scores
1.0, 0.9, 0.2, 0.15, 0.1, 0.05
then the slopes will be
0.05, 0.4, 0.025, 0.025, 0.025
and with a slope threshold of 0.1, we would discard everything from
score 0.2 and below.
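
The worked example above can be sketched as follows. This version uses plain differences between adjacent scores, which is an assumption: the slope figures quoted above appear to use a different scaling, so the numbers differ, but the cut lands in the same place. The drops are rounded to avoid floating-point noise right at the threshold.

```python
scores = [1.0, 0.9, 0.2, 0.15, 0.1, 0.05]
threshold = 0.1  # slope threshold from the example above

# Drop between each score and the next one, rounded to 3 decimals.
drops = [round(a - b, 3) for a, b in zip(scores, scores[1:])]

# Index of the first drop that is strictly steeper than the threshold.
cut = next(i for i, d in enumerate(drops) if d > threshold)

kept = scores[:cut + 1]
discarded = scores[cut + 1:]

print(drops)
print(kept)
print(discarded)
```

With these inputs the first drop steeper than the threshold is the one between 0.9 and 0.2, so everything from score 0.2 down is discarded, as described.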

It makes sense if the scores are linear with the relevance (a document
with score 0.8 has double the relevance of one with 0.4). I don't know
if they are, so experiments are needed, and I fear that this is another
demonstration of the inherent problem with quantifying quality.

- Toke


Re: pruning search result with search score gradient

Posted by Jonathan Rochkind <ro...@jhu.edu>.
One case where I've _considered_ trying to do this (but generally decided
it wasn't worth it) is when I didn't want the documents below the
threshold to show up in the facet values. In my application the facet
counts are sometimes very pertinent information, but they are not
quite as useful as they could be when they include barely-relevant hits.
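
A rough sketch of how the facet case might be approached, assuming Solr's frange query parser and its query() function are available in the version in use. The idea is a two-pass one: a first request finds the raw score just above the drop, and a second request filters on that score so that the facet counts only cover the remaining documents. The helper name, the field name "subject" and the score 0.35 are hypothetical, and this has not been tested against a live Solr.

```python
from urllib.parse import urlencode, parse_qs


def facet_query_params(user_query, min_score, facet_field):
    """Build request parameters that facet only on documents
    whose score for user_query is at least min_score."""
    return urlencode({
        "q": user_query,
        # Filter on the score of the main query via a function range.
        "fq": "{!frange l=%s}query($q)" % min_score,
        "facet": "true",
        "facet.field": facet_field,
        "rows": 10,
    })


# min_score would come from a first pass over the scores.
params = facet_query_params("solr pruning", 0.35, "subject")
print(parse_qs(params)["fq"])
```

Whether scores stay comparable between the two requests depends on the setup, so this is only a starting point for experiments.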

On 1/12/2011 11:42 AM, Erick Erickson wrote:
> What's the use-case you're trying to solve? Because if you're
> still showing results to the user, you're taking information away
> from them. Where are you expecting to get the list? If you try
> to return the entire list, you're going to pay the penalty
> of creating the entire list and transmitting it across the wire rather
> than just a page's worth.
>
> And if you're paging, the user will do this for you by deciding for
> herself when she's getting less relevant results.
>
> So I don't understand what value you're trying to provide to the
> end user. Perhaps if you elaborate on that, I'll have a more useful
> response.
>
> Best
> Erick

Re: pruning search result with search score gradient

Posted by Erick Erickson <er...@gmail.com>.
What's the use-case you're trying to solve? Because if you're
still showing results to the user, you're taking information away
from them. Where are you expecting to get the list? If you try
to return the entire list, you're going to pay the penalty
of creating the entire list and transmitting it across the wire rather
than just a page's worth.

And if you're paging, the user will do this for you by deciding for
herself when she's getting less relevant results.

So I don't understand what value you're trying to provide to the
end user. Perhaps if you elaborate on that, I'll have a more useful
response.

Best
Erick
