Posted to solr-user@lucene.apache.org by Chantal Ackermann <ch...@btelligent.de> on 2010/06/24 16:17:57 UTC

MoreLikeThis (mlt) : use the match's maxScore for result score normalization

Hi there,

consider the following response extract for a MoreLikeThis request:

<result name="match" numFound="1" start="0" maxScore="13.4579935">
<result name="response" numFound="103708" start="0"
maxScore="4.1711807">

The first result element is the input document, i.e. the one for which
"more like this" results were requested.
The second result element contains the results returned by the handler.
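
To make the two values concrete, here is a minimal sketch of reading both
maxScore attributes on the client side. It assumes the XML response format
shown above; the surrounding <response> element and the self-closing
<result> tags are only there to make the sample well-formed, and the
function name max_scores is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down MoreLikeThis response. Only the two <result>
# elements from the extract above are kept, without their documents.
SAMPLE = """
<response>
  <result name="match" numFound="1" start="0" maxScore="13.4579935"/>
  <result name="response" numFound="103708" start="0" maxScore="4.1711807"/>
</response>
"""

def max_scores(xml_text):
    """Return {result name: maxScore} for every <result> element."""
    root = ET.fromstring(xml_text.strip())
    return {r.get("name"): float(r.get("maxScore")) for r in root.iter("result")}

scores = max_scores(SAMPLE)
print(scores["match"], scores["response"])
```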

As the two elements come with different maxScore values, I was wondering
whether I could safely use the match's maxScore to normalize the scores of
the "more like this" documents.

Would that allow me to show the user the quality/relevancy of the hits
across different MoreLikeThis requests (and only those)?
(What does the match's maxScore mean?)

Thanks!
Chantal


Re: MoreLikeThis (mlt) : use the match's maxScore for result score normalization

Posted by MitchK <mi...@web.de>.
Hi Chantal,

Munich? Germany seems to be soo small :-).


Chantal Ackermann wrote:
> 
> I only want a way to show to the 
> user a kind of relevancy or similarity indicator (for example using a 
> range of 10 stars) that would give a hint on how similar the mlt hit is 
> to the input (match) item. 
> 
Okay, that's making more sense.
Unfortunately, as far as I know, Lucene cannot do that in a way that would
fit your needs.

Kind regards
- Mitch
-- 
View this message in context: http://lucene.472066.n3.nabble.com/MoreLikeThis-mlt-use-the-match-s-maxScore-for-result-score-normalization-tp919598p921942.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: MoreLikeThis (mlt) : use the match's maxScore for result score normalization

Posted by Chantal Ackermann <ch...@btelligent.de>.
Hi Mitch,

thanks for the answer and the link.

The use case is to provide content based recommendations for a single
item no matter where that came from. So, this input (match) item is "the
best match", all "more like this" items compare to it, and the ones that
are the most alike would have the highest scores.

(This also means that the most similar items are probably not the best
recommendations, precisely because they are too similar. But that is a
different story.)

Again, I don't want to compare the scores of regular search results
(e.g. from dismax) with those of mlt. I only want a way to show to the
user a kind of relevancy or similarity indicator (for example using a
range of 10 stars) that would give a hint on how similar the mlt hit is
to the input (match) item.
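
As a rough sketch of what I mean by such an indicator: the function below
maps a hit's score to 1..10 stars relative to some reference score. The
function name, the rounding rule, and the sample hit scores are made up for
illustration. Which reference is valid is exactly the open question: using
the response's own maxScore only ranks hits within one request, and whether
the match's maxScore may be used instead is not settled in this thread.

```python
import math

def stars(score, reference, max_stars=10):
    """Map a score to 1..max_stars relative to a reference score."""
    if reference <= 0:
        raise ValueError("reference score must be positive")
    # Clamp the ratio to [0, 1] so that scores above the reference
    # still yield at most max_stars.
    ratio = min(max(score / reference, 0.0), 1.0)
    return max(1, math.ceil(ratio * max_stars))

# Made-up hit scores; the first equals the response's maxScore from above.
hit_scores = [4.1711807, 2.9, 1.2]
response_max = 4.1711807

# Relative to the best hit of the same request:
print([stars(s, response_max) for s in hit_scores])

# Relative to the match's maxScore (unverified whether this is meaningful):
print([stars(s, 13.4579935) for s in hit_scores])
```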

Greetings from Munich ;-)
Chantal



On Thu, 2010-06-24 at 17:06 +0200, MitchK wrote:
> Chantal,
> 
> have a look at the MoreLikeThis documentation
> http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/similar/MoreLikeThis.html
> to get an idea of what the MLT score is based on.
> 
> The problem is that you can't compare scores.
> The query for the "normal" result response was maybe something like
> "Bill Gates featuring Linus Torvalds - The perfect OS song".
> The user now picks one of the returned documents and says he wants "More
> like this" - maybe because the topic was okay but the content was not
> enough, or whatever...
> But the query that is then sent is totally different (as you can see in
> the link) - so that would be like comparing apples and oranges, since the
> two scores are not computed on the same basis.
> 
> What would be the use case? Why is score-normalization needed?
> 
> Kind regards from Germany,
> - Mitch




Re: MoreLikeThis (mlt) : use the match's maxScore for result score normalization

Posted by MitchK <mi...@web.de>.
Chantal,

have a look at the MoreLikeThis documentation
http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/similar/MoreLikeThis.html
to get an idea of what the MLT score is based on.

The problem is that you can't compare scores.
The query for the "normal" result response was maybe something like
"Bill Gates featuring Linus Torvalds - The perfect OS song".
The user now picks one of the returned documents and says he wants "More
like this" - maybe because the topic was okay but the content was not
enough, or whatever...
But the query that is then sent is totally different (as you can see in
the link) - so that would be like comparing apples and oranges, since the
two scores are not computed on the same basis.

What would be the use case? Why is score-normalization needed?

Kind regards from Germany,
- Mitch

Re: MoreLikeThis (mlt) : use the match's maxScore for result score normalization

Posted by Chantal Ackermann <ch...@btelligent.de>.
Hi Otis,

thank you for this super quick answer. I understand that normalizing and
comparing scores is fishy, and I wouldn't want to do it for regular
search results.

I just thought that in this special case, the maxScore which is returned
for the input document to the MoreLikeThis handler -- and this is only
present in MoreLikeThis responses (with include=true) -- might be the
missing additional value that one could normalize on. (In this special
case there are two maxScores.)

But I don't know what the match's maxScore is derived from. As the input
element should surely be the best match for the request, a maxScore of
13.4579935 looks suspicious, doesn't it?

Thanks,
Chantal




On Thu, 2010-06-24 at 16:25 +0200, Otis Gospodnetic wrote:
> Chantal,
> 
> The short answer is that you can't compare relevancy scores across requests.  I think this may be in a FAQ.
> Check this:
> http://search-lucene.com/?q=score+compare+absolute+relative&fc_project=Lucene&fc_project=Solr
> 
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 




Re: MoreLikeThis (mlt) : use the match's maxScore for result score normalization

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Chantal,

The short answer is that you can't compare relevancy scores across requests.  I think this may be in a FAQ.
Check this:
http://search-lucene.com/?q=score+compare+absolute+relative&fc_project=Lucene&fc_project=Solr

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Chantal Ackermann <ch...@btelligent.de>
> To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
> Sent: Thu, June 24, 2010 10:17:57 AM
> Subject: MoreLikeThis (mlt) : use the match's maxScore for result score normalization
> 
> Hi there,
> 
> consider the following response extract for a MoreLikeThis request:
> 
> <result name="match" numFound="1" start="0" maxScore="13.4579935">
> <result name="response" numFound="103708" start="0"
> maxScore="4.1711807">
> 
> The first result element is the document that was input and for which to
> return "more like this" results.
> The second result element contains the results returned by the handler.
> 
> As they both come with a different maxScore I was wondering whether I
> could safely use the match's maxScore to normalize the scores of the
> "more like this" documents.
> 
> Would that allow to reflect to the user the quality/relevancy of the
> hits for different MoreLikeThis requests (and only those)?
> (What does the match's maxScore mean?)
> 
> Thanks!
> Chantal