Posted to solr-user@lucene.apache.org by Brian Yee <by...@wayfair.com> on 2018/01/12 19:52:51 UTC

LTR original score feature

I wanted to get some opinions on using the original score feature. The original score produced by Solr is intuitively a very important feature. In my data set I'm seeing that the original score varies wildly between different queries. This makes sense since the score generated by Solr is not normalized across all queries. However, won't this mess with our training data? If this feature is 3269.4 for the top result for one query, and then 32.7 for the top result for another query, it does not mean that the first document was 100x more relevant to its query than the second document. I am using a normalize param within RankLib, but that only normalizes features between each other, not within one feature, right? How are people handling this? Am I missing something?
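
To make the question concrete, the kind of per-query rescaling I have been considering looks roughly like the sketch below. It is only a sketch: it assumes the training data is in the usual RankLib/SVMrank format ("<label> qid:<id> 1:<v1> 2:<v2> ... # comment") and that feature 1 is the original score; the feature id and file names are made up.

from collections import defaultdict

FEATURE_ID = "1"  # assumed id of the original-score feature

def parse(line):
    # "<label> qid:<id> 1:<v1> 2:<v2> ..." -- any "# ..." comment is dropped
    tokens = line.split("#")[0].split()
    label, qid = tokens[0], tokens[1]          # qid keeps its "qid:" prefix
    feats = dict(tok.split(":") for tok in tokens[2:])
    return label, qid, feats

with open("train.txt") as f:
    rows = [parse(line) for line in f if line.strip()]

# min and max of the original score within each query
lo = defaultdict(lambda: float("inf"))
hi = defaultdict(lambda: float("-inf"))
for _, qid, feats in rows:
    v = float(feats[FEATURE_ID])
    lo[qid] = min(lo[qid], v)
    hi[qid] = max(hi[qid], v)

# write a copy of the file with the original score min-max scaled per query
with open("train_norm.txt", "w") as out:
    for label, qid, feats in rows:
        v = float(feats[FEATURE_ID])
        rng = hi[qid] - lo[qid]
        if rng:
            feats[FEATURE_ID] = "%.6f" % ((v - lo[qid]) / rng)
        else:
            feats[FEATURE_ID] = "0.0"
        ordered = " ".join("%s:%s" % (k, feats[k]) for k in sorted(feats, key=int))
        out.write("%s %s %s\n" % (label, qid, ordered))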

Re: LTR original score feature

Posted by Michael Alcorn <ma...@redhat.com>.
> It seems to me that the original score feature is not useful because it is not normalized across all queries and therefore cannot be used to compare relevance in different queries.

I don't agree with this statement and it's not what Alessandro was
suggesting ("When you put the original score together with the rest of
features, it may
be of potential usage."). The magnitude of the score could very well
contain useful information in certain contexts. The simplest way to
determine whether or not the score is useful is to just train and test the
model with and without the feature included and see which one performs
better.
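
As a rough sketch of that kind of ablation (nothing below is Solr- or RankLib-specific; the feature id and file names are assumptions), you can write out a second copy of the training file with the original-score feature stripped, train the same ranker on both files with identical settings, and compare the held-out metric:

FEATURE_ID = "1"  # assumed id of the original-score feature

def drop_feature(line, feature_id):
    # keep "<label> qid:<id>" and every feature except feature_id; preserve any "# ..." comment
    head, _, comment = line.partition("#")
    tokens = head.split()
    kept = [t for t in tokens[2:] if not t.startswith(feature_id + ":")]
    rebuilt = " ".join(tokens[:2] + kept)
    return rebuilt + ((" #" + comment) if comment else "")

with open("train.txt") as src, open("train_no_score.txt", "w") as dst:
    for line in src:
        if line.strip():
            dst.write(drop_feature(line.rstrip("\n"), FEATURE_ID) + "\n")

(Some tools expect consecutive feature ids, so the remaining features may also need renumbering before training.)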

On Thu, Jan 25, 2018 at 3:41 PM, Brian Yee <by...@wayfair.com> wrote:

> Thanks for the reply Alessandro. I'm starting to agree with you but I
> wanted to see if others agree. It seems to me that the original score
> feature is not useful because it is not normalized across all queries and
> therefore cannot be used to compare relevance in different queries.
>
> -----Original Message-----
> From: alessandro.benedetti [mailto:a.benedetti@sease.io]
> Sent: Wednesday, January 24, 2018 10:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: LTR original score feature
>
> This is actually an interesting point.
> The original Solr score alone will mean nothing; the ranking position of the document would be a more relevant feature at that stage.
>
> When you put the original score together with the rest of features, it may be of potential usage (number of query terms, tf for a specific field, idf for another field, ...). This is also because some training algorithms will group the training samples by query.
>
> Personally, I am starting to believe it would be better to decompose the original score into finer-grained features and then rely on LTR to weight them (as the original score is effectively already mixing up finer-grained features following a standard formula).
>
>
>
>
>
> -----
> ---------------
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director Sease Ltd. -
> www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>

RE: LTR original score feature

Posted by Brian Yee <by...@wayfair.com>.
Thanks for the reply Alessandro. I'm starting to agree with you but I wanted to see if others agree. It seems to me that the original score feature is not useful because it is not normalized across all queries and therefore cannot be used to compare relevance in different queries.

-----Original Message-----
From: alessandro.benedetti [mailto:a.benedetti@sease.io] 
Sent: Wednesday, January 24, 2018 10:22 AM
To: solr-user@lucene.apache.org
Subject: Re: LTR original score feature

This is actually an interesting point.
The original Solr score alone will mean nothing; the ranking position of the document would be a more relevant feature at that stage.

When you put the original score together with the rest of features, it may be of potential usage (number of query terms, tf for a specific field, idf for another field, ...). This is also because some training algorithms will group the training samples by query.

Personally, I am starting to believe it would be better to decompose the original score into finer-grained features and then rely on LTR to weight them (as the original score is effectively already mixing up finer-grained features following a standard formula).





-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: LTR original score feature

Posted by "alessandro.benedetti" <a....@sease.io>.
This is actually an interesting point.
The original Solr score alone will mean nothing; the ranking position of the document would be a more relevant feature at that stage.

When you put the original score together with the rest of features, it may be of potential usage (number of query terms, tf for a specific field, idf for another field, ...). This is also because some training algorithms will group the training samples by query.

Personally, I am starting to believe it would be better to decompose the original score into finer-grained features and then rely on LTR to weight them (as the original score is effectively already mixing up finer-grained features following a standard formula).
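
Just to sketch what such a decomposition could look like (purely illustrative: the collection name, feature names, field names and the ${user_query} parameter are placeholders, and the upload assumes the Python requests library), the feature store could hold the original score next to a few finer-grained features:

import json
import requests

features = [
    {"name": "originalScore",
     "class": "org.apache.solr.ltr.feature.OriginalScoreFeature",
     "params": {}},
    {"name": "titleMatch",
     "class": "org.apache.solr.ltr.feature.SolrFeature",
     "params": {"q": "{!field f=title}${user_query}"}},
    {"name": "titleLength",
     "class": "org.apache.solr.ltr.feature.FieldLengthFeature",
     "params": {"field": "title"}},
]

# upload the feature definitions to the collection's LTR feature store
resp = requests.put(
    "http://localhost:8983/solr/mycollection/schema/feature-store",
    data=json.dumps(features),
    headers={"Content-type": "application/json"},
)
resp.raise_for_status()

The model trained on top of such features can then learn its own weighting of the parts, instead of inheriting the weighting already baked into the original score.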





-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: LTR original score feature

Posted by Michael Alcorn <ma...@redhat.com>.
What you're suggesting is that there's a "nonlinear relationship <http://blog.minitab.com/blog/adventures-in-statistics-2/what-is-the-difference-between-linear-and-nonlinear-equations-in-regression-analysis>" between the original score (the input variable) and some measure of "relevance" (the output variable). Nonlinear models like decision trees (which include LambdaMART) and neural networks (which include RankNet) can handle these types of situations, assuming there's enough data. The nonlinear phenomena you brought up are also probably part of the reason why pairwise models tend to perform better than pointwise models <https://www.quora.com/What-are-the-differences-between-pointwise-pairwise-and-listwise-approaches-to-Learning-to-Rank> in learning to rank tasks.
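
A purely synthetic illustration of that point (toy data only, not from Solr; it assumes numpy and scikit-learn are installed): a shallow tree recovers a non-monotonic score-to-relevance mapping that a linear fit cannot.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
score = rng.uniform(0, 3500, size=(2000, 1))   # raw, unnormalized scores
# non-monotonic relationship: only a middle band of scores is "relevant"
relevance = ((score[:, 0] > 1000) & (score[:, 0] < 2000)).astype(float)

linear = LinearRegression().fit(score, relevance)
tree = DecisionTreeRegressor(max_depth=2).fit(score, relevance)

print("linear R^2:", round(linear.score(score, relevance), 3))  # near 0
print("tree   R^2:", round(tree.score(score, relevance), 3))    # near 1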

On Fri, Jan 12, 2018 at 1:52 PM, Brian Yee <by...@wayfair.com> wrote:

> I wanted to get some opinions on using the original score feature. The
> original score produced by Solr is intuitively a very important feature. In
> my data set I'm seeing that the original score varies wildly between
> different queries. This makes sense since the score generated by Solr is
> not normalized across all queries. However, won't this mess with our
> training data? If this feature is 3269.4 for the top result for one query,
> and then 32.7 for the top result for another query, it does not mean that
> the first document was 100x more relevant to its query than the second
> document. I am using a normalize param within RankLib, but that only
> normalizes features between each other, not within one feature, right? How
> are people handling this? Am I missing something?
>