Posted to solr-user@lucene.apache.org by Martin Koch <ma...@issuu.com> on 2011/11/04 11:28:25 UTC

Comparing apples & oranges?

Hi List

I have a Solr index where I want to include numerical fields in my ranking
function as well as keyword relevance. For example, each document has a
document view count, and I'd like to increase the relevancy of documents
that are read often, and penalize documents with a very low view count. I'm
aware that this could be achieved with a filter as well, but ignore that
for this question :) since this will be extended to other numerical fields.

The keyword scoring works just fine and I can include the view count as a
factor in the scoring, but I would like to somehow express that the view
count accounts for e.g. 25% of the total score. This could be achieved by
mapping the view count into some predetermined fixed range and then
performing suitable arithmetic to scale to the score of the query. The
score of the term query is normalized to queryNorm, so I'd like somehow to
express that the view count score should be normalized to the queryNorm.
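The "map into a fixed range, then scale to the query score" idea above can be sketched in plain Python. This is only an illustration of the arithmetic, not Solr code: the function names, the log scaling, and the `max_views` cap are all assumptions made for the example.

```python
import math

def normalized_views(views, max_views=1_000_000):
    """Map a raw view count into [0, 1] on a log scale.

    max_views is an assumed cap; counts above it are clipped.
    """
    clipped = min(max(views, 0), max_views)
    return math.log1p(clipped) / math.log1p(max_views)

def blended_score(text_score, views, view_share=0.25, max_views=1_000_000):
    """Add a view component scaled relative to the text score.

    The weight is chosen so that a document at the cap gets exactly
    view_share of its total score from the view count; documents with
    fewer views get proportionally less.
    """
    view_weight = text_score * view_share / (1.0 - view_share)
    return text_score + view_weight * normalized_views(views, max_views)

text_score = 17.403849  # keyword part of the score from the explain output
for views in (0, 68, 1_000_000):
    total = blended_score(text_score, views)
    share = (total - text_score) / total
    print(f"views={views:>9}  total={total:.4f}  view share={share:.3f}")
```

Because the weight is derived from the text score itself, the view component stays bounded at the chosen share regardless of what queryNorm happens to be for a given query.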

If I look at the explain of how the score below is computed, the 17.4 is
the part of the score that comes from term relevancy. Searching for another
(set of) terms yields a different queryNorm, so I can't see how I can
a priori pick a scaling function (I've used log for this example) and boost
factor that will give control of the final contribution of the view count
to the score.

19.14161 = (MATCH) sum of:
  17.403849 = (MATCH) max plus 0.1 times others of:
    16.747877 = (MATCH) weight(document:water^4.0 in 1076362), product of:
      0.22298127 = queryWeight(document:water^4.0), product of:
        4.0 = boost
        2.939238 = idf(docFreq=527730, maxDocs=3669552)
        0.018965907 = queryNorm
      75.108894 = (MATCH) fieldWeight(document:water in 1076362), product of:
        25.553865 = tf(termFreq(document:water)=653)
        2.939238 = idf(docFreq=527730, maxDocs=3669552)
        1.0 = fieldNorm(field=document, doc=1076362)
[snip]
  1.7377597 = (MATCH) FunctionQuery(log(map(int(views),0.0,0.0,1.0))), product of:
    1.8325089 = log(map(int(views)=68,min=0.0,max=0.0,target=1.0))
    50.0 = boost
    0.018965907 = queryNorm
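
The products in the explain output above can be re-derived by hand, which also shows how much of the total the view count currently contributes. A minimal sketch using only the numbers shown; it assumes tf is the square root of the term frequency and that log is base 10, both of which agree with the values printed.

```python
import math

query_norm = 0.018965907

# Term part: weight = queryWeight * fieldWeight
boost = 4.0
idf = 2.939238
tf = math.sqrt(653)          # tf(termFreq=653), assumed sqrt
field_norm = 1.0

query_weight = boost * idf * query_norm
field_weight = tf * idf * field_norm
term_score = query_weight * field_weight

# Function part: log(map(views,0,0,1)) * boost * queryNorm
views = 68
function_value = math.log10(views)   # map() only rewrites views == 0 to 1
function_score = function_value * 50.0 * query_norm

total = 19.14161                     # top-level score from the explain
view_share = function_score / total

print(f"queryWeight    = {query_weight:.6f}")
print(f"fieldWeight    = {field_weight:.4f}")
print(f"term score     = {term_score:.4f}")
print(f"function score = {function_score:.5f}")
print(f"view share     = {view_share:.1%}")
```

With these numbers the view count accounts for roughly 9% of the total. Both parts are multiplied by the same queryNorm, but only the term part also depends on idf and tf, so the ratio between the two still shifts from query to query.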

Thanks in advance for your help,
/Martin

Re: Comparing apples & oranges?

Posted by Erick Erickson <er...@gmail.com>.
What about Function Queries? They can essentially take field values
and use them as part of the score calculations....
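
For reference, a request using a function query in this way might be assembled as in the sketch below. The host, port, and handler path are placeholders, and the dismax parameters (`qf`, `tie`, `bf`) are a guess at the setup, inferred from the "max plus 0.1 times others" line and the boost of 50 in the quoted explain output rather than taken from the actual configuration.

```python
from urllib.parse import urlencode

# Assumed request parameters; adjust to the real handler configuration.
params = {
    "defType": "dismax",
    "q": "water",
    "qf": "document^4.0",                 # keyword field with its boost
    "tie": "0.1",                         # "max plus 0.1 times others"
    "bf": "log(map(views,0,0,1))^50",     # additive function query, boost 50
    "debugQuery": "true",                 # include the explain output
}

# Placeholder base URL for a local Solr instance.
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```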

Best
Erick
