Posted to solr-user@lucene.apache.org by Jeff Newburn <jn...@zappos.com> on 2009/09/10 00:00:23 UTC

Nonsensical Solr Relevancy Score

I searched for the word "blue" in our index, and the debugQuery output shows
some extremely strange scoring.  Product 1 gets the higher score with only
one match on the word blue, while product 2 gets a lower score with the same
field match AND an additional field match.  Can someone please help me
understand why such an obviously more relevant product is given a lower
score?

  <str name="954058">
2.3623571 = (MATCH) sum of:
  0.26248413 = (MATCH) max plus 0.5 times others of:
    0.26248413 = (MATCH) weight(productNameSearch:blue in 112779), product of:
      0.032673787 = queryWeight(productNameSearch:blue), product of:
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.0040672035 = queryNorm
      8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779), product of:
        1.0 = tf(termFreq(productNameSearch:blue)=1)
        8.033478 = idf(docFreq=120, numDocs=136731)
        1.0 = fieldNorm(field=productNameSearch, doc=112779)
  2.099873 = (MATCH) max plus 0.5 times others of:
    2.099873 = (MATCH) weight(productNameSearch:blue^8.0 in 112779), product of:
      0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
        8.0 = boost
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.0040672035 = queryNorm
      8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779), product of:
        1.0 = tf(termFreq(productNameSearch:blue)=1)
        8.033478 = idf(docFreq=120, numDocs=136731)
        1.0 = fieldNorm(field=productNameSearch, doc=112779)
</str>
  <str name="402943">
1.9483687 = (MATCH) sum of:
  0.63594794 = (MATCH) max plus 0.5 times others of:
    0.16405259 = (MATCH) weight(productNameSearch:blue in 8142), product of:
      0.032673787 = queryWeight(productNameSearch:blue), product of:
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.0040672035 = queryNorm
      5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142), product of:
        1.0 = tf(termFreq(productNameSearch:blue)=1)
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.625 = fieldNorm(field=productNameSearch, doc=8142)
    0.55392164 = (MATCH) weight(color:blue^10.0 in 8142), product of:
      0.15009704 = queryWeight(color:blue^10.0), product of:
        10.0 = boost
        3.6904235 = idf(docFreq=9309, numDocs=136731)
        0.0040672035 = queryNorm
      3.6904235 = (MATCH) fieldWeight(color:blue in 8142), product of:
        1.0 = tf(termFreq(color:blue)=1)
        3.6904235 = idf(docFreq=9309, numDocs=136731)
        1.0 = fieldNorm(field=color, doc=8142)
  1.3124207 = (MATCH) max plus 0.5 times others of:
    1.3124207 = (MATCH) weight(productNameSearch:blue^8.0 in 8142), product of:
      0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
        8.0 = boost
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.0040672035 = queryNorm
      5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142), product of:
        1.0 = tf(termFreq(productNameSearch:blue)=1)
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.625 = fieldNorm(field=productNameSearch, doc=8142)
</str>

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewburn@zappos.com - 702-943-7562


Re: Nonsensical Solr Relevancy Score

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Factor 1: idf
  If you do a search on "blue whales" you are probably much more
interested in whales than you are in things that are blue.  The idf
factor takes this term rarity into account.  In your case, color:blue
appears in over 9000 documents, but productNameSearch:blue only
appears in 120 documents (and thus its idf factor is much higher).
One option is to simply boost searches on your color field higher.
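
To make the idf gap concrete, here is a small Python sketch.  It assumes the
classic Lucene default formula idf = 1 + ln(numDocs / (docFreq + 1)); the
function name is made up for illustration.  The results come out at about
8.03 and 3.69, within 0.01 of the debugQuery values; the small remainder
presumably comes from the exact document count used internally, which is
only a guess.

```python
import math

def classic_idf(doc_freq, num_docs):
    """Classic Lucene-style idf (assumed formula): 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

NUM_DOCS = 136731
idf_name = classic_idf(120, NUM_DOCS)    # productNameSearch:blue
idf_color = classic_idf(9309, NUM_DOCS)  # color:blue

# idf enters a term's score twice (once in queryWeight, once in fieldWeight),
# so the gap between the two fields is effectively squared.
squared_ratio = (idf_name / idf_color) ** 2

print(f"idf(productNameSearch:blue) ~ {idf_name:.2f}")
print(f"idf(color:blue)             ~ {idf_color:.2f}")
print(f"squared ratio               ~ {squared_ratio:.1f}")
```

At equal boost, tf and fieldNorm, a match in productNameSearch is therefore
worth several times a match in color, which is why the ^10.0 boost on color
does not go very far here.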

Factor 2: length normalization
  0.625 = fieldNorm(field=productNameSearch, doc=8142)
The second document probably has a match in a longer field, which is a
less specific match and thus gets penalized. Because this is in the
very important field (as measured by idf) this causes the second doc
to lose.
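
The 0.625 can be reproduced with a rough sketch, assuming Lucene's default
length normalization of 1/sqrt(number of terms in the field), no index-time
boosts, and norms stored in a single byte with very low precision.  The
helper below (a hypothetical name) imitates that lossy storage by keeping
only the top 11 bits of a 32-bit float; it is an approximation for
illustration, not Lucene's actual encoder.

```python
import math
import struct

def approx_stored_norm(num_terms):
    """Approximate the fieldNorm stored for a field with num_terms terms.

    Assumptions: lengthNorm = 1/sqrt(num_terms), no index-time boost, and
    storage that truncates the float32 to its top 11 bits (sign, exponent
    and the two highest stored mantissa bits).
    """
    norm = 1.0 / math.sqrt(num_terms)
    bits = struct.unpack(">I", struct.pack(">f", norm))[0]
    truncated = bits & 0xFFE00000  # drop the low 21 mantissa bits
    return struct.unpack(">f", struct.pack(">I", truncated))[0]

for n in range(1, 6):
    print(n, approx_stored_norm(n))
```

Under these assumptions a one-term field stores 1.0 and a two-term field
stores 0.625, which would fit the two fieldNorm values in the debug output.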

Factor 3: There is no coord factor in the top-level boolean query of
generated dismax queries.  A coord factor would generally cause matches in
more fields to be boosted beyond just adding their scores together.  Maybe
we should have an option for this.
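
The arithmetic behind both totals can be re-derived from the numbers in the
debugQuery output.  The Python sketch below is only a re-computation of the
explain tree, with made-up function names; the constants are copied from the
output, and the two results should land on roughly 2.3624 and 1.9484.

```python
QUERY_NORM = 0.0040672035
IDF_NAME = 8.033478    # idf of productNameSearch:blue
IDF_COLOR = 3.6904235  # idf of color:blue
TIE = 0.5              # the "max plus 0.5 times others" multiplier

def term_weight(boost, idf, field_norm, tf=1.0):
    """queryWeight (boost * idf * queryNorm) times fieldWeight (tf * idf * fieldNorm)."""
    return (boost * idf * QUERY_NORM) * (tf * idf * field_norm)

def dismax(scores, tie=TIE):
    """Best sub-score plus tie times the sum of the remaining sub-scores."""
    best = max(scores)
    return best + tie * (sum(scores) - best)

# Document 954058: productNameSearch matches with fieldNorm 1.0, no color match.
doc1 = (dismax([term_weight(1.0, IDF_NAME, 1.0)])
        + dismax([term_weight(8.0, IDF_NAME, 1.0)]))

# Document 402943: productNameSearch matches with fieldNorm 0.625, plus color:blue^10.0.
doc2 = (dismax([term_weight(1.0, IDF_NAME, 0.625),
                term_weight(10.0, IDF_COLOR, 1.0)])
        + dismax([term_weight(8.0, IDF_NAME, 0.625)]))

print(f"954058: {doc1:.4f}")
print(f"402943: {doc2:.4f}")
```

The top level is a plain sum of the two dismax clauses with no coord
multiplier.  The fieldNorm penalty on the ^8.0 clause (2.0999 vs 1.3124)
costs the second document more than its color match adds to the first
clause (0.6359 vs 0.2625).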

-Yonik
http://www.lucidimagination.com




Re: Nonsensical Solr Relevancy Score

Posted by Jeff Newburn <jn...@zappos.com>.
Ah, that makes more sense.  It does seem that a coord option would be good,
especially in cases like this.
-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewburn@zappos.com - 702-943-7562


> From: Yonik Seeley <yo...@lucidimagination.com>
> Reply-To: <so...@lucene.apache.org>
> Date: Fri, 11 Sep 2009 14:44:50 -0400
> To: <so...@lucene.apache.org>
> Subject: Re: Nonsensical Solr Relevancy Score
> 
> At a high level, there's this:
> http://wiki.apache.org/solr/SolrRelevancyFAQ#head-343e33b6472ca53afb94e1544ae3fcf7d474e5fc
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 
> 
> On Fri, Sep 11, 2009 at 1:05 PM, Matthew Runo <mr...@zappos.com> wrote:
>> I'd actually like to see a detailed wiki page on how all the parts of a
>> score are actually calculated and inter-related, but I'm not knowledgeable
>> enough to write it =\
>> 
>> Thanks for your time!
>> 
>> Matthew Runo
>> Software Engineer, Zappos.com
>> mruno@zappos.com - 702-943-7833
>> 


Re: Nonsensical Solr Relevancy Score

Posted by Yonik Seeley <yo...@lucidimagination.com>.
At a high level, there's this:
http://wiki.apache.org/solr/SolrRelevancyFAQ#head-343e33b6472ca53afb94e1544ae3fcf7d474e5fc

-Yonik
http://www.lucidimagination.com



On Fri, Sep 11, 2009 at 1:05 PM, Matthew Runo <mr...@zappos.com> wrote:
> I'd actually like to see a detailed wiki page on how all the parts of a
> score are actually calculated and inter-related, but I'm not knowledgeable
> enough to write it =\
>
> Thanks for your time!
>
> Matthew Runo
> Software Engineer, Zappos.com
> mruno@zappos.com - 702-943-7833
>

Re: Nonsensical Solr Relevancy Score

Posted by Matthew Runo <mr...@zappos.com>.
I'd actually like to see a detailed wiki page on how all the parts of  
a score are actually calculated and inter-related, but I'm not  
knowledgeable enough to write it =\

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mruno@zappos.com - 702-943-7833
