Posted to solr-user@lucene.apache.org by roySolr <ro...@gmail.com> on 2011/05/03 13:48:20 UTC

Dismax scoring multiple fields TIE

Hello,

I have a question about scoring when I use the dismax handler. Here is an
example:

    name                      category    related category
1.  Chelsea best club ever    Chelsea     Sport
2.  Chelsea                   Chelsea     Sport

When I search for "Chelsea" I want a higher score for number 2, because I
think it is a better match on field length.
I use dismax and both records have the same score. I see a difference in
fieldNorm for the name field, but the final score is still the same. How can
I fix this?


my config:

<requestHandler name="dismax" class="solr.SearchHandler" default="true">
    <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="qf">
       name category related_category
     </str>
    </lst>
    <str name="tie">1.0</str>
</requestHandler>

SCORE 1:
0.75269306 = (MATCH) sum of:
  0.75269306 = (MATCH) max of:
    0.75269306 = (MATCH) weight(category:chelsea in 680), product of:
      0.3085193 = queryWeight(category:chelsea), product of:
        2.4396951 = idf(docFreq=236, maxDocs=1000)
        0.12645814 = queryNorm
      2.4396951 = (MATCH) fieldWeight(category:chelsea in 680), product of:
        1.0 = tf(termFreq(category:chelsea)=1)
        2.4396951 = idf(docFreq=236, maxDocs=1000)
        1.0 = fieldNorm(field=category, doc=680)
    0.37634653 = (MATCH) weight(name:chelsea in 680), product of:
      0.3085193 = queryWeight(name:chelsea), product of:
        2.4396951 = idf(docFreq=236, maxDocs=1000)
        0.12645814 = queryNorm
      1.2198476 = (MATCH) fieldWeight(name:chelsea in 680), product of:
        1.0 = tf(termFreq(name:chelsea)=1)
        2.4396951 = idf(docFreq=236, maxDocs=1000)
        0.5 = fieldNorm(field=name, doc=680)


SCORE 2:
0.75269306 = (MATCH) sum of:
  0.75269306 = (MATCH) max of:
    0.75269306 = (MATCH) weight(category:chelsea in 678), product of:
      0.3085193 = queryWeight(category:chelsea), product of:
        2.4396951 = idf(docFreq=236, maxDocs=1000)
        0.12645814 = queryNorm
      2.4396951 = (MATCH) fieldWeight(category:chelsea in 678), product of:
        1.0 = tf(termFreq(category:chelsea)=1)
        2.4396951 = idf(docFreq=236, maxDocs=1000)
        1.0 = fieldNorm(field=category, doc=678)
    0.75269306 = (MATCH) weight(name:chelsea in 678), product of:
      0.3085193 = queryWeight(name:chelsea), product of:
        2.4396951 = idf(docFreq=236, maxDocs=1000)
        0.12645814 = queryNorm
      2.4396951 = (MATCH) fieldWeight(name:chelsea in 678), product of:
        1.0 = tf(termFreq(name:chelsea)=1)
        2.4396951 = idf(docFreq=236, maxDocs=1000)
        1.0 = fieldNorm(field=name, doc=678)
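
The arithmetic in the two explain outputs can be replayed with a small Python sketch. It assumes the usual DisjunctionMaxQuery combination (best field score plus tie times the sum of the other field scores); the function names are illustrative and not part of Solr. Both outputs say "max of:" with no tie term, which suggests the tie of 1.0 is not being applied, possibly because the `tie` entry sits outside the `defaults` list in the config. That reading is an inference from the output, not something confirmed in this thread. The sketch also holds queryNorm fixed while varying tie, which is a simplification, so only the relative order of the two documents is meaningful.

```python
def field_score(idf, query_norm, tf, field_norm):
    """Score of one field clause, following the explain output above."""
    query_weight = idf * query_norm        # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight


def dismax_score(field_scores, tie=0.0):
    """Best field score plus tie times the sum of the remaining field scores."""
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Values copied from the explain output in this message.
IDF = 2.4396951
QUERY_NORM = 0.12645814

# Per-field scores as [category, name].
doc_680 = [field_score(IDF, QUERY_NORM, 1.0, 1.0),   # category, fieldNorm 1.0
           field_score(IDF, QUERY_NORM, 1.0, 0.5)]   # name, fieldNorm 0.5
doc_678 = [field_score(IDF, QUERY_NORM, 1.0, 1.0),   # category, fieldNorm 1.0
           field_score(IDF, QUERY_NORM, 1.0, 1.0)]   # name, fieldNorm 1.0

for tie in (0.0, 1.0):
    print("tie=%.1f  doc 680: %.4f  doc 678: %.4f"
          % (tie, dismax_score(doc_680, tie), dismax_score(doc_678, tie)))
```

With tie=0.0 both documents get the category score, which is the tie seen above. With tie=1.0 the name score is added as well, so doc 678 (the one-term name) comes out ahead of doc 680.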





--
View this message in context: http://lucene.472066.n3.nabble.com/Dismax-scoring-multiple-fields-TIE-tp2893923p2893923.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Dismax scoring multiple fields TIE

Posted by roySolr <ro...@gmail.com>.
No, but I think the difference in field length is large and the score is
still the same.

These results get the same score (q=chelsea):

1. Chelsea is a very very big club in london, england    Chelsea    Sport
2. Chelsea                                                Chelsea    Sport

--
View this message in context: http://lucene.472066.n3.nabble.com/Dismax-scoring-multiple-fields-TIE-tp2893923p2894026.html

Re: Dismax scoring multiple fields TIE

Posted by Erick Erickson <er...@gmail.com>.
I'm not sure you can. Very short fields aren't differentiated on the basis
of field length due to rounding errors. Here's a cut-n-paste from Jay Hill:

********************************
So the values are not pre-set for the lengthNorm, but for some counts the
fieldLength value winds up being the same because of the precision loss. Here
is a list of lengthNorm values for 1 to 10 term fields:

# of terms    lengthNorm
  1          1.0
  2         .625
  3         .5
  4         .5
  5         .4375
  6         .375
  7         .375
  8         .3125
  9         .3125
 10         .3125
****************
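
The list above can be reproduced with a short Python sketch. It assumes the default lengthNorm of 1/sqrt(number of terms) and a one-byte norm encoding modelled on Lucene's SmallFloat.floatToByte315 as I understand it for Lucene of that era. The Python function names are mine, so treat this as an illustration of the precision loss rather than the exact library code.

```python
import math
import struct


def float_to_byte315(f):
    """Squeeze a float into one byte, truncating the low-order bits."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw float32 bits, signed
    small = bits >> 21               # drop the low 21 bits of the float
    fzero = (63 - 15) << 3           # offset for the byte's zero exponent
    if small <= fzero:
        return 0 if bits <= 0 else 1  # zero/negative -> 0, underflow -> 1
    if small >= fzero + 0x100:
        return 255                    # overflow maps to the largest byte
    return small - fzero


def byte315_to_float(b):
    """Expand the one-byte norm back into a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_length_norm(num_terms):
    """lengthNorm as it comes back after the one-byte round trip."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


for n in range(1, 11):
    print("%2d  exact=%.4f  stored=%s"
          % (n, 1.0 / math.sqrt(n), stored_length_norm(n)))
```

The "stored" column should match the list above. Fields of 3 and 4 terms, 6 and 7 terms, and 8 to 10 terms collapse to the same value, so field length cannot separate them.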

I'd ask, though, if this behavior is "good enough", are your users well
served by spending time on this case?

Best
Erick


Re: Dismax scoring multiple fields TIE

Posted by elisabeth benoit <el...@gmail.com>.
For category:chelsea you have fieldNorm=1.0, so your category field must
have a type with omitNorms=true. If you don't have omitNorms=true, then a
shorter field will score higher.

I'm new to Solr, but from what I've experienced, this is the cause.

Regards,
Elisabeth
