Posted to solr-user@lucene.apache.org by Vincent Pérès <vi...@gmail.com> on 2009/07/29 12:55:53 UTC

Relevant results with DisMaxRequestHandler

Hello,

I have noticed some strange behavior in my queries. I would like to share
an example with you, so maybe you can explain to me what is going wrong.

Using the following query:
http://localhost:8983/solr/others/select/?debugQuery=true&q=anna%20lewis&rows=20&start=0&fl=*&qt=dismax

I get back around 100 results. Here are the first two:
<doc>
<str name="id">Person:151</str>
<str name="name_s">Victoria Davisson</str>
</doc>
<doc>
<str name="id">Person:37</str>
<str name="name_s">Anna Lewis</str>
</doc>

And the related debug output:
57.998047 = (MATCH) sum of:
  0.048290744 = (MATCH) sum of:
    0.024546575 = (MATCH) max plus 0.01 times others of:
      0.024546575 = (MATCH) weight(text:anna^0.5 in 64288), product of:
        0.027395602 = queryWeight(text:anna^0.5), product of:
          0.5 = boost
          5.734427 = idf(docFreq=564, numDocs=30400)
          0.009554783 = queryNorm
        0.8960042 = (MATCH) fieldWeight(text:anna in 64288), product of:
          1.0 = tf(termFreq(text:anna)=1)
          5.734427 = idf(docFreq=564, numDocs=30400)
          0.15625 = fieldNorm(field=text, doc=64288)
    0.02374417 = (MATCH) max plus 0.01 times others of:
      0.02374417 = (MATCH) weight(text:lewi^0.5 in 64288), product of:
        0.026944114 = queryWeight(text:lewi^0.5), product of:
          0.5 = boost
          5.6399217 = idf(docFreq=620, numDocs=30400)
          0.009554783 = queryNorm
        0.88123775 = (MATCH) fieldWeight(text:lewi in 64288), product of:
          1.0 = tf(termFreq(text:lewi)=1)
          5.6399217 = idf(docFreq=620, numDocs=30400)
          0.15625 = fieldNorm(field=text, doc=64288)
  57.949757 = (MATCH) FunctionQuery(ord(name_s)), product of:
    1213.0 = ord(name_s)=1213
    5.0 = boost
    0.009554783 = queryNorm

5.006892 = (MATCH) sum of:
  0.038405567 = (MATCH) sum of:
    0.021955125 = (MATCH) max plus 0.01 times others of:
      0.021955125 = (MATCH) weight(text:anna^0.5 in 62632), product of:
        0.027395602 = queryWeight(text:anna^0.5), product of:
          0.5 = boost
          5.734427 = idf(docFreq=564, numDocs=30400)
          0.009554783 = queryNorm
        0.80141056 = (MATCH) fieldWeight(text:anna in 62632), product of:
          2.236068 = tf(termFreq(text:anna)=5)
          5.734427 = idf(docFreq=564, numDocs=30400)
          0.0625 = fieldNorm(field=text, doc=62632)
    0.016450444 = (MATCH) max plus 0.01 times others of:
      0.016450444 = (MATCH) weight(text:lewi^0.5 in 62632), product of:
        0.026944114 = queryWeight(text:lewi^0.5), product of:
          0.5 = boost
          5.6399217 = idf(docFreq=620, numDocs=30400)
          0.009554783 = queryNorm
        0.61053944 = (MATCH) fieldWeight(text:lewi in 62632), product of:
          1.7320508 = tf(termFreq(text:lewi)=3)
          5.6399217 = idf(docFreq=620, numDocs=30400)
          0.0625 = fieldNorm(field=text, doc=62632)
  4.968487 = (MATCH) FunctionQuery(ord(name_s)), product of:
    104.0 = ord(name_s)=104
    5.0 = boost
    0.009554783 = queryNorm
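
For readers following the debug output, the text-field part of both scores
can be reproduced by hand. The sketch below is only an illustration: the
function names are made up for this example, the numbers are copied from the
explain output above, and tf is taken to be the square root of the term
frequency, as the tf(...) lines suggest.

```python
import math

QUERY_NORM = 0.009554783  # queryNorm copied from the debug output


def term_weight(boost, idf, term_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term clause: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    # tf is the square root of the raw term frequency (sqrt(5) = 2.236068)
    field_weight = math.sqrt(term_freq) * idf * field_norm
    return query_weight * field_weight


def dismax(clause_scores, tie):
    """'max plus tie times others' over the per-field scores of one term."""
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)


# First document (doc=64288): each term matched only in the text field,
# so the tie-breaker has nothing extra to add.
anna_1 = dismax([term_weight(0.5, 5.734427, 1, 0.15625)], tie=0.01)
lewi_1 = dismax([term_weight(0.5, 5.6399217, 1, 0.15625)], tie=0.01)
text_score_1 = anna_1 + lewi_1

# Second document (doc=62632): higher term frequencies, smaller fieldNorm.
anna_2 = dismax([term_weight(0.5, 5.734427, 5, 0.0625)], tie=0.01)
lewi_2 = dismax([term_weight(0.5, 5.6399217, 3, 0.0625)], tie=0.01)
text_score_2 = anna_2 + lewi_2

print(round(text_score_1, 6), round(text_score_2, 6))
```

Both text scores come out well below 0.05, so the text match is not what
separates the two documents in the ranking.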

I'm using a simple boost function:
   <requestHandler name="dismax" class="solr.SearchHandler" >
     <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">
         text^0.5 name_s^5.0
      </str>
      <str name="pf">
         name_s^5.0
      </str>
      <str name="bf">
         name_s^5.0
      </str>
     </lst>
   </requestHandler>

Can anyone explain to me why the first result is on top (the query is 'anna
lewis') with a huge score even though it is unrelated to the query? It seems
the weight comes from the name_s field.

A second, more general question: is it possible to boost a document when the
query exactly matches the content of a field?

Thank you !
Vincent
-- 
View this message in context: http://www.nabble.com/Relevant-results-with-DisMaxRequestHandler-tp24716870p24716870.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Relevant results with DisMaxRequestHandler

Posted by Vincent Pérès <vi...@gmail.com>.
Wow, it's as if the 'mm' parameter had just appeared for the first time...
Yes, I read the docs a few times, but never understood that documents which
don't match the required number of expressions will not be returned. My
apologies; everything is much clearer now thanks to the minimum-match ('mm')
parameter.

Thank you,
Vincent


hossman wrote:
> 
> 
> : The 'qf' parameter used in the dismax seems to work with an 'AND'
> : separator. I get many more results without dismax. Is there any way to
> : keep the same number of documents and still process the 'qf'?
> 
> did you read any of the docs on dismax?
> 
> 	http://wiki.apache.org/solr/DisMaxRequestHandler
> 
> did you look at the "mm" param?
> 
> 	http://wiki.apache.org/solr/DisMaxRequestHandler#mm
> 
> 
> -Hoss
> 
> 
> 



Re: Relevant results with DisMaxRequestHandler

Posted by Chris Hostetter <ho...@fucit.org>.
: The 'qf' parameter used in the dismax seems to work with an 'AND' separator.
: I get many more results without dismax. Is there any way to keep the same
: number of documents and still process the 'qf'?

did you read any of the docs on dismax?

	http://wiki.apache.org/solr/DisMaxRequestHandler

did you look at the "mm" param?

	http://wiki.apache.org/solr/DisMaxRequestHandler#mm
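
As a rough illustration of what changing "mm" looks like on the request, here
is a small sketch that builds two query URLs. The host, core and handler name
are taken from the first message in this thread; the helper function is
hypothetical, and the two mm values are only examples ("1" asks for at least
one query term to match, "100%" asks for all of them, which as far as I know
is what dismax does when mm is not set).

```python
from urllib.parse import urlencode

# Base URL as used in the original post; adjust for your own setup.
BASE_URL = "http://localhost:8983/solr/others/select/"


def dismax_url(query, mm):
    """Build a dismax request URL with an explicit mm value."""
    params = {"q": query, "qt": "dismax", "mm": mm}
    return BASE_URL + "?" + urlencode(params)


# At least one of the query terms must match (behaves like OR).
loose_url = dismax_url("anna lewis", "1")

# Every query term must match (behaves like AND).
strict_url = dismax_url("anna lewis", "100%")

print(loose_url)
print(strict_url)
```

The same value can also be set once in the handler's defaults instead of
being passed on every request.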


-Hoss


Re: Relevant results with DisMaxRequestHandler

Posted by Vincent Pérès <vi...@gmail.com>.
I actually have another question...

The 'qf' parameter used in the dismax seems to work with an 'AND' separator.
I get many more results without dismax. Is there any way to keep the same
number of documents and still process the 'qf'?

My dismax handler:
   <requestHandler name="dismax" class="solr.SearchHandler" >
     <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">
         text^0.5 title_ac^4.0 name_ac^4.0 authors_list_sm^4.0
      </str>
     </lst>
   </requestHandler>


Re: Relevant results with DisMaxRequestHandler

Posted by Vincent Pérès <vi...@gmail.com>.
Hello,

Thank you for your answer. I ended up using only a 'qf' parameter in the
dismax request handler, and I now seem to get better, more relevant
results.
I just don't understand why a result is boosted mainly by its last update
date by default!

Vincent


Re: Relevant results with DisMaxRequestHandler

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 29, 2009, at 6:55 AM, Vincent Pérès wrote:
> Can anyone explain to me why the first result is on top (the query is
> 'anna lewis') with a huge weight and nothing related (it seems the weight
> comes from the name_s field...) ?

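The ord function perhaps isn't doing what you want.  As the debug output
shows, your bf of name_s^5.0 is evaluated as ord(name_s), which returns
the ordinal position of the field's value among the sorted indexed terms.
Thus it appears "Anna Lewis" is the 104th name_s value in your index
lexicographically.  And of course "Victoria Davisson" is much further
down, at the 1213th position, so it gets a far larger boost.  Maybe you
want rord instead?   But probably not...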
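
The arithmetic can be checked against the explain output. This is only a
sketch: the function name is invented for the example, and all numbers are
copied from the debug output in the original post.

```python
QUERY_NORM = 0.009554783  # queryNorm copied from the debug output
BF_BOOST = 5.0            # the ^5.0 on the bf entry


def function_boost(ord_value, boost=BF_BOOST, query_norm=QUERY_NORM):
    """Contribution of FunctionQuery(ord(name_s)): ord * boost * queryNorm."""
    return ord_value * boost * query_norm


# "Victoria Davisson": ord(name_s)=1213, text match score 0.048290744
victoria_total = function_boost(1213) + 0.048290744

# "Anna Lewis": ord(name_s)=104, text match score 0.038405567
anna_total = function_boost(104) + 0.038405567

# Share of the final score that comes from the function query alone.
victoria_share = function_boost(1213) / victoria_total

print(round(victoria_total, 3), round(anna_total, 3))
```

Nearly all of the first document's score comes from the ord value, which
only reflects where the name sorts alphabetically and has nothing to do with
the query terms.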

> A second general question... is it possible to boost a field if the
> query match exactly the content of a field?

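You can set dismax's pf (phrase fields) together with a ps (phrase slop)
factor, which will boost documents where the user's terms are closer
together (within the number of term positions specified).  The related
qs (query slop) parameter applies the same kind of slop to phrases the
user quotes explicitly in the query.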
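
A sketch of how the slop parameters might be passed on a request follows. As
I understand the dismax documentation, ps applies to the phrase queries built
from pf, while qs applies to phrases quoted in the user's query. The base URL
comes from the original post; the helper function, the choice of the text
field for pf and the slop values are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base URL as used in the original post; adjust for your own setup.
BASE_URL = "http://localhost:8983/solr/others/select/"


def proximity_boost_url(query, phrase_fields, phrase_slop, query_slop=0):
    """Build a dismax URL that boosts documents whose terms are close together."""
    params = {
        "q": query,
        "qt": "dismax",
        "pf": phrase_fields,     # fields in which the whole query is tried as a phrase
        "ps": str(phrase_slop),  # how far apart the terms may be for the pf boost
        "qs": str(query_slop),   # slop for phrases the user quoted explicitly
    }
    return BASE_URL + "?" + urlencode(params)


# Hypothetical values: boost the text field when both terms are within 2 positions.
url = proximity_boost_url("anna lewis", "text^2.0", phrase_slop=2)
print(url)
```

With ps=0 the pf boost only applies when the terms appear as an exact
phrase, which is the closest match to the "exactly the content of a field"
case in the question.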

	Erik