Posted to solr-user@lucene.apache.org by geeky2 <ge...@hotmail.com> on 2012/08/23 17:44:42 UTC

need help understanding an issue with scoring

hello,

i am trying to understand the "debug" output from a query, and specifically
- how scores for two (2) documents are derived and why they are so far
apart.

the user is entering 9030 for the search

the search is rightfully returning the top document, however - the question
is why is the document with id 90302 so far down on the list.  

i have attached a text file i generated with xslt, pulling the document
information.  the text file has the itemNo, the rankNo and the partCnt.  the
sort order of the response handler is:

  <str name="sort">score desc, rankNo desc, partCnt desc</str>



if you look at the text file - you will see that 90302 is 174th on the
list!  90302 has a rankNo of 6849 - and i would think that would drive it
much higher on the list and therefore much closer to 9030.

what is happening from a business perspective is that 9030 is one of our top
selling parts, as is 90302.  they need to be closer together in the results
instead of separated by 170+ documents that have a rankNo of 0.

i have also copied and pasted the response handler that is being used - below

can someone help me understand the scoring so i can correct this?

this is the scoring for the two documents:

      <str name="9030                    ,0046,046">
12.014634 = (MATCH) max of:
  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 2308681), product of:
    0.022755474 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      0.0027743944 = queryNorm
    9.11329 = (MATCH) fieldWeight(itemNo:9030 in 2308681), product of:
      1.0 = tf(termFreq(itemNo:9030)=1)
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      1.0 = fieldNorm(field=itemNo, doc=2308681)
  12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681), product of:
    1.0 = tf(termFreq(itemNoExactMatchStr:9030)=1)
    12.014634 = idf(docFreq=140, maxDocs=8566704)
    1.0 = fieldNorm(field=itemNoExactMatchStr, doc=2308681)
</str>




      <str name="90302                   ,0046,046">
0.20737723 = (MATCH) max of:
  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 1796597), product of:
    0.022755474 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      0.0027743944 = queryNorm
    9.11329 = (MATCH) fieldWeight(itemNo:9030 in 1796597), product of:
      1.0 = tf(termFreq(itemNo:9030)=1)
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      1.0 = fieldNorm(field=itemNo, doc=1796597)
</str>


  <requestHandler name="itemNoProductTypeBrandSearch" class="solr.SearchHandler" default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">10</int>
      <str name="qf">itemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8 brand^.5</str>
      <str name="q.alt">*:*</str>
      <str name="sort">score desc, rankNo desc, partCnt desc</str>
      <str name="facet">true</str>
      <str name="facet.field">itemDescFacet</str>
      <str name="facet.field">brandFacet</str>
      <str name="facet.field">divProductTypeIdFacet</str>
    </lst>
    <lst name="appends">
    </lst>
    <lst name="invariants">
    </lst>
  </requestHandler>
     
thank you for any help




--
View this message in context: http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-tp4002897.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: need help understanding an issue with scoring

Posted by geeky2 <ge...@hotmail.com>.
looks like the original complete list of the results did not get attached to
this thread 

here is a snippet of the list.

what i am trying to demonstrate, is the difference in scoring and
ultimately, sorting - and the breadth of documents (a few hundred) between
the two documents of interest (9030 and 90302)

thank you,

itemNo, score, rankNo, partCnt

[9030                    ],12.014701,10353,1
[9030                    ],12.014701,37,1
[9030                    ],12.014701,1,1
[9030                               ],12.014701,0,167
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[9030                    ],12.014701,0,1
[PC-9030                            ],7.509188,0,169
[58-9030                 ],7.509188,0,1
[9030-1R                 ],7.509188,0,1
[903028-9030             ],7.509188,0,1
[903139-9030             ],7.509188,0,1
[903091-9030             ],7.509188,0,1
[903099-9030             ],7.509188,0,1
[903153-9030             ],7.509188,0,1
[031-9030                ],7.509188,0,1
[308-9030                ],7.509188,0,1
[9030-6010               ],7.509188,0,1
[9030-6010               ],7.509188,0,1
[9030-6006               ],7.509188,0,1
[9030-6008               ],7.509188,0,1
[9030-6008               ],7.509188,0,1
[9030-6001               ],7.509188,0,1
[9030-6003               ],7.509188,0,1
[9030-6006               ],7.509188,0,1
[208568-9030             ],7.509188,0,1
[79-9030                 ],7.509188,0,1
[33-9030                 ],7.509188,0,1
[M-9030                  ],7.509188,0,1

... a few hundred more ...

[LGQ9030PQ1                         ],0.41475832,0,150
[LEQ9030PQ0                         ],0.41475832,0,124
[LEQ9030PQ1                         ],0.41475832,0,123
[CWE9030BCE                         ],0.41475832,0,115
[PJDS9030Z               ],0.29327843,0,1
[8A-CT9-030-010          ],0.29327843,0,1
[RDT9030A                ],0.29327843,0,1
[PJDG9030Z               ],0.29327843,0,1
[90302                   ],0.20737916,6849,1
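
The listing above is consistent with how a multi-key sort behaves: rankNo is only consulted when two documents have exactly the same score. A minimal Python sketch of `score desc, rankNo desc, partCnt desc`, using a few rows copied from the listing (the tuple layout and the `solr_sort_key` name are made up for illustration and are not part of Solr):

```python
# (itemNo, score, rankNo, partCnt) rows copied from the listing, in shuffled order
rows = [
    ("90302", 0.20737916, 6849, 1),
    ("PC-9030", 7.509188, 0, 169),
    ("9030", 12.014701, 37, 1),
    ("LGQ9030PQ1", 0.41475832, 0, 150),
    ("9030", 12.014701, 10353, 1),
    ("58-9030", 7.509188, 0, 1),
]

def solr_sort_key(row):
    """Mimic 'score desc, rankNo desc, partCnt desc' by negating each key."""
    _, score, rank_no, part_cnt = row
    return (-score, -rank_no, -part_cnt)

ordered = sorted(rows, key=solr_sort_key)
for item_no, score, rank_no, part_cnt in ordered:
    print(f"{item_no:<12} {score:>10.6f} {rank_no:>6} {part_cnt:>4}")
```

In this sketch 90302 still sorts last despite its rankNo of 6849, because every other row has a higher score; rankNo only separates the two 9030 rows, which tie on score.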

Re: need help understanding an issue with scoring

Posted by geeky2 <ge...@hotmail.com>.
update:

as an experiment - i changed the query to a wildcard (9030*) instead of an
explicit value (9030)

example:

QUERY="http://$SERVER.intra.searshc.com:${PORT}/solrpartscat/core1/select?qt=itemNoProductTypeBrandSearch&q=9030*&rows=2000&debugQuery=on&fl=*,score"

this resulted in a results list that appears much more rational from a sort
order perspective -

however - the wildcard query is not acceptable from a performance
standpoint.

any input or illumination would be appreciated ;)

thank you

itemNo, score, rankNo, partCnt

[9030                    ],1.0,10353,1
[90302                   ],1.0,6849,1
[9030P                   ],1.0,444,1
[903093                  ],1.0,51,1
[9030430                 ],1.0,47,1
[9030                    ],1.0,37,1
[903057-9010             ],1.0,26,1
[903061-9010             ],1.0,20,1
[903046-9010             ],1.0,18,1
[903056-9010             ],1.0,14,1
[903095                  ],1.0,14,1
[90303-MR1-000           ],1.0,14,1
[903097-9050             ],1.0,12,1
[903046-9011             ],1.0,12,1
[903097-9010             ],1.0,11,1
[903097-9040             ],1.0,11,1
[903063-9100             ],1.0,6,1
[903066-9011             ],1.0,6,1
[903098                  ],1.0,3,1





Re: need help understanding an issue with scoring

Posted by geeky2 <ge...@hotmail.com>.
Chris, Jack,

thank you for the detailed replies and help ;)







Re: need help understanding an issue with scoring

Posted by Chris Hostetter <ho...@fucit.org>.
: What is your query and "qf"?

FYI: these are both included in the original message (which was also 
quoted in the reply below)

As Jack points out, the difference in score comes from the difference 
in which fields are matched.


Your high scoring example doc matches on *both* the 
itemNo and itemNoExactMatchStr fields, but your low scoring example doc 
matches only on the itemNo field.  And you have a (relatively) huge boost 
on the itemNoExactMatchStr field compared to itemNo.

These queries are fairly simple, so the explain output isn't very 
complicated, and it's easy to see from the match -- but it may help to 
prune out some of the small details, and just look at the top level 
calculations...

:      <str name="9030                    ,0046,046">
: 12.014634 = (MATCH) max of:
:  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 2308681), product of:
:  12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681),
: </str>

	...vs...

:      <str name="90302                   ,0046,046">
: 0.20737723 = (MATCH) max of:
:  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 1796597), product of:
: </str>

You specified a huge boost on itemNoExactMatchStr, so the doc that matches 
on that field is going to score a lot higher than the doc that only 
matches on itemNo...

:      <str name="qf">itemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8
: brand^.5</str>
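
The arithmetic behind those top-level numbers can be replayed by hand. A minimal Python sketch, using only the factors printed in the explain output above; the names `clause_score`, `item_no` and `exact` are illustrative. The reading that the itemNoExactMatchStr clause shows no queryWeight line because that weight works out to roughly 1.0 is an inference from the numbers, not something the explain output states:

```python
# Factors copied from the explain output
QUERY_NORM = 0.0027743944
IDF_ITEM_NO = 9.11329      # idf(docFreq=2565, maxDocs=8566704)
IDF_EXACT = 12.014634      # idf(docFreq=140, maxDocs=8566704)

def clause_score(boost, idf, tf=1.0, field_norm=1.0, query_norm=QUERY_NORM):
    """Score of one term clause: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

item_no = clause_score(0.9, IDF_ITEM_NO)   # qf: itemNo^.9
exact = clause_score(30.0, IDF_EXACT)      # qf: itemNoExactMatchStr^30

# The explain output reports "max of" across the per-field clauses
score_9030 = max(item_no, exact)   # doc 9030 matches both fields
score_90302 = max([item_no])       # doc 90302 matches only itemNo

print(f"itemNo clause:              {item_no:.6f}")
print(f"itemNoExactMatchStr clause: {exact:.6f}")
print(f"9030 -> {score_9030:.6f}, 90302 -> {score_90302:.6f}")
print(f"ratio: {score_9030 / score_90302:.1f}x")
```

Under these assumptions the exact-match clause alone accounts for the whole gap: the itemNo clause contributes the same value to both documents, and the max picks the boosted clause for 9030.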


-Hoss

Re: need help understanding an issue with scoring

Posted by geeky2 <ge...@hotmail.com>.
hello,


this is the query i am using:

 cat goquery.sh
#!/bin/bash

SERVER=$1
PORT=$2


QUERY="http://$SERVER.blah.blah.com:${PORT}/solrpartscat/core1/select?qt=itemNoProductTypeBrandSearch&q=9030&rows=2000&debugQuery=on&fl=*,score"

# quote the URL so the shell does not word-split or glob-expand it (it contains ? and *)
curl -v "$QUERY"





Re: need help understanding an issue with scoring

Posted by Jack Krupansky <ja...@basetechnology.com>.
What is your query and "qf"?

The first doc gets its high score due to a match on the 
"itemNoExactMatchStr" field which the second doc doesn't have:

12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681),

With a low document frequency (inverts to high inverse document frequency):

12.014634 = idf(docFreq=140, maxDocs=8566704)
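
For reference, both idf values in the explain output are consistent with the classic Lucene TF-IDF formula, idf = 1 + ln(maxDocs / (docFreq + 1)). A minimal Python sketch, assuming that formula is the similarity in use on this index (the function name `classic_idf` is just for illustration):

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene TF-IDF idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

MAX_DOCS = 8566704  # maxDocs from the explain output

idf_item_no = classic_idf(2565, MAX_DOCS)  # itemNo:9030
idf_exact = classic_idf(140, MAX_DOCS)     # itemNoExactMatchStr:9030

print(f"itemNo idf              = {idf_item_no:.5f}")
print(f"itemNoExactMatchStr idf = {idf_exact:.5f}")
```

The term is rarer in itemNoExactMatchStr (140 docs) than in itemNo (2565 docs), so its idf is higher even before the ^30 boost is applied.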

-- Jack Krupansky
