Posted to solr-user@lucene.apache.org by bbarani <bb...@gmail.com> on 2018/03/26 18:39:04 UTC

Score different for different documents containing same value

Hi,

I was trying to query a field for a specific term and, to my
surprise, the score was different for different documents even though the
field I was searching contained exactly the same terms in all the
documents.

Any idea why this would happen?

*Note:* All the documents contained the value 'iphone brown case' in the query_t
field, and I am on Solr 6.1.

*Query:*
select?q=iphone+brown+case&omitHeader=false&fl=score,query_t,timestamp_tdt&sort=score%20desc&wt=xml&qf=query_t&defType=edismax&mm=100%25&rows=5

<response>
<lst name="responseHeader">
<bool name="zkConnected">true</bool>
<int name="status">0</int>
<int name="QTime">9</int>
<lst name="params">
<str name="mm">100%</str>
<str name="q">iphone brown case</str>
<str name="defType">edismax</str>
<str name="omitHeader">false</str>
<str name="qf">query_t</str>
<str name="fl">score,query_t,timestamp_tdt</str>
<str name="callback">getSuggestions</str>
<str name="sort">score desc</str>
<str name="rows">5</str>
<str name="wt">xml</str>
<str name="_">1521045725381</str>
</lst>
</lst>
<result name="response" numFound="4" start="0" maxScore="6.306856">
<doc>
<arr name="query_t">
<str>iphone brown case</str>
</arr>
<date name="timestamp_tdt">2018-03-26T13:40:14.690Z</date>
<float name="score">6.306856</float>
</doc>
<doc>
<arr name="query_t">
<str>iphone brown case</str>
</arr>
<date name="timestamp_tdt">2018-03-26T13:40:14.690Z</date>
<float name="score">4.8550515</float>
</doc>
<doc>
<arr name="query_t">
<str>iphone brown case</str>
</arr>
<date name="timestamp_tdt">2018-03-26T13:40:14.690Z</date>
<float name="score">4.8550515</float>
</doc>
<doc>
<arr name="query_t">
<str>iphone brown case</str>
</arr>
<date name="timestamp_tdt">2018-03-26T13:40:14.690Z</date>
<float name="score">4.8550515</float>
</doc>
</result>
</response>



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Score different for different documents containing same value

Posted by Erick Erickson <er...@gmail.com>.
Add debug=true to the query and you'll see exactly how the scores are
calculated. That should give you a clue as to what's going on.

In particular, look at the parsed query and make sure your query is
parsed as you expect. It should be, given how you specify the query,
but check it anyway as a sanity check.
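A minimal Python sketch of the debug suggestion, assuming a JSON response (wt=json) and a hypothetical collection URL. It rebuilds the original edismax query with debug=true added, then pulls the `parsedquery` string and the per-document `explain` entries out of the response's `debug` section. The helper names are illustrative only, not part of any Solr client library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_debug_url(base_url, q, qf):
    """Mirror the original edismax request, adding debug=true and wt=json."""
    params = {
        "q": q,
        "defType": "edismax",
        "qf": qf,
        "mm": "100%",
        "fl": "id,score",
        "rows": 5,
        "wt": "json",
        "debug": "true",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def summarize_debug(response):
    """Return (parsedquery, explain) from a parsed JSON response.

    explain maps each document's uniqueKey to its score explanation.
    """
    debug = response.get("debug", {})
    return debug.get("parsedquery"), debug.get("explain", {})


def fetch(url):
    """GET the URL and decode the JSON body (needs a running Solr)."""
    with urlopen(url) as resp:
        return json.load(resp)


# Example (hypothetical collection URL):
#   url = build_debug_url("http://localhost:8983/solr/mycollection",
#                         "iphone brown case", "query_t")
#   parsed, explain = summarize_debug(fetch(url))
```

Comparing the explain text of a 6.306856 document with a 4.8550515 one shows which factor (idf, tf, length norm) differs.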

Is your setup sharded? If so, fire the query at each replica (add
&distrib=false) and see what the scores are.
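A sketch of the per-replica check, assuming you know each replica core's base URL (any URLs you pass in are your own; nothing here discovers them). `replica_urls` builds one distrib=false request per replica. `score_spread` is a hypothetical helper that takes the scores you collected, as `{replica: {doc_id: score}}`, and reports documents whose score differs between replicas. Replicas of the same shard can disagree when they hold different numbers of not-yet-merged deleted documents.

```python
from urllib.parse import urlencode


def replica_urls(replica_base_urls, q, qf):
    """One non-distributed (distrib=false) request URL per replica core."""
    params = urlencode({
        "q": q,
        "defType": "edismax",
        "qf": qf,
        "fl": "id,score",
        "wt": "json",
        "distrib": "false",
    })
    return {base: base.rstrip("/") + "/select?" + params for base in replica_base_urls}


def score_spread(scores_by_replica):
    """Map doc id -> (min, max) score for docs whose score varies across replicas."""
    per_doc = {}
    for scores in scores_by_replica.values():
        for doc_id, score in scores.items():
            per_doc.setdefault(doc_id, []).append(score)
    return {
        doc_id: (min(vals), max(vals))
        for doc_id, vals in per_doc.items()
        if max(vals) - min(vals) > 1e-6  # ignore float noise
    }
```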

If this is a very small corpus, a few deleted documents can skew the scores, because deleted documents still count toward the term statistics until a merge removes them.
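To see why deletions matter, here is the BM25 idf term as Lucene 6.x's BM25Similarity computes it. This assumes the collection scores with BM25 (Solr 6's default unless the schema configures another similarity). An update marks the old version as deleted, and until a merge purges it, that old copy still counts in docFreq and docCount. The index sizes below are made up for illustration.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25 idf as in Lucene 6.x: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical tiny index: 10 live docs, 4 of which contain "iphone".
live_only = bm25_idf(doc_freq=4, doc_count=10)

# Same index after 3 of those docs were re-indexed but before a merge:
# the 3 deleted old versions still count in both statistics.
with_deleted = bm25_idf(doc_freq=4 + 3, doc_count=10 + 3)

# with_deleted is smaller, so matches on "iphone" score lower until a merge.
```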

Try turning on distributed IDF (assuming your collection is sharded).
The stats on different shards can differ on a small corpus; only once
you get into significant numbers of docs do the stats even
out.
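A sketch of what distributed IDF changes. It uses the same Lucene 6.x BM25 idf formula and made-up shard statistics for a single term. Without it, each shard scores with its own local counts. With exact distributed stats, the per-shard counts are aggregated (summed) before scoring, so every shard uses one idf. In Solr this is enabled with a statsCache element in solrconfig.xml, for example `<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>`. Check the Reference Guide for your version.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25 idf as in Lucene 6.x: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical per-shard statistics for the term "brown" in query_t.
shards = {
    "shard1": {"doc_freq": 1, "doc_count": 50},
    "shard2": {"doc_freq": 3, "doc_count": 50},
}

# Without distributed IDF: each shard scores with its local statistics,
# so identical documents on different shards get different scores.
local_idf = {name: bm25_idf(s["doc_freq"], s["doc_count"]) for name, s in shards.items()}

# With exact distributed stats: shard counts are summed first, and every
# shard scores with the same collection-wide idf.
global_idf = bm25_idf(
    sum(s["doc_freq"] for s in shards.values()),
    sum(s["doc_count"] for s in shards.values()),
)
```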

Oh, and a side note. To make the return order deterministic, I'd add a
secondary sort on id. It's not your problem at this point, but when
all the sort criteria match, the _internal_ Lucene doc ID is used to
break ties, and that can vary after segments are merged. For future
reference.
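A small illustration of the tie-breaking point, using hypothetical documents and internal ids. Sorting on score desc alone falls back to the internal Lucene doc id, which a merge can renumber. Adding id asc (assuming `id` is the uniqueKey) keeps tied documents in a stable order. In request terms, that is sort=score desc,id asc.

```python
def order_by_score_then_internal(docs, internal_ids):
    """Approximates sort=score desc alone: ties fall back to the internal
    Lucene doc id (internal_ids maps uniqueKey -> current internal id)."""
    return sorted(docs, key=lambda d: (-d["score"], internal_ids[d["id"]]))


def order_by_score_then_id(docs):
    """Approximates sort=score desc,id asc: ties broken by the stable uniqueKey."""
    return sorted(docs, key=lambda d: (-d["score"], d["id"]))


docs = [
    {"id": "b", "score": 4.8550515},
    {"id": "a", "score": 6.306856},
    {"id": "d", "score": 4.8550515},
    {"id": "c", "score": 4.8550515},
]
before_merge = {"a": 0, "b": 1, "c": 2, "d": 3}  # hypothetical internal ids
after_merge = {"a": 0, "b": 3, "c": 1, "d": 2}   # renumbered by a merge
```

With these inputs the tied documents come back as b, c, d before the merge and c, d, b after it, while `order_by_score_then_id` returns a, b, c, d either way.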

Best,
Erick


