Posted to solr-user@lucene.apache.org by Elaine Cario <et...@gmail.com> on 2019/01/22 16:16:55 UTC

difference in behavior of term boosting between Solr 6 and Solr 7

We're preparing to upgrade from Solr 6.4.2 to Solr 7.6.0, and found an
inconsistency in scoring. It appears that term boosts in the query are not
applied in Solr 7.

The query itself is identical against both versions (unimportant params
removed):

<str name="q">("one"^1) OR ("two"^2) OR ("three"^3)</str>
<str name="defType">edismax</str>
<str name="qf">max_term</str>
<str name="q.op">AND</str>
<str name="fq">dictionary_id:"WKUS-TAL-DEPLURALIZATION-THESAURUS"</str>
<str name="rows">100</str>
<str name="wt">xml</str>
<str name="debugQuery">on</str>
</lst>
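
To replay the comparison, the same parameters can be URL-encoded and sent to each version's `/select` handler. This is a minimal sketch in Java; the base URL `http://localhost:8983/solr/mycore` (host, port, and core name) is a hypothetical placeholder, not part of our actual setup.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class BoostQueryRequest {
    // Builds "<baseUrl>/select?k1=v1&k2=v2..." with each key and value URL-encoded.
    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "/select?" + query;
    }

    // The request parameters from the post, in the same order.
    static Map<String, String> params() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "(\"one\"^1) OR (\"two\"^2) OR (\"three\"^3)");
        p.put("defType", "edismax");
        p.put("qf", "max_term");
        p.put("q.op", "AND");
        p.put("fq", "dictionary_id:\"WKUS-TAL-DEPLURALIZATION-THESAURUS\"");
        p.put("rows", "100");
        p.put("wt", "xml");
        p.put("debugQuery", "on");
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical base URL; point it at the Solr 6 and Solr 7 instances in turn.
        System.out.println(buildUrl("http://localhost:8983/solr/mycore", params()));
    }
}
```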

Three documents are returned by both versions. In Solr 6 they come back in
order of the boosts (three, two, one), because the boost accounts for the
entire score; in Solr 7 they come back in arbitrary order, because all the
scores are 1.0.
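
With every score tied at 1.0, the order is decided by the tie-break rather than by the boosts; Lucene generally breaks relevance ties by internal document id. The sketch below models that ranking in plain Java (it is not Solr code). The doc ids 658 and 436 come from the explain output in this message; the id 100 for "one" is a hypothetical placeholder.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieBreakDemo {
    static final class ScoredDoc {
        final String name;
        final int docId;
        final double score;

        ScoredDoc(String name, int docId, double score) {
            this.name = name;
            this.docId = docId;
            this.score = score;
        }
    }

    // Relevance order: score descending, then internal doc id ascending on ties.
    static List<String> rank(List<ScoredDoc> docs) {
        List<ScoredDoc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((ScoredDoc d) -> d.score).reversed()
                .thenComparingInt(d -> d.docId));
        List<String> names = new ArrayList<>();
        for (ScoredDoc d : sorted) {
            names.add(d.name);
        }
        return names;
    }

    // applyBoost=false mimics the Solr 7 explain, where every hit scores 1.0.
    static List<ScoredDoc> hits(boolean applyBoost) {
        String[] names = {"one", "two", "three"};
        double[] boosts = {1.0, 2.0, 3.0};
        int[] docIds = {100, 436, 658}; // 100 is a hypothetical id for "one"
        List<ScoredDoc> docs = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            double base = 1.0; // idf * tfNorm, both 1.0 in the explain output
            docs.add(new ScoredDoc(names[i], docIds[i], applyBoost ? boosts[i] * base : base));
        }
        return docs;
    }

    public static void main(String[] args) {
        System.out.println("with boost:    " + rank(hits(true)));
        System.out.println("without boost: " + rank(hits(false)));
    }
}
```

Without the boost, the order simply follows index order, which is why it looks unrelated to the query.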

Looking at the debug output and explains, in Solr 6 the boost is multiplied
into the rest of the score:

<lst name="debug">
<str name="rawquerystring">("one"^1) OR ("two"^2) OR ("three"^3)</str>
<str name="querystring">("one"^1) OR ("two"^2) OR ("three"^3)</str>
<str name="parsedquery">(+(DisjunctionMaxQuery((max_term:"aaaa one zzzz"))^1.0 DisjunctionMaxQuery((max_term:"aaaa two zzzz"))^2.0 DisjunctionMaxQuery((max_term:"aaaa three zzzz"))^3.0))/no_coord</str>
<str name="parsedquery_toString">+(((max_term:"aaaa one zzzz"))^1.0 ((max_term:"aaaa two zzzz"))^2.0 ((max_term:"aaaa three zzzz"))^3.0)</str>
<lst name="explain">
<str name="WKUS-TAL-DEPLURALIZATION-THESAURUS_three">
3.0 = sum of:
  3.0 = weight(max_term:"aaaa three zzzz" in 658) [WKSimilarity], result of:
    3.0 = score(doc=658,freq=1.0 = phraseFreq=1.0
), product of:
      3.0 = boost
      1.0 = idf(), for phrases, always set to 1
      1.0 = tfNorm, computed as (freq * (k1a + 1)) / (freq + k1b) [WKSimilarity] from:
        1.0 = phraseFreq=1.0
        1.2 = k1a
        1.2 = k1b
        0.0 = b (norms omitted for field)
</str>
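
Because the Solr 6 explain is a straight product, each score can be recomputed by hand. The sketch below reproduces that arithmetic in plain Java, taking the formula and values (idf = 1, phraseFreq = 1, k1a = k1b = 1.2) from the explain output; it models the printed numbers and is not the WKSimilarity source. With those inputs tfNorm is 2.2 / 2.2 = 1.0, so the score equals the boost.

```java
public class Solr6ExplainScore {
    // tfNorm as printed in the explain: (freq * (k1a + 1)) / (freq + k1b).
    static double tfNorm(double freq, double k1a, double k1b) {
        return (freq * (k1a + 1)) / (freq + k1b);
    }

    // The explain lists the score as the product of boost, idf and tfNorm.
    static double score(double boost, double idf, double freq, double k1a, double k1b) {
        return boost * idf * tfNorm(freq, k1a, k1b);
    }

    public static void main(String[] args) {
        for (double boost : new double[] {1.0, 2.0, 3.0}) {
            // idf = 1 for phrases, phraseFreq = 1, k1a = k1b = 1.2
            System.out.println("^" + boost + " -> " + score(boost, 1.0, 1.0, 1.2, 1.2));
        }
    }
}
```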

But in Solr 7, the boost is not there at all:

<lst name="debug">
<str name="rawquerystring">("one"^1) OR ("two"^2) OR ("three"^3)</str>
<str name="querystring">("one"^1) OR ("two"^2) OR ("three"^3)</str>
<str name="parsedquery">+((+DisjunctionMaxQuery((max_term:"aaaa one zzzz"))^1.0) (+DisjunctionMaxQuery((max_term:"aaaa two zzzz"))^2.0) (+DisjunctionMaxQuery((max_term:"aaaa three zzzz"))^3.0))</str>
<str name="parsedquery_toString">+((+((max_term:"aaaa one zzzz"))^1.0) (+((max_term:"aaaa two zzzz"))^2.0) (+((max_term:"aaaa three zzzz"))^3.0))</str>
<lst name="explain">
<str name="WKUS-TAL-DEPLURALIZATION-THESAURUS_two">
1.0 = sum of:
  1.0 = weight(max_term:"aaaa two zzzz" in 436) [WKSimilarity], result of:
    1.0 = score(doc=436,freq=1.0 = phraseFreq=1.0
), product of:
      1.0 = idf(), for phrases, always set to 1
      1.0 = tfNorm, computed as (freq * (k1a + 1)) / (freq + k1b) [WKSimilarity] from:
        1.0 = phraseFreq=1.0
        1.2 = k1a
        1.2 = k1b
        0.0 = b (norms omitted for field)
</str>

I also noticed a subtle difference in the parsedquery between the two
versions, though I'm not sure whether it is what causes the boost to drop
out in Solr 7:

SOLR 6:  +(((max_term:"aaaa one zzzz"))^1.0 ((max_term:"aaaa two zzzz"))^2.0 ((max_term:"aaaa three zzzz"))^3.0)
SOLR 7:  +((+((max_term:"aaaa one zzzz"))^1.0) (+((max_term:"aaaa two zzzz"))^2.0) (+((max_term:"aaaa three zzzz"))^3.0))
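
To check whether the extra single-clause `+( )` wrapper could swallow the boost on its own, here is a toy model in plain Java (not Lucene classes). Each node multiplies the boost it inherits by its own `^N` and hands the product to its children, roughly the way Lucene 7 passes a `boost` argument down through `Query.createWeight`. The model assumes a single-clause BooleanQuery or DisjunctionMaxQuery scores exactly like its one clause, which is a simplification. Under those assumptions both parse trees deliver the same boost to each phrase.

```java
import java.util.Arrays;
import java.util.List;

public class BoostPropagation {
    // Toy node: a leaf holds a phrase; a group wraps clauses.
    // BooleanQuery and DisjunctionMaxQuery are both modelled as a group.
    static final class Node {
        final String phrase;        // non-null only for leaves
        final double boost;         // the ^N on this node (1.0 if none)
        final List<Node> children;  // empty for leaves

        Node(String phrase, double boost, List<Node> children) {
            this.phrase = phrase;
            this.boost = boost;
            this.children = children;
        }
    }

    static Node leaf(String phrase) {
        return new Node(phrase, 1.0, List.of());
    }

    static Node group(double boost, Node... children) {
        return new Node(null, boost, Arrays.asList(children));
    }

    // Product of all boosts on the path from the root to the leaf for `phrase`,
    // or NaN if the phrase is not in this subtree.
    static double effectiveBoost(Node node, String phrase, double inherited) {
        double boost = inherited * node.boost;
        if (node.phrase != null) {
            return node.phrase.equals(phrase) ? boost : Double.NaN;
        }
        for (Node child : node.children) {
            double found = effectiveBoost(child, phrase, boost);
            if (!Double.isNaN(found)) {
                return found;
            }
        }
        return Double.NaN;
    }

    // Solr 6: +( (dismax one)^1 (dismax two)^2 (dismax three)^3 )
    static Node solr6() {
        return group(1.0,
                group(1.0, leaf("aaaa one zzzz")),
                group(2.0, leaf("aaaa two zzzz")),
                group(3.0, leaf("aaaa three zzzz")));
    }

    // Solr 7: each boosted dismax is wrapped in an extra single-clause +( ) group.
    static Node solr7() {
        return group(1.0,
                group(1.0, group(1.0, leaf("aaaa one zzzz"))),
                group(1.0, group(2.0, leaf("aaaa two zzzz"))),
                group(1.0, group(3.0, leaf("aaaa three zzzz"))));
    }

    public static void main(String[] args) {
        for (String p : List.of("aaaa one zzzz", "aaaa two zzzz", "aaaa three zzzz")) {
            System.out.println(p + ": solr6=" + effectiveBoost(solr6(), p, 1.0)
                    + " solr7=" + effectiveBoost(solr7(), p, 1.0));
        }
    }
}
```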

For our use case, I think we can work around it with a constant score
query, but it would be good to know whether this is a bug, expected
behavior, or something missing from our query that would get the boost
working again.

Thanks!

Re: difference in behavior of term boosting between Solr 6 and Solr 7

Posted by Elaine Cario <et...@gmail.com>.
I predicted a colleague would come to me with a finding two minutes after I
sent this; I was wrong, it took a few hours! It seems a change was made to a
custom similarity class (I think because of an API change in Solr), and
that change caused the query boost not to be applied. We're looking into
this angle, so please ignore this for now.
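
For context: in Lucene 7, which Solr 7 is built on, the query boost is passed as the first argument of `Similarity.computeWeight(float boost, CollectionStatistics, TermStatistics...)`, whereas Lucene 6 delivered it through `SimWeight.normalize(float queryNorm, float boost)`. A custom similarity ported across that change without carrying the new argument into its scorer would produce an explain like the Solr 7 one above, with the `boost` factor missing. The sketch below illustrates that failure mode with hypothetical stand-in types (`Stats`, `CustomSimilarity`), not the real Lucene API or the actual WKSimilarity code.

```java
public class SimilarityBoostSketch {
    // Hypothetical stand-in for the per-query statistics a similarity precomputes.
    static final class Stats {
        final float boost;
        final float idf;

        Stats(float boost, float idf) {
            this.boost = boost;
            this.idf = idf;
        }
    }

    // Hypothetical stand-in for a Lucene 7-style similarity: the boost arrives
    // as an argument when the weight is computed, and must be kept for scoring.
    interface CustomSimilarity {
        Stats computeWeight(float boost, float idf);

        default float score(Stats stats, float freq, float k1a, float k1b) {
            float tfNorm = (freq * (k1a + 1)) / (freq + k1b);
            return stats.boost * stats.idf * tfNorm;
        }
    }

    // Ported without carrying the boost: every clause scores as if it were ^1.
    static final CustomSimilarity DROPS_BOOST = (boost, idf) -> new Stats(1f, idf);

    // Keeps the boost so it is multiplied into the final score.
    static final CustomSimilarity KEEPS_BOOST = (boost, idf) -> new Stats(boost, idf);

    static float scoreWith(CustomSimilarity sim, float boost) {
        // idf = 1 for phrases, freq = 1, k1a = k1b = 1.2, as in the explain output.
        return sim.score(sim.computeWeight(boost, 1f), 1f, 1.2f, 1.2f);
    }

    public static void main(String[] args) {
        for (float boost : new float[] {1f, 2f, 3f}) {
            System.out.println("^" + boost + ": dropped=" + scoreWith(DROPS_BOOST, boost)
                    + " kept=" + scoreWith(KEEPS_BOOST, boost));
        }
    }
}
```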
