
Posted to java-user@lucene.apache.org by Peter Keegan <pe...@gmail.com> on 2009/02/20 22:03:00 UTC

queryNorm affect on score

The explanations of scores for the same document returned by 2 similar
queries differ in an unexpected way. There are 2 fields involved, 'contents'
and 'literals'. The 'literals' field has setBoost(0) applied at index time. As you can see from
the explanations below, the total weight of the matching terms from the
'literals' field is 0. However, the weights produced by the matching terms in
the 'contents' field are very different, even though the same terms match.
The reason is that the 'queryNorm' value is very different because
'sumOfSquaredWeights' is very different. Why is this?

First query: +(+contents:sales +contents:representative) +literals:jb$1
Explanation:
32.274593  sum of:
  32.274593  sum of:
    10.336284  weight(contents:sales in 14578), product of:
      0.54963183  queryWeight(contents:sales), product of:
        2.6595461  idf(contents: sales=83179)
        0.20666377  queryNorm
      18.805832  fieldWeight(contents:sales in 14578), product of:
        7.071068  btq, product of:
          1.4142135  tf(phraseFreq=3.0)
          5.0  scorePayload(...)
        2.6595461  idf(contents: sales=83179)
        1.0  fieldNorm(field=contents, doc=14578)
    21.93831  weight(contents:representative in 14578), product of:
      0.8007395  queryWeight(contents:representative), product of:
        3.8746004  idf(contents: representative=24678)
        0.20666377  queryNorm
      27.397562  fieldWeight(contents:representative in 14578), product of:
        7.071068  btq, product of:
          1.4142135  tf(phraseFreq=2.0)
          5.0  scorePayload(...)
        3.8746004  idf(contents: representative=24678)
        1.0  fieldNorm(field=contents, doc=14578)
  0.0  weight(literals:jb$1 in 14578), product of:
    0.23816177  queryWeight(literals:jb$1), product of:
      1.1524118  idf(docFreq=375455, numDocs=436917)
      0.20666377  queryNorm
    0.0  fieldWeight(literals:jb$1 in 14578), product of:
      1.0  tf(termFreq(literals:jb$1)=1)
      1.1524118  idf(docFreq=375455, numDocs=436917)
      0.0  fieldNorm(field=literals, doc=14578)
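The queryNorm in the first explanation can be reproduced by hand. A minimal sketch, assuming Lucene 2.4's DefaultSimilarity, where queryNorm = 1/sqrt(sumOfSquaredWeights) and each term contributes (idf × query-time boost)²; the idf values are copied from the explanation above and every query-time boost is assumed to be 1. The index-time boost of 0 on 'literals' is not an input here: it only shows up as the 0.0 fieldNorm.

```java
/** Recomputes the first query's queryNorm from the idf values in its explanation. */
public class FirstQueryNorm {

    /** DefaultSimilarity-style queryNorm: 1 / sqrt(sum over terms of (idf * boost)^2). */
    public static double queryNorm(double[] idfs, double[] queryBoosts) {
        double sumOfSquaredWeights = 0.0;
        for (int i = 0; i < idfs.length; i++) {
            double queryWeight = idfs[i] * queryBoosts[i]; // weight before normalization
            sumOfSquaredWeights += queryWeight * queryWeight;
        }
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    public static void main(String[] args) {
        // contents:sales, contents:representative, literals:jb$1
        double[] idfs = {2.6595461, 3.8746004, 1.1524118};
        double[] queryBoosts = {1.0, 1.0, 1.0}; // assumed: no query-time boosts
        System.out.printf("queryNorm = %.8f%n", queryNorm(idfs, queryBoosts));
    }
}
```

The result agrees with the 0.20666377 printed above, which suggests the zero-boosted 'literals' term still counts fully toward sumOfSquaredWeights.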


Second query: +(+contents:sales +contents:representative) +(literals:jb$1 literals:jb$9999)
Explanation:
10.550879  sum of:
  10.550879  sum of:
    3.3790317  weight(contents:sales in 14578), product of:
      0.17967999  queryWeight(contents:sales), product of:
        2.6595461  idf(contents: sales=83179)
        0.0675604  queryNorm
      18.805832  fieldWeight(contents:sales in 14578), product of:
        7.071068  btq, product of:
          1.4142135  tf(phraseFreq=3.0)
          5.0  scorePayload(...)
        2.6595461  idf(contents: sales=83179)
        1.0  fieldNorm(field=contents, doc=14578)
    7.171847  weight(contents:representative in 14578), product of:
      0.26176953  queryWeight(contents:representative), product of:
        3.8746004  idf(contents: representative=24678)
        0.0675604  queryNorm
      27.397562  fieldWeight(contents:representative in 14578), product of:
        7.071068  btq, product of:
          1.4142135  tf(phraseFreq=2.0)
          5.0  scorePayload(...)
        3.8746004  idf(contents: representative=24678)
        1.0  fieldNorm(field=contents, doc=14578)
  0.0  product of:
    0.0  sum of:
      0.0  weight(literals:jb$1 in 14578), product of:
        0.0778574  queryWeight(literals:jb$1), product of:
          1.1524118  idf(docFreq=375455, numDocs=436917)
          0.0675604  queryNorm
        0.0  fieldWeight(literals:jb$1 in 14578), product of:
          1.0  tf(termFreq(literals:jb$1)=1)
          1.1524118  idf(docFreq=375455, numDocs=436917)
          0.0  fieldNorm(field=literals, doc=14578)
    0.5  coord(1/2)
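literals:jb$9999 never appears in the second explanation because it did not match, but its idf still went into sumOfSquaredWeights. Working backwards from the two printed queryNorm values gives the idf it must have had. A sketch under the same DefaultSimilarity assumption (queryNorm = 1/sqrt(sumOfSquaredWeights), query-time boosts of 1); the `idf` helper mirrors DefaultSimilarity's 1 + ln(numDocs / (docFreq + 1)) and is included only for comparison.

```java
/** Infers the idf of the non-matching term from the two queryNorm values. */
public class ImpliedIdf {

    /** Inverts queryNorm = 1 / sqrt(sumOfSquaredWeights). */
    public static double sumOfSquaredWeights(double queryNorm) {
        return 1.0 / (queryNorm * queryNorm);
    }

    /** sqrt of the extra squared weight the second query carries over the first. */
    public static double impliedIdf(double queryNormWithout, double queryNormWith) {
        double extra = sumOfSquaredWeights(queryNormWith) - sumOfSquaredWeights(queryNormWithout);
        return Math.sqrt(extra);
    }

    /** DefaultSimilarity-style idf, for comparison. */
    public static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        double first = 0.20666377; // queryNorm of the first query
        double second = 0.0675604; // queryNorm of the second query
        System.out.printf("sumOfSquaredWeights: %.2f vs %.2f%n",
                sumOfSquaredWeights(first), sumOfSquaredWeights(second));
        System.out.printf("implied idf of literals:jb$9999: %.2f%n", impliedIdf(first, second));
        System.out.printf("idf for docFreq=0, numDocs=436917: %.2f%n", idf(0, 436917));
    }
}
```

Both come out near 13.99, which is consistent with jb$9999 occurring in (almost) no documents. A very rare, non-matching term dominates the sum and pulls queryNorm down for every clause, including the 'contents' ones.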





Peter

Re: queryNorm affect on score

Posted by Peter Keegan <pe...@gmail.com>.

If I set boost=0 at query time and the query contains only terms with
boost=0, the scores are NaN instead of 0 (because weight.queryNorm =
1/sqrt(0) = infinity, and 0 * infinity = NaN).

Peter
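The NaN follows from IEEE 754 float arithmetic as Java implements it: with every query-time boost at 0, sumOfSquaredWeights is 0, queryNorm becomes 1/sqrt(0) = Infinity, and multiplying the zero query weight by Infinity gives NaN rather than 0. A minimal sketch of just that arithmetic; no Lucene classes are involved, and the variable names only mirror the explanation output.

```java
/** Reproduces the 0 * Infinity = NaN step for a query whose only terms have boost 0. */
public class ZeroBoostNaN {

    public static float weightValue(float idf, float queryBoost) {
        float queryWeight = idf * queryBoost;                              // 0 when the boost is 0
        float sumOfSquaredWeights = queryWeight * queryWeight;             // also 0
        float queryNorm = (float) (1.0 / Math.sqrt(sumOfSquaredWeights));  // 1 / 0 -> Infinity
        return queryWeight * queryNorm;                                    // 0 * Infinity -> NaN
    }

    public static void main(String[] args) {
        System.out.println(weightValue(1.1524118f, 0.0f)); // prints NaN
        System.out.println(Float.isNaN(weightValue(1.1524118f, 1.0f))); // prints false
    }
}
```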


On Sun, Mar 1, 2009 at 9:27 PM, Erick Erickson <er...@gmail.com> wrote:

> FWIW, Hossman pointed out that the difference between index-time and
> query-time boosts is that index-time boosts on title, for instance,
> express "I care about this document's title more than other documents'
> titles [when it matches]." Query-time boosts express "I care about matches
> on the title field more than matches on other fields."
> on the title field more than matches on other fields".
>
> Best
> Erick

Re: queryNorm affect on score

Posted by Erick Erickson <er...@gmail.com>.

FWIW, Hossman pointed out that the difference between index-time and
query-time boosts is that index-time boosts on title, for instance,
express "I care about this document's title more than other documents'
titles [when it matches]." Query-time boosts express "I care about matches
on the title field more than matches on other fields."

Best
Erick

On Sun, Mar 1, 2009 at 8:57 PM, Peter Keegan <pe...@gmail.com> wrote:

> As suggested, I added a query-time boost of 0.0f to the 'literals' field
> (with index-time boost still there) and I did get the same scores for both
> queries :)  (there is a subtlety between index-time and query-time boosting
> that I missed.)
>
> I also tried disabling the coord factor, but that had no effect on the
> score when combined with the above. This seems OK in this example since
> the matching terms had boost = 0.
>
> Thanks Yonik,
> Peter

Re: queryNorm affect on score

Posted by Peter Keegan <pe...@gmail.com>.

As suggested, I added a query-time boost of 0.0f to the 'literals' field
(with index-time boost still there) and I did get the same scores for both
queries :)  (there is a subtlety between index-time and query-time boosting
that I missed.)
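Why the query-time boost fixed it: a query-time boost scales each term's contribution to sumOfSquaredWeights, so a boost of 0 on both 'literals' terms removes them from the sum and both queries end up with the same queryNorm. A sketch assuming DefaultSimilarity's 1/sqrt(sumOfSquaredWeights); the idf values come from the explanations, except 13.99 for literals:jb$9999, which is an assumption inferred from the second queryNorm rather than a printed value.

```java
/** Compares queryNorm for both queries with 'literals' query-time boosts of 1 and of 0. */
public class QueryTimeBoost {

    /** DefaultSimilarity-style queryNorm over (idf * query-time boost) per term. */
    public static double queryNorm(double[] idfs, double[] queryBoosts) {
        double sumOfSquaredWeights = 0.0;
        for (int i = 0; i < idfs.length; i++) {
            double w = idfs[i] * queryBoosts[i];
            sumOfSquaredWeights += w * w;
        }
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    public static void main(String[] args) {
        // contents:sales, contents:representative, literals:jb$1
        double[] q1Idfs = {2.6595461, 3.8746004, 1.1524118};
        // same, plus literals:jb$9999 (13.99 is inferred, not printed in the thread)
        double[] q2Idfs = {2.6595461, 3.8746004, 1.1524118, 13.99};

        System.out.printf("literals boost 1: %.6f vs %.6f%n",
                queryNorm(q1Idfs, new double[] {1, 1, 1}),
                queryNorm(q2Idfs, new double[] {1, 1, 1, 1}));
        System.out.printf("literals boost 0: %.6f vs %.6f%n",
                queryNorm(q1Idfs, new double[] {1, 1, 0}),
                queryNorm(q2Idfs, new double[] {1, 1, 0, 0}));
    }
}
```

With boost 0 the two values are identical, because only the 'contents' terms remain in the sum.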

I also tried disabling the coord factor, but that had no effect on the
score when combined with the above. This seems OK in this example since
the matching terms had boost = 0.

Thanks Yonik,
Peter



On Sat, Feb 28, 2009 at 6:02 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> But the zero boost was an index-time boost, and the queryNorm takes
> into account query-time boosts and idfs. You might get closer to what
> you expect with a query-time boost of 0.0f.
>
> The other thing affecting the score is the coord factor: the fact
> that fewer of the optional terms matched (1/2) lowers the score. The
> coordination factor can be disabled on any BooleanQuery.
>
> If you do both of the above, I *think* you would get the same scores
> for this specific example.
>
> -Yonik
> http://www.lucidimagination.com

Re: queryNorm affect on score

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Sat, Feb 28, 2009 at 3:02 PM, Peter Keegan <pe...@gmail.com> wrote:
>> in situations where you deal with simple query types, and matching query
>> structures, the queryNorm *can* be used to make scores semi-comparable.
>
> Hmm. My example used matching query structures. The only difference was a
> single term in a field with zero weight that didn't exist in the matching
> document. But one score was 3X the other.

But the zero boost was an index-time boost, and the queryNorm takes
into account query-time boosts and idfs. You might get closer to what
you expect with a query-time boost of 0.0f.

The other thing affecting the score is the coord factor: the fact
that fewer of the optional terms matched (1/2) lowers the score. The
coordination factor can be disabled on any BooleanQuery.
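DefaultSimilarity's coord is overlap / maxOverlap, multiplied into the sum of a BooleanQuery's matching clause scores; in Lucene 2.x it can be switched off per query with the `BooleanQuery(boolean disableCoord)` constructor. Because the only matching 'literals' clause contributed 0.0, coord(1/2) had nothing to scale, which is consistent with Peter's report that disabling it made no difference here. A plain-Java sketch of that arithmetic; the method names are illustrative, not Lucene's.

```java
/** Sketch of how a coord factor scales a BooleanQuery's summed clause scores. */
public class CoordSketch {

    /** DefaultSimilarity-style coordination factor. */
    public static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Sum of the matching clause scores, scaled by coord unless it is disabled. */
    public static float booleanScore(float[] matchingClauseScores, int maxOverlap, boolean disableCoord) {
        float sum = 0f;
        for (float s : matchingClauseScores) {
            sum += s;
        }
        return disableCoord ? sum : sum * coord(matchingClauseScores.length, maxOverlap);
    }

    public static void main(String[] args) {
        float[] literals = {0.0f}; // only literals:jb$1 matched, with weight 0.0
        System.out.println(booleanScore(literals, 2, false)); // 0.0
        System.out.println(booleanScore(literals, 2, true));  // 0.0
        float[] nonZero = {4.0f}; // hypothetical non-zero clause score
        System.out.println(booleanScore(nonZero, 2, false));  // 2.0
        System.out.println(booleanScore(nonZero, 2, true));   // 4.0
    }
}
```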

If you do both of the above, I *think* you would get the same scores
for this specific example.

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: queryNorm affect on score

Posted by Peter Keegan <pe...@gmail.com>.

> in situations where you deal with simple query types, and matching query
> structures, the queryNorm *can* be used to make scores semi-comparable.

Hmm. My example used matching query structures. The only difference was a
single term in a field with zero weight that didn't exist in the matching
document. But one score was 3X the other.
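The "3X" can be checked against the explanations: every fieldWeight is identical in both, so the whole score ratio should come from queryNorm. A small sketch using only the printed numbers:

```java
/** Compares the ratio of the two total scores with the ratio of the two queryNorm values. */
public class ScoreRatio {

    public static double ratio(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        double score1 = 32.274593, score2 = 10.550879; // total scores from the explanations
        double norm1 = 0.20666377, norm2 = 0.0675604;   // queryNorm values from the explanations
        System.out.printf("score ratio:     %.4f%n", ratio(score1, score2));
        System.out.printf("queryNorm ratio: %.4f%n", ratio(norm1, norm2));
    }
}
```

Both ratios are about 3.059, so the difference is entirely the queryNorm shift.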

Peter

On Sat, Feb 28, 2009 at 12:35 PM, Chris Hostetter <ho...@fucit.org> wrote:

> That comment should probably be removed ... in situations where you
> deal with simple query types, and matching query structures, the queryNorm
> *can* be used to make scores semi-comparable.
>
> To be 100% correct about what the queryNorm does in all cases: it
> normalizes each of the constituent values that are used in the score
> computation relative to the other constituent values. The main value I've
> seen from it is that it prevents a loss of floating point accuracy that
> can result from addition/multiplication of large values.
>
>
>
> -Hoss

Re: queryNorm affect on score

Posted by Chris Hostetter <ho...@fucit.org>.

: I guess I don't really understand this comment in the Similarity javadoc,
: then:
:
: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm
:
: *queryNorm(q)* is a normalizing factor used to make scores between queries
: comparable.

That comment should probably be removed ... in situations where you
deal with simple query types, and matching query structures, the queryNorm
*can* be used to make scores semi-comparable.

To be 100% correct about what the queryNorm does in all cases: it 
normalizes each of the constituent values that are used in the score 
computation relative to the other constituent values. The main value I've
seen from it is that it prevents a loss of floating point accuracy that 
can result from addition/multiplication of large values.
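One concrete reading of "normalizes each of the constituent values relative to the others": with DefaultSimilarity, multiplying each query-side weight (idf × query-time boost) by queryNorm turns them into a unit-length vector, so only their proportions survive. A sketch using the idf values from the first query in this thread, with query-time boosts assumed to be 1:

```java
/** Shows that queryNorm rescales the query-side weights to a unit vector. */
public class UnitQueryVector {

    /** Multiplies every weight by 1 / sqrt(sum of squared weights). */
    public static double[] normalize(double[] weights) {
        double sumOfSquares = 0.0;
        for (double w : weights) {
            sumOfSquares += w * w;
        }
        double queryNorm = 1.0 / Math.sqrt(sumOfSquares);
        double[] normalized = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            normalized[i] = weights[i] * queryNorm;
        }
        return normalized;
    }

    public static double lengthSquared(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return sum;
    }

    public static void main(String[] args) {
        // idf * boost for contents:sales, contents:representative, literals:jb$1
        double[] weights = {2.6595461, 3.8746004, 1.1524118};
        double[] normalized = normalize(weights);
        for (double w : normalized) {
            System.out.printf("%.6f%n", w);
        }
        System.out.printf("length^2 = %.6f%n", lengthSquared(normalized));
    }
}
```

The normalized values match the queryWeight lines in the first explanation (0.54963183, 0.8007395, 0.23816177), and their squares sum to 1.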



-Hoss



Re: queryNorm affect on score

Posted by Michael Stoppelman <st...@gmail.com>.

I guess I don't really understand this comment in the Similarity javadoc, then:

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm

*queryNorm(q)* is a normalizing factor used to make scores between queries comparable.

:/.

M

On Fri, Feb 27, 2009 at 9:44 AM, Peter Keegan <pe...@gmail.com> wrote:

> Got it. This is another example of why scores can't be compared between
> (even similar) queries. (We don't.)
>
> Thanks.

Re: queryNorm affect on score

Posted by Peter Keegan <pe...@gmail.com>.

Got it. This is another example of why scores can't be compared between
(even similar) queries. (We don't.)

Thanks.

On Fri, Feb 27, 2009 at 11:39 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> That's just the way it works... since it's applied to all clauses, it
> really just changes the range of scores returned, not relative
> ordering of documents or anything.
>
> -Yonik
> http://www.lucidimagination.com

Re: queryNorm affect on score

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Fri, Feb 27, 2009 at 9:15 AM, Peter Keegan <pe...@gmail.com> wrote:
> Any comments about this? Is this just the way queryNorm works or is this a
> bug?

That's just the way it works... since it's applied to all clauses, it
really just changes the range of scores returned, not relative
ordering of documents or anything.
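Within one query, queryNorm is a single constant multiplied into every clause, so it rescales all documents' scores by the same positive factor and cannot reorder them. A minimal sketch with hypothetical scores for four documents; the scaling factor reuses the ratio of the two queryNorm values in this thread purely for illustration, and any positive factor behaves the same way.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Shows that multiplying all scores by one positive constant leaves the ranking unchanged. */
public class ConstantScaling {

    /** Document indices ordered by descending score. */
    public static Integer[] rank(double[] scores) {
        Integer[] docs = new Integer[scores.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, Comparator.comparingDouble((Integer d) -> scores[d]).reversed());
        return docs;
    }

    /** Multiplies every score by the same factor, as a shared queryNorm does. */
    public static double[] scale(double[] scores, double factor) {
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] * factor;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] scores = {4.2, 9.7, 1.3, 6.0};  // hypothetical document scores
        double factor = 0.0675604 / 0.20666377;  // any positive constant would do
        System.out.println(Arrays.toString(rank(scores)));                // [1, 3, 0, 2]
        System.out.println(Arrays.toString(rank(scale(scores, factor)))); // [1, 3, 0, 2]
    }
}
```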

-Yonik
http://www.lucidimagination.com


Re: queryNorm affect on score

Posted by Peter Keegan <pe...@gmail.com>.

Any comments about this? Is this just the way queryNorm works or is this a
bug?

Thanks,
Peter
