Posted to java-user@lucene.apache.org by Peter Keegan <pe...@gmail.com> on 2009/02/20 22:03:00 UTC
queryNorm affect on score
The score explanations for the same document returned from 2 similar
queries differ in an unexpected way. There are 2 fields involved, 'contents'
and 'literals'. The 'literals' field has setBoost = 0. As you can see from
the explanations below, the total weight of the matching terms from the
'literals' field is 0. However, the weights produced by the matching terms in
the 'contents' field are very different, even with the same matching terms.
The reason is that the 'queryNorm' value is very different because the
'sumOfSquaredWeights' is very different. Why is this?
First query: +(+contents:sales +contents:representative) +literals:jb$1
Explanation:
32.274593 sum of:
  32.274593 sum of:
    10.336284 weight(contents:sales in 14578), product of:
      0.54963183 queryWeight(contents:sales), product of:
        2.6595461 idf(contents: sales=83179)
        0.20666377 queryNorm
      18.805832 fieldWeight(contents:sales in 14578), product of:
        7.071068 btq, product of:
          1.4142135 tf(phraseFreq=3.0)
          5.0 scorePayload(...)
        2.6595461 idf(contents: sales=83179)
        1.0 fieldNorm(field=contents, doc=14578)
    21.93831 weight(contents:representative in 14578), product of:
      0.8007395 queryWeight(contents:representative), product of:
        3.8746004 idf(contents: representative=24678)
        0.20666377 queryNorm
      27.397562 fieldWeight(contents:representative in 14578), product of:
        7.071068 btq, product of:
          1.4142135 tf(phraseFreq=2.0)
          5.0 scorePayload(...)
        3.8746004 idf(contents: representative=24678)
        1.0 fieldNorm(field=contents, doc=14578)
  0.0 weight(literals:jb$1 in 14578), product of:
    0.23816177 queryWeight(literals:jb$1), product of:
      1.1524118 idf(docFreq=375455, numDocs=436917)
      0.20666377 queryNorm
    0.0 fieldWeight(literals:jb$1 in 14578), product of:
      1.0 tf(termFreq(literals:jb$1)=1)
      1.1524118 idf(docFreq=375455, numDocs=436917)
      0.0 fieldNorm(field=literals, doc=14578)
Second query: +(+contents:sales +contents:representative) +(literals:jb$1 literals:jb$9999)
Explanation:
10.550879 sum of:
  10.550879 sum of:
    3.3790317 weight(contents:sales in 14578), product of:
      0.17967999 queryWeight(contents:sales), product of:
        2.6595461 idf(contents: sales=83179)
        0.0675604 queryNorm
      18.805832 fieldWeight(contents:sales in 14578), product of:
        7.071068 btq, product of:
          1.4142135 tf(phraseFreq=3.0)
          5.0 scorePayload(...)
        2.6595461 idf(contents: sales=83179)
        1.0 fieldNorm(field=contents, doc=14578)
    7.171847 weight(contents:representative in 14578), product of:
      0.26176953 queryWeight(contents:representative), product of:
        3.8746004 idf(contents: representative=24678)
        0.0675604 queryNorm
      27.397562 fieldWeight(contents:representative in 14578), product of:
        7.071068 btq, product of:
          1.4142135 tf(phraseFreq=2.0)
          5.0 scorePayload(...)
        3.8746004 idf(contents: representative=24678)
        1.0 fieldNorm(field=contents, doc=14578)
  0.0 product of:
    0.0 sum of:
      0.0 weight(literals:jb$1 in 14578), product of:
        0.0778574 queryWeight(literals:jb$1), product of:
          1.1524118 idf(docFreq=375455, numDocs=436917)
          0.0675604 queryNorm
        0.0 fieldWeight(literals:jb$1 in 14578), product of:
          1.0 tf(termFreq(literals:jb$1)=1)
          1.1524118 idf(docFreq=375455, numDocs=436917)
          0.0 fieldNorm(field=literals, doc=14578)
    0.5 coord(1/2)
Peter
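
[Editor's sketch] The queryNorm values in the two explanations above can be
reproduced from the idf values they show. The following is a minimal
illustration in plain Java with no Lucene dependency. It assumes
default-style scoring: each term contributes (idf * boost)^2 to
sumOfSquaredWeights, queryNorm = 1 / sqrt(sumOfSquaredWeights), every
query-time boost is 1.0, and the custom 'btq' query weights itself like a
plain TermQuery. The class and method names are hypothetical. The idf of
literals:jb$9999 never appears in the thread, so the code only back-derives
the value implied by the reported queryNorm rather than asserting it.

```java
public class QueryNormSketch {

    /** Sum of (idf * boost)^2 over all scoring terms of the query. */
    public static double sumOfSquaredWeights(double[] idfs, double[] boosts) {
        double sum = 0.0;
        for (int i = 0; i < idfs.length; i++) {
            double w = idfs[i] * boosts[i];
            sum += w * w;
        }
        return sum;
    }

    /** queryNorm = 1 / sqrt(sumOfSquaredWeights). */
    public static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    public static void main(String[] args) {
        // idf values copied from the first explanation:
        // contents:sales, contents:representative, literals:jb$1
        double[] idfs = {2.6595461, 3.8746004, 1.1524118};
        double[] boosts = {1.0, 1.0, 1.0}; // assumed: no query-time boosts

        double sum1 = sumOfSquaredWeights(idfs, boosts);
        System.out.println("query 1 queryNorm ~ " + queryNorm(sum1));

        // Second query: invert the reported queryNorm to get its sum,
        // then subtract the known terms to see what jb$9999 must add.
        double reportedNorm2 = 0.0675604;
        double sum2 = 1.0 / (reportedNorm2 * reportedNorm2);
        System.out.println("implied idf of the extra term ~ "
                + Math.sqrt(sum2 - sum1));

        // The total scores differ by the same factor as the queryNorms.
        System.out.println("queryNorm ratio ~ " + (0.20666377 / reportedNorm2));
        System.out.println("score ratio     ~ " + (32.274593 / 10.550879));
    }
}
```

Under these assumptions the first queryNorm comes out close to the reported
0.20666377, the extra term would need an idf of roughly 14 (plausible for a
very rare term), and both ratios are about 3.06, which is the "3X"
difference discussed later in the thread.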
Re: queryNorm affect on score
Posted by Peter Keegan <pe...@gmail.com>.
If I set the boost=0 at query time and the query contains only terms with
boost=0, the scores are NaN (because weight.queryNorm = 1/0 = infinity),
instead of 0.
Peter
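
[Editor's sketch] The NaN described above follows from ordinary IEEE 754
float arithmetic. This is a minimal illustration, assuming that the query
weight is computed as idf * boost and then multiplied by queryNorm, and
that queryNorm = 1 / sqrt(sumOfSquaredWeights). The class and method names
are hypothetical and the code does not call Lucene.

```java
public class ZeroBoostNaNSketch {

    /** queryNorm = 1 / sqrt(sumOfSquaredWeights), narrowed to float. */
    public static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    /** Normalized query weight: (idf * boost) * queryNorm. */
    public static float queryWeight(float idf, float boost, float queryNorm) {
        return (idf * boost) * queryNorm;
    }

    public static void main(String[] args) {
        float idf = 1.1524118f; // idf of literals:jb$1 from the thread
        float boost = 0.0f;     // query-time boost of zero on the only term

        // Every term has boost 0, so the sum of squared weights is 0.
        float raw = idf * boost;
        float sum = raw * raw;

        float norm = queryNorm(sum);                  // 1 / 0 -> Infinity
        float weight = queryWeight(idf, boost, norm); // 0 * Infinity -> NaN

        System.out.println("queryNorm   = " + norm);
        System.out.println("queryWeight = " + weight);
    }
}
```

A guard such as treating a zero sumOfSquaredWeights as a queryNorm of 1.0
would turn the NaN into 0, but whether that belongs in a custom Similarity
is a design choice the thread does not settle.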
On Sun, Mar 1, 2009 at 9:27 PM, Erick Erickson <er...@gmail.com> wrote:
> FWIW, Hossman pointed out that the difference between index-time and
> query-time boosts is that index-time boosts on title, for instance,
> express "I care about this document's title more than other documents'
> titles [when it matches]". Query-time boosts express "I care about matches
> on the title field more than matches on other fields".
>
> Best
> Erick
Re: queryNorm affect on score
Posted by Erick Erickson <er...@gmail.com>.
FWIW, Hossman pointed out that the difference between index-time and
query-time boosts is that index-time boosts on title, for instance,
express "I care about this document's title more than other documents'
titles [when it matches]". Query-time boosts express "I care about matches
on the title field more than matches on other fields".
Best
Erick
On Sun, Mar 1, 2009 at 8:57 PM, Peter Keegan <pe...@gmail.com> wrote:
> As suggested, I added a query-time boost of 0.0f to the 'literals' field
> (with index-time boost still there) and I did get the same scores for both
> queries :) (there is a subtlety between index-time and query-time boosting
> that I missed.)
>
> I also tried disabling the coord factor, but that had no effect on the
> score when combined with the above. This seems OK in this example since
> the matching terms had boost = 0.
>
> Thanks Yonik,
> Peter
Re: queryNorm affect on score
Posted by Peter Keegan <pe...@gmail.com>.
As suggested, I added a query-time boost of 0.0f to the 'literals' field
(with index-time boost still there) and I did get the same scores for both
queries :) (there is a subtlety between index-time and query-time boosting
that I missed.)
I also tried disabling the coord factor, but that had no effect on the
score when combined with the above. This seems OK in this example since
the matching terms had boost = 0.
Thanks Yonik,
Peter
On Sat, Feb 28, 2009 at 6:02 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Sat, Feb 28, 2009 at 3:02 PM, Peter Keegan <pe...@gmail.com> wrote:
> >> in situations where you deal with simple query types, and matching query
> >> structures, the queryNorm
> >> *can* be used to make scores semi-comparable.
> >
> > Hmm. My example used matching query structures. The only difference was a
> > single term in a field with zero weight that didn't exist in the matching
> > document. But one score was 3X the other.
>
> But the zero boost was an index-time boost, and the queryNorm takes
> into account query-time boosts and idfs. You might get closer to what
> you expect with a query time boost of 0.0f
>
> The other thing affecting the score is the coord factor - the fact
> that fewer of the optional terms matched (1/2) lowers the score. The
> coordination factor can be disabled on any BooleanQuery.
>
> If you do both of the above, I *think* you would get the same scores
> for this specific example.
>
> -Yonik
> http://www.lucidimagination.com
Re: queryNorm affect on score
Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sat, Feb 28, 2009 at 3:02 PM, Peter Keegan <pe...@gmail.com> wrote:
>> in situations where you deal with simple query types, and matching query
>> structures, the queryNorm
>> *can* be used to make scores semi-comparable.
>
> Hmm. My example used matching query structures. The only difference was a
> single term in a field with zero weight that didn't exist in the matching
> document. But one score was 3X the other.
But the zero boost was an index-time boost, and the queryNorm takes
into account query-time boosts and idfs. You might get closer to what
you expect with a query-time boost of 0.0f.
The other thing affecting the score is the coord factor - the fact
that fewer of the optional terms matched (1/2) lowers the score. The
coordination factor can be disabled on any BooleanQuery.
If you do both of the above, I *think* you would get the same scores
for this specific example.
-Yonik
http://www.lucidimagination.com
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
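
[Editor's sketch] The effect of the query-time boost suggested above can be
checked with the numbers from the original post. This is a minimal
illustration in plain Java, assuming each term adds (idf * boost)^2 to
sumOfSquaredWeights and queryNorm = 1 / sqrt(sumOfSquaredWeights). The idf
used for literals:jb$9999 (13.988) is a hypothetical value, back-derived
from the reported queryNorm and not stated anywhere in the thread. The
class and method names are also hypothetical.

```java
public class QueryTimeBoostSketch {

    /** queryNorm computed from per-term idf values and query-time boosts. */
    public static double queryNorm(double[] idfs, double[] boosts) {
        double sum = 0.0;
        for (int i = 0; i < idfs.length; i++) {
            double w = idfs[i] * boosts[i];
            sum += w * w; // a boost of 0 removes the term from the sum
        }
        return 1.0 / Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // First query: sales, representative, jb$1 (boost 0 on 'literals').
        double norm1 = queryNorm(
                new double[]{2.6595461, 3.8746004, 1.1524118},
                new double[]{1.0, 1.0, 0.0});

        // Second query adds jb$9999 (hypothetical idf), also with boost 0.
        double norm2 = queryNorm(
                new double[]{2.6595461, 3.8746004, 1.1524118, 13.988},
                new double[]{1.0, 1.0, 0.0, 0.0});

        System.out.println("queryNorm, first query  = " + norm1);
        System.out.println("queryNorm, second query = " + norm2);

        // The 'literals' sub-score is 0.0 either way, so coord(1/2)
        // cannot change it: 0.0 * 0.5 equals 0.0 * 1.0.
        System.out.println("coord changes score: " + (0.0 * 0.5 != 0.0 * 1.0));
    }
}
```

With the zero boost applied at query time, the 'literals' terms no longer
enter sumOfSquaredWeights, so both queries share one queryNorm (about 0.213
under these assumptions) and the 'contents' terms score identically.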
Re: queryNorm affect on score
Posted by Peter Keegan <pe...@gmail.com>.
> in situations where you deal with simple query types, and matching query
> structures, the queryNorm
> *can* be used to make scores semi-comparable.
Hmm. My example used matching query structures. The only difference was a
single term in a field with zero weight that didn't exist in the matching
document. But one score was 3X the other.
Peter
On Sat, Feb 28, 2009 at 12:35 PM, Chris Hostetter <ho...@fucit.org> wrote:
>
> : I guess I don't really understand this comment in the similarity java doc
> : then:
> :
> :
> : http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm
> :
> : *queryNorm(q)* is a normalizing factor used to make scores between queries
> : comparable.
>
> that comment should probably be removed ... in situations where you
> deal with simple query types, and matching query structures, the queryNorm
> *can* be used to make scores semi-comparable.
>
> To be 100% correct about what the queryNorm does in all cases: it
> normalizes each of the constituent values that are used in the score
> computation relative to the other constituent values. the main value I've
> seen from it is that it prevents a loss of floating point accuracy that
> can result from addition/multiplication of large values.
>
>
>
> -Hoss
Re: queryNorm affect on score
Posted by Chris Hostetter <ho...@fucit.org>.
: I guess I don't really understand this comment in the similarity java doc
: then:
:
: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm
:
: *queryNorm(q)* is a normalizing factor used to make scores between queries
: comparable.
That comment should probably be removed ... in situations where you
deal with simple query types, and matching query structures, the queryNorm
*can* be used to make scores semi-comparable.
To be 100% correct about what the queryNorm does in all cases: it
normalizes each of the constituent values that are used in the score
computation relative to the other constituent values. The main value I've
seen from it is that it prevents a loss of floating point accuracy that
can result from addition/multiplication of large values.
-Hoss
Re: queryNorm affect on score
Posted by Michael Stoppelman <st...@gmail.com>.
I guess I don't really understand this comment in the Similarity javadoc
then:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm
*queryNorm(q)* is a normalizing factor used to make scores between queries
comparable.
:/.
M
On Fri, Feb 27, 2009 at 9:44 AM, Peter Keegan <pe...@gmail.com> wrote:
> Got it. This is another example of why scores can't be compared between
> (even similar) queries.
> (we don't)
>
> Thanks.
Re: queryNorm affect on score
Posted by Peter Keegan <pe...@gmail.com>.
Got it. This is another example of why scores can't be compared between
(even similar) queries.
(we don't)
Thanks.
On Fri, Feb 27, 2009 at 11:39 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Fri, Feb 27, 2009 at 9:15 AM, Peter Keegan <pe...@gmail.com> wrote:
> > Any comments about this? Is this just the way queryNorm works or is this a
> > bug?
>
> That's just the way it works... since it's applied to all clauses, it
> really just changes the range of scores returned, not relative
> ordering of documents or anything.
>
> -Yonik
> http://www.lucidimagination.com
Re: queryNorm affect on score
Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Feb 27, 2009 at 9:15 AM, Peter Keegan <pe...@gmail.com> wrote:
> Any comments about this? Is this just the way queryNorm works or is this a
> bug?
That's just the way it works... since it's applied to all clauses, it
really just changes the range of scores returned, not relative
ordering of documents or anything.
-Yonik
http://www.lucidimagination.com
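
[Editor's sketch] The point above, that queryNorm changes the range of
scores but not the ordering, can be illustrated with a small example. It
assumes that for a single query every document's score is its raw
(un-normalized) score multiplied by the same positive queryNorm. The raw
scores below are hypothetical values chosen for illustration, and the class
and method names are not from Lucene.

```java
import java.util.Arrays;
import java.util.Comparator;

public class RankInvarianceSketch {

    /** Returns document indexes ordered by descending normalized score. */
    public static Integer[] ranking(double[] rawScores, double queryNorm) {
        Integer[] order = new Integer[rawScores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Negate the score so the highest-scoring document sorts first.
        Arrays.sort(order, Comparator.comparingDouble(
                (Integer i) -> -(rawScores[i] * queryNorm)));
        return order;
    }

    public static void main(String[] args) {
        double[] raw = {156.2, 48.9, 97.3}; // hypothetical raw scores

        // The two queryNorm values reported in the original post.
        Integer[] withNorm1 = ranking(raw, 0.20666377);
        Integer[] withNorm2 = ranking(raw, 0.0675604);

        System.out.println(Arrays.toString(withNorm1));
        System.out.println(Arrays.toString(withNorm2));
        System.out.println("same order: " + Arrays.equals(withNorm1, withNorm2));
    }
}
```

Because the factor is constant within one query, the ordering is the same
for either queryNorm value. Only the absolute scores shrink or grow, which
is why they are not comparable across different queries.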
Re: queryNorm affect on score
Posted by Peter Keegan <pe...@gmail.com>.
Any comments about this? Is this just the way queryNorm works or is this a
bug?
Thanks,
Peter